Before looking at why one should use Apache POI, here is a quick overview. Apache POI is an open source Java API for working with Microsoft Office files. With it, Java developers can read, edit, or create MS Office files such as Word documents, Excel spreadsheets, and PowerPoint presentations.
Developed by the Apache Software Foundation, it lets developers work with MS Office files based on the Office Open XML (OOXML) standards and on OLE2, Microsoft’s OLE 2 Compound Document format.
It contains several classes that the developer can use to read and decode MS Office files and to write data back into them. Depending on the input and the classes chosen, Apache POI can handle most common Microsoft Office file formats. It provides a separate API for each format, to be picked according to the requirement. For example, POIFS handles the OLE2 file system and HPSF handles OLE2 document properties. Similarly, XSSF (Excel XLSX files), HSMF (Outlook messages), HPBF (Publisher files), and other APIs are available for their respective formats.
Why should one use Apache POI?
Apache POI can be used in different ways; however, one of its major uses is in applications that extract text. Examples include web spiders, content management systems (CMS), and index builders. These applications extract text from the source document and pass it on for further processing.
Since text extraction is a major use of the Apache POI APIs, let us see which API to use where:
- Use of POIFS: Use this API when the document you want to read is written in the OLE 2 Compound Document Format, for example a document produced by an MFC application. POIFS can not only read such documents but also write OLE 2 based documents.
- Use of HSSF: Use HSSF to work with the XLS format of MS Excel from Java. With HSSF, you can read, modify, and write XLS files.
- Use of XSSF: Use XSSF to work with the XLSX format of MS Excel. By combining HSSF and XSSF, you can work with both XLS and XLSX files.
- Use of HSMF: Use HSMF to work with MS Outlook message (MSG) files from Java.
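To make the choice above concrete, here is a minimal sketch that suggests a component from a file's first bytes and its extension. It uses only the Java standard library and does not call Apache POI itself. The class `PoiComponentChooser` and its methods are hypothetical names made up for this illustration. It relies on two facts: OLE 2 Compound Documents begin with the eight bytes D0 CF 11 E0 A1 B1 1A E1, and OOXML files are ZIP archives that begin with "PK" followed by the bytes 3 and 4. Mapping extensions to components is a simplifying assumption of this sketch, not a rule taken from the library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Apache POI: suggests which POI
// component to try, based on the file signature and the extension.
class PoiComponentChooser {

    // Signature of the OLE 2 Compound Document container.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML files are ZIP archives; ZIP local file headers start with PK 03 04.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    // True if header begins with all bytes of magic.
    static boolean startsWith(byte[] header, byte[] magic) {
        return header.length >= magic.length
            && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }

    // Returns the name of the component discussed in the list above,
    // or "unknown" when this sketch has no suggestion.
    static String suggest(byte[] header, String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (startsWith(header, OLE2_MAGIC)) {
            if (name.endsWith(".xls")) return "HSSF";
            if (name.endsWith(".msg")) return "HSMF";
            if (name.endsWith(".pub")) return "HPBF";
            return "POIFS"; // generic OLE 2 container access
        }
        if (startsWith(header, ZIP_MAGIC) && name.endsWith(".xlsx")) {
            return "XSSF";
        }
        return "unknown";
    }

    // Reads up to the first eight bytes of a file.
    static byte[] readHeader(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = in.readNBytes(buf, 0, buf.length);
            return Arrays.copyOf(buf, n);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PoiComponentChooser <file>");
            return;
        }
        Path file = Path.of(args[0]);
        System.out.println(suggest(readHeader(file), file.getFileName().toString()));
    }
}
```

Checking the signature as well as the extension guards against renamed files: an XLSX file saved with an .xls extension is still a ZIP archive and would not be readable through HSSF. Apache POI ships its own format detection (the `FileMagic` class), which is likely the better choice in real code; the sketch above only shows the idea.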