How to get merged regions using Apache POI’s event API?

How can I get the merged regions (merged cells) of an Excel sheet using the event API provided by Apache POI? With the “traditional” DOM-like parsing style there are the methods Sheet.getNumMergedRegions() and Sheet.getMergedRegion(int). Unfortunately I need to handle huge Excel files, and I get out-of-memory errors even with the highest Xmx value I am allowed to use in this project. So I’d like to use the event API, but I wasn’t able to find out how to get information about merged regions, which I need in order to “understand” the content correctly…

Using the example given here: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api I get events for each single cell of a merged region (only the first of them contains any textual content though). So maybe, if there isn’t a more direct way, it would help to know how those merged cells could be (safely) distinguished from other (empty) cells…

Thanks in advance

Answers

I don’t know for sure where merged cell info gets stored, but I’m fairly sure it won’t be with the cell data itself, as that’s not the Excel way.

What I’d suggest you do is create a simple file without merged cells. Then, take a copy, and add a single merged cell. Unzip both of these (.xlsx is a zip of xml files), and diff them. That’ll show you quite quickly what gets set to mark cells as merged. (My hunch is that it’ll be somewhere in the sheet settings, near the start but not near the cell values, BICBW)

Once you know where the merged cell details live, you can take a look at the XSSF UserModel code for working with merged cells to get an idea of how they work, how they’re manipulated, what the options are etc. With that in mind, you can look at the file format docs for the full details, but those can be a bit heavy and detailed to go to first. Finally, you can add in your code to use the merged info details, once you know where to get it from!
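The unzip-and-inspect step can also be done from code. The following is a minimal sketch using only the JDK's zip classes, not part of POI: it scans an .xlsx stream and reports which worksheet parts contain a mergeCells element. The class and method names are made up for illustration, and it assumes worksheets are stored under xl/worksheets/, which is where Excel normally writes them. It reads each sheet fully into memory, so it is only meant for the small comparison files described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for inspecting small test workbooks; not a POI class.
public class MergeCellsFinder {

    /** Returns the names of worksheet parts whose XML contains a mergeCells element. */
    public static List<String> sheetsWithMergeCells(InputStream xlsx) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Assumption: worksheet parts live under xl/worksheets/ and end in .xml.
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                String xml = new String(readAll(zip), StandardCharsets.UTF_8);
                if (xml.contains("<mergeCells")) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    /** Reads the current zip entry to its end (read() returns -1 at the entry boundary). */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Calling sheetsWithMergeCells on both the plain and the merged copy should show the difference without a manual diff.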

You need to open the stream and parse it twice.

First pass: extract the merged cells. They appear in the sheet…xml file after the <sheetData>…</sheetData> element, as in this example:

...
</sheetData>
<mergeCells count="2">
    <mergeCell ref="A2:C2"/>
    <mergeCell ref="A3:A7"/>
</mergeCells>

Extract these refs and keep them in a List.

Then reopen the stream and parse it as usual to extract rows and cells. In the endElement(…) method, when each row finishes, check whether that row falls (partially or completely) within a merged region.
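The two passes described above could be sketched roughly as follows, using only the JDK's SAX parser. This is an illustration under assumptions, not POI code: the class and method names are invented, the sheet XML is passed as a String to keep the example self-contained (with a real workbook you would parse a sheet stream twice instead), and element names are assumed to be unprefixed, as Excel writes them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the two-pass approach; names are illustrative only.
public class TwoPassSheetScan {

    /** Pass 1: collect the row span {firstRow, lastRow} (1-based) of every mergeCell ref. */
    public static List<int[]> collectMergedRowSpans(String sheetXml) throws Exception {
        List<int[]> spans = new ArrayList<>();
        parse(sheetXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                String ref = atts.getValue("ref");
                if ("mergeCell".equals(qName) && ref != null) {
                    String[] corners = ref.split(":");
                    spans.add(new int[] {rowOf(corners[0]), rowOf(corners[corners.length - 1])});
                }
            }
        });
        return spans;
    }

    /** Pass 2: walk the rows again and report those that touch any merged region. */
    public static List<Integer> rowsTouchingMerges(String sheetXml, List<int[]> spans) throws Exception {
        List<Integer> rows = new ArrayList<>();
        parse(sheetXml, new DefaultHandler() {
            private int currentRow;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    String r = atts.getValue("r");
                    // If the row number is omitted, assume the row follows the previous one.
                    currentRow = (r != null) ? Integer.parseInt(r) : currentRow + 1;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!"row".equals(qName)) {
                    return;
                }
                for (int[] span : spans) {
                    if (currentRow >= span[0] && currentRow <= span[1]) {
                        rows.add(currentRow);
                        break;
                    }
                }
            }
        });
        return rows;
    }

    /** Extracts the row number from an A1-style reference such as "C2". */
    private static int rowOf(String cellRef) {
        return Integer.parseInt(cellRef.replaceAll("[^0-9]", ""));
    }

    private static void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }
}
```

Only the row spans are kept between passes, so memory use grows with the number of merged regions rather than with the number of cells.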

To expand on Mike’s answer: you can create a ContentHandler to locate merged regions, like this:

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.util.CellRangeAddress;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MergedRegionLocator extends DefaultHandler {
    private final List<CellRangeAddress> mergedRegions = new ArrayList<>();

    @Override
    public void startElement (String uri, String localName, String name, Attributes attributes) {
        if ("mergeCell".equals(name) && attributes.getValue("ref") != null) {
            mergedRegions.add(CellRangeAddress.valueOf(attributes.getValue("ref")));
        }
    }

    public CellRangeAddress getMergedRegion (int index) {
        return mergedRegions.get(index);
    }

    public List<CellRangeAddress> getMergedRegions () {
        return mergedRegions;
    }
}

An example of using it with POI’s event-based parsing:

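import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Open by path, read-only: opening from an InputStream makes POI buffer the whole file.
OPCPackage pkg = OPCPackage.open("test.xlsx", PackageAccess.READ);
XSSFReader reader = new XSSFReader(pkg);
InputStream sheetData = reader.getSheetsData().next(); // first sheet only

MergedRegionLocator mergedRegionLocator = new MergedRegionLocator();
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(mergedRegionLocator);
parser.parse(new InputSource(sheetData));
sheetData.close();

mergedRegionLocator.getMergedRegions();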
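
Once the regions are collected, the remaining part of the question is telling merged cells apart from ordinary empty cells. Here is a rough sketch of one way to do that with plain string parsing, so it runs without POI on the classpath. The class, enum, and method names are hypothetical, and it assumes plain A1-style references without `$` signs. With POI available, CellRangeAddress.isInRange(int, int) should give the same containment check.

```java
import java.util.List;

// Hypothetical helper; classifies a cell against merged-region refs such as "A2:C2".
public class MergedCellClassifier {

    public enum Kind { NOT_MERGED, ANCHOR, COVERED }

    /**
     * ANCHOR is the top-left cell of a merged region (the one that holds the value),
     * COVERED is any other cell inside a region, NOT_MERGED is everything else.
     */
    public static Kind classify(String cellRef, List<String> mergedRefs) {
        int[] cell = parse(cellRef);
        for (String ref : mergedRefs) {
            String[] corners = ref.split(":");
            int[] first = parse(corners[0]);
            int[] last = parse(corners[corners.length - 1]);
            boolean inside = cell[0] >= first[0] && cell[0] <= last[0]
                    && cell[1] >= first[1] && cell[1] <= last[1];
            if (inside) {
                boolean topLeft = cell[0] == first[0] && cell[1] == first[1];
                return topLeft ? Kind.ANCHOR : Kind.COVERED;
            }
        }
        return Kind.NOT_MERGED;
    }

    /** Parses an A1-style reference such as "C2" into {row, column}, both 0-based. */
    static int[] parse(String ref) {
        int col = 0;
        int i = 0;
        while (i < ref.length() && Character.isLetter(ref.charAt(i))) {
            // Column letters are base-26 digits with A = 1.
            col = col * 26 + (Character.toUpperCase(ref.charAt(i)) - 'A' + 1);
            i++;
        }
        int row = Integer.parseInt(ref.substring(i));
        return new int[] {row - 1, col - 1};
    }
}
```

In the cell handler, a COVERED result means the empty cell belongs to a merged region and its content lives in that region's ANCHOR cell.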