Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve mandatoryCell behavior for opened sheets #311

Closed
virtual-machinist opened this issue Mar 5, 2024 · 12 comments
Closed

Improve mandatoryCell behavior for opened sheets #311

virtual-machinist opened this issue Mar 5, 2024 · 12 comments
Labels

Comments

@virtual-machinist
Copy link
Contributor

The mandatoryCell feature works rather strangely:

  • does detect missing cells for XLS files
  • does not detect missing cells for XLSX files
  • does not detect blank cells (i.e. without data but with changed format) for XLSX sheets (i.e. when already opened as a XSSFSheet instance)
  • does detect missing cells for XLSX sheets
  • does not detect blank cells for XLS files

The first three cases are documented, i.e. I shouldn't expect the parser to work correctly for XLSX format, however the last two aren't.

From layman's point of view there isn't too much difference between blank and missing cells, meaning nobody will re-examine if the contents of the cell were cleared together with or without formatting. Thus, it makes sense to expect the same com.poiji.exception.PoijiMultiRowException for both blank and missing cells in an XLS workbook that the feature applies to.

Seeing that XSSFSheet is actually parsed the same way as HSSFSheet using com.poiji.bind.mapping.SheetUnmarshaller we can at least try to fix/improve behavior for opened sheets. XLSX files and streams use a different unmarshaller and mandatoryCell check isn't done there in any way, i.e. adding this feature there may be out of scope for this issue.

I've added test cases to my missing-cell-ignored branch. The excel files basically have the same input as the constructed sheet in com.poiji.deserialize.MandatoryCellsExceptionTest#createDummyExcel. The old XLS format was derived from the XLSX using the ssconvert utility.

@virtual-machinist
Copy link
Contributor Author

virtual-machinist commented Mar 5, 2024

I believe the place we should be looking at is somewhere near com.poiji.bind.mapping.HSSFUnmarshaller#constructTypeValue:

Cell cell = currentRow.getCell(annotationDetail.getColumn());

if (cell != null) {
...
} else if (annotationDetail.isMandatoryCell()) {
    // this is never reached, since the cell is not null
    throw new PoijiRowSpecificException(annotationDetail.getColumnName(), field.getName(), currentRow.getRowNum());
}

I suggest adding a check if the cell is blank but mandatory before casting the value using com.poiji.config.Casting#castValue and assigning it to the target field or doing the casting and then checking if the result is null.

At the moment I have a workaround using the latter option via an overridden DefaultCasting that checks if the target field has an annotation with mandatoryCell set to true, but it feels clunky and won't work without .preferNullOverDefault(true).

Copy link

stale bot commented Mar 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the bot label Mar 17, 2024
@ozlerhakan
Copy link
Owner

Hi @virtual-machinist !

Sorry I got busy with other tasks. I'm open to any suggestion based on your findings for this issue.

@stale stale bot removed the bot label Mar 17, 2024
@virtual-machinist
Copy link
Contributor Author

@ozlerhakan same here.

One thing I found was that doing cell.getCellType() != CellType.BLANK check in addition to cell != null check pretty much solves the problem of detecting blank mandatory cells. But haven't had the time to test if this breaks anything else.

@virtual-machinist
Copy link
Contributor Author

virtual-machinist commented Mar 18, 2024

@ozlerhakan , I've opened a PR with this change, and also added a test case based on the previous one where the cell was simply null. But I can also merge my test cases from the missing-cell-ignored branch. There only one fails that is related to XLSX file parsing that is not done using the same code, as described in the docs.

However, the test modifies XLSX files every time it's run (i.e. I see the file has been changed in git log after running tests). Can this be a problem?

@ozlerhakan
Copy link
Owner

Thanks @virtual-machinist !

However, the test modifies XLSX files every time it's run (i.e. I see the file has been changed in git log after running tests). Can this be a problem?

I'd say that we shouldn't modify an excel, do you think you can find a fix for that?

@virtual-machinist
Copy link
Contributor Author

@ozlerhakan , had a look at the problem. Apparently I forgot that I have to open the workbook in read-only mode for the tests. This was no issue when I used an InputStream instead of a File, as I don't have file-based I/O.

Also while investigating this problem I saw a better workaround for the whole issue by using the org.apache.poi.ss.usermodel.Workbook#setMissingCellPolicy. If I set the missing cell policy to Row.MissingCellPolicy.RETURN_BLANK_AS_NULL, then no modification of com.poiji.bind.mapping.HSSFUnmarshaller#constructTypeValue is needed.

This however cannot be done if using the com.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>, com.poiji.option.PoijiOptions) or com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, com.poiji.option.PoijiOptions) methods for XLS files, as there's no way to specify this cell policy.

The missing cell policy option looks better to me, I think we should document this for opened sheets (i.e. when using com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class<T>, com.poiji.option.PoijiOptions)) and modify com.poiji.bind.mapping.PoijiWorkBook#workbook appropriately.

OR expose org.apache.poi.ss.usermodel.Row.MissingCellPolicy setting in com.poiji.option.PoijiOptions. But this won't work too good when opening sheets, as I may have set it already in the workbook.

What do you think?

@ozlerhakan
Copy link
Owner

This was no issue when I used an InputStream instead of a File, as I don't have file-based I/O.

Cool!

The missing cell policy option looks better to me, I think we should document this for opened sheets (i.e. when using com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class, com.poiji.option.PoijiOptions)) and modify com.poiji.bind.mapping.PoijiWorkBook#workbook appropriately.

There are indeed by design different constraints between XLS and XLSX approaches. Probably, this is also one of the challenges among them. I'd choose the first option where we can have support this by default and document it explicitly.

@virtual-machinist
Copy link
Contributor Author

Did a second attempt at this. Added test cases for both XLS stream and file, and also one for opened XLSX sheet, since mandatory cell check works there as well.

@ozlerhakan
Copy link
Owner

Thanks @virtual-machinist , I'll plan to merge it with a new release.

Copy link

stale bot commented Apr 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the bot label Apr 22, 2024
@ozlerhakan
Copy link
Owner

Hey @virtual-machinist , I've just merged the changes, thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants