Thousands of exceptions when parsing a geometry object #2013

Open
lfcnassif opened this issue Dec 5, 2023 · 8 comments

@lfcnassif
Member

While processing a 150GB UFDR with master, 50950 exceptions like the one below were printed in the processing log:

java.lang.IllegalArgumentException: Points of LinearRing do not form a closed linestring
	at com.vividsolutions.jts.geom.LinearRing.validateConstruction(LinearRing.java:111)
	at com.vividsolutions.jts.geom.LinearRing.<init>(LinearRing.java:106)
	at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:355)
	at com.vividsolutions.jts.geom.GeometryFactory.createLinearRing(GeometryFactory.java:342)
	at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:236)
	at iped.geo.parsers.kmlstore.KMLParser.parseGeometry(KMLParser.java:256)
	at iped.geo.parsers.kmlstore.KMLParser.parsePlacemark(KMLParser.java:157)
	at iped.geo.parsers.kmlstore.KMLParser.parse(KMLParser.java:64)
	at iped.geo.parsers.kmlstore.KMLFeatureListFactory.parseFeatureList(KMLFeatureListFactory.java:12)
	at iped.geo.parsers.GeofileParser.parse(GeofileParser.java:76)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at iped.parsers.standard.StandardParser.parse(StandardParser.java:245)
	at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Not sure if it is a bug or just a corrupted geometry object. At the very least, I think the logging above should be less verbose.

@patrickdalla
Collaborator

patrickdalla commented Dec 7, 2023 via email

@lfcnassif
Member Author

Sure, I'll try to find a triggering file tomorrow.

@lfcnassif lfcnassif added the bug label Mar 9, 2024
@lfcnassif
Member Author

Hi @patrickdalla, I found the case triggering this and just sent the KML samples to you by Teams.

@patrickdalla
Collaborator

Many of the exceptions are of type "Points of LinearRing do not form a closed linestring".
According to https://gis.stackexchange.com/questions/93946/getting-points-of-linearring-do-not-form-a-closed-linestring, this is a syntax error in the source data. Some GIS tools, such as the PostGIS extension for PostgreSQL, offer functions to fix some of these inconsistencies (https://postgis.net/docs/ST_MakeValid.html).

So I suggest two options (or both):

  1. We can try a similar recovery method. In the case of an unclosed linestring, we can repeat the first coord at the end of the linestring to "close" it.
  2. We can group all these exceptions and report them in a more summarized form.

I think the second is necessary. What about the first? Should IPED try to recover it?
@lfcnassif
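
As an illustration of option 2, here is a minimal sketch of grouping the exceptions by message so the log gets one summary line per distinct error. `GeometryErrorSummary` and its methods are hypothetical names, not existing IPED code, and the sketch is single-threaded; since parsing runs in worker threads, a real version would probably need a concurrent map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of IPED: counts parsing failures by message
// so that each distinct error is reported once with its number of occurrences.
class GeometryErrorSummary {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Record one failure; a null message falls back to the exception class name.
    void record(Exception e) {
        String key = e.getMessage() != null ? e.getMessage() : e.getClass().getName();
        counts.merge(key, 1, Integer::sum);
    }

    // Number of times a given message was recorded (0 if never seen).
    int count(String message) {
        return counts.getOrDefault(message, 0);
    }

    // One line per distinct message, in first-seen order.
    String report() {
        StringBuilder sb = new StringBuilder();
        counts.forEach((msg, n) -> sb.append(n).append(" x ").append(msg).append('\n'));
        return sb.toString();
    }
}
```

The parser would call `record(e)` in its catch block instead of printing the stack trace, and write `report()` to the log once per file or at the end of processing.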

@patrickdalla
Collaborator

patrickdalla commented May 16, 2024

I saw that some files define "linearring"s with only 2 coords, which leads to another exception: a linearring needs at least 4 coords (even if we add the first coord as the last). So another solution would be to create "linestring" objects from these invalid linearring entries.

Does this seem like a good way to handle these invalid entries?

@patrickdalla
Collaborator

patrickdalla commented May 16, 2024

Maybe option 1 (repeating the first coord as the last) is not good. Look at a sample. I think the best option is to always change the type to "linestring" when a non-closed "linearring" is defined.
image
The object above has the description "rio corrego", so it seems to map a river. Closing it would not represent the river. So the best option is to represent it as a "linestring" even though it is defined as a "linearring".
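
The decision described above can be sketched as follows. This is an assumption-laden illustration, not the code of the referenced commit: `RingFallback`, `Kind`, and `classify` are hypothetical names, and coordinates are plain `{lon, lat}` arrays rather than JTS objects so the example runs without the library. The thresholds mirror the constraints mentioned in this thread (a ring must be closed and have at least 4 coords); the 2-point minimum for a line is assumed.

```java
import java.util.List;

// Hypothetical sketch, independent of JTS and of the actual IPED change.
class RingFallback {
    enum Kind { LINEAR_RING, LINE_STRING, SKIP }

    // coords: {lon, lat} pairs parsed from a KML <LinearRing> element.
    static Kind classify(List<double[]> coords) {
        int n = coords.size();
        if (n >= 4 && samePoint(coords.get(0), coords.get(n - 1))) {
            return Kind.LINEAR_RING;  // closed and long enough: keep as declared
        }
        if (n >= 2) {
            return Kind.LINE_STRING;  // open or too short: keep the points, do not close them
        }
        return Kind.SKIP;             // zero or one point: nothing to represent as a line
    }

    // Exact comparison of the two ordinates of each point.
    private static boolean samePoint(double[] a, double[] b) {
        return a[0] == b[0] && a[1] == b[1];
    }
}
```

With this approach the open "rio corrego" path stays an open line, and a 2-coord "linearring" is also kept as a line instead of raising an exception.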

patrickdalla added a commit that referenced this issue May 16, 2024
represents unclosed linearring features as linearstring.
@patrickdalla
Collaborator

Some other syntax/semantics errors to bypass or correct:

  1. placemark coordinates without content
  2. placemarks with no content
  3. references to gx tags without an XML namespace declaration
  4. XML declares UTF-8 encoding but windows-1252 is actually used
  5. Document tag nested inside another Document tag
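
For item 4, one possible way to tolerate the mismatch is to decode strictly as UTF-8 and fall back to windows-1252 only when the bytes are not valid UTF-8. The sketch below assumes the whole file fits in memory; `LenientKmlDecoder` is a hypothetical name and this is not necessarily how the commit handles it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of IPED.
class LenientKmlDecoder {
    // Try strict UTF-8 first; if the bytes are malformed, assume windows-1252.
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Fallback charset is an assumption based on the files seen in this issue.
            return new String(bytes, Charset.forName("windows-1252"));
        }
    }
}
```

The decoded text could then be handed to the XML parser through a `StringReader`, since a parser reading from a character stream does not need to rely on the encoding named in the XML declaration.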

patrickdalla added a commit that referenced this issue May 16, 2024
Some other syntax/semantics errors to bypass or correct:

placemarks coordinates without content
placemarks with no content
references to gx tags without xml namespace declaration
xml UTF-8 encoding declared but windows-1252 used
Document tag inside other document tag
@lfcnassif
Member Author

lfcnassif commented May 16, 2024

> I think the second is necessary. What about the first? Should IPED try to recover it?
> @lfcnassif

I agree with 2. Closing an open linear ring could lead to wrong conclusions, as you noticed. If converting them to a "linear string" is simple and doesn't take too much processing time, that seems like a good approach.
