Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the GeoJSON files written by our QuPath images #390

Merged
merged 2 commits into from
Aug 24, 2023
Merged

Conversation

tomp
Copy link
Contributor

@tomp tomp commented Aug 15, 2023

Modify the code that runs stardist cell detections to produce a valid geojson file.

These changes look for a cell_detections.geojson file in the qupath-stardist output directory, and rewrite it if it looks necessary. Specifically, if the top-level data structure in the file is a list, it's rewritten so that the top-level structure becomes

{
 "type": "FeatureCollection",
 "features": <original list>,
}

The code was tested on the spatial-analysis tutorial slide, and seems to work as expected. As a positive side effect, the geojson file is owned by the user after this, instead of root. I haven't actually tried to load the new geojson file in geopandas yet.

On this sample slide, it takes 2-3 of minutes to read and to write the geojson file, which contains ~725,000 features.
This could be faster if we used a faster json module, or if we avoided parsing and re-composing the geojson file. I could try using orjson or ujson to see if they give a significant speedup.

@tomp tomp requested review from raylim and armaank August 15, 2023 14:53
@tomp tomp changed the title Rewrite geojson files as FeatureCollection's, if necessary. Rewrite geojson files as FeatureCollections, if necessary. Aug 15, 2023
@tomp
Copy link
Contributor Author

tomp commented Aug 15, 2023

I verified that the file we're producing is readable by geopandas. For the slide I'm using for testing, it takes ~340 sec for geopandas to load the file, which is much longer than the ~120 sec it takes for the (slow) stdlib json module to load the same file.

Interestingly, our "fixed" geojson file not readable by the Python geojson module directly, probably because the NaNs are not technically supported by json. (The orjson module also won't load these files for that reason.).

1. Use 'current' instead of 'latest', because 'latest' is flaky.
2. We expect the qudist image to write a valid geoJSON file, now.

This image should produce valid GeoJSON files directly.
@tomp tomp changed the title Rewrite geojson files as FeatureCollections, if necessary. Fix the GeoJSON files written by our QuPath images Aug 24, 2023
@tomp
Copy link
Contributor Author

tomp commented Aug 24, 2023

I ripped out the GeoJSON-rewriting changes, because the QuPath images have been modified to write valid GeoJSON now.

This code was modified just to use the images tagged current, rather than latest. The qupath-stardist:current image on Dockerhub includes the new groovy script.

@tomp tomp merged commit 2866ceb into dev Aug 24, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants