Add world files to the Google Cloud Storage bucket #317

Open · kebekus opened this issue Oct 31, 2023 · 6 comments
Labels: question/support (Further information is requested)

kebekus commented Oct 31, 2023

Dear Stephan,

As far as I can see, the GCS bucket contains nation-specific files with aviation data. Would it be possible to generate similar files that contain the data for the whole world?

I need this data daily to generate aviation maps for Enroute Flight Navigation. Currently, I use the REST JSON API, which is cumbersome. Because of the limit on the maximum item count retrieved per page, I have to split my requests into zillions of smaller ones that download one page each. It would be much easier for me (and produce less load on your server) if I could download GeoJSON files with all the relevant data from GCS.

Best wishes, and thanks again,

Stefan.

reskume self-assigned this Oct 31, 2023
reskume added the question/support (Further information is requested) label Oct 31, 2023
reskume (Contributor) commented Oct 31, 2023

Hi Stefan,

Do you use the openAIP Core API to download content for the Enroute app, or do you use GCS with a dedicated library to download the GeoJSON files from there? If you use the openAIP Core API, it may be beneficial to get all the GeoJSON files via GCS instead. You can also create a filter for the files you want and then simply loop over all files and download them in parallel (see the sketch below).
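
As an illustration of the filtered, parallel GCS download described above, here is a minimal sketch. It assumes the bucket is publicly readable; the bucket name and prefix below are placeholders, not the real values.

```python
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

BUCKET_NAME = "openaip-bucket-name"  # placeholder -- substitute the real bucket name
PREFIX = "apt/"                      # placeholder filter for the wanted files

# An anonymous client is enough for a publicly readable bucket.
client = storage.Client.create_anonymous_client()
blobs = [b for b in client.list_blobs(BUCKET_NAME, prefix=PREFIX)
         if b.name.endswith(".geojson")]

def download(blob):
    # Flatten the object path into a local file name.
    blob.download_to_filename(blob.name.replace("/", "_"))
    return blob.name

with ThreadPoolExecutor(max_workers=8) as pool:
    for name in pool.map(download, blobs):
        print("downloaded", name)
```
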
Meanwhile, I will think a bit about adding world files, but this isn't an easy one: adding world exports will, to some extent, degrade the resilience of the system against unexpected outages and service disruptions. Currently, each export task is re-scheduled until it completes successfully. Since each export (except the US one) doesn't take much time, this approach scales very well. But if we create world exports, the probability of problems with the exports increases drastically, and reaching the current level of reliability would require a more complex solution.

Cheers,

Stephan

kebekus (Author) commented Nov 1, 2023

Dear Stephan,

I do use the core API, and I will experiment with downloads from GCS, as you suggested. Thanks!

Best wishes,

Stefan.

BlackFlash5 commented

I'd be interested in world files as well.

> You can also create a filter for the files you want and then simply loop over all files and download them in parallel.

Might be a dumb question, but wouldn't this be a solution for you as well, given that you are unsure about the resilience of your own service?
After every export task has completed, you could pull everything, merge it, and export the merged world files.
That way, x developers wouldn't each have to work around this by pulling a file for each country and merging the files themselves.

reskume (Contributor) commented Nov 30, 2023

The current situation is that we are not able to merge data within our current logic. We could only create the complete list of country files and then combine them into world files. But since one export run produces more than 1000 export files, the exports happen in parallel, and each job is decoupled from the others to provide the best resilience against possible service outages and to not lose anything along the way. This currently makes it impossible to know when every export has finished without adding additional logic to track that.

I'm not saying that this is impossible to implement, but the effort to get it right and make it reliable is simply not worth the benefit. We have a 1000-item limit on the API endpoints. For example, downloading the complete airports dataset requires a client to loop over all pages and download them in parallel, which amounts to a total of 46 requests. At least for me, when run in parallel, this finishes quite fast and takes about six lines of code. The same code can be used to request the whole dataset from any of the provided endpoints by just swapping out the requested URL (a sketch follows below).
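
Here is a minimal sketch of that paging loop. The endpoint URL, the page/limit parameter names, the API-key header name, and the response field names are assumptions to be checked against the current openAIP API documentation.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://api.core.openaip.net/api/airports"  # assumed endpoint
HEADERS = {"x-openaip-api-key": "YOUR_API_KEY"}    # assumed header name

def fetch_page(page):
    r = requests.get(URL, headers=HEADERS,
                     params={"page": page, "limit": 1000}, timeout=60)
    r.raise_for_status()
    return r.json()

# The first page tells us how many pages there are in total.
first = fetch_page(1)
total_pages = first["totalPages"]  # assumed response field name

with ThreadPoolExecutor(max_workers=8) as pool:
    pages = [first] + list(pool.map(fetch_page, range(2, total_pages + 1)))

items = [item for page in pages for item in page["items"]]  # assumed field name
print(len(items), "airports downloaded")
```
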

Also, please be aware that the bucket files are created every day, but only once per day. So if you want the newest data, you always have to use the API. And in the worst case, the exports can be more than one day behind the API if there have been any problems with the exports and the system holds them back until it sees fit to retry them.

BlackFlash5 commented

My main reason for wanting world files is tile data.
Getting all tiles through the API would take ages, create avoidable load for your service, and you can't get just one layer anyway.
Currently I only need the airport layer, so running my own tile server to reduce load times and bandwidth for users becomes an interesting topic.
When running my own "tile server" I wouldn't need the latest data; I'd be perfectly happy with a daily update.

But since this feature seems to be out of scope for now, might I ask one off-topic question?
I couldn't find any information on whether tiles can overlap at country borders when downloading all the MBTiles files and merging them into one dataset.
Do you have to detect and merge overlapping tiles, or are they sliced in a way that means you don't have to worry about that?
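
For reference, the merge step mentioned above is typically a plain SQLite operation. Below is a naive sketch that assumes the standard MBTiles schema (a `tiles` table with `zoom_level`, `tile_column`, `tile_row`, `tile_data`), that `tiles` is a real table carrying the spec's unique tile index rather than a view as some generators produce, and that keeping the destination's copy of any duplicate coordinate is acceptable; whether that last assumption holds at country borders is exactly the open question here.

```python
import sqlite3

def merge_mbtiles(dst_path, src_path):
    # Copy all tiles from src into dst, skipping coordinates dst already has.
    db = sqlite3.connect(dst_path)
    db.execute("ATTACH DATABASE ? AS src", (src_path,))
    # INSERT OR IGNORE relies on a unique index over
    # (zoom_level, tile_column, tile_row); without one, duplicate
    # coordinates would be inserted instead of skipped.
    db.execute("""INSERT OR IGNORE INTO tiles
                  SELECT zoom_level, tile_column, tile_row, tile_data
                  FROM src.tiles""")
    db.commit()
    db.execute("DETACH DATABASE src")
    db.close()
```

Called repeatedly, e.g. merge_mbtiles("world.mbtiles", "de_apt.mbtiles") with hypothetical file names, this would fold country files into an accumulating world file one at a time.
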

reskume (Contributor) commented Nov 30, 2023 via email
