Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build PROJ without sqlite dependencies #1552

Closed
yonarw opened this issue Jul 18, 2019 · 35 comments
Closed

Build PROJ without sqlite dependencies #1552

yonarw opened this issue Jul 18, 2019 · 35 comments
Labels

Comments

@yonarw
Copy link
Contributor

yonarw commented Jul 18, 2019

We are currently using PROJ v5.1.0 and want to continue using the library. We are especially interested in the new WKT parsing capabilities of version 6+. Looking at the code a bit it seems sqlite is a hard dependency when we want to be able to parse WKT.

Our question is: Would there be a possible/reasonable way of providing a variant of PROJ without sqlite that still is able to parse WKT?

We'd like to work on and contribute a possible solution for this but first wanted to ask what the community thinks about it.

@rouault
Copy link
Member

rouault commented Jul 18, 2019

sqlite is quite a ubiquitous dependency (especially for anyone working with geospatial data, with the GeoPackage or Spatialite formats for example), and not a hard one to build, so I'm not sure there's really interest in offering that option.
Technically yes parsing WKT doesn't require sqlite, although there are subtle dependencies like when parsing ESRI WKT, where the database is used to replace ESRI specific names with EPSG ones (but in the case the database isn't reachable, that will still work and just don't do the name substitutions)

@yonarw
Copy link
Contributor Author

yonarw commented Jul 18, 2019

sqlite is quite a ubiquitous dependency

While I agree with that, in our case we use PROJ inside of a database (SAP HANA). Having sqlite as a dependency of HANA is the thing we want to avoid.

Technically yes parsing WKT doesn't require sqlite, although there are subtle dependencies like when parsing ESRI WKT, where the database is used to replace ESRI specific names with EPSG ones (but in the case the database isn't reachable, that will still work and just don't do the name substitutions)

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

@hobu
Copy link
Contributor

hobu commented Jul 18, 2019

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

Not really, no. The old PROJ didn't run without its database either, but its database was a bunch of CSV files.

An interesting question is why is the SQLite dependency an issue for you?

@rouault
Copy link
Member

rouault commented Jul 18, 2019

Having sqlite as a dependency of HANA is the thing we want to avoid.

That's an easy one to hide. Just do a static build of sqlite & proj

So do you see any possibility of removing the dependency (maybe via cmake build option) and have restricted capabilities when running without db access.

It could be technically possible, but I'm not sure the project wants to maintain that. Another issue I see is that the test suite ("make check") is dependent in a lot of places on having the database available, so that would mean: either sacrifice "make check" in that degraded environment (but who knows if that works correctly then), or do a lot of changes in the test suite so that it can run in the no-sqlite3 case by skipping tests or allowing 2 possible outputs: a lot of initial work to do that, and a maintenance burden over the long term.

@yonarw
Copy link
Contributor Author

yonarw commented Jul 18, 2019

An interesting question is why is the SQLite dependency an issue for you?

As a database we do not want to have dependencies to an embedded database.

... but its database was a bunch of CSV files.

I see. But AFAIK you did not need to ship the CSV files with the library pre v6 right?

Just do a static build of sqlite & proj

We would not have a lib-dependency to sqlite (although the code is still compiled into the static proj library), but still would need to deliver the .db file.
I understand your point about testability. So far I only looked at the code that depends on sqlite which is mostly inside factory.cpp (?) where a implementation for DatabaseContext and other interfaces is provided. I thought one possibility could be providing an additional implementation which reads the information from CSV files instead of .db file. Tests are of course another part that needs to be considered. Thanks for your answers this far, I will discuss possible solutions with our team.

@rouault
Copy link
Member

rouault commented Jul 18, 2019

but still would need to deliver the .db file.

You could possible puts its content inside a .text entry in some DLL, and use SQLITE in-memory capabilities to work with it.

I thought one possibility could be providing an additional implementation which reads the information from CSV files instead of .db file.

SQlite could with a bit of work be replaced by another SQL92 capable database, but you need to be able to support JOIN, WHERE, ORDER, etc... A simple CSV file reader won't do.

@aaronpuchert
Copy link
Contributor

aaronpuchert commented Jul 30, 2019

The core issue for us is the entanglement of the actual projections (which we want) and the metadata management, which we don't want, because we're having our own.

Not really, no. The old PROJ didn't run without its database either, but its database was a bunch of CSV files.

The old libproj.so ran well enough without the CSV files that we could use it that way. When you have your own CRS table to look up transformation parameters those files were never read, and so we didn't even install them. (We have to maintain that table on our side because we have to ensure transactional integrity and other shenanigans. Needless to say that we also want to allow users to add reference systems.) We have no issue with migrating the CRS data from CSV to a SQLite database, but that it has become a hard dependency even when the data isn't needed.

There is also metadata compiled into the library: the data for the proj_list_* functions in ellps.cpp, datums.cpp, units.cpp. But we have that on our side as well.

An interesting question is why is the SQLite dependency an issue for you?

Our issue with SQLite is neither its size nor availability nor delivery. Certainly we can handle all that. The issue is about a leap in complexity, and that we need a dependency for a feature that we don't use.

The projection library that we want (and it's how we have used it in the past) is a library of plain computations that runs just on CPU and memory and doesn't know about the rest of the system. With SQLite that goes out of the window, as it does all kinds of low-level file I/O. Using a SQL database engine also adds a lot to runtime complexity, which isn't really needed here:

  • Since the database is read-only (at least libproj doesn't modify it), all the painful measures to ensure atomicity, consistency and durability aren't needed. Granted, isolation isn't hard for SQLite because it locks the entire file for writes.
  • Since we know the schema at compile-time, all the generality of being able to create arbitrary tables isn't needed.
  • Since all queries are known at compile-time, compiling and optimizing queries at run-time isn't needed.

But we aren't concerned with how the metadata management is handled, it's rather that we handle it ourselves and thus don't want to carry another metadata manager around that we don't use.

We don't care whether it's an optional or replaceable component, which leaves two possibilities:

  1. Make it possible to build libproj.so without SQLite and without metadata, meaning that clients have to manage the data and the library becomes a pure computation engine.
  2. Abstract away the storage layer, so that it could be implemented in different ways, possibly even by clients of the library.

The first route sounds more promising to me, although the work of separating the components would need to be done in both variants. Ideally metadata management would be handled by an entirely separate library which is an optional dependency for PROJ. That library might then as well use SQLite.

Another issue I see is that the test suite ("make check") is dependent in a lot of places on having the database available.

I don't see this as a big issue, on the contrary: separating tests for the projections from tests for metadata management should lead to a cleaner structure of the entire test suite.

@funchal
Copy link

funchal commented Sep 4, 2019

I agree SQLite should be optional and it would enable using PROJ in a lot of additional contexts where this dependency is not available/not acceptable.

@kbevers
Copy link
Member

kbevers commented Sep 4, 2019

None of the active PROJ developers are interested in liberating PROJ from sqlite. After all, we decided to put it in there because we thought that was a good idea. We did so knowing that a small fraction of the community would not like it. The added benefit of being able to use sqlite to handle the complexity of the CRS registry was, and is, worth it to us. With that said, if the community can provide a solution that is acceptable we can't ignore that. Should anyone want to take on this task, at first a RFC with a detailed description of the proposed solution should presented to the PROJ PSC. Such an RFC has a much better change of receiving a positive vote by the PSC if it comes with a preliminary proposed implementation. It will be quite a disruptive change to the current code and we need to be assured that it is sane and posible to maintain.

@funchal
Copy link

funchal commented Sep 4, 2019

@kbevers Sounds fair!

I had a closer look to check the extend of the usage of sqlite in the code, and it seems to be very well architected and fairly contained within https://github.com/OSGeo/PROJ/blob/3ae09c3ba164728e200a3b930b72ed5fc24ef6ee/src/iso19111/factory.cpp. There's around 50 different queries but only 5 are complex.

I quite like the direction @rouault was going (.text entry in some DLL), but I think the queries are simple enough that it might be possible to use in-memory data-structures rather than through SQL. Alternatively it might be possible to just make the DatabaseContext api virtual so users can implement their own alternative.

@rouault
Copy link
Member

rouault commented Sep 4, 2019

I wouldn't want the code to be bound to the existing SQL requests. Newer one might be added, or existing changed. At some point, the EPSG dataset structure will be changed (presumably end of this year), and so proj.db schema will also have to adapt, and the queries will change for sure.
Having alternate implementations of DatabaseContext might be acceptable, but we'll still want SQL to be used as the input for queries and expect the alternate database to have the same layout as proj.db
Anyway I still don't see the big deal of having a standalone proj.db file. PROJ practical use since the introduction of datum transform capabilities around 2000 has always be tied to using external resources like grid files, etc for non-trivial coordinate operations. If your use of PROJ is simpler than that, you'd better just extract the math of the few projection methods you need.

@funchal
Copy link

funchal commented Sep 4, 2019

Cool. I understand proj.db is fine for 99% of use-cases, but it'd be nice if we could make this work for 1% of users which are either not interested on the grid files or need a self-contained build without external resources. I like your idea of using DatabaseContext.

Basically I think an implementation plan could work along the lines of:

  • Turn the DatabaseContext member functions to virtual in
    class PROJ_GCC_DLL DatabaseContext {
    .
  • We'd also need to do the same for AuthorityFactory in
    class PROJ_GCC_DLL AuthorityFactory {
  • Then rename the current implementation of DatabaseContext to SQLiteDatabaseContext.
  • The static create function creates a SQLiteDatabaseContext by default.
  • Add a cmake build flag to disable compiling the two SQLite implementations when a custom version is provided, in which case the user of the library has to define the create function themselves.

@funchal
Copy link

funchal commented Sep 6, 2019

#1595 is a very rough draft which is only intended to show that the approach I proposed can potentially work.

I'd be happy to spend more time to finish and clean it up. I'd like to move the SQLite DatabaseContext to its own file so it can be conditionally compiled, add a cmake build option, write some tests, and perhaps an example of how a user can implement their own DatabaseContext interface.

Please let me know if there's interest in merging something along those lines if I were to spend more time on it.

@kbevers
Copy link
Member

kbevers commented Nov 27, 2019

Hello, I would like to ask how long will it take to finish this work?

It is not likely that you will be able to build PROJ without SQLite in the future. The reasons for this has been expressed quite clearly in the above comments.

And is there anything I can do for this issue? I'm glad to make some contribution for it very much!

If you after reading the above discussion are still interested in making PROJ build without SQLite, you would have to write up a RFC describing in detail how you would do it. The RFC would have to adress all the various concerns the PROJ core contributors has expressed above and should preferably include a prototype that demonstrates a real world implementation of a PROJ that can build and be used without an SQLite dependency.

@benstadin
Copy link

benstadin commented Dec 29, 2019

I also just faced a problem with the embedded SQLite db: It's very slow to initialize lots of proj contexts and uses a lot of memory. My whole application used to consume about 10 MB, with Proj 6.2.1. and about 100 Proj instances the application uses about 500 MB (before somebody asks: those 100 instances are all required).

I'm about to profile a bit more. But it looks like Proj 6 initializes by reading the Db again for every instance. And the unit tests take several minutes longer (from 2 to 5 minutes).

Update:
Most of the CPU time is indeed eaten up by proj_create_crs_to_crs.

@rouault
Copy link
Member

rouault commented Dec 29, 2019

It's very slow to initialize lots of proj contexts

Yes, you need to minimize the number of contexts

And the unit tests take several minutes longer (from 2 to 5 minutes).

Make sure you use sqlite >= 3.11

@benstadin
Copy link

That's a quick response =).

Sqlite Version is 3.28, System is a Macbook Pro 16 / Core i9. I'll investigate how to reduce the contexts. But I have a bad feeling that it would require quite a lot of work to introduce a relationship between actual worker threads in the main application and the library where the projection methods are in.

NB: There are a few leaks in proj_normalize_for_visualization().

@rouault
Copy link
Member

rouault commented Dec 29, 2019

NB: There are a few leaks in proj_normalize_for_visualization().

Reproducer ?

@benstadin
Copy link

benstadin commented Dec 29, 2019

Sorry, the leak was my fault (faulty copy constructor). The context creation will be challenging. I'll investigate further but even if that works somehow and reduces memory usae, it will still be consuming too much for my application. Optimally it would only initialize as long and take as much memory as required for the actual projection.

@benstadin
Copy link

To give some feedback, I did some further tests and considered my options for integrating proj 6 into my project. Unfortunately I had to remove proj 6 and use another library because of slow initialization, memory usage and architectural choices (e.g. (re-)using proj contexts with worker threads was cumbersome).

I second the initial request to get rid of the SQLite DB, and additionally trim down memory usage and initialization time. I'd like to use proj again in the future and will try again if there are improvements to these issues.

@GrahamAsher
Copy link

I would like to add my vote for getting rid of the SQLite dependency, at least optionally. I have been trying to move my product, CartoType, to proj 6 and have found it impossible for that reason. An earlier commenter said "The projection library that we want (and it's how we have used it in the past) is a library of plain computations that runs just on CPU and memory and doesn't know about the rest of the system." His comments apply to us precisely.

@rouault
Copy link
Member

rouault commented Jan 22, 2020

I would like to add my vote for getting rid of the SQLite dependency

It would require much more than a vote, but significant effort to implement and thus someone to take the lead or fund someone else to do it. The SQLite dependency enables PROJ to provide much higher value than just map projections. The reality of today and tomorrow geodetic, and also cartographic (as the frontier between both tends to blur), needs, requires to be able to deal with datum shifting, time based transformations, etc etc and rely on a database providing the necessary information.

I also question the impossibility of being able to use PROJ6 as it is today as being a lack of effort in trying. It should be abloe to create static builds with a sqlite3 static lib, and if just using proj_create() with pipeline strings, no database access should be attempted (or could probably be made to work with modest effort even if it is currently tried)

@benstadin
Copy link

The reality of today and tomorrow geodetic, and also cartographic (as the frontier between both tends to blur), needs, requires to be able to deal with datum shifting, time based transformations, etc etc and rely on a database providing the necessary information.

This is very reasonable and I fully agree with this statement. However I think we need to look at it from different angles:

  1. Pro SQLite: Enabling things like datum shifting
  2. Con: Avoiding SQLite due to IO usage and dependencies
  3. Con: Memory usage and initialization performance

I’m affected by issue 3. Issue 2 is potentially a non-issue when optimized. SQLite itself isn‘t a problem when integrated well.

Idea: Keep SQLite but change the behavour to initialize the data lazily. It might still require some work, but it‘s a compromise to reduce file IO for those who don‘t yet want the SQLite dependency and will allow to run Proj 6 in memory and CPU constrained environments (I‘ve a define in my appto seamlessly switch between Proj 6 and another lib; I really can‘t use Proj 6 on several targeted devices due to memory requirements and init times as outlined above).

A further optimization for issue 2 might be to embed the db and have it load from memory, entirely avoiding any further file IO. SQLite makes that rather easy.

@rouault
Copy link
Member

rouault commented Jan 22, 2020

change the behavour to initialize the data lazily

That's mostly the current behaviour. But createOperations() need to explore a lot of objects in the database (see https://proj.org/operations/operations_computation.html for how it works), hence a significant part of it is probably accessed. I'm not sure having the DB embedded in the binary will change I/O related performance, and certainly it will not have any impact on memory consumption.

@benstadin
Copy link

... but about 5MB memory per instance and the initialization times I see is a lot.

@hobu hobu mentioned this issue Jan 23, 2020
4 tasks
@rouault
Copy link
Member

rouault commented Feb 4, 2020

Re-reading this thread, it seems that for most people who don't want the SQLite dependency, just using plain cartographic projection and providing already constructed pipelines to proj_create() would be sufficient. It doesn't appear that abstracting the database interface to be able to plug another DB would be that needed.

What would be lost by building without the SQLite dependency:

  • obviously, all functions in the C API that explicitly depend on the database.
  • anything refering to a EPSG code (unless you use the old 'espg' text file, that you'd have to extract from older PROJ installations, and the +init=epsg:XXXX syntax)
  • proj_create_crs_to_crs() with arguments that are not PROJ CRS strings, like CRS names or codes.
  • proj_create_crs_to_crs() / proj_create_operations() would not be able to do datum shift, unless the input & output PROJ CRS strings include an explicit +towgs84 / +nadgrids / +geoidgrids argument.
  • degraded functionality in WKT ingestion / export where some names & aliaex are looked up in the database. In particular the WKT1_ESRI <--> WKT2 mapping would largely be non functional since it heavily depends on the database.
  • with PROJ 7, and using network capabilities, a SQLite3 cache of downloaded chunks is used. But I guess people not wanting the SQLite3 dependency will also not want the libcurl and libtiff ones (which can be disabled), so probably not a big deal.

Besides code changes in the library, the main impacts I see are more on the test suite where we'd have to separate tests that directly or indirectly depend on the database.

If some of the persons participating in this thread are interested in funding such effort, they can email me (even.rouault at spatialys.com)

@stale
Copy link

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 4, 2020
@stale stale bot closed this as completed Apr 11, 2020
@aaronpuchert
Copy link
Contributor

Re-reading this thread, it seems that for most people who don't want the SQLite dependency, just using plain cartographic projection and providing already constructed pipelines to proj_create() would be sufficient. It doesn't appear that abstracting the database interface to be able to plug another DB would be that needed.

That would be what @yonarw and I want. The limitations that you listed are what I expected.

If some of the persons participating in this thread are interested in funding such effort, they can email me (even.rouault at spatialys.com)

We obviously need to discuss with our management about this, and will reopen if we have a path forward.

@paamand
Copy link

paamand commented Dec 20, 2020

Good discussion. Only thing I would add is the niche of using GDAL/PROJ4 on Android. SQLite dependency is not satisfied in cross-compilation to Android. But I could probably include the source from sqlite.org. Just not very clean imo.

@GrahamAsher
Copy link

My product, CartoType, uses PROJ4 on Android, iOS, Windows, Linux, Mac OS, Raspberry PI, Jetson Nano and many other platforms. It's an absolute must that PROJ4 doesn't have the SQLite dependency. Luckily I can carry on using the previous version without any difficulty, but at some point I might want to revisit this problem. If I have the leisure I might attempt my own fork.

It was a strategic error to introduce the SQLite dependency because of its great impact on runtime memory use, among other things. @rouault said 'The SQLite dependency enables PROJ to provide much higher value than just map projections.'; but actually a library that just provides map projections is exactly what many of us need, and it ought to be possible to separate that out from the other functionality, which actually reduces the net value to us.

@rouault
Copy link
Member

rouault commented Apr 28, 2021

but actually a library that just provides map projections is exactly what many of us need, and it ought to be possible to separate that out from the other functionality, which actually reduces the net value to us.

I'm going to be a bit provocative because I'm a bit annoyed by the constant whining in this thread: why should we care about what value PROJ brings to you, if you don't bring value to PROJ ? The changes of PROJ 6 were done on purpose (this was a strategic decision) because people badly needed a first class geodetic transformation library, which it is now, and gathered the needed funding to make that happen. As outlined before, I guess we could still be able to produce a cut down version that has projection support only, but that won't happen by magic. Otherwise nostalgic people can just indeed fork off and leave their lives without being tied to our crazyness :-)

@GrahamAsher
Copy link

Being provocative is never productive or useful. I have avoided personal remarks or slurs like 'whining' and 'fork off' and I suggest that we all do that here. My criticism, and that of others, is well-meant. We believe that we are bringing value to the project by our criticisms, and, as I and others have mentioned, we might also help with making desired changes to make PROJ more useful to everybody. I have benefited hugely over the last 18 years from criticism of my product, CartoType. While a complaint may initially cause me to feel resentful, I always try to think it over and work out how I might learn from it and improve my work.

@kbevers
Copy link
Member

kbevers commented Apr 28, 2021

We believe that we are bringing value to the project by our criticisms

At this point you are not. Actual contributions, funding or code, would be bring value to the project. As has been expressed many times, none of the current active developers has any use for a SQLite-less PROJ (quite the contrary) so the changes you seek are not going to happen by asking for it over and over. Step up and provide some actual value and we can continue this discussion. Until then you can use PROJ 5 and earlier which comes without the SQLite dependency free of charge.

@GrahamAsher
Copy link

@kbevers We obviously disagree about what constitutes value, and you already know that I believe that constructive criticism has a value of its own. However, I'll leave it there until I have time to take another look at the code. I did in fact start to attempt a clean removal of the SQLite dependency, but it was too hard, and as you say, I can continue to use the older version.

@kbevers
Copy link
Member

kbevers commented Apr 28, 2021

We obviously disagree about what constitutes value

That much is clear. It is obviously of value to you if the PROJ project take on the burden of developing a SQLite-free version of the software. You can update your commercial software package with 3 years of PROJ bug fixes and new projections and bring value to your customers. And I suspect also increase your revenue. If there's a kickback from that revenue to the project we would likely be more welcoming to the suggested change but so far there has been no indications of that from any of the inquiries we've had regarding this topic. So as far as I can tell your "constructive criticism" only leaves us with an increased maintenance burden of a feature the project doesn't really want. From the viewpoint of a maintainer of the project that is not value.

I am going to lock this discussion. I believe everything that needs to be said on this topic has been touched several times in this discussion and reiterating those arguments doesn't get us anywhere. Should someone in the future actually want to make this change happen, here's a few options that will get you of to a good start:

  1. Engage actively with the community instead of just demanding that your pain point be taken care of. Build up social capital by helping make the project better, that way we are more likely to show an interest in your particular problem. Start Here.
  2. Write an RFC that details how your proposed changes will work. Back it up with a prototype implementation and commit to maintaing your code in the future.
  3. Engage one of the contracting PROJ developers and provide them with the funding to realise your vision.

@OSGeo OSGeo locked and limited conversation to collaborators Apr 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

9 participants