Implementation Alternatives
Before racing ahead with a PostgreSQL - PostGIS - CKAN data warehouse, let's slow down for a moment and consider the alternatives. This list draws on what other agencies have told us and on our own online research.
Socrata
Not open source (although they do provide source code for some portions).
Socrata was one of the first "start-up" type firms to crack the data warehouse market. I have friends who work at Socrata and who say very nice things about the product, but it also has some pretty major shortcomings.
- Since it's a cloud offering you don't need to worry about maintaining any of your own systems. Big plus.
- Like Tableau, the product has some very specific capabilities which are shiny and useful, but there is almost no way to extend or modify the built-in data analysis tools.
- No database "joins" across tables. Probably no big deal for CMP and Fast-trips tables -- or is it? This is their #1 customer complaint.
- Socrata's geospatial/mapping capabilities are brand-new and really bare-bones. Carto is going to eat them alive, according to my jaded friend who works at Socrata.
- Their #2 customer complaint is that it's difficult to import data into the system.
However, once you've got your data in their system, I hear great things: people say it has a nice interface and is easy to use.
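To make the "no joins" limitation concrete, here is a minimal sketch of the kind of cross-table query that a relational database handles in one statement. Everything in it is an assumption for illustration: the `segments` and `speeds` tables, their columns, and the values are hypothetical, and SQLite (from Python's standard library) stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3


def join_demo():
    """Join two hypothetical tables and return (name, period, avg_speed) rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript(
        """
        -- Hypothetical tables, not the real CMP or Fast-trips schema.
        CREATE TABLE segments (segment_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE speeds (segment_id INTEGER, period TEXT, avg_speed REAL);
        INSERT INTO segments VALUES (1, 'Market St'), (2, 'Mission St');
        INSERT INTO speeds VALUES (1, 'AM', 9.5), (1, 'PM', 8.0), (2, 'AM', 12.0);
        """
    )
    # The join: attach each speed observation to its segment name.
    rows = conn.execute(
        """
        SELECT s.name, p.period, p.avg_speed
        FROM segments AS s
        JOIN speeds AS p ON p.segment_id = s.segment_id
        ORDER BY s.name, p.period
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in join_demo():
        print(row)
```

Without joins, the two tables would have to be merged into one wide table before upload, or combined in a script after download. Whether that matters depends on how often our tables need to be combined.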
Carto
Carto is essentially a "Pro" offering on top of PostgreSQL/PostGIS. They provide a nice interface, a clean API for opening up your data, and some nice builder tools. I have never used it, because the enterprise version is not free. I hear it's worth the money.
- But I don't know how much it costs; pricing is behind a "Contact Us" link. I could contact them if you want to know pricing details. There is a free version which just uses public data, which I think exists so you can demo it and explore the features.
- The Carto "Builder" tool looks like a nice drag/drop way to set up dashboards without coding. Happy to download and explore that if you want.
- Since it's based on PostGIS, it would probably be an easy upgrade path to Carto at some later date.
Apache Hadoop
In the Big Data universe things move very quickly, but the Apache Hadoop ecosystem is the largest and most relevant open source platform for scalable, distributed computing. That isn't exactly what we're looking for; we're looking for a geospatial data analysis platform. However, Hadoop is worth knowing about, even if we don't use it. Hadoop does everything the Big Data way: it's got a MapReduce framework, a SQL query tool, Redis data storage support, Amazon S3 connections, and a ton more.
From the Big Data point of view, PostGIS is old-school: a PostGIS database with a defined schema is considered less flexible than the NoSQL alternatives which are used at Google and Facebook for their social graphs and big tables. I don't think we need fast, nimble product changes in the face of competition, so I think it's okay to pick a stable, boring database and build upon that.
The learning curve for Hadoop is enormous, and the framework is not specifically tailored to the kind of geospatial work we're doing. So... I don't think we want to go down this path just yet. I just want you to be aware of it as an alternative.
Some useful links if you want to know more:
- How to learn Hadoop for free - blog post listing all the components you would need to learn to get up to speed on Hadoop
- Hadoop Home Page
- Apache Hive - data warehouse infrastructure. Creates an SQL-like interface on top of big data stored in Hadoop.
ESRI
ESRI is the huge player in geospatial analysis. Their products work with any database backend; they're selling the geo-stuff on top of that. Now they sell cloud services too, but these mostly add further lock-in to their existing products and scripting languages.
- Not open source
- Huge customer base, giant ecosystem of products and support services
- Each web map you create dings you for one of your licensing "credits". You can always buy more credits of course. We had one person devoted essentially full-time to ESRI licensing compliance.
Furthermore, my personal experience with the ESRI product portfolio has been really disheartening. Every step of the way we had licensing issues, and we were constantly going back to ESRI customer support for bug fixes and help. The fact that they have customer support is great, but the number of times we needed to use it was alarming.
Microsoft SQL Server
The Microsoft business intelligence suite, based around MS SQL Server, is well-regarded and well-supported, and there is an enormous ecosystem of Microsoft-certified vendors who can help you build and maintain systems -- all for fees, of course, but many agencies say it's worth the investment.
- SANDAG is all-in on a Microsoft-based system. The upfront costs are enormous (tens of thousands of dollars), but you get a complete turnkey database with lots of hooks into Excel & Access tools. SANDAG uses MS-SQL for their financial accounting, so the system cost is shared across departments.
- Oregon Metro uses a MS-SQL backend database, connected to an ESRI geodatabase. They just switched to this system last year.