Skip to content

Latest commit

 

History

History
121 lines (95 loc) · 5.47 KB

README.md

File metadata and controls

121 lines (95 loc) · 5.47 KB

SuperDB Tests GoPkg

SuperDB is a new analytics database that supports relational tables and JSON on an equal footing. It shines when it comes to data wrangling where you need to explore or process large eclectic data sets. It's also pretty decent at analytics and search use cases.

Unlike other relational systems that do performance-fragile "schema inference" of JSON, SuperDB won't fall over if you throw a bunch of eclectic JSON at it. You can easily do schema inference if you want, but data is ingested by default in its natural form no matter how much heterogeneity it might have. And unlike systems based on the document data model, every value in SuperDB is strongly and dynamically typed thus providing the best of both worlds: the flexibility of the document model and the efficiency and performance of the relational model.

In SuperDB's SQL dialect, there are no "JSON columns" so there isn't a "relational way to do things" and a different "JSON way to do things". Instead of having a relational type system for structured data and completely separate JSON type system for semi-structured data, all data handled by SuperDB (e.g., JSON, CSV, Parquet files, Arrow streams, relational tables, etc) is automatically massaged into super-structured data form. This super-structured data is then processed by a runtime that simultaneously supports the statically-typed relational model and the dynamically-typed JSON data model in a unified compute engine.

SuperSQL

SuperDB uses SQL as its query language, but it's a SQL that has been extended with pipe syntax and lots of fun shortcuts. This extended SQL is called SuperSQL.

Here's a SuperSQL query that fetches some data from GitHub Archive, computes the set of repos touched by each user, ranks them by number of repos, picks the top five, and joins each user with their original created_at time from the current GitHub API:

FROM 'https://data.gharchive.org/2015-01-01-15.json.gz'
| SELECT union(repo.name) AS repos, actor.login AS user
  GROUP BY user
  ORDER BY len(repos) DESC
  LIMIT 5
| FORK (
  => FROM eval(f'https://api.github.com/users/{user}')
   | SELECT VALUE {user:login,created_at:time(created_at)}
  => PASS
  )
| JOIN USING (user) repos

Super JSON

Super-structured data is strongly typed and "polymorphic": any value can take on any type and sequences of data need not all conform to a predefined schema. To this end, SuperDB extends the JSON format to support super-structured data in a format called Super JSON where all JSON values are also Super JSON values. Similarly, the Super Binary format is an efficient binary representation of Super JSON (a bit like Avro) and the Super Columnar format is a columnar representation of Super JSON (a bit like Parquet).

Even though SuperDB is based on these super-structured data formats, it can read and write most common data formats.

Try It

Trying out SuperDB is super easy: just install the command-line tool super.

Detailed documentation for the entire SuperDB system and its piped SQL syntax is available on the SuperDB docs site.

The SuperDB query engine can run locally without a storage engine by accessing files, HTTP endpoints, or S3 paths using the super command. While earlier in its development, SuperDB can also run on a super-structured data lake using the super db sub-commands.

Project Status

Our long-term goal for SuperSQL is to be Postgres-compatible and interoperate with existing SQL tooling. In the meantime, SuperSQL is a bit of a moving target and we would love community engagement to evolve and fine tune its syntax and semantics.

Our areas of active development include:

  • the SuperSQL query language,
  • the type-based query compiler and optimizer,
  • fast, vectorized ingest of common file formats,
  • a complete vectorized runtme, and
  • a data lake based on super-structured data.

SuperDB Desktop - Coming Soon

SuperDB Desktop is an Electron-based desktop app to explore, query, and shape data in a SuperDB data lake. It combines a search experience with a SQL query and has some really slick design for dealing with complex and large JSON data.

Unlike most JSON browsing tools, it won't slow to a crawl --- or worse crash --- if you load it up with ginormous JSON values.

Contributing

See the contributing guide on how you can help improve SuperDB!

Join the Community

Join our public Slack workspace for announcements, Q&A, and to trade tips!