Meeting Notes
1/21/2019
Update
- New parse structure for optimizer->execution: an extensible JSON representation.
- Plan to eventually strip the extra pieces out of Quickstep, since we only need its optimizer. The interface should stay clean thanks to the wrapper layer abstraction.
- How do we get data into the system? Currently only through tests. Do we want a programmatic interface?
- We should wrap the execution engine in an API layer so the physical plan is easy to build directly. We want to be able to work at this level, even if we lose the optimization.
- Example: a platform that does graph analytics AND normal relational work needs an execution engine that can handle both, and the two differ at a very low level.
- Another example: unifying an optimizer for graph and relational is really difficult, whereas building a single one is fast. However, the fast one is slow.
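To make the "programmatic interface" and "extensible JSON representation" items concrete, here is a minimal Python sketch of building a physical plan by hand and serializing it. `PlanNode`, the helper functions, and the operator names (`TableScan`, `Project`, `CrossJoin`) are hypothetical, not Hustle's or Quickstep's API; the open `params` dict is one assumed way to keep the format extensible.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class PlanNode:
    """One physical operator (hypothetical). `params` is an open dict, so new
    fields can be added without breaking readers that ignore them."""
    op: str
    params: dict[str, Any] = field(default_factory=dict)
    children: list[PlanNode] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the children list, producing nested dicts.
        return json.dumps(asdict(self), sort_keys=True)


def scan(table: str, columns: list[str]) -> PlanNode:
    return PlanNode("TableScan", {"table": table, "columns": columns})


def project(child: PlanNode, columns: list[str]) -> PlanNode:
    return PlanNode("Project", {"columns": columns}, [child])


def cross_join(left: PlanNode, right: PlanNode) -> PlanNode:
    return PlanNode("CrossJoin", {}, [left, right])


# Build SELECT a, b FROM R, S directly, bypassing parser and optimizer.
plan = project(cross_join(scan("R", ["a"]), scan("S", ["b"])), ["a", "b"])
encoded = plan.to_json()
```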
Kevin
- Wants to clean up the Quickstep compilation; the type system is the biggest chunk.
- Which join syntax should we use? Want to add full join syntax.
Matt
- Did the cleanup in the parser interface.
- Currently runs the existing optimizations.
- Wants to add DISTINCT support, which will require another hash table.
- The execution engine currently only supports cross joins. Want to add NULL support so we can do other joins.
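A hash-based DISTINCT could look like the following minimal sketch, where the hypothetical `hash_distinct` uses a Python `set` to stand in for the extra hash table. It follows SQL's rule that NULLs (here `None`) count as duplicates of each other under DISTINCT.

```python
from typing import Iterable, Iterator, Optional

Row = tuple[Optional[int], ...]


def hash_distinct(rows: Iterable[Row]) -> Iterator[Row]:
    """Emit each distinct row once, in first-seen order.

    The set plays the role of the hash table. Because DISTINCT treats NULLs
    as equal to each other, hashing None as an ordinary value is correct here.
    """
    seen: set[Row] = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row
```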
Somya
- Added some predicate support in the Project operator: scan the table to generate a bit-vector, then scan the table again to output.
- Add a custom function like "matches last 3 integers in IP address".
- Will need the optimizer to pass through the unknown function somehow.
- Will need the execution parser to parse it.
- Will need a catalog of custom functions in the execution engine.
- Will need the execution engine to execute these -> Project needs
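The two-pass Project predicate and the custom-function catalog might be sketched as below. Everything here is an assumption for illustration: `FUNCTION_CATALOG`, `register`, `matches_last3` (which reads "last 3 integers" as the last three dotted components of an IPv4 address), and the use of a Python integer as the bit-vector.

```python
from typing import Callable

# Hypothetical catalog of custom functions that the execution engine looks up
# by name when the optimizer passes an unknown function call through.
FUNCTION_CATALOG: dict[str, Callable[[str, str], bool]] = {}


def register(name: str):
    def wrap(fn):
        FUNCTION_CATALOG[name] = fn
        return fn
    return wrap


@register("matches_last3")
def matches_last3(ip: str, pattern: str) -> bool:
    # One reading of "matches last 3 integers in IP address": compare the
    # last three dotted components of both addresses.
    return ip.split(".")[-3:] == pattern.split(".")[-3:]


def project_with_predicate(column: list[str], fn_name: str, arg: str) -> list[str]:
    fn = FUNCTION_CATALOG[fn_name]
    # Pass 1: evaluate the predicate once per row and set bit i on a match.
    bits = 0
    for i, value in enumerate(column):
        if fn(value, arg):
            bits |= 1 << i
    # Pass 2: scan the column again and emit rows whose bit is set.
    return [value for i, value in enumerate(column) if (bits >> i) & 1]
```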
Work efforts
- Build a column-store abstraction: replace the current "tuple" loop with an abstraction. Robert can tackle this.
- Programmer interface for the execution engine. Somya will start thinking about that as she wraps up other work.
- Create table + Insert.
- Strip out pieces of Quickstep: delay this in favor of functionality, unless it makes you happy.
- Joins in parser + resolver [Kevin].
- LIMIT for testing.
- Aggregation: add a way to specify GROUP BY [Matt].
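As a rough illustration of a column-store abstraction combined with GROUP BY and LIMIT, the hypothetical `ColumnTable` below keeps one list per column and aggregates by zipping only the columns a query needs, instead of looping over whole tuples. The output is sorted purely to make it deterministic; SQL does not guarantee group order.

```python
from collections import defaultdict
from typing import Optional


class ColumnTable:
    """Hypothetical column store: one list per column, all the same length."""

    def __init__(self, columns: dict[str, list]):
        lengths = {len(v) for v in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.num_rows = lengths.pop() if lengths else 0

    def column(self, name: str) -> list:
        return self.columns[name]

    def group_by_sum(self, key: str, value: str, limit: Optional[int] = None) -> list[tuple]:
        """SELECT key, SUM(value) ... GROUP BY key [LIMIT n].
        Reads only the two columns involved."""
        sums: dict = defaultdict(int)
        for k, v in zip(self.column(key), self.column(value)):
            sums[k] += v
        result = sorted(sums.items())  # sorted only for deterministic output
        return result if limit is None else result[:limit]
```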
CIDR Talks
- Store data in DNA molecules. DNA can naturally survive for 20,000 years, compared to 5-70 years for digital media, and its density is extremely high.
Papers
- Need to make a case for high-performance predicate locks in front of the database.
- "Building a new engine, Hustle; going to run TPC-C and TATP on top of Hustle with the new method to show we have high performance (sans recovery) and that it runs as an independent service."
- Will need B-tree indexes. Quickstep and Stanford both have templates.
- What will it take to run TPC-C? Robert and Kevin to work on this.
- JOB benchmark stuff: Yannis knows.
- How much machine learning for query optimization?
ImageJ Data Layer
- What does it take to save experiments from ImageJ into a database? Track input, operations, and results, and build analytics around that.
- Use cases: track what they did, share what they did, cache results for batches you appended, human productivity improvements.
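For the predicate-lock case, a minimal sketch of the conflict check over range predicates (`column BETWEEN lo AND hi`) is below. `PredicateLock` and `PredicateLockTable` are hypothetical names. Predicates on different columns of the same table are conservatively treated as overlapping, since they may match the same row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateLock:
    """Hypothetical lock on `table.column BETWEEN lo AND hi`."""
    txn: int
    table: str
    column: str
    lo: int
    hi: int
    exclusive: bool  # True for writes, False for reads

    def conflicts_with(self, other: "PredicateLock") -> bool:
        if self.txn == other.txn:
            return False  # a transaction never blocks itself
        if self.table != other.table:
            return False
        if self.column != other.column:
            overlap = True  # conservative: could match the same row
        else:
            overlap = self.lo <= other.hi and other.lo <= self.hi
        # Two overlapping readers are compatible; any writer conflicts.
        return overlap and (self.exclusive or other.exclusive)


class PredicateLockTable:
    def __init__(self):
        self.held: list[PredicateLock] = []

    def try_acquire(self, lock: PredicateLock) -> bool:
        if any(lock.conflicts_with(h) for h in self.held):
            return False
        self.held.append(lock)
        return True
```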
12/10/2018
Optimizer [Yannis]
- Reduced the build time of Quickstep.
Execution Engine [Somya]:
- Sketch out the flow between physical operators
Resolver [Kevin]:
- Talked with Jianqiao about the Resolver.
- The first version of the resolver will use a dummy catalog and type system.
Execution Engine [Matt]:
- Getting familiar with the execution engine
- Adaptive aggregation exploration
Next Steps:
- Integrate resolver and optimizer
- Create a dummy catalog
- Join with nested loops
- Aggregation
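The nested-loop join from these next steps can be sketched as follows (`nested_loop_join` is a hypothetical name). With the default always-true predicate it degenerates to the cross join; passing an equality test turns it into an equi-join.

```python
from typing import Callable, Iterable

Row = tuple


def nested_loop_join(
    outer: Iterable[Row],
    inner: list[Row],
    predicate: Callable[[Row, Row], bool] = lambda l, r: True,
) -> list[Row]:
    """For every outer row, scan the whole inner input (a list, so it can be
    re-scanned) and keep the concatenated pairs that satisfy `predicate`."""
    out = []
    for left in outer:
        for right in inner:
            if predicate(left, right):
                out.append(left + right)
    return out
```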
11/26/2018
Parser [Kevin]
- Aggregates for query 2 are supported and checked in.
Optimizer [Yannis]
- Hustle builds with Quickstep.
- Working on commonprefix and the resolver.
Execution Engine [Somya, Robert]
- Making datatypes flexible in size (currently defaults to fixed sizes and only supports int and ipaddress).
- Explore multi-threading support.
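One way to make datatype sizes flexible is to carry the width in the type descriptor instead of hard-coding it. This sketch assumes hypothetical `IntType` and `IpAddressType` classes, little-endian integers encoded with `struct`, and packed addresses from the standard `ipaddress` module.

```python
import ipaddress
import struct
from dataclasses import dataclass

# struct formats for signed little-endian integers of each supported width.
_INT_FORMATS = {1: "<b", 2: "<h", 4: "<i", 8: "<q"}


@dataclass(frozen=True)
class IntType:
    width: int  # size in bytes, chosen per column instead of a fixed default

    def encode(self, value: int) -> bytes:
        return struct.pack(_INT_FORMATS[self.width], value)

    def decode(self, data: bytes) -> int:
        return struct.unpack(_INT_FORMATS[self.width], data)[0]


@dataclass(frozen=True)
class IpAddressType:
    version: int  # 4 -> 4 bytes, 6 -> 16 bytes

    @property
    def width(self) -> int:
        return 4 if self.version == 4 else 16

    def encode(self, value: str) -> bytes:
        addr = ipaddress.ip_address(value)
        if addr.version != self.version:
            raise ValueError(f"expected IPv{self.version}, got {value}")
        return addr.packed

    def decode(self, data: bytes) -> str:
        cls = ipaddress.IPv4Address if self.version == 4 else ipaddress.IPv6Address
        return str(cls(data))
```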
General
- CI is in place on Travis.
11/12/2018
Parser [Matt, Kevin]
- Send the example plan for the Optimizer.
- Wrote a cli.
- Ongoing work expanding parser support.
Optimizer [Yannis]
- Produce a plan at Quickstep's physical plan level for now; later, discuss whether the execution-level abstraction or the physical plan should be passed to the execution engine.
- Working on physical plan serialization for execution engine.
- Meeting with Parser and Execution teams on Tuesday to coordinate.
Execution Engine [Robert, Somya]
- Will work on the execution plan deserializer on Tuesday.
- Working on custom type aggregation support.
- Working on a quick "project" operator for initial query.
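A deserializer for the serialized physical plan might dispatch on an operator name through a registry, as in this sketch. The JSON shape (`op`, `params`, `children`), the `OPERATORS` registry, and the operator classes are assumptions, not the actual format exchanged between the optimizer and the execution engine.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping serialized operator names to constructors.
OPERATORS: dict[str, Callable[[dict, list], Any]] = {}


def operator(name: str):
    def wrap(cls):
        OPERATORS[name] = cls
        return cls
    return wrap


@operator("TableScan")
class TableScan:
    def __init__(self, params: dict, children: list):
        self.table = params["table"]
        self.children = children


@operator("Project")
class Project:
    def __init__(self, params: dict, children: list):
        if len(children) != 1:
            raise ValueError("Project expects exactly one child")
        self.columns = params["columns"]
        self.children = children


def deserialize(node: dict) -> Any:
    """Rebuild the operator tree bottom-up, rejecting unknown operators."""
    op = node["op"]
    if op not in OPERATORS:
        raise ValueError(f"unknown physical operator: {op}")
    children = [deserialize(c) for c in node.get("children", [])]
    return OPERATORS[op](node.get("params", {}), children)
```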
Compiling [Somya]
- Switched to Makefile instead of CMake.
- Created example of how compiler structure will look.
- Waiting on other efforts to implement final version.
11/7/2018
Worked on the plan for the first query.
Parser [Matt, Kevin]
- Write the cli.
- Support the first query in the parser.
- Decide who should own the resolver with Yannis.
- Create a parse tree example.
Optimizer [Yannis]
- Deserialize the input from the parser in a passthrough resolver class.
- Give an example of the output of the optimizer for the first query.
Execution Engine [Robert]
- Get ready for first query
- Worked on CMake investigation.
Compiling [Somya]
- Investigate CMake and Makefiles to compile all the modules at once.
10/28/2018
Catalog [Matt, Kevin]
- Is Quickstep's type system too complicated for a standalone parser? How can we make it simpler?
- Meet with Jianqiao and figure out how we can simplify the type system.
Execution Engine [Robert]
- Physical Plans can be executed.
- You can validate the result of a query execution with SQLite.
- This week: explore how to integrate predicates.
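Validating against SQLite can be done with Python's standard `sqlite3` module: load the same data into an in-memory database, run the query, and compare the results as multisets, since row order is unspecified without ORDER BY. `matches_sqlite` and its argument layout are hypothetical, and table and column names are interpolated directly, so this is only suitable for trusted test data.

```python
import sqlite3
from collections import Counter


def matches_sqlite(
    query: str,
    tables: dict[str, tuple[list[str], list[tuple]]],
    engine_rows: list[tuple],
) -> bool:
    """Run `query` on SQLite over the same data and compare with our engine's
    output, ignoring row order."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, (columns, rows) in tables.items():
            conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
        expected = conn.execute(query).fetchall()
    finally:
        conn.close()
    return Counter(expected) == Counter(engine_rows)
```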
10/22/2018
Catalog, Execution Engine [Robert]
- Pushed v1 of the catalog to github
- Reorganized github
- Evaluate window queries from the start; look at TPC-DS and at this paper: http://www.vldb.org/pvldb/vol8/p702-tangwongsan.pdf
- Find the other paper: ...
- Cleared up the issue when C++ called Rust
- Next step: simple scheduler
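For window queries, a sliding-window aggregate also has to handle non-invertible operators such as MAX, where a value cannot simply be subtracted on eviction. The sketch below uses the well-known "two-stacks" technique, not the algorithm from the linked paper; `TwoStacksWindow` is a hypothetical name and `combine` is assumed to be associative.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """FIFO sliding-window aggregate with amortized O(1) insert/evict/query.
    Each stack entry holds (value, aggregate of this entry and all below it)."""

    def __init__(self, combine: Callable[[T, T], T]):
        self.combine = combine
        self.front: list[tuple[T, T]] = []  # oldest element on top
        self.back: list[tuple[T, T]] = []   # newest element on top

    def insert(self, value: T) -> None:
        agg = value if not self.back else self.combine(self.back[-1][1], value)
        self.back.append((value, agg))

    def evict(self) -> None:
        if not self.front:
            # Flip back onto front so the oldest element ends up on top,
            # recomputing aggregates with older values on the left.
            while self.back:
                value, _ = self.back.pop()
                agg = value if not self.front else self.combine(value, self.front[-1][1])
                self.front.append((value, agg))
        self.front.pop()

    def query(self) -> T:
        if self.front and self.back:
            return self.combine(self.front[-1][1], self.back[-1][1])
        if self.front:
            return self.front[-1][1]
        return self.back[-1][1]
```

For example, a 3-row MAX window over `5, 1, 4, 2, 3, 0` inserts each value, evicts the oldest once the window exceeds 3 rows, and queries after each step.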
Parser [Matt]
- Working on a POC in Lemon; the huge query space is a problem for testing.
- Automatically parse the Lemon rules and produce all possible SQL commands up to a certain depth.
- A Python script that parses the Lemon output into a structure, useful for testing.
- Lemon rules can easily be exported to Bison.
- RAGS: test generation of queries for given workloads
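Producing all SQL commands up to a given depth amounts to a bounded expansion of the grammar rules. The toy grammar below stands in for rules extracted from Lemon. Both `GRAMMAR` and `expand` are hypothetical, and real Lemon rules would first need to be parsed into this dictionary form.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to its alternatives;
# any symbol that is not a key is a terminal token.
GRAMMAR = {
    "stmt": [["SELECT", "cols", "FROM", "ident"]],
    "cols": [["*"], ["ident"], ["ident", ",", "cols"]],
    "ident": [["a"], ["b"]],
}


def expand(symbol: str, depth: int) -> list[list[str]]:
    """All token sequences derivable from `symbol` using at most `depth`
    levels of nonterminal expansion."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal
    if depth == 0:
        return []  # expansion budget exhausted
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in alternative]
        # Cartesian product of the expansions of each symbol, concatenated.
        for combo in product(*parts):
            results.append([tok for part in combo for tok in part])
    return results
```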
General
- Quantum-resistant cryptography: what does the space look like? What kind of properties do you get? Is order maintained?
10/15/2018
Parser [Kevin, Matt]
- Met with Jianqiao.
- Worked on a Rust interface to call C++ function.
- Working on testing to cover SQLite parser.
- Test SQLite queries in Quickstep's parser and figure out what can be parsed.
- By Thanksgiving: produce ASTs for the Quickstep surface, test JSON serialization, and cleanly separate the parser.
Catalog, Storage Manager [Robert]
- Catalog done by the end of the week.
- Simple select query without predicates can run now on the storage manager.
- By Thanksgiving: run a simple SELECT * query (no aggregations).
Optimizer [Yannis]
- Working through Quickstep's code. No new progress to report.
General
- Target: TPC-DS run on Hustle.
- Architecture Comments: Keep CC and resolver under a latch to avoid races. In the future we should push CC downstream.
CRISP Report
- Two key operations for ML applications: matrix multiplication (sparsity) and transpose, i.e., matrix algebra.
- We should support matrix operations, starting with reading, writing, and transposing a matrix.
- CAPA project: simple filter query on key-value records (key: 8 bytes, value: 100 bytes) to see what the memory can do.
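The CAPA filter query could be prototyped over fixed-size 108-byte records (8-byte key, 100-byte value) as below. The little-endian unsigned key, the half-open key-range predicate, and the helper names are assumptions for illustration.

```python
import struct

KEY_BYTES, VALUE_BYTES = 8, 100
# 8-byte unsigned key followed by a 100-byte value; '<' means no padding.
RECORD = struct.Struct(f"<Q{VALUE_BYTES}s")
assert RECORD.size == KEY_BYTES + VALUE_BYTES  # 108 bytes per record


def pack_records(pairs: list[tuple[int, bytes]]) -> bytes:
    # Values shorter than 100 bytes are zero-padded by the "100s" format.
    return b"".join(RECORD.pack(k, v) for k, v in pairs)


def filter_scan(buf: bytes, lo: int, hi: int) -> list[int]:
    """Return keys with lo <= key < hi, decoding only the 8-byte key of each
    108-byte record."""
    out = []
    for offset in range(0, len(buf), RECORD.size):
        (key,) = struct.unpack_from("<Q", buf, offset)
        if lo <= key < hi:
            out.append(key)
    return out
```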
10/08/2018
General [Yannis]
- Proposed the module design, the flow, and the input and output of each module.
10/01/2018
Parser [Matt, Kevin]
- Decided to re-implement the parser.
- Examine if we can use Quickstep's parser.
- Use the SQLite parser's tests to verify our parser; start by successfully parsing the 7 million test queries.
Catalog, Hustle[Robert]
- Create a skeleton implementation (coordinate with Matt and Kevin) that creates and deletes a table.
- The Rust side of the catalog is almost done, but we ran into a garbage collection problem: arguments passed from C++ to Rust are garbage collected.
- Need different concurrency control for stats and schema.
- Catalog API will return a token that could be used to return the entire catalog or an id and the catalog module will store snapshots of the catalog.
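The token-plus-snapshot idea for the Catalog API might look like this minimal sketch. In the hypothetical `SnapshotCatalog`, each call to `snapshot` returns a token naming a deep copy of the current schema, so later DDL does not change what an earlier token sees. Stats, which need different concurrency control, are left out.

```python
import copy
import itertools
import threading


class SnapshotCatalog:
    """Hypothetical catalog: integer tokens name immutable schema snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = itertools.count(1)
        self._current: dict[str, list[str]] = {}  # table -> column names
        self._snapshots: dict[int, dict[str, list[str]]] = {}

    def snapshot(self) -> int:
        with self._lock:
            token = next(self._versions)
            self._snapshots[token] = copy.deepcopy(self._current)
            return token

    def get(self, token: int) -> dict[str, list[str]]:
        # Callers should treat the returned snapshot as read-only.
        return self._snapshots[token]

    def create_table(self, name: str, columns: list[str]) -> None:
        with self._lock:
            if name in self._current:
                raise ValueError(f"table {name} already exists")
            self._current[name] = list(columns)

    def drop_table(self, name: str) -> None:
        with self._lock:
            del self._current[name]
```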
Optimizer [Yannis]
- Quickstep's optimizer is a viable choice and we should use it.
General
- We need locks on views.
- Parser should create an AST and only do grammar validation.
- CC module will verify the columns and tables and grab the necessary locks.
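The split between a parser that only checks grammar and a CC module that checks names and takes locks could be sketched like this. `SelectAst`, `ResolveError`, `resolve_and_lock`, and the use of a plain set as the lock table are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class SelectAst:
    """Hypothetical parser output: syntactically valid, names unchecked."""
    table: str
    columns: list[str]


class ResolveError(Exception):
    pass


def resolve_and_lock(ast: SelectAst, catalog: dict[str, list[str]], locks: set[str]) -> list[str]:
    """Stage after parsing: verify table and column names against the
    catalog, then record a lock on the table the query touches."""
    if ast.table not in catalog:
        raise ResolveError(f"unknown table {ast.table}")
    unknown = [c for c in ast.columns if c not in catalog[ast.table]]
    if unknown:
        raise ResolveError(f"unknown columns {unknown} in {ast.table}")
    locks.add(ast.table)  # stand-in for grabbing a real shared lock
    return [f"{ast.table}.{c}" for c in ast.columns]
```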
09/24/2018
Storage Manager [Aarati]
- Looked into the storage manager and global indices.
- Will write up and share the storage manager's design.
Parser [Matt, Kevin]
- Continuing work to isolate the parser by replacing calls to the code generator with calls to custom functions.
- Using SQLite's parser might not be as easy as we hoped. Should we build our own and use SQLite's test suite to ensure we support the same surface?
- Take a couple of days to decide if we should modify SQLite's parser. Explore the contents of the parser's classes and evaluate how to add a new function to the parser.
- Worked on defining the schema with Robert.
[Robert]
- Held meetings to define the structure of the Catalog and to help with the Parser.
- Will work on the first implementation and definition of the API of the Catalog.
Optimizer [Yannis]
- Looked at Quickstep's optimizer as an alternative to starting from scratch.
- Evaluating Quickstep's optimizer, will work on separating Quickstep's optimizer from the parser and catalog.
General
- We should restrict the languages in order to not require heavy runtimes, we should use Rust, C, C++.
- Follow Rust's model for error handling: define custom error codes and handle them explicitly.