Paul Rogers edited this page Feb 22, 2020 · 30 revisions

JSON Reader

Branches:

  • DRILL-6953-rev - Newest version
  • DRILL-6953 - Original version with a bunch of batch count fixes
  • DRILL-7572 - JSON Structure Parser PR (Done)
  • DRILL-7574 - Revised projection parser PR (Done)

Review comments suggest that the current approach needs adjustment. Maybe:

  • Pull out the core JSON parser - DRILL-7572.
  • Pull out the projection rework plus better projection tests - DRILL-7574.
  • Pull out the JSON reader itself (not the full format plugin).
  • Recreate the plugin on top of the above. Retest.
  • Track down other bugs and fix them as separate PRs.
  • Finally, follow up with revised JSON format plugin.
   JSON Format Plugin
           |      Schema
           |     /
           v   L
       JSON Loader   JSON Projection
           |        /
           v      L
    JSON Structure Parser
           |
           v
   Jackson JSON Parser

Prior Plan

Tasks:

  • Make JSON reader reusable (request from Charles)
  • Fix issues reported in PR

Follow-on work:

  • Run all unit tests with new reader enabled
  • Refactor to allow use of JSON reader outside of JSON file scan
  • Provided-schema support
  • Review past two years of JSON changes
  • Switch JSON function to use this reader
  • Switch Kafka to use this reader

SPI

  • Refactor the plugin registry - DRILL-7590
  • Reshuffle plugin files
  • Fix secondary set of plugin registry issues
  • Refactor Plugin API: remove init(), replace context
  • Basic SPI outline, in tentative location, with registry integration, storage plugin only
  • Combine format plugins into core registry. Extension map.
  • Add format plugins to SPI
  • Add session vars to SPI

Prior work:

  • DRILL-7458: Base framework for storage plugins (Abandoned)

Filter Push-Down Framework

  • DRILL-7458: Base framework for storage plugins (Abandoned)
  • Review comments in PR
  • Revise design
  • Revise code & submit as new PR by itself

Revised Copier

Tasks:

  • PR: DRILL-7486: refactor reader creation - merged
  • PR: DRILL-TBD: refactor tests to use new schema builder
  • PR: DRILL-TBD: Bulk copy in RSL

On branch svr-exp3

  • Get copier to work with bulk copy. (Done)
  • Split copier tests into multiple files. (Done)
  • Bulk copy tests for structured data types. (Done)
  • PR for move of allocator into RSL options.
  • PR for restructure of reader creator and indexes.
  • PR for bulk copy feature in RSL
  • PR for copier

Current problem: TestCsvWithSchema.testBlankCols() fails with an SV4 from the sort. The likely cause is batch ownership. Maybe move merging into the sort first?

Revised Mock

  • Revise, PR ColumnMetadata
  • Revise mock to keep its structure externally, use CMD internally
  • Revise to use Base structure

PR for general clean-up.

  • Branch: clean
  • Copy from copier branch
  • Copy from Sumo branch
  • Copy from mock branch
  • Copy from CountFix branch

Code Gen Refactoring

  • Refactor ExprTreeMaterializer to use schema, not vectors.
  • Project code gen unit tests
  • Unit tests for specific bits of code gen
  • Start working out how to incorporate column readers/writers

Schema Support

Series of moves to get scan ready for a planner-created schema.

  • CSV Reader
    • PR for revised schema handling
    • Remove internal conversions in favor of shims
    • Look into other readers
  • Refactor Column Metadata
    • Add wildcard column
    • Add untyped column
  • Modify scan framework to produce reader schema
    • Rework projection parser to generate a schema
    • Remove the projection set code in favor of schema
    • Refine how projection set/schema is presented in schema negotiator

Parquet reader

  • Review existing code, work out an approach

Data Model

  • Prepare a writeup, gathering recent comments.
  • Look into Java Object support.
  • Work out an evolution plan.

Docker/K8s

Wait for Abhishek.

Other

  • Remove OUT_OF_MEMORY result status (DRILL-7487) - Done
  • Remove STOP result status (DRILL-7583) - Done
    • Working branch: stop
    • DRILL-7507: Convert fragment interrupts to exceptions - Merged
    • DRILL-7506: Simplify code gen error handling - Merged
    • DRILL-7576: Other error handling
    • PR to remove STOP
    • PR to remove other items
  • Remove NOT_YET result status
  • Find fixed-size-block branch
  • Support for DESCRIBE to get schema from non-catalog tables
  • Support for data sets nested within a file-like or plugin-like object
  • Follow up on local directory paths. See #1987: DRILL-7589.

Branches

  • DRILL-6953-rev - PR for JSON reader - Abandoned for now
  • DRILL-7224
  • DRILL-7311
  • DRILL-7311-2
  • DRILL-7311-debug
  • DRILL-7333 - Abandoned, done via other PRs
  • DRILL-7333-orig - Obsolete?
  • DRILL-7439 - Abandoned, done via other PRs
  • DRILL-7447
  • DRILL-7456 - Merged
  • DRILL-7458 - Base framework PR - Abandoned
  • DRILL-7458-2
  • DRILL-7572 - JSON Structure Parser - Open PR
  • DRILL-7574 - Revised projection parser - Open PR
  • DRILL-7576 - Fail fast for operator errors - Open PR
  • DRILL-7583 - Remove STOP - Open PR
  • DRILL-7590 - Refactor plugin registry - Open PR
  • Dec10
  • Dec30
  • Dec30b
  • JavaObjRow - Quick & dirty Java object batch prototype
  • July14
  • June18
  • June6
  • Nov7
  • Nov7b
  • Nov7c
  • Oct19
  • Oct26
  • Oct29
  • RowSetRev4 - Probably obsolete
  • cg-test
  • cleanup-Dec1
  • error
  • error2
  • error3
  • lastSetFix
  • logrev
  • logrev-exp1
  • logrev-exp2
  • logrev-exp3
  • master
  • md-type
  • perf
  • shim - Text reader schema revision
  • svr-exp
  • svr-exp2
  • svr-exp3
  • vectorcheck

Obsolete Branches

  • DRILL-7306 - Merged
  • DRILL-7306-debug - Obsolete?
  • DRILL-7324 - Merged
  • DRILL-7327 - Merged
  • DRILL-7358 - Merged
  • DRILL-7377 - Merged
  • DRILL-7377x - Obsolete?
  • DRILL-7402 - Merged
  • DRILL-7403 - Merged
  • DRILL-7412 - Merged
  • DRILL-7413 - Merged
  • DRILL-7413x - Obsolete?
  • DRILL-7414 - Merged
  • DRILL-7424 - Merged
  • DRILL-7436 - Merged
  • DRILL-7441 - Merged
  • DRILL-7442 - Merged
  • DRILL-7445 - Merged
  • DRILL-7446 - Merged
  • DRILL-7476 - Merged
  • DRILL-7479 - Merged
  • DRILL-7486 - Merged
  • DRILL-7487 - Merged
  • DRILL-7502 - Merged
  • DRILL-7503 - Merged
  • DRILL-7506 - Merged
  • DRILL-7507 - Merged
  • stop - Work to retire STOP status
  • zCountFix - Draft of batch count fixes
  • zCountFix2 - Draft of batch count fixes
  • zCountFix3 - Draft of batch count fixes
  • DRILL-6951-1 - Probably obsolete; related to mock plugin
  • DRILL-6953 - Probably obsolete
  • DRILL-6953-2 - Probably obsolete
  • DRILL-6953-orig - Probably obsolete
  • DRILL-7293-orig