Paul Rogers edited this page Apr 16, 2020 · 30 revisions

JSON Reader

  • Add link to jsonlines in various places.
  • Add image below to a doc file somewhere.
  • Reapply new JSON reader and test revisions. Run all tests. Resolve issues.
  • Ask for a full test suite run
  • Retrofit other uses of older JSON reader

Branches:

  • DRILL-6953-rev - Newest version
  • DRILL-6953 - Original version with a bunch of batch count fixes
  • DRILL-7572 - JSON Structure Parser PR (Done)
  • DRILL-7574 - Revised projection parser PR (Done)
  • DRILL-7633: Fixes for union and repeated list accessors (Open)
  • DRILL-7631: Updates to the Json Structure Parser
  • json - Working branch
  • DRILL-7601: Shift column conversion to reader from scan framework (Closed)
  • DRILL-7640: EVF-based JSON Loader

Review comments suggest that the current approach needs adjustment. Maybe:

   JSON Format Plugin
           |      Schema
           |     /
           v   L
       JSON Loader   JSON Projection
           |        /
           v      L
    JSON Structure Parser
           |
           v
   Jackson JSON Parser
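
The layering in the diagram can be sketched roughly as below. This is only an illustration under stated assumptions, not Drill's actual code: every class and method name here (`Projection`, `StructureParser`, `Loader`, `FieldListener`, `load`) is hypothetical, and a pre-tokenized list of name/value pairs stands in for the Jackson parser so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class JsonLayersSketch {

    /** Events the structure parser reports upward (hypothetical interface). */
    interface FieldListener {
        void onField(String name, String value);
    }

    /** Stand-in for the JSON projection: decides which fields are wanted. */
    static final class Projection {
        private final Set<String> columns;

        Projection(Set<String> columns) {
            this.columns = columns;
        }

        boolean isProjected(String name) {
            return columns.contains(name);
        }
    }

    /** Stand-in for the structure parser: walks tokens, drops unprojected fields. */
    static final class StructureParser {
        private final Projection projection;
        private final FieldListener listener;

        StructureParser(Projection projection, FieldListener listener) {
            this.projection = projection;
            this.listener = listener;
        }

        /** Each token is a {name, value} pair; a real parser would pull these from Jackson. */
        void parse(List<String[]> tokens) {
            for (String[] token : tokens) {
                if (projection.isProjected(token[0])) {
                    listener.onField(token[0], token[1]);
                }
            }
        }
    }

    /** Stand-in for the loader: receives parser events and builds a row. */
    static final class Loader implements FieldListener {
        final List<String> row = new ArrayList<>();

        @Override
        public void onField(String name, String value) {
            row.add(name + "=" + value);
        }
    }

    /** Wires the layers together the way a format plugin might. */
    static List<String> load(Set<String> projected, List<String[]> tokens) {
        Loader loader = new Loader();
        new StructureParser(new Projection(projected), loader).parse(tokens);
        return loader.row;
    }

    public static void main(String[] args) {
        List<String[]> tokens = List.of(
            new String[] {"a", "1"},
            new String[] {"b", "2"},
            new String[] {"c", "3"});
        System.out.println(load(Set.of("a", "c"), tokens)); // prints [a=1, c=3]
    }
}
```

The point of the split is that the structure parser knows only about JSON structure and projection, while the loader alone knows how to turn events into rows.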

SPI

  • Basic SPI outline, in tentative location, with registry integration, storage plugin only
  • Combine format plugins into core registry. Extension map.
  • Add format plugins to SPI
  • Add session vars to SPI

Storage Plugin Registry

  • Refactor the plugin registry - DRILL-7590, and follow-on fixes - Done
  • Fix format plugin immutability issues - DRILL-6168
  • Refactor storage plugins (remove init(), ensure immutability)
  • Decide on which plugin API changes are acceptable to third parties
  • Reshuffle plugin files
  • Fix secondary set of plugin registry issues - From lists of tasks
  • Avoid scan of all plugins at scan time
  • Avoid loading plugins at startup
  • Gracefully handle bad configs
  • Etc.

Filter Push-Down

  • Wait for REST reader
  • Apply current version to REST reader
  • Determine if some existing storage plugins can be retrofitted
  • Write up documentation somewhere

Prior work:

  • DRILL-7458: Base framework for storage plugins (Abandoned)

Schema Support

  • First PR: DRILL-7696
  • PR for scan framework
  • Retrofit CSV reader.

Series of moves to get scan ready for a planner-created schema.

  • CSV Reader
    • PR for revised schema handling
    • Remove internal conversions in favor of shims
  • Look into other readers
  • Refactor Column Metadata
    • Add wildcard column
    • Add untyped column
  • Modify scan framework to produce reader schema
    • Rework projection parser to generate a schema
    • Remove the projection set code in favor of schema
    • Refine how projection set/schema is presented in schema negotiator

Other Issues

  • JDBC DataSource Issue - From mail list
  • Parquet Issue - From mail list/Slack

Revised JDBC Driver

  • Review current driver. Where is boundary between JDBC and Drill client?
  • Update wire format from Jig.
  • Review Avatica for its JSON wire format.
  • Review Hive's client. Usable in this context?
  • Resurrect Jig serializer, deserializers; update for column accessors
  • Create a JDBC2

Revised Copier

Tasks:

  • PR: DRILL-7486: refactor reader creation - merged
  • PR: DRILL-TBD: refactor tests to use new schema builder
  • PR: DRILL-TBD: Bulk copy in RSL

On branch svr-exp3

  • Get copier to work with bulk copy. (Done)
  • Split copier tests into multiple files. (Done)
  • Bulk copy tests for structured data types. (Done)
  • PR for move of allocator into RSL options.
  • PR for restructure of reader creator and indexes.
  • PR for bulk copy feature in RSL
  • PR for copier

Current problem: TestCsvWithSchema.testBlankCols() fails with SV4 from sort. The likely cause is batch ownership. Possible first step: move merging into the sort?

Revised Mock

  • Revise, PR ColumnMetadata
  • Revise mock to keep its structure externally, use CMD internally
  • Revise to use Base structure

Code Gen Refactoring

  • Refactor ExprTreeMaterializer to use schema, not vectors.
  • Project code gen unit tests
  • Unit tests for specific bits of code gen
  • Begin thinking about how to incorporate column readers/writers

Parquet reader

  • Review existing code, work out an approach

Data Model

  • Prepare a writeup, gathering recent comments.
  • Look into Java Object support.
  • Work out an evolution plan.

Docker/K8s

Wait for Abhishek.

Other

  • Remove NOT_YET result status
  • Find fixed-size-block branch
  • Support for DESCRIBE to get schema from non-catalog tables
  • Support for data sets nested within a file-like or plugin-like object
  • Follow up on local directory paths. See #1987: DRILL-7589.
  • Retire unused data, vector types (MONEY, obsolete DECIMAL, etc.)
  • Some plan for the problem-child data types (repeated list, etc.)
  • Easier to work with files
    • Command to create a workspace (not just edit JSON)
    • CREATE TABLE x STORED AS y AS - or whatever the Hive/Impala syntax is
    • Allow overwrite in CTAS
    • Don't store the CRC on the local file system
    • CTAS emits a CSVH file, but names it CSV, so can't easily reread.

Branches

  • DRILL-6953-rev - PR for JSON reader - Abandoned for now
  • DRILL-7224
  • DRILL-7311
  • DRILL-7311-2
  • DRILL-7311-debug
  • DRILL-7333 - Abandoned, done via other PRs
  • DRILL-7333-orig - Obsolete?
  • DRILL-7439 - Abandoned, done via other PRs
  • DRILL-7447
  • DRILL-7456 - Merged
  • DRILL-7458 - Base framework PR - Abandoned
  • DRILL-7458-2
  • DRILL-7572 - JSON Structure Parser - Open PR
  • DRILL-7620: Fix plugin mutability issues
  • DRILL-7631: Updates to the Json Structure Parser
  • DRILL-7640: EVF-based JSON Loader
  • Dec10
  • Dec30
  • Dec30b
  • JavaObjRow - Quick & dirty Java object batch prototype
  • July14
  • June18
  • June6
  • Nov7
  • Nov7b
  • Nov7c
  • Oct19
  • Oct26
  • Oct29
  • RowSetRev4 - Probably obsolete
  • cg-test
  • cleanup-Dec1
  • error
  • error2
  • error3
  • json - working Json reader branch
  • lastSetFix
  • logrev
  • logrev-exp1
  • logrev-exp2
  • logrev-exp3
  • master
  • md-type
  • perf
  • shim - Text reader schema revision
  • svr-exp
  • svr-exp2
  • svr-exp3
  • vectorcheck

Obsolete Branches

  • DRILL-7306 - Merged
  • DRILL-7306-debug - Obsolete?
  • DRILL-7324 - Merged
  • DRILL-7327 - Merged
  • DRILL-7358 - Merged
  • DRILL-7377 - Merged
  • DRILL-7377x - Obsolete?
  • DRILL-7402 - Merged
  • DRILL-7403 - Merged
  • DRILL-7412 - Merged
  • DRILL-7413 - Merged
  • DRILL-7413x - Obsolete?
  • DRILL-7414 - Merged
  • DRILL-7424 - Merged
  • DRILL-7436 - Merged
  • DRILL-7441 - Merged
  • DRILL-7442 - Merged
  • DRILL-7445 - Merged
  • DRILL-7446 - Merged
  • DRILL-7476 - Merged
  • DRILL-7479 - Merged
  • DRILL-7486 - Merged
  • DRILL-7487 - Merged
  • DRILL-7502 - Merged
  • DRILL-7503 - Merged
  • DRILL-7506 - Merged
  • DRILL-7507 - Merged
  • DRILL-7574 - Revised projection parser - Merged
  • DRILL-7576 - Fail fast for operator errors - Merged
  • DRILL-7583 - Remove STOP - Merged
  • DRILL-7590: Refactor plugin registry - Merged
  • DRILL-7601: Shift column conversion to reader from scan framework - Merged
  • DRILL-7617: Disabled plugins not showing in Web UI - Merged
  • DRILL-7632: Improve user exception formatting - Merged
  • DRILL-7633: Fixes for union and repeated list accessors
  • DRILL-7634: Rollup of code cleanup changes
  • stop - Work to retire STOP status
  • zCountFix - Draft of batch count fixes
  • zCountFix2 - Draft of batch count fixes
  • zCountFix3 - Draft of batch count fixes
  • DRILL-6951-1 - Probably obsolete; related to mock plugin
  • DRILL-6953 - Probably obsolete
  • DRILL-6953-2 - Probably obsolete
  • DRILL-6953-orig - Probably obsolete
  • DRILL-7293-orig