Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg: Add support for user specified target cols in IMPORT INTO #39023

Merged
merged 3 commits into from
Jul 29, 2019

Conversation

adityamaru27
Copy link
Contributor

This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source.
(Part of the larger roadmap #26834)

@adityamaru27 adityamaru27 requested review from dt and a team July 22, 2019 16:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru27 adityamaru27 force-pushed the add-target-cols-to-import-into branch from 3d73368 to 8a49379 Compare July 22, 2019 17:00
adityamaru27 added a commit to adityamaru27/cockroach that referenced this pull request Jul 22, 2019
This is an add on to cockroachdb#39023. It adds the grammar required to support
queries such as:
`IMPORT INTO test CSV DATA ('...');`

No target columns implies that all columns are imported into from the
data sources.

Release note: None
@adityamaru27 adityamaru27 force-pushed the add-target-cols-to-import-into branch from 8a49379 to adabb7b Compare July 22, 2019 19:36
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, 11 of 12 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)


pkg/ccl/importccl/import_stmt.go, line 440 at r2 (raw file):

				intoCols = append(intoCols, name.String())
			}
			tableDetails = []jobspb.ImportDetails_Table{{Desc: &importing, IsNew: false, TargetCols: intoCols}}

should we be checking that the named cols are actually valid here?

I think we also want to validate that every column is not targetted has a default if we support filling defaults or has an implicit null default and is nullable, otherwise there is no point in starting the import, right?


pkg/ccl/importccl/import_stmt.go, line 631 at r2 (raw file):

	parentID sqlbase.ID,
	tables map[string]*sqlbase.TableDescriptor,
	targetCols map[string]*distsqlpb.ReadImportDataSpec_TargetColList,

minor nit here/throughout: as a reader, I had a little trouble with this type and knowing what it was/how to read it at first glance. I think I managed to figure it out but the parts I struggled with were what are the keys (string could be anything) and then, is it the same as the tables map right above it? if so, should we just be changing that to contain structs? if not, when/why is it different? also, if *distsqlpb.ReadImportDataSpec_TargetColList is just a long-winded []string maybe we could use that for internal calls and just wrap it in the proto message when building the spec proto.

Anyway, this is all just personal preference / style opinion, and I realize you've spent more time in here figuring what you need to plumb though than I have, so I'm fine with it as-is if you are.


pkg/sql/row/row_converter.go, line 231 at r3 (raw file):

	var txCtx transform.ExprTransformContext
	// Although we don't yet support DEFAULT expressions on visible columns,

can you update this comment to clarify how targeting fewer columns works w.r.t. defaults?

I think I gather that we don't support DEFAULT yet (we should for non-targeted cols) but what does that mean?

@adityamaru27 adityamaru27 force-pushed the add-target-cols-to-import-into branch from adabb7b to 00a8a2e Compare July 26, 2019 17:40
Copy link
Contributor Author

@adityamaru27 adityamaru27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt)


pkg/ccl/importccl/import_stmt.go, line 440 at r2 (raw file):

Previously, dt (David Taylor) wrote…

should we be checking that the named cols are actually valid here?

I think we also want to validate that every column is not targetted has a default if we support filling defaults or has an implicit null default and is nullable, otherwise there is no point in starting the import, right?

Done. Added a validity check for named cols, also added a check to disallow DEFAULT expressions and non-nullable non-target columns.


pkg/ccl/importccl/import_stmt.go, line 631 at r2 (raw file):

Previously, dt (David Taylor) wrote…

minor nit here/throughout: as a reader, I had a little trouble with this type and knowing what it was/how to read it at first glance. I think I managed to figure it out but the parts I struggled with were what are the keys (string could be anything) and then, is it the same as the tables map right above it? if so, should we just be changing that to contain structs? if not, when/why is it different? also, if *distsqlpb.ReadImportDataSpec_TargetColList is just a long-winded []string maybe we could use that for internal calls and just wrap it in the proto message when building the spec proto.

Anyway, this is all just personal preference / style opinion, and I realize you've spent more time in here figuring what you need to plumb though than I have, so I'm fine with it as-is if you are.

Thanks for this suggestion, the map tables and targetCols did have a 1:1 relation so I just made tables have structs instead. This allowed us to avoid plumbing the TargetColList separately and made it cleaner.


pkg/sql/row/row_converter.go, line 231 at r3 (raw file):

Previously, dt (David Taylor) wrote…

can you update this comment to clarify how targeting fewer columns works w.r.t. defaults?

I think I gather that we don't support DEFAULT yet (we should for non-targeted cols) but what does that mean?

Done.

@adityamaru27
Copy link
Contributor Author

@dt RFAL. Addressed comments and added some more tests to ensure we do not support DEFAULT expression, or non-nullable non-target columns.

This change adds the proto definitions required to
plumb information about the target columns to be imported into,
through the import job and ReadImport processor.

Release note: None
This change adds the ability for users to specify a subset
of the columns of an existing table to import into, from a
CSV data source.

TODO: Add the grammar which supports IMPORT INTO without any
target columns specified, in which case we import data for all
columns.

Release note: None
This change adds unit tests for importing into an existing table while
specifying a subset of all visible columns as targets.

Release note: None
@adityamaru27 adityamaru27 force-pushed the add-target-cols-to-import-into branch from 00a8a2e to 91aa7c4 Compare July 29, 2019 13:44
@adityamaru27
Copy link
Contributor Author

TFTR!
bors r+

craig bot pushed a commit that referenced this pull request Jul 29, 2019
39023: pkg: Add support for user specified target cols in IMPORT INTO r=adityamaru27 a=adityamaru27

This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. 
(Part of the larger roadmap #26834)

39043: distsqlrun: lookup join efficiency improvements r=jordanlewis a=jordanlewis

- distsqlrun: don't save lookup rows twice
- distsqlrun: stream lookup join output
- distsql: remove unused field from lookup join spec

Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This was no longer necessary.

Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.

Also, remove the unused extra index filter expression field from the lookup join spec.

39118: opt: add IndexOrdinal alias r=RaduBerinde a=RaduBerinde

Adding a `cat.IndexOrdinal` alias to make it more explicit when a
value is an index ordinal. It is only an alias because a separate type
would make things like loops more annoying.

Release note: None

39146: sql/tree: store From as value in SelectClause AST node r=nvanbenschoten a=nvanbenschoten

Drive by cleanup. The value is assumed to be non-nil in a number of
places, so we should avoid the unnecessary indirection.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jul 29, 2019

Build succeeded

@craig craig bot merged commit 91aa7c4 into cockroachdb:master Jul 29, 2019
adityamaru27 added a commit to adityamaru27/cockroach that referenced this pull request Jul 29, 2019
This is an add on to cockroachdb#39023. It adds the grammar required to support
queries such as:
`IMPORT INTO test CSV DATA ('...');`

No target columns implies that all columns are imported into from the
data sources.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants