
Avoid type coercion when a dereferenced field is same type #3967

Merged: 1 commit, merged Aug 6, 2020


laurachenyu
Contributor

Queries failed with 'Unsupported coercion from int to int' because a type coercion was requested, unnecessarily, when reading a dereferenced field whose type had not changed.
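A minimal sketch of the idea behind the fix, written in Java because the Hive connector is. `CoercionSketch`, `createCoercer`, and the string type names are hypothetical and do not reflect the real connector API. The point is only that a coercer is requested when the source and target types actually differ, so reading an unchanged `int` field never reaches the "unsupported coercion" path.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not the actual Hive connector coercion API.
// Types are modeled as plain Hive type names such as "int" or "bigint".
final class CoercionSketch {
    private CoercionSketch() {}

    // Returns Optional.empty() when the partition type already matches the
    // table type, so the caller reads the value unchanged instead of asking
    // for an "int -> int" coercer that does not exist.
    static Optional<UnaryOperator<Object>> createCoercer(String fromType, String toType) {
        if (fromType.equals(toType)) {
            return Optional.empty();
        }
        // One example of a supported widening; a real policy covers many more pairs.
        if (fromType.equals("int") && toType.equals("bigint")) {
            return Optional.of(value -> value == null ? null : (long) (Integer) value);
        }
        throw new IllegalArgumentException("Unsupported coercion from " + fromType + " to " + toType);
    }
}
```

In this model, only type pairs that really differ are checked against the set of supported coercions; identical types short-circuit before that check.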

@tooptoop4
Contributor

@laurachenyu have you got a sample table definition and query that replicates the issue?

phd3 self-requested a review June 8, 2020 21:48
@laurachenyu
Contributor Author

> @laurachenyu have you got a sample table definition and query that replicates the issue?

The following statements reproduce the issue:

DROP TABLE IF EXISTS evolve_test;
CREATE TABLE evolve_test (dummy bigint, a row(b bigint, c varchar), d bigint) WITH (format = 'orc', partitioned_by = ARRAY['d']);
INSERT INTO evolve_test VALUES (1, row(1, 'abc'), 1);
ALTER TABLE evolve_test DROP COLUMN a;
ALTER TABLE evolve_test ADD COLUMN a row(b bigint, c varchar, f int);
INSERT INTO evolve_test VALUES (2, row(2, 'def', 2), 2);
SELECT a.b FROM evolve_test;

@findepi
Member

findepi commented Jun 9, 2020

@laurachenyu thanks for your PR!

Please add a test. We should be able to test this with TestOrcReader (let us know if you need guidance).

If that doesn't work, we could cover this with a product test (like TestAvroSchemaEvolution).

@phd3
Member

phd3 commented Jun 9, 2020

@laurachenyu Thanks for fixing this issue!

It may be easier to add the suggested test case here, alongside the other dereference-related schema mismatch tests: https://github.com/prestosql/presto/blob/master/presto-hive/src/test/java/io/prestosql/plugin/hive/TestHiveIntegrationSmokeTest.java#L4230

@laurachenyu
Contributor Author

Thanks, @phd3.

@cla-bot

cla-bot bot commented Jun 10, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Laura Chen.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

laurachenyu force-pushed the my_branch branch 2 times, most recently from cb3ed5b to 4d7380b, June 11, 2020 18:04
Member

phd3 left a comment


LGTM, apart from one remark about the code comment.

The failure is unrelated; tests have been retriggered.

phd3 requested a review from martint June 11, 2020 18:59
@phd3
Member

phd3 commented Jun 11, 2020

@cla-bot check

cla-bot added the cla-signed label Jun 11, 2020
@cla-bot

cla-bot bot commented Jun 11, 2020

The cla-bot has been summoned, and re-checked this pull request!

Member

phd3 left a comment


It seems we need a similar fix in HiveCoercionRecordCursor too. This RecordCursor does not create the coercers correctly: it only considers top-level types. The issue is visible with the TEXTFILE format when s3_select_pushdown_enabled is set to true.

The issue is less likely to appear because (1) such a schema evolution is probably rare, (2) HiveCoercionRecordCursor is not used with GenericHiveRecordCursor, which performs the coercion for the top-level row itself, and (3) the CSV format is not affected because it only deals with VARCHAR.
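The comment above notes that HiveCoercionRecordCursor builds coercers only for top-level types. Below is a hedged sketch of building them recursively instead, so a nested field whose type is unchanged gets an identity coercer and fields added by schema evolution are filled with null. `SketchType`, `Primitive`, `Row`, and `NestedCoercion` are hypothetical names, and rows are modeled as `List<Object>` values, which is not how the connector represents them. It assumes Java 16 or later (records and pattern matching for `instanceof`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical type model: a primitive type name, or a row of field types.
interface SketchType {}
record Primitive(String name) implements SketchType {}
record Row(List<SketchType> fields) implements SketchType {}

final class NestedCoercion {
    private NestedCoercion() {}

    // Builds a coercer by walking nested row types instead of only the top level.
    static UnaryOperator<Object> coercer(SketchType from, SketchType to) {
        if (from.equals(to)) {
            // Same type at this level (records compare structurally): pass through.
            return UnaryOperator.identity();
        }
        if (from instanceof Row fromRow && to instanceof Row toRow) {
            int targetSize = toRow.fields().size();
            int common = Math.min(fromRow.fields().size(), targetSize);
            List<UnaryOperator<Object>> fieldCoercers = new ArrayList<>();
            for (int i = 0; i < common; i++) {
                fieldCoercers.add(coercer(fromRow.fields().get(i), toRow.fields().get(i)));
            }
            return value -> {
                if (value == null) {
                    return null;
                }
                List<?> in = (List<?>) value;
                List<Object> out = new ArrayList<>(targetSize);
                for (int i = 0; i < targetSize; i++) {
                    // Fields added by schema evolution have no source value: fill with null.
                    out.add(i < common ? fieldCoercers.get(i).apply(in.get(i)) : null);
                }
                return out;
            };
        }
        throw new IllegalArgumentException("Unsupported coercion from " + from + " to " + to);
    }
}
```

In this model, coercing a value from the old partition schema row(b bigint, c varchar) to the table schema row(b bigint, c varchar, f int) leaves b and c untouched and yields null for f.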

trinodb deleted three comments from cla-bot Jul 31, 2020
ebyhr added the bug (Something isn't working) label Jul 31, 2020
Member

ebyhr left a comment


Could you shorten the commit message to something like the following?

Avoid type coercion when a dereferenced field is same type

@phd3 Let's track that in a separate issue: this PR is valuable on its own, and our environment is actually hitting this type coercion error too.

laurachenyu changed the title from "Avoid type coercion when retrieving a dereferenced field without type…" to "Avoid type coercion when a dereferenced field is same type" Aug 3, 2020
@ebyhr
Member

ebyhr commented Aug 5, 2020

@laurachenyu My comment above referred to commit e617e50, not the PR title. Could you amend the commit message and force-push?

@laurachenyu
Contributor Author

> @laurachenyu My above comment meant a commit e617e50 (not PR title). Could you amend the commit message and do force-push?

@ebyhr Thanks, I just pushed with the updated commit message.

ebyhr merged commit 039d3a6 into trinodb:master Aug 6, 2020
@ebyhr
Member

ebyhr commented Aug 6, 2020

Merged, thanks!

ebyhr mentioned this pull request Aug 6, 2020
ebyhr added this to the 340 milestone May 13, 2021
Labels: bug (Something isn't working), cla-signed