bug: COPY INTO GCS location seems to duplicate path #16304

Closed
1 of 2 tasks
rad-pat opened this issue Aug 21, 2024 · 6 comments
Labels
C-bug Category: something isn't working


rad-pat commented Aug 21, 2024

Search before asking

  • I have searched the issues and found no similar issues.

Version

v1.2.618-nightly

What's Wrong?

When issuing a COPY INTO command for GCS, the resulting path in GCS is duplicated.

How to Reproduce?

CREATE table t1 (c1 int null);
INSERT INTO t1 values (1), (2), (3);

COPY INTO 'gcs://bucket/tables/t1'
CONNECTION = (
	CREDENTIAL = '<snip>'
)
FROM default.t1
FILE_FORMAT = (TYPE = PARQUET);

Look in GCS and see that the path is bucket/tables/t1/tables/t1

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@rad-pat rad-pat added the C-bug Category: something isn't working label Aug 21, 2024
@rad-pat
Author

rad-pat commented Aug 22, 2024

So it seems that including a trailing slash at the end of the path makes it behave correctly. I can include the slash, but since the command always exports one or more Parquet files to the location, should the location not always be treated as a path? Or at least, should /tables/ not be the path and t1 the file (or file prefix, for the one or many files)?

Works correctly:

CREATE table t1 (c1 int null);
INSERT INTO t1 values (1), (2), (3);

COPY INTO 'gcs://bucket/tables/t1/'
CONNECTION = (
	CREDENTIAL = '<snip>'
)
FROM default.t1
FILE_FORMAT = (TYPE = PARQUET);
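
For scripts that build the COPY INTO statement on an affected version, the trailing-slash workaround above can be applied on the client side before the SQL is generated. A minimal sketch in Python, assuming a hypothetical helper named `ensure_trailing_slash` (it is not part of Databend or of any driver):

```python
def ensure_trailing_slash(location: str) -> str:
    """Append a trailing slash if the location does not already end with one.

    Hypothetical client-side helper for illustration; not part of Databend.
    """
    return location if location.endswith("/") else location + "/"


# Normalize the location before it is interpolated into COPY INTO '<location>'.
location = ensure_trailing_slash("gcs://bucket/tables/t1")
print(location)
```

A location that already ends with a slash is returned unchanged, so the helper is safe to apply unconditionally.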

@youngsofun
Member

@rad-pat thank you. It is a bug.

@rad-pat
Author

rad-pat commented Aug 30, 2024

@youngsofun, I presume this is fixed now with #16321?

Was this affecting internal storage if GCS is used, or would that have remained unaffected?

@youngsofun
Member

It should be fixed now, please give it a try.

@rad-pat
Author

rad-pat commented Aug 30, 2024

Yes, it seems fixed for COPY INTO, thanks. I just wondered whether there was any effect on the Parquet files stored by the system while this bug was happening?

@youngsofun
Member

youngsofun commented Aug 31, 2024

The behavior of the bug is as follows:

If your location string does not end with a /, copying into bucket/<path> results in bucket/<path>/<path>/<file_name_containing_uuid> instead of bucket/<path>/<file_name_containing_uuid>.

While it’s unfortunate to make this mistake, I don’t think it’s a major issue in practice, especially if you are only using it for unloading. The additional <path>/ can be considered part of the randomly generated path created by Databend.
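
To make the two layouts concrete, here is a small Python sketch that models only the behavior described in this comment. It is not Databend's code: the function names `expected_key` and `buggy_key` are hypothetical, and the file name is an illustrative stand-in for the UUID-based name that Databend generates.

```python
import uuid


def expected_key(path: str, file_name: str) -> str:
    """Object key the user expects, relative to the bucket: <path>/<file_name>."""
    return f"{path.strip('/')}/{file_name}"


def buggy_key(path: str, file_name: str) -> str:
    """Object key as described for affected versions.

    Only a location without a trailing slash gets <path> repeated.
    """
    clean = path.strip("/")
    if path.endswith("/"):
        return f"{clean}/{file_name}"
    return f"{clean}/{clean}/{file_name}"


# Illustrative file name only; the real naming scheme is not shown in this thread.
file_name = f"{uuid.uuid4()}.parquet"
print(buggy_key("tables/t1", file_name))     # <path> appears twice
print(buggy_key("tables/t1/", file_name))    # same layout as expected_key
print(expected_key("tables/t1", file_name))
```

The unloaded files are still written once and remain readable; the only difference in the model is the extra <path>/ segment in the object key.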

@rad-pat rad-pat closed this as completed Oct 9, 2024