bug: COPY INTO CPU load takes a long time to rise #8574

Closed
1 of 2 tasks
Tracked by #7823
yufan022 opened this issue Nov 1, 2022 · 7 comments · Fixed by #8586
Labels
C-bug Category: something isn't working

Comments

@yufan022
Contributor

yufan022 commented Nov 1, 2022

Search before asking

  • I had searched in the issues and found no similar issues.

Version

v0.8.89-nightly

What's Wrong?

COPY INTO import FROM 's3://xx/xx/100000-100/' credentials=(aws_key_id='xx' aws_secret_key='xx') pattern ='.*[.]gz' file_format = (type = 'tsv' compression = GZIP) force=true;
Query OK, 0 rows affected (2 min 22.85 sec)

The S3 path /100000-100/* contains only 100 files.

At the beginning of the command execution, I observed a very low CPU load:
image

It took about 60 seconds for the CPU load to rise:
image

I checked the INFO logs during the period of low CPU load:
image
There were a lot of entries like this:

2022-11-01T06:47:01.463339Z  INFO opendal::services::s3::backend: backend build started: Builder
2022-11-01T06:47:01.463371Z  INFO opendal::services::s3::backend: backend use root
2022-11-01T06:47:02.067443Z  INFO opendal::services::s3::backend: backend build finished: Builder
2022-11-01T06:47:01.463339Z  INFO opendal::services::s3::backend: backend build started: Builder
2022-11-01T06:47:01.463371Z  INFO opendal::services::s3::backend: backend use root
2022-11-01T06:47:02.067443Z  INFO opendal::services::s3::backend: backend build finished: Builder
2022-11-01T06:47:01.463339Z  INFO opendal::services::s3::backend: backend build started: Builder
2022-11-01T06:47:01.463371Z  INFO opendal::services::s3::backend: backend use root
2022-11-01T06:47:02.067443Z  INFO opendal::services::s3::backend: backend build finished: Builder
...
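
As a rough cross-check of the timings in the log excerpt above: the "build started" and "build finished" timestamps are about 0.6 seconds apart. If one backend were built per file, one after another (an assumption, the issue does not confirm this), 100 files would add up to roughly 60 seconds, which is close to the low-CPU period observed. A minimal Rust sketch of that arithmetic follows; the helper names are made up for illustration.

```rust
/// Parse the time-of-day part of a timestamp such as
/// "2022-11-01T06:47:01.463339Z" into microseconds since midnight.
fn micros_since_midnight(ts: &str) -> Option<u64> {
    // Keep only the "06:47:01.463339" part.
    let time = ts.split('T').nth(1)?.trim_end_matches('Z');
    let mut parts = time.split(':');
    let hours: u64 = parts.next()?.parse().ok()?;
    let minutes: u64 = parts.next()?.parse().ok()?;
    let (whole, frac) = parts.next()?.split_once('.')?;
    let whole: u64 = whole.parse().ok()?;
    // Assumes exactly six fractional digits, as in the log excerpt.
    if frac.len() != 6 {
        return None;
    }
    let micros: u64 = frac.parse().ok()?;
    Some(((hours * 60 + minutes) * 60 + whole) * 1_000_000 + micros)
}

/// Microseconds between a "build started" and a "build finished" line.
/// Returns None if either timestamp is malformed or the order is reversed.
fn build_micros(started: &str, finished: &str) -> Option<u64> {
    micros_since_midnight(finished)?.checked_sub(micros_since_midnight(started)?)
}
```

With the timestamps from the log, `build_micros` gives 604104 microseconds, so 100 sequential builds would take about 60.4 seconds.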

How to Reproduce?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@yufan022 yufan022 added the C-bug Category: something isn't working label Nov 1, 2022
@BohuTANG
Member

BohuTANG commented Nov 1, 2022

Ping @Xuanwo for first-glance troubleshooting. Are the many opendal::services::s3::backend logs by design?

@Xuanwo
Member

Xuanwo commented Nov 1, 2022

Ping @Xuanwo for first-glance troubleshooting. Are the many opendal::services::s3::backend logs by design?

Yes.

I'm guessing there are plenty of read logs (which are at debug level) here. However, it's still worth reusing the same operator to reduce the cost a bit.
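
To illustrate what reusing the same operator could look like, here is a minimal sketch in Rust. It is not Databend or opendal code: `Operator` and `OperatorCache` are hypothetical stand-ins, and the real build step (the one logged as "backend build started" / "backend build finished") is replaced by a counter.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a storage operator; not the opendal type.
#[derive(Debug)]
struct Operator {
    root: String,
}

/// Caches one operator per root so repeated lookups skip the build step.
#[derive(Default)]
struct OperatorCache {
    operators: HashMap<String, Arc<Operator>>,
    /// Number of times the (simulated) expensive build actually ran.
    builds: usize,
}

impl OperatorCache {
    fn get_or_build(&mut self, root: &str) -> Arc<Operator> {
        // Fast path: hand out a shared handle to the existing operator.
        if let Some(op) = self.operators.get(root) {
            return Arc::clone(op);
        }
        // Slow path: stand-in for the expensive backend build.
        self.builds += 1;
        let op = Arc::new(Operator { root: root.to_string() });
        self.operators.insert(root.to_string(), Arc::clone(&op));
        op
    }
}
```

In this sketch, asking for the same root 100 times runs the build once and returns the shared handle for the other 99 lookups.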

@yufan022
Contributor Author

yufan022 commented Nov 1, 2022

Ping @Xuanwo for first-glance troubleshooting. Are the many opendal::services::s3::backend logs by design?

Yes.

I'm guessing there are plenty of read logs (which are at debug level) here. However, it's still worth reusing the same operator to reduce the cost a bit.

Is it concurrent? These log entries keep appearing for about one minute, the process is not fast, and there is not much network traffic during this period.

image

@Xuanwo
Member

Xuanwo commented Nov 1, 2022

Is it concurrent? These log entries keep appearing for about one minute, the process is not fast, and there is not much network traffic during this period.

Most of the time is spent parsing CSV content. We will improve it in #8486.

@BohuTANG

This comment was marked as off-topic.

@Xuanwo
Member

Xuanwo commented Nov 1, 2022

I found the root cause of this issue.

https://github.com/datafuselabs/databend/blob/2dfd578d234fe4ea1f65c432b60bca37ec8b4095/src/query/service/src/interpreters/interpreter_copy_v2.rs#L101-L181

This call requires a stat on every file, which is not needed. I will improve this.
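
A minimal sketch of the difference, in Rust, assuming the cost comes from one stat round trip per file before any reading starts. This is not the code behind the link above and not necessarily how the fix works: `Storage`, `FileMeta`, `plan_eager` and `plan_lazy` are hypothetical names, and the stat call is simulated with a counter.

```rust
use std::cell::Cell;

/// Hypothetical file metadata; only what this sketch needs.
struct FileMeta {
    size: u64,
}

/// Hypothetical storage that counts how many stat calls it serves.
struct Storage {
    stat_calls: Cell<usize>,
}

impl Storage {
    fn stat(&self, path: &str) -> FileMeta {
        self.stat_calls.set(self.stat_calls.get() + 1);
        // Fake size; a real backend would ask the object store here.
        FileMeta { size: path.len() as u64 }
    }
}

/// Eager planning: one stat per file before any reading starts.
fn plan_eager(storage: &Storage, paths: &[String]) -> Vec<(String, FileMeta)> {
    paths.iter().map(|p| (p.clone(), storage.stat(p))).collect()
}

/// Lazy planning: only record the paths; nothing is stat-ed here.
fn plan_lazy(paths: &[String]) -> Vec<String> {
    paths.to_vec()
}
```

For 100 paths, the eager plan issues 100 stat calls up front, while the lazy plan issues none and leaves any metadata lookup to the point where it is actually needed.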

@BohuTANG
Member

BohuTANG commented Nov 1, 2022

I will try to fix this in #8586, cc @Xuanwo
