
Redis memory skyrockets after configuring webhooks for all projects #14541

Closed
ruster-cn opened this issue Mar 30, 2021 · 8 comments

@ruster-cn

After I configured webhooks for all projects, Redis memory started rising.

At the same time, the jobservice log keeps reporting errors:

Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/jobservice/worker/cworker/c_worker.go:76]: Job incoming: {"name":"WEBHOOK","id":"852bf961b5e170e5539806f0","t":1617076864,"args":null,"fails":10,"err":"open /var/log/jobs/852bf961b5e170e5539806f0.log: no space left on device","failed_at":1617066530}
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/jobservice/runner/redis.go:175]: Retrying job WEBHOOK:852bf961b5e170e5539806f0, revision: 1617076864
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/common/config/store/driver/rest.go:31]: get configuration from url: http://core:8080/api/internal/configurations
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/common/config/store/driver/rest.go:31]: get configuration from url: http://core:8080/api/internal/configurations
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [ERROR] [/jobservice/runner/redis.go:111]: Job 'WEBHOOK:852bf961b5e170e5539806f0' exit with error: open /var/log/jobs/852bf961b5e170e5539806f0.log: no space left on device

After looking at the relevant code, I suspect that jobservice generates a separate log file for every task. The huge number of small files exhausted the inodes, which caused webhook sending to fail. Each failed webhook is then put back into the Redis queue to wait for a retry (50 retries by default).
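To confirm the inode hypothesis, compare free inodes against free blocks on the volume that holds /var/log/jobs (the same information df -i shows). A minimal Go sketch (Linux only; the path is taken from the error message above):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Statfs reports both block and inode usage for the filesystem
	// backing the given path.
	var st syscall.Statfs_t
	if err := syscall.Statfs("/var/log/jobs", &st); err != nil {
		panic(err)
	}
	fmt.Printf("inodes: %d total, %d free\n", st.Files, st.Ffree)
	fmt.Printf("blocks: %d total, %d free\n", st.Blocks, st.Bfree)
}

If Ffree is at or near zero while plenty of blocks remain, the "no space left on device" errors come from inode exhaustion rather than disk space.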

@reasonerjt
Contributor

@ruster-cn
How many projects are there in your Harbor instance?

@ruster-cn
Author

@ruster-cn
How many projects are there in your Harbor instance?

2000+ projects

@steven-zou
Contributor

@ruster-cn

Are all kinds of webhooks enabled?

Could you please connect to Redis and check how many jobs are queued there?

Connect to Redis and check the data under the key {harbor_job_service_namespace}:jobs:WEBHOOK.
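If it helps, here is a minimal Go sketch using the go-redis client to check the queue length (the address and DB index are assumptions; point them at the Redis endpoint and DB index your jobservice uses). jobservice's work queue is built on gocraft/work, which keeps pending jobs in a Redis list, so LLEN on the key above should return the number of queued WEBHOOK jobs:

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	// Adjust Addr/DB to match your deployment.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379", DB: 0})

	// Pending jobs live in a list keyed <namespace>:jobs:<job name>.
	n, err := rdb.LLen(ctx, "{harbor_job_service_namespace}:jobs:WEBHOOK").Result()
	if err != nil {
		panic(err)
	}
	fmt.Printf("queued WEBHOOK jobs: %d\n", n)
}

The same check from the command line: redis-cli LLEN "{harbor_job_service_namespace}:jobs:WEBHOOK"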

@ruster-cn
Author

@steven-zou
Yes, we turned on all kinds of webhooks. Sorry, there is no way to check the data now. When the problem occurred, we turned off all webhooks and Redis memory began to decline slowly.

@ruster-cn
Author

ruster-cn commented Apr 8, 2021

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.
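For illustration only (not Harbor's actual logger API), a minimal sketch of the idea: one shared log file, with every line prefixed by the job ID so per-job logs can still be filtered out with grep:

package main

import (
	"fmt"
	"log"
	"os"
)

// jobLogger writes to a single shared file and prefixes every line with the
// job ID, instead of opening one file per job.
func jobLogger(shared *os.File, jobID string) *log.Logger {
	return log.New(shared, fmt.Sprintf("[%s] ", jobID), log.LstdFlags)
}

func main() {
	f, err := os.OpenFile("/var/log/jobs/jobs.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	jobLogger(f, "852bf961b5e170e5539806f0").Println("webhook delivered")
}

As noted, a single shared file would still need its own rotation and cleanup mechanism.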

@steven-zou
Contributor

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.

@ruster-cn

Writing each task's logs into a separate log file makes it easy for the user to check the task log from the API/portal. Actually, jobservice has a log file sweeper; you need to check the duration that triggers the sweeping process. Maybe your duration is a little too long?
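For reference, the per-job FILE logger and its sweeper are configured in jobservice's config.yml. A sketch of what that section typically looks like (field names and defaults may vary between Harbor versions, so check your own config.yml):

job_loggers:
  - name: "FILE"
    level: "INFO"
    settings:
      base_dir: "/var/log/jobs"
    sweeper:
      duration: 1   # days to keep job log files before they are swept
      settings:
        work_dir: "/var/log/jobs"

Lowering the sweeper duration makes old per-job log files get cleaned up sooner.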

@ruster-cn
Author

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.

@ruster-cn

Writing each task's logs into a separate log file makes it easy for the user to check the task log from the API/portal. Actually, jobservice has a log file sweeper; you need to check the duration that triggers the sweeping process. Maybe your duration is a little too long?

Thanks, I'll try to adjust the log retention time.

@steven-zou
Contributor

Closing the issue now; reopen it if you have further comments.
