
Redis memory skyrockets after configuring webhooks for all projects #14541

Closed
ruster-cn opened this issue Mar 30, 2021 · 8 comments

@ruster-cn

After I configured webhooks for all projects, Redis memory started rising.

At the same time, the jobservice log keeps reporting errors:

Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/jobservice/worker/cworker/c_worker.go:76]: Job incoming: {"name":"WEBHOOK","id":"852bf961b5e170e5539806f0","t":1617076864,"args":null,"fails":10,"err":"open /var/log/jobs/852bf961b5e170e5539806f0.log: no space left on device","failed_at":1617066530}
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/jobservice/runner/redis.go:175]: Retrying job WEBHOOK:852bf961b5e170e5539806f0, revision: 1617076864
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/common/config/store/driver/rest.go:31]: get configuration from url: http://core:8080/api/internal/configurations
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [INFO] [/common/config/store/driver/rest.go:31]: get configuration from url: http://core:8080/api/internal/configurations
Mar 30 12:01:04 172.20.0.1 jobservice[41446]: 2021-03-30T04:01:04Z [ERROR] [/jobservice/runner/redis.go:111]: Job 'WEBHOOK:852bf961b5e170e5539806f0' exit with error: open /var/log/jobs/852bf961b5e170e5539806f0.log: no space left on device

After looking at the relevant code, I suspect that jobservice generates a separate log file for every task. The huge number of small files exhausted the inodes, which caused webhook sending to fail. Each failed webhook is then put back into the Redis queue to wait for a retry (50 retries by default).
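To confirm the inode hypothesis, compare free inodes against free blocks on the volume that holds /var/log/jobs (the same information df -i shows). A minimal Go sketch (Linux only; the path is taken from the error message above):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Statfs reports both block and inode usage for the filesystem
	// backing the given path.
	var st syscall.Statfs_t
	if err := syscall.Statfs("/var/log/jobs", &st); err != nil {
		panic(err)
	}
	fmt.Printf("inodes: %d total, %d free\n", st.Files, st.Ffree)
	fmt.Printf("blocks: %d total, %d free\n", st.Blocks, st.Bfree)
}

If Ffree is at or near zero while plenty of blocks remain, the "no space left on device" errors come from inode exhaustion rather than disk space.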

@reasonerjt
Contributor

@ruster-cn
How many projects are there in your Harbor instance?

@ruster-cn
Author

@ruster-cn
How many projects are there in your Harbor instance?

2000+ projects

@steven-zou
Contributor

@ruster-cn

Are all kinds of webhooks enabled?

Could you please connect to Redis and check how many jobs are queued there?

Connect to Redis and check the data under the key {harbor_job_service_namespace}:jobs:WEBHOOK.
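If it helps, here is a minimal Go sketch using the go-redis client to check the queue length (the address and DB index are assumptions; point them at the Redis endpoint and DB index your jobservice uses). jobservice's work queue is built on gocraft/work, which keeps pending jobs in a Redis list, so LLEN on the key above should return the number of queued WEBHOOK jobs:

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	// Adjust Addr/DB to match your deployment.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379", DB: 0})

	// Pending jobs live in a list keyed <namespace>:jobs:<job name>.
	n, err := rdb.LLen(ctx, "{harbor_job_service_namespace}:jobs:WEBHOOK").Result()
	if err != nil {
		panic(err)
	}
	fmt.Printf("queued WEBHOOK jobs: %d\n", n)
}

The same check from the command line: redis-cli LLEN "{harbor_job_service_namespace}:jobs:WEBHOOK"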

@ruster-cn
Author

@steven-zou
Yes, we turned on all kinds of webhooks. Sorry, there is no way to check the data now. When the problem occurred, we turned off all webhooks and Redis memory began to decline slowly.

@ruster-cn
Author

ruster-cn commented Apr 8, 2021

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.
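For illustration only (not Harbor's actual logger API), a minimal sketch of the idea: one shared log file, with every line prefixed by the job ID so per-job logs can still be filtered out with grep:

package main

import (
	"fmt"
	"log"
	"os"
)

// jobLogger writes to a single shared file and prefixes every line with the
// job ID, instead of opening one file per job.
func jobLogger(shared *os.File, jobID string) *log.Logger {
	return log.New(shared, fmt.Sprintf("[%s] ", jobID), log.LstdFlags)
}

func main() {
	f, err := os.OpenFile("/var/log/jobs/jobs.log",
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	jobLogger(f, "852bf961b5e170e5539806f0").Println("webhook delivered")
}

As noted, a single shared file would still need its own rotation and cleanup mechanism.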

@steven-zou
Contributor

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.

@ruster-cn

Writing each task's logs into a separate log file makes it easy for the user to check the task log from the API/portal. Actually, jobservice has a log file sweeper; you need to check the duration that triggers the sweeping process. Maybe your duration is a little too long?
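For reference, the per-job FILE logger and its sweeper are configured in jobservice's config.yml. A sketch of what that section typically looks like (field names and defaults may vary between Harbor versions, so check your own config.yml):

job_loggers:
  - name: "FILE"
    level: "INFO"
    settings:
      base_dir: "/var/log/jobs"
    sweeper:
      duration: 1   # days to keep job log files before they are swept
      settings:
        work_dir: "/var/log/jobs"

Lowering the sweeper duration makes old per-job log files get cleaned up sooner.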

@ruster-cn
Author

@steven-zou
When the number of jobservice tasks grows rapidly, generating so many small files is a hidden danger. I looked at the code and found that each of these log files only records the log of a single task. Could we consider writing all task logs to one file instead? The problem could also be solved by using a log format like [jobid]-[content]; I think the effect would be the same. But if we write to a single file, we need to consider a rotation and cleanup mechanism.

@ruster-cn

Writing each task's logs into a separate log file makes it easy for the user to check the task log from the API/portal. Actually, jobservice has a log file sweeper; you need to check the duration that triggers the sweeping process. Maybe your duration is a little too long?

Thanks, I'll try to adjust the log retention time.

@steven-zou
Contributor

Closing the issue now; reopen it if you have further comments.
