
GC is failing with external Redis #18900

Closed

pavel-z1 opened this issue Jul 6, 2023 · 10 comments
pavel-z1 commented Jul 6, 2023

I have the same issue with Harbor v2.8.2 as described in #14922:
[ERROR] [/jobservice/job/impl/gc/garbage_collection.go:434]: failed to clean registry cache error retrieving 'blobs::*' keys: dial tcp: lookup redis on 127.0.0.11:53: no such host, pattern blobs::*

We use an external Redis Sentinel instance.
I've changed the GC schedule, but this did not resolve the error.
Is there a solution to this problem?

@pavel-z1 pavel-z1 changed the title GC is failing with eternal Redis GC is failing with external Redis Jul 6, 2023
MinerYang (Contributor) commented:

Could you please share the external Redis config (harbor.yml) as well as more of the jobservice log?

pavel-z1 (Author) commented Jul 6, 2023

Redis config from harbor.yml:

external_redis:
  host: 192.168.0.125:26379,192.168.0.167:26379,192.168.1.14:26379
  sentinel_master_set: mymaster
  password: **************
  registry_db_index: 1
  jobservice_db_index: 2
  trivy_db_index: 5
  idle_timeout_seconds: 30

Full GC job log:

2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:143]: Garbage Collection parameters: [delete_untagged: true, dry_run: false, time_window: 2]
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:153]: start to run gc in job.
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:467]: start to delete untagged artifact (no actually deletion for dry-run mode)
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:491]: end to delete untagged artifact (no actually deletion for dry-run mode)
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:506]: artifact trash candidates.
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-400 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-sampleproject/sampleservice Digest-sha256:56a8459484d5d10a7f4c3446f2744a9a88f084cd852573712202420f9b4a64c9 CreationTime-2023-03-02 00:00:01
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-363 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-frontend_android/android-30-nodejs-12 Digest-sha256:f20ebf9c298f3ed4e6cab4561d698bfafa3e7aef73b8c15fe227dcab3cdecef9 CreationTime-2023-01-18 16:47:46
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-364 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-frontend_android/node Digest-sha256:f20ebf9c298f3ed4e6cab4561d698bfafa3e7aef73b8c15fe227dcab3cdecef9 CreationTime-2023-01-18 16:47:46
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-612 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-uc-panel/graphql_uc_back Digest-sha256:935f486253b984abf114befd20060951516478d4d789283f657a1ecd5c69bc49 CreationTime-2023-06-25 00:00:03
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-613 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-uc-panel/graphql_uc_back Digest-sha256:02efd2b8b5b013a0b31bee2f161d59c69727c421738c6bca7618fcd2e38f3270 CreationTime-2023-06-25 00:00:03
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-372 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-frontend_android_a/android-30-nodejs-12 Digest-sha256:f20ebf9c298f3ed4e6cab4561d698bfafa3e7aef73b8c15fe227dcab3cdecef9 CreationTime-2023-01-19 08:50:07
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-373 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-frontend_android_n/node Digest-sha256:f20ebf9c298f3ed4e6cab4561d698bfafa3e7aef73b8c15fe227dcab3cdecef9 CreationTime-2023-01-19 08:50:20
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-399 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-sampleproject/sampleservice Digest-sha256:29980ccb66261754331d733346a7d48c72693f146a06eb4ed919089a1a19661e CreationTime-2023-03-02 00:00:01
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:508]: ID-95 MediaType-application/vnd.docker.container.image.v1+json ManifestMediaType-application/vnd.docker.distribution.manifest.v2+json RepositoryName-user-file-processor/user-file-processor Digest-sha256:a12ecbbcc81dd8bc0d11308e4288b359b02062ffc73de6cc70794f0f83ac80ce CreationTime-2022-02-21 15:58:44
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:214]: no need to execute GC as there is no non referenced artifacts.
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:403]: 0 blobs and 0 manifests are actually deleted
2023-07-06T07:15:15Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:404]: The GC job actual frees up 0 MB space.
2023-07-06T07:15:15Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:434]: failed to clean registry cache error retrieving 'blobs::*' keys: dial tcp: lookup redis on 127.0.0.11:53: no such host, pattern blobs::*
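
For what it's worth, the failing lookup in that last line can be reproduced from inside the jobservice container (the container name harbor-jobservice and the availability of nslookup in the image are assumptions):

docker exec -it harbor-jobservice nslookup redis
# expected to fail against Docker's embedded DNS, matching the GC error:
#   Server:  127.0.0.11
#   ** server can't find redis: NXDOMAIN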

@zyyw zyyw added the area/gc label Jul 7, 2023
zyyw (Contributor) commented Jul 7, 2023

Hi @pavel-z1, thanks for reporting this issue. Three things:

  1. Please make sure there is no network connectivity issue between your jobservice container and the external Redis endpoints stated here:
     external_redis:
       host: 192.168.0.125:26379,192.168.0.167:26379,192.168.1.14:26379
  2. Please check the stored GC schedule by running the following query against the Harbor database and share the result (a sample invocation is shown after this list):
     select * from schedule where vendor_type='GARBAGE_COLLECTION';
  3. If you don't mind, please share the config.yml file of jobservice, located at common/config/jobservice/config.yml.
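
A sample invocation for step 2, assuming the default compose deployment where the database container is named harbor-db and the core database is called registry:

# run the schedule query inside the Harbor database container
docker exec -it harbor-db psql -U postgres -d registry \
  -c "select * from schedule where vendor_type='GARBAGE_COLLECTION';"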

@zyyw zyyw self-assigned this Jul 7, 2023
pavel-z1 (Author) commented Jul 8, 2023

Hi @zyyw

SQL query result:

# select * from schedule where vendor_type='GARBAGE_COLLECTION';
-[ RECORD 1 ]-------+------------------------------------------------------------
id                  | 15
creation_time       | 2023-07-06 07:15:14.033293
update_time         | 2023-07-06 07:15:14.033293
vendor_type         | GARBAGE_COLLECTION
vendor_id           | -1
cron                | 0 0 0 * * *
callback_func_name  | GARBAGE_COLLECTION
callback_func_param | {"trigger":null,"deleteuntagged":true,"dryrun":false,"extra_attrs":{"delete_untagged":true,"dry_run":false,"redis_url_reg":"redis+sentinel://:***********@192.168.0.125:26379,192.168.0.167:26379,192.168.1.14:26379/mymaster/1?idle_timeout_seconds=30","time_window":2}}
cron_type           | Daily
extra_attrs         | {"delete_untagged":true}
revision            | 1688774400
(1 row)
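
For reference, the redis_url_reg value stored above follows Harbor's Sentinel URL scheme, which as far as I can tell has the shape:

redis+sentinel://[arbitrary_username:password@]sentinel_host1:port1,sentinel_host2:port2/master_set/db_index?idle_timeout_seconds=N

so the stored schedule looks consistent with the external_redis section of harbor.yml.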

Jobservice config.yml

# cat common/config/jobservice/config.yml
---
#Protocol used to serve
protocol: "http"

#Server listening port
port: 8080

#Worker pool
worker_pool:
  #Worker concurrency
  workers: 10
  backend: "redis"
  #Additional config if use 'redis' backend
  redis_pool:
    #redis://[arbitrary_username:password@]ipaddress:port/database_index
    redis_url: redis+sentinel://:********@192.168.0.125:26379,192.168.0.167:26379,192.168.1.14:26379/mymaster/2?idle_timeout_seconds=30
    namespace: "harbor_job_service_namespace"
    idle_timeout_second: 3600
#Loggers for the running job
job_loggers:
  - name: "STD_OUTPUT" # logger backend name, only support "FILE" and "STD_OUTPUT"
    level: "DEBUG" # INFO/DEBUG/WARNING/ERROR/FATAL
  - name: "FILE"
    level: "DEBUG"
    settings: # Customized settings of logger
      base_dir: "/var/log/jobs"
    sweeper:
      duration: 1 #days
      settings: # Customized settings of sweeper
        work_dir: "/var/log/jobs"

#Loggers for the job service
loggers:
  - name: "STD_OUTPUT" # Same with above
    level: "DEBUG"


reaper:
  # the max time to wait for a task to finish, if unfinished after max_update_hours, the task will be mark as error, but the task will continue to run, default value is 24,
  max_update_hours: 24
  # the max time for execution in running state without new task created
  max_dangling_hours: 168

# the max size of job log returned by API, default is 10M
max_retrieve_size_mb: 10

I have no network issues.
Thank you for your help.
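
One way to sanity-check the Sentinel side (assuming redis-cli is available on a host with network access to the endpoints; add -a <password> if the sentinels require auth):

# ask one of the sentinels for the current master of "mymaster"
redis-cli -h 192.168.0.125 -p 26379 sentinel get-master-addr-by-name mymaster
# expected output: the current master's IP and port, e.g.
# 1) "192.168.0.x"
# 2) "6379"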

zyyw (Contributor) commented Jul 10, 2023

Could you please try to ping one of the Redis endpoints (192.168.0.125:26379, 192.168.0.167:26379, 192.168.1.14:26379) from the jobservice container?
According to the log, it seems there is a connectivity issue between your jobservice container and the external Redis:

dial tcp: lookup redis on 127.0.0.11:53: no such host
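
Note that ICMP ping does not exercise the TCP port, so a port-level check from inside the container may be more telling (assuming nc is available in the image):

# check TCP reachability of one sentinel endpoint
nc -zv 192.168.0.125 26379
# a "succeeded" / "open" result confirms the sentinel port is reachable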

pavel-z1 (Author) commented:
Ping from the Docker container works fine for all Redis instances.
Example for one of them:

ping 192.168.0.125
PING 192.168.0.125 (192.168.0.125) 56(84) bytes of data.
64 bytes from 192.168.0.125: icmp_seq=1 ttl=58 time=4 ms
64 bytes from 192.168.0.125: icmp_seq=2 ttl=58 time=3 ms
64 bytes from 192.168.0.125: icmp_seq=3 ttl=58 time=4 ms

--- 192.168.0.125 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss

Our initial installation of Harbor used the local Redis container.
After that we changed the configuration and re-ran "./install.sh --with-trivy"; the Redis service moved to external hosts.
If I understand Docker correctly, the error dial tcp: lookup redis on 127.0.0.11:53: no such host indicates that the container tries to resolve the short name redis via the DNS service of the custom Harbor Docker bridge network.

It can't, because there is no container named redis, so the DNS request fails.
How can Harbor be configured so that it does not try to resolve Redis by the short name redis?
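
That reading matches how user-defined Docker networks behave: 127.0.0.11 is Docker's embedded DNS resolver, and it only answers for containers (and network aliases) attached to that network. This can be confirmed from inside the container (the container name harbor-jobservice and the availability of getent are assumptions):

docker exec harbor-jobservice cat /etc/resolv.conf   # should show: nameserver 127.0.0.11
docker exec harbor-jobservice getent hosts redis     # returns nothing: no container named "redis"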

pavel-z1 (Author) commented:

Is there any workaround for this problem?
GC hangs on this error, after which it is impossible to perform any retention operation.

github-actions bot commented:

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Sep 23, 2023
@MinerYang MinerYang removed the Stale label Sep 26, 2023
github-actions bot commented:

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Nov 25, 2023
github-actions bot commented:

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 25, 2023
@github-project-automation github-project-automation bot moved this from Issues to Completed in GC Improvement Activities Dec 25, 2023