Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement Image Rescanning using Harbor Webhook API #3116

Open
wants to merge 105 commits into
base: topic/11-13-feat_implement_management_api_for_controlling_harbor_per-project_quota
Choose a base branch
from

Conversation

jopemachine
Copy link
Member

@jopemachine jopemachine commented Nov 19, 2024

Resolves #1913 (BA-143).

Important

Currently, the Image Rescanning through Webhook API implemented in this PR is only available for the HarborV2 registry.

Details

  • Since the Harbor Webhook's request header cannot be customized, it was implemented in a way that bypasses the auth_middleware. It would be great to generalize this approach or find a better solution in the following works.

Implementation details

Events to Monitor

We will need to pay attention to the following events to ensure the image table is updated automatically.

Events to ignore

We won't need to care about the following events in this PR.

  • PULL_ARTIFACT
  • UPLOAD_CHART
  • DOWNLOAD_CHART
  • DELETE_CHART
  • SCANNING_COMPLETED
  • SCANNING_FAILED
  • QUOTA_EXCEED
  • QUOTA_WARNING
  • REPLICATION

How to setup webhook

  1. Configure registry webhook referring to Harbor webhook API documentation.

webhooks2

  1. Add auth header configured in step 1 as webhook_auth_header to the ContainerRegistry row's extra column. Event handler is executed only when the webhook_auth_header equals the auth header value set in the webhook policy. If the auth header in the webhook policy is empty, the event handler will still be executed even if the webhook_auth_header is not set.

Tip

We don't need to subscribe to events for which event handlers are not implemented.


Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue

📚 Documentation preview 📚: https://sorna--3116.org.readthedocs.build/en/3116/


📚 Documentation preview 📚: https://sorna-ko--3116.org.readthedocs.build/ko/3116/

Copy link
Member Author

jopemachine commented Nov 19, 2024

@jopemachine jopemachine added this to the 24.12 milestone Nov 19, 2024
@jopemachine jopemachine added the type:feature Add new features label Nov 19, 2024
@jopemachine jopemachine force-pushed the topic/11-19-feat_implement_image_rescan_based_on_harbor_webhook branch 2 times, most recently from bb8a301 to 8b4b017 Compare November 19, 2024 06:17
@jopemachine jopemachine changed the title feat: Implement image rescan based on Harbor webhook feat: Implement image rescan using Harbor webhook API Nov 19, 2024
@jopemachine jopemachine changed the title feat: Implement image rescan using Harbor webhook API feat: Implement image rescan using Harbor Webhook API Nov 19, 2024
@jopemachine jopemachine linked an issue Nov 19, 2024 that may be closed by this pull request
@jopemachine
Copy link
Member Author

I tried to implement logic to remove the ImageRow upon receiving a DELETE_ARTIFACT event, but the webhook event does not include the architecture information of the image.

In the case of the PUSH_ARTIFACT event, the issue doesn't matter because the architecture information can be scanned via the Harbor artifact API before adding the ImageRow.

However, for DELETE_ARTIFACT, the event occurs after the image has already been deleted, so there is no way to retrieve the architecture information.

@jopemachine jopemachine changed the title feat: Implement image rescan using Harbor Webhook API feat: Implement image rescanning using Harbor Webhook API Nov 20, 2024
@jopemachine jopemachine changed the title feat: Implement image rescanning using Harbor Webhook API feat: Implement Image Rescanning using Harbor Webhook API Nov 20, 2024
@jopemachine jopemachine force-pushed the topic/11-19-feat_implement_image_rescan_based_on_harbor_webhook branch from 5c30e46 to 6e92403 Compare November 20, 2024 05:34
@jopemachine jopemachine force-pushed the topic/11-13-feat_implement_management_api_for_controlling_harbor_per-project_quota branch from 573457b to 2d4af78 Compare November 20, 2024 06:45
@jopemachine jopemachine force-pushed the topic/11-19-feat_implement_image_rescan_based_on_harbor_webhook branch from 6e92403 to b7773cc Compare November 20, 2024 06:45

query = sa.select(ContainerRegistryRow).where(
(ContainerRegistryRow.type == ContainerRegistryType.HARBOR2)
& (ContainerRegistryRow.url.like(f"%{registry_url}%"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal to check using registry_name, but the Harbor webhook request does not include registry_name information.

@jopemachine jopemachine marked this pull request as ready for review November 20, 2024 07:52
@@ -428,6 +428,16 @@ async def auth_middleware(request: web.Request, handler) -> web.StreamResponse:
Fetches user information and sets up keypair, user, and is_authorized
attributes.
"""
allow_list = request.app["auth_middleware_allowlist"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a lot of other code accessing the RootContext field stored in request.app[“_root.context”], is there a reason why you stored it directly in the app here?
I'd like to know what separates the cases that access app directly from those that access RootContext. @achimnol

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RootContext contains mutable configurations such as local_config and shared_config, as well as many objects that are frequently accessed throughout the codebase.

So I thought it might be better to inject hardcoded values into the app instead.
(But this is just my personal opinion.)

Copy link

@HyeockJinKim HyeockJinKim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be fine if the API was only used for harbor webhooks.

@jopemachine
Copy link
Member Author

This would be fine if the API was only used for harbor webhooks.

The API currently supports only Harbor webhooks, but it could be expanded to accommodate other container registries's webhook in the future.

@jopemachine jopemachine force-pushed the topic/11-13-feat_implement_management_api_for_controlling_harbor_per-project_quota branch from b4b1ee6 to 628d41a Compare December 10, 2024 01:48
@jopemachine jopemachine force-pushed the topic/11-19-feat_implement_image_rescan_based_on_harbor_webhook branch from d0186e5 to f4a60c4 Compare December 10, 2024 01:48
@jopemachine jopemachine force-pushed the topic/11-13-feat_implement_management_api_for_controlling_harbor_per-project_quota branch from 628d41a to e6dfa69 Compare December 13, 2024 01:05
…ng_harbor_per-project_quota' into topic/11-19-feat_implement_image_rescan_based_on_harbor_webhook
Copy link

@HyeockJinKim HyeockJinKim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:docs Documentations comp:manager Related to Manager component size:M 30~100 LoC type:feature Add new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revamp image scanning process
2 participants