dev: move storage metadata collection to background job #5818
Conversation
Walkthrough

The changes introduced in this pull request involve modifications across multiple endpoints to implement asynchronous retrieval of asset metadata using the `get_asset_object_metadata` background task.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
apiserver/plane/bgtasks/storage_metadata_task.py (1)
1-28: Overall assessment: Good implementation with room for improvement

The new background task for retrieving asset object metadata is well-structured and appropriately uses Celery for asynchronous processing. The error handling and use of `S3Storage` are good practices. However, there are opportunities to enhance error handling, improve debugging capabilities, and add return values for better observability.
The suggested improvements will make the function more robust and easier to debug. Additionally, verifying the usage of this function in other parts of the codebase is crucial to ensure it's being called correctly and used as intended.
apiserver/plane/settings/storage.py (2)
42-46: Approve with suggestions: Dynamic endpoint URL setting

The introduction of dynamic endpoint URL setting based on the request context is a good improvement. However, there are a few points to consider:

1. Type hinting and validation: Add type hinting for the `request` parameter and implement validation to ensure it has the required attributes.
2. Consistency: Apply the same logic to the regular S3 client creation for consistent behavior across different storage backends.
3. Security considerations: Ensure that the use of `request.scheme` and `request.get_host()` is safe in your context. Consider implementing additional validation or using a whitelist of allowed hosts to prevent potential security issues.

Here's a suggested implementation addressing these points:

```python
from typing import Optional

from django.http import HttpRequest


class S3Storage(S3Boto3Storage):
    def __init__(self, request: Optional[HttpRequest] = None):
        # ... (existing code)

        def get_endpoint_url():
            if request and hasattr(request, 'scheme') and hasattr(request, 'get_host'):
                return f"{request.scheme}://{request.get_host()}"
            return self.aws_s3_endpoint_url

        endpoint_url = get_endpoint_url()

        if os.environ.get("USE_MINIO") == "1":
            # Create an S3 client for MinIO
            self.s3_client = boto3.client(
                "s3",
                aws_access_key_id=self.aws_access_key_id,
                aws_secret_access_key=self.aws_secret_access_key,
                region_name=self.aws_region,
                endpoint_url=endpoint_url,
                config=boto3.session.Config(signature_version="s3v4"),
            )
        else:
            # Create an S3 client
            self.s3_client = boto3.client(
                "s3",
                aws_access_key_id=self.aws_access_key_id,
                aws_secret_access_key=self.aws_secret_access_key,
                region_name=self.aws_region,
                endpoint_url=endpoint_url,
                config=boto3.session.Config(signature_version="s3v4"),
            )
```

This implementation adds type hinting, validates the `request` object, and applies the dynamic endpoint URL setting to both MinIO and regular S3 client creation.
Line range hint 1-180: Summary: Localized change with potential for broader impact

The changes made to the `S3Storage` class are localized to the `__init__` method and introduce a more flexible way of setting the endpoint URL for the S3 client. While this change doesn't directly affect other methods in the class, it does alter the class's initialization behavior, which could have broader implications:

- Any code that instantiates `S3Storage` might need to be updated to pass a `request` object if dynamic endpoint URL setting is desired.
- The change currently only affects MinIO client creation, which could lead to inconsistent behavior between MinIO and regular S3 usage.
- The new behavior introduces a potential security consideration that should be carefully evaluated in the context of the entire application.

Overall, the change is an improvement in flexibility, but care should be taken to ensure it's implemented consistently and securely across the application.
Consider creating a configuration or environment variable to explicitly enable or disable this dynamic endpoint URL setting. This would provide more control over when and where this feature is used, making it easier to manage in different environments (development, staging, production).
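As a sketch of that suggestion, the dynamic behavior could be gated behind an environment variable. Note the flag name `USE_DYNAMIC_ENDPOINT_URL` and the helper `resolve_endpoint_url` are hypothetical, not part of this PR:

```python
import os

# Hypothetical helper: derive the endpoint URL from the request only when an
# explicit opt-in flag is set; otherwise use the statically configured URL.
# USE_DYNAMIC_ENDPOINT_URL is an assumed variable name, not part of this PR.
def resolve_endpoint_url(request, configured_url):
    if os.environ.get("USE_DYNAMIC_ENDPOINT_URL") == "1" and request is not None:
        return f"{request.scheme}://{request.get_host()}"
    return configured_url
```

This keeps deployments pinned to the configured URL unless the flag is deliberately enabled per environment.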
apiserver/plane/space/views/asset.py (1)
166-167: LGTM: Improved metadata handling with background task.

The changes effectively address the PR objectives:

- The condition now checks for a falsy value of `storage_metadata`, which is more robust than checking for `None`.
- The metadata retrieval is now offloaded to a background task, which should improve performance.
Consider adding error handling for the background task call:
```python
try:
    get_asset_object_metadata.delay(str(asset.id))
except Exception as e:
    # Log the error
    logger.error(f"Failed to queue metadata retrieval for asset {asset.id}: {str(e)}")
    # Optionally, you could also set a flag on the asset to indicate
    # the metadata retrieval needs to be retried
```

This will ensure that any issues with queuing the background task are logged and don't silently fail.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- apiserver/plane/app/views/asset/v2.py (4 hunks)
- apiserver/plane/app/views/issue/attachment.py (2 hunks)
- apiserver/plane/bgtasks/storage_metadata_task.py (1 hunks)
- apiserver/plane/settings/storage.py (1 hunks)
- apiserver/plane/space/views/asset.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (10)
apiserver/plane/bgtasks/storage_metadata_task.py (2)
1-7: LGTM: Imports are well-organized and relevant.

The imports are appropriately separated into third-party and module imports. All imported items are used in the code, and the necessary components for the task are included.
10-28: 🛠️ Refactor suggestion

Improve error handling and add return values for better observability.

The function looks good overall, but there are a few areas that could be improved:

- Silent returns in exception cases make debugging difficult.
- There's no logging for the `FileAsset.DoesNotExist` case.
- The function doesn't return any value indicating success or failure.
- There's no validation of the retrieved metadata before saving.
Consider applying the following improvements:
```diff
 @shared_task
 def get_asset_object_metadata(asset_id):
     try:
         # Get the asset
         asset = FileAsset.objects.get(pk=asset_id)
         # Create an instance of the S3 storage
         storage = S3Storage()
         # Get the storage
         metadata = storage.get_object_metadata(
             object_name=asset.asset.name
         )
+        # Validate metadata
+        if not metadata:
+            raise ValueError("Retrieved metadata is empty")
         # Save the asset
+        asset.storage_metadata = metadata
         asset.save()
-        return
+        return True
     except FileAsset.DoesNotExist:
+        log_exception(f"FileAsset with id {asset_id} does not exist")
-        return
+        return False
     except Exception as e:
         log_exception(e)
-        return
+        return False
```

These changes will:
- Add logging for the `FileAsset.DoesNotExist` case.
- Return boolean values to indicate success or failure.
- Add basic validation for the retrieved metadata.
- Ensure that all return paths have a value.
To ensure that this function is being called correctly from other parts of the codebase, you can run the following script:
This script will help verify that the function is being imported and called correctly, and that it's being used as a Celery task where appropriate.
✅ Verification successful
The proposed improvements enhance error handling and observability without introducing issues.
The suggested changes to the `get_asset_object_metadata` function are valid and beneficial:

- Added logging for the `FileAsset.DoesNotExist` case improves debugging capabilities.
- Returning boolean values (`True` for success and `False` for failure) provides better observability of the task's outcome.
- Validation of retrieved metadata ensures data integrity before saving.
These modifications will help maintain the robustness of the task and facilitate easier troubleshooting.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the usage of get_asset_object_metadata in the codebase

# Search for imports of the function
echo "Checking imports:"
rg --type python "from plane.bgtasks.storage_metadata_task import get_asset_object_metadata"

# Search for calls to the function
echo "\nChecking function calls:"
rg --type python "get_asset_object_metadata\s*\("

# Check if the function is being used as a Celery task
echo "\nChecking Celery task usage:"
rg --type python "get_asset_object_metadata\.delay\s*\("
```

Length of output: 475
Script:
```shell
#!/bin/bash
# Description: Revised verification of get_asset_object_metadata usage in the codebase

# Search for imports of the function using the correct type identifier
echo "Checking imports:"
rg --type py "from plane\.bgtasks\.storage_metadata_task import get_asset_object_metadata"

# Search for calls to the function
echo "\nChecking function calls:"
rg --type py "get_asset_object_metadata\s*\("

# Check if the function is being used as a Celery task
echo "\nChecking Celery task usage:"
rg --type py "get_asset_object_metadata\.delay\s*\("
```

Length of output: 1329
apiserver/plane/space/views/asset.py (2)
18-18: LGTM: New import aligns with PR objectives.

The addition of `get_asset_object_metadata` from the background tasks module aligns well with the PR objective of moving storage metadata collection to a background job. This change should help improve performance by offloading the metadata collection process.
Line range hint 1-285: Overall assessment: Changes meet PR objectives and improve code quality.

The modifications in this file successfully address the PR objectives:

- The storage metadata logic has been improved by changing the condition to check for a falsy value instead of `None`.
- The metadata collection process has been moved to a background job, which should enhance performance.
These changes should lead to more efficient handling of asset metadata without blocking the main execution thread. The code is now more robust and scalable.
apiserver/plane/app/views/issue/attachment.py (2)
23-23: Import `get_asset_object_metadata` for asynchronous metadata collection

The import statement correctly adds `get_asset_object_metadata` from `plane.bgtasks.storage_metadata_task`, enabling the use of a background task for storage metadata collection.
258-259: Verify that deferring metadata collection does not affect functionality

By moving the storage metadata collection to a background task when `issue_attachment.storage_metadata` is falsy, there may be a delay before this data becomes available. Please ensure that no immediate operations depend on `issue_attachment.storage_metadata` being populated immediately after the `patch` method execution.

apiserver/plane/app/views/asset/v2.py (4)
25-25: Import statement for background task is appropriate.

The addition of the `get_asset_object_metadata` import enables asynchronous retrieval of storage metadata using a background task.
200-201: Ensure correct handling of empty `storage_metadata`.

The condition `if not asset.storage_metadata` will evaluate to `True` when `asset.storage_metadata` is `None` or an empty dictionary `{}`. Confirm that this behavior is intentional and that an empty dictionary signifies missing metadata.
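A quick illustration of the truthiness behavior in question: both `None` and `{}` are falsy in Python, so the same branch is taken for either value.

```python
# Mirrors the guard used in the endpoints: `if not asset.storage_metadata`.
def needs_metadata_refresh(storage_metadata):
    return not storage_metadata

print(needs_metadata_refresh(None))          # metadata never collected
print(needs_metadata_refresh({}))            # empty dict also triggers refresh
print(needs_metadata_refresh({"size": 42}))  # metadata already present
```

The first two calls return `True` and the last returns `False`, which is why the falsy check covers both the "never collected" and "collected but empty" cases.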
450-451: Ensure correct handling of empty `storage_metadata`.

687-688: Ensure correct handling of empty `storage_metadata`.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
apiserver/plane/bgtasks/storage_metadata_task.py (1)
1-28: Overall assessment: Good implementation with room for improvement

The introduction of this background task for retrieving asset metadata is a positive addition to the project. It aligns well with the PR objectives of moving storage metadata collection to a background job. The use of Celery for this purpose is appropriate and will help in improving the overall performance of the application.
However, there are several areas where the implementation could be enhanced:
- More detailed return values for better task result tracking.
- Increased flexibility in S3Storage configuration.
- More robust handling of potential edge cases (like empty metadata).
- Improved logging for both successful and unsuccessful operations.
These improvements would make the task more robust, flexible, and easier to monitor and debug. Consider implementing the suggested changes to further enhance the quality and maintainability of this background task.
apiserver/plane/space/views/asset.py (1)
166-167: LGTM: Improved metadata handling with background task.

The changes successfully address the PR objectives:

- The condition now correctly checks for an empty dictionary instead of `None`.
- Metadata collection is moved to a background job, which should improve performance.
Consider adding a comment or log statement to indicate that metadata retrieval has been queued as a background task. This could be helpful for debugging or monitoring purposes.
Example:
```python
if not asset.storage_metadata:
    get_asset_object_metadata.delay(str(asset.id))
    logger.info(f"Metadata retrieval queued for asset {asset.id}")
```

Also, consider if there are any scenarios where the metadata might be needed immediately after this operation. If so, you may want to add a flag to optionally wait for the metadata retrieval.
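If an optional wait is ever needed, one plain-Python sketch is to poll the model until the background task has populated the field. The helper name `wait_for_metadata` and the timing values are hypothetical, not part of the PR; `refresh_from_db()` is the standard Django model method for reloading state:

```python
import time

# Hypothetical helper (not in the PR): optionally block until the background
# task has populated storage_metadata, with a timeout to avoid hanging forever.
def wait_for_metadata(asset, timeout=5.0, interval=0.25):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        asset.refresh_from_db()  # reload the row written by the Celery worker
        if asset.storage_metadata:
            return asset.storage_metadata
        time.sleep(interval)
    return None  # caller decides how to handle the timeout
```

Polling is a blunt instrument, though; for most callers, simply tolerating a temporarily empty `storage_metadata` is preferable.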
apiserver/plane/app/views/asset/v2.py (3)
200-201: LGTM: Improved metadata check and background task implementation

The changes effectively implement the PR objectives:

- The condition for checking storage metadata has been improved to handle empty dictionaries.
- The metadata collection has been moved to a background task using `get_asset_object_metadata.delay()`.

These changes should improve performance by offloading the metadata collection process.
Consider adding error handling for the background task to ensure any failures are logged or handled appropriately:
```python
try:
    get_asset_object_metadata.delay(asset_id=str(asset_id))
except Exception as e:
    logger.error(f"Failed to start background task for asset {asset_id}: {str(e)}")
```
687-688: LGTM: Consistent implementation of background task for project assets

The changes in the `ProjectAssetEndpoint.patch` method are consistent with those in the `UserAssetsV2Endpoint.patch` method:

- The condition for checking storage metadata has been improved.
- The metadata collection has been moved to a background task.

This consistency ensures that both user and project assets are handled similarly.
As suggested for the `UserAssetsV2Endpoint.patch` method, consider adding error handling:

```python
try:
    get_asset_object_metadata.delay(asset_id=str(pk))
except Exception as e:
    logger.error(f"Failed to start background task for asset {pk}: {str(e)}")
```
Line range hint 25-688: Overall LGTM: Successful implementation of background tasks for metadata collection

The changes in this file successfully address the PR objectives:

- The storage metadata logic has been fixed by improving the condition checks.
- Metadata collection has been moved to a background job using Celery's `delay()` method.

These modifications are consistently applied across both `UserAssetsV2Endpoint` and `ProjectAssetEndpoint` classes, ensuring uniform behavior for different types of assets. The implementation should lead to improved performance by offloading the potentially time-consuming metadata collection process to background tasks.

To further enhance this implementation, consider:
- Implementing a retry mechanism for failed background tasks.
- Adding monitoring and logging for the background tasks to track their performance and any potential issues.
- Ensuring that the Celery worker configuration is optimized to handle the expected load of metadata collection tasks.
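On the retry point: Celery supports retries natively via task options such as `autoretry_for`, `retry_backoff`, and `retry_kwargs`. As a library-agnostic sketch of the same idea, a callable can be retried with exponential backoff:

```python
import time

# Generic retry sketch (pure Python, not Celery-specific): retry a callable
# with exponential backoff, analogous to Celery's autoretry_for/retry_backoff.
def run_with_retries(fn, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the final error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...
```

In the actual task, the equivalent would typically be the decorator form, e.g. `@shared_task(autoretry_for=(Exception,), retry_backoff=True, retry_kwargs={"max_retries": 3})`, rather than a hand-rolled loop.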
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- apiserver/plane/app/views/asset/v2.py (4 hunks)
- apiserver/plane/app/views/issue/attachment.py (2 hunks)
- apiserver/plane/bgtasks/storage_metadata_task.py (1 hunks)
- apiserver/plane/settings/storage.py (1 hunks)
- apiserver/plane/space/views/asset.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (7)
apiserver/plane/bgtasks/storage_metadata_task.py (1)
1-7: LGTM: Imports are well-organized and relevant.

The imports are appropriately separated into third-party and module imports. All imported modules and functions are used in the code, and they cover the necessary dependencies for the task.
apiserver/plane/settings/storage.py (2)
Line range hint 1-164: Note: Misalignment with PR objectives

While the changes in this file introduce flexibility in endpoint URL construction, they don't directly address the main objectives stated in the PR description:

- Moving storage metadata collection to a background job
- Fixing storage metadata logic (changing the condition to check for an empty dictionary instead of `None`)

Consider clarifying how these changes relate to the PR objectives or if they should be part of a separate PR.
To ensure we haven't missed any relevant changes, let's check for metadata-related modifications:
✅ Verification successful
Confirmed: Changes Do Not Address PR Objectives
The updates in `apiserver/plane/settings/storage.py` do not implement the main objectives of the PR:

- Moving storage metadata collection to a background job.
- Fixing storage metadata logic by changing the condition to check for an empty dictionary instead of `None`.

Please ensure these objectives are addressed in the relevant parts of the codebase or consider separating them into a different PR.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Search for metadata-related changes
# Expected: No significant changes to metadata handling in this file

rg --type python "metadata" apiserver/plane/settings/storage.py
```

Length of output: 188
Script:
```shell
#!/bin/bash
# Description: Search for metadata-related changes in storage.py
# Expected: Details of metadata handling within the specified file

rg "metadata" apiserver/plane/settings/storage.py
```

Length of output: 145
42-46: Approve with suggestions: Enhance flexibility and improve code clarity

The change introduces welcome flexibility by allowing dynamic endpoint URL construction based on the request object. However, there are a few points to address:

1. Add the `request` parameter to the method signature for clarity:

```python
def __init__(self, request=None):
```

2. Add type hints and documentation for the `request` parameter:

```python
from typing import Optional

from django.http import HttpRequest


def __init__(self, request: Optional[HttpRequest] = None):
    """
    Initialize the S3Storage.

    :param request: The current HTTP request, used for dynamic endpoint URL
        construction. If None, falls back to the configured endpoint URL.
    """
```

3. Consider adding a comment explaining why this change was made, as it's not immediately clear how it relates to the PR objectives of moving storage metadata collection to a background job.
To ensure this change doesn't introduce inconsistencies, let's verify its usage:
✅ Verification successful
Approve with clarification: Consistent usage of `request` parameter confirmed

The verification reveals that `S3Storage` is consistently initialized with the `request` parameter across multiple files in the codebase. This demonstrates that the introduced flexibility aligns with existing usage patterns and does not introduce inconsistencies.

No further action is required regarding the `request` parameter usage.

🏁 Scripts executed
Script:
```shell
#!/bin/bash
# Description: Check for other occurrences of S3Storage initialization
# Expected: No other occurrences of S3Storage being initialized with a request parameter

rg --type python "S3Storage\(" -C 3
```

Length of output: 188
Script:
```shell
#!/bin/bash
# Description: Check for other occurrences of S3Storage initialization
# Expected: No other occurrences of S3Storage being initialized with a request parameter

rg "S3Storage\s*\(" --type py -C 3
```

Length of output: 6318
apiserver/plane/space/views/asset.py (2)
18-18: LGTM: New import for background task.

The import of `get_asset_object_metadata` from the background tasks module aligns with the PR objective of moving storage metadata collection to a background job.
Line range hint 1-280: Overall assessment: Changes align with PR objectives and improve code quality.

The modifications in this file successfully implement the intended changes:

- Fixed the storage metadata logic to check for an empty dictionary instead of `None`.
- Moved the metadata collection to a background job, potentially improving performance.
These changes are focused and don't introduce unrelated alterations to the file. The overall structure and functionality of the file remain intact, with improvements to the asset metadata handling process.
apiserver/plane/app/views/asset/v2.py (1)
25-25: LGTM: New import for background task

The new import statement for `get_asset_object_metadata` is correctly placed and aligns with the PR objective of moving storage metadata collection to a background job.

apiserver/plane/app/views/issue/attachment.py (1)
23-23: Import statement added correctly

The import of `get_asset_object_metadata` from `plane.bgtasks.storage_metadata_task` is appropriate and necessary for the background task implementation.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- deploy/selfhost/docker-compose.yml (1 hunks)
- nginx/nginx.conf.dev (1 hunks)
- nginx/nginx.conf.template (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
deploy/selfhost/docker-compose.yml (1)
37-37: LGTM: Simplified AWS_REGION environment variable declaration

The change from `AWS_REGION=${AWS_REGION:-""}` to `AWS_REGION=${AWS_REGION:-}` is a minor improvement in the Docker Compose configuration. Both versions effectively set an empty string as the default value for AWS_REGION, but the new version is more concise. This change:
- Maintains the same functionality
- Slightly improves readability
- Aligns with common Docker Compose practices
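The equivalence of the two forms can be checked directly in a shell: `${VAR:-default}` substitutes the default when the variable is unset or empty, and an empty default may be written as `:-""` or simply `:-`.

```shell
unset AWS_REGION
# Both spellings expand to an empty string when the variable is unset:
echo "old form: [${AWS_REGION:-""}]"
echo "new form: [${AWS_REGION:-}]"

# With :- (as opposed to plain -), a set-but-empty variable also takes the default:
AWS_REGION=""
echo "with fallback: [${AWS_REGION:-us-east-1}]"
```

The first two lines print `[]`, and the last prints `[us-east-1]`, confirming the two declarations behave identically in the compose file.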
Actionable comments posted: 1
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores