-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update settings-driven product locator builder branch #12
Conversation
Create settings
Add method to just download a file and save it to a local repository. The method do not load the file into memory, it just download it and save it.
Refactor the `download_file` method in the `DatasourceHTTP` class to improve its implementation. The method now checks if the file already exists in the local repository before attempting to retrieve it from the datasource. If the file is not found, it is retrieved and added to the repository. Additionally, the method now raises a `RuntimeError` if the file cannot be retrieved.
Update docstrings to reflect the changes made in the previous commit.
Refactor the `Downloader` class to improve its functionality and add a new `download_file` method. The `download_files` method now downloads files from the datasource into the local repository, filtering them based on the specified start and end times. The `download_file` method is responsible for retrieving individual files from the datasource and saving them in the local repository. These changes enhance the overall file downloading capabilities of the `Downloader` class.
Simplify base_directory initialization.
Simplify the implementation of the DatasourceCache class by removing unnecessary comments and improving code readability. The cache now uses a float value for the life_time parameter, with a default value of +inf to indicate that the cache should be kept forever. This change enhances the overall maintainability of the codebase.
Refactor basic test to use the new test download functions.
Update GOES-16 data refresh interval and add clarification in comments.
Release v0.1 rc2
Pre-release v0.1-rc2
refactor(docs): update installation instructions for GOES-DL package
Fix readme v0.1 rc2
Merge pull request #7 from wvenialbo/fix-readme-v0.1-rc2
Fix readme v0.1 rc2
Reviewer's Guide by SourceryThis PR introduces significant changes to the GOES-DL project's architecture, focusing on improving file downloading and caching mechanisms. The changes include separating download and retrieval operations, enhancing repository management, and updating documentation. The implementation now properly handles local file caching and provides clearer interfaces for data source operations. Updated ER diagram for settings.jsonerDiagram
DATASET {
string goes
}
GOES {
string description
string[] channel
string datasource
string instrument
string level
string mode
string origin
string scene
}
PRODUCT {
string ACHA
string ACHA2KM
string ACHP2KM
string ACM
string ACTP
string ADP
string AICE
string AITA
string AOD
string BRF
string CCL
string CMIP
string COD
string COD2KM
string CPS
string CTP
string DMW
string DMWM
string DMWV
string DSI
string DSR
string FDC
string FSC
string LCFA
string LSA
string LST
string LST2KM
string LVMP
string LVTP
string MCMIP
string RRQPE
string RSR
string Rad
string SST
string TPW
string VAA
}
DATASET ||--o{ GOES : contains
GOES ||--o{ PRODUCT : contains
Updated class diagram for DownloaderclassDiagram
class Downloader {
-Datasource datasource
-Locator locator
-str date_format
+download_files(start: str, end: str = "")
+get_files(start: str, end: str = "") tuple[list[bytes], list[str]]
-_get_file_list(start_time: str, end_time: str = "") list[str]
-_load_files(file_paths: list[str]) list[bytes]
-_retrieve_files(file_paths: list[str])
}
Updated class diagram for DatasourceAWSclassDiagram
class DatasourceAWS {
-str bucket_name
+download_file(file_path: str)
+get_file(file_path: str) bytes
+listdir(dir_path: str) list[str]
-_retrieve_file(file_path: str) bytes
}
Updated class diagram for DatasourceHTTPclassDiagram
class DatasourceHTTP {
+download_file(file_path: str)
+get_file(file_path: str) bytes
+listdir(dir_path: str) list[str]
-_retrieve_file(file_path: str) bytes
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Here's the code health analysis summary for commits Analysis Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @wvenialbo - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟡 Documentation: 3 issues found
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@@ -61,14 +61,16 @@ class DatasourceCache: | |||
Notes | |||
----- | |||
Set `life_time` to a positive number of second to enable the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): The default cache lifetime of +inf could lead to stale data being served. Consider using a more conservative default like 1 hour.
Infinite caching by default means cached data will never expire unless explicitly cleared. This could cause issues if the remote data changes.
""" | ||
if self.repository.has_item(file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Consider validating cached files when checking has_item to ensure data integrity
The current implementation assumes cached files are valid but they could be corrupted or incomplete. Consider adding checksum validation.
if self.repository.has_item(file_path) and self.repository.validate_checksum(file_path):
@@ -86,7 +86,7 @@ from GOES_DL.downloader import Downloader | |||
# Initialize the downloader for GridSat-GOES (GOES-12, Full Disk) | |||
locator = GridSatProductLocatorGC("F", "G12") | |||
|
|||
datasource = DatasourceHTTP(locator) | |||
datasource = DatasourceHTTP(locator, repository="./my_data/gridsat-gc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (documentation): Consider adding a brief explanation of the repository parameter
It would be helpful to mention that this is the local directory path where downloaded files will be stored.
datasource = DatasourceHTTP(
locator,
repository="./my_data/gridsat-gc" # Local directory path for downloaded files
)
datasource = DatasourceAWS(locator) | ||
# GOES-16 data is updated every 10 minutes. If you are downloading | ||
# old data, you may leave the cache refresh rate as default (+inf). | ||
datasource = DatasourceAWS(locator, repository="./my_data/goes-r", cache=600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (documentation): Specify the units for the cache parameter
Please clarify if cache=600 is in seconds, minutes, or another unit of time.
datasource = DatasourceAWS(locator, repository="./my_data/goes-r", cache=600) | |
datasource = DatasourceAWS(locator, repository="./my_data/goes-r", cache_seconds=600) |
end="1984-08-24T00:00-0004", | ||
) | ||
|
||
# `files1` and files2` are lists of tuple[str, bytes] with file path and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (documentation): Fix backtick placement in variable names
The backtick should be at the start of 'files2', not the end of 'files1'.
0b8abcf
into
implement-settings-driven-creation
Summary by Sourcery
Update the settings configuration to support GOES-R Series Imagery datasets and refactor the downloader and datasource classes for improved file handling. Enhance documentation with detailed usage instructions and expand test coverage for downloader functionalities.
New Features:
Enhancements:
Documentation:
Tests: