-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor database backend #195
Conversation
Codecov Report
@@ Coverage Diff @@
## master #195 +/- ##
==========================================
- Coverage 95.75% 95.68% -0.08%
==========================================
Files 18 23 +5
Lines 2051 2064 +13
Branches 388 388
==========================================
+ Hits 1964 1975 +11
- Misses 69 70 +1
- Partials 18 19 +1
Continue to review full report at Codecov.
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Some suggestions made.
One thing in the notebooks, there's a code block like
create_timeseries(
scenario="high",
climate_model="model_b",
count=10,
b_factor=1 / 1000,
),
We should probably make b_factor = 2 / 1000 for the high scenario cause in the current implementation the high scenario is below the low scenario for model_b which will probably confuse people (even though it's not actually related to what is going on in the notebook, a good distraction to avoid)
src/scmdata/database/_database.py
Outdated
|
||
Creating a new :class:`ScmDatabase` does not modify any existing data on | ||
disk. To load an existing database ensure that the :attr:`root_dir` and | ||
:attr:`levels` are the same as the previous instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and backend stuff?
|
||
Parameters | ||
---------- | ||
scmrun : :class:`scmdata.ScmRun <scmdata.run.ScmRun>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the docstrings to BaseRun or whatever the class is called or is this somehow tightly coupled to ScmRun?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question applies throughout I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are actually returning ScmRun objects, but I get your point. Maybe that change comes when we have more subclasses that would make that differentiation useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep probably one for the future
|
||
If metadata includes non-alphanumeric characters then it | ||
might appear modified in the returned table. The original | ||
metadata values can still be used to filter data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is it to get the metadata as is actually used in the data? I'm assuming very cause it's not a 1:1 map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you would need to read every file. The alternative would be to create an inventory file a la the pangeo CMIP6 archive. Some thought would be needed to ensure that it remains insync with the files on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok ye nice let's leave then
src/scmdata/database/_utils.py
Outdated
if not os.path.isdir(dir_to_check): | ||
try: | ||
os.makedirs(dir_to_check) | ||
except OSError: # pragma: no cover | ||
# Prevent race conditions if multiple threads attempt to create dir at same time | ||
if not os.path.exists(dir_to_check): | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not os.path.isdir(dir_to_check): | |
try: | |
os.makedirs(dir_to_check) | |
except OSError: # pragma: no cover | |
# Prevent race conditions if multiple threads attempt to create dir at same time | |
if not os.path.exists(dir_to_check): | |
raise | |
os.makedirs(dir_to_check, exist_ok=True) |
Would that be simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it would. Added that with a check to assert that the target directory isn't a file
01b5426
to
caf2718
Compare
@znicholls push some changes that resolve your comments. I also rebased ontop of master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice
Pull request
Please confirm that this pull request has done the following:
CHANGELOG.rst
addedNo code changes but split
scmdata.database
into a package. Technically this is a breaking change as the name/location of the backend classes has changed.