Refactor database backend #195

Merged
merged 12 commits into from
May 21, 2022

Collaborator

@lewisjared lewisjared commented May 4, 2022

Pull request

Please confirm that this pull request has done the following:

  • Documentation added (where applicable)
  • Example added (either to an existing notebook or as a new notebook, where applicable)
  • Description in CHANGELOG.rst added

No code changes, but this splits scmdata.database into a package. Technically this is a breaking change, as the names/locations of the backend classes have changed.

@codecov

codecov bot commented May 4, 2022

Codecov Report

Merging #195 (13002d3) into master (57a555d) will decrease coverage by 0.07%.
The diff coverage is 97.93%.

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
- Coverage   95.75%   95.68%   -0.08%     
==========================================
  Files          18       23       +5     
  Lines        2051     2064      +13     
  Branches      388      388              
==========================================
+ Hits         1964     1975      +11     
- Misses         69       70       +1     
- Partials       18       19       +1     
Impacted Files                             Coverage Δ
src/scmdata/database/_utils.py             83.33% <83.33%> (ø)
src/scmdata/database/backends/netcdf.py    98.30% <98.30%> (ø)
src/scmdata/__init__.py                    100.00% <100.00%> (ø)
src/scmdata/database/__init__.py           100.00% <100.00%> (ø)
src/scmdata/database/_database.py          100.00% <100.00%> (ø)
src/scmdata/database/backends/__init__.py  100.00% <100.00%> (ø)
src/scmdata/database/backends/base.py      100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3cbb7ae...13002d3. Read the comment docs.


@lewisjared lewisjared requested a review from znicholls May 16, 2022 10:10
Collaborator

@znicholls znicholls left a comment

Nice. Some suggestions made.

One thing in the notebooks, there's a code block like

create_timeseries(
    scenario="high",
    climate_model="model_b",
    count=10,
    b_factor=1 / 1000,
),

We should probably make b_factor = 2 / 1000 for the high scenario, because in the current implementation the high scenario is below the low scenario for model_b, which will probably confuse people. It's not actually related to what is going on in the notebook, but it's a distraction worth avoiding.

src/scmdata/database/_database.py

Creating a new :class:`ScmDatabase` does not modify any existing data on
disk. To load an existing database ensure that the :attr:`root_dir` and
:attr:`levels` are the same as the previous instance.
Collaborator

And the backend settings too?

src/scmdata/database/_database.py

Parameters
----------
scmrun : :class:`scmdata.ScmRun <scmdata.run.ScmRun>`
Collaborator

Should we change the docstrings to refer to BaseRun (or whatever the class is called), or is this somehow tightly coupled to ScmRun?

Collaborator

Question applies throughout I think

Collaborator Author

Here we are actually returning ScmRun objects, but I get your point. Maybe that change comes when we have more subclasses that would make that differentiation useful?

Collaborator

Yep probably one for the future


If metadata includes non-alphanumeric characters then it
might appear modified in the returned table. The original
metadata values can still be used to filter data.
Collaborator

How expensive is it to get the metadata as it is actually used in the data? I'm assuming very, because it's not a 1:1 map?

Collaborator Author

Yeah, you would need to read every file. The alternative would be to create an inventory file à la the Pangeo CMIP6 archive. Some thought would be needed to ensure that it remains in sync with the files on disk.

Collaborator

Ok yep, nice, let's leave it then
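
For reference, the inventory-file idea discussed above could look roughly like the sketch below. This is not scmdata's API and it does not reproduce the Pangeo catalogue layout: the function names, the `inventory.csv` filename, the `.nc` extension filter and the column layout are all assumptions made for illustration. It records the original (unmodified) metadata next to each file's relative path and offers a cheap check that the inventory still matches the files on disk.

```python
import csv
import os
import tempfile


def write_inventory(root_dir, entries, inventory_name="inventory.csv"):
    """Write one row per data file: relative path plus its original metadata.

    ``entries`` maps relative file path -> dict of metadata (hypothetical layout).
    """
    inventory_path = os.path.join(root_dir, inventory_name)
    # Use the union of all metadata keys as the CSV columns
    columns = sorted({key for meta in entries.values() for key in meta})
    with open(inventory_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["path"] + columns)
        writer.writeheader()
        for rel_path, meta in sorted(entries.items()):
            writer.writerow({"path": rel_path, **meta})
    return inventory_path


def inventory_is_in_sync(root_dir, inventory_name="inventory.csv", ext=".nc"):
    """Check that the inventory lists exactly the data files found on disk."""
    with open(os.path.join(root_dir, inventory_name), newline="") as fh:
        listed = {row["path"] for row in csv.DictReader(fh)}

    on_disk = set()
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(ext):
                full_path = os.path.join(dirpath, filename)
                on_disk.add(os.path.relpath(full_path, root_dir))
    return listed == on_disk


with tempfile.TemporaryDirectory() as root:
    rel = os.path.join("model_a", "high.nc")
    os.makedirs(os.path.join(root, "model_a"))
    open(os.path.join(root, rel), "w").close()

    # Metadata is stored as-is, even with non-alphanumeric characters
    write_inventory(root, {rel: {"climate_model": "model a", "scenario": "high"}})
    in_sync_before = inventory_is_in_sync(root)

    # A file added without updating the inventory is detected
    open(os.path.join(root, "model_a", "low.nc"), "w").close()
    in_sync_after = inventory_is_in_sync(root)
```

The sync check only compares file paths, so it would not notice a file whose contents changed in place; catching that would need something like a stored modification time or checksum per row.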

Comment on lines 16 to 22
if not os.path.isdir(dir_to_check):
    try:
        os.makedirs(dir_to_check)
    except OSError:  # pragma: no cover
        # Prevent race conditions if multiple threads attempt to create dir at same time
        if not os.path.exists(dir_to_check):
            raise
Collaborator

Suggested change
- if not os.path.isdir(dir_to_check):
-     try:
-         os.makedirs(dir_to_check)
-     except OSError:  # pragma: no cover
-         # Prevent race conditions if multiple threads attempt to create dir at same time
-         if not os.path.exists(dir_to_check):
-             raise
+ os.makedirs(dir_to_check, exist_ok=True)

Would that be simpler?

Collaborator Author

Yes it would. Added that with a check to assert that the target directory isn't a file
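
A minimal sketch of what the simplified helper might look like, combining `exist_ok=True` with the file check mentioned above. The function name `ensure_dir_exists` and the choice of `NotADirectoryError` are assumptions for illustration; the merged code may differ.

```python
import os
import tempfile


def ensure_dir_exists(dir_to_check):
    """Create ``dir_to_check`` (and any parents) if it does not already exist.

    Hypothetical helper: the name and the exception type are assumptions.
    """
    # A regular file at the target path is reported explicitly
    if os.path.isfile(dir_to_check):
        raise NotADirectoryError(f"{dir_to_check} exists and is a file")

    # exist_ok=True makes this a no-op when the directory is already there,
    # including when another thread or process created it first
    os.makedirs(dir_to_check, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b")
    ensure_dir_exists(target)
    ensure_dir_exists(target)  # second call does not raise
    created = os.path.isdir(target)

    file_path = os.path.join(tmp, "not_a_dir")
    with open(file_path, "w") as fh:
        fh.write("")
    try:
        ensure_dir_exists(file_path)
        rejected_file = False
    except NotADirectoryError:
        rejected_file = True
```

Without the explicit check, `os.makedirs(..., exist_ok=True)` still raises `FileExistsError` when the path is a regular file; checking first just gives a more specific error.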

@lewisjared
Collaborator Author

@znicholls pushed some changes that resolve your comments. I also rebased on top of master.

@lewisjared lewisjared mentioned this pull request May 21, 2022
6 tasks
Collaborator

@znicholls znicholls left a comment

very nice

@znicholls znicholls merged commit 63032ce into master May 21, 2022
@znicholls znicholls deleted the refactor-backend branch May 21, 2022 13:21