Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #1069 Added numba cache dir for pytest #1070

Merged
merged 20 commits into from
Feb 1, 2025

Conversation

seanlaw
Copy link
Contributor

@seanlaw seanlaw commented Jan 28, 2025

See #1069

Pull Request Checklist

Below is a simple checklist but please do not hesitate to ask for assistance!

  • Fork, clone, and checkout the newest version of the code
  • Create a new branch
  • Make necessary code changes
  • Install black (i.e., python -m pip install black or conda install -c conda-forge black)
  • Install flake8 (i.e., python -m pip install flake8 or conda install -c conda-forge flake8)
  • Install pytest-cov (i.e., python -m pip install pytest-cov or conda install -c conda-forge pytest-cov)
  • Run black --exclude=".*\.ipynb" --extend-exclude=".venv" --diff ./ in the root stumpy directory
  • Run flake8 --extend-exclude=.venv ./ in the root stumpy directory
  • Run ./setup.sh dev && ./test.sh in the root stumpy directory
  • Reference a Github issue (and create one if one doesn't already exist)

Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 89.18919% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.31%. Comparing base (82ecd51) to head (f0bb309).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
stumpy/fastmath.py 33.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1070      +/-   ##
==========================================
- Coverage   97.33%   97.31%   -0.02%     
==========================================
  Files          93       93              
  Lines       15219    15239      +20     
==========================================
+ Hits        14813    14830      +17     
- Misses        406      409       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@seanlaw
Copy link
Contributor Author

seanlaw commented Jan 29, 2025

@NimaSarajpoor Commented:

I feel the underlying challenge is "how can we make sure we are reading data from the correct location to which data was written?" So, can we try to track that numba cache directory by defining a config var like:
config.NUMBA_CACHE_DIR which can be set to None as default. Then, if cache._save() is called, we can write the directory there. We can even raise warning if the user provided numba cache directory is different than an already-set config. config.NUMBA_CACHE_DIR

According to the numba docs:

Numba picks the cache directory in the following order:

  1. In-tree cache. Put the cache next to the corresponding source file under a pycache directory following how .pyc files are stored.
  2. User-wide cache. Put the cache in the user’s application directory using appdirs.user_cache_dir from the Appdirs package.
  3. IPython cache. Put the cache in an IPython specific application directory. Stores are made under the numba_cache in the directory returned by IPython.paths.get_ipython_cache_dir().

So, by default, the cache files are written next the source files under a __pycache__ directory. This is precisely the behavior that we are seeing (for 1) above). For users, 99% of the time the cache files will be written to site-packages/stumpy/__pycache__. For pytest unit testing, 100% of the time the cache files will be written to stumpy.git/stumpy/__pycache__. My code currently differentiates between the "users 99%" and our "100% pytest" scenario via the if/else condition:

    if cache_dir is not None:  # pragma: no cover
        numba_cache_dir = str(cache_dir)
    elif "PYTEST_CURRENT_TEST" in os.environ:
        numba_cache_dir = "stumpy/__pycache__"
    else:  # pragma: no cover
        site_pkg_dir = site.getsitepackages()[0]
        numba_cache_dir = site_pkg_dir + "/stumpy/__pycache__"

I haven't tested this but I added the first if condition above to potentially handle 2) and 3) OR when the user defines the cache directory via NUMBA_CACHE_DIR (i.e., numba.config.CACHE_DIR). Although, I would say that these are beyond the scope of stumpy. When stumpy is installed as a pacakge, the stumpy cache files should always be expected in site-packages/stumpy/__pycache__. The only way that this would change is if the user does something unexpected (i.e., set the NUMBA_CACHE_DIR. In that case, that's no longer our problem to solve.

We can possibly check if numba.config.CACHE_DIR != '': and then spit out a warning that cache may have unexpected behavior.

Copy link
Collaborator

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I added some comments for your consideration..

stumpy/fastmath.py Show resolved Hide resolved
stumpy/cache.py Show resolved Hide resolved
stumpy/cache.py Outdated Show resolved Hide resolved
tests/test_cache.py Show resolved Hide resolved
stumpy/cache.py Show resolved Hide resolved
@NimaSarajpoor
Copy link
Collaborator

NimaSarajpoor commented Jan 29, 2025

@seanlaw

According to the numba docs:

I haven't tested this but I added the first if condition above to potentially handle 2) and 3) OR when the user defines the cache directory via NUMBA_CACHE_DIR (i.e., numba.config.CACHE_DIR). Although, I would say that these are beyond the scope of stumpy.

We can possibly check if numba.config.CACHE_DIR != '': and then spit out a warning that cache may have unexpected behavior.

I noticed you added a warning to reflect this. I was thinking about using numba config instead of the [stumpy] config var (and then assign its value to NUMBA_CASHE_DIR) I proposed for writing cache files to the user-defined location. Then, I noticed your message that it is better to not include it in the STUMPY's scope.

@seanlaw
Copy link
Contributor Author

seanlaw commented Jan 30, 2025

@NimaSarajpoor I made most of the changes but kept some. Let me know what you think

Copy link
Collaborator

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I just left one new comment.

stumpy/cache.py Outdated Show resolved Hide resolved
@seanlaw
Copy link
Contributor Author

seanlaw commented Jan 30, 2025

@NimaSarajpoor I've made some changes based on your comments

@seanlaw
Copy link
Contributor Author

seanlaw commented Jan 30, 2025

I've also added a simple/stupid "check" to see if cache._clear() has been called before cache._save().

Copy link
Collaborator

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
Just left some minor comments. Other than those, I think it is all good!

stumpy/cache.py Outdated Show resolved Hide resolved
stumpy/cache.py Outdated Show resolved Hide resolved
tests/test_cache.py Outdated Show resolved Hide resolved
stumpy/cache.py Show resolved Hide resolved
stumpy/cache.py Outdated Show resolved Hide resolved
@NimaSarajpoor
Copy link
Collaborator

@seanlaw

I've also added a simple/stupid "check" to see if cache._clear() has been called before cache._save().

My thought was to just add a note to docstring. It is nice to see that you tried to create some level of protection for users by tracking the calls to those functions.

@seanlaw seanlaw merged commit d6bfb4a into TDAmeritrade:main Feb 1, 2025
27 checks passed
@seanlaw
Copy link
Contributor Author

seanlaw commented Feb 1, 2025

@NimaSarajpoor Thank you for your comments and review. I think we ended up in a good place with this PR!

@NimaSarajpoor
Copy link
Collaborator

@seanlaw
Do you think this is ready?

I took a look at the changes again. It seems good to me. Thanks for explaining the behaviour of cache.save() in the section "notes" of its docstring.

@NimaSarajpoor
Copy link
Collaborator

@seanlaw
I think I need to start refreshing my browser next time before writing a comment! All good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants