-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to provenance log #968
Conversation
Codecov Report
@@ Coverage Diff @@
## master #968 +/- ##
==========================================
+ Coverage 88.58% 89.14% +0.55%
==========================================
Files 164 164
Lines 5977 6570 +593
==========================================
+ Hits 5295 5857 +562
- Misses 682 713 +31
Continue to review full report at Codecov.
|
starfish/util/logging.py
Outdated
|
||
@lru_cache(maxsize=1) | ||
def get_git_commit_hash(): | ||
return subprocess.check_output(["git", "describe", "--always"]).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail if either or both of starfish was installed via pip
or user is not int the starfish directory
hold true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an excellent point
starfish/util/logging.py
Outdated
def get_core_dependency_info(): | ||
dependency_info = dict() | ||
for dependency in CORE_DEPENDENCIES: | ||
ps = Popen(('pip', 'show', dependency), stdout=PIPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is highly likely that pkg_resources is a more portable way of determining this.
starfish/util/logging.py
Outdated
@lru_cache(maxsize=1) | ||
def get_git_commit_hash(): | ||
# First check for git repo | ||
if call(["git", "branch"], stderr=STDOUT, stdout=open(os.devnull, 'w')) != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this breaks down if someone has a different git repo that pip installs starfish. Then we would be picking up the hash of the repo that depends on starfish.
I think what we want is to detect that starfish was installed via pip install -e .
(and its conda equivalent), and only in that case do we log the commit hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait you mean only if it wasn't installed with pip or conda then we check for the git hash right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait I think I understand what you mean. What do you think is the best way to check for that? Checking is the starfish package location doesn't include site-packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's my guess, with basically zero time put into proving its feasibility:
- get the path of some module in starfish. resolve it to its
.py
file (https://stackoverflow.com/questions/7162366/get-location-of-the-py-source-file might give you an idea of how to do this, but I would test it out) as opposed to the.pyc
/.pyo
file. - run
git status --porcelain -- 123 $file
. this output will be different, depending on whether the file is tracked by a git repo or not. You may need tocd
into the directory first though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I would investigate pkg_resources
to see if you can determine how a package was installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having some issues with git status --porcelain
but alternate simpler idea; what if we just check if a starfish.py is being tracked by git with git ls-files --error-unmatch starfish.py
and if so do a git descibe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having some issues with git status --porcelain but alternate simpler idea; what if we just check if a starfish.py is being tracked by git with git ls-files --error-unmatch starfish.py and if so do a git descibe
What issues are you having with git status --porcelain
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --porcelain flag doesn't produce any output. Git status does
starfish/util/logging.py
Outdated
|
||
@lru_cache(maxsize=1) | ||
def get_os_info(): | ||
return {"Platform": platform.system(), "Version:": platform.version()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python_version is not os_info, but also worth logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely on the right track! I'm curious what issues you have with git status --porcelain
. it's supposed to be a very stable API.
starfish/util/logging.py
Outdated
|
||
|
||
@lru_cache(maxsize=1) | ||
def get_core_dependency_info(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_core_dependency_info(): | |
def get_core_dependency_info() -> Mapping[str, str]: |
starfish/util/logging.py
Outdated
|
||
|
||
@lru_cache(maxsize=1) | ||
def get_git_commit_hash(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_git_commit_hash(): | |
def get_git_commit_hash() -> str: |
starfish/util/logging.py
Outdated
# First check if in starfish repo | ||
try: | ||
check_output(["git", "ls-files", "--error-unmatch", starfish.__file__]) | ||
return check_output(["git", "describe", "--always"]).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in the try-block. It would seem that a failure here is indicative of a bug, rather than starfish is not under git tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about the git describe
or the git ls-files --error-unmatch
? Because the second is just list the files under tracking and error if starfish.file is not one of them, isn't that the behavior we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure that all you need to do is to execute git ls-files os.path.basename(starfish.__file__)
, after you chdir to os.path.dirname(starfish.__file__)
.
git ls-files of a file tracked by git, but not while you are in the directory fails:
[tt]:~> git ls-files microscopy/starfish-git/starfish/__init__.py
fatal: not a git repository (or any of the parent directories): .git
but chdir and ls-files and it works:
[tt]:~> cd microscopy/starfish-git/starfish/
[tt]:~/microscopy/starfish-git/starfish:tonytung-pr-684> git ls-files __init__.py
__init__.py
[tt]:~/microscopy/starfish-git/starfish:tonytung-pr-684>
starfish/util/logging.py
Outdated
|
||
|
||
@lru_cache(maxsize=1) | ||
def get_os_info(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_os_info(): | |
def get_os_info() -> Mapping[str, str]: |
starfish/util/logging.py
Outdated
|
||
|
||
@lru_cache(maxsize=1) | ||
def get_git_commit_hash(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method, as written, is subject to strange working directory effects. i suspect you should run the commands in the parent path of starfish.__file__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starfish.file outputs the file directory path though so does it matter?
starfish/types/_constants.py
Outdated
@@ -34,6 +34,10 @@ class Coordinates(AugmentedEnum): | |||
""" | |||
This is name of the provenance log attribute stored on the IntensityTable | |||
""" | |||
CORE_DEPENDENCIES = ['numpy', 'scikit-image', 'pandas', 'scikit-learn', 'scipy', 'xarray', 'sympy'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a tuple or a set.
starfish/util/logging.py
Outdated
# First check if in starfish repo | ||
try: | ||
check_output(["git", "ls-files", "--error-unmatch", starfish.__file__]) | ||
return check_output(["git", "describe", "--always"]).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure that all you need to do is to execute git ls-files os.path.basename(starfish.__file__)
, after you chdir to os.path.dirname(starfish.__file__)
.
git ls-files of a file tracked by git, but not while you are in the directory fails:
[tt]:~> git ls-files microscopy/starfish-git/starfish/__init__.py
fatal: not a git repository (or any of the parent directories): .git
but chdir and ls-files and it works:
[tt]:~> cd microscopy/starfish-git/starfish/
[tt]:~/microscopy/starfish-git/starfish:tonytung-pr-684> git ls-files __init__.py
__init__.py
[tt]:~/microscopy/starfish-git/starfish:tonytung-pr-684>
I also strongly encourage you to add some tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about the testing strategy here, in either docker or just via virtualenvs there should likely be three-ish different install strategies each which call get_git_commit_hash
:
python -m starfish ...
pip install -e . && starfish ..
python setup.py sdist && pip install dist/build/starfish... && starfish ...
starfish/util/logging.py
Outdated
@lru_cache(maxsize=1) | ||
def get_git_commit_hash() -> str: | ||
# First check if in starfish repo | ||
os.chdir(os.path.dirname(starfish.__file__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll likely want a finally
along with this chdir
returning the user to getcwd
otherwise file look ups will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about the testing strategy here, in either docker or just via virtualenvs there should likely be three-ish different install strategies each which call
get_git_commit_hash
:
python -m starfish ...
pip install -e . && starfish ..
python setup.py sdist && pip install dist/build/starfish... && starfish ...
I would expand the matrix a bit further to state that logging has to work regardless of cwd.
Would recommend shoving this into new travis stages. Can run on a lower frequency (only on commits to master? or if @joshmoore builds a release process, we can have tests that only run when we attempt to cut a release?) if it takes too much time.
starfish/util/logging.py
Outdated
@lru_cache(maxsize=1) | ||
def get_git_commit_hash() -> str: | ||
# First check if in starfish repo | ||
os.chdir(os.path.dirname(starfish.__file__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I don't understand the third example of install? Also this seems like a lot just to test this one command. Creating a virtual env. or docker container and installing starfish three different ways. One of them being getting travis to checkout from git, while already hooked up to run from a branch seems messy. I wanna push back a little and say "is anyone actually asking for this git hash info to be included in the log"? For most users (that installed through pip) it's going to result in a line that says "Starfish project not under git tracking" for every command, which might confuse them. |
I think this matches my concern: basically getting (and keeping) this right is going to be high cost for low value.
nods The reverse strategy be to encode the git during build and otherwise |
Yes, I think @berl asked for it...? My guess is that you are right: people should not be processing data in a permanent way with unreleased versions, and if that happens, we should just assume that all bets are off. Recording the hash is pretty meaningless when people can be mucking around with the source files. Proposal: record that it's being run from source and not a released version. Probably the best way to do that is to have the release build set a magic variable. |
I mentioned the git hash as a possibility for more fine-grained tracking of the codebase to augment the version number if applicable. @ttung proposal sounds good. Of course no one will need to ever muck about with source files in starfish anyway! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, can you dump what the log looks like now? might be good to have the community (i.e., #starfish-dev) look it over to see if it covers all the bases.
@@ -22,6 +22,9 @@ | |||
# NOTE: if we move to python 3.7, we can produce this value at call time via __getattr__ | |||
__version__ = pkg_resources.require("starfish")[0].version | |||
|
|||
# Variable to be set by release process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it super explicit, e.g., is_released_version
or something like that.
also, cc: #761
Example from ISS pipeline:
|
Added fields for:
-core dependencies
-git hash
-os information
Also refactored the LogEncoder to be in the new utils.logging file.