
Checkout repos at latest tag by default #2

Merged
merged 7 commits into from
Jun 28, 2018

Conversation

jeanconn
Contributor

Instead of checking out just the tip of master, check out the most recent tag.

jeanconn added 2 commits June 23, 2018 11:24
This sets the git clone pieces to check out the repos at the latest tag (by commit date)
by default.  It should also take a tag as an option for the individual
clone/fetches.  Also adds a few lines to try to get the repos via ssh from the
sot org if possible.
@jeanconn
Contributor Author

For an idea of which packages need some tag updates, here's the output with my silly print statements.

In [1]: run scripts/clone_ska_sources all
Cloning source Ska.Shell.
Auto-checked out at 3.3.1 NOT AT tip of master
Cloning source Ska.File.
Auto-checked out at 3.4.1 NOT AT tip of master
Cloning source pyyaks.
Auto-checked out at 3.3.4 NOT AT tip of master
Cloning source ska_path.
Auto-checked out at 3.1 NOT AT tip of master
Cloning source testr.
Auto-checked out at 3.2 which is also tip of master
Cloning source Ska.tdb.
Auto-checked out at 3.5.1 NOT AT tip of master
Cloning source Chandra.Time.
Auto-checked out at 3.20.1 NOT AT tip of master
Cloning source Ska.ParseCM.
Auto-checked out at 3.3.1 which is also tip of master
Cloning source Ska.DBI.
Auto-checked out at 3.8.2 NOT AT tip of master
Cloning source Ska.ftp.
Auto-checked out at 3.4.3 NOT AT tip of master
Cloning source Ska.Numpy.
Auto-checked out at 3.8.1 NOT AT tip of master
Cloning source Quaternion.
Auto-checked out at 3.4.1 which is also tip of master
Cloning source Ska.engarchive.
Auto-checked out at 3.43 NOT AT tip of master
Cloning source kadi.
Auto-checked out at 3.15.2 NOT AT tip of master
Cloning source Ska.Matplotlib.
Auto-checked out at 3.11.2 NOT AT tip of master
Cloning source Ska.quatutil.
Auto-checked out at 3.3.1 NOT AT tip of master
Cloning source Ska.Sun.
Auto-checked out at 3.5 which is also tip of master
Cloning source Chandra.Maneuver.
Auto-checked out at 3.7 which is also tip of master
Cloning source cmd_states.
Auto-checked out at 3.14 which is also tip of master
Cloning source xija.
Auto-checked out at 3.9 which is also tip of master
Cloning source maude.
Auto-checked out at 3.1 NOT AT tip of master

# I suppose we could also use github to get the most recent release (not tag)
if tag is None:
    tags = sorted(repo.tags, key=lambda t: t.commit.committed_datetime)
    repo.git.checkout(tags[-1].name)
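The sort-by-commit-date idea in this snippet can be illustrated with stand-in objects (an assumption: only `.name` and `.commit.committed_datetime` are used by the sort; a real repo would supply gitpython tag references):

```python
from datetime import datetime
from types import SimpleNamespace

def latest_tag_name(tags):
    # Same key as the snippet above: order tags by the commit date of
    # the commit each tag points at, then take the newest.
    return sorted(tags, key=lambda t: t.commit.committed_datetime)[-1].name

# Stand-in tag objects with hypothetical versions/dates.
tags = [
    SimpleNamespace(name="3.3", commit=SimpleNamespace(committed_datetime=datetime(2018, 1, 1))),
    SimpleNamespace(name="3.4", commit=SimpleNamespace(committed_datetime=datetime(2018, 6, 1))),
]
print(latest_tag_name(tags))  # → 3.4
```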
Contributor Author

This is obviously just one way of going about this. I'm also wondering now if the {{ GIT_DESCRIBE_TAG }} method can break if the tag isn't formatted in a way that conda will like. I note their docs say "Conda acknowledges PEP 440" but I don't really know what acknowledges means in that context. We can obviously just update tags to work if they are broken, so that's just something to keep in mind.
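For sanity-checking tag formats up front, a rough sketch (the regex below is a simplified subset of PEP 440, not the full grammar, and the helper name is made up):

```python
import re

# Simplified subset of PEP 440: a dotted release segment plus an optional
# pre-release suffix like "a1", "b2", or "rc1".  Good enough to flag tags
# such as "v3.1" or "release-2018" that conda's version handling may reject.
PEP440_ISH = re.compile(r"^\d+(\.\d+)*((a|b|rc)\d+)?$")

def tag_looks_pep440(tag_name):
    return bool(PEP440_ISH.match(tag_name))

for tag in ["3.43.4rc1", "3.1", "v3.1"]:
    print(tag, tag_looks_pep440(tag))
```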

Member

If we have any release tags that are not compliant then they should be fixed. From your previous output it appears this is a non-issue.

@taldcroft
Member

This is going in the right direction. Overall I feel like the original implementation and API can be simplified. @jzuhone maybe had some different uses in mind, but I feel like there basically just needs to be three methods, __init__, _get_repo(self, name, tag=None), and build_packages(self, name=None, tag=None).

_get_repo either clones a fresh copy or uses the existing repo, and takes care of the work of fetching / pulling to make sure the returned repo is at the correct tag (or latest if no tag is supplied). I don't immediately see a need for an API for separately cloning / updating; that just happens as a matter of course when building.

build_packages builds either all the packages (if name is None) or builds the specified package. This can be done neatly with:

if name is None:
    names = [nm.strip() for nm in open(BUILD_LIST, "r")
             if nm.strip() and not nm.strip().startswith('#')]
else:
    names = [name]
for name in names:
    repo = self._get_repo(name, tag)
    # Now do the few steps for building `repo`, probably checking to see if
    # the expected output conda package is already there.  This code is probably
    # short enough to just be in the loop here, but if not could be factored out to
    # self._build_package(repo)
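A runnable version of the list handling above, with the filtering pulled into a helper so it can be exercised directly (the function name and sample file contents are made up):

```python
def package_names(build_list_text, name=None):
    # Build everything in the list when no name is given, skipping blank
    # lines and '#' comments; otherwise build just the named package.
    if name is None:
        return [nm.strip() for nm in build_list_text.splitlines()
                if nm.strip() and not nm.strip().startswith("#")]
    return [name]

text = "# core packages\nSka.Shell\n\nkadi\n"
print(package_names(text))          # → ['Ska.Shell', 'kadi']
print(package_names(text, "xija"))  # → ['xija']
```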

except:
    yml = os.path.join(pkg_defs_path, name, "meta.yaml")
    with open(yml) as f:
        requires = False
Member

I know this is from #1, but what's going on with this pre-processing of the meta.yaml file? What fails if you just start from data = yaml.load(fh)? (PEP 8: avoid using single-letter variable names. My go-to idiom here is fh for filehandle.)

    url = data['about']['home']
    repo = git.Repo.clone_from(url, clone_path)
else:
    repo = git.Repo(clone_path)
Member

Don't you need to do the equivalent of git fetch origin to get new tags?

Contributor Author

Good point! Though I'm wondering if we need to think about the use cases and problems of the "already-existing" dir some more anyway.

Member

We stipulate that the existing directory is created and maintained only via this script. Problems then seem extremely unlikely.

Contributor Author

OK, so do we need a way to build a custom/test package? Or would we just do that outside this tool?

Member

@taldcroft taldcroft Jun 25, 2018

I think we should be able to handle this procedurally by making a release candidate tag such as 3.43.4rc1. Apart from testing the packaging itself, one can do functional/integration testing using pip install from the development git repo. I.e. spin up a dev Ska3, conda uninstall the package in question, pip install the dev version, then test. In that case an RC tag is not needed. But for the full process one can make the tag, build the package locally, and conda install using the local build path. Since "conda acknowledges PEP440", this should work.

But let's defer further specifics of this use case for a future PR if needed, it will be easier based on an already-working system. I'll stub a placeholder into the process wiki.

Contributor Author

So adding git fetch origin seems fine if you want to reuse the repo dirs over some period of time and want to make sure you have the newest tags from origin. We could also just clone the repos fresh for every run of the tool and remove this code path, but that does seem a little annoying.

Member

Yes, I envision an "official" build process for production, done as aca user, that uses a permanent location like /proj/sot/ska3/conda-builds, or something like that. Fetching and pulling is definitely faster than re-cloning the whole thing every time.

# tags = sorted(repo.tags, key=lambda t: t.tag.tagged_date)
# I suppose we could also use github to get the most recent release (not tag)
if tag is None:
    tags = sorted(repo.tags, key=lambda t: t.commit.committed_datetime)
Member

Agreed

@@ -11,32 +11,54 @@ class SkaBuilder(object):

def __init__(self, ska_root=None):
Member

I think this should be conda_build_root='.'. Using ska_root is confusing to me since I think that should be $SKA, which in general is a configured directory. Historically we did build there, but that is considered to have been a mistake. Also, just use the convention of tools that default to writing in the current directory unless told otherwise. It makes debugging and usage by multiple users go more smoothly.

Then there could be a production process that uses one special place, explicitly specified in a cron job.

Contributor Author

I went with build_root instead of conda_build_root but can obviously just edit again.

@@ -11,32 +11,54 @@ class SkaBuilder(object):

def __init__(self, ska_root=None):
Member

Add user='sot', git_repo_path='git@github.com:{user}/{name}.git' kwargs (and set corresponding instance attrs).
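A minimal sketch of how those kwargs might look on the class (attribute names follow the suggestion; the rest of the class body is elided and the `_repo_url` helper is hypothetical):

```python
import os

class SkaBuilder(object):
    def __init__(self, build_root=".", user="sot",
                 git_repo_path="git@github.com:{user}/{name}.git"):
        self.user = user
        self.git_repo_path = git_repo_path
        self.build_dir = os.path.abspath(os.path.join(build_root, "builds"))
        self.src_dir = os.path.abspath(os.path.join(build_root, "src"))

    def _repo_url(self, name):
        # Fill in the template with the configured user/org and repo name.
        return self.git_repo_path.format(user=self.user, name=name)

print(SkaBuilder()._repo_url("kadi"))  # → git@github.com:sot/kadi.git
```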

# Try ssh first to avoid needing passwords for the private repos
# We could add these ssh strings to the meta.yaml for convenience
try:
    git_ssh_path = 'git@github.com:sot/' + name + '.git'
Member

repo = git.Repo.clone_from(self.git_repo_path.format(user=self.user, name=name), clone_path)

@taldcroft
Member

With the changes here, it seems this is not so hardwired to building Ska; instead it is more about generally maintaining a list of conda packages. If you move these into the SkaBuilder init then this can really be used to build anything, given a directory of build recipes and a spec of the package order.

pkg_defs_path = os.path.join(ska_conda_path, "pkg_defs")
build_list = os.path.join(ska_conda_path, "build_order.txt")

@taldcroft
Member

The lines if name == "ska" should be replaced with generic "determine if this is a metapackage" code. I'm not sure of the most robust method there, but it seems that if the meta.yaml has no build instructions then that is a good indicator? Or no build.* file in the directory.

@taldcroft
Member

For an idea of which packages need some tag updates, here's the output with my silly print statements.

It turns out most of those NOT AT tip of master are because of the sot-wide update to add license lines to code files. I have done a scrub of that list and added appropriate release tags to the four actual cases where the tag doesn't match what is in skare/pkgs.manifest (py3).

@jeanconn
Contributor Author

At some point we probably also need to figure out build(er) requirements. ska_builder is presently using the git Python module which I don't think is in our current vanilla/non-ska conda env.

@taldcroft
Member

taldcroft commented Jun 25, 2018

At some point we probably also need to figure out build(er) requirements. ska_builder is presently using the git Python module which I don't think is in our current vanilla/non-ska conda env.

No worries. This is now captured in the process doc.

else:
    repo = git.Repo(clone_path)
    repo.remotes.origin.fetch()
    repo.remotes.origin.fetch("--tags")
Contributor Author

@jeanconn jeanconn Jun 25, 2018

As far as I could tell, the tag-fetching behavior seems to differ a bit based on git version, and I'm not sure about gitpython at all. Using both fetch and fetch --tags seemed, at worst, duplication.
I think that fetching from 'origin' will be appropriate in all cases.

Member

We can use conda git for this, right? That should make behavior uniform. git fetch origin alone will always get the tags.

Contributor Author

I think gitpython uses the git in your path, and yes, conda git should probably work as the thing it finds there. It should get added to requirements (and I don't recall which environments/installs had issues with https). When trying to figure out how to do a fetch with gitpython, I had also seen language that "most" tags should be reachable via git fetch, and hadn't figured out the exceptions yet.

Contributor Author

It looks to me that we need "--tags" to be safe if a tag has actually changed on origin. For example, I retagged the skare3 repo and wasn't getting the associated new commit until adding this back again. I know this shouldn't come up (you shouldn't re-use a tag), but...

We could also delete all local tags before the fetch, but that seems more problematic.

self.ska_build_dir, "--no-test",
"--no-anaconda-upload"]
self.build_dir, "--no-test",
"--no-anaconda-upload", "--skip-existing"]
Contributor Author

It looks like the --skip-existing option is smart enough that it skips only if you have an existing build at the requested version (not just any package built with that name), so this seems to just work for our use cases.

Contributor Author

For verification of the behavior, I had just done a single test of this outside this code:

  • built Ska.Shell at 3.3.1
  • tried to build again with --skip-existing, but no build was done
  • checked out the repo at 3.3.2
  • tried to rebuild with --skip-existing and it built 3.3.2

Hopefully there are no gotchas.

Contributor Author

I think this is not going to help conda "know" when any metapackages should be rebuilt, so if we want that automated, we will need to figure out a mechanism.

subprocess.run(cmd_list)

def build_one_package(self, name):
    repo = self._get_repo(name)
    if repo is not None:
        repo.remote().pull()
Contributor Author

I cut the pulls and the pull version checks below in favor of that --skip-existing option.

Member

Yes, pull was never needed as long as you do a fetch and subsequent checkout at the desired tag.

@@ -59,30 +81,20 @@ def _build_package(self, name):
print("Building package %s." % name)
pkg_path = os.path.join(pkg_defs_path, name)
cmd_list = ["conda", "build", pkg_path, "--croot",
self.ska_build_dir, "--no-test",
"--no-anaconda-upload"]
self.build_dir, "--no-test",
Contributor Author

It may take a little longer to build, but I'd prefer to fix the packages as needed so we can actually run the tests if provided.

Member

Can you elaborate? I don't understand this comment.

Contributor Author

I'd prefer to run the tests and fix any tests that need fixing, instead of skipping the tests with "--no-test".

Contributor Author

Oh, and testing or not, I think we should check the status of the conda build command and do something with it (stop or save to report at the end).

Member

As you know I'm not a big fan of build time testing, but we can discuss later when we have the core requirements all in place.

👍 on checking the build command status. Just call with check=True, timeout=120 so it raises an exception if anything went wrong or it takes too long.

Contributor Author

Maybe a test option for the build? Mentioning because we haven't defined a process to make new recipes. If we just plop them in the repo, the easiest way to determine they are complete and correct is to run the tests. For example, by running the build tests, I just discovered that the maude recipe needs to have the requests package added as a runtime dependency/requirement. Of course, I could run the build and build tests outside the ska_builder process, but I think that might be over-complicated.

Contributor Author

Also, that timeout won't work for some builds, so I'm not sure if it would be better to just not define one for now.

Member

OK, make it an option that is disabled by default.
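Pulling the threads of this discussion together, a sketch of what an opt-in test flag plus check=True (and no timeout) might look like; the function names are hypothetical:

```python
import subprocess

def conda_build_cmd(pkg_path, build_dir, run_tests=False):
    # Flags mirror the diff above; build tests are skipped unless the
    # caller opts in, and --skip-existing avoids rebuilding a version
    # that is already in the croot.
    cmd = ["conda", "build", pkg_path, "--croot", build_dir,
           "--no-anaconda-upload", "--skip-existing"]
    if not run_tests:
        cmd.append("--no-test")
    return cmd

def build_package(pkg_path, build_dir, run_tests=False):
    # check=True raises CalledProcessError if the build fails; no timeout,
    # since some package builds legitimately run long.
    subprocess.run(conda_build_cmd(pkg_path, build_dir, run_tests), check=True)
```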

Member

Agreed on not having the timeout.

self.user = user
self.git_repo_path = git_repo_path
self.build_dir = os.path.abspath(os.path.join(build_root, "builds"))
self.src_dir = os.path.abspath(os.path.join(build_root, "src"))
Contributor Author

I don't know if there would be any other issues using abspath when trying to work somewhat relatively (using a build root explicitly defined as "." or something else relative), but this seems to work.

Member

Using abspath tends to make logging output cruftier than required, but otherwise should almost always be OK.
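To illustrate the point: abspath resolves a relative build_root against the current working directory once, at init time, so later chdir calls won't change where builds land. A small demonstration (the values are hypothetical):

```python
import os

build_root = "."  # the relative default discussed above
build_dir = os.path.abspath(os.path.join(build_root, "builds"))
src_dir = os.path.abspath(os.path.join(build_root, "src"))

# Both are now absolute, anchored at the cwd in effect when this ran.
print(build_dir)
print(src_dir)
```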

self.git_repo_path = git_repo_path
self.build_dir = os.path.abspath(os.path.join(build_root, "builds"))
self.src_dir = os.path.abspath(os.path.join(build_root, "src"))
os.environ["SKA_TOP_SRC_DIR"] = self.src_dir
Contributor Author

Not sure if we want to use this as a somewhat global environment variable, or pass it to conda build as an env in the subprocess.

Member

I think it doesn't matter.

@taldcroft
Member

Do you see any need for the clone_ska_sources script? I don't.

@jeanconn
Contributor Author

It was helpful in testing/modifying the process to have a script separating the two tasks, but it can probably be safely removed when we're done.

@jeanconn
Contributor Author

I think I'd like to merge this and do some other changes in separate/smaller PRs. Were there any outstanding issues you really wanted to see addressed in this PR @taldcroft ?

Member

@taldcroft taldcroft left a comment

👍 for merge!

@jeanconn jeanconn merged commit 9c2c7cd into master Jun 28, 2018
@jeanconn jeanconn deleted the last_tag branch June 28, 2018 19:25
javierggt added a commit that referenced this pull request Jul 21, 2020