EggStorage doesn't pick the latest version of the project with GIT #34
Comments
Agreed. Maybe we should make the GIT version sortable, including the number of commits for example?
Yeah, number of commits should work unless you retroactively squash some commits or reset the branch. Apparently git-describe's output includes the number of commits since the last tag, but you need to have one. I think this should not be a problem for most projects; there just needs to be a big fat red warning in the documentation.
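A minimal sketch of the commit-count idea, assuming the version string has the usual `git describe` shape `<tag>-<count>-g<hash>`. The function name and the sample version strings are hypothetical and are not part of scrapyd. It shows why the count has to be compared as a number rather than as text.

```python
import re

# Matches "<tag>-<count>-g<abbreviated hash>", the shape git describe
# prints when the commit is not itself tagged.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")

def describe_sort_key(version):
    """Hypothetical sort key: (tag, commits since tag as an integer)."""
    match = DESCRIBE_RE.match(version)
    if match is None:
        # A bare tag (or a bare hash from --always): no commit count available.
        return (version, 0)
    return (match.group("tag"), int(match.group("count")))

# Made-up versions for illustration only.
versions = ["v1.0-10-gabc1234", "v1.0-2-g9f8e7d6", "v1.0-3-g0a1b2c3"]

newest_numeric = max(versions, key=describe_sort_key)
newest_as_text = max(versions)
```

Here `newest_numeric` is the version with 10 commits since the tag, while a plain string comparison would pick the one with 3. The sketch still assumes all eggs share the same tag and that history is never rewritten.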
Yeah, I was thinking something along the lines of
Just ran into this problem today. Is there any reason to maintain multiple eggs in the first place? Scrapyd doesn't offer you any way to schedule a spider to run using a specific version of an egg, so would it be easier to just have a single egg per project (or a symlink to the last-uploaded one)?
A really simple workaround for this issue is to add an annotated tag to your git project. I tried this on the https://github.com/scrapinghub/testspiders project:
The "3" above is the commit count since the last annotated tag, so it should work around the scrapyd sorting issue.
What if, instead of relying on version sorting, we add a symlink that points to the latest uploaded egg?
Will sorting by the modified time of egg files work? umrashrf@a5ab226
@umrashrf yes, mtime sorting and symlinking are basically the same: both ignore version sorting and rely on the latest uploaded egg. I prefer symlinking because it is O(1) and you can easily see in the filesystem which version it is pointing to.
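To make the comparison concrete, here is a rough sketch of both strategies. All function names, the `latest` link name, and the flat directory layout are assumptions for illustration, not scrapyd's actual storage code. The symlink update below is also deliberately simple and not atomic.

```python
import os

def latest_by_mtime(eggdir):
    """Scan every egg and return the most recently modified one: O(n)."""
    eggs = [os.path.join(eggdir, name)
            for name in os.listdir(eggdir) if name.endswith(".egg")]
    return max(eggs, key=os.path.getmtime) if eggs else None

def mark_latest(eggdir, egg_name, link_name="latest"):
    """Point a symlink (hypothetical name 'latest') at the egg just uploaded."""
    link = os.path.join(eggdir, link_name)
    if os.path.lexists(link):
        os.remove(link)  # not atomic: a reader could miss the link briefly
    os.symlink(egg_name, link)

def latest_by_symlink(eggdir, link_name="latest"):
    """Resolve the symlink: a single lookup, regardless of how many eggs exist."""
    link = os.path.join(eggdir, link_name)
    if not os.path.islink(link):
        return None
    return os.path.join(eggdir, os.readlink(link))
```

Calling `mark_latest(eggdir, "<version>.egg")` after each upload keeps the link current, and `ls -l` on the directory shows which egg it points to.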
I am afraid I agree with @dangra: symlinking may be the way to go, and it could also be used by all strategies.
Right! +1 for symlinking. I am sure it is not as easy as changing def list(...) :)
In practice, scrapyd supports a single version. Should we simplify it and keep only one version? (with an atomic rename to prevent race conditions, ofc)
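A sketch of what the atomic rename could look like for a single-egg layout, assuming one `<project>.egg` file per project. The function name and layout are hypothetical. The point is that the new egg is written in full to a temporary file in the same directory and only then renamed over the old one, so a reader sees either the old egg or the new one, never a half-written file.

```python
import os
import tempfile

def store_single_egg(eggdir, project, data):
    """Replace <eggdir>/<project>.egg with `data` via write-then-rename."""
    os.makedirs(eggdir, exist_ok=True)
    final_path = os.path.join(eggdir, project + ".egg")
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=eggdir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```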
👍 for single version. Keep things simple.
I am backtracking on my own suggestion here: using annotated tags works fine because scrapyd sorts the versions using
👍 In summary, I prefer to fix the issue in the docs instead of changing the behavior for everyone else not using GIT.
Symlink idea implemented by #46. Feedback welcome!
@jnv: The problem with a single version is that you can't consider it a bugfix; it changes all scrapyd APIs, including public webservice API endpoints. I think versions were added to scrapyd so people can schedule jobs with specific versions (think about stable/testing/feature versions), but so far
Implemented by #47
I noticed that if
Yes, it works like that because timestamp versioning is LooseVersion sortable.
It would work even without being "LooseVersion sortable", because the time is sent as a string of an integer. Do you think this will fix #45 until we have #46 and/or #47 merged?
To fix #45 and still use GIT-friendly versioning, you just do #34 (comment)
Fixed by #47
When I deploy a project with version = GIT and later schedule the job, scrapyd sometimes picks some older version of the project.

If I am correct, this is due to the fact that FilesystemEggStorage sorts versions by the eggs' filenames and then picks the last one. This is fine for Mercurial and timestamp versioning, since both are sequential. But Git hashes are not ordered, so sometimes an older version of the project takes precedence.

Since scrapyd-deploy picks the Git version with git describe --always, the revisions are apparently supposed to be sequentially tagged (like v1.0 and so on), so that the eggs will then be ordered correctly. I think this fact is worth mentioning in the documentation, don't you think?