EggStorage doesn't pick the latest version of the project with GIT #34

jnv · 2014-01-19T19:49:52Z

When I deploy a project with version = GIT and later schedule the job, scrapyd sometimes picks some older version of the project.

If I am correct, this is because FilesystemEggStorage sorts versions by the eggs' filenames and then picks the last one. This is fine for Mercurial and timestamp versioning, since both are sequential. But Git hashes are not ordered, so sometimes an older version of the project takes precedence.
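
A minimal sketch of the failure mode, assuming plain string sorting. The hashes and the `pick_latest` helper are made up for illustration and are not scrapyd's actual code:

```python
# Hypothetical short hashes, listed in the order they were deployed.
deployed = ["f3a91c2", "01f3d89", "bd11f43"]


def pick_latest(versions):
    """Mimic 'sort the version strings and take the last one'."""
    return sorted(versions)[-1]


latest_by_sort = pick_latest(deployed)
print(latest_by_sort)                  # f3a91c2
print(latest_by_sort == deployed[-1])  # False: the newest deploy is bd11f43
```

The first deployed egg wins here only because its hash happens to start with the highest character.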

Since scrapyd-deploy gets the Git version from git describe --always, the revisions are apparently supposed to be tagged sequentially (v1.0 and so on), so that the eggs are ordered correctly. I think this fact is worth mentioning in the documentation, don't you think?


pablohoffman · 2014-04-25T21:09:35Z

Agreed. Maybe we should make the GIT version sortable including the number of commits for example?

jnv · 2014-04-25T22:11:08Z

Yeah, number of commits should work unless you retroactively squash some commits or reset the branch. Apparently git-describe's output includes the number of commits since the last tag, but you need to have one. I think this should not be a problem for most projects; there just needs to be a big fat red warning in the documentation.

pablohoffman · 2014-04-25T22:34:18Z

Yeah, I was thinking something along the lines of git rev-list --count HEAD, and we definitely need to add a note to the documentation, warning about squashes.
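
A rough sketch of that idea, under assumptions: the `r<count>-<hash>` format, the zero-padding width and all function names are hypothetical, chosen so the result sorts correctly even as a plain string. Only the two git commands are real.

```python
import subprocess


def sortable_version(commit_count, short_hash, width=6):
    """Build a version string whose order follows the commit count.

    The count is zero-padded so that plain string sorting also works.
    """
    return "r" + str(commit_count).zfill(width) + "-" + short_hash


def git_output(*args):
    """Run a git command in the current directory and return its output."""
    return subprocess.check_output(("git",) + args).decode().strip()


def current_git_version():
    """Derive the version from the repository in the current directory."""
    count = int(git_output("rev-list", "--count", "HEAD"))
    short_hash = git_output("rev-parse", "--short", "HEAD")
    return sortable_version(count, short_hash)


# Commit 10 sorts after commit 9 regardless of what the hashes look like.
versions = [sortable_version(10, "aaaaaaa"), sortable_version(9, "fffffff")]
print(sorted(versions)[-1])  # r000010-aaaaaaa
```

As noted above, squashing or resetting the branch lowers the count, so the ordering only holds for history that moves forward.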

Blender3D · 2014-05-03T04:00:43Z

Just ran into this problem today. Is there any reason to maintain multiple eggs in the first place? Scrapyd doesn't offer you any way to schedule a spider to run using a specific version of an egg, so would it be easier to just have a single egg per project (or a symlink to the last-uploaded one)?

dangra · 2014-05-07T15:47:53Z

A really simple workaround for this issue is to add an annotated tag to your git project. I tried this on the https://github.com/scrapinghub/testspiders project:

$ git describe --always 
bd11f43
$ git tag -a v1.0 01f3d898cb99500bd933d729ca100396a27dd160
$ git describe --always 
v1.0-3-gbd11f43

The "3" above is the commit count since the last annotated tag, so it should work around the scrapyd sorting issue.
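
For reference, a small sketch that splits such output into its parts. The regular expression and the `parse_describe` name are my own assumptions, based only on the `tag-count-ghash` shape shown above:

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Return (tag, commit_count, sha), or None if there is no tag part."""
    match = DESCRIBE_RE.match(text)
    if match is None:
        return None
    return match.group("tag"), int(match.group("count")), match.group("sha")


print(parse_describe("v1.0-3-gbd11f43"))  # ('v1.0', 3, 'bd11f43')
print(parse_describe("bd11f43"))          # None: untagged, bare hash only
```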

dangra · 2014-05-07T15:59:44Z

~~hmm ta.. it won't work for a commit count greater than 9, sorry about that :)~~

dangra · 2014-05-07T17:25:25Z

what if instead of relying on version sorting we add a symlink that points to the latest uploaded egg?

umrashrf · 2014-05-08T10:03:40Z

will sorting by modified time of egg files work? umrashrf@a5ab226

dangra · 2014-05-08T13:18:06Z

@umrashrf yes, mtime sorting and symlinking are basically the same: both ignore version sorting and rely on the latest uploaded egg. I prefer symlinking because it is O(1) and you can easily see in the filesystem which version it points to.
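
A minimal sketch of the symlink approach, assuming a POSIX filesystem. The `latest.egg` link name and both helpers are hypothetical, not what scrapyd does today:

```python
import os
import tempfile


def point_latest_at(eggs_dir, egg_name, link_name="latest.egg"):
    """Repoint the 'latest' symlink at egg_name.

    The new link is created under a temporary name and then renamed over
    the old one, so readers never see a missing link.
    """
    link_path = os.path.join(eggs_dir, link_name)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(egg_name, tmp_path)   # relative target inside eggs_dir
    os.replace(tmp_path, link_path)  # renames the link itself
    return link_path


def latest_egg(eggs_dir, link_name="latest.egg"):
    """Look up the current egg with a single readlink call."""
    return os.readlink(os.path.join(eggs_dir, link_name))


with tempfile.TemporaryDirectory() as eggs_dir:
    for name in ("f3a91c2.egg", "bd11f43.egg"):
        open(os.path.join(eggs_dir, name), "wb").close()
        point_latest_at(eggs_dir, name)
    print(latest_egg(eggs_dir))  # bd11f43.egg
```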

jnv · 2014-05-08T13:27:52Z

I am afraid mtime can be very brittle to rely on, as it may easily be modified by an outside process. Furthermore, there is already a timestamp versioning strategy, which is simpler and more robust; the only thing it is missing is deduplication, which the commit ID guarantees (IMO no big deal).

I agree with @dangra, symlinking may be the way to go, it could also be used by all strategies.

umrashrf · 2014-05-08T14:02:50Z

Right! +1 for symlinking. I am sure it is not as easy as changing def list(...) :)

pablohoffman · 2014-05-08T14:26:25Z

In practice, scrapyd supports a single version. Should we simplify it and keep only one version? (with an atomic rename to prevent race conditions, ofc)
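
Something along these lines, as a sketch only. The `project.egg` file name and the `store_single_egg` helper are hypothetical:

```python
import os
import tempfile


def store_single_egg(project_dir, egg_bytes, name="project.egg"):
    """Replace the project's only egg via write-to-temp plus rename.

    The temp file lives in the same directory, so os.replace stays on one
    filesystem and readers see either the old egg or the new one.
    """
    os.makedirs(project_dir, exist_ok=True)
    final_path = os.path.join(project_dir, name)
    fd, tmp_path = tempfile.mkstemp(dir=project_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(egg_bytes)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


with tempfile.TemporaryDirectory() as base:
    store_single_egg(base, b"first upload")
    path = store_single_egg(base, b"second upload")
    with open(path, "rb") as egg:
        print(egg.read())        # b'second upload'
    print(os.listdir(base))      # ['project.egg']
```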

jnv · 2014-05-08T15:23:50Z

👍 for single version. Keep things simple.

dangra · 2014-05-08T18:05:56Z

> hmm ta.. it won't work for a commit count greater than 9, sorry about that :)

I take that back: using annotated tags works fine, because scrapyd sorts the versions using distutils.version.LooseVersion:

>>> from distutils.version import LooseVersion
>>> LooseVersion('v1.0-2-b') < LooseVersion('v1.0-10-a')
True
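
To show why numeric-aware comparison matters once the commit count passes 9, here is a hand-rolled stand-in. `natural_key` is my own simplified helper, not LooseVersion itself, and the version strings are made up:

```python
import re


def natural_key(version):
    """Split into digit and non-digit runs; digit runs compare as integers.

    Each part is tagged so that an int is never compared with a str.
    """
    parts = re.findall(r"\d+|\D+", version)
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


versions = ["v1.0-2-gbd11f43", "v1.0-10-g01f3d89"]

# Plain string sorting puts "10" before "2" and picks the older build.
print(sorted(versions)[-1])                   # v1.0-2-gbd11f43
# Numeric-aware sorting compares 2 < 10 and picks the newer one.
print(sorted(versions, key=natural_key)[-1])  # v1.0-10-g01f3d89
```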

dangra · 2014-05-08T18:07:32Z

> I think this fact is worth mentioning in the documentation, don't you think?

👍 in summary I prefer to fix the issue in docs instead of changing the behavior for everyone else not using GIT.

dangra · 2014-05-09T02:32:03Z

Symlink idea implemented by #46. Feedback welcome!

dangra · 2014-05-09T02:39:22Z

> 👍 for single version. Keep things simple.

@jnv: The problem with a single version is that you can't consider it a bugfix: it changes all scrapyd APIs, including the public webservice API endpoints listversions.json and deleteversion.json.

I think versions were added to scrapyd so people can schedule jobs with specific versions (think about stable/testing/feature versions), but so far the schedule.json API doesn't support passing a version, so removing "versions" won't affect users (in theory).

dangra · 2014-05-09T03:21:29Z

> > I think this fact is worth mentioning in the documentation, don't you think?
>
> in summary I prefer to fix the issue in docs instead of changing the behavior for everyone else not using GIT.

Implemented by #47

umrashrf · 2014-05-12T09:11:53Z

I noticed that if version = is left empty in scrapy.cfg, then time.time() is sent to scrapyd as the version and used to sort the egg files, which works like #46. Right?

dangra · 2014-05-12T12:45:26Z

> I noticed that if version = is left empty in scrapy.cfg, then time.time() is sent to scrapyd as the version and used to sort the egg files, which works like #46. Right?

Yes, it works like that because timestamp versioning is LooseVersion-sortable.
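
Roughly, as a sketch of the behavior described (the `resolve_version` helper is hypothetical and is not scrapyd-deploy's actual code):

```python
import time


def resolve_version(configured):
    """Fall back to the current Unix time when no version is configured."""
    if not configured:
        return str(int(time.time()))
    return configured


print(resolve_version("v1.0"))  # v1.0
print(resolve_version(""))      # current Unix time as a digit string

# Later deploys get larger numbers, so the newest egg sorts last.
earlier, later = "1399888000", "1399898000"
print(sorted([later, earlier])[-1])  # 1399898000
```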

umrashrf · 2014-05-12T18:04:43Z

It would work even without LooseVersion sorting, because the time is sent as a string of an integer.

Do you think this will fix #45 until we have #46 and/or #47 merged?

dangra · 2014-05-12T18:08:29Z

to fix #45 and still use GIT friendly versioning you just do #34 (comment)

dangra · 2014-05-26T14:41:09Z

fixed by #47

pablohoffman mentioned this issue Apr 30, 2014

Inconsistent spiders among different servers #45

Closed

dangra mentioned this issue May 9, 2014

Use most recently uploaded egg by default ignoring versioning #46

Closed

dangra mentioned this issue May 9, 2014

fix GIT versioning for projects without annotated tags #47

Merged

dangra closed this as completed May 26, 2014

dangra mentioned this issue Oct 6, 2014

If using GIT as version, scrapyd will not execute your latest version #66

Closed
