Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec extension proposal: "downloads" #487

Closed
AlexanderDzhoganov opened this issue Nov 29, 2014 · 13 comments · Fixed by #3877
Closed

Spec extension proposal: "downloads" #487

AlexanderDzhoganov opened this issue Nov 29, 2014 · 13 comments · Fixed by #3877
Labels
Enhancement New features or functionality Network Issues affecting internet connections of CKAN Spec Issues affecting the spec

Comments

@AlexanderDzhoganov
Copy link
Member

I propose that we implement an extension called 'downloads'

"downloads" : {
            "description" : "A list of alternative URLs where a mod can be downloaded by tools",
            "type"        : "array",
             "items"       : { "type" : "string" },
             "format"      : "uri"
}

For now this will mostly allow mirorred ckan metadata to continue to function if the mirror goes down among other things.

@AlexanderDzhoganov AlexanderDzhoganov added Spec Issues affecting the spec spec-extension labels Nov 29, 2014
@AlexanderDzhoganov AlexanderDzhoganov changed the title Extension proposal: x_downloads Spec extension proposal: "x_downloads" Nov 29, 2014
@AlexanderDzhoganov AlexanderDzhoganov added the Enhancement New features or functionality label Nov 29, 2014
@Ippo343
Copy link
Contributor

Ippo343 commented Dec 7, 2014

This issue is now particularly important at times like this when Kerbalstuff is down, making most mods in the index unavailable for download.

@TeddyDD
Copy link

TeddyDD commented Dec 9, 2014

Maybe it's better to change the type of "download" field to array in next spec version?

@Ippo343
Copy link
Contributor

Ippo343 commented Dec 9, 2014

I agree with TeddyDD: imho, "download" should take an array of download locations to try.

@pjf
Copy link
Member

pjf commented Dec 10, 2014

My brain hasn't properly spun up today, but things to consider with regards to multiple download locations are:

  1. What if the files are different in different locations?
  2. Internally, how do we cache these? Right now we cache by URL, but if we have multiple URLs for the same file, we need a way to check if we've already got the file from an alternate URL.
  3. What happens with mods which share a download path, and those have ended up with different download lists? This is especially common when a mod has been split into assets and config.

For 1, implementing checksums (#62) would help verify that what we downloaded is what we expect. For 2 and 3, one could always consider the first URL to be the canonical URL, and files are cached as if they were downloaded from there, even if they were not.

Point 3 could be potentially relieved by allowing mods to reference another mod's download section, rather than having to provide its own.

Although as mentioned in #489, the spec and CKAN core itself will never read, use, or validate any key starting with "x_", as they're specifically reserved for extensions which are not in the core, so this ticket could do with a rename (or a new ticket created) if we want to discuss extending the download field itself.

@TeddyDD
Copy link

TeddyDD commented Dec 10, 2014

  1. CKAN could recognize cached files by checksum if there would be more than one url in downloads (I assume checksums would be mandatory)

@pjf
Copy link
Member

pjf commented Dec 10, 2014

I assume checksums would be mandatory.

That would be nice, but also a barrier to humans writing the metadata themselves.

Having said that, caching by checksum if it exists is most definitely a good idea, and should be implemented if we add checksums to the spec.

Ref #62 .

@AlexanderDzhoganov AlexanderDzhoganov changed the title Spec extension proposal: "x_downloads" Spec extension proposal: "downloads" Dec 11, 2014
@pjf
Copy link
Member

pjf commented May 24, 2015

#935, while not superseding this, may certainly remove much of the need. :)

@pjf pjf added the Support Issues that are support requests label May 24, 2015
@TeddyDD
Copy link

TeddyDD commented May 24, 2015

That would be nice, but also a barrier to humans writing the metadata themselves.

This is simple to solve. We just need to add checksum calculation tool to netkan.exe (something like netkan.exe -checksum file.zip
Also Linux users already have sha1sum etc.

@netkan-bot
Copy link
Member

Hey there! I'm a fun-loving automated bot who's responsible for making sure old support tickets get closed out. As we haven't seen any activity on this ticket for a while, we're hoping the problem has been resolved and I'm closing out the ticket automaically. If I'm doing this in error, please add a comment to this ticket to let us know, and we'll re-open it!

@pjf pjf removed the Support Issues that are support requests label Jun 1, 2015
@dbent
Copy link
Member

dbent commented Feb 15, 2016

May as well re-open this as it's getting particularly relevant.

@dbent dbent reopened this Feb 15, 2016
@dbent
Copy link
Member

dbent commented Feb 15, 2016

My thoughts:

  • I support making download a string or array of string. (A string would obviously be treated as an array of string with a single element)
  • I support adding optional checksums and preferentially caching by the checksum. If there is no checksum, we'll cache by a hash of each download URL. This is not perfectly efficient since we could download duplicate copies of mods when we already have a local copy, but we should be striving to make checksums non-optional, so we can eventually rely only on that. I'll give additional thoughts in Support sha1/md5 fingerprints of downloaded files #62

@dbent
Copy link
Member

dbent commented Feb 15, 2016

Also, deprecate $kref so that a single mod can use multiple sources to automatically generate multiple download URLs.

Right now we have x_netkan_jenkins and will soon have x_netkan_github. The mere presence of these properties in a .netkan should indicate we want to use these sources.

For example:

{
    "x_netkan_jenkins": {
        "url": "https://ksp.sarbian.com/jenkins/job/ModuleManager/"
    },
    "x_netkan_github": {
        "user": "sarbian",
        "repo": "ModuleManager"
    }
}

Would generate multiple download URLs, one from Jenkins and one from GitHub. This would be trivial to do with how NetKAN's transformers work.

EDIT: Dealing with priorities may be a bit tricky though.

EDIT2: Thoughts on handling priorities:

Each source would specify a priority number, e.g.:

{
    "x_netkan_jenkins": {
        "url": "https://ksp.sarbian.com/jenkins/job/ModuleManager/",
        "priority": 1
    },
    "x_netkan_github": {
        "user": "sarbian",
        "repo": "ModuleManager",
        "priority": 2
    }
}

During transformation each source transformer would adds it download URL, size, and checksum to a temporary array:

{
    "x_netkan_downloads": [
        {
            "url": "https://jenkins.example/ModuleManager.zip",
            "checksum": {
                "sha256": "0123456789abcdef"
            },
            "size": 123456,
            "priority": 1
        },
        {
            "url": "https://github.example/ModuleManager.zip",
            "checksum": {
                "sha256": "0123456789abcdef"
            },
            "size": 123456,
            "priority": 2
        }
    ]
}

Then at the end will be a transformer which takes the x_netkan_downloads array and produces an ordered downloads array from it sorting them by the priority given in each. Multiple sources having the same priority would be logged as a warning. This download transformer would also check that each source has the same checksum and size. If they don't agree it would be an error and generation of the .ckan would be aborted. If they do agree it would write out the download_size and checksum that would be identical for every source.

EDIT3: An open question is what we should do if one download source fails during .ckan generation, abort the whole thing?

@TeddyDD
Copy link

TeddyDD commented Feb 15, 2016

@dbent 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New features or functionality Network Issues affecting internet connections of CKAN Spec Issues affecting the spec
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants