Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve relative urls in catalog references #35

Merged
merged 4 commits into from
Jul 3, 2015

Conversation

ScottWales
Copy link

The leading '/' in catalog references is needed to construct the full URL path from a catalog reference.

For instance, in the catalog http://thredds-test.unidata.ucar.edu/thredds/testDatasets.xml the reference TestFmrc points to /thredds/catalog/fmrc/ecmwf/global_2p5/catalog.xml. Resolving the relative URL should give http://thredds-test.unidata.ucar.edu/thredds/catalog/fmrc/ecmwf/global_2p5/catalog.xml. Removing the /, as is being done in CatalogRef.__init__(), means the URL is constructed relative to the current catalog path, rather than being relative to the hostname.

Scott Wales added 3 commits July 2, 2015 12:30
It should be possible to connect to a catalog reference using the base
catalog's URL and the reference's href
The leading '/' in reference URLs is needed in order to properly
construct the full URL from a relative URL
@dopplershift
Copy link
Member

In principle I have no problem with this, but I see @lesserwhirls is responsible for the lines in question, so I'd like a second from him.

Thanks for the contribution and tests!

@lesserwhirls
Copy link
Collaborator

First off, I echo @dopplershift - thank you @ScottWales for the contribution and tests!

It's a minor change, and I don't see anything wrong there. At some point soon, however, we should probably look at the module urlparse (python2, in 3 it is urllib.parse) to handle joining the fragments of the url.

We can accept this as is as far as I am concerned just to get it working now. That said - @ScottWales, are you feeling frisky, because you could try to use urlparse in the fix to see how it goes.

Python 2:

import urlparse
urlparse.urljoin(base, catalog_ref)

Python 3:

import urllib.parse
urllib.parse.urljoin(base, catalog_ref)

Note that catalog_ref can be a relative path, or a fully qualified path. If relative, it is joined to base; if fully qualified, it is used as is. So a bonus of using urlparse is that it will "do the right thing" under the hood if the for relative and fully qualified catalogRefs.

 * Resolve relative hrefs to absolute URLs using urllib
 * Add CatalogRef.follow() to get a TDSCatalog from a CatalogRef
@ScottWales
Copy link
Author

Happy to help. I've modified the pull request so that CatalogRef.href now stores the absolute URL to the target catalog, calculated using urljoin from the catalog's base URL.

I've also added the CatalogRef.follow() function mentioned in #20 since it has a trivial implementation, and updated the test case to use it.

@landscape-bot
Copy link

Code Health
Code quality remained the same when pulling 1f053a3 on ScottWales:relative-url into c846074 on Unidata:master.

@lesserwhirls
Copy link
Collaborator

That's great @ScottWales - thank you!

lesserwhirls added a commit that referenced this pull request Jul 3, 2015
Preserve relative urls in catalog references

fixes #35
address part one of #20
@lesserwhirls lesserwhirls merged commit 14f2af9 into Unidata:master Jul 3, 2015
lesserwhirls added a commit that referenced this pull request Jul 3, 2015
Preserve relative urls in catalog references

address part one of #20
@lesserwhirls lesserwhirls mentioned this pull request Jul 3, 2015
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants