URL's missing a scheme fail silently #112

Open
RobRoseKnows opened this issue Feb 2, 2019 · 5 comments

@RobRoseKnows

If I create a furl object without a scheme like so:

>>> from furl import furl
>>> g = furl('www.google.com')

It fails in weird ways that aren't exactly intuitive.

I would expect that furl should either raise an error here or create a URL without a scheme.

Instead I get weird results like:

>>> g.origin
'://'
>>> g.host
>>> g.netloc

@gruns
Owner

gruns commented Feb 6, 2019

This admittedly head-scratching behavior occurs because it's ambiguous whether
furl should interpret www.google.com as a domain (your intention) or a path.
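
The same ambiguity can be seen outside furl. As a minimal sketch, the snippet below uses Python's standard `urllib.parse.urlsplit` (not furl itself) to show that a bare `www.google.com` is parsed as a path, while the same string with a scheme is parsed as a host:

```python
from urllib.parse import urlsplit

# Without a scheme, the whole string is treated as a relative path.
bare = urlsplit("www.google.com")
print(repr(bare.netloc), repr(bare.path))

# With a scheme and "//", the same text is treated as the network location.
with_scheme = urlsplit("http://www.google.com")
print(repr(with_scheme.netloc), repr(with_scheme.path))
```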

See Issue #85 and my detailed answer there, #85 (comment). Also see Issue
#103.

I will forfend this issue in the future with TLD support, so furl will be able
to determine that www.google.com is a domain, not a path, because it ends with
.com, a TLD.
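
A rough sketch of what such a TLD-based check might look like follows. This is not furl's implementation: the function name `looks_like_domain` and the tiny `KNOWN_TLDS` set are hypothetical, chosen only to illustrate the idea, and a real check would need the full list of TLDs.

```python
# Hypothetical, tiny subset of TLDs for illustration only.
KNOWN_TLDS = {"com", "org", "net", "io"}


def looks_like_domain(text):
    """Guess whether a scheme-less string starts with a hostname rather than a path."""
    first_segment = text.split("/", 1)[0]   # text before the first slash
    host = first_segment.split(":", 1)[0]   # drop an optional :port
    labels = host.lower().split(".")
    # Require at least two non-empty labels and a recognised final label.
    return len(labels) >= 2 and all(labels) and labels[-1] in KNOWN_TLDS


print(looks_like_domain("www.google.com"))
print(looks_like_domain("www.google.com/search"))
print(looks_like_domain("docs/index.html"))
```

A heuristic like this still guesses wrong for a relative path whose first segment happens to end in a TLD, such as a file named `archive.com`.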

Does that answer your question?

@RobRoseKnows
Author

Ah yes, that does explain the behavior. Is there currently a way to force furl to interpret a string as a domain if it's missing the delimiters? I'm using furl to parse URLs I'm scraping; some of them have schemes and some don't. For now I've worked around this by prepending "http://" to URLs missing "://", but that doesn't seem terribly robust.
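
One way to make the prepending workaround a little sturdier is sketched below. It uses only the standard library; the helper name `ensure_scheme` and the default of `http` are assumptions for illustration, not part of furl. It checks for a scheme only at the start of the string, so a `://` inside a query string does not fool it, and it also handles protocol-relative URLs that begin with `//`.

```python
import re
from urllib.parse import urlsplit

# A scheme is a letter followed by letters, digits, "+", "." or "-", then "://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it starts with a scheme, else prepend default_scheme."""
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url                          # e.g. https://example.org/a
    if url.startswith("//"):
        return f"{default_scheme}:{url}"    # protocol-relative URL
    return f"{default_scheme}://{url}"      # bare host or host/path


for raw in ["www.google.com", "https://example.org/a", "//cdn.example.net/x.js"]:
    fixed = ensure_scheme(raw)
    print(fixed, "->", urlsplit(fixed).netloc)
```

The normalized string can then be handed to `furl(...)`, which sees an explicit scheme and so treats `www.google.com` as the host. This still assumes every scheme-less input is meant as a host, which may not hold for scraped relative links.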

@fearless0307

Hi, I am thinking we could add an is_domain argument to the __init__ of the furl class and, according to its value, prepend :// to the URL.
I have changed the code in my local copy of the library and it works for me.

@fearless0307

>>> f = furl('www.google.com', is_domain=True)
>>> f.asdict()
{'url': '://www.google.com',
 'scheme': '',
 'username': None,
 'password': None,
 'host': 'www.google.com',
 'host_encoded': 'www.google.com',
 'port': None,
 'netloc': 'www.google.com',
 'origin': '://www.google.com',
 'path': {'encoded': '',
  'isdir': True,
  'isfile': False,
  'segments': [],
  'isabsolute': []},
 'query': {'encoded': '', 'params': []},
 'fragment': {'encoded': '',
  'separator': True,
  'path': {'encoded': '',
   'isdir': True,
   'isfile': False,
   'segments': [],
   'isabsolute': []},
  'query': {'encoded': '', 'params': []}}}

@gruns
Owner

gruns commented Nov 26, 2019

This issue is married to the resolution of #110
