Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Prefer requests over urllib2 #3856

Closed
wants to merge 2 commits into from

Conversation

hayd
Copy link
Contributor

@hayd hayd commented Jun 11, 2013

At the moment we use urllib2 for http requests, this branch prefers to use requests if it's installed. For one thing it means you can read gzipped json, which is more difficult in urllib2:

url = 'https://api.stackexchange.com/2.1/search?page=1&pagesize=10&order=desc&sort=activity&tagged=pandas&site=stackoverflow'
result = pd.read_json(url)  # in master this raises ValueError: Expected object or value

cc #3804 @jreback

Thoughts?

@hayd
Copy link
Contributor Author

hayd commented Jun 11, 2013

Probably needs some testing with and without requests installed (quite a few io tests skip for me travis).

resp = urllib2.urlopen(url)
if resp.code == 200:
lines = resp.read()
rs = read_csv(StringIO(bytes_to_str(lines)), index_col=0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, could put this as a helper function, also I seem to have missed out the bytes_to_str which is presumably there for a reason...

@jreback
Copy link
Contributor

jreback commented Jun 11, 2013

maybe a mention in install.rst of the requests module as an optional dep

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2013

might want to note also that read_html hasn't been tested with requests as well...

@hayd
Copy link
Contributor Author

hayd commented Jun 12, 2013

I had a go with html, but got some weird lxml error... (removed them in 2nd commit).

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2013

u have to catch Exception since you must be able to catch exceptions that are not due to invalid HTML and propagate them, even when lxml is not installed, the ValueError clause u changed probably caused the XMLSyntaxError to not be caught

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2013

it's a bit annoying since i actually have to call isinstance(e, XMLSyntaxError) but only when i know that lxml is installed (since that's where that is defined) thus all of the kludge around parsing...

@hayd
Copy link
Contributor Author

hayd commented Jun 12, 2013

Will have another play later in the week.

I actually had a similar thing in the request, checking for exceptions which the user may not have installed (I hid it behind an ImportError: pass). I definitely got into a muddle on that particular Exception...

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2013

i think there needs to be a mother-of-all-read-functions in io.common since there seems to be lots of repetition of the function with very tiny variations...i have an issue somewhere

@jreback
Copy link
Contributor

jreback commented Jun 12, 2013

yep...including the options NOT to read (e.g. I think html and read_csv want to read line-by-line), most others (e.g. json), io.data, etc. just need the buffer already read

@cpcloud
Copy link
Member

cpcloud commented Jun 12, 2013

html reads the buffer or punts the file-like object to the backend if possible. i think csv is the only iterable one so far

@hayd
Copy link
Contributor Author

hayd commented Jun 15, 2013

Do I need to worry about testing in Travis both with and without the requests module?

I'm concerned this might have some weird issues either way so push to 0.12?

@cpcloud
Copy link
Member

cpcloud commented Jun 15, 2013

there's the FULL_DEPS variable...check out the build matrix on a past run i don't remember which runs it's used on.

@jreback
Copy link
Contributor

jreback commented Jun 15, 2013

you could add to all if u want
update ci/install.sh
and ci/print_versions

@jreback
Copy link
Contributor

jreback commented Jul 26, 2013

see also #2426

@jreback
Copy link
Contributor

jreback commented Jul 26, 2013

related #3811

@jreback
Copy link
Contributor

jreback commented Jul 26, 2013

@hayd going to resurrect this for primary web access (with fallback to urllib2)
?

and put it io/common if not already there

then need to change uses of web acces across io to use this

and then docs!

@jreback
Copy link
Contributor

jreback commented Jul 27, 2013

see #4140 Google BigQuery is dependent on httplib2 is that a viable alternative here?

@hayd
Copy link
Contributor Author

hayd commented Jul 27, 2013

@jreback will have a loook

@hayd
Copy link
Contributor Author

hayd commented Aug 26, 2013

Not sure what to do with this one (certainly a lot has changed in io.common)

@hayd hayd closed this Aug 26, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants