Feature Request: Peeking at the beginning of the body (without getting the whole thing) #215
Comments
Depending upon server support, a Range header can limit how much of the body is returned. This retrieves the first 10 bytes:

require 'mechanize'
agent = Mechanize.new
page = agent.get "http://localhost", [], nil, 'Range' => 'bytes=0-9'
p page.body

Apache and WEBrick support range requests; Google does not. For one of the servers I tested, gzip encoding caused an indecipherable response body. I'm not sure whether that is a bug in the server, though. Mechanize does not copy headers across redirects at this time; I will fix that.

Adding fetching of the first 400 bytes is possible, but I'm unsure how to create a good API for it.
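Since whether the server honors the Range header is the uncertain part, here is a rough sketch of how a caller might detect that, using plain Net::HTTP rather than Mechanize. The names `first_bytes_range` and `fetch_head_of_body` are made up for illustration and are not part of any library. Sending `Accept-Encoding: identity` is an assumption aimed at the gzip problem mentioned above; a server is free to ignore it.

```ruby
require 'net/http'
require 'uri'

# Build a Range header value for the first n bytes.
# Byte ranges are inclusive, so the first 10 bytes are "bytes=0-9".
def first_bytes_range(n)
  raise ArgumentError, 'n must be positive' unless n.positive?
  "bytes=0-#{n - 1}"
end

# Hypothetical helper: ask for the first n bytes of a URL.
# Returns [honored, data], where honored is true only if the server
# answered 206 Partial Content. If the server ignores the Range header,
# the whole body is still downloaded and merely truncated locally.
def fetch_head_of_body(url, n)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = first_bytes_range(n)
  # Ask for an uncompressed body so the byte range applies to plain data.
  request['Accept-Encoding'] = 'identity'

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(request)
    honored = response.is_a?(Net::HTTPPartialContent)
    [honored, response.body.to_s.byteslice(0, n)]
  end
end
```

Calling `fetch_head_of_body("http://localhost", 10)` against a server with range support should return `true` plus at most 10 bytes; a `false` flag means the full body was transferred anyway.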
Cool! Thanks, Dr. B! The range query is a good stop-gap measure. (Good for people who want to resume downloads, as well.) I think the API for reading the first n bytes could be really simple -- just yield the body a chunk at a time as the data streams in. If the block terminates, then the transfer could also terminate. Something like:

agent.get(whatever).stream_body(blocksize=4096) do |chunk|
  puts "YAY I GOT A CHUNK!! (#{chunk.size} bytes)"
  break
end

Would that be possible, given Mechanize's current structure?
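For what it's worth, the "stop the transfer when the block terminates" idea can be sketched with plain Net::HTTP, outside Mechanize. `collect_prefix` and `peek_body` are hypothetical names, not Mechanize or Net::HTTP API. One wrinkle, as far as I can tell from the Net::HTTP source: a plain `break` inside `read_body` only leaves `read_body`, and Net::HTTP then reads the rest of the body itself. The sketch therefore uses `throw` to unwind past the whole request, so `Net::HTTP.start` closes the connection.

```ruby
require 'net/http'
require 'uri'

# Collect chunks until at least `limit` bytes have arrived, then stop.
# The block receives a callable that it should feed chunks into.
def collect_prefix(limit)
  buffer = String.new # binary buffer, since chunks arrive as raw bytes
  catch(:enough) do
    yield lambda { |chunk|
      buffer << chunk
      # Unwind out of whatever is producing the chunks.
      throw :enough if buffer.bytesize >= limit
    }
  end
  buffer.byteslice(0, limit)
end

# Hypothetical helper: read roughly the first `limit` bytes of a URL,
# abandoning the connection once enough data has streamed in.
def peek_body(url, limit)
  uri = URI(url)
  collect_prefix(limit) do |feed|
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(uri)) do |response|
        response.read_body { |chunk| feed.call(chunk) }
      end
    end
  end
end
```

Because the connection is dropped mid-response, it cannot be reused afterwards, which is the persistent-connection cost this approach pays.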
The greater problem is designing a good API for this. Changing Mechanize#get to stream if a block is given seems too constricting. Yielding a response object which the user can use to stream, like Net::HTTP streaming, seems too complicated for Mechanize:

connection.request req do |res|
  res.read_body do |chunk|
    # …
  end
end

So I think an entirely different API would be needed. The technical problems are minor: I foresee having to deal with content-encoding compression (streaming decompression must be added) and proper shutdown of persistent connections (using this feature may reduce performance).

What is your use case for only retrieving the first kilobyte of data?
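To give a feel for the streaming decompression part, here is a small sketch using Ruby's standard Zlib library, not anything Mechanize currently does. `inflate_prefix` is a hypothetical name. It assumes the chunks are the raw bytes of a gzip- or zlib-encoded body, arriving in order, and it stops once enough decompressed data is available.

```ruby
require 'zlib'

# Incrementally decompress chunks of a gzip/zlib stream and return
# roughly the first `limit` bytes of the decoded data.
def inflate_prefix(chunks, limit)
  # MAX_WBITS + 32 lets zlib auto-detect a gzip or zlib header.
  inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
  output = String.new
  chunks.each do |chunk|
    # inflate returns whatever can be decoded from the input seen so far.
    output << inflater.inflate(chunk)
    break if output.bytesize >= limit
  end
  output.byteslice(0, limit)
ensure
  # The stream is deliberately left unfinished, so close it explicitly.
  inflater.close if inflater
end
```

A usage example: `inflate_prefix(compressed_chunks, 400)` would stand in for "the first 400 bytes" of a compressed response, where `compressed_chunks` is any enumerable of byte strings.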
This may be related to what I'm missing: can I use it to do a GET request without ever touching even a bit of the body? And no, a HEAD request won't do. Also, I want Mechanize to handle all the redirects/authentication/cookies/whatnot. In other words, return me the response as it stands at the last point before Mechanize would start leeching the body. Why:
Plus, I may check the size via Content-Length, or maybe some hash if the backend server provides it.
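As a rough illustration of "GET, look at the headers, skip the body", here is a sketch with plain Net::HTTP. It does not handle redirects, authentication or cookies the way Mechanize would, and `get_headers_only` and `body_small_enough?` are made-up names. The operating system may still have buffered some body bytes before the connection is dropped, so "not even a bit" is not guaranteed at the TCP level.

```ruby
require 'net/http'
require 'uri'

# Decide from the headers alone whether the body is small enough to fetch.
# A missing Content-Length (e.g. chunked transfer) counts as unknown,
# so this conservatively answers false.
def body_small_enough?(response, max_bytes)
  length = response.content_length
  !length.nil? && length <= max_bytes
end

# Hypothetical helper: issue a GET and return the response headers as a
# hash, abandoning the connection before the body is read.
def get_headers_only(url)
  uri = URI(url)
  catch(:headers_seen) do
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(uri)) do |response|
        # Inside this block the headers are parsed but the body is unread.
        throw :headers_seen, response.to_hash
      end
    end
  end
end
```

The returned hash maps lowercase header names to arrays of values, so a size check would look at `headers['content-length']`.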
I've found myself recently needing the ability to read just the beginning of an HTTP response (just the first 1k), and it doesn't seem like Mechanize is currently outfitted to do this.
Pluggable Parsers let me get a body_io, so I tried to read just the first 400 bytes and then close it. No dice, unfortunately! Mechanize ends up downloading the whole thing!
Is there an easy way to do this? Or would this be a nasty hack?