Implement downloads (#416)
* chore: Update puma, jquery for tests

* feat: Implement downloads
route authored Nov 8, 2023
1 parent 46857b4 commit 14a0c6c
Showing 19 changed files with 291 additions and 84 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@
### Added
- `Ferrum::Page#disable_javascript` disables the JavaScript from the HTML source
- `Ferrum::Page#set_viewport` emulates the viewport
- `Ferrum::Downloads`
  - `#files` information about downloaded files
  - `#wait` wait for file download to be completed
  - `#set_behavior` where and whether to store file

### Changed

2 changes: 1 addition & 1 deletion Gemfile
@@ -7,7 +7,7 @@ gem "chunky_png", "~> 1.3"
gem "image_size", "~> 2.0"
gem "kramdown", "~> 2.0", require: false
gem "pdf-reader", "~> 2.2"
-gem "puma", "~> 4.1"
+gem "puma", ">= 5.6.7"
gem "rake", "~> 13.0"
gem "redcarpet", require: false, platform: :mri
gem "rspec", "~> 3.8"
85 changes: 65 additions & 20 deletions README.md
@@ -35,8 +35,8 @@ based on Ferrum and Mechanize.
* [Navigation](https://github.com/rubycdp/ferrum#navigation)
* [Finders](https://github.com/rubycdp/ferrum#finders)
* [Screenshots](https://github.com/rubycdp/ferrum#screenshots)
* [Cleaning Up](https://github.com/rubycdp/ferrum#cleaning-up)
* [Network](https://github.com/rubycdp/ferrum#network)
* [Downloads](https://github.com/rubycdp/ferrum#downloads)
* [Proxy](https://github.com/rubycdp/ferrum#proxy)
* [Mouse](https://github.com/rubycdp/ferrum#mouse)
* [Keyboard](https://github.com/rubycdp/ferrum#keyboard)
@@ -49,6 +49,7 @@ based on Ferrum and Mechanize.
* [Animation](https://github.com/rubycdp/ferrum#animation)
* [Node](https://github.com/rubycdp/ferrum#node)
* [Tracing](https://github.com/rubycdp/ferrum#tracing)
* [Clean Up](https://github.com/rubycdp/ferrum#clean-up)
* [Thread safety](https://github.com/rubycdp/ferrum#thread-safety)
* [Development](https://github.com/rubycdp/ferrum#development)
* [Contributing](https://github.com/rubycdp/ferrum#contributing)
@@ -411,25 +412,6 @@ browser.mhtml(path: "google.mhtml") # => 87742
```


## Cleaning Up

#### reset

Closes browser tabs opened by the `Browser` instance.

```ruby
# connect to a long-running Chrome process
browser = Ferrum::Browser.new(url: 'http://localhost:9222')

browser.go_to("https://github.com/")

# clean up, lest the tab stays there hanging forever
browser.reset

browser.quit
```


## Network

`browser.network`
@@ -608,6 +590,50 @@ Toggles ignoring cache for each request. If true, cache will not be used.
browser.network.cache(disable: true)
```


## Downloads

`browser.downloads`

#### files `Array<Hash>`

Returns information about every downloaded file as an `Array` of `Hash` objects, one per download.

```ruby
browser.go_to("http://localhost/attachment.pdf")
browser.downloads.files # => [{"frameId"=>"E3316DF1B5383D38F8ADF7485005FDE3", "guid"=>"11a68745-98ac-4d54-9b57-9f9016c268b3", "url"=>"http://localhost/attachment.pdf", "suggestedFilename"=>"attachment.pdf", "totalBytes"=>4911, "receivedBytes"=>4911, "state"=>"completed"}]
```
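
Because each entry is a plain `Hash` with the keys shown above, the list can be filtered with ordinary `Enumerable` methods. The following is only a sketch: it uses a hard-coded sample array in place of a live `browser.downloads.files` call, and both the second entry and the `completed_names` helper are made up for illustration.

```ruby
# Sample data shaped like the output above; the second entry is hypothetical.
SAMPLE_FILES = [
  { "guid" => "11a68745-98ac-4d54-9b57-9f9016c268b3",
    "suggestedFilename" => "attachment.pdf",
    "totalBytes" => 4911, "receivedBytes" => 4911, "state" => "completed" },
  { "guid" => "hypothetical-guid",
    "suggestedFilename" => "report.csv",
    "totalBytes" => 2048, "receivedBytes" => 512, "state" => "inProgress" }
].freeze

# Hypothetical helper: suggested names of the downloads that have finished.
def completed_names(files)
  files.select { |file| file["state"] == "completed" }
       .map { |file| file["suggestedFilename"] }
end

completed_names(SAMPLE_FILES) # => ["attachment.pdf"]
```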

#### wait(timeout)

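Waits until the download is finished or until `timeout` seconds (5 by default) have passed. If a block is given, it is called before waiting, so it can be used to trigger the download.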

```ruby
browser.go_to("http://localhost/attachment.pdf")
browser.downloads.wait
```

or

```ruby
browser.go_to("http://localhost/page")
browser.downloads.wait { browser.at_css("#download").click }
```
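
The reset, yield, then wait order is what makes the block form safe: the event is cleared before the click happens, so a fast download cannot finish unnoticed. Below is a rough, standard-library-only sketch of that ordering. `SketchEvent` and `wait_for_download` are hypothetical stand-ins rather than Ferrum's classes, and unlike the real method this helper returns whether the download finished in time.

```ruby
# Simplified stand-in for the event object (assumption: built on
# Mutex/ConditionVariable rather than Concurrent::Event).
class SketchEvent
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @set = true
  end

  def set
    @mutex.synchronize do
      @set = true
      @cond.broadcast
    end
  end

  def reset
    @mutex.synchronize { @set = false }
  end

  # Returns true if the event was set before the timeout elapsed.
  def wait(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @set
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0

        @cond.wait(@mutex, remaining)
      end
      @set
    end
  end
end

# Same order of operations as Downloads#wait: reset, run the block, wait.
def wait_for_download(event, timeout = 5)
  event.reset
  yield if block_given?
  finished = event.wait(timeout)
  event.set
  finished
end

event = SketchEvent.new
# The block stands in for a click that starts a download; the thread stands
# in for the browser reporting completion shortly afterwards.
wait_for_download(event, 1) do
  Thread.new do
    sleep 0.05
    event.set
  end
end
```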

#### set_behavior(\*\*options)

Sets how the browser handles file downloads: whether they are allowed and where the files are stored.

* options `Hash`
  * :save_path `String` absolute path of where to store the file
  * :behavior `Symbol` `deny | allow | allowAndName | default`, `allow` by default

```ruby
browser.go_to("https://example.com/")
browser.downloads.set_behavior(save_path: "/tmp", behavior: :allow)
```
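
Both options are checked before anything is sent to the browser: the behavior must be one of the four listed symbols and the path must be absolute. A minimal sketch of those two checks, using only the standard library. `validate_download_options` is a hypothetical helper, and it raises `ArgumentError` in both cases, whereas the library raises its own `Error` for a relative path.

```ruby
require "pathname"

VALID_BEHAVIOR = %i[deny allow allowAndName default].freeze

# Hypothetical helper repeating the two checks made before the
# Browser.setDownloadBehavior command is sent.
def validate_download_options(save_path:, behavior: :allow)
  raise ArgumentError, "unknown behavior: #{behavior}" unless VALID_BEHAVIOR.include?(behavior.to_sym)
  raise ArgumentError, "supply absolute path for `:save_path` option" unless Pathname.new(save_path.to_s).absolute?

  { downloadPath: save_path, behavior: behavior }
end

validate_download_options(save_path: "/tmp", behavior: :deny)
```

A relative path such as `"tmp"` or an unknown behavior such as `:maybe` raises before any command is issued.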


## Proxy

You can set a proxy with a `:proxy` option:
@@ -1210,6 +1236,25 @@ Accepts block, records trace and by default returns trace data from `Tracing.tra
only one trace config can be active at a time per browser.


## Clean Up

#### reset

Closes browser tabs opened by the `Browser` instance.

```ruby
# connect to a long-running Chrome process
browser = Ferrum::Browser.new(url: 'http://localhost:9222')

browser.go_to("https://github.com/")

# clean up, lest the tab stays there hanging forever
browser.reset

browser.quit
```


## Thread safety ##

Ferrum is fully thread-safe. You can create one browser or a few as you wish and
2 changes: 1 addition & 1 deletion lib/ferrum/browser.rb
@@ -20,7 +20,7 @@ class Browser
delegate %i[go_to goto go back forward refresh reload stop wait_for_reload
at_css at_xpath css xpath current_url current_title url title
body doctype content=
-headers cookies network
+headers cookies network downloads
mouse keyboard
screenshot pdf mhtml viewport_size device_pixel_ratio
frames frame_by main_frame
60 changes: 60 additions & 0 deletions lib/ferrum/downloads.rb
@@ -0,0 +1,60 @@
# frozen_string_literal: true

module Ferrum
  class Downloads
    VALID_BEHAVIOR = %i[deny allow allowAndName default].freeze

    def initialize(page)
      @page = page
      @event = Event.new.tap(&:set)
      @files = {}
    end

    def files
      @files.values
    end

    def wait(timeout = 5)
      @event.reset
      yield if block_given?
      @event.wait(timeout)
      @event.set
    end

    def set_behavior(save_path:, behavior: :allow)
      raise ArgumentError unless VALID_BEHAVIOR.include?(behavior.to_sym)
      raise Error, "supply absolute path for `:save_path` option" unless Pathname.new(save_path.to_s).absolute?

      @page.command("Browser.setDownloadBehavior",
                    browserContextId: @page.context.id,
                    downloadPath: save_path,
                    behavior: behavior,
                    eventsEnabled: true)
    end

    def subscribe
      subscribe_download_will_begin
      subscribe_download_progress
    end

    def subscribe_download_will_begin
      @page.on("Browser.downloadWillBegin") do |params|
        @event.reset
        @files[params["guid"]] = params
      end
    end

    def subscribe_download_progress
      @page.on("Browser.downloadProgress") do |params|
        @files[params["guid"]].merge!(params)

        case params["state"]
        when "completed", "canceled"
          @event.set
        else
          @event.reset
        end
      end
    end
  end
end
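
The two subscriptions in `lib/ferrum/downloads.rb` keep one `Hash` per download, keyed by `guid`, and merge each progress event into it. The sketch below replays that bookkeeping with plain method calls in place of browser events. `DownloadLog` is a hypothetical stand-in, and its `||= {}` guard for a progress event with an unknown `guid` is an addition that the commit itself does not have.

```ruby
# Hypothetical stand-in for the per-download bookkeeping.
class DownloadLog
  def initialize
    @files = {}
  end

  def files
    @files.values
  end

  # Corresponds to a Browser.downloadWillBegin event.
  def will_begin(params)
    @files[params["guid"]] = params.dup
  end

  # Corresponds to a Browser.downloadProgress event. Returns true once the
  # download has reached a final state.
  def progress(params)
    (@files[params["guid"]] ||= {}).merge!(params)
    %w[completed canceled].include?(params["state"])
  end
end

log = DownloadLog.new
log.will_begin("guid" => "abc", "suggestedFilename" => "attachment.pdf")
log.progress("guid" => "abc", "receivedBytes" => 100, "state" => "inProgress")
log.progress("guid" => "abc", "receivedBytes" => 4911, "state" => "completed")
log.files
```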
17 changes: 17 additions & 0 deletions lib/ferrum/event.rb
@@ -0,0 +1,17 @@
# frozen_string_literal: true

module Ferrum
  class Event < Concurrent::Event
    def iteration
      synchronize { @iteration }
    end

    def reset
      synchronize do
        @iteration += 1
        @set = false if @set
        @iteration
      end
    end
  end
end
44 changes: 16 additions & 28 deletions lib/ferrum/page.rb
@@ -1,12 +1,14 @@
# frozen_string_literal: true

require "forwardable"
require "ferrum/event"
require "ferrum/mouse"
require "ferrum/keyboard"
require "ferrum/headers"
require "ferrum/cookies"
require "ferrum/dialog"
require "ferrum/network"
require "ferrum/downloads"
require "ferrum/page/frames"
require "ferrum/page/screenshot"
require "ferrum/page/animation"
@@ -18,20 +20,6 @@ module Ferrum
class Page
GOTO_WAIT = ENV.fetch("FERRUM_GOTO_WAIT", 0.1).to_f

-class Event < Concurrent::Event
-def iteration
-synchronize { @iteration }
-end
-
-def reset
-synchronize do
-@iteration += 1
-@set = false if @set
-@iteration
-end
-end
-end
-
extend Forwardable
delegate %i[at_css at_xpath css xpath
current_url current_title url title body doctype content=
@@ -71,6 +59,11 @@ def reset
# @return [Cookies]
attr_reader :cookies

# Downloads object.
#
# @return [Downloads]
attr_reader :downloads

def initialize(target_id, browser, proxy: nil)
@frames = Concurrent::Map.new
@main_frame = Frame.new(nil, self)
@@ -91,6 +84,7 @@ def initialize(target_id, browser, proxy: nil)
@cookies = Cookies.new(self)
@network = Network.new(self)
@tracing = Tracing.new(self)
@downloads = Downloads.new(self)

subscribe
prepare_page
@@ -114,8 +108,10 @@ def go_to(url = nil)
options = { url: combine_url!(url) }
options.merge!(referrer: referrer) if referrer
response = command("Page.navigate", wait: GOTO_WAIT, **options)
-error_text = response["errorText"]
-raise StatusError.new(options[:url], "Request to #{options[:url]} failed (#{error_text})") if error_text
+error_text = response["errorText"] # https://cs.chromium.org/chromium/src/net/base/net_error_list.h
+if error_text && error_text != "net::ERR_ABORTED" # Request aborted due to user action or download
+raise StatusError.new(options[:url], "Request to #{options[:url]} failed (#{error_text})")
+end

response["frameId"]
rescue TimeoutError
@@ -259,9 +255,9 @@ def forward
history_navigate(delta: 1)
end

-def wait_for_reload(sec = 1)
+def wait_for_reload(timeout = 1)
@event.reset if @event.set?
-@event.wait(sec)
+@event.wait(timeout)
@event.set
end

@@ -356,6 +352,7 @@ def document_node_id
def subscribe
frames_subscribe
network.subscribe
downloads.subscribe

if @browser.options.logger
on("Runtime.consoleAPICalled") do |params|
@@ -398,16 +395,7 @@ def prepare_page
end
end

-if @browser.options.save_path
-unless Pathname.new(@browser.options.save_path).absolute?
-raise Error, "supply absolute path for `:save_path` option"
-end
-
-@browser.command("Browser.setDownloadBehavior",
-browserContextId: context.id,
-downloadPath: @browser.options.save_path,
-behavior: "allow", eventsEnabled: true)
-end
+downloads.set_behavior(save_path: @browser.options.save_path) if @browser.options.save_path

@browser.extensions.each do |extension|
command("Page.addScriptToEvaluateOnNewDocument", source: extension)
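
The `go_to` change in this file stops treating `net::ERR_ABORTED` as a failure, since that is the error text reported when navigation turns into a download. The condition can be sketched in isolation as follows. `navigation_failed?` is a hypothetical helper that takes a plain `Hash` in place of the `Page.navigate` response, and `net::ERR_NAME_NOT_RESOLVED` is used only as an example of another error text.

```ruby
# Hypothetical helper mirroring the new condition in Page#go_to.
def navigation_failed?(response)
  error_text = response["errorText"]
  !error_text.nil? && error_text != "net::ERR_ABORTED"
end

navigation_failed?({})                                           # no error text
navigation_failed?("errorText" => "net::ERR_ABORTED")            # download or user abort
navigation_failed?("errorText" => "net::ERR_NAME_NOT_RESOLVED")  # a real failure
```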
1 change: 1 addition & 0 deletions spec/browser_spec.rb
@@ -246,6 +246,7 @@
skip "https://bugs.chromium.org/p/chromium/issues/detail?id=1444729" if browser.headless_new?

browser.go_to("/#{filename}")
browser.downloads.wait

expect(File.exist?("#{save_path}/#{filename}")).to be true
ensure