Thibaut Barrère edited this page May 12, 2020 · 12 revisions

⚠️ This documentation is in draft mode.

Kiba Pro's ParallelTransform provides an easy way to process a group of rows concurrently, using a pool of threads.

In its current state, it is intended to speed up IO-bound ETL transforms, such as those making HTTP requests, by running them on multiple threads instead of a single one.
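The general idea of a thread pool working through a group of rows can be sketched in plain Ruby. This is a hypothetical illustration, not Kiba Pro's implementation: `process_in_parallel` is a made-up name, it uses core `Thread`, `Queue` and `Mutex` rather than concurrent-ruby, and `sleep` stands in for an HTTP request.

```ruby
# Hypothetical sketch, not Kiba Pro's implementation: a fixed pool of
# threads working through a group of rows, using only core Thread/Queue.
def process_in_parallel(rows, concurrency:, &block)
  queue = Queue.new
  rows.each_with_index { |row, index| queue << [row, index] }
  results = Array.new(rows.size)
  lock = Mutex.new

  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        begin
          row, index = queue.pop(true) # non-blocking: raises ThreadError when empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        value = block.call(row) # the slow, IO-bound part runs outside the lock
        lock.synchronize { results[index] = value } # keep input order
      end
    end
  end

  workers.each(&:join) # waits for every worker; re-raises a worker's exception
  results
end

rows = (1..20).map { |i| { id: i } }
enriched = process_in_parallel(rows, concurrency: 10) do |row|
  sleep 0.1 # stands in for an HTTP request
  row.merge(enriched: true)
end
```

On MRI, a thread blocked in `sleep` or in socket IO releases the global VM lock, so the workers overlap their waits. CPU-bound work would not get faster this way on MRI, which is why this approach targets IO-heavy transforms.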

Currently tested against MRI Ruby 2.4-2.7. It has not been strictly tested against JRuby or TruffleRuby yet, but will likely work there as well (if it does not, get in touch!).

Requirements: add concurrent-ruby to your Gemfile.
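With Bundler, this requirement amounts to one line in the Gemfile (left unpinned here; constrain the version as your project requires):

```ruby
gem 'concurrent-ruby'
```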

Typical use

require 'kiba-pro/transforms/parallel_transform'

job = Kiba.parse do
  extend Kiba::Pro::Transforms::ParallelTransform::DSLExtension

  # SNIP

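  # Rows are processed by a pool of up to 10 threads.
  parallel_transform(concurrency: 10) do |r|
    # get_extra_json_hash_from_http! is your own application helper (not shown)
    extra_data = get_extra_json_hash_from_http!(r.fetch(:extra_data_url))
    r.merge(extra_data: extra_data)
  end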
  
  # SNIP
end
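The example relies on a helper named `get_extra_json_hash_from_http!`, which this page does not define. One hypothetical way to write it with Ruby's standard `net/http`, `uri` and `json` libraries is below; the timeout defaults are arbitrary assumptions. Setting explicit timeouts bounds how long a slow endpoint can hold on to one of the pool's threads.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper: fetches a URL and parses its JSON body into a Hash
# with symbol keys. The bang signals that it raises on failure.
def get_extra_json_hash_from_http!(url, open_timeout: 5, read_timeout: 10)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: open_timeout,
                             read_timeout: read_timeout) do |http|
    http.request(Net::HTTP::Get.new(uri))
  end
  # Net::HTTPResponse#value raises for any non-2xx status
  response.value
  JSON.parse(response.body, symbolize_names: true)
end
```

Raising instead of returning an empty hash keeps a failed lookup from silently producing an incomplete row.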

The parallel_transform call is actually a shortcut for:

transform Kiba::Pro::Transforms::ParallelTransform,
  concurrency: 10,
  on_row: -> (r) {
    extra_data = get_extra_json_hash_from_http!(r.fetch(:extra_data_url))
    r.merge(extra_data: extra_data)
  }
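To illustrate what a transform configured with `concurrency:` and `on_row:` might do, here is a hypothetical, simplified class. `BatchedThreadTransform` is a made-up name, not Kiba Pro's code. It assumes Kiba's streaming runner, in which a transform's `process` and `close` methods may `yield` output rows, and it swaps the thread pool for a simpler batch-and-join approach.

```ruby
# Hypothetical sketch (not Kiba Pro's implementation) of a transform taking
# `concurrency:` and `on_row:` options. Assumes Kiba's streaming interface,
# where `process` and `close` may `yield` output rows.
class BatchedThreadTransform
  def initialize(concurrency:, on_row:)
    @concurrency = concurrency
    @on_row = on_row
    @buffer = []
  end

  # Buffer incoming rows; once a full batch is collected, process it.
  def process(row, &emit)
    @buffer << row
    flush(&emit) if @buffer.size >= @concurrency
    nil # rows are emitted through yield, not returned
  end

  # Process whatever is left once the source is exhausted.
  def close(&emit)
    flush(&emit) unless @buffer.empty?
    nil
  end

  private

  # One thread per buffered row. Thread#value waits for each thread and
  # re-raises its exception; results come back in input order.
  # nil results are dropped, mirroring how Kiba treats a nil transform result.
  def flush
    threads = @buffer.map { |row| Thread.new(row) { |r| @on_row.call(r) } }
    @buffer = []
    threads.map(&:value).each { |result| yield(result) unless result.nil? }
  end
end
```

Because each batch waits for its slowest row before the next batch starts, this design is easier to reason about than a pool, but it can leave threads idle when response times vary.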

Technical notes

Exception handling

Handling timeouts

Working with Sidekiq