Commit

bug(DataExport) - Decrease batch size for data export & use tempfile (#2786)

## Description

When running Sidekiq with a high amount of concurrency, the default batch
size could result in a large number of long-running jobs, which could
make a worker consume too much memory.

This PR fine-tunes the batch size by reducing it by a factor of 5, so
individual jobs finish faster and consume less memory. It also stops
keeping the entire CSV in memory, writing to a tempfile instead.
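The tempfile approach described above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names (`with_csv`, `"export_part"`), not the PR's exact code: CSV rows are streamed into a `Tempfile` and the caller reads and removes the file afterwards.

```ruby
require "csv"
require "tempfile"

# Stream CSV rows into a tempfile instead of accumulating the whole
# document in a string. CSV.new wraps the File and writes rows to it.
def with_csv
  tempfile = Tempfile.create(["export_part", ".csv"])
  yield CSV.new(tempfile, headers: false)
  tempfile.rewind
  tempfile
end

file = with_csv do |csv|
  csv << ["id", "amount_cents"]
  csv << [1, 100]
end

content = file.read
puts content

# Tempfile.create returns a plain File with no cleanup finalizer,
# so the caller closes and unlinks it explicitly.
file.close
File.unlink(file.path)
```

Because the rows only ever exist on disk (plus CSV's small write buffer), peak memory stays flat regardless of how many rows an export part contains.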
nudded authored Nov 7, 2024
1 parent 2a0f21a commit a677283
Showing 6 changed files with 26 additions and 6 deletions.
2 changes: 1 addition & 1 deletion app/services/data_exports/csv/invoice_fees.rb
@@ -21,7 +21,7 @@ def initialize(
end

def call
-result.csv_lines = ::CSV.generate(headers: false) do |csv|
+result.csv_file = with_csv do |csv|
invoices.each do |invoice|
serialized_invoice = serializer_klass.new(invoice).serialize

10 changes: 9 additions & 1 deletion app/services/data_exports/csv/invoices.rb
@@ -15,7 +15,7 @@ def initialize(data_export_part:, serializer_klass: V1::InvoiceSerializer)
end

def call
-result.csv_lines = ::CSV.generate(headers: false) do |csv|
+result.csv_file = with_csv do |csv|
invoices.each do |invoice|
csv << serialized_invoice(invoice)
end
@@ -53,6 +53,14 @@ def self.headers

private

+def with_csv
+  tempfile = Tempfile.create([data_export_part.id, ".csv"])
+  yield CSV.new(tempfile, headers: false)
+
+  tempfile.rewind
+  tempfile
+end

attr_reader :data_export_part, :serializer_klass, :output, :batch_size

def serialized_invoice(invoice)
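A note on the `Tempfile.create` call in `with_csv` above: when given a `[basename, extension]` pair, Ruby builds a unique filename from those two parts, and (unlike `Tempfile.new`) the returned object is a plain `File` with no cleanup finalizer, which is why the services close and unlink it explicitly. A small standalone sketch (the `"part-123"` prefix is an arbitrary example standing in for `data_export_part.id`):

```ruby
require "tempfile"

# Tempfile.create(["part-123", ".csv"]) yields a unique path whose
# basename starts with "part-123" and ends with ".csv". No finalizer
# removes the file, so the caller must unlink it.
file = Tempfile.create(["part-123", ".csv"])
name = File.basename(file.path)

file.write("a,b\n1,2\n")
file.rewind
content = file.read

file.close
File.unlink(file.path)

puts name  # keeps the "part-123" prefix and ".csv" suffix around a random middle
```

Using the part's id as the prefix also makes stray tempfiles easy to attribute if a job dies before the unlink runs.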
2 changes: 1 addition & 1 deletion app/services/data_exports/export_resources_service.rb
@@ -4,7 +4,7 @@ module DataExports
class ExportResourcesService < BaseService
EXPIRED_FAILURE_MESSAGE = 'Data Export already expired'
PROCESSED_FAILURE_MESSAGE = 'Data Export already processed'
-DEFAULT_BATCH_SIZE = 100
+DEFAULT_BATCH_SIZE = 20

ResourceTypeNotSupportedError = Class.new(StandardError)

7 changes: 6 additions & 1 deletion app/services/data_exports/process_part_service.rb
@@ -10,10 +10,15 @@ def initialize(data_export_part:)

def call
result.data_export_part = data_export_part
return result if data_export_part.completed

+# produce CSV lines into a tempfile
export_result = data_export.export_class.call(data_export_part:).raise_if_error!
-data_export_part.update!(csv_lines: export_result.csv_lines, completed: true)
+file = export_result.csv_file
+data_export_part.update!(csv_lines: file.read, completed: true)
+# Explicitly close and unlink the file
+file.close
+File.unlink(file.path)

# check if we are the last one to finish
if last_completed
6 changes: 5 additions & 1 deletion spec/services/data_exports/csv/invoice_fees_spec.rb
@@ -130,7 +130,11 @@

expect(result).to be_success

-generated_csv = result.csv_lines
+file = result.csv_file
+generated_csv = file.read
+
+file.close
+File.unlink(file.path)

expect(generated_csv).to eq(expected_csv)
end
5 changes: 4 additions & 1 deletion spec/services/data_exports/csv/invoices_spec.rb
@@ -90,8 +90,11 @@
CSV

expect(result).to be_success
-generated_csv = result.csv_lines
+file = result.csv_file
+generated_csv = file.read
+
+file.close
+File.unlink(file.path)
expect(generated_csv).to eq(expected_csv)
end
end
