Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attachment background processing #7681

Merged
merged 10 commits into from
Aug 8, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Gemfile
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ gem 'pg', '~> 1.5.3'
gem 'acts_as_versioned', git: 'https://github.com/mysociety/acts_as_versioned.git',
ref: '13e928b'
gem 'active_model_otp'
gem 'activejob-uniqueness'
gem 'bcrypt', '~> 3.1.19'
gem 'cancancan', '~> 3.5.0'
gem 'charlock_holmes', '~> 0.7.7'
Expand Down
6 changes: 6 additions & 0 deletions Gemfile.lock
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,9 @@ GEM
activejob (7.0.6)
activesupport (= 7.0.6)
globalid (>= 0.3.6)
activejob-uniqueness (0.2.5)
activejob (>= 4.2, < 7.1)
redlock (>= 1.2, < 2)
activemodel (7.0.6)
activesupport (= 7.0.6)
activerecord (7.0.6)
Expand Down Expand Up @@ -417,6 +420,8 @@ GEM
rake (13.0.6)
recaptcha (5.14.0)
redis (4.8.1)
redlock (1.3.2)
redis (>= 3.0.0, < 6.0)
regexp_parser (2.8.1)
representable (3.2.0)
declarative (< 0.1.0)
Expand Down Expand Up @@ -560,6 +565,7 @@ PLATFORMS

DEPENDENCIES
active_model_otp
activejob-uniqueness
acts_as_versioned!
alaveteli_features!
annotate (< 3.2.1)
Expand Down
50 changes: 50 additions & 0 deletions app/controllers/attachment_masks_controller.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
##
# Controller to process FoiAttachment objects before being served publicly by
# applying masks and censor rules.
#
class AttachmentMasksController < ApplicationController
before_action :set_no_crawl_headers
before_action :find_attachment
before_action :ensure_attachment, :ensure_referer

def wait
if @attachment.masked?
redirect_to done_attachment_mask_path(
id: @attachment.to_signed_global_id,
referer: params[:referer]
)

else
FoiAttachmentMaskJob.perform_later(@attachment)
end
end

def done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the commit message:

Once an attachment masked file is available … redirecting to the download.

Redirecting to the actual attachment seems more preferable than requiring an extra click to do so…

This is so we aren't left on the wait view which looks like nothing has happened.

…but I'm not sure I understand the conditions where this happens.

I'd have thought we could do:

def wait
  if @attachment.masked_file.attached?
    # If we have a cached file, render it
    redirect_to attachment_path(@attachment)
  else
    # otherwise, fire off a job to cache it, render the waiting page, and
    # keep refreshing until `@attachment.masked_file.attached?` returns
    # true
    FoiAttachmentMaskJob.perform_later(@attachment)
    render 'waiting'
  end
end

One to demo perhaps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the redirect_to attachment_path(@attachment) the users browser would show the previous view as this will be treated link any other download link.

We could automatically download via JS or meta tag on the done view (and change the wording to say "Your download will start automatically...").

Happy to demo

unless @attachment.masked?
redirect_to wait_for_attachment_mask_path(
id: @attachment.to_signed_global_id,
referer: params[:referer]
)
end

@show_attachment_path = params[:referer]
end

private

def set_no_crawl_headers
headers['X-Robots-Tag'] = 'noindex'
end

def find_attachment
@attachment = GlobalID::Locator.locate_signed(params[:id])
end

def ensure_attachment
raise ActiveRecord::RecordNotFound unless @attachment
end

def ensure_referer
raise RouteNotFound unless params[:referer].present?
gbp marked this conversation as resolved.
Show resolved Hide resolved
end
end
30 changes: 17 additions & 13 deletions app/controllers/attachments_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -16,24 +16,28 @@ class AttachmentsController < ApplicationController
before_action :authenticate_attachment
before_action :authenticate_attachment_as_html, only: :show_as_html

around_action :cache_attachments
around_action :cache_attachments, only: :show_as_html

def show
# Prevent spam to magic request address. Note that the binary
# substitution method used depends on the content type
body = @incoming_message.apply_masks(
@attachment.default_body,
@attachment.content_type
)
if @attachment.masked?
render body: @attachment.body, content_type: content_type
else
FoiAttachmentMaskJob.perform_later(@attachment)

if content_type == 'text/html'
body =
Loofah.scrub_document(body, :prune).
to_html(encoding: 'UTF-8').
try(:html_safe)
Timeout.timeout(5) do
until @attachment.masked?
sleep 0.5
@attachment.reload
end
redirect_to(request.fullpath)
end
end

render body: body, content_type: content_type
rescue Timeout::Error
redirect_to wait_for_attachment_mask_path(
@attachment.to_signed_global_id,
referer: request.fullpath
)
end

def show_as_html
Expand Down
44 changes: 44 additions & 0 deletions app/jobs/foi_attachment_mask_job.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
##
# Job to apply masks and censor rules to FoiAttachment objects. Masked file will
# be stored as FoiAttachment#file ActiveStorage association.
#
# Example:
# FoiAttachmentMaskJob.perform(FoiAttachment.first)
#
class FoiAttachmentMaskJob < ApplicationJob
queue_as :default
unique :until_and_while_executing, on_conflict: :log

attr_reader :attachment

delegate :incoming_message, to: :attachment
delegate :info_request, to: :incoming_message

def perform(attachment)
@attachment = attachment

body = AlaveteliTextMasker.apply_masks(
gbp marked this conversation as resolved.
Show resolved Hide resolved
attachment.unmasked_body,
attachment.content_type,
masks
)

if attachment.content_type == 'text/html'
body =
Loofah.scrub_document(body, :prune).
to_html(encoding: 'UTF-8').
try(:html_safe)
end

attachment.update(body: body, masked_at: Time.zone.now)
end

private

def masks
{
censor_rules: info_request.applicable_censor_rules,
masks: info_request.masks
}
end
end
33 changes: 21 additions & 12 deletions app/models/foi_attachment.rb
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# == Schema Information
# Schema version: 20220916134847
# Schema version: 20230717201410
#
# Table name: foi_attachments
#
Expand All @@ -16,6 +16,7 @@
# updated_at :datetime
# prominence :string default("normal")
# prominence_reason :text
# masked_at :datetime
#

# models/foi_attachment.rb:
Expand All @@ -32,6 +33,7 @@ class FoiAttachment < ApplicationRecord

belongs_to :incoming_message,
inverse_of: :foi_attachments
has_one :raw_email, through: :incoming_message, source: :raw_email

has_one_attached :file, service: :attachments

Expand Down Expand Up @@ -81,7 +83,7 @@ def delete_cached_file!
end

def body=(d)
self.hexdigest = Digest::MD5.hexdigest(d)
self.hexdigest ||= Digest::MD5.hexdigest(d)

ensure_filename!
file.attach(
Expand All @@ -95,20 +97,14 @@ def body=(d)
end

# raw body, encoded as binary
def body(tries: 0, delay: 1)
def body
return @cached_body if @cached_body

if file.attached?
if masked?
@cached_body = file.download
else
# we've lost our cached attachments for some reason. Reparse them.
raise if tries > BODY_MAX_TRIES
sleep [delay, BODY_MAX_DELAY].min

incoming_message.parse_raw_email!(true)
reload

body(tries: tries + 1, delay: delay * 2)
FoiAttachmentMaskJob.perform_now(self)
body
end
end

Expand All @@ -123,6 +119,19 @@ def default_body
text_type? ? body_as_text.string : body
end

# return the body as it is in the raw email, unmasked without censor rules
# applied
def unmasked_body
MailHandler.attachment_body_for_hexdigest(
raw_email.mail,
hexdigest: hexdigest
)
end

def masked?
file.attached? && masked_at.present? && masked_at < Time.zone.now
end

def main_body_part?
self == incoming_message.get_main_body_text_part
end
Expand Down
8 changes: 8 additions & 0 deletions app/models/info_request.rb
Original file line number Diff line number Diff line change
Expand Up @@ -807,6 +807,10 @@ def tag_string=(tag_string)
end

def expire(options={})
# Clear any attachment masked_at timestamp, forcing attachments to be
# reparsed
clear_attachment_masks!

# Clear out cached entries, by removing files from disk (the built in
# Rails fragment cache made doing this and other things too hard)
foi_fragment_cache_directories.each { |dir| FileUtils.rm_rf(dir) }
Expand All @@ -822,6 +826,10 @@ def expire(options={})
reindex_request_events
end

def clear_attachment_masks!
foi_attachments.update_all(masked_at: nil)
end

# Removes anything cached about the object in the database, and saves
def clear_in_database_caches!
incoming_messages.each(&:clear_in_database_caches!)
Expand Down
11 changes: 11 additions & 0 deletions app/views/attachment_masks/done.html.erb
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
<% @title = _('Attachment available for download.') %>
<h1><%= @title %></h1>

<p>
<%= _('The attachment has now been processed and is available for ' \
'download.') %>
</p>

<p>
<%= link_to _('Download attachment'), @show_attachment_path, class: 'button' %>
</p>
12 changes: 12 additions & 0 deletions app/views/attachment_masks/wait.html.erb
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<% @title = _('Attachment processing...') %>
<h1><%= @title %></h1>

<p>
<%= _('We don\'t have this attachment in our cache at present, we are now ' \
'processing your request and this page will reload once the ' \
'attachment is available.') %>
</p>

<% content_for :javascript_head do %>
<meta http-equiv="refresh" content="2">
<% end%>
5 changes: 5 additions & 0 deletions config/initializers/active_job_uniqueness.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
require 'redis_connection'

ActiveJob::Uniqueness.configure do |config|
config.redlock_servers = [RedisConnection.instance]
end
23 changes: 3 additions & 20 deletions config/initializers/sidekiq.rb
Original file line number Diff line number Diff line change
@@ -1,26 +1,9 @@
require File.expand_path('../load_env.rb', __dir__)

def redis_config
{ url: ENV['REDIS_URL'], password: ENV['REDIS_PASSWORD'] }.
merge(redis_sentinel_config)
end

def redis_sentinel_config
return {} unless ENV['REDIS_SENTINELS']

sentinels = ENV['REDIS_SENTINELS'].split(',').map do |ip_and_port|
ip, port = ip_and_port.split(/:(\d+)$/)
ip = Regexp.last_match[1] if ip =~ /\[(.*?)\]/
{ host: ip, port: port&.to_i || 26_379 }
end

{ sentinels: sentinels, role: :master }
end
require 'redis_connection'

Sidekiq.configure_client do |config|
config.redis = redis_config
config.redis = RedisConnection.configuration
end

Sidekiq.configure_server do |config|
config.redis = redis_config
config.redis = RedisConnection.configuration
end
7 changes: 7 additions & 0 deletions config/routes.rb
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,13 @@ def matches?(request)
:via => :get,
:constraints => { :part => /\d+/ }

#### Attachment controller
resources :attachment_masks, only: [], path: :attachments do
get 'wait', on: :member, as: :wait_for
get 'done', on: :member
end
####

match '/request_event/:info_request_event_id' => 'request#show_request_event',
:as => :info_request_event,
:via => :get
Expand Down
5 changes: 5 additions & 0 deletions db/migrate/20230717201410_add_masked_at_to_foi_attachments.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
class AddMaskedAtToFoiAttachments < ActiveRecord::Migration[7.0]
def change
add_column :foi_attachments, :masked_at, :datetime
end
end
7 changes: 7 additions & 0 deletions lib/mail_handler/backends/mail_backend.rb
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,13 @@ def get_attachment_attributes(mail)
end
end

def attachment_body_for_hexdigest(mail, hexdigest:)
attributes = get_attachment_attributes(mail).find do |attrs|
attrs[:hexdigest] == hexdigest
end
attributes&.fetch(:body)
end

# Format
def address_from_name_and_email(name, email)
unless MySociety::Validate.is_valid_email(email)
Expand Down
28 changes: 28 additions & 0 deletions lib/redis_connection.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
require File.expand_path('../config/load_env.rb', __dir__)

##
# Module to parse Redis ENV variables into usable configuration for Sidekiq and
# ActiveJob::Uniqueness gems.
#
module RedisConnection
def self.instance
Redis.new(configuration)
end

def self.configuration
{ url: ENV['REDIS_URL'], password: ENV['REDIS_PASSWORD'] }.
merge(sentinel_configuration)
end

def self.sentinel_configuration
return {} unless ENV['REDIS_SENTINELS']

sentinels = ENV['REDIS_SENTINELS'].split(',').map do |ip_and_port|
ip, port = ip_and_port.split(/:(\d+)$/)
ip = Regexp.last_match[1] if ip =~ /\[(.*?)\]/
{ host: ip, port: port&.to_i || 26_379 }
end

{ sentinels: sentinels, role: :master }
end
end
Loading