Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical: thread safety issues while using @instance vars (<=v3.0.2) #39

Closed
fernandomm opened this issue Feb 11, 2022 · 18 comments
Closed
Labels

Comments

@fernandomm
Copy link
Contributor

Describe the bug
It looks like there are is a thread safety issue with the server. If there are concurrent connections, ctx might start mixing message info between connections.

To Reproduce
Steps to reproduce the behavior:

  1. Run server.rb
# frozen_string_literal: true

# server.rb

require 'midi-smtp-server'

class MySmtpd < MidiSmtpServer::Smtpd

  def on_message_data_event(ctx)
    # Output for debug
    logger.debug("mail reveived at: [#{ctx[:server][:local_ip]}:#{ctx[:server][:local_port]}]. From: [#{ctx[:envelope][:from]}]. Recipient: [#{ctx[:envelope][:to]}]. Authorization id: #{ctx[:server][:authorization_id]}...")

    raise 'Error' if "<#{ctx[:server][:authorization_id]}>" != ctx[:envelope][:from]
    raise 'Error' if ctx[:envelope][:from] != ctx[:envelope][:to].first

    # Save message
    File.binwrite(File.join('/tmp/', "#{ctx[:server][:authorization_id]}-#{SecureRandom.uuid}.msg"), ctx[:message][:data])
  end

  def on_auth_event(_ctx, _authorization_id, authentication_id, authentication)
    authentication_id
  end
end

server = MySmtpd.new(
  ports: '2525',
  hosts: '0.0.0.0',
  pipelining_extension: true,
  max_processings: 50,
  max_connections: 500,
  auth_mode: :AUTH_REQUIRED
)

flag_status_ctrl_c_pressed = false

# try to gracefully shutdown on Ctrl-C
trap('INT') do
  # print an empty line right after ^C
  puts
  # notify flag about Ctrl-C was pressed
  flag_status_ctrl_c_pressed = true
  # signal exit to app
  exit 0
end

# Output for debug
server.logger.info("Starting MySmtpd [#{MidiSmtpServer::VERSION::STRING}|#{MidiSmtpServer::VERSION::DATE}] (Simple example) ...")

# setup exit code
at_exit do
  # check to shutdown connection
  if server
    # Output for debug
    server.logger.info('Ctrl-C interrupted, exit now...') if flag_status_ctrl_c_pressed
    # info about shutdown
    server.logger.info('Shutdown MySmtpd...')
    # stop all threads and connections gracefully
    server.stop
  end
  # Output for debug
  server.logger.info('MySmtpd down!')
end

# Start the server
server.start

# Run on server forever
server.join
  1. Run multiple instances of client.rb with unique emails. More connections = faster to reproduce. Example:
EMAIL=email1@example.com ruby client.rb
EMAIL=email2@example.com ruby client.rb
EMAIL=email3@example.com ruby client.rb
# client.rb
require 'rubygems'
require 'mail'

smtp_connection = Net::SMTP.new('localhost', 2525)
smtp_connection.set_debug_output $stderr
smtp_connection.start('localhost', ENV.fetch('EMAIL'), 'anything')

Mail.defaults do
  delivery_method :smtp_connection, { connection: smtp_connection }
end

Kernel.loop do |i|
  Mail.deliver do
    from     ENV.fetch('EMAIL')
    to       ENV.fetch('EMAIL')
    subject  "Test email #{i}"
    html_part do
      content_type 'text/html; charset=UTF-8'
      body '<html><body><p>This is the <b>email</b> HTML body example.</p></body></html>'
    end
  end
end

smtp_connection.finish
  1. It may take a few or several seconds, but at some point one of the clients will fail with:
-> "354 Enter message, ending with \".\" on a line by itself\r\n"
writing message from String
wrote 617 bytes
-> "451 Requested action aborted: local error in processing\r\n"
/Users/fernando/.rvm/rubies/ruby-2.6.8/lib/ruby/2.6.0/net/smtp.rb:969:in `check_response': 451 Requested action aborted: local error in processing (Net::SMTPServerBusy)

And when you check the server log, you will see this:

2022-02-11 15:40:15 +0000: [DEBUG] >>> 354 Enter message, ending with "." on a line by itself
2022-02-11 15:40:15 +0000: [DEBUG] mail reveived at: [172.28.0.18:2525]. From: [<email1@example.com>]. Recipient: [["<email3@example.com>"]]. Authorization id: email3@example.com...
2022-02-11 15:40:15 +0000: [ERROR] Error (MidiSmtpServer::Smtpd451Exception)

As you can see, ctx[:server][:authorization_id] got a value from another thread. I've also experienced issues with ctx[:envelope][:from] and ctx[:envelope][:to] values.

Expected behavior

ctx values should be consistent when there are multiple connections.

System (please complete the following information):

  • OS: Ubuntu 20.04
  • Ruby: 2.6.8
  • MidiSmtpServer version: 3.0.2
@TomFreudenberg
Copy link
Member

Hi Fernando @fernandomm

first of all: thanks for using this component and giving feedback!

You report something very mystic - never occurred until today.

If I check the session initialization at:

https://github.com/4commerce-technologies-AG/midi-smtp-server/blob/master/lib/midi-smtp-server.rb#L510-L519

it is plain usage of ruby Thread class and looks not breaking for me.

I am not sure if this is the server part or maybe an issue from continuously sending via Mail component or maybe also by ENV.fetch inside the loop.

Could you please try:

  1. to store the ENV.fetch value and use a inner (static) variable
  2. to use RSET command before next delivery
  3. to use Mikel Mail component instead

Looking forward to your feedback.

After all it would be great when we get a test for this case.

@TomFreudenberg
Copy link
Member

In addition you may also disable the pipelining once for a test run

pipelining_extension: false,

@fernandomm
Copy link
Contributor Author

Thank you for the quick reply and also for the amazing project.

I was debugging the issue locally and found out that it might be caused by the @cmd_data instance variable defined here.

I replaced it with a local variable and have been running the same test for 5+ minutes. It looks like it fixed the problem.

I'm not sure why it's defined as an instance variable since it looks like it's only used inside that method. But I've also run the project's tests and it looks like i didn't break anything.

I will review your suggestions to check if they might fix the problem without my changes. Otherwise, I could submit a PR with the fix that I mentioned.

@TomFreudenberg
Copy link
Member

Hey Fernando @fernandomm

what a CATCH ...

cmd_data for sure is not correct to be a an instance variable - crazy that this was not faulty until now


Same here

@auth_values = [

and here

@auth_values = Base64.decode64(encoded_auth_response).split("\x00")

and here

@auth_values = [

should be only a local var for auth_values


Fernando, I will create an update tommorrow for a new release to fix this.

I would like to add you to the Contributor list.

https://github.com/4commerce-technologies-AG/midi-smtp-server/blob/master/CONTRIBUTORS.md

If you send me a PR just added yourself - I can merge that too!


For now - THANKS A LOT for that fix
Tom

@TomFreudenberg
Copy link
Member

@gencer - not sure if you already noticed this issue - guess its an important update for you!

@TomFreudenberg TomFreudenberg changed the title Thread safety issues? Critical: ctx values may get wrong data due to thread safety issues while using @instance vars (<=v3.0.2) Feb 11, 2022
@TomFreudenberg
Copy link
Member

This will also need a fix for the v2.3.2 release

https://github.com/4commerce-technologies-AG/midi-smtp-server/blob/release/2.3.2/lib/midi-smtp-server.rb#L783

and

@auth_values

@gencer
Copy link
Contributor

gencer commented Feb 11, 2022

@gencer - not sure if you already noticed this issue - guess its an important update for you!

No. Did not noticed this before. Perhaps due to some custom usages but this is serious (looking to codes and testing here since I got your notification). Thank you @TomFreudenberg for pinging me!

@fernandomm
Copy link
Contributor Author

@TomFreudenberg I'm glad I could help.

I had to create a quick patch since this was affecting a production service. It's available in the following branch https://github.com/fernandomm/midi-smtp-server/tree/thread-safety-fix

Should I submit it as a PR?

I'm just not sure about how you would like to test this. In my experience it will usually happen when there is some I/O ( Eg.: saving a file ) and multiple connections. I'm able to reproduce it with this test but it makes the test suite slow.

@TomFreudenberg
Copy link
Member

Hey Fernando @fernandomm

cool that it already works on your production environment.

I will check tomorrow if there are any special cases for the @auth_values we may have overseen currently.

I cherry pick your commit from the above branch. Please append in there also your github id to the CONTRIBUTORS list.

For the test I also have a look how we can manage the long runner tests but at least it should run.

@fernandomm
Copy link
Contributor Author

Please append in there also your github id to the CONTRIBUTORS list.

Ok, I've added myself in a separate commit to the same branch: fernandomm@7e2c17f. Thanks!

@TomFreudenberg
Copy link
Member

Hey guys @gencer @fernandomm

this issue is fixed !

Fernando, again thanks a lot for working on this.

I have applied a new test based on yours at test/stress if you may have a look.

With some random load during the auth processes I could easily manage to get the ctx overwritten in a false manner.

If you both like and willing, I would be happy if you could run the tests once with the old code (v3.0.2) and post here that you also get a fail in that.

I will post a new Gem for 3.0.3 as well as 2.3.3 tonight.

Have a nice weekend
Tom

@TomFreudenberg TomFreudenberg removed their assignment Feb 12, 2022
@TomFreudenberg TomFreudenberg changed the title Critical: ctx values may get wrong data due to thread safety issues while using @instance vars (<=v3.0.2) Critical: thread safety issues while using @instance vars (<=v3.0.2) Feb 13, 2022
@TomFreudenberg
Copy link
Member

TomFreudenberg commented Feb 13, 2022

All published and online

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Feb 13, 2022

To make sure that the new StressTest will catch the @\instance var error also on github Workflow, I have reposted the old library.

The workflow shows an error as supposed !

grafik

grafik

So we can rely on tests also for that issue.

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Feb 15, 2022

Hey guys @gencer @fernandomm

does it work on your side?

Cheers,
Tom

@gencer
Copy link
Contributor

gencer commented Feb 15, 2022

Hey guys @gencer @fernandomm

does it work on your side?

Cheers, Tom

Hey @TomFreudenberg

I'll test this ASAP and will get back to you.

@fernandomm
Copy link
Contributor Author

Hey guys @gencer @fernandomm

does it work on your side?

Cheers, Tom

I've been running it in a staging environment. It works fine so far.

Thanks again for the quick fix.

@TomFreudenberg
Copy link
Member

In case nothing additional popped up :-) I guess it's done

Cheers
guys

@gencer
Copy link
Contributor

gencer commented Feb 18, 2022

In case nothing additional popped up :-) I guess it's done

Cheers guys

@TomFreudenberg

I still did not tested yet, I'll test on this weekend. Will write if anything goes wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants