
Add DCMTK only benchmark #639

Merged: 2 commits into pydicom:master on Jun 4, 2021

Conversation

@chsasank (Contributor) commented Jun 4, 2021

Added a benchmark for DCMTK SCU/SCP for comparison. The current benchmarks include a pynetdicom-only version but no DCMTK-only version, so there's no fair comparison.

Here are the benchmark results on my MacBook Air M1:

$ python pynetdicom/tests/benchmark_script.py 
Use yappi (y/n:)
n
Which benchmark do you wish to run?
  1. Storage SCU, 1000 datasets over 1 association
  2. Storage SCU, 1 dataset per association over 1000 associations
  3. Storage SCP, 1000 datasets over 1 association
  4. Storage SCP, 1000 datasets over 1 association (write)
  5. Storage SCP, 1000 datasets over 1 association (write fast)
  6. Storage SCP, 1000 datasets over 1 association (write fastest)
  7. Storage SCP, 1 dataset per association over 1000 associations
  8. Storage SCP, 1000 datasets per association over 10 simultaneous associations
  9. Storage SCU/SCP, 1000 datasets over 1 association
  10. Storage DCMTK SCU/SCP, 1000 datasets over 1 association
benchmark id    time
1               9.27 s
2               31.89 s
3               8.69 s
4               12.63 s
5               9.32 s
6               6.37 s
7               57.08 s
8               11.97 s
9               12.59 s
10              2.00 s

It's sobering to compare the performance of benchmarks 9 and 10: DCMTK is roughly 6 times faster than pynetdicom. Do you think there is something that can be done to speed pynetdicom up?
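Roughly, the DCMTK-only benchmark just times DCMTK's storescp/storescu pair from Python. A minimal sketch of the idea (the dataset path, port and timing details here are placeholders, not necessarily exactly what the added benchmark does):

```python
import subprocess
import time

def benchmark_dcmtk_storage(ds_path="CTImageStorage.dcm", nr_datasets=1000, port=11112):
    """Time DCMTK's storescu sending `nr_datasets` copies of one dataset
    to a DCMTK storescp listening on `port`."""
    # Start DCMTK's Storage SCP
    scp = subprocess.Popen(["storescp", str(port)])
    time.sleep(0.5)  # give storescp a moment to start listening

    start = time.time()
    # storescu accepts multiple dcmfile-in arguments and sends them all
    subprocess.run(
        ["storescu", "localhost", str(port)] + [ds_path] * nr_datasets,
        check=True,
    )
    elapsed = time.time() - start

    scp.terminate()
    return elapsed

if __name__ == "__main__":
    print(f"Total time: {benchmark_dcmtk_storage():.2f} s")
```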

@chsasank (Contributor, Author) commented Jun 4, 2021

I did some yappi profiling. Here are benchmark 1's most time-consuming functions:

ncalls tottime percall cumtime percall filename:lineno(function)
2005 0.008664 4.321e-06 23.06 0.0115 threading.py:270(Condition.wait)
1 0.02993 0.02993 20.18 20.18 dul.py:356(DULServiceProvider.run)
1 0.006017 0.006017 20.16 20.16 association.py:622(Association._run_reactor)
1 1.1e-05 1.1e-05 20.16 20.16 association.py:581(Association.run)
1000 0.01985 1.985e-05 20.15 0.02015 association.py:1570(Association.send_c_store)
1025 0.003362 3.28e-06 16.55 0.01614 threading.py:540(Event.wait)
3000 0.01107 3.69e-06 12.59 0.004196 dsutils.py:70(encode)
3000 0.645 0.000215 12.55 0.004185 filewriter.py:551(write_dataset)
270000 1.832 6.784e-06 8.647 3.202e-05 filewriter.py:456(write_data_element)
15008 0.03981 2.652e-06 6.62 0.0004411 queue.py:153(Queue.get)
2022 0.00373 1.845e-06 6.529 0.003229 dimse.py:171(DIMSEServiceProvider.get_msg)
1000 0.01236 1.236e-05 1.497 0.001497 dimse.py:288(DIMSEServiceProvider.send_msg)
270000 0.5057 1.873e-06 1.292 4.787e-06 filebase.py:58(DicomBytesIO.write_tag)
273000 0.2116 7.752e-07 1.237 4.53e-06 filebase.py:216(DicomBytesIO.init)
273000 0.6998 2.563e-06 1.025 3.755e-06 filebase.py:170(DicomBytesIO.init)
5005 0.01506 3.009e-06 0.9823 0.0001963 fsm.py:49(StateMachine.do_action)

And benchmark 3's most time-consuming functions:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000371 0.000371 10.17 10.17 socketserver.py:215(ThreadedAssociationServer.serve_forever)
21 0.000351 1.671e-05 10.17 0.4841 selectors.py:402(PollSelector.select)
2 5.1e-05 2.55e-05 10.17 5.086 threading.py:859(Thread.run)
1 6.2e-05 6.2e-05 9.51 9.51 subprocess.py:1074(Popen.wait)
1 3.8e-05 3.8e-05 9.51 9.51 subprocess.py:1772(Popen._wait)
1 1.1e-05 1.1e-05 9.51 9.51 subprocess.py:1759(Popen._try_wait)
1 2.3e-05 2.3e-05 9.441 9.441 association.py:581(Association.run)
1 0.02955 0.02955 9.431 9.431 dul.py:356(DULServiceProvider.run)
1 0.03111 0.03111 9.411 9.411 association.py:622(Association._run_reactor)
1000 0.006313 6.313e-06 1.644 0.001644 association.py:3250(Association._serve_request)
1000 0.01326 1.326e-05 1.611 0.001611 service_class.py:1428(StorageServiceClass.SCP)
1000 0.007846 7.846e-06 1.536 0.001536 dimse.py:288(DIMSEServiceProvider.send_msg)
2000 0.006835 3.417e-06 0.9025 0.0004513 dsutils.py:70(encode)
2000 0.04761 2.381e-05 0.8803 0.0004401 filewriter.py:551(write_dataset)

It's clear that most of the SCU's time is spent on encoding, but it's not clear what's blocking in the SCP.
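For anyone who wants to reproduce this kind of profile outside the benchmark script's built-in yappi option, a minimal sketch (run_benchmark is a stand-in for whichever benchmark you want to profile):

```python
import yappi

def profile(run_benchmark):
    """Profile a benchmark callable with yappi and print pstats-style output."""
    yappi.start()
    try:
        run_benchmark()
    finally:
        yappi.stop()
    # Convert to the familiar cProfile/pstats format and show the hottest calls
    stats = yappi.convert2pstats(yappi.get_func_stats())
    stats.sort_stats("cumulative").print_stats(20)
```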

@scaramallion (Member) commented Jun 4, 2021

There are certainly some optimisations left, off the top of my head:

  • Implementing a dataset decoder/encoder designed solely for command set and DIMSE elements, perhaps a minor gain?
  • Use the encoded dataset when acting as a C-STORE SCU if it's available and no encoding change is needed; a reasonable gain under those conditions. This one is actually on my task list.
  • Changing the Association and DUL to use event blocking rather than a strict run loop delay, but I've avoided that change because of how complex it is to reason about threading. Probably a reasonable gain (a rough sketch of the idea follows this list).
  • Adding some compiled extensions, but it's not something I particularly want to do.
  • Perhaps moving to async?
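To sketch the event blocking idea from the third bullet: instead of the reactor sleeping for a fixed interval on every pass, it would wait on an event that the socket/queue side sets when there is work to do. Illustrative only, with made-up names, not pynetdicom's actual internals:

```python
import threading

class ToyReactor:
    """Illustrative only: a reactor loop that blocks on an event
    instead of sleeping for a fixed delay on every iteration."""

    def __init__(self):
        self._work_ready = threading.Event()  # set by the socket/queue side
        self._stop = False

    def notify(self):
        """Call when a PDU or DIMSE primitive arrives."""
        self._work_ready.set()

    def run(self):
        while not self._stop:
            # Block until there's work (timeout only as a safety net),
            # rather than time.sleep(delay) on each pass.
            if not self._work_ready.wait(timeout=0.5):
                continue
            self._work_ready.clear()
            self._process_events()

    def _process_events(self):
        """Placeholder for the real work: run the state machine, etc."""
```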

Hmm, that command set/DIMSE dataset encoder/decoder idea sounds like fun...
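As a very rough illustration of why that could be cheap: command set elements are always implicit VR little endian, so a dedicated encoder can pack group, element, length and value directly instead of going through pydicom's general write_dataset machinery. A hedged sketch, not an existing pynetdicom API:

```python
import struct

def encode_command_element(group, elem, value):
    """Encode one command set element as implicit VR little endian:
    2-byte group, 2-byte element, 4-byte length, then the value bytes."""
    if len(value) % 2:
        value += b"\x00"  # element values must be even-length
    return struct.pack("<HHI", group, elem, len(value)) + value

# e.g. (0000,0100) Command Field for a C-STORE-RQ (0x0001), a 2-byte US value
command_field = encode_command_element(0x0000, 0x0100, struct.pack("<H", 0x0001))
```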

@codecov (bot) commented Jun 4, 2021

Codecov Report

Merging #639 (d011056) into master (151e9cd) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff             @@
##            master      #639    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           27        27            
  Lines         8476      8659   +183     
==========================================
+ Hits          8476      8659   +183     
Impacted Files          Coverage Δ
pynetdicom/ae.py        100.00% <0.00%> (ø)
pynetdicom/dul.py       100.00% <0.00%> (ø)
pynetdicom/fsm.py       100.00% <0.00%> (ø)
pynetdicom/pdu.py       100.00% <0.00%> (ø)
pynetdicom/acse.py      100.00% <0.00%> (ø)
pynetdicom/dimse.py     100.00% <0.00%> (ø)
pynetdicom/timer.py     100.00% <0.00%> (ø)
pynetdicom/utils.py     100.00% <0.00%> (ø)
pynetdicom/events.py    100.00% <0.00%> (ø)
pynetdicom/status.py    100.00% <0.00%> (ø)
... and 15 more

@scaramallion merged commit b3222cb into pydicom:master on Jun 4, 2021
@chsasank (Contributor, Author) commented Jun 5, 2021

Thanks for the answer.

> Changing the Association and DUL to use event blocking rather than a strict run loop delay, but I've avoided that change because of how complex it is to reason about threading. Probably a reasonable gain
>
> Perhaps moving to async?

I think both are the same? Async makes everything event-based, and we would no longer need a while True loop in the reactors. @arunkant and I have been thinking about async a lot these days; see some of the discussion in #527.

Do you have any advice on how we should go about this? I've had a hard time wrapping my head around all the complexity of the code, so I don't know where to even start.

As you probably know, asyncio uses transports and protocols to build a network application. If we were to move to this, how would the different classes below fit into that framework?

[image: overview of pynetdicom's classes]

Any advice is appreciated
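For instance, the DUL's socket handling might map onto asyncio's Protocol/Transport pair something like this. This is purely an illustrative sketch with made-up names, not a design proposal:

```python
import asyncio

class DULProtocol(asyncio.Protocol):
    """Illustrative sketch: the DUL's socket handling as an asyncio Protocol."""

    def __init__(self):
        self._buffer = b""

    def connection_made(self, transport):
        # The transport replaces the blocking socket wrapper
        self.transport = transport

    def data_received(self, data):
        # Accumulate bytes, split out complete PDUs, feed them to the state
        # machine; decoded DIMSE messages would go up to the Association layer
        self._buffer += data

    def connection_lost(self, exc):
        # Signal "transport connection closed" to the state machine
        pass


async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(DULProtocol, "127.0.0.1", 11112)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```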
