
Add DCMTK only benchmark #639

Merged: 2 commits into pydicom:master on Jun 4, 2021

Conversation

@chsasank (Contributor) commented Jun 4, 2021

Added a benchmark for DCMTK SCU/SCP for comparison. The current benchmarks include a pynetdicom-only version but no DCMTK-only version, so there's no fair comparison.

Here are the benchmark results on my MacBook Air M1:

$ python pynetdicom/tests/benchmark_script.py 
Use yappi (y/n:)
n
Which benchmark do you wish to run?
  1. Storage SCU, 1000 datasets over 1 association
  2. Storage SCU, 1 dataset per association over 1000 associations
  3. Storage SCP, 1000 datasets over 1 association
  4. Storage SCP, 1000 datasets over 1 association (write)
  5. Storage SCP, 1000 datasets over 1 association (write fast)
  6. Storage SCP, 1000 datasets over 1 association (write fastest)
  7. Storage SCP, 1 dataset per association over 1000 associations
  8. Storage SCP, 1000 datasets per association over 10 simultaneous associations
  9. Storage SCU/SCP, 1000 datasets over 1 association
  10. Storage DCMTK SCU/SCP, 1000 datasets over 1 association
benchmark id    time
1               9.27 s
2               31.89 s
3               8.69 s
4               12.63 s
5               9.32 s
6               6.37 s
7               57.08 s
8               11.97 s
9               12.59 s
10              2.00 s

It's sobering to compare the performance of benchmarks 9 and 10: DCMTK is roughly 6 times faster than pynetdicom. Do you think there is something that can be done to speed pynetdicom up?
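Roughly, the DCMTK-only benchmark just times DCMTK's storescp/storescu pair from Python. A minimal sketch of the idea (the dataset path, port and timing details here are placeholders, not necessarily exactly what the added benchmark does):

```python
import subprocess
import time

def benchmark_dcmtk_storage(ds_path="CTImageStorage.dcm", nr_datasets=1000, port=11112):
    """Time DCMTK's storescu sending `nr_datasets` copies of one dataset
    to a DCMTK storescp listening on `port`."""
    # Start DCMTK's Storage SCP
    scp = subprocess.Popen(["storescp", str(port)])
    time.sleep(0.5)  # give storescp a moment to start listening

    start = time.time()
    # storescu accepts multiple dcmfile-in arguments and sends them all
    subprocess.run(
        ["storescu", "localhost", str(port)] + [ds_path] * nr_datasets,
        check=True,
    )
    elapsed = time.time() - start

    scp.terminate()
    return elapsed

if __name__ == "__main__":
    print(f"Total time: {benchmark_dcmtk_storage():.2f} s")
```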

@chsasank (Contributor, Author) commented Jun 4, 2021

I did some yappi profiling. Here are benchmark 1's most time-consuming functions:

ncalls tottime percall cumtime percall filename:lineno(function)
2005 0.008664 4.321e-06 23.06 0.0115 threading.py:270(Condition.wait)
1 0.02993 0.02993 20.18 20.18 dul.py:356(DULServiceProvider.run)
1 0.006017 0.006017 20.16 20.16 association.py:622(Association._run_reactor)
1 1.1e-05 1.1e-05 20.16 20.16 association.py:581(Association.run)
1000 0.01985 1.985e-05 20.15 0.02015 association.py:1570(Association.send_c_store)
1025 0.003362 3.28e-06 16.55 0.01614 threading.py:540(Event.wait)
3000 0.01107 3.69e-06 12.59 0.004196 dsutils.py:70(encode)
3000 0.645 0.000215 12.55 0.004185 filewriter.py:551(write_dataset)
270000 1.832 6.784e-06 8.647 3.202e-05 filewriter.py:456(write_data_element)
15008 0.03981 2.652e-06 6.62 0.0004411 queue.py:153(Queue.get)
2022 0.00373 1.845e-06 6.529 0.003229 dimse.py:171(DIMSEServiceProvider.get_msg)
1000 0.01236 1.236e-05 1.497 0.001497 dimse.py:288(DIMSEServiceProvider.send_msg)
270000 0.5057 1.873e-06 1.292 4.787e-06 filebase.py:58(DicomBytesIO.write_tag)
273000 0.2116 7.752e-07 1.237 4.53e-06 filebase.py:216(DicomBytesIO.init)
273000 0.6998 2.563e-06 1.025 3.755e-06 filebase.py:170(DicomBytesIO.init)
5005 0.01506 3.009e-06 0.9823 0.0001963 fsm.py:49(StateMachine.do_action)

And benchmark 3's most time-consuming functions:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000371 0.000371 10.17 10.17 socketserver.py:215(ThreadedAssociationServer.serve_forever)
21 0.000351 1.671e-05 10.17 0.4841 selectors.py:402(PollSelector.select)
2 5.1e-05 2.55e-05 10.17 5.086 threading.py:859(Thread.run)
1 6.2e-05 6.2e-05 9.51 9.51 subprocess.py:1074(Popen.wait)
1 3.8e-05 3.8e-05 9.51 9.51 subprocess.py:1772(Popen._wait)
1 1.1e-05 1.1e-05 9.51 9.51 subprocess.py:1759(Popen._try_wait)
1 2.3e-05 2.3e-05 9.441 9.441 association.py:581(Association.run)
1 0.02955 0.02955 9.431 9.431 dul.py:356(DULServiceProvider.run)
1 0.03111 0.03111 9.411 9.411 association.py:622(Association._run_reactor)
1000 0.006313 6.313e-06 1.644 0.001644 association.py:3250(Association._serve_request)
1000 0.01326 1.326e-05 1.611 0.001611 service_class.py:1428(StorageServiceClass.SCP)
1000 0.007846 7.846e-06 1.536 0.001536 dimse.py:288(DIMSEServiceProvider.send_msg)
2000 0.006835 3.417e-06 0.9025 0.0004513 dsutils.py:70(encode)
2000 0.04761 2.381e-05 0.8803 0.0004401 filewriter.py:551(write_dataset)

It's clear that most of the SCU's time is spent on encoding, but it's not clear what's blocking in the SCP.
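For anyone who wants to reproduce this kind of profile outside the benchmark script's built-in yappi option, a minimal sketch (run_benchmark is a stand-in for whichever benchmark you want to profile):

```python
import yappi

def profile(run_benchmark):
    """Profile a benchmark callable with yappi and print pstats-style output."""
    yappi.start()
    try:
        run_benchmark()
    finally:
        yappi.stop()
    # Convert to the familiar cProfile/pstats format and show the hottest calls
    stats = yappi.convert2pstats(yappi.get_func_stats())
    stats.sort_stats("cumulative").print_stats(20)
```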

@scaramallion (Member) commented Jun 4, 2021

There are certainly some optimisations left, off the top of my head:

  • Implementing a dataset decoder/encoder designed solely for command set and DIMSE elements, perhaps a minor gain?
  • Use the encoded dataset when acting as a C-STORE SCU if it's available and no encoding change is needed; a reasonable gain under those conditions. This one is actually on my task list.
  • Changing the Association and DUL to use event blocking rather than a strict run loop delay, but I've avoided that change because of how complex it is to reason about threading. Probably a reasonable gain (a rough sketch of the idea follows this list).
  • Adding some compiled extensions, but it's not something I particularly want to do.
  • Perhaps moving to async?
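To sketch the event blocking idea from the third bullet: instead of the reactor sleeping for a fixed interval on every pass, it would wait on an event that the socket/queue side sets when there is work to do. Illustrative only, with made-up names, not pynetdicom's actual internals:

```python
import threading

class ToyReactor:
    """Illustrative only: a reactor loop that blocks on an event
    instead of sleeping for a fixed delay on every iteration."""

    def __init__(self):
        self._work_ready = threading.Event()  # set by the socket/queue side
        self._stop = False

    def notify(self):
        """Call when a PDU or DIMSE primitive arrives."""
        self._work_ready.set()

    def run(self):
        while not self._stop:
            # Block until there's work (timeout only as a safety net),
            # rather than time.sleep(delay) on each pass.
            if not self._work_ready.wait(timeout=0.5):
                continue
            self._work_ready.clear()
            self._process_events()

    def _process_events(self):
        """Placeholder for the real work: run the state machine, etc."""
```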

Hmm, that command set/DIMSE dataset encoder/decoder idea sounds like fun...
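As a very rough illustration of why that could be cheap: command set elements are always implicit VR little endian, so a dedicated encoder can pack group, element, length and value directly instead of going through pydicom's general write_dataset machinery. A hedged sketch, not an existing pynetdicom API:

```python
import struct

def encode_command_element(group, elem, value):
    """Encode one command set element as implicit VR little endian:
    2-byte group, 2-byte element, 4-byte length, then the value bytes."""
    if len(value) % 2:
        value += b"\x00"  # element values must be even-length
    return struct.pack("<HHI", group, elem, len(value)) + value

# e.g. (0000,0100) Command Field for a C-STORE-RQ (0x0001), a 2-byte US value
command_field = encode_command_element(0x0000, 0x0100, struct.pack("<H", 0x0001))
```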

@codecov (bot) commented Jun 4, 2021

Codecov Report

Merging #639 (d011056) into master (151e9cd) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff             @@
##            master      #639    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           27        27            
  Lines         8476      8659   +183     
==========================================
+ Hits          8476      8659   +183     
Impacted Files          Coverage Δ
pynetdicom/ae.py        100.00% <0.00%> (ø)
pynetdicom/dul.py       100.00% <0.00%> (ø)
pynetdicom/fsm.py       100.00% <0.00%> (ø)
pynetdicom/pdu.py       100.00% <0.00%> (ø)
pynetdicom/acse.py      100.00% <0.00%> (ø)
pynetdicom/dimse.py     100.00% <0.00%> (ø)
pynetdicom/timer.py     100.00% <0.00%> (ø)
pynetdicom/utils.py     100.00% <0.00%> (ø)
pynetdicom/events.py    100.00% <0.00%> (ø)
pynetdicom/status.py    100.00% <0.00%> (ø)
... and 15 more

@scaramallion merged commit b3222cb into pydicom:master on Jun 4, 2021
@chsasank (Contributor, Author) commented Jun 5, 2021

Thanks for the answer.

> Changing the Association and DUL to use event blocking rather than a strict run loop delay, but I've avoided that change because of how complex it is to reason about threading. Probably a reasonable gain
>
> Perhaps moving to async?

I think both are the same? Async makes everything event-based, and we would no longer need a while True loop in the reactors. @arunkant and I have been thinking about async a lot these days; see some of the discussion in #527.

Do you have any advice on how we should go about this? I've had a hard time wrapping my head around all the complexity of the code, so I don't know where to even start.

As you probably know, asyncio uses transports and protocols to build a network application. If we were to move to this, how would the different classes below fit into that framework?

[image: overview of pynetdicom's classes]

Any advice is appreciated
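For instance, the DUL's socket handling might map onto asyncio's Protocol/Transport pair something like this. This is purely an illustrative sketch with made-up names, not a design proposal:

```python
import asyncio

class DULProtocol(asyncio.Protocol):
    """Illustrative sketch: the DUL's socket handling as an asyncio Protocol."""

    def __init__(self):
        self._buffer = b""

    def connection_made(self, transport):
        # The transport replaces the blocking socket wrapper
        self.transport = transport

    def data_received(self, data):
        # Accumulate bytes, split out complete PDUs, feed them to the state
        # machine; decoded DIMSE messages would go up to the Association layer
        self._buffer += data

    def connection_lost(self, exc):
        # Signal "transport connection closed" to the state machine
        pass


async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(DULProtocol, "127.0.0.1", 11112)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```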
