Investigation: Performance with uploading instances to MinIO #154

Closed
JoeBatt1989 opened this issue Sep 16, 2022 · 4 comments · Fixed by #166 or #169
Labels
bug Something isn't working

Comments

@JoeBatt1989

Description

In one of our environments, saving a study to MinIO takes around 1 second per slice. This ticket tracks the investigation; it is not yet clear where the bottleneck lies.

Steps to reproduce

  1. Deploy MIG and MinIO to an environment
  2. Send a study to benchmark

Expected behavior

Study is uploaded to storage within an acceptable amount of time

Actual behavior

In some cases a study takes more than 10 minutes to save

@JoeBatt1989 added the bug label Sep 16, 2022
@mocsharp
Collaborator

Please share the environment running IG & MinIO.

  • CPU cores
  • RAM
  • Disk size/speed
  • Network speed

@JoeBatt1989
Author

Hi @mocsharp, here are the details of the environment.

DGX box
1 GPU, 8 vCPU, 32 GB RAM, up to 25 Gbps network, 225 GB NVMe SSD
Both head nodes
2 vCPU, 8 GB RAM, network seems to be about 300 Mbps

All boxes are attached to an EFS instance.
https://docs.aws.amazon.com/efs/latest/ug/performance.html

@mocsharp
Collaborator

I ran 5 studies using MONAI Deploy Lite and each study was completed within 1-2 mins (upload took no longer than a minute each). The MIG container was set to use only 2 CPUs + 8GB ram.

Container stats after all 5 studies are completed.

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
148bd3484592   mdl-orthanc    1.74%     295.1MiB / 31.05GiB   0.93%     1.07MB / 964MB    1.22GB / 156MB    66
c35bf2a24db7   mdl-minio      0.03%     244.4MiB / 31.05GiB   0.77%     1.76GB / 1.04GB   4.2GB / 5.52GB    20
d7ab6113b075   mdl-rabbitmq   2.40%     154.9MiB / 31.05GiB   0.49%     915kB / 897kB     42.5MB / 3.71MB   45
46899c9fcd8a   mdl-mongodb    0.58%     116.9MiB / 31.05GiB   0.37%     162kB / 1.37MB    173MB / 12.2MB    39
ea62b3f26676   mdl-ig         0.63%     673MiB / 8GiB         8.21%     1.01GB / 980MB    21.1MB / 234MB    21
29f205cb59d5   mdl-tm         0.00%     156.6MiB / 31.05GiB   0.49%     974MB / 776MB     964MB / 90.1kB    21
3eadb0315746   mdl-wm         0.10%     79.47MiB / 31.05GiB   0.25%     23.6MB / 6.41MB   17.7MB / 0B       23

@mocsharp
Collaborator

Added ability to switch to disk for storing incoming data before uploading to storage service in PR #166.
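
To illustrate the general idea of staging incoming data on disk before uploading, here is a minimal Python sketch. It is not the code from PR #166: the function `spool_then_upload` and the `upload` callable are hypothetical names made up for this example, and a real uploader would call the storage service's client instead of the in-memory stand-in used here.

```python
import tempfile
from typing import BinaryIO, Callable, Iterable


def spool_then_upload(
    chunks: Iterable[bytes],
    upload: Callable[[BinaryIO, int], None],
) -> int:
    """Write incoming chunks to a temporary file, then pass the file to `upload`.

    `upload` is a hypothetical callable taking (stream, length); it stands in
    for whatever storage client is actually used.
    """
    with tempfile.TemporaryFile() as spool:
        for chunk in chunks:
            spool.write(chunk)   # data lands on disk instead of growing a memory buffer
        size = spool.tell()      # total number of bytes received
        spool.seek(0)            # rewind so the uploader reads from the start
        upload(spool, size)
    return size


# Stand-in uploader that just records what it was given.
received = []


def fake_upload(stream: BinaryIO, length: int) -> None:
    received.append(stream.read(length))


n = spool_then_upload([b"DICM", b"-payload"], fake_upload)
```

The temporary file is deleted when the `with` block exits, so the staged copy only lives for the duration of the upload.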

Time is measured from when the first instance is received to when the workflow request is sent.

When 10 studies (588 instances per study) are sent to IG continuously, using the disk is much faster than memory:

# Memory
Workflow request published to md.workflow.request, message ID=2cff618e-d1a3-4115-ae80-e5e6b4f411b7. Payload took 00:02:25.1398701 to complete.
Workflow request published to md.workflow.request, message ID=a766b539-2a89-41f4-8b13-9ba27a3bac6d. Payload took 00:05:46.5375501 to complete.
Workflow request published to md.workflow.request, message ID=f407ac25-5b54-4e40-b97f-fac16e914874. Payload took 00:09:04.7963726 to complete.
Workflow request published to md.workflow.request, message ID=05c44068-b0e2-446c-b63a-fbedaf7585c4. Payload took 00:11:53.2644454 to complete.
Workflow request published to md.workflow.request, message ID=1a0cf1ec-2cdc-458e-b3ae-c2845cb1daee. Payload took 00:14:17.0859691 to complete.
Workflow request published to md.workflow.request, message ID=60dd9547-2482-4559-9ac5-50d4bd5bbef4. Payload took 00:16:23.0448137 to complete.
Workflow request published to md.workflow.request, message ID=ebedd87f-68ba-4161-9dbc-8304045c75d0. Payload took 00:18:01.6260370 to complete.
Workflow request published to md.workflow.request, message ID=0b8dd992-98de-4ea2-9c5a-9d941f3d0c78. Payload took 00:19:19.5292454 to complete.
Workflow request published to md.workflow.request, message ID=9b0038bd-dbd1-4e2b-9a1d-864f9c0c5c94. Payload took 00:20:02.9085077 to complete.
Workflow request published to md.workflow.request, message ID=d5d00669-c9d8-482a-8c71-a9b975f179d7. Payload took 00:20:11.4483895 to complete.


# Disk
Workflow request published to md.workflow.request, message ID=c09cf45a-9cb9-4ea5-8236-cbe77ff374e7. Payload took 00:00:51.2866232 to complete.
Workflow request published to md.workflow.request, message ID=2196c06f-54d6-401e-9263-869bf13f7741. Payload took 00:01:27.5621902 to complete.
Workflow request published to md.workflow.request, message ID=fda8e1ba-3f37-4a6e-a141-48e3b1fc0e84. Payload took 00:01:57.8332724 to complete.
Workflow request published to md.workflow.request, message ID=42a61e75-fe33-4be6-824b-5352f886a1cf. Payload took 00:02:37.3700811 to complete.
Workflow request published to md.workflow.request, message ID=d5511b17-3909-4034-82a8-5abc7ebc06fc. Payload took 00:03:19.2335252 to complete.
Workflow request published to md.workflow.request, message ID=d058139a-7b7d-4cf1-a832-e712a1767fa5. Payload took 00:04:23.0854715 to complete.
Workflow request published to md.workflow.request, message ID=651af33d-845f-491f-8fdc-a6f0a544d74a. Payload took 00:05:07.1626852 to complete.
Workflow request published to md.workflow.request, message ID=d1e7e27d-9b15-4580-978b-ba0067d08eef. Payload took 00:05:56.3368932 to complete.
Workflow request published to md.workflow.request, message ID=050ca6c7-5daf-4a86-a608-8af9fc748600. Payload took 00:06:26.4512770 to complete.
Workflow request published to md.workflow.request, message ID=63477934-7244-4a11-8c5b-681b125b0ebb. Payload took 00:07:03.9482112 to complete.

Similarly, for sending a single study:

# Memory
Workflow request published to md.workflow.request, message ID=382d5c48-48fd-44cd-b059-d19dd907f91a. Payload took 00:01:11.5303211 to complete.

# Disk
Workflow request published to md.workflow.request, message ID=3adc8dd6-2dcb-42a9-89b7-3ec021866f87. Payload took 00:00:46.1229609 to complete.
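
To put a number on the difference, the following Python sketch parses the "Payload took ..." durations from the log lines above and compares memory against disk. The helper name `payload_seconds` is made up for this example; the four durations are copied from the logs (the last payload of each 10-study run and the single-study runs).

```python
import re
from datetime import timedelta

# Matches the "Payload took HH:MM:SS.fffffff to complete" part of a log line.
PAYLOAD_RE = re.compile(r"Payload took (\d+):(\d+):(\d+(?:\.\d+)?) to complete")


def payload_seconds(line: str) -> float:
    """Return the payload duration in seconds from a workflow-request log line."""
    match = PAYLOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no payload duration in line: {line!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=float(seconds)
    ).total_seconds()


# Last payload of the 10-study runs.
memory_10 = payload_seconds("Payload took 00:20:11.4483895 to complete.")
disk_10 = payload_seconds("Payload took 00:07:03.9482112 to complete.")

# Single-study runs.
memory_1 = payload_seconds("Payload took 00:01:11.5303211 to complete.")
disk_1 = payload_seconds("Payload took 00:00:46.1229609 to complete.")

print(f"10 studies: memory/disk = {memory_10 / disk_10:.2f}x")
print(f"1 study:    memory/disk = {memory_1 / disk_1:.2f}x")
```

On these figures the last payload of the 10-study run takes about 2.86 times as long with memory as with disk, and the single study about 1.55 times as long.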
