fix batch input - Nvidia DALI #2455

Merged: 5 commits, Jul 25, 2023

Conversation

jagadeeshi2i (Collaborator) commented Jul 11, 2023

Description

Adds fix for batch input support

Type of change

Modify custom handler to support batch input

ts_log.log

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Signed-off-by: jagadeesh <jagadeeshj@ideas2it.com>
codecov bot commented Jul 11, 2023

Codecov Report

Merging #2455 (4ab075f) into master (255a047) will not change coverage.
The diff coverage is n/a.

❗ Current head 4ab075f differs from pull request most recent head 98eac4e. Consider uploading reports for the commit 98eac4e to get more accurate results

@@           Coverage Diff           @@
##           master    #2455   +/-   ##
=======================================
  Coverage   72.66%   72.66%           
=======================================
  Files          78       78           
  Lines        3669     3669           
  Branches       58       58           
=======================================
  Hits         2666     2666           
  Misses        999      999           
  Partials        4        4           


Signed-off-by: jagadeesh <jagadeeshj@ideas2it.com>
@jagadeeshi2i changed the title from "[WIP] fix batch input" to "fix batch input" Jul 11, 2023
@jagadeeshi2i changed the title from "fix batch input" to "fix batch input - Nvidia DALI" Jul 11, 2023
@agunapal agunapal self-requested a review July 13, 2023 00:31
agunapal (Collaborator) left a comment

Hi @jagadeeshi2i, could you please try running the benchmark tool with your changes?

A few weeks ago I tried this and noticed a mismatch between the batch_size expected in the response and the actual response.
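
For context on the mismatch described above: a TorchServe handler is expected to return one response per request in the batch it receives. A minimal sketch of that check follows. The names `check_batch_response`, `per_request_handler`, and `collapsing_handler` are hypothetical and are not taken from this PR's handler.

```python
from typing import Any, Callable, List


def check_batch_response(
    handler: Callable[[List[Any]], List[Any]], batch: List[Any]
) -> List[Any]:
    """Run a handler on a batch and verify it returns one response per request."""
    responses = handler(batch)
    if not isinstance(responses, list) or len(responses) != len(batch):
        got = len(responses) if isinstance(responses, list) else "a non-list"
        raise ValueError(f"expected {len(batch)} responses, got {got}")
    return responses


def per_request_handler(batch: List[bytes]) -> List[int]:
    # Hypothetical handler: produces one result for every request in the batch.
    return [len(item) for item in batch]


def collapsing_handler(batch: List[bytes]) -> List[int]:
    # Hypothetical buggy handler: collapses the whole batch into one result.
    return [sum(len(item) for item in batch)]
```

Wrapping a handler this way in a local test surfaces a batch-size mismatch before the model is archived and served.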

@chauhang chauhang requested a review from agunapal July 13, 2023 05:34
jagadeeshi2i (Collaborator, Author) commented Jul 13, 2023

Model metrics with batch size 5:
model_metrics.log

Hi @agunapal I have added the model_metrics.log from benchmark tool.

agunapal (Collaborator) left a comment

Hi @jagadeeshi2i I am guessing you have some uncommitted changes.

I tried your example and am getting this error:

2023-07-13T23:02:21,869 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG -     response = self.pipe.run(source=batch_tensor)
2023-07-13T23:02:21,869 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - TypeError: Pipeline.run() got an unexpected keyword argument 'source'

jagadeeshi2i (Collaborator, Author) commented Jul 14, 2023

Did you generate a new model.dali file? I have changed the serialization file:

jpegs = dali.fn.external_source(dtype=types.UINT8, name="source", batch=False)

agunapal (Collaborator) commented

@jagadeeshi2i Yes, I did.

(torchserve) ubuntu@ip-172-31-7-107:~/fork/serve/examples/nvidia_dali$ python serialize_dali_pipeline.py --config dali_config.json
/home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/nvidia/dali/backend.py:46: Warning: DALI support for Python 3.10 is experimental and some functionalities may not work.
  deprecation_warning("DALI support for Python 3.10 is experimental and some functionalities "
Saved ./model.dali
(torchserve) ubuntu@ip-172-31-7-107:~/fork/serve/examples/nvidia_dali$ torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ../image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler custom_handler.py --extra-files ../image_classifier/index_to_name.json,./model.dali,./dali_config.json
(torchserve) ubuntu@ip-172-31-7-107:~/fork/serve/examples/nvidia_dali$ mkdir model_store
(torchserve) ubuntu@ip-172-31-7-107:~/fork/serve/examples/nvidia_dali$ mv resnet-18.mar model_store/

I will try it again and post an update.

agunapal (Collaborator) commented

dali_logs.txt

Hi @jagadeeshi2i, I tried it again and I see the same error. I have attached the complete log. Can you please check?

Signed-off-by: jagadeesh <jagadeeshj@ideas2it.com>
jagadeeshi2i (Collaborator, Author) commented

Sorry, my bad. I have updated the requirements.txt file to point to the latest DALI version.
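
The `TypeError` reported earlier in this thread is consistent with an installed DALI release whose `Pipeline.run()` does not accept named inputs. A minimal sketch of a guard that detects this before calling `run(source=...)` follows. `OldPipeline` and `NewPipeline` are hypothetical stand-ins, not DALI classes, and `accepts_keyword` is an illustrative helper that is not part of this PR.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` can be called with `name=` as a keyword argument."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword name.
    for param in params.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
    # Otherwise the name must be an explicit keyword-capable parameter.
    target = params.get(name)
    return target is not None and target.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


class OldPipeline:
    # Stand-in for a pipeline whose run() takes no named inputs (assumption).
    def run(self):
        return "old"


class NewPipeline:
    # Stand-in for a pipeline whose run() takes named inputs (assumption).
    def run(self, **pipeline_inputs):
        return pipeline_inputs
```

A handler could call such a check once at initialization and raise a clear error asking for a newer DALI version, rather than failing on the first inference request.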

agunapal (Collaborator) left a comment

Thanks @jagadeeshi2i, I tried it and it works.

@agunapal agunapal requested a review from msaroufim July 25, 2023 01:58
msaroufim (Member) left a comment

Stamping to unblock; I did not test. I will reiterate that the DALI code really needs a test.

@lxning lxning merged commit 31b42e8 into pytorch:master Jul 25, 2023