Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix swig template memory leak #859

Closed
wants to merge 2 commits into from

Conversation

praveenraja1
Copy link

Problem:

Device gets stuck if left switched on for prolonged period of time for more than 30 hrs.

Observation:

Pmon docker seems to be leaking memory over time, leading to system wide out-of-memory condition.
Python process in pmon such as psud, thermalctld were all leaking memory over time.
The issue was traced back to table.set() operations that were done in these python process.
We have swig template files which are used to generate the code mappings for python to cpp conversions.

In these constructor functions, we have allocated new references using PySequence_GetItem(), which these references were not marked with Py_DECREF, when done using them. This was causing stale objects to present which was leaking memory over time.
/usr/lib/python3/dist-packages/swsscommon/swsscommon.py:184: size=2851 B, count=49, average=58 B
/usr/lib/python3.9/tracemalloc.py:505: size=1456 B, count=24, average=61 B
/usr/lib/python3.9/tracemalloc.py:67: size=1344 B, count=21, average=64 B
/usr/lib/python3.9/tracemalloc.py:558: size=1224 B, count=24, average=51 B
/usr/lib/python3.9/tracemalloc.py:498: size=1104 B, count=23, average=48 B
/usr/lib/python3.9/tracemalloc.py:193: size=768 B, count=16, average=48 B
/usr/lib/python3.9/tracemalloc.py:533: size=576 B, count=1, average=576 B
/usr/lib/python3.9/threading.py:574: size=536 B, count=1, average=536 B

Fix:

Used Py_DECREF in constructor functions.

Unit test:

Tested using a custom py script which does table.set in while loop (and checking tracemalloc) to ensure memory leak doesnt happen.

Allowed the system to be up for sometime to ensure memory usage of pmon docker doesnt increase like earlier.

Check show platform fan/psu to ensure that pmon functionality isnt affected due to this change.

@@ -164,6 +164,7 @@
temp.push_back(std::pair< std::string,std::string >());
PyObject *item = PySequence_GetItem($input, i);
if (!PyTuple_Check(item) || PyTuple_Size(item) != 2) {
Py_DECREF(item);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_DECREF

Thanks for the bugfix! the code LGTM. Is it possible to add unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we already have a case:

def test_SelectMemoryLeak():
to test this scenario, but I don't know why it was not exposed by this case. I'm investigating it.

lguohan pushed a commit that referenced this pull request May 28, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak #859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
Pterosaur added a commit to Pterosaur/sonic-swss-common that referenced this pull request May 29, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak sonic-net#859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
Pterosaur added a commit to Pterosaur/sonic-swss-common that referenced this pull request May 29, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak sonic-net#859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
Pterosaur added a commit to Pterosaur/sonic-swss-common that referenced this pull request May 29, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak sonic-net#859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
yxieca pushed a commit that referenced this pull request Jun 3, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak #859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
arfeigin pushed a commit to arfeigin/sonic-swss-common that referenced this pull request Jun 27, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak sonic-net#859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
@Pterosaur
Copy link
Contributor

Issue has been fixed, thanks

@Pterosaur Pterosaur closed this Jul 11, 2024
Pterosaur added a commit to Pterosaur/sonic-swss-common that referenced this pull request Aug 28, 2024
Fix the issue sonic-net/sonic-buildimage#17025 about Redis set activity

Description
The issue reports a memory leak on the Redis set operations

Reason
Didn't decrease the reference count after PySequence_GetItem
Use the inappropriate Swig API and didn't release the string of SWIG_AsPtr_std_string
Fix:
Refer PR: Fix swig template memory leak sonic-net#859 from @praveenraja1
Replace the API SWIG_AsPtr_std_string to SWIG_AsVal_std_string
Add unit test
To monitor there is no dramatic memory increment after a huge amount of Redis set operations.

Signed-off-by: Ze Gan <zegan@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants