Memory leak when using spaCy with FastAPI #10496
Replies: 2 comments 2 replies
-
The processes are killed because no free memory remains. The interesting part here is the memory usage of the gunicorn processes, so I dumped the memory usage during operation:
As you can see, each gunicorn process uses ~2 GB RAM (the PSS column) and an additional ~500 MB of swap space. After a restart of the spaCy FastAPI service, it still uses 2 GB RAM, but nearly no swap space.
-
Hi @MariamRiaz, the increase in memory may be caused by the growing vocab as your server accepts many requests, although it is unusual that it grows that quickly. It's hard to pinpoint the exact cause: it could be the gunicorn server itself rather than the logic behind getting the dependency information. One thing you can try is to force-restart a worker after a certain number of requests. If you're using FastAPI with gunicorn/uvicorn, you can set `--max-requests` to some number to alleviate the memory leak.
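For reference, the worker-recycling suggestion above can be applied via gunicorn's command line. This is a minimal sketch; `main:app`, the worker count, and the request limits are placeholders you'd tune for your own deployment:

```shell
# Recycle each worker after roughly 1000 requests so per-worker memory
# growth is capped; the jitter staggers restarts so workers don't all
# recycle at the same moment.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --max-requests 1000 \
  --max-requests-jitter 100
```

Restarting a worker drops its entire process memory, including spaCy's accumulated vocab, at the cost of a brief model reload.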
-
Hi All,
I am using FastAPI to serve two spaCy models, "en_core_web_lg" and "de_core_news_lg", as a REST API. We send our text data to this REST API to get dependency information from the models. While testing on our large dataset, we experienced a memory leak in this process once the data reached a million data points. (Keep in mind that our texts are around 30-60 tokens on average, not lengthy documents.) Upon further investigation, it seems that the process's memory consumption increases gradually with the number of data points it processes. Any help in resolving this issue would be much appreciated.
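The gradual growth described above is consistent with spaCy's string store behavior: every distinct token a pipeline sees is interned and never evicted, so a long-running worker's memory grows with the number of unique strings processed. A minimal sketch demonstrating the effect (using a blank pipeline so no trained model download is needed; trained models additionally cache lexeme data):

```python
import spacy

# Blank English pipeline -- enough to show the effect, since string
# interning happens during tokenization.
nlp = spacy.blank("en")

before = len(nlp.vocab.strings)

# Simulate many requests that each contain previously unseen strings,
# e.g. IDs, names, or typos in user-submitted text.
for i in range(10_000):
    nlp(f"request-{i} contains a fresh token")

after = len(nlp.vocab.strings)
print(f"strings before: {before}, after: {after}")

# The StringStore only grows; entries are never evicted, so memory
# accumulates in proportion to the distinct tokens seen by the worker.
```

This is why the growth is per-worker and disappears after a restart: the vocab lives in process memory and is rebuilt fresh when the model is reloaded.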