How to prevent vocabulary from constantly growing? #9369
Replies: 3 comments 2 replies
-
The typical solution is to periodically reload the pipeline, as long as you don't have saved state (such as `Doc` objects) that still references the old vocab. The script from #5083 worked in spaCy v2 because there was a hard-coded limit for the size of the lexeme cache, which was removed in v3 since it's something that should be up to the user. In v2 the string store would still grow in the same way as in v3, but the lexeme cache wouldn't.

A similar v3 version could look something like this. Here the entire pipeline is reloaded to keep everything in sync, and the strings are reset to the original minimal set required by the freshly initialized components:

```python
import random
from string import ascii_letters, digits

import spacy


def generate_strings():
    # Yield an endless stream of random 20-character strings.
    while True:
        yield "".join(
            random.choice(" " * 10 + ascii_letters + digits) for _ in range(20)
        )


def main():
    nlp = spacy.blank("xx")
    # Snapshot the freshly initialized pipeline and its minimal string set.
    nlp_bytes = nlp.to_bytes()
    minimal_strings = set(nlp.vocab.strings) | set(
        nlp.vocab.strings[lex.orth] for lex in nlp.vocab
    )
    for i, doc in enumerate(nlp.pipe(generate_strings())):
        if not i % 10000:
            print(i, len(nlp.vocab), len(nlp.vocab.strings), doc.text)
        if len(nlp.vocab) > 10000:
            # Reload the whole pipeline and reset the string store.
            nlp.from_bytes(nlp_bytes)
            nlp.vocab.strings._reset_and_load(minimal_strings)


if __name__ == "__main__":
    main()
```

Depending on the pipeline components, you might be able to reduce what's serialized and reloaded a bit, in particular the vocab lookups and vectors, but they can also grow depending on what your pipeline does (retokenization adds vectors, setting certain lexeme properties adds lookups entries). Just reloading the vocab won't necessarily work because other components may have cache entries that reference strings or lexemes (here, it's the tokenizer cache). In general, the overall pipeline design is that a component can assume that something it has added to the string store is always there in the future.
-
Please understand that this script was just meant to update the previous example for v3, to demonstrate what needs to be reloaded in the general case. However, I just took another look at [...], so it would be better to use [...].

When you reload a model, you do need to be sure that you've previously called [...].

If you are using models with torch on GPU, you want to add this so that cupy and pytorch share the same memory pool:

```python
set_gpu_allocator("pytorch")
```

See the example here: https://spacy.io/usage/embeddings-transformers#transformers-runtime

And a related issue about GPU memory usage: #8984 (comment)
-
Thank you again. Great advice.
-
I apologize that this is essentially a repeat of what I have saved as:

Streaming Data Memory Growth (reprise) GitHub #5083

I have a 'prediction server' which receives a document, runs the NER or SpanCat pipeline, returns the predicted entities, and drops the document. And it keeps growing in memory.

More specifically, `nlp.vocab` keeps growing, as each request (document) contains new names, misspelled words, etc.
I tried various variations of the code suggested in #5083 (see below). But no matter what I do, the call

```python
nlp.vocab.strings._reset_and_load(minimal_strings)
```

does not reduce the vocabulary; it just keeps growing.
The code posted under #5083:
I am puzzled by the line:

```python
minimal_strings.update([nlp.vocab.strings[lex.orth] for lex in nlp.vocab])
```

Wouldn't that simply make `minimal_strings` equal to the current vocabulary (and hence prevent any reduction)?
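As a plain-Python illustration of that worry (toy sets standing in for the string store, not spaCy objects): if the snapshot is updated from the *grown* store, a subsequent reset removes nothing, whereas a snapshot taken once at startup stays minimal.

```python
# Toy model: the string store as a set, grown during processing.
startup_strings = {"IS_ALPHA", "LOWER"}        # hypothetical initial symbols
store = set(startup_strings)

store |= {"misspeling", "NewName", "xyzzy"}    # growth from incoming documents

# Case 1: snapshot taken at startup -> resetting shrinks the store.
minimal_strings = set(startup_strings)
store_after_reset = set(minimal_strings)       # what a reset aims for
print(len(store_after_reset))                  # 2, back to the minimal set

# Case 2: snapshot updated from the grown store -> reset removes nothing.
minimal_strings.update(store)
store_after_reset = set(minimal_strings)
print(len(store_after_reset))                  # 5, nothing was removed
```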
But even when I skip that update and keep calling

```python
nlp.vocab.strings._reset_and_load(minimal_strings)
```

with the original, unchanged `minimal_strings`, the vocabulary does not shrink. It keeps growing, almost as if `_reset_and_load()` only made a union of what was there before the call and `minimal_strings` (i.e. no reset).
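The behavior being described can be pinned down with a toy store (a hypothetical `ToyStore` class, not spaCy's StringStore): a true reset discards old entries first, while a union-style load keeps them, which is exactly what a store that "keeps growing" would look like.

```python
class ToyStore:
    """Toy store contrasting reset-and-load with union-style load."""

    def __init__(self, strings=()):
        self.strings = set(strings)

    def reset_and_load(self, strings):
        self.strings = set(strings)       # discard everything, then load

    def union_load(self, strings):
        self.strings |= set(strings)      # keep old entries (no reset)


minimal = {"a", "b"}
store = ToyStore(minimal)
store.strings.add("grown")

store.union_load(minimal)
print("grown" in store.strings)   # True: union behaves like "no reset"

store.reset_and_load(minimal)
print("grown" in store.strings)   # False: a real reset shrinks the store
```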
Any suggestion as to what I may be doing wrong, or better: is there a better way to keep my server from growing?