
High RAM usage when loading FastText Model on Google Colab #2502

Closed
rianrajagede opened this issue May 28, 2019 · 14 comments
Labels
Hacktoberfest Issues marked for hacktoberfest help wanted


@rianrajagede

Problem description

I want to load a pre-trained FastText model using Gensim. I run this script in Google Colab with ~12 GB RAM, but it always crashes, with Colab's message: "Your session crashed after using all available RAM."

Steps/code/corpus to reproduce

# Download and unzip the model
!wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz
!gunzip -k cc.en.300.bin.gz

# Install / Upgrade Gensim
!pip install --upgrade gensim

# Load model, method 1
from gensim.models.fasttext import load_facebook_vectors
model = load_facebook_vectors("cc.en.300.bin.gz")

# Load model, method 2
from gensim.models.fasttext import FastText
model = FastText.load_fasttext_format('cc.en.300.bin')

I didn't run both methods at the same time, only one of them, and I restart the runtime to clear the memory before trying the other. I use method 2 to avoid issue #2378. Both methods crash Colab by using all available RAM. At first I thought the problem was the model itself, but each file's size is far below 12 GB:

cc.en.300.bin.gz     4.19 GB
cc.en.300.bin        6.74 GB

and if I load it using the fastText Python module, it works:

# Install fastText
!git clone https://github.com/facebookresearch/fastText.git
!pip install fastText/.
# Load model
import fastText
model = fastText.load_model("cc.en.300.bin")

Versions

Linux-4.14.79+-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0]
NumPy 1.16.3
SciPy 1.3.0
gensim 3.7.3
FAST_VERSION 1

@gojomo
Collaborator

gojomo commented May 29, 2019

Does it crash on the load, or shortly thereafter when you start using the vectors? Because: doing common operations like most_similar() requires the creation of a cache of unit-normalized vectors, which roughly doubles the required RAM – which would easily explain exhausting 12G RAM.
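A back-of-envelope estimate illustrates why the norm cache matters. This is only a sketch, assuming round figures for cc.en.300.bin (roughly 2M vocabulary words, a default of 2M subword ngram buckets, 300 dimensions, float32); it counts only the raw vector arrays and ignores the extra per-version overheads discussed later in this thread:

```python
# Rough RAM estimate for a FastText model like cc.en.300.bin.
# Assumed figures: ~2M vocab words, ~2M ngram buckets, 300 dims, float32.
DIMS = 300
FLOAT_BYTES = 4
VOCAB = 2_000_000
BUCKETS = 2_000_000

def gib(n_bytes):
    return n_bytes / 2**30

word_vectors = VOCAB * DIMS * FLOAT_BYTES    # full-word vectors
ngram_vectors = BUCKETS * DIMS * FLOAT_BYTES # subword ngram buckets
base = word_vectors + ngram_vectors          # raw vector storage
norm_cache = word_vectors                    # unit-normalized copy built for most_similar()

print(f"raw vectors:     {gib(base):.2f} GiB")
print(f"with norm cache: {gib(base + norm_cache):.2f} GiB")
```

Even under these optimistic assumptions the arrays alone approach 7 GiB once the normalized cache exists, before counting Python overhead or anything else resident in the 12 GB Colab VM.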

@rianrajagede
Author

It crashes during the loading process, at the code above. I haven't used the model at all.

@ptiwari2407

@gojomo, then how can I get around using most_similar() on Google Colab?

@gojomo
Collaborator

gojomo commented Aug 12, 2020

If most_similar() is the specific operation you need, there's no getting around it. You'd need to use a smaller model or a machine with more memory.

There are a number of major memory inefficiencies and unnecessary over-allocations in gensim's FastText support, up through the current released version, 3.8.3. They'll be fixed in the eventual gensim-4.0.0 release, so those FB models might be more usable within 12GB. But those and other changes are still being tested and further improved, and there's no firm date yet for a 4.0.0 release. An advanced user capable of running in-development code checked out from GitHub and built locally could use that fixed code now and help test it, but I'm not sure Google Colab would support that.
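For what it's worth, Colab cells do accept pip installs straight from a Git repository, so something like the following could pull in-development code. This is a sketch, not an endorsed procedure: the `develop` branch name is an assumption, so check the repository for the current default branch before running.

```shell
# In a Colab cell: install in-development gensim directly from GitHub.
# ('develop' is an assumed branch name; verify it against the repo.)
!pip install --upgrade git+https://github.com/RaRe-Technologies/gensim.git@develop
```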

@gojomo
Collaborator

gojomo commented Oct 20, 2020

This has been much-improved by #2698, #2944, & other recent work that will be available in the 4.0.0 release, so closing this issue.

@italodamato

I'm getting the same error. The colab runtime crashes loading the model:

model = gensim.models.fasttext.load_facebook_model('.../crawl-300d-2M-subword/crawl-300d-2M-subword.bin')

@piskvorky
Owner

@italodamato please post your versions, like @rianrajagede did above (or open a new ticket).

@italodamato

numpy==1.18.5
scipy==1.4.1
gensim==3.8.3

gcc 7.5.0
python 3.6.9
Ubuntu 18.04

@piskvorky

@piskvorky
Owner

piskvorky commented Dec 7, 2020

In that case see @gojomo's answer above.

The 4.0 beta release is here: https://github.com/RaRe-Technologies/gensim/releases/tag/4.0.0beta

@italodamato

I upgraded to 4.0, but it keeps crashing.

@gojomo
Collaborator

gojomo commented Dec 7, 2020

It looks like you're using an even larger model (crawl-300d-2M-subword.bin, 7.24GB) than the original report.

As the gensim-4.0.0 beta has removed the major sources of unnecessary memory usage in Gensim's implementation, if you are still getting "crashed after using all available RAM" errors, your main ways forward are likely to be: (1) moving to a system with more RAM, at Colab or elsewhere; (2) if other uses of RAM might be contributing to the usage, reducing those.
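To see how much headroom the current VM actually has before attempting a load, a stdlib-only check works in a Colab cell (a sketch: the `SC_PHYS_PAGES`/`SC_AVPHYS_PAGES` sysconf names are Linux-specific and won't exist on all platforms):

```python
# Report total and currently-available RAM on a Linux host (e.g. a Colab VM),
# using only the standard library.
import os

page = os.sysconf("SC_PAGE_SIZE")
total_gib = os.sysconf("SC_PHYS_PAGES") * page / 2**30
avail_gib = os.sysconf("SC_AVPHYS_PAGES") * page / 2**30

print(f"total RAM:     {total_gib:.1f} GiB")
print(f"available RAM: {avail_gib:.1f} GiB")
```

If the available figure is already well below the rough model-size estimates discussed above, the load is going to fail regardless of which loader is used.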

@italodamato

I'll try with his model. I'm not sure what the difference is between the two, though.

@italodamato

It crashed again. I don't have anything else in memory when I do it.

@gojomo
Collaborator

gojomo commented Dec 8, 2020

On a VM with 32G RAM, with Python 3.6 under Ubuntu 18.04 & gensim-4.0.0b, I used gensim.models.fasttext.load_facebook_model in a Jupyter notebook to load crawl-300d-2M-subword.bin.

It took almost 4 minutes of wall-clock time (!) but completed without error. top reported the process's virtual-memory usage as about 10.5GB.

Gensim in Python likely has more overhead than Facebook's C++ fastText code for loading the same models, so in constrained environments there will always be some on-the-margin models that can be loaded in one but not the other. Other than that, I don't see anything broken here with regard to memory usage, and can only recommend working in an environment with more memory if you need to use such large models.

(Note, though, that other serious issues may remain in loading full native FastText models, such as #2969.)
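The wall-clock and peak-memory figures above can be reproduced with a small stdlib helper. This is a hypothetical sketch, not code from the thread: `measure` is an illustrative name, `ru_maxrss` is reported in KiB on Linux, and in practice you would pass the gensim load call (e.g. `lambda: gensim.models.fasttext.load_facebook_model("crawl-300d-2M-subword.bin")`) as `fn`; here the helper is only exercised on a cheap stand-in.

```python
# Hypothetical helper: time a callable and report the process's peak
# resident memory afterwards, using only the standard library.
import resource
import time

def measure(fn):
    """Run fn(), returning (result, elapsed_seconds, peak_rss_gib)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    # On Linux, ru_maxrss is the peak resident set size in KiB.
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_kib / 2**20  # KiB -> GiB

# Stand-in workload; replace with the model-loading call to profile it.
result, secs, peak = measure(lambda: sum(range(1_000_000)))
print(f"elapsed: {secs:.3f}s, peak RSS: {peak:.2f} GiB")
```

Because `ru_maxrss` is a high-water mark for the whole process, running this in a fresh runtime gives the cleanest reading of what the load itself costs.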
