[WIP] Gensim docker container #1368
@@ -0,0 +1,144 @@
FROM ubuntu:16.04

MAINTAINER Daniel Baptista Dias <danielbpdias@gmail.com>

ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim/archive
ENV GENSIM_BRANCH gensim_docker

# Install Python, pip, setuptools and build dependencies (most with pinned versions)
RUN apt-get update \
    && apt-get install -y \
        ant=1.9.6-1ubuntu1 \

Review comment: @tmylk @menshikh-iv Should I remove version pinning from here also?
Review comment: If pinning is not required, then of course you can remove it.
Review comment: Pinning versions in a container is actually very good. It guarantees that it keeps running. Just don't pin the gensim version.
Review comment: So please keep the pins here.
Review comment: Guys, pinning is needed per Docker best practices. Without fixed versions, the container can break when a component changes its version and someone tries to get it via docker pull.
Review comment: Another thing: in my previous PR I forgot to add a modification to the Dockerfile that fixes some git dependencies to avoid these problems. I was following the Docker contribution guidelines so that this image can become an official one in the future.
Review comment: Thanks @danielbdias, I've kept the pinned versions that were already in your PR, and I will add version pinning for the rest of the packages, which I added later.

        cmake=3.5.1-1ubuntu3 \
        default-jdk=2:1.8-56ubuntu2 \
        g++=4:5.3.1-1ubuntu1 \
        git=1:2.7.4-0ubuntu1 \
        libboost-all-dev=1.58.0.1ubuntu1 \
        libgsl-dev=2.1+dfsg-2 \
        mercurial=3.7.3-1ubuntu1 \
        python3=3.5.1-3 \
        python3-pip=8.1.1-2ubuntu0.4 \
        python3-setuptools=20.7.0-1 \
        python \
        python-pip \
        python-setuptools \
        unzip=6.0-20ubuntu1 \
        wget=1.17.1-1ubuntu1.1 \
        subversion \
        locales \
        libopenblas-dev \
        libboost-program-options-dev \
        zlib1g-dev
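
The review thread on the `ant` line settles on keeping the apt pins, and its last comment promises pins for the packages added later. As an illustration only (this script is not part of the PR, and `unpinned_apt_packages` is a hypothetical helper), a short Python check that lists which entries of the install list above still lack an apt `name=version` pin; the package specs are copied from the Dockerfile:

```python
# Illustrative audit, not part of the PR: which apt packages lack a "=version" pin?
# The specs below are copied from the apt-get install list in the Dockerfile.
APT_PACKAGES = """
ant=1.9.6-1ubuntu1 cmake=3.5.1-1ubuntu3 default-jdk=2:1.8-56ubuntu2
g++=4:5.3.1-1ubuntu1 git=1:2.7.4-0ubuntu1 libboost-all-dev=1.58.0.1ubuntu1
libgsl-dev=2.1+dfsg-2 mercurial=3.7.3-1ubuntu1 python3=3.5.1-3
python3-pip=8.1.1-2ubuntu0.4 python3-setuptools=20.7.0-1 python python-pip
python-setuptools unzip=6.0-20ubuntu1 wget=1.17.1-1ubuntu1.1 subversion
locales libopenblas-dev libboost-program-options-dev zlib1g-dev
""".split()


def unpinned_apt_packages(specs):
    """Return the names of packages written without an apt `name=version` pin."""
    unpinned = []
    for spec in specs:
        # The first "=" separates name from version; the version itself may
        # contain an epoch ("2:1.8-56ubuntu2") or characters such as "+".
        name, sep, version = spec.partition("=")
        if not sep or not version:
            unpinned.append(name)
    return unpinned


if __name__ == "__main__":
    for name in unpinned_apt_packages(APT_PACKAGES):
        print("unpinned:", name)
```

Against the current list it reports eight entries: `python`, `python-pip`, `python-setuptools`, `subversion`, `locales`, `libopenblas-dev`, `libboost-program-options-dev` and `zlib1g-dev`.
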

# Set up a UTF-8 locale for Python
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_ALL en_US.UTF-8

# Upgrade pip
RUN pip2 install --upgrade pip
RUN pip3 install --upgrade pip

# Install dependencies
RUN pip2 install \
        cython==0.25.2 \
        jupyter==1.0.0 \
        matplotlib==2.0.0 \
        nltk==3.2.2 \
        pandas==0.19.2 \
        git+https://github.com/mila-udem/blocks.git@stable \
        -r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt

RUN pip3 install \
        cython==0.25.2 \
        jupyter==1.0.0 \
        matplotlib==2.0.0 \
        nltk==3.2.2 \
        pandas==0.19.2 \
        git+https://github.com/mila-udem/blocks.git@stable \
        -r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt
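
Pinning with `==` fixes the PyPI packages, but two arguments in these lists still float: the blocks install tracks the `stable` branch, and `-r` fetches a requirements file from that same branch at build time. A minimal sketch (the `classify` helper and its category names are hypothetical, not part of the PR) that sorts each pip argument into those cases:

```python
# Illustrative classification of the pip arguments used in the RUN pip2/pip3 lines.
# The argument strings are copied from the Dockerfile; the categories are assumptions.
PIP_ARGS = [
    "cython==0.25.2",
    "jupyter==1.0.0",
    "matplotlib==2.0.0",
    "nltk==3.2.2",
    "pandas==0.19.2",
    "git+https://github.com/mila-udem/blocks.git@stable",
    "-r https://raw.githubusercontent.com/mila-udem/blocks/stable/requirements.txt",
]


def classify(arg):
    """Return 'pinned', 'unpinned', 'vcs-ref:<ref>' or 'requirements-file'."""
    if arg.startswith("-r "):
        # A remote requirements file can change between two builds.
        return "requirements-file"
    if arg.startswith("git+"):
        # The text after the last "@" is a Git ref; a branch name such as
        # "stable" resolves to whatever commit the branch points at today.
        return "vcs-ref:" + arg.rsplit("@", 1)[1]
    return "pinned" if "==" in arg else "unpinned"


if __name__ == "__main__":
    for arg in PIP_ARGS:
        print(classify(arg), "<-", arg)
```

By the same rule, the later `pip2 install -U numpy` and `pip3 install -U numpy` lines are unpinned: `-U` upgrades numpy to the newest release available at build time.
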

# Avoid the old numpy version installed by the blocks requirements
RUN pip2 install -U numpy
RUN pip3 install -U numpy

# Create the gensim directory and the dependencies directory
RUN mkdir /gensim \
    && mkdir /gensim/gensim_dependencies

# Download gensim from GitHub
RUN mkdir /gensim/download \
    && cd /gensim/download \
    && wget --quiet $GENSIM_REPOSITORY/$GENSIM_BRANCH.zip \
    && unzip $GENSIM_BRANCH.zip \
    && mv ./gensim-$GENSIM_BRANCH/* /gensim \
    && rm -rf /gensim/download \
    && cd /gensim \
    && pip2 install .[test] \
    && python2 setup.py install \
    && pip3 install .[test] \
    && python3 setup.py install
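
The download step depends on three names derived from the two `ENV` values at the top of the file. A sketch that spells out that derivation, assuming GitHub's archive layout in which a branch zip unpacks into a `<repo>-<branch>` directory (the layout the `mv ./gensim-$GENSIM_BRANCH/*` line expects); `archive_paths` and its `repo_name` parameter are hypothetical:

```python
# Illustrative derivation of the names used by the "Download gensim from GitHub" step.
# Values mirror the ENV lines; the "<repo>-<branch>" directory name is an assumption
# based on GitHub's archive layout and on the mv command in the Dockerfile.
GENSIM_REPOSITORY = "https://github.com/parulsethi/gensim/archive"
GENSIM_BRANCH = "gensim_docker"


def archive_paths(repository, branch, repo_name="gensim"):
    """Return (download URL, zip file name, top-level directory inside the zip)."""
    zip_name = branch + ".zip"             # file opened by `unzip $GENSIM_BRANCH.zip`
    url = repository + "/" + zip_name      # URL fetched by `wget $GENSIM_REPOSITORY/$GENSIM_BRANCH.zip`
    top_dir = repo_name + "-" + branch     # directory moved by `mv ./gensim-$GENSIM_BRANCH/*`
    return url, zip_name, top_dir


if __name__ == "__main__":
    print(archive_paths(GENSIM_REPOSITORY, GENSIM_BRANCH))
```

Changing `GENSIM_BRANCH` alone therefore changes all three names at once, which keeps the `wget`, `unzip` and `mv` commands consistent with each other.
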

# Set ENV variables for wrappers
ENV FT_HOME gensim_dependencies/fastText

Review comment: Replace relative paths with absolute ones (for all of these variables).

ENV WR_HOME gensim_dependencies/wordrank
ENV MALLET_HOME gensim_dependencies/mallet
ENV DTM_PATH gensim_dependencies/dtm/dtm/main
ENV VOWPAL_WABBIT_PATH gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw

Review comment: We need to add the varembed path in order to run the varembed tests.
Review comment: According to test_varembed_wrapper.py, the varembed path is not required. It only tests on the test_data files.
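
The reviewer's point about relative paths can be made concrete. The Dockerfile sets no `WORKDIR`, so containers start in `/` by default, and a `cd` inside one `RUN` does not carry over to later instructions or to the running container. A small `posixpath` sketch (the working directories are assumptions and `resolve` is a hypothetical helper) of how `FT_HOME` resolves from different places:

```python
# Illustrative: how a relative ENV value resolves depending on the working directory.
# posixpath.join discards the base when the second argument is absolute, so an
# absolute value resolves identically from any working directory.
import posixpath

FT_HOME = "gensim_dependencies/fastText"                     # value in the Dockerfile
FT_HOME_ABSOLUTE = "/gensim/gensim_dependencies/fastText"   # where the fastText RUN step clones to


def resolve(path, cwd):
    """Resolve `path` as a process whose working directory is `cwd` would see it."""
    return posixpath.normpath(posixpath.join(cwd, path))


if __name__ == "__main__":
    for cwd in ("/", "/gensim", "/gensim/docs/notebooks"):
        print(cwd, "->", resolve(FT_HOME, cwd), "|", resolve(FT_HOME_ABSOLUTE, cwd))
```

Only a process that happens to run in `/gensim` finds the relative path; the absolute variant is stable everywhere, including inside the notebook directory used by the start script.
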

# Install custom dependencies

# Install fastText
RUN cd /gensim/gensim_dependencies \
    && git clone https://github.com/facebookresearch/fastText.git \

Review comment: Please pin versions for fastText/WordRank/etc. (you can use a commit hash or a version).

    && cd /gensim/gensim_dependencies/fastText \
    && make

# Install WordRank

Review comment: Comment out everything connected with WordRank in the Dockerfile (ompi problem).

RUN cd /gensim/gensim_dependencies \
    && git clone https://bitbucket.org/shihaoji/wordrank \
    && cp /gensim/docker/wordrank_install.sh /gensim/gensim_dependencies/wordrank/install.sh \
    && cd /gensim/gensim_dependencies/wordrank \
    && sh ./install.sh

# Install MorphologicalPriorsForWordEmbeddings
RUN cd /gensim/gensim_dependencies \
    && git clone https://github.com/rguthrie3/MorphologicalPriorsForWordEmbeddings.git

# Install DTM
RUN cd /gensim/gensim_dependencies \
    && git clone https://github.com/blei-lab/dtm.git \
    && cd /gensim/gensim_dependencies/dtm/dtm \
    && make

# Install Mallet
RUN mkdir /gensim/gensim_dependencies/mallet \
    && mkdir /gensim/gensim_dependencies/download \
    && cd /gensim/gensim_dependencies/download \
    && wget --quiet http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip \
    && unzip mallet-2.0.8.zip \
    && mv ./mallet-2.0.8/* /gensim/gensim_dependencies/mallet \
    && rm -rf /gensim/gensim_dependencies/download \
    && cd /gensim/gensim_dependencies/mallet \
    && ant

# Install Vowpal Wabbit
RUN cd /gensim/gensim_dependencies \
    && git clone https://github.com/JohnLangford/vowpal_wabbit.git \
    && cd /gensim/gensim_dependencies/vowpal_wabbit \
    && make \
    && make install

# Start gensim

# Check that gensim's compiled word2vec routines (FAST_VERSION) are available;
# the script exits with a non-zero status otherwise, which fails the build
RUN python2 /gensim/docker/check_fast_version.py
RUN python3 /gensim/docker/check_fast_version.py
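
Both `RUN` lines rely on the script's exit status to stop the build when the compiled routines are missing. A minimal sketch of the statuses involved, assuming a POSIX host, where `sys.exit(-1)` surfaces as 255 because only the low 8 bits of the status are kept; `exit_status` is a hypothetical helper (Python 3.5+ for `subprocess.run`):

```python
# Illustrative: the exit statuses produced by the two branches of check_fast_version.py.
# sys.exit() exits with status 0; sys.exit(-1) exits with a non-zero status
# (255 on POSIX systems), and any non-zero status makes a RUN instruction fail.
import subprocess
import sys


def exit_status(code_snippet):
    """Run `code_snippet` in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code_snippet]).returncode


if __name__ == "__main__":
    print("success branch:", exit_status("import sys; sys.exit()"))
    print("ImportError branch:", exit_status("import sys; sys.exit(-1)"))
```
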

# Make the startup script executable
RUN chmod +x /gensim/docker/start_jupyter_notebook.sh

# Define the container's start command and expose the port Jupyter listens on
CMD sh -c '/gensim/docker/start_jupyter_notebook.sh 9000'
EXPOSE 9000

@@ -0,0 +1,10 @@
import sys

try:
    from gensim.models.word2vec_inner import FAST_VERSION

Review comment: Please add additional info to this script (such as the Python version and the numpy/scipy/gensim versions) and create an alias for it in Docker.
Review comment: It already runs from the Dockerfile with both Python 2 and 3, and only the pinned versions of numpy/scipy/gensim installed in the container are used.

    print('FAST_VERSION ok ! Retrieved with value ', FAST_VERSION)
    sys.exit()
except ImportError:
    print('Failed... fall back to plain numpy (20-80x slower training than the above)')
    sys.exit(-1)
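
One way to address the first review comment is to print interpreter and library versions next to `FAST_VERSION`. A sketch under that assumption (`environment_report` is a hypothetical name and is not in the PR) that runs on Python 2.7 and 3 and reports `None` for packages that cannot be imported or expose no `__version__`:

```python
# Illustrative diagnostics: Python version plus numpy/scipy/gensim versions.
# Missing packages are reported as None instead of raising.
import importlib
import platform


def environment_report(modules=("numpy", "scipy", "gensim")):
    """Map 'python' and each module name to a version string, or None."""
    report = {"python": platform.python_version()}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", None)
    return report


if __name__ == "__main__":
    for name, version in sorted(environment_report().items()):
        print("%s: %s" % (name, version))
```
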

@@ -0,0 +1,7 @@
version: '2'

Review comment: Why do you need a docker-compose file?

services:
  gensim:
    build: .
    ports:
      - 9000:9000
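
With this file, `docker-compose up` builds the image from the Dockerfile in the same directory (`build: .`) if needed and publishes container port 9000, where the `CMD` starts Jupyter, on host port 9000. Because the server needs a moment to start, a client script may want to wait for the port first; a Python 3 sketch in which the host, port and timeouts are assumptions and `wait_for_port` is hypothetical:

```python
# Illustrative: poll until a TCP connection to the published port succeeds.
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once (host, port) accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 9000)` returns `True` once the notebook server inside the container is reachable through the published port.
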

@@ -0,0 +1,7 @@
#!/bin/bash

PORT=$1
NOTEBOOK_DIR=/gensim/docs/notebooks
DEFAULT_URL=/notebooks/gensim%20Quick%20Start.ipynb

jupyter notebook --no-browser --ip=* --port=$PORT --allow-root --notebook-dir=$NOTEBOOK_DIR --NotebookApp.token=\"\" --NotebookApp.default_url=$DEFAULT_URL
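
`DEFAULT_URL` is the Quick Start notebook path with its spaces percent-encoded. Jupyter serves notebooks under `/notebooks/` relative to `--notebook-dir`, so the file is presumably `/gensim/docs/notebooks/gensim Quick Start.ipynb`. A Python 3 sketch of the encoding with the standard library (`NOTEBOOK_PATH` is an illustrative name):

```python
# Illustrative: DEFAULT_URL is the percent-encoded form of the notebook path.
# urllib.parse.quote keeps "/" unescaped by default (safe="/") and encodes spaces as %20.
from urllib.parse import quote, unquote

NOTEBOOK_PATH = "/notebooks/gensim Quick Start.ipynb"   # decoded form of DEFAULT_URL
DEFAULT_URL = quote(NOTEBOOK_PATH)

if __name__ == "__main__":
    print(DEFAULT_URL)
    print(unquote(DEFAULT_URL))
```
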

@@ -0,0 +1,19 @@
#!/bin/bash

printf "1. clean up workspace\n"

Review comment: Why do you add this script here? This script is from the wordrank repo.

./clean.sh

printf "\n2. install glove to construct cooccurrence matrix\n"
wget http://nlp.stanford.edu/software/GloVe-1.0.tar.gz # if failed, check http://nlp.stanford.edu/projects/glove/ for the original version
tar -xvzf GloVe-1.0.tar.gz; rm GloVe-1.0.tar.gz
patch -p0 -i glove.patch
cd glove; make clean all; cd ..

printf "\n3. install hyperwords for evaluation\n"
hg clone -r 56 https://bitbucket.org/omerlevy/hyperwords
patch -p0 -i hyperwords.patch

printf "\n4. build wordrank\n"
export CC=gcc CXX=g++ # uncomment this line if you don't have an Intel compiler, but with gcc all #pragma simd are ignored as of now
cmake .
make clean all

Review comment: Add the gensim version as an env variable.