Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Gensim docker container #1368

Merged
merged 21 commits into from
Jun 30, 2017
Merged

Conversation

parulsethi
Copy link
Contributor

@parulsethi parulsethi commented May 27, 2017

TODO:

  • Run gensim tests in docker container

# Installs python, pip and setup tools (with fixed versions)
RUN apt-get update \
&& apt-get install -y \
ant=1.9.6-1ubuntu1 \
Copy link
Contributor Author

@parulsethi parulsethi May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmylk @menshikh-iv Should I remove version pinning from here also?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fixation is not required, then, of course, you can remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning versions in a container is actually very good. It guarantees that it keeps running.

Just don't pin the gensim version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So please keep the pins here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys, the fixation is needed due best practices with docker. Without these versions fixed the container can broke when a components changes it versions and when someone try to get it from a docker pull.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing: In my previous PR I forgot to add a modification in dockerfile that fix some dependencies for git dependencies to avoid this problems.

I was following the docker contribution guidelines to turn this image into an official one in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @danielbdias, I've kept the pinned versions which were already there in your PR and will add version pinning for rest of the packages which I added later.

@parulsethi parulsethi changed the title Gensim docker container [WIP] Gensim docker container May 27, 2017
Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container is a very needed feature, just left some comments to make things more clear

&& mv ./gensim-$GENSIM_VERSION/* /gensim \
&& rm -rf /gensim/download \
&& cd /gensim \
&& python setup.py install
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please checkout master branch first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will already download the package from branch which is specified in GENSIM_VERSION variable (and I use a branch on my personal fork for now which has the docker folder).

ENV WR_HOME gensim_dependencies/wordrank
ENV MALLET_HOME gensim_dependencies/mallet
ENV DTM_PATH gensim_dependencies/dtm/bin/dtm-linux64
ENV VOWPAL_WABBIT_PATH gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add varembed path in order to run varembed tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to test_varembed_wrapper.py varembed path is not required. It only tests on these test_data files.

import sys

try:
from gensim.models.word2vec_inner import FAST_VERSION, MAX_WORDS_IN_BATCH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need for max_words_in_batch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in latest commit

@menshikh-iv
Copy link
Contributor

@parulsethi What's a status of PR? Have you any problems with container?

@parulsethi
Copy link
Contributor Author

Update: Proper installation of wordrank dependencies is left now (need to install OpenMPI with multithreading enabled).
All other gensim tests (except wordrank) passes.


ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim/archive
ENV GENSIM_BRANCH gensim_docker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add gensim version as env variable

&& python3 setup.py install

# Set ENV variables for wrappers
ENV FT_HOME gensim_dependencies/fastText
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace relative to absolute (for all *_PATH and *_HOME), it's very important.


try:
from gensim.models.word2vec_inner import FAST_VERSION

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add additional info for this script (like python version, numpy/scipy/gensim version) and create alias in docker for this script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does run from here in dockerfile with both python 2 and 3. And only the pinned versions of numpy/scipy/gensim installed in the container are used

@@ -0,0 +1,7 @@
version: '2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you need docker-compose file?

@@ -0,0 +1,19 @@
#!/bin/bash

printf "1. clean up workspace\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you add this script here? This script from wordrank repo.


# Install fastText
RUN cd /gensim/gensim_dependencies \
&& git clone https://github.com/facebookresearch/fastText.git \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pin version for FastText/Wordrank/etc (you can use commit hash or version)

&& cd /gensim/gensim_dependencies/fastText \
&& make

# Install WordRank
Copy link
Contributor

@menshikh-iv menshikh-iv Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment all things connected with wordrank in dockerfile (ompi problem)

docker/README.md Outdated
python2 setup.py test
```

To push the image to docker hub:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No needed this block (about push to dockerhub)

docker/README.md Outdated
docker push [my_user]/gensim
```

# Run gensim image from anywhere
Copy link
Contributor

@menshikh-iv menshikh-iv Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace to Run ipython notebook with installed gensim

@menshikh-iv
Copy link
Contributor

Nice feature, thank you @parulsethi 🥇

@menshikh-iv menshikh-iv merged commit bd6db9a into piskvorky:develop Jun 30, 2017
saparina pushed a commit to saparina/gensim that referenced this pull request Jul 9, 2017
* added dockerfile

* remove fasttext from pip installs

* remove syntax errors

* remove unused imports

* modified dockerfile

* add subversion, locales

* use both python2 and python3

* upgrade numpy version

* add readme with relevant commands

* add fixed versions for wrapper dependencies

* made requested changes

* update readme

* change vw pin and remove docker-yml

* change vw version and make absolute paths for wrappers

* specify original gensim repo for download

* change maintainer

* correct missing slash

* use git clone for gensim

* correct gensim folder sequences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants