Dockerize #69

Open
laidig opened this issue Apr 27, 2018 · 22 comments
Labels
compatibliity (related to versions, OSs etc), help wanted (Extra attention is needed)

@laidig
Contributor

laidig commented Apr 27, 2018

Looks like interesting work you have done. Would you accept a version that runs in Docker?

I could contribute it in the next couple weeks.

@answerquest
Collaborator

@laidig gladly, please do! Thanks in advance! If possible, also include a from-scratch sequence of commands for someone to install Docker and deploy the dockerized app, and explain how I can update it when there are changes (or I'll leave that to you at major version changes). Please let me know what you need from my end. I can download and re-upload the file into the Releases section.

@laidig
Contributor Author

laidig commented Apr 27, 2018

There are two components to Dockerizing:

  1. Making the Dockerfile that can build Docker images and adding that to the repo
  2. Using that file to build an image and then push that to Docker Hub.

I'd take on 1, and then we can decide on 2. Of course, docs would go with both ;)

@answerquest
Collaborator

@laidig I'll let you take lead on this as I'm currently uninitiated regarding docker. You mentioned docs.. do share clearly on what's needed.

@laidig
Contributor Author

laidig commented May 16, 2018

I made a first pass at this:
https://github.com/laidig/static-GTFS-manager/tree/docker

To run from Docker:
docker pull laidig/static-gtfs-manager
docker run -it -p 5000:5000 laidig/static-gtfs-manager

Your feedback, if you have any, is appreciated.

It can likely be refined, but I haven't played around with it enough to be sure that it is working well yet.

@laidig
Contributor Author

laidig commented May 25, 2018

I made a second pass (same instructions), and the docker image is now half the size of the previous one. I might be able to get it smaller still.

@answerquest
Collaborator

@laidig that's great.. thank you so much for giving your time for this. Sorry, I'm not able to give this a spin for now; I'm working on bringing in another city's data format.

Question: if the program uses HDF5 file formats (.h5 files) to store and retrieve tables, then can a dockerized version of it run smoothly on windows? I'm exploring using HDF5, it's working fine in ubuntu but on windows I'm running into many issues with my old (and not updated) win7 boot.

Another query: If we make a dockerized version from an ubuntu OS, can it work in windows OS?

And then another query, dumbing it down even further: Can the docker version for windows run from double-clicking a .exe or shortcut? If not immediately, then is it possible to engineer such a solution?

@laidig
Contributor Author

laidig commented May 29, 2018

This Docker image runs "slim" Debian (not full Ubuntu) under another host OS; I'm using macOS. It works easily under Win10 or Windows Server because they have Hyper-V built in. It also works, though not as smoothly, under Windows 7/8 via Docker Toolbox (https://docs.docker.com/toolbox/toolbox_install_windows/#step-2-install-docker-toolbox).

So yes, the file format should work when running on a Windows host because the code sees a Linux OS.

But that brings up another thing I have to resolve-- the current configuration I made doesn't necessarily save data across reboots-- I'll add a persistent volume to take care of that.

@laidig
Contributor Author

laidig commented May 29, 2018

Follow up question to my last point: I'm assuming the persistent data is kept in the GTFS directory, correct?

@answerquest
Collaborator

answerquest commented May 29, 2018

@laidig thanks, good to know it can work across OS's.

Yes, the program's persistent data is stored in GTFS/db.json and GTFS/sequence.json. The other files there are actually artefacts from earlier development (I had started out working with the CSVs before moving to json) and I'd kept them around just to cross-check the data during development. The folders (yyyy-mm-dd-name) contain feed exports and aren't used again by the program (but are accessible to users at the program's home page in the Browse.. section).

See Technical Overview wiki page for other details on how the program works.

@laidig laidig mentioned this issue May 30, 2018
@answerquest answerquest added the help wanted Extra attention is needed label Jun 18, 2018
@answerquest
Collaborator

Hi @laidig, just a heads up, I'm working on a major overhaul that started with the way the DB is handled but has ended up including several improvements all over the place. I should be able to put something up by end of June 2018. There will be changes in the DB structure: now instead of two .json files there will be a variable number of .h5 files, one for each .txt file, and thus different operators may have differing files.
Please explore if the docker thing will be able to support HDF5 format. It is not a pure-Python thing like TinyDB which was handling working with the .json files.

@laidig
Contributor Author

laidig commented Jun 18, 2018 via email

@answerquest
Collaborator

I should have a commit up by next week.

@answerquest answerquest added this to the ongoing milestone Sep 4, 2018
@laidig
Contributor Author

laidig commented Sep 7, 2018

I saw your latest version and made an update of the docker image.

Do you have any set of tests (even at the level of: do this, expect this) for functionality to make sure that it's working?

@answerquest
Collaborator

Hi @laidig , thanks for this! I was making changes myself for the next release.

From the pull request #102 I understand there's only a line to delete from .dockerignore and some edits to do in Dockerfile. Is that correct?

Asking because in the PR there's other files also getting involved so I'd rather make changes and push from my end.

@answerquest
Collaborator

In the files and folders structure, I've renamed the GTFS folder to 'db' now. But all the other folders on the repo are also needed for the program. You had included GTFS under the "volumes" heading in docker-compose.yml. Should the other folders be included there too?

@laidig
Contributor Author

laidig commented Sep 11, 2018 via email

@answerquest
Collaborator

@laidig do I have to edit docker-compose.yml to specify which folders will have to be persistent?

@laidig
Contributor Author

laidig commented Nov 8, 2018 via email

@answerquest
Collaborator

Thanks @laidig for the clarification.

I want to be able to build and deploy this project in docker from source, instead of pulling an image from docker website/repo. I followed some leads given in this guide and from what is already shared here. Sharing a full report here. I'm able to make this run, but not able to have persistent storage yet.

Build

docker build -t wri-cities/static-gtfs-manager .

It installs and creates the docker images.

Run

docker run -it -p 5000:5000 "wri-cities/static-gtfs-manager"

That works: the program launches (it doesn't open a browser tab, but that's ok) and I can now operate it on http://localhost:5000 .

But the storage isn't persistent! I make data changes (create a new frequency), exit the program by Ctrl+C in terminal, then if I run it again, all the data has been reset to original.

Contents of the docker files:

docker-compose.yml :

version: '3'
services:
    static-gtfs-manager:
        ports:
            - '5000:5000'
        image: wri-cities/static-gtfs-manager
        volumes:
          - db:/app/db/

volumes:
  db:

Additional query: I want to include my config folder also as a persistent volume. But in VSCode editor when I type in config it is highlighted as a keyword. Should I put "config": instead?

.dockerignore :

.git/*
export/**/*.txt
logs/*

Dockerfile :

FROM python:3.6-slim-stretch

RUN apt-get update && apt-get -y upgrade && \
    apt-get install -y python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /app
WORKDIR /app
COPY . /app/
RUN pip3 install -r requirements.txt --user --no-cache-dir

EXPOSE 5000

CMD cd /app/ && python3 GTFSManager.py

Abridged terminal log of build process

there is something regarding tzdata that might be relevant:

$ docker build -t wri-cities/static-gtfs-manager .
Sending build context to Docker daemon  50.13MB
Step 1/8 : FROM python:3.6-slim-stretch
3.6-slim-stretch: Pulling from library/python
f17d81b4b692: Pull complete 
(...)
Digest: sha256:537edf25490a9e0685b512dcae76382d37c38c86a1c6221896f96ee6f8f02f19
Status: Downloaded newer image for python:3.6-slim-stretch
 ---> ffafb5882b66
Step 2/8 : RUN apt-get update && apt-get -y upgrade &&     apt-get install -y python3-pip     && rm -rf /var/lib/apt/lists/*
 ---> Running in ff8234c8fd52
Ign:1 http://deb.debian.org/debian stretch InRelease
Get:2 http://security.debian.org/debian-security stretch/updates InRelease [94.3 kB]
(...)
Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
The following packages will be upgraded:
  tzdata
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 270 kB of archives.
(...)
Setting up tzdata (2018g-0+deb9u1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.24.1 /usr/local/share/perl/5.24.1 /usr/lib/x86_64-linux-gnu/perl5/5.24 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.24 /usr/share/perl/5.24 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype

Current default time zone: 'Etc/UTC'
Local time is now:      Fri Nov  9 03:45:20 UTC 2018.
Universal Time is now:  Fri Nov  9 03:45:20 UTC 2018.
Run 'dpkg-reconfigure tzdata' if you wish to change it.

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:

(... similar to other installations on ubuntu ...)

Removing intermediate container ff8234c8fd52
 ---> 60fdfb8ccadc
Step 3/8 : RUN mkdir -p /app
 ---> Running in 6a41e56213ce
Removing intermediate container 6a41e56213ce
 ---> d77d937ac407
Step 4/8 : WORKDIR /app
Removing intermediate container 5802f8b1aaeb
 ---> 64a105c4a8af
Step 5/8 : COPY . /app/
 ---> 09926eab527a
Step 6/8 : RUN pip3 install -r requirements.txt --user --no-cache-dir
 ---> Running in 17f0578fc86b
(...)
Removing intermediate container 17f0578fc86b
 ---> 531b4059f3ca
Step 7/8 : EXPOSE 5000
 ---> Running in 47aeb4bd1a35
Removing intermediate container 47aeb4bd1a35
 ---> 1af2e47fedb9
Step 8/8 : CMD cd /app/ && python3 GTFSManager.py
 ---> Running in 0d1c699beaf7
Removing intermediate container 0d1c699beaf7
 ---> 87b63bf07598
Successfully built 87b63bf07598
Successfully tagged wri-cities/static-gtfs-manager:latest

So, the main question : I have specified db/ folder as a persistent volume in docker-compose.yml . But that's not doing the job apparently. What will it take?

@answerquest
Collaborator

Update: I wasn't able to figure out anything solid from the docs or the similar questions posted on stackoverflow, but through trial and error I have managed to achieve persistence of data by modifying the run command, adding a -v option. For completeness, including the build command preceding it too:

docker build -t wri-cities/static-gtfs-manager .
docker run -it -p 5000:5000 -v persistent:/app/db "wri-cities/static-gtfs-manager"

The word 'persistent' up there can be anything; it's a label (the name of a Docker named volume). If you use another label, that starts a different persistent data store, which opens up an opportunity:

Opportunity for multiple feeds management through docker

If running this tool in docker, users can keep multiple GTFS feeds or versions loaded by changing the label after -v in the run command. They can run these simultaneously by also changing the left-side port number after -p for successive runs.

Example:

  • Operate on mumbai data: docker run -it -p 5000:5000 -v mumbai:/app/db "wri-cities/static-gtfs-manager"
  • Then, open a new terminal and operate on pune data: docker run -it -p 5001:5000 -v pune:/app/db "wri-cities/static-gtfs-manager"

Ignore the in-program URL shared; through docker you can now have two different instances of static-GTFS-Manager running with separate databases on http://localhost:5000 and http://localhost:5001 .

You can list the volumes created with this command: docker volume ls.
To see a list of commands, do docker volume
To manually browse these, open this path in a file browser with root permissions: /var/lib/docker/volumes

Of course, if running this directly from python or windows exe you can simply clone the folder and do different business in different folders ;)
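
The per-feed pattern above (one named volume and one host port per feed) can be sketched as a small shell helper. This is only an illustration: the function name feed_run_cmd and its argument order are made up here, and the image tag is the one built earlier in this thread. It prints the commands instead of running them, since each docker run -it stays in the foreground and needs its own terminal.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run command for one feed.
# IMAGE is the tag built earlier in this thread; the function name and
# argument order are illustrative assumptions.
IMAGE="wri-cities/static-gtfs-manager"

feed_run_cmd() {
    feed="$1"       # named volume label, e.g. mumbai
    host_port="$2"  # host-side port, e.g. 5000
    # The container side stays fixed: port 5000 and /app/db.
    printf 'docker run -it -p %s:5000 -v %s:/app/db %s\n' \
        "$host_port" "$feed" "$IMAGE"
}

# Print one command per feed; run each in its own terminal.
feed_run_cmd mumbai 5000
feed_run_cmd pune 5001
```

Only the volume label and the left-hand port change between feeds; the right-hand sides (5000 and /app/db) refer to the inside of the container and stay the same.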


More Questions arise

Ok, after play time, this still raises some questions that I don't have an answer to right now:

  • what's the point of specifying volumes with label db in docker-compose.yml if at the run command I had to use /app/db on the right side in -v persistent:/app/db ? When I tried putting just db, docker told me it needs absolute path. What if I got rid of the volumes entries in docker-compose.yml?

  • So I got the data to be persistent here, but what if I want to shift work to another computer? It seems there will be roundabout ways of doing this, but someone with that skill level might as well just run the program in python directly without bothering to use docker at all.
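
On the second question, one common way to move a named volume between machines is to pack it into a tar file with a throwaway container and unpack it on the other side. A rough sketch, with assumptions stated: the function names are hypothetical, the busybox image is used only because it ships tar, and the folder argument must be an absolute path so that docker treats it as a host folder rather than as another volume name. It has not been tried against this project.

```shell
#!/bin/sh
# Hypothetical helpers, not part of static-GTFS-manager.

# Pack the contents of named volume $1 into $2/$1.tar on the host.
backup_volume() {
    volume="$1"
    host_dir="$2"   # must be an absolute path
    docker run --rm \
        -v "$volume":/data \
        -v "$host_dir":/backup \
        busybox tar cf "/backup/$volume.tar" -C /data .
}

# Unpack $2/$1.tar into named volume $1 (docker creates it if missing).
restore_volume() {
    volume="$1"
    host_dir="$2"   # must be an absolute path
    docker run --rm \
        -v "$volume":/data \
        -v "$host_dir":/backup \
        busybox tar xf "/backup/$volume.tar" -C /data
}

# Usage (copy mumbai.tar to the other computer between the two steps):
#   backup_volume mumbai "$(pwd)"
#   restore_volume mumbai "$(pwd)"
```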

@answerquest answerquest added the compatibliity related to versions, OSs etc label Nov 9, 2018
@answerquest
Collaborator

Update : I got rid of docker-compose.yml and tried the build and run commands again:

docker build -t static-gtfs-manager .
docker run -it -p 5000:5000 -v noyml:/app/db static-gtfs-manager

... and it worked! So it seems docker-compose.yml goes with the docker-compose up command and is needed if the docker-image is up online.
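
For reference, docker build and docker run never read docker-compose.yml; only the docker-compose command does. A compose file is not limited to published images either: with a build: entry it can build from local source. The sketch below writes such a file into a scratch folder for inspection. The build: . line is an addition to the file posted earlier in this thread, and the scratch-folder approach is only for illustration.

```shell
#!/bin/sh
# Write an adapted docker-compose.yml into a scratch folder.
# "build: ." is an assumption added here; the rest mirrors the file
# posted earlier in this thread.
scratch="$(mktemp -d)"
cat > "$scratch/docker-compose.yml" <<'EOF'
version: '3'
services:
  static-gtfs-manager:
    build: .
    image: wri-cities/static-gtfs-manager
    ports:
      - '5000:5000'
    volumes:
      - db:/app/db/

volumes:
  db:
EOF
echo "wrote $scratch/docker-compose.yml"

# After copying the file to the repo root (next to the Dockerfile):
#   docker-compose up --build
```

With this file, the volumes entries do the job that -v did on the docker run command line, so the two approaches are alternatives rather than something to combine.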

@answerquest
Collaborator

answerquest commented Nov 12, 2018

Edit: Update for continuity for readers: Docker business sorted out! See Running with Docker on any OS. Fresh work was done at #129, #130.

@laidig pulling your comment under Packaging for Linux to here to continue the conversation here.

Nice! I’m away from my computer for the next week, do you want to make an
image in your own repository?
Now that you’re comfortable with Docker, you can also have it build
automatically with every commit to master on Github.

  • Yep no worries, I've got this covered now :) Thanks a ton for setting this up and walking me through.
  • I decided to go the local-build way right now instead of pulling from dockerhub in the main repo, as for that I'll have to get my org to set up an official account there.
  • Having the program's file structure available and editable to the user is important. They should be able to manually edit various files in the config/ folder, for instance. And it's better for developers too who want to tinker around with things. For these reasons, I've set the docker run command to mount the local program folder itself as a volume, so the entire /app folder in the container is backed by it. Let's see, if other problems pop up in future I'll come back to a more standard way.
  • I've saved away the docker-compose.yml that you had contributed, no worries! Will adapt it and put it back in when we set up official docker repo.
  • I'm a bit nervous about dockerhub getting write access to the github repo through the automatic-build linkup. Especially seeing that it's an organisation account so the risk is to more than just me. Will look more into it then decide.
  • Have created a repo on dockerhub, pushing my build up as we speak! Following this hackernoon guide.
  • Addendum: the dockerhub push certainly takes a long time! The image is 700 MB. I expect it'll take long to download too. That's yet another reason for doing a local build.
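
The local-folder setup described in the third point above could look roughly like this. It is a guess at the shape of the command, not the exact one used in the project's docs: run_from_source is a hypothetical name, and the image tag is the one from the earlier local build. Mounting a host folder over /app means the container sees the host's files instead of the copy baked into the image.

```shell
#!/bin/sh
# Hypothetical helper: run the container with a host folder mounted at /app.
run_from_source() {
    src_dir="$1"   # absolute path to the cloned repo on the host
    docker run -it -p 5000:5000 \
        -v "$src_dir":/app \
        static-gtfs-manager
}

# Usage, from inside the cloned repo:
#   run_from_source "$(pwd)"
```

Because the Dockerfile above installs the requirements with pip's --user flag, the packages land in the container user's home folder rather than in /app, so the mount should not hide them.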
