Workspace creation error: 404 No such file or directory #2565

Closed
Spritekin opened this issue Sep 23, 2016 · 22 comments
Labels
kind/question Questions that haven't been identified as being feature requests or bugs.

Comments

@Spritekin

When trying to start a new workspace I get this error:

Error when starting agent
Unable to start workspace agent. Error when trying to start the workspace agent: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such file or directory

Reproduction Steps:

  1. Startup command:
sudo docker run --rm --net=host \
  --name che \
  -e DOCKER_HOST=tcp://$COREOS_PRIVATE_IPV4:2375 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/mesos/slave:/var/lib/mesos/slave \
  -v /var/log:/var/log/mesos/slave \
  -v /appdata/che/lib:/home/user/che/lib-copy \
  -v /appdata/che/workspaces:/home/user/che/workspaces \
  -v /appdata/che/storage:/home/user/che/storage \
  codenvy/che-server --remote:$COREOS_PRIVATE_IPV4

Here /appdata is a host folder; in my case, it is an NFS share.

  2. Start the Che UI and create a Python 3 workspace with 8 GB RAM.
  3. Change the name to something like chetest.
  4. Click Create.

Expected behavior:

I expect a workspace to be created.

Observed behavior:

The message pasted above.

Che version: Started from docker... I imagine "latest" as of Sep 23 2016
OS and version: CoreOS 1122.2.0
Docker version: 1.10.3
Che install: Docker server container

@Spritekin
Author

Question... what are these exactly?
-v /appdata/che/lib:/home/user/che/lib-copy
-v /appdata/che/workspaces:/home/user/che/workspaces
-v /appdata/che/storage:/home/user/che/storage

What do these have to contain? Can I just mount /appdata/che to /home/user/che and let che create its own structure?

@ghost ghost added the kind/question Questions that haven't been identified as being feature requests or bugs. label Sep 23, 2016
@ghost

ghost commented Sep 23, 2016

@Spritekin When a workspace starts, ws-agent gets mounted into it, unpacked and started. It has to exist at /home/user/che/lib/. The same concerns the terminal.

Can you change your mount bindings this way:

-v /home/user/che/lib:/home/user/che/lib-copy \
-v /home/user/che/workspaces:/home/user/che/workspaces \
-v /home/user/che/storage:/home/user/che/storage \

@Spritekin
Author

So it looks for a specific folder in the host??? That's unusual. Let me try.

@ghost

ghost commented Sep 23, 2016

@Spritekin yes. The dirs are created on the host, and then the resources are copied to che-lib in the container, and this way they show up on the host. Then, when a workspace container starts, this host path gets mounted into it.
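Since the host path is what gets mounted into the workspace container, a quick check on the Docker host can show whether the expected files are really there. The following is only a sketch: `check_che_lib` is a hypothetical helper, and the two paths are the bind sources that appear in the container-create request quoted later in this thread. It assumes the lib directory is /home/user/che/lib unless another path is passed.

```shell
#!/bin/sh
# Sketch: verify that the files Che bind-mounts into a workspace exist on the host.
# check_che_lib is a hypothetical helper, not part of Che or its CLI.
check_che_lib() {
  lib_dir="${1:-/home/user/che/lib}"
  status=0
  # Bind sources taken from the container-create request shown in this thread.
  for path in "$lib_dir/ws-agent.tar.gz" "$lib_dir/linux_amd64/terminal"; do
    if [ -e "$path" ]; then
      echo "ok:      $path"
    else
      echo "missing: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_che_lib /home/user/che/lib` on the host that runs the Docker daemon returns non-zero if either path is missing, which would be consistent with a "no such file or directory" response when the workspace container is created.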

@Spritekin
Author

@eivantsov
I changed the path as per your suggestion and it worked. So the question is: how do I use my NFS share instead of the /home/user/che folder?

a. Should I mount the NFS in the /home/user/che?

b. Is there any parameter that changes the workspace area in the che server so that instead of using /home/user/che it uses an arbitrary folder? So I could mount:
-v /mynfs/che/lib:/mynfs/che/lib-copy
-v /mynfs/che/workspaces:/mynfs/che/workspaces
-v /mynfs/che/storage:/mynfs/che/storage

Then, when Che starts a new workspace, would /mynfs/che be mounted instead of /home/user/che?

@Spritekin
Author

OOOOOOHHHHH... There is a CHE_HOME constant in the container!!!!
Testing...

@TylerJewell

At this point, I really encourage you to use the CLI. There are variables that will set the values properly for you there. Volume mounting on different OSes has quirks, and we handle all of those with the CLI.

@Spritekin
Author

OK, that didn't work. I imagine it is because all the Che binaries are in /home/user/che, and I thought CHE_HOME was a variable that tells Che where to work. It would be nice to have something like that.

I also tried the CLI and while it started the server fine, it didn't work well. Workspaces failed to start. Maybe because I'm running on CoreOS.

So for now I'm mounting an NFS in the /home/user folder to solve this part. I got other things to test.

Luck!

@TylerJewell

The variables for the CLI are different than some of the core ones. There is no CHE_HOME. If you see a reference to that in the docs can you please point it out as it is wrong. When you have the cli installed and type "che" it will print out the available environment variables. You can then get docs for them in the configuration section of our docs.

@Spritekin
Author

Spritekin commented Sep 27, 2016

Hi, @eivantsov, @TylerJewell

So I know it works when the /home/user/che folder is a local folder. But I want to use a mounted NFS folder so workspaces are shared between cluster nodes.

I tried this:
a. Set up an NFS share at /home/user (AWS EFS, to be specific).
b. Created /home/user/che.
c. Ran the Che server with the mounts /home/user/che/lib, /home/user/che/workspace, /home/user/che/storage (note that I didn't create the folders, as I expected Che to create them). The Che server created the lib, workspace and storage folders in /home/user/che with owner user/group 1000. This proved that the NFS share is active and writable.
d. I used the Che UI to create a workspace for Python 3.5.
e. Che created some additional folders in lib and workspace.
f. Che responds with:

Error when starting agent
Unable to start workspace agent. Error when trying to start the workspace agent: Start of environment default failed. Error: Error response from docker API, status: 500, message: operation not supported

And the workspace log (which leads me to think it's a problem in the Dockerfile):

[DOCKER] latest: Pulling from codenvy/ubuntu_python 
[DOCKER] Digest: sha256:d0ddbd0bdb470427c9a57f92d74b8ff7e0d347754b900637ce8f5a82ab922c26 
[DOCKER] Status: Image is up to date for codenvy/ubuntu_python:latest 
[DOCKER] Step 1 : FROM codenvy/ubuntu_python:latest
[DOCKER] ---> 6e3ac59d6701
[ERROR] Error response from docker API, status: 500, message: operation not supported

And the Java exception in the che server:

2016-09-27 00:46:55,798[kspaceManager-0]  [ERROR] [o.e.c.a.w.s.WorkspaceRuntimes 250]   - Environment with ID 'workspace5qlu7mnwge4mzidc' is not found
org.eclipse.che.api.environment.server.exception.EnvironmentNotRunningException: Environment with ID 'workspace5qlu7mnwge4mzidc' is not found
  at org.eclipse.che.api.environment.server.CheEnvironmentEngine.stop(CheEnvironmentEngine.java:240) ~[che-core-api-workspace-4.7.2.jar:4.7.2]
  at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.start(WorkspaceRuntimes.java:248) ~[che-core-api-workspace-4.7.2.jar:4.7.2]
  at org.eclipse.che.api.workspace.server.WorkspaceManager.lambda$performAsyncStart$2(WorkspaceManager.java:650) [che-core-api-workspace-4.7.2.jar:4.7.2]
  at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:28) ~[che-core-commons-lang-4.7.2.jar:4.7.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92-internal]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92-internal]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_92-internal]
2016-09-27 00:46:55,800[kspaceManager-0]  [ERROR] [o.e.c.a.w.s.WorkspaceManager 666]    - Start of environment default failed. Error: Error response from docker API, status: 500, message: operation not supported

org.eclipse.che.api.core.ServerException: Start of environment default failed. Error: Error response from docker API, status: 500, message: operation not supported

  at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.start(WorkspaceRuntimes.java:261) ~[che-core-api-workspace-4.7.2.jar:4.7.2]
  at org.eclipse.che.api.workspace.server.WorkspaceManager.lambda$performAsyncStart$2(WorkspaceManager.java:650) ~[che-core-api-workspace-4.7.2.jar:4.7.2]
  at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:28) ~[che-core-commons-lang-4.7.2.jar:4.7.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92-internal]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92-internal]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_92-internal]

g. I enabled debug mode and debug logs (-l:DEBUG -d), no additional info was printed. At this point I would like to know which Docker operation is not supported.

h. Extracted the che.properties and changed it so it used Docker API 1.22 (it uses 1.20 by default) in the hope it is an API call available in a new version. No change.

i. Updated the properties files so runtimes are started as privileged (helps with mounting disks). Same thing.

j. Unmounted and tried in a host folder, all worked ok. Mounted and all failed again. Did this to verify my changes to the API or privileges were not the issue. I also verified that the same user 1000 and same permissions were applied to the folder.

k. Checked in the docker logs at the time of the errors. I only found:

Sep 27 10:52:33 dockerd[1212]: time="2016-09-27T10:52:33.986528563+10:00" level=error msg="Handler for POST /v1.22/containers/create returned error: operation not supported"

Still it tells me that the properties file did affect the Docker API version because in my original tests it was using v1.20
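To see at a glance which Docker API call the daemon rejected, the error lines can be reduced to method, path and message. This is a minimal sketch, not an official tool: `failed_docker_calls` is a hypothetical helper, and it assumes the daemon log lines have the `msg="Handler for ... returned error: ..."` shape shown above.

```shell
#!/bin/sh
# Sketch: read Docker daemon log lines on stdin and print one summary line
# per failed API handler, in the form "METHOD PATH -> error message".
# failed_docker_calls is a hypothetical helper name.
failed_docker_calls() {
  sed -n 's/.*msg="Handler for \([A-Z]*\) \([^ ]*\) returned error: \(.*\)"$/\1 \2 -> \3/p'
}
```

For example, `journalctl -u docker | failed_docker_calls` would list only the rejected calls. The API version is part of the printed path, so a change such as v1.20 to v1.22 is visible there too.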

So, at this point I'm a bit confused. It should work. I will try to find in the code which command is being attempted that fails when the folder is on NFS. As I said, I know the NFS share works because I can navigate it and it can be written to. Or maybe it is just a missing permission, but the point is that I have no more information.

Luck!

@ghost ghost reopened this Sep 27, 2016
@ghost

ghost commented Sep 27, 2016

@Spritekin I have found this one - http://serverfault.com/questions/763805/how-to-place-docker-images-ontop-of-an-nfs-share-in-coreos

Can we take Che out of the equation for a while? Can you set up NFS dirs and try to mount them manually with docker? Something like:

docker run -ti -v $nfs:/home/user/che codenvy/ubuntu_jdk8 bash

@Spritekin
Author

Spritekin commented Sep 28, 2016

@eivantsov
Hi,
Yes I can mount NFS folders in images and in fact I do it regularly. As I mentioned in point (c) above I successfully start the che server with the NFS mounted in the /home/user folder and all goes fine.

sudo docker run -d --net=host \
  --name che \
  -e DOCKER_HOST=tcp://$COREOS_PRIVATE_IPV4:2375 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /home/user/che/lib:/home/user/che/lib-copy \
  -v /home/user/che/workspaces:/home/user/che/workspaces \
  -v /home/user/che/storage:/home/user/che/storage \
  codenvy/che-server:nightly --remote:$COREOS_PRIVATE_IPV4 --skip:uid -l:DEBUG -d

Note that I do not create the lib, workspace and storage folders. When I create a workspace (just create, not launch), Che creates all its folder structure and saves the workspace inside.

So workspace creation is not the problem. The problem comes when it tries to launch the workspace in a container.

I have tracked the problem in the Java code. I also enabled Docker debug mode in order to see the JSON structures that Che sends to Docker to start a container, and it all comes down to a single flag in the mount (bind) structure. Essentially, Docker does not accept the volume "Z" flag when the bind source is on NFS.
This is the JSON that the Che server sends to Docker in order to start the workspace/runtime container. You can see that the bind parameters have the /home/user/che folders mounted with ":ro,Z" or ":Z".

{
  "AttachStderr":false,
  "AttachStdin":false,
  "AttachStdout":false,
  "Cmd":[],
  "CpuShares":0,
  "Entrypoint":[],
  "Env":[
    "CHE_LOCAL_CONF_DIR=/mnt/che/conf",
    "DOCKER_HOST=tcp://172.31.39.81:2375",
    "USER_TOKEN=dummy_token",
    "CHE_API_ENDPOINT=http://che-host:8080/wsmaster/api",
    "JAVA_OPTS=-Xms256m -Xmx2048m -Djava.security.egd=file:/dev/./urandom",
    "CHE_WORKSPACE_ID=workspacebdj81g3y1l90y662",
    "CHE_PROJECTS_ROOT=/projects"
  ],
  "ExposedPorts":{
    "4401/tcp":{},
    "4403/tcp":{},
    "4411/tcp":{}
  },
  "HostConfig":{
    "Binds":[
      "/home/user/che/lib/linux_amd64/terminal:/mnt/che/terminal:ro,Z",
      "/home/user/che/lib/ws-agent.tar.gz:/mnt/che/ws-agent.tar.gz:ro,Z",
      "/home/user/che/workspaces/chetest2:/projects:Z"
    ],
    "CpuShares":0,
    "ExtraHosts":["che-host:172.17.0.1"],
    "Links":[],
    "Memory":2147483648,
    "MemorySwap":-1,
    "MemorySwappiness":-1,
    "NetworkMode":"workspacebdj81g3y1l90y662_yctuuky0ek3xyn3s",
    "PortBindings":{},
    "Privileged":false,
    "PublishAllPorts":true,
    "ReadonlyRootfs":false,
    "VolumesFrom":[]
  },
  "Hostname":"",
  "Image":"eclipse-che/workspacebdj81g3y1l90y662_machinei21jwt8ahyuht2y4_che_dev-machine",
  "Labels":{},
  "NetworkDisabled":false,
  "NetworkingConfig":{
    "EndpointsConfig":{
      "workspacebdj81g3y1l90y662_yctuuky0ek3xyn3s":{
        "Aliases":["dev-machine"]
      }
    }
  },
  "OpenStdin":false,
  "StdinOnce":false,
  "Tty":false,
  "User":"",
  "Volumes":{},
  "WorkingDir":""
}

If I remove the Z and send the command manually then the workspace container is created. But then I guess it will conflict with the base che server premise that those folders may be shared (hence the Z). Also in the code there is a series of ReadWriteLocks that will be applied to those folders I guess to prevent overwrites.

Anyway, that's the thing. I have no control over that Z parameter which is preventing the workspace from starting in the NFS folder. And if I force remove the Z I have no idea of the interaction problems it may bring.
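For anyone who wants to repeat the manual test, removing the label from a bind string can be sketched as below. `strip_selinux_label` is a hypothetical helper written for this thread; it only rewrites a `host:container[:options]` string and does not change what Che itself sends. Dropping the label may have SELinux consequences on hosts that enforce it, so this is meant for reproducing the failure, not as a fix.

```shell
#!/bin/sh
# Sketch: remove the SELinux relabel option (Z or z) from a Docker bind spec.
# strip_selinux_label is a hypothetical helper name.
strip_selinux_label() {
  spec=$1
  case "$spec" in
    *:*:*) ;;                              # spec has an options field
    *) printf '%s\n' "$spec"; return 0 ;;  # nothing to strip
  esac
  paths=${spec%:*}    # host:container
  opts=${spec##*:}    # comma-separated options, e.g. ro,Z
  # Keep every option except a standalone Z or z.
  kept=$(printf '%s' "$opts" | tr ',' '\n' | grep -v -x '[Zz]' | paste -sd, -)
  if [ -n "$kept" ]; then
    printf '%s:%s\n' "$paths" "$kept"
  else
    printf '%s\n' "$paths"
  fi
}
```

With the binds from the request above, `strip_selinux_label /home/user/che/workspaces/chetest2:/projects:Z` prints `/home/user/che/workspaces/chetest2:/projects`, which can then be passed to `docker run -v` to compare behavior with and without the label.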

Not sure what to do now.

@ghost

ghost commented Sep 28, 2016

@Spritekin we added the :Z suffix to fix a mount issue on RHEL. Unfortunately, it is not configurable, and the mount binding is set in stone in code - https://github.com/eclipse/che/blob/master/plugins/plugin-docker/che-plugin-docker-machine/src/main/java/org/eclipse/che/plugin/docker/machine/ext/provider/WsAgentVolumeProvider.java#L40

I think it is possible to move it into the Che configuration so that the user controls it.

@Spritekin
Author

Spritekin commented Sep 28, 2016

@eivantsov
It would be wonderful. I'm not comfortable enough with the code to do it myself. What should I do to make it a request? Should I create an issue and close this one?

@TylerJewell

So @eivantsov - is the idea for the enhancement a way for users to override the volume mount attributes that are applied when creating workspace machine containers?

@Spritekin
Author

@eivantsov
Ummm... sorry to ask, but did you reach a decision?
Should I create an issue to request this feature?

@ghost

ghost commented Oct 4, 2016

If @TylerJewell and @garagatyi are comfortable with it, then yes, please open a feature request.

@TylerJewell yes, the idea is to let users provide their own mount bindings in the configuration.

@akram

akram commented Nov 22, 2016

Hi guys,
cc @l0rd
I am facing a similar issue while trying to run che-server on OpenShift. I do not have any NFS mounts; however, I get the same error during workspace creation, which ends up with the following error and stack trace:

Could not start workspace wksp-vv8h. Reason: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such file or directory
OK

the stack trace

java.util.concurrent.ExecutionException: org.eclipse.che.api.core.ServerException: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such file or directory
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_101]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_101]
	at org.eclipse.che.api.workspace.server.WorkspaceManager.lambda$performAsyncStart$1(WorkspaceManager.java:633) ~[che-core-api-workspace-5.0.0-M8-SNAPSHOT.jar:5.0.0-M8-SNAPSHOT]
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:28) ~[che-core-commons-lang-5.0.0-M8-SNAPSHOT.jar:5.0.0-M8-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_101]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
Caused by: org.eclipse.che.api.core.ServerException: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such filerror: short write

And the relevant error messages in journalctl -u docker -f:

Nov 22 07:43:30 localhost docker-current[803]: time="2016-11-22T07:43:30.862139089+01:00" level=info msg="{Action=create, LoginUID=4294967295, PID=9737}"
Nov 22 07:43:31 localhost docker-current[803]: time="2016-11-22T07:43:31.036276470+01:00" level=info msg="{Action=create, LoginUID=4294967295, PID=9737}"
Nov 22 07:43:32 localhost docker-current[803]: time="2016-11-22T07:43:32.181585785+01:00" level=info msg="{Action=tag, LoginUID=4294967295, PID=9737}"
Nov 22 07:43:32 localhost docker-current[803]: time="2016-11-22T07:43:32.184950955+01:00" level=info msg="{Action=create, LoginUID=4294967295, PID=9737}"
Nov 22 07:43:32 localhost docker-current[803]: time="2016-11-22T07:43:32.307389352+01:00" level=info msg="{Action=remove, ID=0165dad7116e4976399d9a85bfd28d5d9e55a7368bad2c3b328b292c146bdd6b, LoginUID=4294967295, PID=1804}"
Nov 22 07:43:32 localhost docker-current[803]: time="2016-11-22T07:43:32.862046177+01:00" level=error msg="Handler for POST /v1.20/containers/create returned error: no such file or directory"
Nov 22 07:43:32 localhost docker-current[803]: time="2016-11-22T07:43:32.866133203+01:00" level=info msg="{Action=remove, LoginUID=4294967295, PID=9737}"
Nov 22 07:43:32 localhost docker-current[803]: 2016-11-22 06:43:32,974[kspaceManager-8]  [ERROR] [o.e.c.a.w.s.WorkspaceManager 649]    - org.eclipse.che.api.core.ServerException: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such file or directory
Nov 22 07:43:32 localhost docker-current[803]:
Nov 22 07:43:32 localhost docker-current[803]: java.util.concurrent.ExecutionException: org.eclipse.che.api.core.ServerException: Start of environment default failed. Error: Error response from docker API, status: 404, message: no such file or directory
Nov 22 07:43:32 localhost docker-current[803]:
Nov 22 07:43:32 localhost docker-current[803]:         at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_101]
Nov 22 07:43:32 localhost docker-current[803]:         at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_101]
Nov 22 07:43:32 localhost docker-current[803]:         at org.eclipse.che.api.workspace.server.WorkspaceManager.lambda$performAsyncStart$1(WorkspaceManager.java:633) ~[che-core-api-workspace-5.0.0-M8-SNAPSHOT.jar:5.0.0-M8-SNAPSHOT]

@l0rd
Contributor

l0rd commented Nov 22, 2016

Hello @akram! Have you tried using this OpenShift template?

@akram

akram commented Nov 22, 2016

@l0rd yep, I am using the template you are mentioning.

@akram

akram commented Nov 22, 2016

@l0rd, you are correct. In fact, I was not using the openche.sh script and did not realise that it checks for the Z flag, which requires either a higher Docker version or running Docker without the --selinux-enabled flag. So I removed the --selinux-enabled flag, and now it gets further.

For the record, I am not using NFS for storage, so it would be great if we could document this properly.
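A quick way to tell whether a running daemon was started with the --selinux-enabled flag mentioned above is to look at its command line. The following is a rough sketch under stated assumptions: `daemon_has_selinux_flag` is a hypothetical helper, it only inspects command-line arguments passed on stdin, and it knows nothing about settings that come from a daemon configuration file.

```shell
#!/bin/sh
# Sketch: succeed if a daemon command line read from stdin enables SELinux.
# daemon_has_selinux_flag is a hypothetical helper name.
daemon_has_selinux_flag() {
  # Match a bare --selinux-enabled or --selinux-enabled=true, but not =false.
  grep -Eq -e '--selinux-enabled($|[[:space:]]|=true)'
}
```

One possible use is `ps -eo args | grep '[d]ocker' | daemon_has_selinux_flag && echo "daemon started with --selinux-enabled"`; the process name to filter on varies by distribution, so adjust the `grep` pattern as needed.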

@l0rd
Contributor

l0rd commented Nov 22, 2016

@akram that's great but beware that's under active development ;-)

Can you please open a separate issue on the template repo https://github.com/l0rd/openche providing the following info:

  1. system info: os version, openshift version, docker version
  2. your templates parameters
  3. steps needed to reproduce your error
