Container flaps between 'Staging' and 'Running' => can't download a package #47

Open
tnolet opened this issue Jul 31, 2014 · 4 comments


@tnolet

tnolet commented Jul 31, 2014

Hi,

Starting a simple container using Marathon/Deimos fails because, for some reason, it is not able to fetch a .jar file hosted on S3. The Docker image is downloaded correctly by a slave from the public Docker repo and can be run manually with no problems.
The app inside the container is a simple 'hello-world' type java app.

Details:

mesos: 0.19.1
deimos: 0.4.0
marathon: 0.6.0-1.0
ubuntu: 14.04 trusty

docker image: tnolet/hello1
Dockerfile:

FROM ubuntu:latest

MAINTAINER Tim Nolet

RUN apt-get update -y

RUN apt-get install -y --no-install-recommends openjdk-7-jre

ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64

RUN apt-get install -y curl

RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/dropwizard-0.0.1-SNAPSHOT.jar

RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/hello-world.yml

EXPOSE 8080

EXPOSE 8081

ENV SERVICE hello:0.0.1:8080:8081

CMD java -jar dropwizard-0.0.1-SNAPSHOT.jar server hello-world.yml

task file:

{
  "container": {
    "image": "docker:///tnolet/hello1",
    "options": []
  },
  "id": "hello1",
  "instances": "1",
  "cpus": ".5",
  "mem": "512",
  "uris": [],
  "cmd": ""
}

Error in stderr in mesos gui:
Error: Unable to access jarfile dropwizard-0.0.1-SNAPSHOT.jar

output from mesos.slave-INFO on slave:

I0731 13:09:21.673143  8814 slave.cpp:1664] Got registration for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.673703  8814 slave.cpp:1783] Flushing queued task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695307  8814 slave.cpp:2018] Handling status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:21.695582  8814 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695897  8814 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to [email protected]:5050
I0731 13:09:21.696854  8815 slave.cpp:2145] Sending acknowledgement for status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:21.702631  8812 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.859962  8816 slave.cpp:2355] Monitoring executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' in container 'e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb'
I0731 13:09:22.687067  8813 slave.cpp:2018] Handling status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:22.698246  8811 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:22.699434  8811 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to [email protected]:5050
I0731 13:09:22.700186  8811 slave.cpp:2145] Sending acknowledgement for status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:22.709666  8815 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.060930  8814 slave.cpp:933] Got assigned task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.061293  8814 slave.cpp:1043] Launching task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.063863  8815 external_containerizer.cpp:433] Launching container 'cfb86a26-2821-49ce-95a0-3e4d0dfd8657'
I0731 13:09:23.080337  8814 slave.cpp:1153] Queuing task 'hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab' for executor hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab of framework '20140731-110416-606019500-5050-1090-0000
E0731 13:09:23.859387  8811 slave.cpp:2397] Termination of executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' failed: External containerizer failed (status: 1)
I0731 13:09:23.859632  8811 slave.cpp:2552] Cleaning up executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.860239  8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab/runs/e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb' for gc 6.99999004926815days in the future
I0731 13:09:23.860345  8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' for gc 6.99999004838222days in the future
I0731 13:09:23.871316  8816 external_containerizer.cpp:1040] Killed the following process tree/s:
[ 

]

Again, running the following command on the slave manually starts the container with no problems:
sudo docker run -d -P tnolet/hello1

@tnolet
Author

tnolet commented Jul 31, 2014

I found the cause of this behaviour. The flapping happens when artifacts or executables inside Docker containers are not referenced by their full path name. Because Deimos adds the -w /tmp/mesos-sandbox switch to set the working directory in Docker, relative paths resolve against /tmp/mesos-sandbox instead of the directory the image would otherwise start in, so all paths are off.

Not sure if this is a bug or just something people should be aware of.
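
The effect can be reproduced outside Docker. Below is a minimal sketch, assuming the jar ended up in `/` at build time (the Dockerfile above sets no WORKDIR) and that Deimos starts the container with `-w /tmp/mesos-sandbox` as described. `resolve` is a hypothetical helper for illustration, not part of Deimos or Docker.

```python
import posixpath

def resolve(workdir: str, path: str) -> str:
    """Return the absolute path a process started in `workdir` would open for `path`."""
    # posixpath.join discards `workdir` when `path` is already absolute.
    return posixpath.normpath(posixpath.join(workdir, path))

JAR = "dropwizard-0.0.1-SNAPSHOT.jar"

# Plain `docker run`: the working directory is assumed to be "/" (no WORKDIR set).
print(resolve("/", JAR))
# Under Deimos: the working directory is /tmp/mesos-sandbox, so the lookup moves.
print(resolve("/tmp/mesos-sandbox", JAR))
# An absolute path is unaffected by the working directory.
print(resolve("/tmp/mesos-sandbox", "/" + JAR))
```

Under these assumptions, referring to the files by absolute path in the image's CMD (for example `/dropwizard-0.0.1-SNAPSHOT.jar` and `/hello-world.yml`) would likely sidestep the problem.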

@tnolet
Author

tnolet commented Aug 4, 2014

This is similar to #49

@solidsnack
Contributor

I'm just not sure what the right thing to do is. Deimos puts URLs from the Mesos task in a directory which it mounts at /tmp/mesos-sandbox so tasks can find the downloaded contents. It seems reasonable to set the working directory to that directory, too, so that frameworks which are unaware of Docker will still find the URLs they expect.
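
For readers unfamiliar with the mechanism, the mount plus working-directory behaviour described here corresponds roughly to the following `docker run` arguments. This is only a sketch: `docker_run_args` and the host directory in the example are hypothetical, and the real Deimos command line may contain more options.

```python
SANDBOX = "/tmp/mesos-sandbox"

def docker_run_args(image: str, host_sandbox: str) -> list:
    """Build an illustrative `docker run` argument list (not Deimos's actual code)."""
    return [
        "docker", "run",
        "-v", f"{host_sandbox}:{SANDBOX}",  # mount the downloaded URLs into the container
        "-w", SANDBOX,                      # start the process in the sandbox directory
        image,
    ]

# "/path/to/sandbox" is a placeholder for the slave's sandbox directory on the host.
print(" ".join(docker_run_args("tnolet/hello1", "/path/to/sandbox")))
```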

There is a patch under #49 to acknowledge the WORKDIR directive but I do wonder if there is a better policy in general.

Having Deimos dump the URLs in the "right place" could perhaps be accomplished by:

  • Downloading the files
  • Building a new image that inherits from the original one, using ADD to copy the downloaded files to .
  • Running the new image

Hopefully ENTRYPOINT and CMD and all that would be preserved in the new image.
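
The image-building step could look something like the sketch below, assuming the files have already been downloaded into a build context directory. `derived_dockerfile` is a hypothetical helper, not existing Deimos code. It uses the ADD instruction to bring files from the build context into the image, and it emits no ENTRYPOINT or CMD, so the derived image keeps whatever the base image defines.

```python
from typing import Sequence

def derived_dockerfile(base_image: str, files: Sequence[str], dest: str = "./") -> str:
    """Return Dockerfile text for an image that inherits from `base_image`
    and adds each downloaded file to `dest`."""
    lines = [f"FROM {base_image}"]
    # A relative `dest` is resolved against the base image's WORKDIR,
    # which is what puts the files in the "right place".
    lines += [f"ADD {name} {dest}" for name in files]
    # No ENTRYPOINT or CMD lines: those are inherited from the base image.
    return "\n".join(lines) + "\n"

# "config.yml" stands in for a file fetched from the task's URIs.
print(derived_dockerfile("tnolet/hello1", ["config.yml"]))
```

The sketch does not handle file names containing spaces, and ADD unpacks local tar archives automatically, which may or may not be wanted for downloaded artifacts.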

@tnolet
Author

tnolet commented Aug 5, 2014

I see the problem. I guess if everyone is fully aware that this is happening, it is not a big problem. Making your paths and URLs fully qualified isn't always a nice way of handling things, but there are ways around it, and in the end it's not a biggie.
