
Port to VORC #19

Merged

mabelzhang merged 7 commits into vorc from vorc-docker on Nov 20, 2020

Conversation

@mabelzhang (Collaborator) commented Nov 11, 2020

Dependent on osrf/vrx#228 and osrf/vorc#30.

The repository has been adapted to VORC, to live in a new branch.
All the individual scripts and multi-scripts ran.

Things in this PR that are different from the main branch:

  • Updated README for VORC.
  • Fixed the window size issue in video recording by using --windowid, so the evaluator no longer needs to manually tweak the x, y, width, and height arguments for recordmydesktop (see the sketch after this list).
  • Removed the VRX requirement for a sensor config and WAM-V URDF config. prepare_team.bash still generates an empty file with the team name, because one of the scripts looks at the file names to determine the list of teams.
  • Added new VORC parameters to the task_config YAML files (dependent on the PR in vrx).
    Note 1: Only trial 0 of each task has been updated with coordinates for VORC. I haven't had time to customize the subsequent trials.
    Note 2: Gymkhana will need a new YAML file.
  • Removed generated/, since we don't have permanent example files yet. Once we do, we can add them back and probably add the directory to .gitignore, so our local files don't continuously get committed.
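
A minimal sketch of the window-targeted recording (the window title, output file name, and extra flags are assumptions for illustration; the actual invocation in the script may differ):

# Find the X window id of the Gazebo GUI (assumes the window title is "Gazebo").
$ WINDOW_ID=$(xwininfo -name "Gazebo" | awk '/Window id/ {print $4}')
# Record only that window instead of hand-tuning -x/-y/-width/-height.
$ recordmydesktop --windowid "${WINDOW_ID}" -o trial_video.ogv --no-sound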

Issues

There may still be intermittent seg faults. If you see one, please let me know when it happens. They've shown up occasionally, but not in my most recent runs, so I don't know whether they're fixed.

After realizing I could set gui:=true in the server Docker image (duh) to debug visually, I saw that the boat was actually in the world. That contradicted what I saw in the GUI spun up by the video recording script, which requires a workspace to also exist on the host machine. A number of things there were broken because I don't usually develop on my host machine.

(That itself is a huge problem, because it leads to inconsistencies between what runs in the actual competition server Dockerfile and what gets recorded in the video: some arbitrary environment on an evaluator's own host machine, which could be very different from the reference environment in the server Docker. The whole point of having a Dockerfile is to keep everything consistent, and videos should really be recorded from a window in Docker rather than from the host machine. That really needs to be fixed.)

Other than that, there are a number of things that need to be more rigorous and follow good practices. I’m going to open followup issues for them.

Once those issues are cleaned up, like the video problem, things will be less error-prone and there will be a lot less hair to pull.

To test

Follow the README :) Or this shorter version below.

First, I recommend going into vorc_server/vorc-server/run_vorc_trial.sh and setting gui:=true on the roslaunch vorc_gazebo evaluation.launch line.
This will help the reviewer (and me) confirm that the competition run really works for everyone.
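
If the launch line currently passes gui:=false (an assumption; check the actual script first), a quick way to flip it:

# Assumes run_vorc_trial.sh contains gui:=false on the roslaunch line; verify before running.
$ sed -i 's/gui:=false/gui:=true/' vorc_server/vorc-server/run_vorc_trial.sh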

Then, build the server Docker (-n for NVIDIA):

$ ./vorc_server/build_image.bash -n

Single scripts:

In the trials, please zoom out in the Gazebo GUI (build the image with gui:=true set, see above) and make sure the marina shows up, the robot and task objects show up, and everything looks normal.

Currently, only trial 0 objects are customized to VORC world coordinates. You can try trial 1+, but things probably won’t look right.

With the ghostship solution (specific to VORC), you should see the robot moving forward when the task starts.
With the example_team and example_team_2 solutions (specific to VRX; we will remove them once we have more examples), nothing will happen, but things should still run.

$ ./prepare_team.bash ghostship

$ ./prepare_task_trials.bash perception
$ ./prepare_task_trials.bash stationkeeping
$ ./prepare_task_trials.bash wayfinding

# Each of these will open the Gazebo GUI from the server Docker container.
# Please inspect that things show up correctly.
$ ./run_trial.bash -n ghostship stationkeeping 0
$ ./run_trial.bash -n ghostship wayfinding 0
$ ./run_trial.bash -n ghostship perception 0

# Make sure this runs all the way and you get a video.
# This will run, but the robot doesn't show up when using my host machine workspace. We need to fix video recording to record from inside Docker.
$ ./generate_trial_video.bash example_team stationkeeping 0

Batch scripts:

Note that example_team and example_team_2 won't be able to move CoRa, since they're set up to send commands to WAM-V topics.

$ ./multi_scripts/prepare_all_teams.bash

$ ./multi_scripts/prepare_all_task_trials.bash

$ ./multi_scripts/run_one_team_one_task.bash -n example_team stationkeeping
$ ./multi_scripts/run_one_team_all_tasks.bash -n example_team

# I have run this one all the way. It will take a long time to finish.
$ ./multi_scripts/run_all_teams_all_tasks.bash -n

$ ./multi_scripts/generate_one_team_one_task_videos.bash example_team example_task
# etc.

mabelzhang requested a review from caguero on November 11, 2020 04:35
@mabelzhang (Collaborator, Author) commented:

@crvogt

Minor issue with the ghostship example solution.

On termination, I'm getting a Traceback:

$ ./run_trial.bash -n ghostship stationkeeping 0 
...
---------------------------------
Creating container for crvogt/ghostship:v1

rosmaster already running
/root
gzserver shut down
OK

Killing rosnodes
[ WARN] [1605066935.331941432, 320.000000000]: Shutdown request received.
[ WARN] [1605066935.331989814, 320.000000000]: Reason given for shutdown: [user request]
Starting node!!!
shutdown request: user request
Traceback (most recent call last):
  File "basic_node.py", line 42, in <module>
    sn.sendCmds()
  File "basic_node.py", line 38, in sendCmds
    self.rate.sleep()
  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/timer.py", line 103, in sleep
    sleep(self._remaining(curr_time))
  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/timer.py", line 165, in sleep
    raise rospy.exceptions.ROSInterruptException("ROS shutdown request")
rospy.exceptions.ROSInterruptException: ROS shutdown request
killing:
 * /gazebo_gui
 * /record_1605066612195773050
 * /rosout
 * /send_commands
killed
[ INFO] [1605066612.215335434]: Subscribing to /vorc/task/info
[ INFO] [1605066613.132581061, 0.036000000]: Recording to '/home/master/vorc_rostopics.bag'.
Killing roslaunch pid: 55
OK

Trial ended. Logging data
---------------------------------

For reference, with the example_team solution, I get no Traceback between the "Reason given for shutdown: [user request]" and "killing:" lines:

---------------------------------
Creating container for tylerlum/vrx-competitor-example:v2.2019

Running /move_forward.sh
gzserver shut down
OK

Killing rosnodes
[ WARN] [1605060003.425509787, 50.001000000]: Shutdown request received.
[ WARN] [1605060003.425572958, 50.001000000]: Reason given for shutdown: [user request]
[ INFO] [1605059951.272962663]: Subscribing to /vorc/task/info
[ INFO] [1605059952.211171918, 0.019000000]: Recording to '/home/master/vorc_rostopics.bag'.
killing:
 * /gazebo_gui
 * /record_1605059951250528346
 * /rosout
 * /rostopic_29_1605059957326
 * /rostopic_30_1605059957325
killed
shutdown request: user request
shutdown request: user request
Killing roslaunch pid: 56
OK

Trial ended. Logging data
---------------------------------

Probably just a termination cleanup issue. Could you look into it? Not a big problem, but the output would look cleaner without the traceback.

@mabelzhang (Collaborator, Author) commented Nov 11, 2020

I created a meta-ticket #20 tracking the followup issues.
I don't think I'll have time to address all of them by myself. Help is appreciated.

The "VORC Essentials" anyone can do. It's really just playing around with the world and figuring out where to put buoys etc. Fun task. I'm happy to hand it off to someone else.

The "Infrastructure" items I'd really like to get fixed. It will make my life easier and the code more rigorous (which currently bothers me).

@crvogt (Collaborator) commented Nov 11, 2020

@mabelzhang It must be that my node doesn't handle shutdown well. I couldn't reproduce the output, so I simplified it to a bash script publishing on the cora thrust command topic. I'll check out the VORC Essentials and make a note of which ones I'm handling.

@mabelzhang (Collaborator, Author) commented:

Do you think it's worthwhile to commit the example solution to the repo as well? While testing, I wondered a few times what the content of example_team was and wanted to just change the topic names to the VORC ones, but I had no access to any code. I think it'd be helpful, and it would also serve as a reference for the shutdown handling.

@crvogt (Collaborator) commented Nov 12, 2020

I took a look at the content of both example_team* images (you can view it while the container is running with docker exec -it <container_name> bash), wondering the same thing. It's exactly the script from the tutorial page https://github.com/osrf/vrx/wiki/tutorials-Creating%20a%20Dockerhub%20image%20for%20submission. There's no actual node running.

I'm still curious how to properly handle shutdown scenarios, whether they were handled in VRX, or whether the traceback occurred for every team (that presumably didn't know how to handle it). I'll see if I can figure it out going forward, because I think it would be helpful.

@mabelzhang (Collaborator, Author) commented:

Looks like there is a basic_node.py Python script? The rospy.exceptions.ROSInterruptException suggests the shutdown signal needs to be handled with a try/except, something like:

    while not rospy.is_shutdown():
        try:
            ...
            rate.sleep()
        except rospy.ROSInterruptException:
            break

@crvogt (Collaborator) commented Nov 16, 2020

Ah, ok, I'll give that a try right now. Thanks!

@crvogt (Collaborator) commented Nov 16, 2020

Added the try/except. The output looks nicer on mine, let me know if it makes a difference for you (it's uploaded).

EDIT: Trying it with the vorc-docker branch now

@crvogt (Collaborator) commented Nov 16, 2020

So! I ran it with the wayfinding task and got relatively clean output. I get a few Cannot kill container ... is not running messages, but otherwise nothing like what you posted. Specifically, I'm getting:

Killing rosnodes
[ WARN] [1605558728.533756556, 320.000000000]: Shutdown request received.
[ WARN] [1605558728.533790352, 320.000000000]: Reason given for shutdown: [user request]
Starting node!!!
shutdown request: user request
Complete
killing:
 * /gazebo_gui
 * /record_1605558403275765041
 * /rosout
 * /send_commands
killed
[ INFO] [1605558403.290691459]: Subscribing to /vorc/task/info
[ INFO] [1605558403.785919027, 0.070000000]: Recording to '/home/localadmin/vorc_rostopics.bag'.
Killing roslaunch pid: 55
OK

Trial ended. Logging data
---------------------------------
Copying ROS log files from server container...
OK

Creating text file for trial score
Successfully recorded trial score in /home/localadmin/vorc_ws/src/vrx-docker/utils/../generated/logs/ghostship/wayfinding/0/trial_score.txt
OK

Copying ROS log files from competitor container...
OK

Killing containers
Killing any running Docker containers matching 'vorc-competitor-*'...
Error response from daemon: Cannot kill container: 59351dbe9017: Container 59351dbe901765ddda9f2609ce3d943a925b187d7094200dd43a46c2cc455251 is not running
Removing any Docker containers matching 'vorc-competitor-*'...
59351dbe9017
Killing any running Docker containers matching 'vorc-server-*'...
Error response from daemon: Cannot kill container: c214e26e8455: Container c214e26e8455f02bd65cc290a8e77769bbef730f1b5e16bd9a5396210f7df0f4 is not running
Removing any Docker containers matching 'vorc-server-*'...
c214e26e8455
Done.

So your suggestion looks like a good way forward for shutdown handling.

On a side note, I really struggled to get everything working with my Docker environment. I would receive the following error:

+ docker run --name vorc-server-system -e XAUTHORITY=/tmp/.docker.xauth --env=DISPLAY --env=QT_X11_NO_MITSHM=1 -v /tmp/.docker.xauth:/tmp/.docker.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro -v /tmp/.docker.xauth:/tmp/.docker.xauth -v /dev/log:/dev/log -v /dev/input:/dev/input --runtime=nvidia --privileged --security-opt seccomp=unconfined -u 1000:1000 --net vorc-network --ip 172.16.1.22 -v /home/localadmin/vorc_ws/src/vrx-docker/generated/task_generated/wayfinding:/task_generated -v /home/localadmin/vorc_ws/src/vrx-docker/generated/logs/ghostship/wayfinding/0:/vorc/logs -e ROS_MASTER_URI=http://172.16.1.22:11311 -e ROS_IP=172.16.1.22 -e VRX_EXIT_ON_COMPLETION=true -e VRX_DEBUG=false vorc-server-melodic-nvidia:latest /run_vorc_trial.sh /task_generated/worlds/wayfinding0.world /vorc/logs
docker: Error response from daemon: network vorc-network not found.
ERRO[0000] error waiting for container: context canceled 

This error is covered in maxking/docker-mailman#85 and I believe it occurs when a Docker network persists even after the containers are killed. docker network rm <network> solved it and allowed me to run the trials. If we have a troubleshooting page, it might be good to add this (unless it's common knowledge and I'm out of the loop).
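
For reference, a minimal cleanup sketch (the network name is taken from the error above; the exact cleanup you need may differ):

# List existing Docker networks to check whether a stale vorc-network is still around.
$ docker network ls
# Remove the stale network so the run scripts can recreate it with the expected address range.
$ docker network rm vorc-network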

@mabelzhang (Collaborator, Author) commented Nov 18, 2020

Thanks for trying it out! I tried out the new ghostship solution, and the output is clean now.

I think the Cannot kill container message is normal. My guess is that the cleanup step kills everything just in case some containers weren't stopped cleanly, so it prints that error when a container had already exited.

Hmm, I've seen that vorc-network not found error before too, but I don't remember seeing it for vrx-network. The time I saw it, I already had vrx-network running on the same IP, right before I changed the string to vorc-network, so I had to remove the vrx network manually before vorc-network could use the same IP. I have not seen it with vorc-network in my latest runs.

Normally, utils/vorc_network.bash already takes care of the case where the network exists: it runs docker network rm ${NETWORK} before calling docker network create.
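
A sketch of that remove-before-create pattern (the real utils/vorc_network.bash may differ; the subnet here is only an assumption chosen to be consistent with the 172.16.1.22 address in the docker run line above):

# Remove any leftover network with the same name, ignoring the error if it doesn't exist.
$ docker network rm vorc-network 2> /dev/null || true
# Recreate it so containers can be assigned static IPs such as 172.16.1.22.
$ docker network create --subnet 172.16.0.0/16 vorc-network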

If that network message still happens intermittently, maybe something else still needs to be updated for VORC, but I'm not finding any other vrx leftovers...

@crvogt (Collaborator) commented Nov 18, 2020

It's possible, then, that I was running vrx-docker before switching to vorc-docker. That could have left vrx-network on the IP that vorc-network was meant to use?

@mabelzhang (Collaborator, Author) commented:

Yeah, they use the exact same IP and subnet mask; only the name is different. So that could have been it.

@crvogt (Collaborator) commented Nov 18, 2020

Whoops! Thanks for straightening that out for me :)

@mabelzhang (Collaborator, Author) commented:

Update about the seg fault:
It is still happening - I see the seg fault in verbose_output.txt, but the scoring plugin keeps printing afterwards.
I ran the perception task a few times; sometimes it seg faulted, sometimes it didn't. I watched the GUI, and even when it did seg fault, all 3 prescribed buoys appeared, and the GUI doesn't close until the user requests shutdown. There was a trial score.
So I don't know the implications of the seg fault. I'll keep looking.

@caguero (Contributor) left a review comment:

I managed to run all three individual tasks. Gazebo was terminated at the end of the run and I got scores in all of them. It looks good to me.

@caguero (Contributor) commented Nov 19, 2020

Update about the seg fault:
It is still happening - I see the seg fault in verbose_output.txt, but the scoring plugin keeps printing afterwards.
I ran the perception task a few times; sometimes it seg faulted, sometimes it didn't. I watched the GUI, and even when it did seg fault, all 3 prescribed buoys appeared, and the GUI doesn't close until the user requests shutdown. There was a trial score.
So I don't know the implications of the seg fault. I'll keep looking.

Could that segfault occur while trying to shut down Gazebo? This is an old issue that happens sometimes in Gazebo. In any case, it doesn't seem to affect anything.

@mabelzhang (Collaborator, Author) commented Nov 20, 2020

Maybe? The weird thing is that the segmentation fault printout is not at the very end, but near the beginning or in the middle, before the rest of the scoring plugin printouts. Though it could be a difference in when things are flushed.

@mabelzhang (Collaborator, Author) commented:

@crvogt How much work is it to create a second solution Docker image, just so that we have more than one team to test the multi-scripts? It could just be something trivial again, perhaps the boat moving backwards, i.e. "ghostship is back"...... or something more creative.

(I've deleted example_team and example_team_2 from vrx because they don't publish anything to the vorc topics.)

@mabelzhang (Collaborator, Author) commented:

I'm going to merge this now since it's been approved, so that we have a base to run the competition. Additions and fixes can be in followup PRs. I know we have at least 2 PRs coming up.

mabelzhang merged commit 9ac3612 into vorc on Nov 20, 2020
mabelzhang deleted the vorc-docker branch on November 20, 2020 05:59

@crvogt (Collaborator) commented Nov 23, 2020

@crvogt How much work is it to create a second solution Docker image, just so that we have more than one team to test the multi-scripts? It could just be something trivial again, perhaps the boat moving backwards, i.e. "ghostship is back"...... or something more creative.

Should only take a few minutes! (New employee orientation on Friday, so I didn't get a chance to implement it.) "pihstsohg"? :D

@crvogt (Collaborator) commented Nov 24, 2020

@mabelzhang Added a new team. dockerhub_image.txt should be crvogt/shipghost:v1

@mabelzhang (Collaborator, Author) commented Nov 24, 2020

Thanks! It's working for me. I'll open a new PR and add you as reviewer.
Didn't go with the Dutch name huh? Or I guess in this case it's Gaelic.

@crvogt (Collaborator) commented Nov 25, 2020

Ahaha, it's been anglicized :D
