Imitation learning with dagger #906
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
  <component name="NewModuleRootManager">
    <content url="file://$MODULE_DIR$" />
    <orderEntry type="jdk" jdkName="Python 3.6 (flow)" jdkType="Python SDK" />
    <orderEntry type="sourceFolder" forTests="false" />
  </component>
  <component name="PyDocumentationSettings">
    <option name="format" value="PLAIN" />
    <option name="myDocStringFormat" value="Plain" />
  </component>
</module>
Nit: please remove this file.
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
Did you mean to commit this file?
"""Multi-agent I-210 example.
Trains a non-constant number of agents, all sharing the same policy, on the
highway with ramps network.
"""
import os
import numpy as np

from ray.tune.registry import register_env
This file seems identical to existing code?
"""
# Implementation in Tensorflow

def __init__(self, veh_id, action_network, multiagent, car_following_params=None, time_delay=0.0, noise=0, fail_safe=None):
Please add a docstring so we can know what action_network is.
with tf.variable_scope(policy_scope, reuse=tf.AUTO_REUSE):
    self.build_network()
Why do you need an AUTO_REUSE here?
I put AUTO_REUSE here so that the same variables are reused when the graph is rerun; otherwise new copies of the weights and biases would be created each time.
self.action_predictions = pred_action
print("TYPE: ", type(self.obs_placeholder))

if self.inject_noise == 1:
Nit: by convention you don't need to compare a bool against 1 like this; `if self.inject_noise:` is enough.
Defines input, output, and training placeholders for neural net
"""
self.obs_placeholder = tf.placeholder(shape=[None, self.obs_dim], name="obs", dtype=tf.float32)
self.action_placeholder = tf.placeholder(shape=[None, self.action_dim], name="action", dtype=tf.float32)
Stochastic policies are parametrized by the mean and standard deviation of a Gaussian that you sample actions from. It would be good to add this as an option here so we can use PPO.
The current implementation can be used for deterministic algorithms like DDPG and TD3, which is great.
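To illustrate the Gaussian parametrization suggested in this thread, here is a minimal sketch in plain Python. It is not the PR's code: `sample_gaussian_action`, `deterministic_action`, and the `log_std` parametrization are assumptions chosen for illustration, and a real implementation would produce `mean` and `log_std` from the network outputs.

```python
import math
import random


def sample_gaussian_action(mean, log_std, rng=random):
    """Sample one action from a diagonal Gaussian; return (action, log_prob).

    Hypothetical helper: `mean` and `log_std` are per-dimension lists that a
    policy network would normally output.
    """
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)  # predicting log-std keeps the std positive
        a = rng.gauss(mu, std)
        action.append(a)
        # Log density of N(mu, std^2) evaluated at a.
        log_prob += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return action, log_prob


def deterministic_action(mean):
    """Deterministic policies (as in DDPG or TD3) act on the mean directly."""
    return list(mean)
```

An on-policy method such as PPO needs both the sampled action and its log-probability, which is why the sketch returns the pair rather than the action alone.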
if len(observation.shape)<=1:
    observation = observation[None]
Good check!
# network expects an array of arrays (matrix); if single observation (no batch), convert to array of arrays
if len(observation.shape)<=1:
    observation = observation[None]
ret_val = self.sess.run([self.action_predictions], feed_dict={self.obs_placeholder: observation})[0]
You should make it clear here that this returns a single accel and will not operate correctly if you pass a batch.
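One way to make the single-observation contract explicit is to fail loudly when a batch is passed. The sketch below is an illustration only: `ensure_batch`, `get_accel`, and `policy_fn` are hypothetical names, with `policy_fn` standing in for the `sess.run` call in the diff.

```python
import numpy as np


def ensure_batch(observation):
    """Add a leading batch axis when a single (1-D) observation is given."""
    observation = np.asarray(observation, dtype=np.float32)
    if observation.ndim <= 1:
        observation = observation[None]
    return observation


def get_accel(policy_fn, observation):
    """Return one scalar acceleration for exactly one observation.

    `policy_fn` is a hypothetical callable mapping an array of shape
    (batch, obs_dim) to an array of shape (batch, action_dim).
    """
    batch = ensure_batch(observation)
    if batch.shape[0] != 1:
        raise ValueError(
            "expected a single observation, got a batch of %d" % batch.shape[0]
        )
    actions = policy_fn(batch)
    return float(actions[0][0])
```

Raising on a batch, rather than silently returning the first row, keeps a caller from discarding predictions without noticing.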
* deleting unworking params from SumoChangeLaneParams
* deleted unworking params, sublane working in highway :
* moved imports inside functions
* Apply suggestions from code review
* bug fixes
* bug fix
Co-authored-by: Aboudy Kreidieh <[email protected]>

* added function to kernel/vehicle to get number of not departed vehiles
* fixed over indentation of the docstring
* indentation edit
* pep8
Co-authored-by: AboudyKreidieh <[email protected]>

* changed _departed_ids, and _arrived_ids in the update function
* fixed bug in get_departed_ids and get_arrived_ids
Time-Space Diagram greyed regions
Add accel penalty, stop penalty, mpg reward, and ability to compute reward for any vehicles upstream of you (i.e. make you less greedy and more social)
* New energy class to inventory multiple energy models
Co-authored-by: Joy Carpio <[email protected]>
* Add time-space diagram plotting to experiment.py
* prereq dict added to query
* prereq checking mechanism implemented, not tested yet
* prereq checking tested
* change to more flexible filter handling
* make safety_rate and safety_max_value floats
* ignore nulls in fact_top_scores
* fix typo
* remove unneeded import
* replace uneccessary use of list to set
* add queries to pre-bin histogram data
* fix the serialization issue with set, convert to list before write as json
* fix query
* fix query
* fixed query bug
Co-authored-by: liljonnystyle <[email protected]>
* update tacoma power demand query, meters/Joules -> mpg conversion
* fix some implementation errors in energy models
* pull i210_dev and fix flake8
Add --multi_node flag
* implement HighwayNetwork for Time-Space Diagrams (#979)
* fixed h-baselines bug (#982)
* Replicated changes in 867. Done bug (#980)
* Aimsun changes minus reset
* removed crash attribute
* tensorflow 1.15.2
* merge custom output and failsafes to master (#981)
* add write_to_csv() function to master
* include pipeline README.md
* add data pipeline __init__
* add experiment.py changes
* add write_to_csv() function to master
* change warning print to ValueError message
* update to new update_accel methods
* add display_warnings boolean
* add get_next_speed() function to base vehicle class
* revert addition of get_next_speed
* merge custom output and failsafes to master
* add write_to_csv() function to master
* add display_warnings boolean
* add get_next_speed() function to base vehicle class
* revert addition of get_next_speed
* revert change to get_feasible_action call signature
* change print syntax to be python3.5 compliant
* add tests for new failsafe features
* smooth default to True
* rearrange raise exception for test coverage
* moved simulation logging to the simulation kernel (#991)
* add 210 edgestarts for backwards compatibility (#985)
* fastforward PR 989
* fix typo
* Requirements update (#963)
* updated requirements.txt and environment.yml
* Visualizer tests fixes
* remove .func
* move all miles_per_* rewards to instantaneous_mpg
* update reward fns to new get_accel() method
* made tests faster
* some fixes to utils
* change the column order, modify the pipeline to use SUMO emission file
* write metadata to csv
* change apply_acceleration smoothness setting
* make save_csv return the file paths
Co-authored-by: AboudyKreidieh <[email protected]>
Co-authored-by: liljonnystyle <[email protected]>
Co-authored-by: Kathy Jang <[email protected]>
Co-authored-by: Nathan Lichtlé <[email protected]>
Co-authored-by: akashvelu <[email protected]>
Co-authored-by: Brent Zhao <[email protected]>
* refactor tsd to allow for axes offsets
* update time-space plotter unit tests
Pull request information
Description
Adds functionality to do imitation learning (with DAgger), to train a model to imitate an expert.
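For readers unfamiliar with DAgger, the loop can be sketched on a toy problem as follows. This is not the code in this PR: the 1-D environment, the `expert_action` rule, and the linear learner are all assumptions picked so the example stays self-contained. The structure is the part that matters: roll out the current learner, relabel the visited states with the expert's actions, aggregate them into one dataset, and refit.

```python
import random


def expert_action(state):
    """Hypothetical expert: push the 1-D state toward zero."""
    return -0.5 * state


def fit_linear_policy(dataset):
    """Least-squares fit of action = w * state over (state, action) pairs."""
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset)
    return num / den if den > 0 else 0.0


def rollout(policy_weight, beta, horizon, rng):
    """Run one episode, acting with the expert with probability beta."""
    state = rng.uniform(-1.0, 1.0)
    visited = []
    for _ in range(horizon):
        visited.append(state)
        if rng.random() < beta:
            action = expert_action(state)
        else:
            action = policy_weight * state
        state = state + action  # toy dynamics, assumed for illustration
    return visited


def dagger(iterations=5, horizon=20, seed=0):
    """Return the learner's weight after several DAgger iterations."""
    rng = random.Random(seed)
    dataset, weight = [], 0.0
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.0  # expert drives first, learner afterwards
        states = rollout(weight, beta, horizon, rng)
        # Relabel every visited state with the expert's action and aggregate.
        dataset.extend((s, expert_action(s)) for s in states)
        weight = fit_linear_policy(dataset)
    return weight
```

Because the learner's own rollouts decide which states enter the dataset, the expert ends up labelling states the learner actually reaches, which is what separates DAgger from plain behavioral cloning on expert-only trajectories.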