Ray merge #2

eugenevinitsky · 2018-11-11T22:53:39Z

Merge the upstream master to get ray 0.5.3

…ment (ray-project#2995) * simplify vec batch requirements * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-models.rst

…ors (ray-project#2967) * update * link it * warn about truncation * fix * Update rllib-training.rst * deprecate tests failing

… of PPO (ray-project#2974) * fix * fix * fix it * propagate conf to action dist * move carla example too * rr * Update policies.py * wip * lint

…ava worker. (ray-project#3002) This fixes a bug in which Java actor methods inherit the resource requirements of the actor creation task.

* remove legacy * remove reshaper

This tests the case in which a worker is blocked in a call to ray.get or ray.wait, and then the worker dies. Then later, the object that the worker was waiting for becomes available. We need to make sure not to try to send a message to the dead worker and then die. Related to ray-project#2790.
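The scenario this test covers can be illustrated with a toy model. The sketch below is not Ray's node manager code; `NodeModel` and all of its methods are hypothetical names chosen for illustration. It only shows the invariant being tested: when an object becomes available, notifications go to live workers only.

```python
from typing import Dict, List, Set, Tuple


class NodeModel:
    """Hypothetical toy model of a node tracking workers blocked on objects."""

    def __init__(self) -> None:
        self.live_workers: Set[str] = set()
        self.waiters: Dict[str, Set[str]] = {}   # object id -> blocked worker ids
        self.sent: List[Tuple[str, str]] = []    # (worker id, object id) notifications

    def register(self, worker_id: str) -> None:
        self.live_workers.add(worker_id)

    def block_on(self, worker_id: str, object_id: str) -> None:
        # The worker is blocked in a get/wait-style call on this object.
        self.waiters.setdefault(object_id, set()).add(worker_id)

    def worker_died(self, worker_id: str) -> None:
        # Forget the dead worker everywhere so nothing is sent to it later.
        self.live_workers.discard(worker_id)
        for blocked in self.waiters.values():
            blocked.discard(worker_id)

    def object_available(self, object_id: str) -> None:
        for worker_id in sorted(self.waiters.pop(object_id, set())):
            # Defensive check: only message workers that are still alive.
            if worker_id in self.live_workers:
                self.sent.append((worker_id, object_id))


node = NodeModel()
node.register("w1")
node.register("w2")
node.block_on("w1", "obj")
node.block_on("w2", "obj")
node.worker_died("w1")          # w1 dies while still blocked
node.object_available("obj")    # the object shows up afterwards
```

In this model, only `w2` is notified, and the late arrival of the object does not touch the dead worker.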

Before the fix, RAY_FUN_CACHE was only ever read with the get method, so it could only return null. Fix: put the entry into the cache after creating it.

…in. (ray-project#2862)

…t#3018)

…ay-project#3003) Move function/actor exporting & loading code to function_manager.py to prepare the code change for function descriptor for python.

This commit fixes some small defects:

1. Remove a comment that should have been removed in ray-project#3003.
2. Remove `redis_protected_mode`, which is never used in `ray.init()`.
3. Fix `object_id_seed`, which was not being passed into `ray._init()`.
4. Remove several redundant brackets.

…-project#2935)

…ect#2837) * Introduce concept of resources required for placement. * Add placement resources to task spec * Update java worker * Update taskinfo.java

…ject#3029) Improve logging message when plasma store is started.

* Update rsync command * Escape rsync locations * Fix the accidental variable move * Update rsync to use -s flag

## What do these changes do?

1. Add a configuration item `driver.resource-path`.
2. Load driver resources from the local path specified in `ray.conf`.

Before this change, all driver resources (such as the user's jar package, dependency packages, and config files) had to be added to the `classpath`. After this change, driver resources go into the mount path configured in `ray.conf`, and the `classpath` no longer needs to be configured for them.

## Related issue number

N/A

* bugfix: env exists check error * support to avoid re-build pyarrow in project * bugfix: adapt gtest for centos lib64 * bugfix: check gtest lib exists in the directory * bugfix: find gtest with checking all libs exists * prefix RAY_ to thirdparty env variables to avoid conflicts with other module * arrow use glog from ray * change the glog and gtest install dir

This PR improves some Java code and removes some duplicated code.

## What do these changes do?

Fix how driver resources are loaded from a specified path. This also addresses the comments from the related PR [3044](ray-project#3044).

## Related PRs

[ray-project#3044](ray-project#3044) and [ray-project#3001](ray-project#3001).

…l plasma java lib (ray-project#3047)

…y states (ray-project#3032)

* fix er * update

…roject#2766)

…ight (ray-project#3061)

…oject#3068)

## What do these changes do?

Fix the misleading comments in the code for:

- `EPISODES_THIS_ITER`
- `EPISODES_TOTAL`

I had noted this before and planned to fix it along with some other changes, but it seemed relevant enough to ray-project#3058 to send now.

…e to UI. (ray-project#3397) * Saving * Fix cmake and remove object/task search boxes. * Add comment

…ray-project#3385)

* frac ppo * gpu test

* Add script for running stress tests. * Add an actor tree test where actors die with some probability * Improve test. * Small fix * Update tests. * Minor change

…ay-project#3395)

…PG algorithms (ray-project#3384)

…#3409)

…fined ObjectDirectory (ray-project#3403)

* Add regression test * Request actor creation if no actor location found * Comments * Address comments * Increase test timeout * Trigger test

* batch norm * lint * fix dqn/ddpg update ops * bn model * Update tf_policy_graph.py * Update multi_gpu_impl.py * Apply suggestions from code review Co-Authored-By: ericl <[email protected]>

* Adding logo to readme * Updating link * Add badge * Addressing comments * Moving logo * Change align * Move image

…t#3448) This includes a fix so the TensorFlow op releases memory properly (apache/arrow#3061) and the possibility to store arrow data structures in plasma (apache/arrow#2832). ray-project#3404

ray.wait depends on callbacks from the GCS to decide when an object has appeared in the cluster. The raylet crashes if a callback is received for a wait request that has already completed, but this can actually happen, depending on the order of calls. More precisely:

1. Objects A and B are put in the cluster.
2. Client calls ray.wait([A, B], num_returns=1).
3. Client subscribes to locations for A and B. Locations are cached for both, so callbacks are posted for each.
4. Callback for A fires. The wait completes and the request is removed.
5. Callback for B fires. The wait request no longer exists and raylet crashes.
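The ordering described above can be reproduced in a small toy model. This is a sketch only: the real raylet is C++, `WaitManager` and its methods are hypothetical names, and ignoring late callbacks is one plausible guard, not necessarily the change made in this commit.

```python
from typing import Callable, Dict, List


class WaitManager:
    """Hypothetical toy model of wait-request bookkeeping (not raylet code)."""

    def __init__(self) -> None:
        self._pending: Dict[int, dict] = {}
        self._next_id = 0

    def wait(self, object_ids: List[str], num_returns: int,
             on_done: Callable[[List[str]], None]) -> int:
        wait_id = self._next_id
        self._next_id += 1
        self._pending[wait_id] = {
            "remaining": set(object_ids),
            "found": [],
            "num_returns": num_returns,
            "on_done": on_done,
        }
        return wait_id

    def on_location_callback(self, wait_id: int, object_id: str) -> bool:
        """Handle a location callback; return False if the wait is already gone."""
        request = self._pending.get(wait_id)
        if request is None:
            # Late callback for a completed wait: ignore it instead of crashing.
            return False
        if object_id in request["remaining"]:
            request["remaining"].discard(object_id)
            request["found"].append(object_id)
        if len(request["found"]) >= request["num_returns"]:
            # Enough objects found: complete the wait and remove the request.
            del self._pending[wait_id]
            request["on_done"](list(request["found"]))
        return True


results: List[List[str]] = []
manager = WaitManager()
wait_id = manager.wait(["A", "B"], num_returns=1, on_done=results.append)
handled_a = manager.on_location_callback(wait_id, "A")  # step 4: completes the wait
handled_b = manager.on_location_callback(wait_id, "B")  # step 5: request is gone
```

Here the callback for B finds no pending request and is dropped, whereas an unconditional lookup at that point would fail in the way step 5 describes.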

AboudyKreidieh

This is too long for me to actually review. Should I look at any files specifically? If you haven't made any changes, though, then LGTM.

cathywu · 2018-12-26T01:09:49Z

The master branch is out of date with this ray_merge branch, can we update it? @eugenevinitsky

This is actually the cause of my issues with flow-project/flow#338: flow-project/ray:master is incompatible with the rllib_visualizer in flow:master.

ericl and others added 30 commits September 30, 2018 18:36

[rllib] Default to truncate_episodes and add some more config validat…

e4bea8d

…ors (ray-project#2967) * update * link it * warn about truncation * fix * Update rllib-training.rst * deprecate tests failing

[rllib] Propagate model options correctly in ARS / ES, to action dist…

b45bed4

… of PPO (ray-project#2974) * fix * fix * fix it * propagate conf to action dist * move carla example too * rr * Update policies.py * wip * lint

[Java] Fix the required-resources issue of actor member function in J…

fcef4ed

…ava worker. (ray-project#3002) This fixes a bug in which Java actor methods inherit the resource requirements of the actor creation task.

[rllib] Remove legacy multiagent support (ray-project#2975)

2019b41

* remove legacy * remove reshaper

fix bug: (ray-project#3000)

9c606ea

Before the fix, RAY_FUN_CACHE was only ever read with the get method, so it could only return null. Fix: put the entry into the cache after creating it.

Change logfile names and also allow plasma store socket to be passed …

cc7e2ec

…in. (ray-project#2862)

Update links to use latest 0.5.3 wheels instead of 0.5.2. (ray-projec…

d73ee36

…t#3018)

Move function/actor exporting & loading code to function_manager.py (r…

9948e8c

…ay-project#3003) Move function/actor exporting & loading code to function_manager.py to prepare the code change for function descriptor for python.

Suppress errors when worker or driver intentionally disconnects. (ray…

01bb073

…-project#2935)

Introduce concept of resources required for placing a task. (ray-proj…

faa31ae

…ect#2837) * Introduce concept of resources required for placement. * Add placement resources to task spec * Update java worker * Update taskinfo.java

[tune/core] Use Global State API for resources (ray-project#3004)

0651d3b

[core] Improve logging message when plasma store is started. (ray-pro…

ecd8f39

…ject#3029) Improve logging message when plasma store is started.

Bug/log syncer fails with parentheses (ray-project#2653)

2d35a97

* Update rsync command * Escape rsync locations * Fix the accidental variable move * Update rsync to use -s flag
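As a rough illustration of why paths with parentheses need care, the sketch below builds an rsync command string with quoted locations. The helper name `build_rsync_command` and the `-avz` flags are assumptions for illustration, not the log syncer's actual code; `shlex.quote` is from the Python standard library, and `-s` is rsync's `--protect-args` option, which stops the remote shell from interpreting the remote path.

```python
import shlex


def build_rsync_command(source: str, target: str) -> str:
    """Hypothetical helper: quote both locations for the local shell.

    With -s (--protect-args), rsync sends the remote path without letting the
    remote shell interpret it, so quoting for the local shell is sufficient.
    """
    return "rsync -savz {} {}".format(shlex.quote(source), shlex.quote(target))


command = build_rsync_command(
    "/tmp/results/exp (1)/",
    "user@host:/tmp/results/exp (1)/",
)
```

Without the quoting, a shell would treat the unescaped `(1)` as syntax rather than as part of the path.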

Fix the uniqueId toString format. (ray-project#3035)

ef1f2fd

[Java] Improve some Java code (ray-project#3040)

4a2ed47

This PR improves some Java code and removes some duplicated code.

[tune] Tweaks to Trainable and Verbosity (ray-project#2889)

f9b58d7

move make clean before cmake command, avoid always running mvn instal…

87639b9

…l plasma java lib (ray-project#3047)

[rllib] Add unit test and some better error messages for custom polic…

473ee4e

…y states (ray-project#3032)

[rllib] Don't crash printing out error message (ray-project#3054)

866c7a5

* fix er * update

[tune] Fix misleading comment (ray-project#3058)

4dc78b7

[rllib] Parallel-data loading and multi-gpu support for IMPALA (ray-p…

3c891c6

…roject#2766)

[rllib] Add more warnings when multi-agent envs might not be set up r…

6240ccb

…ight (ray-project#3061)

[Java] Add jvm-parameters in Config. (ray-project#3065)

64e5eb3

robertnishihara and others added 24 commits November 25, 2018 10:16

UI changes, fix the task timeline and add the object transfer timelin…

0f0099f

…e to UI. (ray-project#3397) * Saving * Fix cmake and remove object/task search boxes. * Add comment

[autoscaler] Allow more than 5s from node creation to first heartbeat (…

aa94d3d

…ray-project#3385)

[rllib] PPO doesn't work with fractional num gpus (ray-project#3396)

e3c088f

* frac ppo * gpu test

Add script for running stress tests. (ray-project#3378)

20b8b1d

* Add script for running stress tests. * Add an actor tree test where actors die with some probability * Improve test. * Small fix * Update tests. * Minor change

Move setproctitle to ray[debug] package (ray-project#3415)

0d56fc1

Don't put entire actor registry in debug string since it's too long (r…

c2108ca

…ay-project#3395)

[rllib] example and docs on how to use parametric actions with DQN / …

f0df97d

…PG algorithms (ray-project#3384)

[autoscaler] Update autoscaler to use heartbeat batches. (ray-project…

82863b5

…#3409)

Initialize client_id_ in ObjectManager constructor that takes user-de…

139fbf7

…fined ObjectDirectory (ray-project#3403)

Click 0.7 changes the naming convention for commands; fix this

c46ea2f

Automatically indent tune logger params (ray-project#3399)

7e319db

Remove: duplicate feed_dict constructing (ray-project#3431)

fd7e494

Fault tolerance for actor creation (ray-project#3422)

48a5935

* Add regression test * Request actor creation if no actor location found * Comments * Address comments * Increase test timeout * Trigger test

Ship Modin with Ray. (ray-project#3109)

4d2010a

[rllib] Support batch norm layers (ray-project#3369)

07d8cbf

* batch norm * lint * fix dqn/ddpg update ops * bn model * Update tf_policy_graph.py * Update multi_gpu_impl.py * Apply suggestions from code review Co-Authored-By: ericl <[email protected]>

Use actor ID for the dummy object (ray-project#3437)

447604a

[docs] Snippet did not have a code-block tag above it (ray-project#3442)

454d3aa

Update readme to contain logo (ray-project#3443)

5751261

* Adding logo to readme * Updating link * Add badge * Addressing comments * Moving logo * Change align * Move image

Bump version from 0.5.3 to 0.6.0. (ray-project#3420)

0603e0b

Add stress test for Java worker (ray-project#3424)

abd37df

Upgrade Arrow to include Plasma TensorFlow Op release fix (ray-projec…

c5b5cda

…t#3448) This includes a fix so the TensorFlow op releases memory properly (apache/arrow#3061) and the possibility to store arrow data structures in plasma (apache/arrow#2832). ray-project#3404

Update README.rst with 0.6.0 version number. (ray-project#3453)

13c8ce4

merged in ray 0.6

dcdc839

AboudyKreidieh self-assigned this Dec 3, 2018

AboudyKreidieh approved these changes Dec 3, 2018

eugenevinitsky closed this Dec 9, 2018

cathywu reopened this Dec 26, 2018

cathywu mentioned this pull request Dec 26, 2018

Small fix to rllib visualizer flow-project/flow#338

Closed
