Showing 14 changed files with 70 additions and 180 deletions.
@@ -1,220 +1,110 @@
 organization: OMRON SINIC X
 twitter: '@omron_sinicx'
-title: 'MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics'
-conference: IJCAI2020
+title: 'SliceIt! - A Dual Simulator Framework for Learning Robot Food Slicing'
+conference: ICRA2024
 resources:
-  paper: https://arxiv.org/abs/1909.13111
-  code: https://github.com/omron-sinicx/multipolar
-  video: https://www.youtube.com/embed/adUnIj83RtU
-  blog: https://medium.com/sinicx/multipolar-multi-source-policy-aggregation-for-transfer-reinforcement-learning-between-diverse-bc42a152b0f5
-description: explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-image: https://omron-sinicx.github.io/multipolar/assets/teaser.png
-url: https://omron-sinicx.github.io/multipolar
-speakerdeck: b7a0614c24014dcbbb121fbb9ed234cd
+  paper: http://arxiv.org/abs/2404.02569
+  code: https://github.com/omron-sinicx/sliceit
+description: A real2sim2real approach to learning robotic food slicing tasks with rigid robot manipulators
+image: system_overview.png
+url: https://omron-sinicx.github.io/sliceit
 authors:
-  - name: Mohammadamin Barekatain*
-    affiliation: [1, 2]
-    url: http://barekatain.me/
-    position: intern
-  - name: Ryo Yonetani
+  - name: Cristian C. Beltran-Hernandez
     affiliation: [1]
-    position: principal investigator
-    url: https://yonetaniryo.github.io/
+    url: http://cristianbehe.me/
+    position: Senior Researcher
+  - name: Nicolas Erbetti*
+    affiliation: [1]
+    position: Intern
   - name: Masashi Hamaya
     affiliation: [1]
-    position: senior researcher
+    position: Principal Investigator
     url: https://sites.google.com/view/masashihamaya/home
-  # - name: Mai Nishimura
-  #   affiliation: [1]
-  #   url: https://denkiwakame.github.io
-  # - name: Asako Kanezaki
-  #   affiliation: [2]
-  #   url: https://kanezaki.github.io/
-contact_ids: ['github', 'omron', 2] #=> github issues, [email protected], 2nd author
+contact_ids: ['github', 'omron', 1] #=> github issues, [email protected], 1st author
 affiliations:
   - OMRON SINIC X Corporation
-  - Technical University of Munich
 meta:
   - '* work done as an intern at OMRON SINIC X.'
 bibtex: >
-  # arXiv version
-  @article{barekatain2019multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    journal={arXiv preprint arXiv:1909.13111},
-    year={2019}
-  }
-  # IJCAI version
-  @inproceedings{barekatain2020multipolar,
-    title={MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics},
-    author={Barekatain, Mohammadamin and Yonetani, Ryo and Hamaya, Masashi},
-    booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
-    year={2020}
+  @article{beltran2024sliceit,
+    title={SliceIt!--A Dual Simulator Framework for Learning Robot Food Slicing},
+    author={Beltran-Hernandez, Cristian C and Erbetti, Nicolas and Hamaya, Masashi},
+    journal={arXiv preprint arXiv:2404.02569},
+    year={2024}
   }
 overview: |
-  Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks.
-  However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments.
-  In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently.
-  To address this problem, the proposed approach, **MULTI-source POLicy AggRegation (MULTIPOLAR)**, comprises two key techniques.
-  We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance.
-  Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly.
-  We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
+  Cooking robots can enhance the home experience by reducing the burden of daily chores. However, these robots must perform their tasks dexterously and safely in shared human environments, especially when handling dangerous tools such as kitchen knives. This study focuses on enabling a robot to autonomously and safely learn food-cutting tasks. More specifically, our goal is to enable a collaborative or industrial robot arm to perform food-slicing tasks by adapting to varying material properties using compliance control. Our approach uses Reinforcement Learning (RL) to train a robot to compliantly manipulate a knife, reducing the contact forces exerted by the food items and by the cutting board. However, training the robot in the real world can be inefficient and dangerous, and can result in a lot of food waste. Therefore, we propose SliceIt!, a framework for safely and efficiently learning robot food-slicing tasks in simulation. Following a real2sim2real approach, our framework consists of collecting a small amount of real food-slicing data, calibrating our dual simulation environment (a high-fidelity cutting simulator and a robotic simulator), learning compliant control policies in the calibrated simulation environment, and finally deploying the policies on the real robot.
 method:
-  - title: subsection 1
-    image: method.png
+  - title: Real2Sim2Real Approach
     text: >
-      **test text with unicode characters:** α, β, φ, ψ
-  - title: subsection 2
-    image: null
+      Our framework follows a cyclic paradigm that begins with data collection in the real world and culminates in deploying the learned policies back onto the physical robot. The four stages, sketched in the stub code after this list, are:
+      - Data Collection: We start by gathering real-world slicing data from a few representative food items with varying material properties. This data serves as the ground truth for calibrating our simulations.
+      - Simulation Calibration: Using optimization techniques, we fine-tune the parameters of the cutting simulator (DiSECt) so that it accurately replicates the real-world slicing dynamics observed in the collected data.
+      - Policy Learning: Within the calibrated dual simulation environment, we use reinforcement learning to train control policies that enable the robot to perform compliant and adaptive slicing motions.
+      - Real-World Deployment: Finally, the learned policies are transferred to our robotic platform, allowing safe and efficient execution of slicing tasks in the real world.
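To make the cyclic structure concrete, here is a minimal, purely illustrative skeleton of the four stages. Every function below is a hypothetical stub with toy data, not part of the SliceIt! codebase:

```python
# Purely illustrative skeleton of the real2sim2real loop; all functions
# are hypothetical stubs, not the actual SliceIt! implementation.

def collect_real_data():
    # 1. Record a few real slicing trials (knife depth vs. contact force).
    return [{"depth": 0.01 * i, "force": 2.0 * i} for i in range(1, 4)]

def calibrate_simulator(trials):
    # 2. Fit a simulator parameter so simulated forces match the real data
    #    (here: a single effective stiffness, averaged over trials).
    return sum(t["force"] / t["depth"] for t in trials) / len(trials)

def train_policy(stiffness):
    # 3. "Learn" a toy policy in the calibrated simulation.
    return lambda depth: -stiffness * depth

def deploy(policy):
    # 4. Run the learned policy on the real robot (stubbed as a print).
    print("commanded force at 1 cm depth:", policy(0.01))

deploy(train_policy(calibrate_simulator(collect_real_data())))
```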
+  - title: Overall Framework
+    image: system_diagram.png
     text: >
-      **test text with TeX characters:** $\alpha$, $\beta$, $\phi$, $\psi \\$
-      see how it renders with $\KaTeX$.
-      $$ E = mc^2 $$
-      $$ \int \oint \sum \prod $$
-      $$ \begin{CD} A @>a>> B \\ @VbVV @AAcA \\ C @= D \end{CD} $$
-  - title: null
-    image: method.png
+      The core of our approach is the dual simulation environment, which combines two simulators running in parallel:
+      - The Cutting Simulator (CutSim): We utilize DiSECt, a differentiable physics simulator tailored to cutting soft materials. DiSECt augments the finite element method with continuous contact and damage models, enabling realistic simulation of food cutting. We calibrate DiSECt's simulation parameters with a two-step optimization process, sketched below: a gradient-free method first identifies rough initial parameters, followed by gradient-based fine-tuning with the Adam optimizer.
+      - The Robotic Simulator (RoboSim): We employ Gazebo, an open-source robotics simulator compatible with the Robot Operating System (ROS). RoboSim is computationally much cheaper than CutSim, allowing faster simulation.
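A minimal sketch of the two-step calibration idea: a toy linear force model stands in for DiSECt (the real interface differs), a coarse gradient-free sweep replaces the non-gradient step, and PyTorch's Adam plays the gradient-based fine-tuner:

```python
# Toy sketch of the two-step calibration: gradient-free coarse search,
# then Adam fine-tuning. The linear "simulator" is a stand-in for DiSECt.
import torch

depths = torch.tensor([0.005, 0.010, 0.015])     # knife depths (m), toy data
target_force = torch.tensor([2.0, 4.0, 6.0])     # measured real forces (N)

def simulated_force(stiffness):
    return stiffness * depths                    # stand-in differentiable model

# Step 1: gradient-free search over candidate stiffness values.
candidates = torch.linspace(10.0, 1000.0, 200)
losses = torch.stack([(simulated_force(k) - target_force).pow(2).sum()
                      for k in candidates])
stiffness = candidates[losses.argmin()].clone().requires_grad_(True)

# Step 2: gradient-based fine-tuning with Adam.
opt = torch.optim.Adam([stiffness], lr=1.0)
for _ in range(500):
    opt.zero_grad()
    loss = (simulated_force(stiffness) - target_force).pow(2).sum()
    loss.backward()
    opt.step()

print(f"calibrated stiffness: {stiffness.item():.1f} N/m")
```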
+  - title: Learning Compliance Control
+    image: rl-compliance-controller.png
     text: >
-      This is a multi-line text example.
-      "> - Folded Style" converts newlines to spaces.
-      Using >, newline characters are converted to spaces.
-      Newline characters and indentation are handled appropriately, and the text is represented as a single line.
-      It's suitable when you want to collapse multi-line text into a single line, such as in configurations or descriptions where readability is key.
-  - text: |
-      This is a multi-line
-      text example.
-      "| - Literal Style" preserves newlines and indentation.
-      Using |, you can represent multi-line text that includes newline characters.
-      Newline characters are preserved exactly as they are, along with the block's indentation.
-      It's suitable when maintaining newlines and indentation is important, such as preserving the structure of code or prose.
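The folded (`>`) and literal (`|`) behaviors described in the removed demo text above can be verified quickly with PyYAML; this is just a sanity check, assuming `pyyaml` is installed, not part of the template:

```python
# Sanity check of YAML's two block-scalar styles with PyYAML.
import yaml

folded  = yaml.safe_load("text: >\n  line one\n  line two\n")
literal = yaml.safe_load("text: |\n  line one\n  line two\n")

print(repr(folded["text"]))   # 'line one line two\n'  (newlines folded to spaces)
print(repr(literal["text"]))  # 'line one\nline two\n' (newlines preserved)
```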
+      Our method integrates a reinforcement learning agent with a Forward Dynamics Compliance Controller (FDCC). The FDCC provides Cartesian compliance control by combining principles of impedance, admittance, and force control.
+      During training, the RL agent learns two key elements simultaneously:
+      1. The motion trajectory for the slicing task, i.e., the optimal reference trajectory to provide to the compliance controller.
+      2. The optimal control parameters of the FDCC itself, such as the Cartesian stiffness and the PD gain values.
+      This allows the agent to learn a policy that produces compliant slicing motions while automatically tuning the compliance controller's behavior; the sketch after this list illustrates the idea.
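A hedged sketch of what such a combined action might look like. The dimensions, ranges, and names here are illustrative assumptions, not the actual SliceIt! action space:

```python
# Illustrative split of one policy action into a trajectory offset plus
# compliance-controller parameters (sizes and ranges are assumptions).
import numpy as np

def split_action(action: np.ndarray):
    assert action.shape == (5,)              # normalized action in [-1, 1]^5
    delta_pose = action[:3] * 0.005          # Cartesian offset to the reference
                                             # trajectory, up to 5 mm per step
    stiffness = 500.0 + action[3] * 400.0    # Cartesian stiffness in [100, 900] N/m
    kp = 1.0 + 0.5 * action[4]               # PD proportional gain in [0.5, 1.5]
    return delta_pose, stiffness, kp

delta, k, kp = split_action(np.array([0.2, -0.1, 0.4, -0.5, 0.3]))
print(delta, k, kp)   # [0.001 -0.0005 0.002] 300.0 1.15
```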
+  - title: Real-World Robotic Platform
+    image: real_system.jpg
     text: >
+      Our robotic platform is a dual-arm system with two Universal Robots UR5e arms. One arm acts as a supporting gripper that holds the food item in place, while the other executes the cutting task using a custom knife-gripper attachment.
 results:
   - text: |
-      ### Motion Planning (MP) Dataset
-      markdown version
-      |Method|Opt|Exp|Hmean|
-      |--|--|--|--|
-      |BF| 65.8 (63.8, 68.0)| 44.1 (42.8, 45.5) | 44.8 (43.4, 46.3)|
-      |WA*| 68.4 (66.5, 70.4)| 35.8 (34.5, 37.1) | 40.4 (39.0, 41.8)|
-      |**Neural A*** | **87.7 (86.6, 88.9)**| 40.1 (38.9, 41.3) | 52.0 (50.7, 53.3)|
+      We compared our method against a baseline trained only in the Gazebo simulator, i.e., without the high-fidelity DiSECt cutting simulator.
+      The key evaluation metric was the contact force exerted during the slicing action, measured by the force-torque sensor at the robot's end-effector. Minimizing this contact force is crucial for safe and precise robotic cutting.
-      <h3>Motion Planning (MP) Dataset</h3>
-      <p>HTML version</p>
-      <div class="uk-overflow-auto">
-        <table class="uk-table uk-table-small uk-text-small uk-table-divider">
-          <thead>
-            <tr>
-              <th>Method</th>
-              <th>Opt</th>
-              <th>Exp</th>
-              <th>Hmean</th>
-            </tr>
-          </thead>
-          <tbody>
-            <tr>
-              <td>
-                BF
-                <br />
-                WA*
-              </td>
-              <td>
-                65.8 (63.8, 68.0)
-                <br />
-                68.4 (66.5, 70.4)
-              </td>
-              <td>
-                44.1 (42.8, 45.5)
-                <br />
-                35.8 (34.5, 37.1)
-              </td>
-              <td>
-                44.8 (43.4, 46.3)
-                <br />
-                40.4 (39.0, 41.8)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                SAIL
-                <br />
-                SAIL-SL
-                <br />
-                BB-A*
-              </td>
-              <td>
-                5.7 (4.6, 6.8)
-                <br />
-                3.1 (2.3, 3.8)
-                <br />
-                31.2 (28.8, 33.5)
-              </td>
-              <td>
-                58.0 (56.1, 60.0)
-                <br />
-                57.6 (55.7, 59.6)
-                <br />
-                52.0 (50.2, 53.9)
-              </td>
-              <td>
-                7.7 (6.4, 9.0)
-                <br />
-                4.4 (3.5, 5.3)
-                <br />
-                31.1 (29.2, 33.0)
-              </td>
-            </tr>
-            <tr>
-              <td>
-                Neural BF
-                <br />
-                <b>Neural A*</b>
-              </td>
-              <td>
-                75.5 (73.8, 77.1)
-                <br />
-                <b>87.7 (86.6, 88.9)</b>
-              </td>
-              <td>
-                45.9 (44.6, 47.2)
-                <br />
-                40.1 (38.9, 41.3)
-              </td>
-              <td>
-                52.0 (50.7, 53.4)
-                <br />
-                52.0 (50.7, 53.3)
-              </td>
-            </tr>
-          </tbody>
-        </table>
-      </div>
-      <h3>Selected Path Planning Results</h3>
-      <p>dummy text</p>
-      <img
-        src="assets/result1.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
-      <h3>Path Planning Results on SSD Dataset</h3>
-      <p>dummy text</p>
+      The results show that our approach consistently applied significantly lower contact forces during slicing in every single trial.
-      <img
-        src="assets/result2.png"
-        class="uk-align-center uk-responsive-width"
-        alt=""
-      />
+      Notably, our method adapted well to abrupt stiffness transitions, such as the transition from slicing a soft vegetable to contacting the stiff cutting board. This smooth modulation of the cutting motion minimized excessive impact forces, a key factor for safe and precise robotic slicing.
 demo:
-  - mp4: result1.mp4
-    text: demo text1 demo text1 demo text1
+  - mp4: dual_sim_x5.mp4
+    text: RL training in the dual simulation environment
+    scale: 100%
+  - mp4: real_carrot.mp4
+    text: Slicing policy on the real robotic system (carrot)
+    scale: 100%
+  - mp4: real_potato.mp4
+    text: Slicing policy on the real robotic system (potato)
+    scale: 100%
-  - mp4: result1.mp4
-    text: demo text2 demo text2 demo text2
+  - mp4: real_tomato.mp4
+    text: Slicing policy on the real robotic system (tomato)
+    scale: 100%
-  - mp4: result1.mp4
-    text: demo text3 demo text3 demo text3
-    scale: 80%