Ordinal inverse reinforcement learning applied to robot learning with small data
10.1109/IROS47612.2022.9981731
Cite as: hdl:2117/385279
Document type: Conference proceedings paper
Publication date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Restricted access by publisher policy (embargoed until 2024-12-26)
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
Over the last decade, the ability to teach actions to robots in a user-friendly way has gained relevance, and a practical way of teaching robots a new task is to use Inverse Reinforcement Learning (IRL). In IRL, an expert teacher shows the robot a desired behaviour and an agent builds a model of the reward. The agent can also infer a policy that performs optimally within the limitations of the knowledge provided to it. However, most IRL approaches assume an (almost) optimal performance of the teaching agent, which can become impractical if the teacher is not actually an expert. In addition, most IRL approaches focus on discrete state-action spaces, which limits their applicability to certain real-world problems, such as those in the context of direct Policy Search (PS) reinforcement learning. Therefore, in this paper we introduce Ordinal Inverse Reinforcement Learning (OrdIRL) for continuous state variables, in which the teacher can qualitatively evaluate robot performance by selecting one of several predefined performance levels (e.g., {bad, medium, good} for three tiers of performance). Once OrdIRL has fit an ordinal distribution to the data, we propose to use Bayesian Optimization (BO) to either gain knowledge on the inferred model (exploration) or find a policy or action that maximizes the expected reward given the prior knowledge on the reward (exploitation). In the case of high-dimensional state-action spaces, we use Dimensionality Reduction (DR) techniques and perform the BO in the latent space. Experimental results in simulation and with a robot arm show how this approach allows the reward function to be learned with small data.
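The abstract's key ingredients can be illustrated with a small sketch: an ordinal (cumulative-logit) distribution maps a latent reward score to probabilities over the levels {bad, medium, good}, and an acquisition rule trades off exploitation (high expected level) against exploration (high label uncertainty). This is a hypothetical illustration of the idea, not the paper's actual model: the cut points, the linear latent score, and the entropy-based acquisition are assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probs(score, cuts):
    """Level probabilities of a cumulative-logit (proportional-odds) model:
    P(y <= k | x) = sigmoid(c_k - s(x)); level probabilities are the
    consecutive differences of the cumulative distribution."""
    cdf = sigmoid(np.asarray(cuts, dtype=float) - score)
    cdf = np.concatenate([[0.0], cdf, [1.0]])   # pad with P(y<0)=0, P(y<K)=1
    return np.diff(cdf)                         # one probability per level

def expected_level(score, cuts):
    """Expected ordinal level, a scalar proxy for the latent reward."""
    p = ordinal_probs(score, cuts)
    return float(np.dot(np.arange(len(p)), p))

def entropy(p):
    """Shannon entropy of the label distribution (exploration signal)."""
    return float(-np.sum(p * np.log(p + 1e-12)))

def next_query(scores, cuts, beta):
    """Pick the candidate maximizing expected level + beta * entropy,
    i.e. exploitation for beta = 0, exploration for large beta."""
    vals = [expected_level(s, cuts) + beta * entropy(ordinal_probs(s, cuts))
            for s in scores]
    return int(np.argmax(vals))

# Three levels {0: bad, 1: medium, 2: good}; assumed cuts on the latent scale.
cuts = (-1.0, 1.0)
scores = np.array([-2.0, 0.0, 2.0])   # surrogate scores of 3 candidate policies
greedy = next_query(scores, cuts, beta=0.0)   # pure exploitation -> index 2
```

With `beta = 0` the rule picks the candidate with the highest expected level; with a large `beta` it prefers the candidate whose ordinal label is most uncertain, mimicking the exploration/exploitation choice the paper makes via BO over the fitted ordinal model.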
Description
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Citation: Colome, A.; Torras, C. Ordinal inverse reinforcement learning applied to robot learning with small data. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. "2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan". Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 2490-2496. ISBN 978-1-6654-7927-1. DOI 10.1109/IROS47612.2022.9981731.
ISBN: 978-1-6654-7927-1
Publisher's version: https://ieeexplore.ieee.org/document/9981731