Robot Learning from Human Demonstrations
Abstract
Learning by demonstration is one of the most efficient methods humans use to acquire new skills. Applying this paradigm to robotics offers a path toward faster and more scalable skill acquisition by enabling robots to learn directly from human behavior. Yet a fundamental challenge persists: the embodiment gap between humans and robots. Differences in morphology, actuation, and control make it difficult for robots to directly interpret and replicate human actions. Existing approaches to robot learning typically either rely on large-scale teleoperated data collection, which is costly and time-consuming, or on simulation-based training, which suffers from sim-to-real transfer challenges and requires significant engineering effort to create realistic environments. This dissertation introduces a data-efficient framework for robot learning that directly leverages human demonstrations, building on recent advances in 3D perception and visual understanding from real-world imagery and videos. The first contribution is a 3D hand estimation method that extends state-of-the-art vision-language and segmentation models to extract accurate geometric representations of human hands from RGB-D data. Building on this foundation, we develop a highly sample-efficient grasping method that enables robots to learn where and how to grasp objects from a single video demonstration, achieving robust generalization to new environments. Finally, we extend this capability to complex, long-horizon tasks by learning generative models of short-term behavior, providing effective intermediate representations that support end-to-end learning systems. Together, these contributions demonstrate that robots can efficiently acquire skills directly from human demonstrations, paving the way toward more natural, scalable, and generalizable robot learning.

