Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Estimating position & velocity in 3D space from monocular video sequences using a deep neural network

File: final_paper_iccv_acrv__14_08_2017.pdf (1.293 MB)
Cite as: hdl:2117/114873

Marbán González, Arturo
Srinivasan, Vignesh
Samek, Wojciech
Fernández Ruzafa, José
Casals Gelpi, Alicia
Document type: Conference report
Defense date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
This work describes a regression model based on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks for tracking objects in monocular video sequences. The target application is Vision-Based Sensor Substitution (VBSS). In particular, the tool-tip position and velocity in 3D space of a pair of surgical robotic instruments (SRI) are estimated for three surgical tasks: suturing, needle-passing and knot-tying. The CNN extracts features from individual video frames, and the LSTM network processes these features over time and continuously outputs a 12-dimensional vector with the estimated position and velocity values. A series of analyses and experiments is carried out on the regression model to reveal the benefits and drawbacks of different design choices. First, the impact of the loss function is investigated by appropriately weighting the Root Mean Squared Error (RMSE) and the Gradient Difference Loss (GDL), using the VGG16 neural network for feature extraction. Second, this analysis is extended to a Residual Neural Network designed for feature extraction, which has fewer parameters than the VGG16 model and reduces the neural network size by ~96.44%. Third, the impact of the number of time steps used to model the temporal information processed by the LSTM network is investigated. Finally, the capability of the regression model to generalize to data from "unseen" surgical tasks (unavailable in the training set) is evaluated. These analyses are experimentally validated on the public JIGSAWS dataset. They provide guidelines for the design of a regression model in the context of VBSS, specifically when the objective is to estimate a set of 1D time series signals from video sequences.
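As an illustration of the loss weighting mentioned in the abstract, the following is a minimal sketch of a weighted sum of an RMSE term and a gradient-difference term over a sequence of output vectors. It is not the authors' implementation: the GDL formulation used here (mean absolute difference between the consecutive-step changes of prediction and target), the function names, and the weights `w_rmse` and `w_gdl` are all assumptions made for this example. In the paper's setting each time step would hold a 12-dimensional vector, which is consistent with 2 instruments × (3 position + 3 velocity) components.

```python
import math


def rmse(pred, target):
    """Root mean squared error over all time steps and output dimensions."""
    count = sum(len(row) for row in target)
    squared = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return math.sqrt(squared / count)


def gdl(pred, target):
    """Assumed gradient-difference term: mean absolute difference between
    the step-to-step changes of the prediction and those of the target."""
    total, count = 0.0, 0
    for k in range(1, len(target)):
        for d in range(len(target[k])):
            grad_pred = pred[k][d] - pred[k - 1][d]
            grad_true = target[k][d] - target[k - 1][d]
            total += abs(grad_true - grad_pred)
            count += 1
    return total / count


def combined_loss(pred, target, w_rmse=1.0, w_gdl=1.0):
    """Weighted sum of both terms; the weights are hypothetical hyperparameters."""
    return w_rmse * rmse(pred, target) + w_gdl * gdl(pred, target)


# Toy sequence with 3 time steps and 2 output dimensions (instead of 12).
target = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
offset_pred = [[v + 0.5 for v in row] for row in target]
print(combined_loss(offset_pred, target, w_rmse=2.0, w_gdl=1.0))
```

A prediction that is offset from the target by a constant is penalised only by the RMSE term, because its step-to-step changes match the target exactly; a prediction with the right average level but the wrong dynamics is penalised by the gradient-difference term as well.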
Citation: Marbán, A., Srinivasan, V., Samek, W., Fernández, J., Casals, A. Estimating position & velocity in 3D space from monocular video sequences using a deep neural network. In: IEEE International Conference on Computer Vision Workshops. "2017 IEEE International Conference on Computer Vision Workshops (ICCVW)". Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 1460-1469.
URI: http://hdl.handle.net/2117/114873
DOI: 10.1109/ICCVW.2017.173
ISBN: 978-1-5386-1034-3
Publisher version: http://ieeexplore.ieee.org/document/8265383/
Collections
  • Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial - Ponències/Comunicacions de congressos [1.391]
  • GRINS - Grup de Recerca en Robòtica Intel·ligent i Sistemes - Ponències/Comunicacions de congressos [58]