  • Video object linguistic grounding 

    Herrera-Palacio, Alba; Ventura, Carles; Giró Nieto, Xavier (Association for Computing Machinery (ACM), 2019)
    Conference lecture
    Restricted access - publisher's policy
    The goal of this work is to segment, in a video sequence, the objects mentioned in a linguistic description of the scene. We have adapted an existing deep neural network that achieves state-of-the-art performance ...
  • Multi-view 3D face reconstruction in the wild using siamese networks 

    Ramon, Eduard; Escur, Janna; Giró Nieto, Xavier (Computer Vision Foundation, 2019)
    Conference report
    Open Access
    In this work, we present a novel learning based approach to reconstruct 3D faces from a single or multiple images. Our method uses a simple yet powerful architecture based on siamese neural networks that helps to extract ...
  • Digitally stained confocal microscopy through deep learning 

    Combalia Escudero, Marc; Pérez Ankar, Javiera; García Herrera, Adriana; Alos, Llúcia; Vilaplana Besler, Verónica; Marqués Acosta, Fernando; Puig, Susana; Malvehy, Josep (Microtome Publishing, 2019)
    Conference report
    Open Access
    Specialists have used confocal microscopy in the ex-vivo modality to identify Basal Cell Carcinoma tumors with an overall sensitivity of 96.6% and specificity of 89.2% (Chung et al., 2004). However, this technology hasn’t ...
  • Wav2Pix: speech-conditioned face generation using generative adversarial networks 

    Cardoso Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual de la Puente, Santiago; Salvador Aguilera, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres Viñals, Jordi; Giró Nieto, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference lecture
    Restricted access - publisher's policy
    Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a ...
  • RVOS: end-to-end recurrent network for video object segmentation 

    Ventura, Carles; Bellver, Míriam; Girbau, Andreu; Salvador Aguilera, Amaia; Marqués Acosta, Fernando; Giró Nieto, Xavier (Computer Vision Foundation, 2019)
    Conference lecture
    Open Access
    Multiple object video object segmentation is a challenging task, especially in the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. ...
  • Inverse cooking: recipe generation from food images 

    Salvador Aguilera, Amaia; Drozdzal, Michal; Giró Nieto, Xavier; Romero, Adriana (Computer Vision Foundation, 2019)
    Conference report
    Open Access
    People enjoy food photography because they appreciate food. Behind each meal there is a story described in a complex recipe and, unfortunately, by simply looking at a food image we do not have access to its preparation ...
  • Assessing knee OA severity with CNN attention-based end-to-end architectures 

    Górriz, Marc; Antony, Joseph; McGuinness, Kevin; Giró Nieto, Xavier; O'Connor, Noel (2019)
    Conference lecture
    Open Access
    This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-Ray images, which incorporates trainable attention modules ...
  • One shot learning for generic instance segmentation in RGBD videos 

    Lin, Xiao; Casas Pla, Josep Ramon; Pardàs Feliu, Montse (Scitepress, 2019)
    Conference report
    Open Access
    Hand-crafted features employed in classical generic instance segmentation methods have limited discriminative power to distinguish different objects in the scene, while Convolutional Neural Networks (CNNs) based semantic ...
  • Linking media: adopting semantic technologies for multimodal media connection 

    Fernàndez, Dèlia; Bou Balust, Elisenda; Giró Nieto, Xavier; Riviero, Juan Carlos; Espadaler, Joan; Rodríguez, David; Colom Serra, Aleix; Rimmerk, Joan Marco; Varas, David; Massuda, Issey; Roig, Carlos (CEUR-WS.org, 2018)
    Conference report
    Open Access
    Today's media and news organizations are constantly generating large amounts of multimedia content, mostly delivered online. As the online media market grows, the management and delivery of content is becoming a challenge. ...
  • SLAM-based 3D outdoor reconstructions from lidar data 

    Caminal Colell, Ivan; Casas Pla, Josep Ramon; Royo Royo, Santiago (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Open Access
    The use of depth (RGBD) cameras to reconstruct large outdoor environments is not feasible due to lighting conditions and low depth range. LIDAR sensors can be used instead. Most state-of-the-art SLAM methods are devoted ...
  • PathGAN: visual scanpath prediction with generative adversarial networks 

    Assens, Marc; Giró Nieto, Xavier; McGuinness, Kevin; O'Connor, Noel (Springer, 2019)
    Conference lecture
    Restricted access - publisher's policy
    We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with its ...
  • Collaborative voting of 3D features for robust gesture estimation 

    van Sabben Alsina, Daniel; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Casas Pla, Josep Ramon (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference lecture
    Open Access
    Human body analysis raises special interest because it enables a wide range of interactive applications. In this paper we present a gesture estimator that discriminates body poses in depth images. A novel collaborative ...