  • Assessing knee OA severity with CNN attention-based end-to-end architectures 

    Górriz, Marc; Antony, Joseph; McGuinness, Kevin; Giró Nieto, Xavier; O'Connor, Noel (2019)
    Conference lecture
    Open Access
    This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-Ray images, which incorporates trainable attention modules ...
  • One shot learning for generic instance segmentation in RGBD videos 

    Lin, Xiao; Casas Pla, Josep Ramon; Pardàs Feliu, Montse (Scitepress, 2019)
    Conference report
    Open Access
    Hand-crafted features employed in classical generic instance segmentation methods have limited discriminative power to distinguish different objects in the scene, while Convolutional Neural Networks (CNNs) based semantic ...
  • Linking media: adopting semantic technologies for multimodal media connection 

    Fernàndez, Dèlia; Bou Balust, Elisenda; Giró Nieto, Xavier; Riviero, Juan Carlos; Espadaler, Joan; Rodríguez, David; Colom Serra, Aleix; Rimmerk, Joan Marco; Varas, David; Massuda, Issey; Roig, Carlos (CEUR-WS.org, 2018)
    Conference report
    Open Access
    Today's media and news organizations are constantly generating large amounts of multimedia content, mostly delivered online. As the online media market grows, the management and delivery of content is becoming a challenge. ...
  • SLAM-based 3D outdoor reconstructions from lidar data 

    Caminal Colell, Ivan; Casas Pla, Josep Ramon; Royo Royo, Santiago (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Open Access
    The use of depth (RGBD) cameras to reconstruct large outdoor environments is not feasible due to lighting conditions and low depth range. LIDAR sensors can be used instead. Most state of the art SLAM methods are devoted ...
  • PathGAN: visual scanpath prediction with generative adversarial networks 

    Assens, Marc; Giró Nieto, Xavier; McGuinness, Kevin; O'Connor, Noel (Springer, 2019)
    Conference lecture
    Restricted access - publisher's policy
    We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with its ...
  • Collaborative voting of 3D features for robust gesture estimation 

    van Sabben Alsina, Daniel; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Casas Pla, Josep Ramon (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference lecture
    Open Access
    Human body analysis raises special interest because it enables a wide range of interactive applications. In this paper we present a gesture estimator that discriminates body poses in depth images. A novel collaborative ...
  • Cross-modal embeddings for video and audio retrieval 

    Surís Coll-Vinent, Dídac; Duarte, Amanda; Salvador Aguilera, Amaia; Torres Viñals, Jordi; Giró Nieto, Xavier (Springer, 2019)
    Conference report
    Open Access
    In this work, we explore the multi-modal information provided by the YouTube-8M dataset by projecting the audio and visual features into a common feature space, to obtain joint audio-visual embeddings. These links are used ...
  • Action tube extraction based 3D-CNN for RGB-D action recognition 

    Xu, Zhengyu; Vilaplana Besler, Verónica; Morros Rubió, Josep Ramon (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Open Access
    In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes as input a video and outputs an action tube. The method consists of two parts: spatial ...
  • UPC multimodal speaker diarization system for the 2018 Albayzin challenge 

    India Massana, Miquel Àngel; Sagastiberri, Itziar; Palau Puigdevall, Ponç; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing the speech and image signals individually. In the speech domain, ...
  • Shared latent structures between imaging features and biomarkers in early stages of Alzheimer's disease 

    Casamitjana Díaz, Adrià; Vilaplana Besler, Verónica; Petrone, Paula; Molinuevo, Jose Luis; Gispert, Juan Domingo (Springer International Publishing, 2018)
    Conference lecture
    Restricted access - publisher's policy
    In this work, we identify meaningful latent patterns in MR images for patients across the Alzheimer’s disease (AD) continuum. For this purpose, we apply Projection to Latent Structures (PLS) method using cerebrospinal fluid ...
  • Leishmaniasis parasite segmentation and classification using deep learning 

    Górriz, Marc; Aparicio, Albert; Raventós, Berta; Vilaplana Besler, Verónica; Sayrol Clols, Elisa; López Codina, Daniel (Springer, 2018)
    Conference lecture
    Restricted access - publisher's policy
    Leishmaniasis is considered a neglected disease that causes thousands of deaths annually in some tropical and subtropical countries. There are various techniques to diagnose leishmaniasis of which manual microscopy is ...
  • Monte-Carlo sampling applied to multiple instance learning for histological image classification 

    Combalia, Marc; Vilaplana Besler, Verónica (Springer, 2018)
    Conference lecture
    Restricted access - publisher's policy
    We propose a patch sampling strategy based on a sequential Monte-Carlo method for high resolution image classification in the context of Multiple Instance Learning. When compared with grid sampling and uniform sampling ...