The GPI conducts research on image and video processing for the representation, coding, indexing and analysis of visual content. The group's expertise in morphology and segmentation has been the basis for contributions to the ISO standards MPEG-4 and MPEG-7. Its research on image analysis has allowed it to participate in European projects since 1992, in the programs RACE (Morpheco, coord.), ACTS (MAVT, MoMuSys, Vidas) and IST (Diceman, Hypermedia, INTERFACE, ADViSOR, MASCOT, FAETHON), in networks of excellence (SCHEMA, SIMILAR, MUSCLE), and in the integrated projects FP6 (CHIL) and FP7 (FASCINATE). The group has built two “smart rooms” at the UPC Campus Nord and has made contributions in visual analysis for interaction, as well as in biomedical imaging and remote sensing applications. It has signed research agreements with companies such as Philips (Paris), France Telecom (Rennes), NXP (Netherlands), Thomson (Princeton, USA) and Alterface (Belgium), and with national organisations such as Telefónica, CCRTV, MediaPro, Fundació CELLEX, Hospital Clínic, AD Telecom and Abertis.

The GPI does research on image and video processing for representing, coding, indexing and analysing visual content. The expertise of the group working on morphology and segmentation has been the basis for contributions to ISO standards MPEG-4 and MPEG-7. Research on image analysis has allowed it to participate in European projects since 1992, including the programs RACE (Morpheco, coord.), ACTS (MAVT, MoMuSys, Vidas) and IST (Diceman, Hypermedia, INTERFACE, ADViSOR, MASCOT, FAETHON), the networks of excellence SCHEMA, SIMILAR and MUSCLE and the integrated FP6 and FP7 projects CHIL and FASCINATE, respectively.

Recent Submissions

  • Fruit detection in an apple orchard using a mobile terrestrial laser scanner 

    Gené-Mola, Jordi; Gregorio, Eduard; Guevara, Javier; Auat, Fernando; Sanz-Cortiella, Ricardo; Escolà, Alexandre; Llorens, Jordi; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Vilaplana Besler, Verónica; Rosell-Polo, Joan R. (Elsevier, 2019-11-01)
    Article
    Restricted access - publisher's policy
    The development of reliable fruit detection and localization systems provides an opportunity to improve the crop value and management by limiting fruit spoilage and optimised harvesting practices. Most proposed systems for ...
  • Digitally stained confocal microscopy through deep learning 

    Combalia Escudero, Marc; Pérez Ankar, Javiera; García Herrera, Adriana; Alos, Llúcia; Vilaplana Besler, Verónica; Marqués Acosta, Fernando; Puig, Susana; Malvehy, Josep (Microtome Publishing, 2019)
    Conference report
    Open Access
    Specialists have used confocal microscopy in the ex-vivo modality to identify Basal Cell Carcinoma tumors with an overall sensitivity of 96.6% and specificity of 89.2% (Chung et al., 2004). However, this technology hasn’t ...
  • KFuji RGB-DS database: Fuji apple multi-modal images for fruit detection with color, depth and range-corrected IR data 

    Gené Mola, Jordi; Vilaplana Besler, Verónica; Rosell Polo, Joan Ramon; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Gregorio, Eduard (Elsevier, 2019-07-19)
    Article
    Open Access
    This article contains data related to the research article entitled “Multi-modal Deep Learning for Fruit Detection Using RGB-D Cameras and their Radiometric Capabilities” [1]. The development of reliable fruit detection and ...
  • Wav2Pix: speech-conditioned face generation using generative adversarial networks 

    Cardoso Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual de la Puente, Santiago; Salvador Aguilera, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres Viñals, Jordi; Giró Nieto, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference lecture
    Restricted access - publisher's policy
    Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a ...
  • RVOS: end-to-end recurrent network for video object segmentation 

    Ventura, Carles; Bellver, Míriam; Girbau, Andreu; Salvador Aguilera, Amaia; Marqués Acosta, Fernando; Giró Nieto, Xavier (Computer Vision Foundation, 2019)
    Conference lecture
    Open Access
    Multiple object video object segmentation is a challenging task, especially for the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. ...
  • Inverse cooking: recipe generation from food images 

    Salvador Aguilera, Amaia; Drozdzal, Michal; Giró Nieto, Xavier; Romero, Adriana (Computer Vision Foundation, 2019)
    Conference report
    Open Access
    People enjoy food photography because they appreciate food. Behind each meal there is a story described in a complex recipe and, unfortunately, by simply looking at a food image we do not have access to its preparation ...
  • Assessing knee OA severity with CNN attention-based end-to-end architectures 

    Górriz, Marc; Antony, Joseph; McGuinness, Kevin; Giró Nieto, Xavier; O'Connor, Noel (2019)
    Conference lecture
    Open Access
    This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-Ray images, which incorporates trainable attention modules ...
  • One shot learning for generic instance segmentation in RGBD videos 

    Lin, Xiao; Casas Pla, Josep Ramon; Pardàs Feliu, Montse (Scitepress, 2019)
    Conference report
    Open Access
    Hand-crafted features employed in classical generic instance segmentation methods have limited discriminative power to distinguish different objects in the scene, while Convolutional Neural Networks (CNNs) based semantic ...
  • Multiresolution co-clustering for uncalibrated multiview segmentation 

    Ventura, Carles; Varas, David; Vilaplana Besler, Verónica; Giró Nieto, Xavier; Marqués Acosta, Fernando (2019-05-04)
    Article
    Restricted access - publisher's policy
    We propose a technique for coherently co-clustering uncalibrated views of a scene with a contour-based representation. Our work extends the previous framework, an iterative algorithm for segmenting sequences with small ...
  • Linking media: adopting semantic technologies for multimodal media connection 

    Fernàndez, Dèlia; Bou Balust, Elisenda; Giró Nieto, Xavier; Riviero, Juan Carlos; Espadaler, Joan; Rodríguez, David; Colom Serra, Aleix; Rimmerk, Joan Marco; Varas, David; Massuda, Issey; Roig, Carlos (CEUR-WS.org, 2018)
    Conference report
    Open Access
    Today's media and news organizations are constantly generating large amounts of multimedia content, majorly delivered online. As the online media market grows, the management and delivery of contents is becoming a challenge. ...
  • Benchmark on automatic 6-month-old infant brain segmentation algorithms: the iSeg-2017 challenge 

    Wang, Li; Nie, Dong; Li, Guannan; Casamitjana Díaz, Adrià; Vilaplana Besler, Verónica (2019-02-27)
    Article
    Restricted access - publisher's policy
    Accurate segmentation of infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is an indispensable foundation for early studying of brain growth patterns and ...
  • Measuring traffic lane-changing by converting video into space–time still images 

    Sala Sanmartí, Marcel; Soriguera Martí, Francesc; Huillca, Kevin; Vilaplana Besler, Verónica (2019-06)
    Article
    Open Access
    Empirical data is needed in order to extend our knowledge of traffic behavior. Video recordings are used to enrich typical data from loop detectors. In this context, data extraction from videos becomes a challenging task. ...
