<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Collection:</title>
    <link>http://hdl.handle.net/2117/3333</link>
    <description />
    <pubDate>Wed, 22 May 2013 01:04:07 GMT</pubDate>
    <dc:date>2013-05-22T01:04:07Z</dc:date>
    <itunes:owner>
      <itunes:email>webmaster.bupc@upc.edu</itunes:email>
      <itunes:name>Universitat Politècnica de Catalunya. Servei de Biblioteques i Documentació</itunes:name>
    </itunes:owner>
    <itunes:explicit>no</itunes:explicit>
    <itunes:keywords />
    <item>
      <title>Gene expression data classification combining hierarchical representation and efficient feature selection</title>
      <link>http://hdl.handle.net/2117/18425</link>
      <description>Title: Gene expression data classification combining hierarchical representation and efficient feature selection
Authors: Bosio, Mattia; Bellot Pujalte, Pau; Salembier Clairon, Philippe Jean; Oliveras Vergés, Albert
Abstract: A general framework for microarray data classification is proposed in this paper. It produces&#xD;
precise and reliable classifiers through a two-step approach. At first, the original&#xD;
feature set is enhanced by a new set of features called metagenes. These new features&#xD;
are obtained through a hierarchical clustering process on the original data. Two different&#xD;
metagene generation rules have been analyzed, called Treelets clustering and Euclidean&#xD;
clustering. Metagene creation is attractive for several reasons: first, metagenes can improve&#xD;
the classification since they broaden the available feature space and capture the common&#xD;
behavior of similar genes, reducing the residual measurement noise. Furthermore,&#xD;
by analyzing some of the chosen metagenes for classification with gene set enrichment&#xD;
analysis algorithms, it is shown how metagenes can summarize the behavior of functionally&#xD;
related probe sets. Additionally, metagenes can point out still-undocumented,&#xD;
highly discriminant probe sets numerically related to other probes endowed with prior&#xD;
biological information in order to contribute to the knowledge discovery process.&#xD;
The second step of the framework is the feature selection which applies the Improved&#xD;
Sequential Floating Forward Selection algorithm (IFFS) to properly choose a subset from&#xD;
the available feature set for classification composed of genes and metagenes. Considering&#xD;
the microarray sample scarcity problem, besides the classical error rate, a reliability&#xD;
measure is introduced to improve the feature selection process. Different scoring schemes&#xD;
are studied to choose the best one using both error rate and reliability. The Linear&#xD;
Discriminant Analysis classifier (LDA) has been used throughout this work, due to its&#xD;
good characteristics, but the proposed framework can be used with almost any classifier.&#xD;
The potential of the proposed framework has been evaluated analyzing all the publicly&#xD;
available datasets offered by the Micro Array Quality Control Study, phase II (MAQC).&#xD;
The comparative results showed that the proposed framework can compete with a wide&#xD;
variety of state-of-the-art alternatives and it can obtain the best mean performance&#xD;
if a particular setup is chosen. A Monte Carlo simulation confirmed that the proposed&#xD;
framework obtains stable and repeatable results.</description>
      <pubDate>Tue, 19 Mar 2013 18:10:53 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18425</guid>
      <dc:date>2013-03-19T18:10:53Z</dc:date>
      <itunes:author>Bosio, Mattia; Bellot Pujalte, Pau; Salembier Clairon, Philippe Jean; Oliveras Vergés, Albert</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords>feature selection, hierarchical representation, LDA, metagenes, Microarray classification, Treelets</itunes:keywords>
      <itunes:summary>A general framework for microarray data classification is proposed in this paper. It produces&#xD;
precise and reliable classifiers through a two-step approach. At first, the original&#xD;
feature set is enhanced by a new set of features called metagenes. These new features&#xD;
are obtained through a hierarchical clustering process on the original data. Two different&#xD;
metagene generation rules have been analyzed, called Treelets clustering and Euclidean&#xD;
clustering. Metagene creation is attractive for several reasons: first, metagenes can improve&#xD;
the classification since they broaden the available feature space and capture the common&#xD;
behavior of similar genes, reducing the residual measurement noise. Furthermore,&#xD;
by analyzing some of the chosen metagenes for classification with gene set enrichment&#xD;
analysis algorithms, it is shown how metagenes can summarize the behavior of functionally&#xD;
related probe sets. Additionally, metagenes can point out still-undocumented,&#xD;
highly discriminant probe sets numerically related to other probes endowed with prior&#xD;
biological information in order to contribute to the knowledge discovery process.&#xD;
The second step of the framework is the feature selection which applies the Improved&#xD;
Sequential Floating Forward Selection algorithm (IFFS) to properly choose a subset from&#xD;
the available feature set for classification composed of genes and metagenes. Considering&#xD;
the microarray sample scarcity problem, besides the classical error rate, a reliability&#xD;
measure is introduced to improve the feature selection process. Different scoring schemes&#xD;
are studied to choose the best one using both error rate and reliability. The Linear&#xD;
Discriminant Analysis classifier (LDA) has been used throughout this work, due to its&#xD;
good characteristics, but the proposed framework can be used with almost any classifier.&#xD;
The potential of the proposed framework has been evaluated analyzing all the publicly&#xD;
available datasets offered by the Micro Array Quality Control Study, phase II (MAQC).&#xD;
The comparative results showed that the proposed framework can compete with a wide&#xD;
variety of state-of-the-art alternatives and it can obtain the best mean performance&#xD;
if a particular setup is chosen. A Monte Carlo simulation confirmed that the proposed&#xD;
framework obtains stable and repeatable results.</itunes:summary>
    </item>
    <item>
      <title>Processing multidimensional SAR and hyperspectral images with binary partition tree</title>
      <link>http://hdl.handle.net/2117/18321</link>
      <description>Title: Processing multidimensional SAR and hyperspectral images with binary partition tree
Authors: Alonso González, Alberto; Valero, Silvia; Chanussot, Jocelyn; López Martínez, Carlos; Salembier Clairon, Philippe Jean
Abstract: The current increase of spatial as well as spectral&#xD;
resolutions of modern remote sensing sensors represents a&#xD;
real opportunity for many practical applications but also&#xD;
generates important challenges in terms of image processing.&#xD;
In particular, the spatial correlation between pixels and/or the&#xD;
spectral correlation between spectral bands of a given pixel&#xD;
cannot be ignored. The traditional pixel-based representation&#xD;
of images does not facilitate the handling of these correlations.&#xD;
In this paper, we discuss the interest of a particular hierarchical&#xD;
region-based representation of images based on binary&#xD;
partition tree (BPT). This representation approach is very&#xD;
flexible as it can be applied to any type of image. Here both&#xD;
optical and radar images will be discussed. Moreover, once the&#xD;
image representation is computed, it can be used for many&#xD;
different applications. Filtering, segmentation, and classification&#xD;
will be detailed in this paper. In all cases, the interest of the&#xD;
BPT representation over the classical pixel-based representation&#xD;
will be highlighted.</description>
      <pubDate>Thu, 14 Mar 2013 19:10:03 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18321</guid>
      <dc:date>2013-03-14T19:10:03Z</dc:date>
      <itunes:author>Alonso González, Alberto; Valero, Silvia; Chanussot, Jocelyn; López Martínez, Carlos; Salembier Clairon, Philippe Jean</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>The current increase of spatial as well as spectral&#xD;
resolutions of modern remote sensing sensors represents a&#xD;
real opportunity for many practical applications but also&#xD;
generates important challenges in terms of image processing.&#xD;
In particular, the spatial correlation between pixels and/or the&#xD;
spectral correlation between spectral bands of a given pixel&#xD;
cannot be ignored. The traditional pixel-based representation&#xD;
of images does not facilitate the handling of these correlations.&#xD;
In this paper, we discuss the interest of a particular hierarchical&#xD;
region-based representation of images based on binary&#xD;
partition tree (BPT). This representation approach is very&#xD;
flexible as it can be applied to any type of image. Here both&#xD;
optical and radar images will be discussed. Moreover, once the&#xD;
image representation is computed, it can be used for many&#xD;
different applications. Filtering, segmentation, and classification&#xD;
will be detailed in this paper. In all cases, the interest of the&#xD;
BPT representation over the classical pixel-based representation&#xD;
will be highlighted.</itunes:summary>
    </item>
    <item>
      <title>Real-time user independent hand gesture recognition from time-of-flight camera video using static and dynamic models</title>
      <link>http://hdl.handle.net/2117/18152</link>
      <description>Title: Real-time user independent hand gesture recognition from time-of-flight camera video using static and dynamic models
Authors: Molina, Javier; Escudero-Viñolo, Marcos; Bescós Cano, Jesús; Signorelo, Alessandro; Pardàs Feliu, Montse; Ferran, Christian; Marqués Acosta, Fernando; Martínez, José Maria
Abstract: The use of hand gestures offers an alternative to the commonly used human-computer interfaces, providing a more intuitive way of navigating among menus and multimedia applications. This paper presents a system for hand gesture recognition devoted to controlling windows applications. Starting from the images captured by a time-of-flight camera (a camera that produces images with an intensity level inversely proportional to the depth of the objects observed), the system performs hand segmentation as well as a low-level extraction of potentially relevant features which are related to the morphological representation of the hand silhouette. Classification based on these features discriminates between a set of possible static hand postures, whose results, combined with the estimated motion pattern of the hand, yield the recognition of dynamic hand gestures. The whole system works in real time, allowing practical interaction between user and application.</description>
      <pubDate>Fri, 08 Mar 2013 18:42:48 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18152</guid>
      <dc:date>2013-03-08T18:42:48Z</dc:date>
      <itunes:author>Molina, Javier; Escudero-Viñolo, Marcos; Bescós Cano, Jesús; Signorelo, Alessandro; Pardàs Feliu, Montse; Ferran, Christian; Marqués Acosta, Fernando; Martínez, José Maria</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>The use of hand gestures offers an alternative to the commonly used human-computer interfaces, providing a more intuitive way of navigating among menus and multimedia applications. This paper presents a system for hand gesture recognition devoted to controlling windows applications. Starting from the images captured by a time-of-flight camera (a camera that produces images with an intensity level inversely proportional to the depth of the objects observed), the system performs hand segmentation as well as a low-level extraction of potentially relevant features which are related to the morphological representation of the hand silhouette. Classification based on these features discriminates between a set of possible static hand postures, whose results, combined with the estimated motion pattern of the hand, yield the recognition of dynamic hand gestures. The whole system works in real time, allowing practical interaction between user and application.</itunes:summary>
    </item>
    <item>
      <title>From global image annotation to interactive object segmentation</title>
      <link>http://hdl.handle.net/2117/17941</link>
      <description>Title: From global image annotation to interactive object segmentation
Authors: Giró Nieto, Xavier; Martos Asensio, Manel; Mohedano Robles, Eva; Pont Tuset, Jordi
Abstract: This paper presents a graphical environment for the annotation of still images that works both at the global and local scales. At the global scale, each image can be tagged with positive, negative and neutral labels referring to a semantic class from an ontology. These annotations can be used to train and evaluate an image classifier. A finer annotation at a local scale is also available for interactive segmentation of objects. This process is formulated as a selection of regions from a precomputed hierarchical partition called Binary Partition Tree. Three different semi-supervised methods have been presented and evaluated: bounding boxes, scribbles and hierarchical navigation. The implemented Java source code is published under a free software license.</description>
      <pubDate>Fri, 22 Feb 2013 13:14:38 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/17941</guid>
      <dc:date>2013-02-22T13:14:38Z</dc:date>
      <itunes:author>Giró Nieto, Xavier; Martos Asensio, Manel; Mohedano Robles, Eva; Pont Tuset, Jordi</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper presents a graphical environment for the annotation of still images that works both at the global and local scales. At the global scale, each image can be tagged with positive, negative and neutral labels referring to a semantic class from an ontology. These annotations can be used to train and evaluate an image classifier. A finer annotation at a local scale is also available for interactive segmentation of objects. This process is formulated as a selection of regions from a precomputed hierarchical partition called Binary Partition Tree. Three different semi-supervised methods have been presented and evaluated: bounding boxes, scribbles and hierarchical navigation. The implemented Java source code is published under a free software license.</itunes:summary>
    </item>
    <item>
      <title>Filtering and segmentation of polarimetric SAR data based on binary partition trees</title>
      <link>http://hdl.handle.net/2117/16589</link>
      <description>Title: Filtering and segmentation of polarimetric SAR data based on binary partition trees
Authors: Alonso González, Alberto; López Martínez, Carlos; Salembier Clairon, Philippe Jean
Abstract: In this paper, we propose the use of binary partition&#xD;
trees (BPT) to introduce a novel region-based and multi-scale polarimetric&#xD;
SAR (PolSAR) data representation. The BPT structure&#xD;
represents homogeneous regions in the data at different detail&#xD;
levels. The construction process of the BPT is based, firstly, on&#xD;
a region model able to represent the homogeneous areas, and,&#xD;
secondly, on a dissimilarity measure in order to identify similar&#xD;
areas and define the merging sequence. Depending on the final&#xD;
application, a BPT pruning strategy needs to be introduced. In this&#xD;
paper, we focus on the application of BPT PolSAR data representation&#xD;
for speckle noise filtering and data segmentation on the basis&#xD;
of the Gaussian hypothesis, where the average covariance or coherency&#xD;
matrices are considered as a region model. We introduce&#xD;
and quantitatively analyze different dissimilarity measures. In this&#xD;
case, and with the objective to be sensitive to the complete polarimetric&#xD;
information under the Gaussian hypothesis, dissimilarity&#xD;
measures considering the complete covariance or coherency matrices&#xD;
are employed. When confronted with PolSAR speckle filtering,&#xD;
two pruning strategies are detailed and evaluated. As presented,&#xD;
the BPT PolSAR speckle filter defined filters data according to the&#xD;
complete polarimetric information. As shown, this novel filtering&#xD;
approach is able to achieve very strong filtering while preserving&#xD;
the spatial resolution and the polarimetric information. Finally,&#xD;
the BPT representation structure is employed for high spatial&#xD;
resolution image segmentation applied to coastline detection. The&#xD;
analyses detailed in this work are based on simulated, as well as on&#xD;
real PolSAR data acquired by the ESAR system of DLR and the&#xD;
RADARSAT-2 system.</description>
      <pubDate>Thu, 27 Sep 2012 15:38:31 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/16589</guid>
      <dc:date>2012-09-27T15:38:31Z</dc:date>
      <itunes:author>Alonso González, Alberto; López Martínez, Carlos; Salembier Clairon, Philippe Jean</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>In this paper, we propose the use of binary partition&#xD;
trees (BPT) to introduce a novel region-based and multi-scale polarimetric&#xD;
SAR (PolSAR) data representation. The BPT structure&#xD;
represents homogeneous regions in the data at different detail&#xD;
levels. The construction process of the BPT is based, firstly, on&#xD;
a region model able to represent the homogeneous areas, and,&#xD;
secondly, on a dissimilarity measure in order to identify similar&#xD;
areas and define the merging sequence. Depending on the final&#xD;
application, a BPT pruning strategy needs to be introduced. In this&#xD;
paper, we focus on the application of BPT PolSAR data representation&#xD;
for speckle noise filtering and data segmentation on the basis&#xD;
of the Gaussian hypothesis, where the average covariance or coherency&#xD;
matrices are considered as a region model. We introduce&#xD;
and quantitatively analyze different dissimilarity measures. In this&#xD;
case, and with the objective to be sensitive to the complete polarimetric&#xD;
information under the Gaussian hypothesis, dissimilarity&#xD;
measures considering the complete covariance or coherency matrices&#xD;
are employed. When confronted with PolSAR speckle filtering,&#xD;
two pruning strategies are detailed and evaluated. As presented,&#xD;
the BPT PolSAR speckle filter defined filters data according to the&#xD;
complete polarimetric information. As shown, this novel filtering&#xD;
approach is able to achieve very strong filtering while preserving&#xD;
the spatial resolution and the polarimetric information. Finally,&#xD;
the BPT representation structure is employed for high spatial&#xD;
resolution image segmentation applied to coastline detection. The&#xD;
analyses detailed in this work are based on simulated, as well as on&#xD;
real PolSAR data acquired by the ESAR system of DLR and the&#xD;
RADARSAT-2 system.</itunes:summary>
    </item>
    <item>
      <title>Real-time head and hand tracking based on 2.5D data</title>
      <link>http://hdl.handle.net/2117/16136</link>
      <description>Title: Real-time head and hand tracking based on 2.5D data
Authors: Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
Abstract: A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.</description>
      <pubDate>Mon, 25 Jun 2012 16:16:54 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/16136</guid>
      <dc:date>2012-06-25T16:16:54Z</dc:date>
      <itunes:author>Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.</itunes:summary>
    </item>
    <item>
      <title>Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies</title>
      <link>http://hdl.handle.net/2117/14844</link>
      <description>Title: Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies
Authors: Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardàs Feliu, Montse; Monte Moreno, Enrique
Abstract: This article presents a new approach to the problem of simultaneous tracking of several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, which is the starting point of the processing chain. We review how the initiation and termination of tracks influence the overall tracker performance, and present a Bayesian approach to efficiently create and destroy tracks. Two Monte Carlo-based schemes adapted to the incoming 3D discrete data are introduced. First, a particle filtering technique is proposed relying on a volume likelihood function taking into account both occupancy and color information. Sparse sampling is presented as an alternative based on a sampling of the surface voxels in order to estimate the centroid of the tracked people. In this case, the likelihood function is based on local neighborhood computations, thus dramatically decreasing the computational load of the algorithm. A discrete 3D re-sampling procedure is introduced to drive these samples over time. Multiple targets are tracked by means of multiple filters, and interaction among them is modeled through a 3D blocking scheme. Tests over the CLEAR-annotated database yield quantitative results showing the effectiveness of the proposed algorithms in indoor scenarios, and a fair comparison with other state-of-the-art algorithms is presented. We also consider the real-time performance of the proposed algorithm.</description>
      <pubDate>Thu, 26 Jan 2012 19:38:20 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/14844</guid>
      <dc:date>2012-01-26T19:38:20Z</dc:date>
      <itunes:author>Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardàs Feliu, Montse; Monte Moreno, Enrique</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This article presents a new approach to the problem of simultaneous tracking of several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, which is the starting point of the processing chain. We review how the initiation and termination of tracks influence the overall tracker performance, and present a Bayesian approach to efficiently create and destroy tracks. Two Monte Carlo-based schemes adapted to the incoming 3D discrete data are introduced. First, a particle filtering technique is proposed relying on a volume likelihood function taking into account both occupancy and color information. Sparse sampling is presented as an alternative based on a sampling of the surface voxels in order to estimate the centroid of the tracked people. In this case, the likelihood function is based on local neighborhood computations, thus dramatically decreasing the computational load of the algorithm. A discrete 3D re-sampling procedure is introduced to drive these samples over time. Multiple targets are tracked by means of multiple filters, and interaction among them is modeled through a 3D blocking scheme. Tests over the CLEAR-annotated database yield quantitative results showing the effectiveness of the proposed algorithms in indoor scenarios, and a fair comparison with other state-of-the-art algorithms is presented. We also consider the real-time performance of the proposed algorithm.</itunes:summary>
    </item>
    <item>
      <title>Acoustic event detection based on feature-level fusion of audio and video modalities</title>
      <link>http://hdl.handle.net/2117/13630</link>
      <description>Title: Acoustic event detection based on feature-level fusion of audio and video modalities
Authors: Butko, Taras; Canton Ferrer, Cristian; Segura Perales, Carlos; Giró Nieto, Xavier; Nadeu Camprubí, Climent; Hernando Pericás, Francisco Javier; Casas Pla, Josep Ramon
Abstract: Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, which are mostly due to temporal overlaps. Actually, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed in our work. The experimental results show that information from both the microphone array and video cameras is useful to improve the detection rate of isolated as well as spontaneously generated acoustic events.
Description: Research article</description>
      <pubDate>Sun, 23 Oct 2011 09:26:48 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/13630</guid>
      <dc:date>2011-10-23T09:26:48Z</dc:date>
      <itunes:author>Butko, Taras; Canton Ferrer, Cristian; Segura Perales, Carlos; Giró Nieto, Xavier; Nadeu Camprubí, Climent; Hernando Pericás, Francisco Javier; Casas Pla, Josep Ramon</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, which are mostly due to temporal overlaps. Actually, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed in our work. The experimental results show that information from both the microphone array and video cameras is useful to improve the detection rate of isolated as well as spontaneously generated acoustic events.</itunes:summary>
    </item>
    <item>
      <title>Human motion capture using scalable body models</title>
      <link>http://hdl.handle.net/2117/13393</link>
      <description>Title: Human motion capture using scalable body models
Authors: Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardàs Feliu, Montse
Abstract: This paper presents a general analysis framework towards exploiting the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. Scalable human body models are introduced as an ordered set of articulated models fulfilling an inclusive hierarchy. The concept of annealing is applied to derive a generic particle filtering scheme able to perform a sequential filtering over the set of models contained in the scalable human body model. Two annealing loops are employed, the standard likelihood annealing and the newly introduced structural annealing, leading to a robust, progressive and efficient analysis of the input data. The validity of this scheme is tested by performing markerless human motion capture in a multi-camera environment employing the standard HumanEva annotated datasets. Finally, quantitative results are presented and compared with other existing HMC techniques.</description>
      <pubDate>Thu, 29 Sep 2011 16:45:32 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/13393</guid>
      <dc:date>2011-09-29T16:45:32Z</dc:date>
      <itunes:author>Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardàs Feliu, Montse</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper presents a general analysis framework towards exploiting the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. Scalable human body models are introduced as an ordered set of articulated models fulfilling an inclusive hierarchy. The concept of annealing is applied to derive a generic particle filtering scheme able to perform a sequential filtering over the set of models contained in the scalable human body model. Two annealing loops are employed, the standard likelihood annealing and the newly introduced structural annealing, leading to a robust, progressive and efficient analysis of the input data. The validity of this scheme is tested by performing markerless human motion capture in a multi-camera environment employing the standard HumanEva annotated datasets. Finally, quantitative results are presented and compared with other existing HMC techniques.</itunes:summary>
    </item>
    <item>
      <title>Multiview depth coding based on combined color/depth segmentation</title>
      <link>http://hdl.handle.net/2117/13212</link>
      <description>Title: Multiview depth coding based on combined color/depth segmentation
Authors: Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Aflaki, Payman; Calderero Patino, Felipe; Marqués Acosta, Fernando
Abstract: In this paper, a new coding method for multiview depth video is presented. Considering the smooth structure and sharp edges of depth maps, a segmentation-based approach is proposed. This allows further preserving the depth contours, thus introducing fewer artifacts in the depth perception of the video. To reduce the cost associated with partition coding, an approximation of the depth partition is built using the decoded color view segmentation. This approximation is refined by sending some complementary information about the relevant differences between color and depth partitions. For coding the depth content of each region, a decomposition into an orthogonal basis is used in this paper, although similar decompositions may also be employed. Experimental results show that the proposed segmentation-based depth coding method outperforms H.264/AVC and H.264/MVC by more than 2 dB at similar bitrates.</description>
      <pubDate>Thu, 15 Sep 2011 15:46:55 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/13212</guid>
      <dc:date>2011-09-15T15:46:55Z</dc:date>
      <itunes:author>Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Aflaki, Payman; Calderero Patino, Felipe; Marqués Acosta, Fernando</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>In this paper, a new coding method for multiview depth video is presented. Considering the smooth structure and sharp edges of depth maps, a segmentation-based approach is proposed. This allows further preserving the depth contours, thus introducing fewer artifacts in the depth perception of the video. To reduce the cost associated with partition coding, an approximation of the depth partition is built using the decoded color view segmentation. This approximation is refined by sending some complementary information about the relevant differences between color and depth partitions. For coding the depth content of each region, a decomposition into an orthogonal basis is used in this paper, although similar decompositions may also be employed. Experimental results show that the proposed segmentation-based depth coding method outperforms H.264/AVC and H.264/MVC by more than 2 dB at similar bitrates.</itunes:summary>
    </item>
    <item>
      <title>Edge enhancement algorithm based on the wavelet transform for automatic edge detection in SAR images</title>
      <link>http://hdl.handle.net/2117/11057</link>
      <description>Title: Edge enhancement algorithm based on the wavelet transform for automatic edge detection in SAR images
Authors: Tello Alonso, Mª Victoria; López Martínez, Carlos; Mallorquí Franquet, Jordi Joan; Salembier Clairon, Philippe Jean
Abstract: This paper presents a novel technique for automatic edge enhancement and detection in synthetic aperture radar (SAR) images. The characteristics of SAR images justify the importance of an edge enhancement step prior to edge detection. Therefore, this paper presents a robust and unsupervised edge enhancement algorithm based on a combination of wavelet coefficients at different scales. The performance of the method is first tested on simulated images. Then, in order to complete the automatic detection chain, among the different options for the decision stage, the use of geodesic active contour is proposed. The second part of this paper suggests the extraction of the coastline in SAR images as a particular case of edge detection. Hence, after highlighting its practical interest, the technique that is theoretically presented in the first part of this paper is applied to real scenarios. Finally, the chances of its operational capability are assessed.</description>
      <pubDate>Mon, 17 Jan 2011 10:34:26 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/11057</guid>
      <dc:date>2011-01-17T10:34:26Z</dc:date>
      <itunes:author>Tello Alonso, Mª Victoria; López Martínez, Carlos; Mallorquí Franquet, Jordi Joan; Salembier Clairon, Philippe Jean</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper presents a novel technique for automatic edge enhancement and detection in synthetic aperture radar (SAR) images. The characteristics of SAR images justify the importance of an edge enhancement step prior to edge detection. Therefore, this paper presents a robust and unsupervised edge enhancement algorithm based on a combination of wavelet coefficients at different scales. The performance of the method is first tested on simulated images. Then, in order to complete the automatic detection chain, among the different options for the decision stage, the use of geodesic active contour is proposed. The second part of this paper suggests the extraction of the coastline in SAR images as a particular case of edge detection. Hence, after highlighting its practical interest, the technique that is theoretically presented in the first part of this paper is applied to real scenarios. Finally, the chances of its operational capability are assessed.</itunes:summary>
    </item>
    <item>
      <title>Integration of audiovisual sensors and technologies in a smart room</title>
      <link>http://hdl.handle.net/2117/9468</link>
      <description>Title: Integration of audiovisual sensors and technologies in a smart room
Authors: Neumann, J; Casas Pla, Josep Ramon; Macho Ciena, Dusan; Ruiz Hidalgo, Javier
Abstract: At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure and indicates possibilities for high- and low-level multi-modal interaction. An example of usage of the information collected from the distributed sensor network is explained in detail: the system supports a group of students that have to solve a lab assignment related problem.</description>
      <pubDate>Wed, 06 Oct 2010 15:47:32 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/9468</guid>
      <dc:date>2010-10-06T15:47:32Z</dc:date>
      <itunes:author>Neumann, J; Casas Pla, Josep Ramon; Macho Ciena, Dusan; Ruiz Hidalgo, Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure and indicates possibilities for high- and low-level multi-modal interaction. An example of usage of the information collected from the distributed sensor network is explained in detail: the system supports a group of students that have to solve a lab assignment related problem.</itunes:summary>
    </item>
    <item>
      <title>Audiovisual head orientation estimation with particle filtering in multisensor scenarios</title>
      <link>http://hdl.handle.net/2117/9466</link>
      <description>Title: Audiovisual head orientation estimation with particle filtering in multisensor scenarios
Authors: Canton Ferrer, Cristian; Segura Perales, Carlos; Casas Pla, Josep Ramon; Pardàs Feliu, Montse; Hernando Pericás, Francisco Javier
Abstract: This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining the individuals' head orientation is the basis for many forms of more sophisticated interactions between humans and technical devices and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for the estimation of the head orientation for both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker making use of the directivity characteristics of the head radiation pattern. Furthermore, two different particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first one, fusion is performed at a decision level by combining each monomodal head pose estimation, while the second one uses a joint estimation system combining information at data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.</description>
      <pubDate>Wed, 06 Oct 2010 15:01:14 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/9466</guid>
      <dc:date>2010-10-06T15:01:14Z</dc:date>
      <itunes:author>Canton Ferrer, Cristian; Segura Perales, Carlos; Casas Pla, Josep Ramon; Pardàs Feliu, Montse; Hernando Pericás, Francisco Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining the individuals' head orientation is the basis for many forms of more sophisticated interactions between humans and technical devices and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for the estimation of the head orientation for both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker making use of the directivity characteristics of the head radiation pattern. Furthermore, two different particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first one, fusion is performed at a decision level by combining each monomodal head pose estimation, while the second one uses a joint estimation system combining information at data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.</itunes:summary>
    </item>
    <item>
      <title>Multi-person tracking strategies based on voxel analysis</title>
      <link>http://hdl.handle.net/2117/9463</link>
      <description>Title: Multi-person tracking strategies based on voxel analysis
Authors: Canton Ferrer, Cristian; Salvador Marcos, Jordi; Casas Pla, Josep Ramon; Pardàs Feliu, Montse
Abstract: This paper presents two approaches to the problem of simultaneous tracking of several people in low resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the foreground objects in the scene. Color information obtained from a zenithal camera view is added to this 3D information. The first tracking approach implements heuristic association rules between blobs labelled according to spatiotemporal connectivity criteria. Association rules are based on a cost function which considers their placement and color histogram. In the second approach, a particle filtering scheme adapted to the incoming 3D discrete data is proposed. A volume likelihood function and a discrete 3D re-sampling procedure are introduced to evaluate and drive particles. Multiple targets are tracked by means of multiple particle filters and interaction among them is modeled through a 3D blocking scheme. Evaluation over the CLEAR 2007 database yields quantitative results assessing the performance of the proposed algorithm for indoor scenarios.</description>
      <pubDate>Wed, 06 Oct 2010 14:11:42 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/9463</guid>
      <dc:date>2010-10-06T14:11:42Z</dc:date>
      <itunes:author>Canton Ferrer, Cristian; Salvador Marcos, Jordi; Casas Pla, Josep Ramon; Pardàs Feliu, Montse</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper presents two approaches to the problem of simultaneous tracking of several people in low resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the foreground objects in the scene. Color information obtained from a zenithal camera view is added to this 3D information. The first tracking approach implements heuristic association rules between blobs labelled according to spatiotemporal connectivity criteria. Association rules are based on a cost function which considers their placement and color histogram. In the second approach, a particle filtering scheme adapted to the incoming 3D discrete data is proposed. A volume likelihood function and a discrete 3D re-sampling procedure are introduced to evaluate and drive particles. Multiple targets are tracked by means of multiple particle filters and interaction among them is modeled through a 3D blocking scheme. Evaluation over the CLEAR 2007 database yields quantitative results assessing the performance of the proposed algorithm for indoor scenarios.</itunes:summary>
    </item>
    <item>
      <title>Skeleton and shape adjustment and tracking in multicamera environments</title>
      <link>http://hdl.handle.net/2117/9002</link>
      <description>Title: Skeleton and shape adjustment and tracking in multicamera environments
Authors: Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardàs Feliu, Montse
Abstract: In this paper we present a method for automatic body model adjustment and motion tracking in multicamera environments. We introduce a set of shape deformation parameters based on linear blend skinning that allow a deformation related to the scaling of the distinct bones of the body model skeleton, and a deformation in the radial direction of a bone. The adjustment of a generic body model to a specific subject is achieved by the estimation of those shape deformation parameters. This estimation combines a local optimization method and hierarchical particle filtering, and uses an efficient cost function based on foreground silhouettes using GPU. This estimation takes into account anthropometric constraints by using a rejection sampling method of propagation of particles. We propose a hierarchical particle filtering method for motion tracking using the adjusted model. We show accurate model adjustment and tracking for distinct subjects in a five-camera setup.</description>
      <pubDate>Tue, 21 Sep 2010 14:06:54 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/9002</guid>
      <dc:date>2010-09-21T14:06:54Z</dc:date>
      <itunes:author>Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardàs Feliu, Montse</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>In this paper we present a method for automatic body model adjustment and motion tracking in multicamera environments. We introduce a set of shape deformation parameters based on linear blend skinning that allow a deformation related to the scaling of the distinct bones of the body model skeleton, and a deformation in the radial direction of a bone. The adjustment of a generic body model to a specific subject is achieved by the estimation of those shape deformation parameters. This estimation combines a local optimization method and hierarchical particle filtering, and uses an efficient cost function based on foreground silhouettes using GPU. This estimation takes into account anthropometric constraints by using a rejection sampling method of propagation of particles. We propose a hierarchical particle filtering method for motion tracking using the adjusted model. We show accurate model adjustment and tracking for distinct subjects in a five-camera setup.</itunes:summary>
    </item>
  </channel>
</rss>

