
dc.contributor.author: Xu, Xiangyu
dc.contributor.author: Chen, Hao
dc.contributor.author: Moreno-Noguer, Francesc
dc.contributor.author: Jeni, László
dc.contributor.author: De La Torre, Fernando
dc.contributor.other: Institut de Robòtica i Informàtica Industrial
dc.date.accessioned: 2022-05-09T12:50:00Z
dc.date.available: 2022-05-09T12:50:00Z
dc.date.issued: 2022
dc.identifier.citation: Xu, X. [et al.]. 3D Human pose, shape and texture from low-resolution images and videos. "IEEE transactions on pattern analysis and machine intelligence", 2022, p. 1.
dc.identifier.issn: 0162-8828
dc.identifier.uri: http://hdl.handle.net/2117/367104
dc.description: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description.abstract: 3D human pose and shape estimation from monocular images has been an active research area in computer vision. Existing deep learning methods for this task rely on high-resolution input, which, however, is not always available in scenarios such as video surveillance and sports broadcasting. Two common approaches to dealing with low-resolution images are applying super-resolution techniques to the input, which may result in unpleasant artifacts, or simply training one model for each resolution, which is impractical in many realistic applications. To address these issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme. The proposed method is able to learn 3D body pose and shape across different resolutions with a single model. The self-supervision loss enforces scale-consistency of the output, and the contrastive learning scheme enforces scale-consistency of the deep features. We show that both of these new losses provide robustness when learning in a weakly-supervised manner. Moreover, we extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input. Extensive experiments demonstrate that the RSC-Net can achieve consistently better results than the state-of-the-art methods for challenging low-resolution images.
dc.format.extent: 1 p.
dc.language.iso: eng
dc.subject: UPC subject areas::Computer science::Artificial intelligence
dc.subject.lcsh: Machine learning
dc.subject.other: Computer Vision. Author keywords: 3D human pose and shape
dc.subject.other: Low-resolution
dc.subject.other: Neural network
dc.subject.other: Self-supervised learning
dc.subject.other: Contrastive learning
dc.title: 3D Human pose, shape and texture from low-resolution images and videos
dc.type: Article
dc.subject.lemac: Machine learning
dc.contributor.group: Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
dc.identifier.doi: 10.1109/TPAMI.2021.3070002
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/9392295
dc.rights.access: Open Access
local.identifier.drac: 32521441
dc.description.version: Postprint (published version)
local.citation.author: Xu, X.; Chen, H.; Moreno-Noguer, F.; Jeni, L.; De La Torre, F.
local.citation.publicationName: IEEE transactions on pattern analysis and machine intelligence
local.citation.startingPage: 1
local.citation.endingPage: 1

