Handling variable shaped & high resolution images for multi-class classification problem

Document type: Master thesis
Date: 2020-04-20
Rights access: Open Access
Abstract
Convolutional Neural Networks (CNNs) are usually trained using a pre-determined,
fixed spatial image size. While scale invariance is considered important for visual
representations, CNNs are not scale invariant with respect to the spatial resolution
of the input image, since a change in image dimensions may lead to a non-linear
change in their output. At the same time, there are applications (e.g. in medicine)
where images come in multiple scales and shapes, leaving no room for the common
transformations that deform and shrink images and thereby lose important
information. Retaining high-resolution information can also be a big burden
resource-wise, with high computational cost and high memory and time requirements.
Accordingly, research focus has shifted from parameter optimization and connection
readjustment towards improved architectural design of the network: state-of-the-art
networks such as Xception, ResNeXt, PolyNet and others explore the effect of
different transformations on the learning capacity of CNNs.
Instead of modifying the internals of CNNs, the METavlitó project focuses mainly on
the pre-processing stage of the network in order to handle high-resolution images as
well as the variability in their shape. METavlitó proposes two components: one for
clustering image resolutions into buckets, and a training component for
scale-invariant learning that employs an input-agnostic architecture and decreases
the average GPU memory requirements. Compared to a classic approach that applies the
common pre-processing transformations (resizing and cropping) before training, our
solution, using the same architecture, controls overfitting better, increases
accuracy by 3-5%, and decreases the average GPU memory needs by approximately 43%
and, as a result, the total training and validation time.
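
The abstract does not say how images are assigned to resolution buckets, so the following is only a minimal sketch of one plausible scheme, not the method used in the thesis. It assumes that each image's height and width are rounded up to the next multiple of a fixed step, and that mini-batches are then drawn from a single bucket so that all images in a batch can share one padded shape. The names `bucket_key`, `bucket_by_resolution` and `batches`, and the step of 64 pixels, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Size = Tuple[int, int]  # (height, width) in pixels


def bucket_key(size: Size, step: int = 64) -> Size:
    """Round height and width up to the next multiple of `step` (assumed rule)."""
    height, width = size
    # -(-x // step) is integer ceiling division.
    return (-(-height // step) * step, -(-width // step) * step)


def bucket_by_resolution(sizes: Dict[str, Size], step: int = 64) -> Dict[Size, List[str]]:
    """Group image names by their rounded-up resolution."""
    buckets: Dict[Size, List[str]] = defaultdict(list)
    for name, size in sizes.items():
        buckets[bucket_key(size, step)].append(name)
    return dict(buckets)


def batches(buckets: Dict[Size, List[str]], batch_size: int) -> Iterator[Tuple[Size, List[str]]]:
    """Yield (bucket shape, image names) so that no batch mixes buckets."""
    for key, names in sorted(buckets.items()):
        for start in range(0, len(names), batch_size):
            yield key, names[start:start + batch_size]
```

For example, with a step of 64, images of 500x700 and 512x704 pixels fall into the same (512, 704) bucket, while a 1000x1000 image gets its own (1024, 1024) bucket. Keeping each batch inside one bucket limits padding to less than one step per dimension. In this sketch that is what would keep memory use close to the actual image sizes.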
Files | Description | Size | Format | View |
---|---|---|---|---|
149286.pdf | | 8,813Mb | | View/Open |
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder.