Please use this identifier to cite or link to this item:
https://hdl.handle.net/1889/5572
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Locatelli, Marco | - |
dc.contributor.advisor | Bertozzi, Massimo | - |
dc.contributor.advisor | Zani, Paolo | - |
dc.contributor.advisor | Medici, Paolo | - |
dc.contributor.author | Orsingher, Marco | - |
dc.date.accessioned | 2024-03-05T12:57:05Z | - |
dc.date.available | 2024-03-05T12:57:05Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | https://hdl.handle.net/1889/5572 | - |
dc.description.abstract | Building a 3D representation of the world is a longstanding challenge in computer vision and machine learning, with applications in virtual and augmented reality, autonomous driving, industrial site scanning, cultural heritage preservation, and more. The main goal of this thesis is to develop efficient algorithms for processing 3D data by combining classical geometry-based methods with modern deep learning approaches. Efficiency is a crucial aspect of 3D perception, since data are typically acquired by low-cost noisy sensors and must be processed on mobile platforms with a limited computational budget. Furthermore, the exponential growth of 3D data sources calls for scalable and efficient processing pipelines. Our first contribution is a novel framework for multi-view 3D reconstruction in urban scenarios. We significantly improve a state-of-the-art classical approach for dense reconstruction by designing a local-to-global optimization strategy that leads to geometrically consistent surfaces. Moreover, we show how to scale it up to arbitrarily large scenes with a divide-and-conquer procedure that combines view clustering and view selection, thus allowing for massive parallelization of the 3D reconstruction process. Secondly, we present two algorithmic advances in the efficient training of neural representations for novel view synthesis. We propose to speed up the learning process by focusing on informative rays, which are defined in the 2D image space by high-entropy pixels and in the 3D object space by a sparse set of cameras that ensures scene coverage while keeping an optimal relative baseline. Additionally, we leverage multi-view geometry as pseudo-ground truth to guide the neural implicit field towards high-fidelity 3D models. We also tackle the point cloud upsampling task, with the aim of refining noisy and low-resolution data from cheap range sensors into dense and uniform point clouds. To this end, we formulate the first learning-based approach that allows 3D upsampling with arbitrary scaling factors, including non-integer values, with a single trained model. The main idea is to convert the input to a probabilistic representation and to train a Transformer network to map between samples from this domain and points on the underlying object surface. This flexibility is crucial in real-world applications with computational and bandwidth constraints. Finally, we propose two novel methods for neural network compression. We first show that feature-based knowledge distillation can be improved by complementing the direct feature matching baseline with a teacher-feature-driven regularization loss, thus enabling the student model to learn more robust latent representations. Then, we introduce a neural compression approach that combines network pruning with self-distillation and significantly improves the sparsity-accuracy tradeoff for several perception tasks. This allows deploying neural architectures on constrained hardware for fast inference with unprecedented performance. | en_US |
dc.language.iso | English | en_US |
dc.publisher | Università degli studi di Parma. Dipartimento di Ingegneria e architettura | en_US |
dc.publisher | Vislab Srl (an Ambarella Inc company) | en_US |
dc.relation.ispartofseries | Dottorato di ricerca in Tecnologie dell'informazione | en_US |
dc.rights | © Marco Orsingher, 2024 | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | 3D Reconstruction | en_US |
dc.subject | Novel View Synthesis | en_US |
dc.subject | Autonomous Driving | en_US |
dc.subject | Point Cloud Processing | en_US |
dc.subject | Knowledge Distillation | en_US |
dc.subject | Deep Learning | en_US |
dc.title | Geometry and learning for efficient 3D perception | en_US |
dc.type | Doctoral thesis | en_US |
dc.subject.miur | ING-INF/05 | en_US |
dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
Appears in Collections: Tecnologie dell'informazione. Tesi di dottorato
This item is licensed under a Creative Commons License
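The abstract mentions speeding up neural view-synthesis training by sampling "informative rays" defined by high-entropy pixels. As a minimal illustrative sketch only (this is not the thesis implementation; the function names, the windowed-histogram definition of entropy, and the grayscale-in-[0,1] input are all assumptions), one way to bias ray sampling toward high-entropy image regions is:

```python
import numpy as np

def local_entropy(gray, win=5, bins=16):
    """Shannon entropy of the intensity histogram in a win x win
    window around each pixel (borders handled by edge padding)."""
    pad = win // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    ent = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

def sample_informative_rays(gray, n_rays):
    """Draw n_rays distinct pixel coordinates with probability
    proportional to local entropy, so that textured (informative)
    regions are sampled more often than flat ones."""
    ent = local_entropy(gray)
    probs = ent.flatten()
    probs = probs / probs.sum()
    idx = np.random.choice(probs.size, size=n_rays, replace=False, p=probs)
    return np.unravel_index(idx, gray.shape)
```

In this toy version the entropy map is recomputed from scratch; a practical training loop would precompute it once per image and draw ray batches from the cached distribution.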