Please use this identifier to cite or link to this item:
https://hdl.handle.net/1889/5572
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Locatelli, Marco | - |
dc.contributor.advisor | Bertozzi, Massimo | - |
dc.contributor.advisor | Zani, Paolo | - |
dc.contributor.advisor | Medici, Paolo | - |
dc.contributor.author | Orsingher, Marco | - |
dc.date.accessioned | 2024-03-05T12:57:05Z | - |
dc.date.available | 2024-03-05T12:57:05Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | https://hdl.handle.net/1889/5572 | - |
dc.description.abstract | Building a 3D representation of the world is a longstanding challenge in computer vision and machine learning, with applications in virtual and augmented reality, autonomous driving, industrial site scanning, cultural heritage preservation, and more. The main goal of this thesis is to develop efficient algorithms for processing 3D data by combining classical geometry-based methods with modern deep learning approaches. Efficiency is a crucial aspect of 3D perception, since data are typically acquired by low-cost noisy sensors and must be processed on mobile platforms with a limited computational budget. Furthermore, the exponential growth of 3D data sources calls for scalable and efficient processing pipelines. Our first contribution is a novel framework for multi-view 3D reconstruction in urban scenarios. We significantly improve a state-of-the-art classical approach for dense reconstruction by designing a local-to-global optimization strategy that leads to geometrically consistent surfaces. Moreover, we show how to scale it up to arbitrarily large scenes with a divide-and-conquer procedure that combines view clustering and view selection, thus allowing for massive parallelization of the 3D reconstruction process. Secondly, we present two algorithmic advances in the efficient training of neural representations for novel view synthesis. We propose to speed up the learning process by focusing on informative rays, which are defined in the 2D image space by high-entropy pixels and in the 3D object space by a sparse set of cameras that ensures scene coverage while keeping an optimal relative baseline. Additionally, we leverage multi-view geometry as pseudo-ground truth to guide the neural implicit field towards high-fidelity 3D models. We also tackle the point cloud upsampling task, with the aim of refining noisy and low-resolution data from cheap range sensors into dense and uniform point clouds. To this end, we formulate the first learning-based approach that allows 3D upsampling with arbitrary scaling factors, including non-integer values, with a single trained model. The main idea is to convert the input to a probabilistic representation and to train a Transformer network to map between samples from this domain and points on the underlying object surface. This flexibility is crucial in real-world applications with computational and bandwidth constraints. Finally, we propose two novel methods for neural network compression. We first show that feature-based knowledge distillation can be improved by complementing the direct feature matching baseline with a teacher-feature-driven regularization loss, thus enabling the student model to learn more robust latent representations. Then, we introduce a neural compression approach that combines network pruning with self-distillation and significantly improves the sparsity-accuracy tradeoff for several perception tasks. This allows deploying neural architectures on constrained hardware for fast inference with unprecedented performance. | en_US |
dc.language.iso | English | en_US |
dc.publisher | Università degli studi di Parma. Dipartimento di Ingegneria e architettura | en_US |
dc.publisher | Vislab Srl (an Ambarella Inc company) | en_US |
dc.relation.ispartofseries | Dottorato di ricerca in Tecnologie dell'informazione | en_US |
dc.rights | © Marco Orsingher, 2024 | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | 3D Reconstruction | en_US |
dc.subject | Novel View Synthesis | en_US |
dc.subject | Autonomous Driving | en_US |
dc.subject | Point Cloud Processing | en_US |
dc.subject | Knowledge Distillation | en_US |
dc.subject | Deep Learning | en_US |
dc.title | Geometry and learning for efficient 3D perception | en_US |
dc.type | Doctoral thesis | en_US |
dc.subject.miur | ING-INF/05 | en_US |
dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
Appears in Collections: Tecnologie dell'informazione. Tesi di dottorato
This item is licensed under a Creative Commons License
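The abstract mentions speeding up neural view-synthesis training by sampling "informative rays" defined by high-entropy pixels. As a minimal illustrative sketch only (this is not the thesis implementation; the function names, the windowed-histogram definition of entropy, and the grayscale-in-[0,1] input are all assumptions), one way to bias ray sampling toward high-entropy image regions is:

```python
import numpy as np

def local_entropy(gray, win=5, bins=16):
    """Shannon entropy of the intensity histogram in a win x win
    window around each pixel (borders handled by edge padding)."""
    pad = win // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    ent = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

def sample_informative_rays(gray, n_rays):
    """Draw n_rays distinct pixel coordinates with probability
    proportional to local entropy, so that textured (informative)
    regions are sampled more often than flat ones."""
    ent = local_entropy(gray)
    probs = ent.flatten()
    probs = probs / probs.sum()
    idx = np.random.choice(probs.size, size=n_rays, replace=False, p=probs)
    return np.unravel_index(idx, gray.shape)
```

In this toy version the entropy map is recomputed from scratch; a practical training loop would precompute it once per image and draw ray batches from the cached distribution.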