Please use this identifier to cite or link to this item: https://hdl.handle.net/1889/5572
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Locatelli, Marco | -
dc.contributor.advisor | Bertozzi, Massimo | -
dc.contributor.advisor | Zani, Paolo | -
dc.contributor.advisor | Medici, Paolo | -
dc.contributor.author | Orsingher, Marco | -
dc.date.accessioned | 2024-03-05T12:57:05Z | -
dc.date.available | 2024-03-05T12:57:05Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | https://hdl.handle.net/1889/5572 | -
dc.description.abstract | Building a 3D representation of the world is a longstanding challenge in computer vision and machine learning, with applications in virtual and augmented reality, autonomous driving, industrial site scanning, cultural heritage preservation, and more. The main goal of this thesis is to develop efficient algorithms for processing 3D data by combining classical geometry-based methods with modern deep learning approaches. Efficiency is a crucial aspect of 3D perception, since data are typically acquired by low-cost, noisy sensors and must be processed on mobile platforms with a limited computational budget. Furthermore, the exponential growth of 3D data sources calls for scalable and efficient processing pipelines. Our first contribution is a novel framework for multi-view 3D reconstruction in urban scenarios. We significantly improve a state-of-the-art classical approach for dense reconstruction by designing a local-to-global optimization strategy that leads to geometrically consistent surfaces. Moreover, we show how to scale it up to arbitrarily large scenes with a divide-and-conquer procedure that combines view clustering and view selection, thus allowing massive parallelization of the 3D reconstruction process. Secondly, we present two algorithmic advances in the efficient training of neural representations for novel view synthesis. We propose to speed up the learning process by focusing on informative rays, which are defined in the 2D image space by high-entropy pixels and in the 3D object space by a sparse set of cameras that ensures scene coverage while keeping an optimal relative baseline. Additionally, we leverage multi-view geometry as pseudo-ground truth to guide the neural implicit field towards high-fidelity 3D models. We also tackle the point cloud upsampling task, with the aim of refining noisy, low-resolution data from cheap range sensors into dense and uniform point clouds.
To this end, we formulate the first learning-based approach that allows 3D upsampling with arbitrary scaling factors, including non-integer values, with a single trained model. The main idea is to convert the input to a probabilistic representation and to train a Transformer network to map between samples from this domain and points on the underlying object surface. This flexibility is crucial in real-world applications with computational and bandwidth constraints. Finally, we propose two novel methods for neural network compression. We first show that feature-based knowledge distillation can be improved by complementing the direct feature-matching baseline with a regularization loss driven by the teacher's features, thus enabling the student model to learn more robust latent representations. Then, we introduce a neural compression approach that combines network pruning with self-distillation and significantly improves the sparsity-accuracy trade-off for several perception tasks. This makes it possible to deploy neural architectures on constrained hardware for fast inference with unprecedented performance. | en_US
dc.language.iso | English | en_US
dc.publisher | Università degli studi di Parma. Dipartimento di Ingegneria e architettura | en_US
dc.publisher | Vislab Srl (an Ambarella Inc company) | en_US
dc.relation.ispartofseries | Dottorato di ricerca in Tecnologie dell'informazione | en_US
dc.rights | © Marco Orsingher, 2024 | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject | 3D Reconstruction | en_US
dc.subject | Novel View Synthesis | en_US
dc.subject | Autonomous Driving | en_US
dc.subject | Point Cloud Processing | en_US
dc.subject | Knowledge Distillation | en_US
dc.subject | Deep Learning | en_US
dc.title | Geometry and learning for efficient 3D perception | en_US
dc.type | Doctoral thesis | en_US
dc.subject.miur | ING-INF/05 | en_US
dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | *
Appears in Collections: Tecnologie dell'informazione. Tesi di dottorato

Files in This Item:
File | Description | Size | Format
tesi.pdf | | 6.71 MB | Adobe PDF


This item is licensed under a Creative Commons License.