A comprehensive analysis of vision deep learning methods for object detection and 6D pose estimation: Real-time applications

Sapienza, Davide

Please use this identifier to cite or link to this item: https://hdl.handle.net/1889/5385

Title:	A comprehensive analysis of vision deep learning methods for object detection and 6D pose estimation: Real-time applications
Other Titles:	Un'analisi completa dei metodi di visione di deep learning per Object Detection e 6D Pose Estimation: applicazioni real-time
Authors:	Sapienza, Davide
Issue Date:	2023
Publisher:	Università degli studi di Parma. Dipartimento di Scienze matematiche, fisiche e informatiche
Document Type:	Doctoral thesis
Abstract:	The popularity of Artificial Intelligence (AI) systems is growing rapidly, both in academia and in society. In recent years, advances in computer vision and machine learning have enabled AI systems to be applied to a variety of scenarios, such as autonomous driving, robotics, and augmented reality. An obstacle detection system allows a car to detect and avoid potential hazards, or to brake in time to prevent an accident. Augmented reality can assist a surgeon in finding the most efficient way to make an incision, leading to better outcomes for patients. Automation of industrial processes can help reduce the risk of on-the-job injuries by reducing the amount of strenuous manual labor. These applications, which aim to improve people's quality of life, require the detection, identification, and pose estimation of objects. To obtain a working system, many factors must be taken into account, including the choice of training data, of the learning method, and of the hardware platform. This research examines techniques to enhance accuracy, speed, and stability in two key applications: Object Detection and 6D Pose Estimation. The thesis focuses mainly on deep learning methods, which have led to breakthroughs in these fields. i) We analyze in detail the difficulties and characteristics of embedded Object Detection methods, focusing on latency, throughput, accuracy, memory, and power consumption, and we evaluate the impact of each of these factors on the performance of the object detection system. ii) We discuss the challenges and biases related to datasets and methods, as well as possible solutions to address them, and we stress the importance of being aware of the inherent limitations of a given problem.
iii) Finally, we present a real-world case study of Object Detection and 6D Pose Estimation in underwater environments, highlighting the challenges, pitfalls, and best choices for this particular scenario. The experimental results, on both simulated and real-world scenarios, show that the proposed solutions are reliable and effective in detecting objects and estimating their 6D pose. The findings of this research could be used to improve the accuracy and efficiency of 2D Object Detection and 6D Pose Estimation methods.
Appears in Collections:	Matematica. Tesi di dottorato

Files in This Item:

File	Description	Size	Format
report_dottorato.pdf (Restricted Access)	Final report	96.04 kB	Adobe PDF
phd_thesis_davide_sapienza.pdf	Doctoral thesis	42.13 MB	Adobe PDF


This item is licensed under a Creative Commons License
