Munirathnam, Venkatesh Gurram ORCID: 0000-0002-4393-9267 (2023) Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles. PhD thesis, Dublin City University.
Abstract
This thesis presents the use of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles, or self-driving cars, has stimulated significant research into advanced driver assistance systems (ADAS) technologies, particularly those related to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks operate on a single frame and therefore do not use the temporal information associated with objects or scenes in sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimation. First, the thesis examines multimodal data representations and hyper-parameter selection using public datasets such as KITTI and nuScenes, with Frustum-ConvNet as the baseline architecture. Secondly, an attention mechanism is employed together with a convolutional LSTM to extract spatio-temporal information from sequence data, improving 3D estimation and helping the architecture focus on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture and to assess their effect on performance and computational complexity.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, as demonstrated on key autonomous driving datasets.
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | March 2023 |
| Refereed: | No |
| Supervisor(s): | Little, Suzanne and O'Connor, Noel E. |
| Subjects: | Computer Science > Image processing; Computer Science > Digital video |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License. |
| Funders: | Science Foundation Ireland via Insight Research Centre for Data Analytics, DCU |
| ID Code: | 27984 |
| Deposited On: | 31 Mar 2023 09:04 by Suzanne Little. Last Modified 08 Dec 2023 15:13 |
Documents
Full text available as:
PDF (23MB), Creative Commons: Attribution-NonCommercial-No Derivative Works 4.0