From 2D image segmentation to 3D point cloud segmentation for large-scale outdoor scene understanding
Thesis event information
Date and time of the thesis defence
Place of the thesis defence
Lecture hall L5, Linnanmaa
Topic of the dissertation
From 2D image segmentation to 3D point cloud segmentation for large-scale outdoor scene understanding
Doctoral candidate
Master of Engineering Bike Chen
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Biomimetics and Intelligent Systems Group
Subject of study
Computer Science and Engineering
Opponent
Professor Bastian Leibe, RWTH Aachen University
Custos
Professor Juha Röning, University of Oulu
From 2D image segmentation to 3D point cloud segmentation for large-scale outdoor scene understanding
Semantic image segmentation (SiS) and point cloud segmentation (PCS) aim to partition an image and a point cloud, respectively, into coherent and meaningful parts. Both tasks play an important role in robotics and autonomous driving.
For SiS, existing models struggle to further improve segmentation performance, and they cannot output meaningful uncertainty values without heavy computational overhead. In this thesis, by exploiting the properties of hyperbolic space, we design a novel loss function, the hyperbolic uncertainty loss (HyperUL), to boost segmentation performance. Additionally, we develop a hyperbolic uncertainty estimation method that outputs meaningful uncertainties at negligible computational cost. Extensive experiments on the Cityscapes, UAVid, and ACDC datasets demonstrate the effectiveness of the introduced HyperUL and the uncertainty estimation approach.
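The intuition behind hyperbolic uncertainty can be illustrated with a small sketch. A common observation in hyperbolic representation learning is that embeddings mapped near the boundary of the Poincaré ball correspond to confident predictions, while embeddings near the origin are ambiguous, so the hyperbolic norm itself can serve as a cheap uncertainty signal. The sketch below is illustrative only; the `expmap0` map and the `1 - norm` score are standard constructions assumed here, and the exact formulation in the thesis may differ.

```python
import numpy as np

def expmap0(v, c=1.0):
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8)
    return np.tanh(np.sqrt(c) * norm) / (np.sqrt(c) * norm) * v

def hyperbolic_uncertainty(embeddings, c=1.0):
    """Illustrative per-pixel uncertainty from the hyperbolic norm.

    Embeddings pushed toward the ball's boundary (norm near 1/sqrt(c))
    get uncertainty near 0; embeddings near the origin get uncertainty
    near 1. No extra forward passes are needed, which is why this kind
    of estimate comes at negligible computational cost.
    """
    ball = expmap0(embeddings, c)
    norm = np.linalg.norm(ball, axis=-1)   # lies in [0, 1/sqrt(c))
    return 1.0 - np.sqrt(c) * norm         # near 1 => uncertain
```

Because the score is read directly off the embedding norms, it adds only a vector-norm computation on top of the usual forward pass.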
For PCS, we focus on range image-based models for large-scale outdoor point clouds, which existing models cannot process both effectively and efficiently. In this thesis, we propose a novel projection method, scan unfolding++ (SU++), to convert a large-scale outdoor point cloud into a range image while avoiding a large number of missing values. We then introduce an interpolation method, range-dependent K-nearest neighbor interpolation (KNNI), to further fill in the missing values in the generated range image. Moreover, we design new range image-based neural networks, the filling missing values network (FMVNet) and the Fast FMVNet, to achieve state-of-the-art performance. Furthermore, we develop a novel post-processing component, the trainable pointwise decoder module (PDM), to refine the final pointwise predictions and further boost performance. For data augmentation during training, we introduce a virtual range image-guided copy-rotate-paste (VRCrop) strategy to replace the commonly used copy-paste and copy-rotate-paste operations, as well as an improved VRCrop (VRCrop++) strategy and a global copy-rotate-paste (GCrop) technique, to further improve the performance of the segmentation models.
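To make the range-image setting concrete, the sketch below shows a generic spherical projection of a LiDAR point cloud onto a 2D range image. This is the conventional baseline projection, not the thesis's SU++ method; the 64x1024 resolution and the vertical field of view are typical values assumed for a 64-beam sensor. Pixels that no point falls into remain at -1, which is exactly the kind of missing value that SU++ and KNNI aim to reduce and fill.

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an (h, w) range image.

    Generic spherical projection (assumed baseline, not SU++). Unfilled
    pixels are marked with -1 to represent missing values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)           # range of each point
    yaw = np.arctan2(y, x)                       # horizontal angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))   # vertical angle

    fov_up_rad = np.radians(fov_up)
    fov_rad = np.radians(fov_up - fov_down)

    # Map angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w            # column index
    v = (fov_up_rad - pitch) / fov_rad * h       # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    image = np.full((h, w), -1.0, dtype=np.float32)
    # Assign points in order of decreasing range so the closest point
    # wins when several points fall into the same pixel.
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image
```

Because many pixels of the resulting image stay at -1 (dropped points, occlusions, pixel collisions), downstream networks either have to tolerate these holes or rely on projection and interpolation schemes, such as the SU++ and KNNI proposed in the thesis, that reduce them beforehand.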
Created 11.11.2025 | Updated 12.11.2025