Event-based vision for 6-DOF pose tracking and 3D mapping

Appears in Collections: postgraduate thesis

| Title | Event-based vision for 6-DOF pose tracking and 3D mapping |
|---|---|
| Authors | Guan, Weipeng (關偉鵬) |
| Advisors | Lu, P; Lam, J |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Guan, W. [關偉鵬]. (2025). Event-based vision for 6-DOF pose tracking and 3D mapping. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Simultaneous Localization and Mapping (SLAM) serves as a foundational technology for emerging applications such as robotics, autonomous driving, embodied intelligence, and augmented/virtual reality. However, traditional image-based SLAM systems still struggle with reliable pose estimation and 3D reconstruction under challenging conditions involving high-speed motion and extreme illumination variations. Event cameras, also known as dynamic vision sensors, have recently emerged as a promising alternative to standard cameras for visual perception. Instead of capturing intensity images at a fixed frame rate, event cameras asynchronously measure per-pixel brightness changes, producing a stream of events that encode the time, pixel location, and sign of each brightness change. They offer attractive advantages, including high temporal resolution (MHz-level), high dynamic range (HDR, 140 dB), microsecond-level latency, no motion blur, and low power consumption. However, integrating event cameras into SLAM systems is challenging because asynchronous event streams differ fundamentally from conventional intensity images, and new processing paradigms are required. This dissertation presents innovative solutions and advancements for event-based SLAM. It begins with the development of Mono-EIO, a monocular event-inertial odometry framework that tightly integrates event-corner features with IMU preintegration. These event-corner features are temporally and spatially associated using novel event-based representations with spatio-temporal and exponential-decay kernels, and are subsequently incorporated into a keyframe-based sliding-window optimization framework. Mono-EIO achieves high-accuracy, real-time 6-DoF ego-motion estimation even under aggressive motion and HDR conditions. Building upon this foundation, the thesis introduces PL-EVIO, an event-based visual-inertial odometry framework that combines event cameras with standard cameras to enhance robustness. PL-EVIO utilizes line-based event features to provide additional structural constraints in human-made environments, while point-based event and image features are managed to complement each other. This framework has been successfully applied to onboard pose-feedback control of a quadrotor, enabling complex maneuvers such as flipping and operation in low-light conditions. Additionally, the thesis includes ESVIO, the first stereo event-based visual-inertial odometry framework. The thesis also presents DEIO, a learning-optimization-combined framework that tightly fuses learning-based event data association with IMU measurements within graph-based optimization. To the best of our knowledge, DEIO is the first learning-based event-inertial odometry, outperforming over 20 vision-based methods across 10 challenging real-world benchmarks. Finally, the thesis proposes EVI-SAM, a full SLAM system that tackles both 6-DoF pose tracking and 3D dense mapping using a monocular event camera. Its tracking module is the first hybrid approach to integrate both direct and feature-based methods within an event-based framework. The mapping module is the first to achieve event-based dense and textured 3D reconstruction without GPU acceleration, using a non-learning approach. This method not only recovers 3D scene structure under aggressive motion but also demonstrates superior performance compared to image-based NeRF and RGB-D camera approaches. Through these contributions, this dissertation significantly advances SLAM, offering robust solutions and paving the way for future research and applications of event cameras. |
| Degree | Doctor of Philosophy |
| Subject | Computer vision; Robot vision |
| Dept/Program | Mechanical Engineering |
| Persistent Identifier | http://hdl.handle.net/10722/360643 |
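
The abstract's event data model (time, pixel location, sign of brightness change) and the spatio-temporal exponential-decay representation used by Mono-EIO can be made concrete with a short sketch. The following is a minimal illustration, not code from the thesis: the field names, the decay constant `tau`, and the synthetic events are assumptions made for the example.

```python
import numpy as np

# Illustrative event record: timestamp, pixel location, and polarity,
# matching the abstract's description of an event stream (names assumed).
event_dtype = np.dtype([
    ("t", np.float64),   # timestamp in seconds (microsecond resolution)
    ("x", np.uint16),    # pixel column
    ("y", np.uint16),    # pixel row
    ("p", np.int8),      # polarity: +1 brightness increase, -1 decrease
])

def time_surface(events: np.ndarray, t_ref: float,
                 height: int, width: int, tau: float = 0.03) -> np.ndarray:
    """Build an exponential-decay time surface at reference time t_ref.

    Each pixel stores exp(-(t_ref - t_last) / tau), where t_last is the
    timestamp of the most recent event at that pixel: recently active
    pixels are close to 1 and stale ones decay toward 0.
    """
    t_last = np.full((height, width), -np.inf)
    for e in events[events["t"] <= t_ref]:
        y, x = e["y"], e["x"]
        t_last[y, x] = max(t_last[y, x], float(e["t"]))  # keep latest timestamp
    # Pixels that never fired have t_last = -inf, so exp(-inf) = 0 there.
    return np.exp(-(t_ref - t_last) / tau)

# Usage: three synthetic events, then a surface at t_ref = 0.05 s.
ev = np.array([(0.010, 5, 7, 1), (0.030, 5, 7, -1), (0.045, 2, 3, 1)],
              dtype=event_dtype)
S = time_surface(ev, t_ref=0.05, height=10, width=10)
```

A representation like this gives recently active pixels similar, smoothly decaying values, which is what makes associating event-corner features across time and space tractable; polarity could additionally be split into separate positive/negative channels.
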

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Lu, P | - |
| dc.contributor.advisor | Lam, J | - |
| dc.contributor.author | Guan, Weipeng | - |
| dc.contributor.author | 關偉鵬 | - |
| dc.date.accessioned | 2025-09-12T02:02:18Z | - |
| dc.date.available | 2025-09-12T02:02:18Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Guan, W. [關偉鵬]. (2025). Event-based vision for 6-DOF pose tracking and 3D mapping. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/360643 | - |
| dc.description.abstract | Simultaneous Localization and Mapping (SLAM) serves as a foundational technology for emerging applications such as robotics, autonomous driving, embodied intelligence, and augmented/virtual reality. However, traditional image-based SLAM systems still struggle with reliable pose estimation and 3D reconstruction under challenging conditions involving high-speed motion and extreme illumination variations. Event cameras, also known as dynamic vision sensors, have recently emerged as a promising alternative to standard cameras for visual perception. Instead of capturing intensity images at a fixed frame rate, event cameras asynchronously measure per-pixel brightness changes, producing a stream of events that encode the time, pixel location, and sign of each brightness change. They offer attractive advantages, including high temporal resolution (MHz-level), high dynamic range (HDR, 140 dB), microsecond-level latency, no motion blur, and low power consumption. However, integrating event cameras into SLAM systems is challenging because asynchronous event streams differ fundamentally from conventional intensity images, and new processing paradigms are required. This dissertation presents innovative solutions and advancements for event-based SLAM. It begins with the development of Mono-EIO, a monocular event-inertial odometry framework that tightly integrates event-corner features with IMU preintegration. These event-corner features are temporally and spatially associated using novel event-based representations with spatio-temporal and exponential-decay kernels, and are subsequently incorporated into a keyframe-based sliding-window optimization framework. Mono-EIO achieves high-accuracy, real-time 6-DoF ego-motion estimation even under aggressive motion and HDR conditions. Building upon this foundation, the thesis introduces PL-EVIO, an event-based visual-inertial odometry framework that combines event cameras with standard cameras to enhance robustness. PL-EVIO utilizes line-based event features to provide additional structural constraints in human-made environments, while point-based event and image features are managed to complement each other. This framework has been successfully applied to onboard pose-feedback control of a quadrotor, enabling complex maneuvers such as flipping and operation in low-light conditions. Additionally, the thesis includes ESVIO, the first stereo event-based visual-inertial odometry framework. The thesis also presents DEIO, a learning-optimization-combined framework that tightly fuses learning-based event data association with IMU measurements within graph-based optimization. To the best of our knowledge, DEIO is the first learning-based event-inertial odometry, outperforming over 20 vision-based methods across 10 challenging real-world benchmarks. Finally, the thesis proposes EVI-SAM, a full SLAM system that tackles both 6-DoF pose tracking and 3D dense mapping using a monocular event camera. Its tracking module is the first hybrid approach to integrate both direct and feature-based methods within an event-based framework. The mapping module is the first to achieve event-based dense and textured 3D reconstruction without GPU acceleration, using a non-learning approach. This method not only recovers 3D scene structure under aggressive motion but also demonstrates superior performance compared to image-based NeRF and RGB-D camera approaches. Through these contributions, this dissertation significantly advances SLAM, offering robust solutions and paving the way for future research and applications of event cameras. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Computer vision | - |
| dc.subject.lcsh | Robot vision | - |
| dc.title | Event-based vision for 6-DOF pose tracking and 3D mapping | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Mechanical Engineering | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045060525803414 | - |
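
To illustrate the graph-based, tightly coupled fusion that the abstract attributes to DEIO and the sliding-window odometry frameworks, here is a deliberately simplified 1D sketch. It is not the thesis implementation: the measurements, the weights, and the plain linear least-squares formulation are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def solve_window(event_rel, imu_rel, w_event=1.0, w_imu=4.0):
    """Toy sliding-window fusion: estimate positions x_0..x_N (x_0 fixed
    at 0) from two sets of relative measurements between consecutive
    keyframes, each weighted by an assumed sensor confidence."""
    n = len(event_rel)            # N relative constraints -> N+1 states
    rows, rhs = [], []
    for i in range(n):
        # Factor: x_{i+1} - x_i = measurement, one row per sensor.
        for meas, w in ((event_rel[i], w_event), (imu_rel[i], w_imu)):
            r = np.zeros(n)       # unknowns are x_1..x_N (x_0 = 0)
            if i > 0:
                r[i - 1] = -w
            r[i] = w
            rows.append(r)
            rhs.append(w * meas)
    A, b = np.array(rows), np.array(rhs)
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # weighted least squares
    return np.concatenate(([0.0], x))

# Usage: noisy event-odometry deltas vs. smoother IMU deltas over a
# four-keyframe window; the solution balances both by their weights.
print(solve_window(event_rel=[1.1, 0.9, 1.2], imu_rel=[1.0, 1.0, 1.0]))
```

In a real system the states are 6-DoF poses with velocities and biases, the factors come from event feature reprojection and IMU preintegration, and the problem is solved by iterative nonlinear optimization; the sketch only shows the shared structure of weighted residuals over a window of states.
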
