postgraduate thesis: Towards object detection in the real world
Title | Towards object detection in the real world |
---|---|
Authors | Sun, Peize |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Sun, P. (2024). Towards object detection in the real world. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Seeing, recognizing, and understanding objects in the real world is critical to applying computer vision systems in daily life and practical applications, and object detection is one of the most fundamental techniques toward this goal. Deploying object detection in the real world poses challenges in generalization, efficiency, and robustness. Addressing these challenges requires systematic innovation in network architecture, training datasets, and learning algorithms.
In this dissertation, we study three important problems related to universal object detection: (i) end-to-end object detection network architectures, (ii) object tracking in videos, and (iii) object recognition in the open world.
For end-to-end object detection, we present a systematic analysis of its learning mechanism and introduce two advanced network architectures, Sparse R-CNN and OneNet. These methods achieve competitive accuracy, inference time, and training convergence on challenging object detection datasets, especially in crowded scenes.
For object tracking in videos, we introduce TransTrack, a unified framework that solves object detection and tracking from a shared query-key perspective. Furthermore, we propose DanceTrack, a new dataset in which objects have uniform appearance and diverse motion, to encourage tracking algorithms that rely on both visual discrimination and motion analysis.
For object recognition in the open world, we demonstrate the hierarchical structure of visual concepts from the perspectives of open category and open granularity. We introduce VLPart, a detector that can predict both open-vocabulary objects and their part segmentation. |
Degree | Doctor of Philosophy |
Subject | Computer vision; Pattern recognition systems |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/345415 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sun, Peize | - |
dc.date.accessioned | 2024-08-26T08:59:38Z | - |
dc.date.available | 2024-08-26T08:59:38Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Sun, P. (2024). Towards object detection in the real world. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/345415 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use the work in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Computer vision | - |
dc.subject.lcsh | Pattern recognition systems | - |
dc.title | Towards object detection in the real world | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044843667603414 | - |