
Postgraduate thesis: Video parsing and camera pose estimation for 2D to 3D video conversion

Title: Video parsing and camera pose estimation for 2D to 3D video conversion
Authors: Liu, Tianrui [劉天瑞]
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, T. [劉天瑞]. (2015). Video parsing and camera pose estimation for 2D to 3D video conversion. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5699957
Abstract: The increasing demand for 3D video content motivates the conversion of large volumes of 2D video into 3D formats. Because video content varies substantially, the performance of fully automatic conversion techniques is usually limited, so it is important to develop efficient semi-automatic techniques that ensure good conversion quality. The purpose of this thesis is to build a video analysis system suitable for use prior to the 2D-to-3D conversion process. The system aims to automatically summarize videos in order to reduce the manual effort required during conversion, and possibly to facilitate depth assignment. Firstly, a shot boundary detection method is proposed for the video analysis system to parse a video into its basic units, shots. Based on a novel structure-aware histogram scheme and an adaptive double-threshold scheme, the proposed algorithm improves upon conventional methods. The structure-aware scheme effectively integrates a structural similarity measure with local color histograms and hence significantly reduces false alarms caused by motion disturbances. The adaptive double-threshold scheme makes the algorithm effective in detecting mixed types of shot boundaries. Once a video has been partitioned into shots, keyframes of the shots are further summarized by grouping together those with similar content. By modeling the keyframes as an undirected graph, the normalized cuts algorithm is employed to recursively partition the graph into clusters. Secondly, camera motion estimation is performed to examine the motion modality of the camera capturing each video shot. As structure-from-motion (SfM) methods for 3D reconstruction are generally restricted to videos containing translational camera motion, this part of the work contributes to the automatic identification of videos that fall within the regime of the SfM method.
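As an illustration of the adaptive double-threshold idea described above (a hedged sketch, not the thesis implementation), the function below classifies a sequence of inter-frame distance values into abrupt cuts and gradual transitions: a distance above the high threshold marks a cut, while a sustained run of distances between the two thresholds marks a gradual transition. The structure-aware histogram distance itself is abstracted into plain numbers, and the threshold values and minimum run length are illustrative assumptions.

```python
def detect_shot_boundaries(dists, t_high=0.6, t_low=0.25):
    """Double-threshold shot boundary test on inter-frame distances.

    Returns (cuts, gradual): frame indices of abrupt cuts, and
    (start, end) index pairs of candidate gradual transitions.
    """
    cuts, gradual = [], []
    run_start = None
    for i, d in enumerate(dists):
        if d >= t_high:
            cuts.append(i)          # abrupt cut: distance spikes past t_high
            run_start = None
        elif d >= t_low:
            if run_start is None:
                run_start = i       # candidate gradual transition begins
        else:
            # run of mid-range distances ended; keep it if long enough
            if run_start is not None and i - run_start >= 2:
                gradual.append((run_start, i - 1))
            run_start = None
    return cuts, gradual
```

In an adaptive variant, `t_high` and `t_low` would be derived from local distance statistics rather than fixed, which is what lets a single detector handle both cut types.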
The camera estimation algorithm utilizes matched features and epipolar geometry constraints to incrementally compute the camera parameters for different views. Based on the camera estimation results, a method is proposed to further exploit the distinguishing properties of sequences taken by a translationally moving camera. Consequently, the motion modality of the camera can be identified to ensure that the video shots are suitable for the SfM method. Finally, a semantic scene analysis approach that can simultaneously segment and recognize the objects in a scene is proposed. The method employs a two-layer random forest (RF) framework. In the first layer, an RF labels the image by assigning object classes to superpixels. A structured RF in the second layer predicts local labels together with reliability scores, which are aggregated with the initial labeling results. The proposed method achieves higher accuracy because some inaccurate segmentations and implausible labelings are remedied in the second layer. The semantic analysis method can be used to differentiate static background regions from moving objects, and thereby facilitates depth propagation from keyframes, whose depth may be obtained, say, through a user interface.
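To illustrate how the second layer's reliability scores might be aggregated with the first layer's labels, a minimal sketch is given below. This is not the thesis code: the dictionary-based superpixel representation and the 0.8 reliability threshold are assumptions chosen for clarity.

```python
def refine_labels(first_pass, second_pass):
    """Aggregate two labeling layers over superpixels.

    first_pass:  {superpixel_id: label} from the first-layer RF.
    second_pass: {superpixel_id: (label, reliability in [0, 1])}
                 from the second-layer structured RF.
    Keeps the first-layer label unless the second layer disagrees
    with high reliability, remedying implausible initial labels.
    """
    refined = {}
    for sp, label in first_pass.items():
        new_label, score = second_pass.get(sp, (label, 0.0))
        refined[sp] = new_label if score >= 0.8 else label
    return refined
```

The design choice here is asymmetric trust: the second layer only overrides the first when it is confident, so low-reliability predictions cannot degrade an already-plausible initial labeling.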
Degree: Master of Philosophy
Subject: Image processing - Digital techniques
Subject: 3-D video (Three-dimensional imaging)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/223051

 

DC Field: Value
dc.contributor.author: Liu, Tianrui
dc.contributor.author: 劉天瑞
dc.date.accessioned: 2016-02-17T23:14:40Z
dc.date.available: 2016-02-17T23:14:40Z
dc.date.issued: 2015
dc.identifier.citation: Liu, T. [劉天瑞]. (2015). Video parsing and camera pose estimation for 2D to 3D video conversion. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5699957
dc.identifier.uri: http://hdl.handle.net/10722/223051
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: Creative Commons: Attribution 3.0 Hong Kong License
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.subject.lcsh: Image processing - Digital techniques
dc.subject.lcsh: 3-D video (Three-dimensional imaging)
dc.title: Video parsing and camera pose estimation for 2D to 3D video conversion
dc.type: PG_Thesis
dc.identifier.hkul: b5699957
dc.description.thesisname: Master of Philosophy
dc.description.thesislevel: Master
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
