Postgraduate thesis: Video parsing and camera pose estimation for 2D to 3D video conversion

Title: Video parsing and camera pose estimation for 2D to 3D video conversion
Authors: Liu, Tianrui (劉天瑞)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation:
Liu, T. [劉天瑞]. (2015). Video parsing and camera pose estimation for 2D to 3D video conversion. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5699957
Abstract:
The increasing demand for 3D video content motivates the conversion of large amounts of 2D video into 3D formats. Because video content varies substantially, the performance of fully automatic conversion techniques is usually limited, so it is important to develop efficient semi-automatic techniques that ensure good conversion quality. The purpose of this thesis is to build a video analysis system suitable for use prior to the 2D-to-3D conversion process. The system aims to automatically summarize videos in order to reduce the manual effort required during conversion and, where possible, to facilitate depth assignment.

First, a shot boundary detection method is proposed to parse a video into its basic units, the shots. Based on a novel structure-aware histogram scheme and an adaptive double-threshold scheme, the proposed algorithm improves upon conventional methods. The structure-aware scheme integrates the structural similarity measure with local color histograms and hence significantly reduces false alarms caused by motion disturbances, while the adaptive double-threshold scheme makes the algorithm effective at detecting mixed types of shot boundaries. Once a video has been partitioned into shots, the keyframes of the shots are further summarized by grouping those with similar content: the keyframes are modeled as an undirected graph, and the normalized cuts algorithm is employed to recursively partition the graph into clusters.

Second, camera motion estimation is performed to examine the motion modality of the camera that captured each shot. Since the structure-from-motion (SfM) method for 3D reconstruction is generally restricted to videos containing translational camera motion, this part of the work contributes an automatic identification of the videos that fall within the regime of the SfM method. The camera estimation algorithm uses matched features and epipolar geometry constraints to incrementally compute the camera parameters for different views. Based on the camera estimation results, a method is proposed to further explore the distinguishing properties of sequences taken by a translating camera, so that the motion modality of the camera can be identified and the suitability of a video shot for the SfM method can be ensured.

Finally, a semantic scene analysis approach that can simultaneously segment and recognize the objects in a scene is proposed. The method is built on a two-layer random forest (RF) framework. In the first layer, an RF labels the image by assigning object classes to superpixels. A structured RF in the second layer predicts local labels together with reliability scores, which are aggregated with the initial labeling results. The proposed method achieves higher accuracy because inaccurate segmentations and implausible labelings from the first layer are remedied in the second. The semantic analysis can be used to distinguish static background regions from moving objects, and in this way it facilitates the propagation of depth from keyframes, for example depth assigned through a user interface.

(Illustrative code sketches of these components follow the record summary below.)
Degree: Master of Philosophy
Subject: Image processing - Digital techniques
         3-D video (Three-dimensional imaging)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/223051
HKU Library Item ID: b5699957
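The abstract describes a shot boundary detector that combines a structure-aware histogram measure with an adaptive double threshold. The thesis's exact features, weights, and thresholds are not given in this record, so the following Python sketch only illustrates the general idea under assumed choices: SSIM as the structural term, block-wise hue histograms as the color term, and placeholder values for alpha, t_high, and t_low.

    # Illustrative sketch only: blends a block-wise color-histogram distance with a
    # structural-similarity (SSIM) term, then applies a high/low double threshold.
    # All weights and thresholds below are placeholders, not the thesis's values.
    import cv2
    import numpy as np
    from skimage.metrics import structural_similarity

    def frame_dissimilarity(prev_bgr, curr_bgr, alpha=0.5):
        """Blend histogram distance with (1 - SSIM) so pure motion changes score lower."""
        prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
        ssim = structural_similarity(prev_gray, curr_gray)
        # Block-wise hue histograms, compared per block and averaged.
        hsv_prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV)
        hsv_curr = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV)
        h, w = prev_gray.shape
        dists = []
        for by in range(0, h, h // 4):
            for bx in range(0, w, w // 4):
                blk_p = hsv_prev[by:by + h // 4, bx:bx + w // 4]
                blk_c = hsv_curr[by:by + h // 4, bx:bx + w // 4]
                hp = cv2.calcHist([blk_p], [0], None, [16], [0, 180])
                hc = cv2.calcHist([blk_c], [0], None, [16], [0, 180])
                cv2.normalize(hp, hp)
                cv2.normalize(hc, hc)
                dists.append(cv2.compareHist(hp, hc, cv2.HISTCMP_BHATTACHARYYA))
        hist_dist = float(np.mean(dists))
        return alpha * hist_dist + (1.0 - alpha) * (1.0 - ssim)

    def detect_cuts(frames, t_high=0.6, t_low=0.3):
        """Double threshold: scores above t_high are declared cuts immediately;
        scores between t_low and t_high are kept as gradual-transition candidates."""
        cuts, candidates = [], []
        for i in range(1, len(frames)):
            d = frame_dissimilarity(frames[i - 1], frames[i])
            if d > t_high:
                cuts.append(i)
            elif d > t_low:
                candidates.append(i)  # could be verified later, e.g. by accumulating scores
        return cuts, candidates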
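Keyframe summarization is described as normalized-cuts partitioning of an undirected keyframe graph. As a stand-in for a recursive normalized-cuts implementation, the sketch below builds a histogram-based affinity matrix and applies scikit-learn's SpectralClustering; the affinity definition, the sigma value, and the fixed cluster count are assumptions, not details from the thesis.

    # Illustrative sketch only: groups keyframes with similar content by spectral
    # clustering on a keyframe-similarity graph.
    import cv2
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def keyframe_affinity(keyframes, sigma=0.2):
        """Build a dense affinity matrix from pairwise hue/saturation histogram distances."""
        hists = []
        for frame in keyframes:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            hists.append(hist)
        n = len(hists)
        affinity = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                d = cv2.compareHist(hists[i], hists[j], cv2.HISTCMP_BHATTACHARYYA)
                affinity[i, j] = np.exp(-(d ** 2) / (2 * sigma ** 2))
        return affinity

    def cluster_keyframes(keyframes, n_clusters=5):
        """Return one cluster label per keyframe."""
        affinity = keyframe_affinity(keyframes)
        model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                   assign_labels="discretize", random_state=0)
        return model.fit_predict(affinity)

Spectral clustering on a precomputed affinity matrix solves a relaxation closely related to the normalized cut objective, which is why it is used as the stand-in here.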
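For the camera motion estimation step, the abstract mentions matched features and epipolar geometry constraints. A minimal OpenCV sketch of that pipeline is shown below: ORB matching, essential-matrix estimation, and pose recovery, followed by a crude heuristic (little recovered rotation but appreciable feature displacement) to flag a shot as translational and hence suitable for SfM. The thresholds and the decision rule are placeholders, not the criterion proposed in the thesis.

    # Illustrative sketch only: relative pose between two frames from the essential
    # matrix, plus a crude translational-motion test. K is the camera intrinsic matrix.
    import cv2
    import numpy as np

    def relative_pose(img1, img2, K):
        """Match ORB features and recover (R, t) up to scale via the essential matrix."""
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:500]
        pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
        E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)  # RANSAC inlier mask ignored for brevity
        return R, t, pts1, pts2

    def is_translational(img1, img2, K, min_mean_displacement=2.0, max_rotation_deg=2.0):
        """Crude modality test: little pure rotation, yet noticeable feature displacement."""
        R, t, pts1, pts2 = relative_pose(img1, img2, K)
        rotation_deg = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
        displacement = np.linalg.norm(pts2 - pts1, axis=1).mean()
        return displacement > min_mean_displacement and rotation_deg < max_rotation_deg

Note that recoverPose returns the translation only up to scale, so the test above relies on image-space displacement rather than the magnitude of t.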
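The semantic scene analysis is a two-layer random forest framework whose first layer assigns object classes to superpixels. The sketch below covers only that first layer under assumed choices (SLIC superpixels, mean/std RGB features, a pre-trained scikit-learn RandomForestClassifier); the second-layer structured forest and the score aggregation are not reproduced.

    # Illustrative sketch only: first-layer superpixel labeling with a random forest.
    # The feature set is a minimal placeholder; the classifier is assumed pre-trained.
    import numpy as np
    from skimage.segmentation import slic
    from sklearn.ensemble import RandomForestClassifier

    def superpixel_features(image_rgb, n_segments=300):
        """Return (features, segment map): mean and std of RGB per superpixel."""
        segments = slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
        feats = []
        for s in range(segments.max() + 1):
            pix = image_rgb[segments == s].astype(np.float64)
            feats.append(np.concatenate([pix.mean(axis=0), pix.std(axis=0)]))
        return np.array(feats), segments

    def label_image(image_rgb, forest: RandomForestClassifier):
        """First layer: per-superpixel class labels plus a reliability score."""
        feats, segments = superpixel_features(image_rgb)
        proba = forest.predict_proba(feats)      # class posteriors per superpixel
        labels = proba.argmax(axis=1)
        reliability = proba.max(axis=1)          # would be merged with layer-two scores
        label_map = labels[segments]             # paint superpixel labels back to pixels
        return label_map, reliability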

 

DC Field | Value | Language
dc.contributor.author | Liu, Tianrui | -
dc.contributor.author | 劉天瑞 | -
dc.date.accessioned | 2016-02-17T23:14:40Z | -
dc.date.available | 2016-02-17T23:14:40Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Liu, T. [劉天瑞]. (2015). Video parsing and camera pose estimation for 2D to 3D video conversion. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5699957 | -
dc.identifier.uri | http://hdl.handle.net/10722/223051 | -
dc.description.abstract | (abstract as given above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Image processing - Digital techniques | -
dc.subject.lcsh | 3-D video (Three-dimensional imaging) | -
dc.title | Video parsing and camera pose estimation for 2D to 3D video conversion | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5699957 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5699957 | -
dc.identifier.mmsid | 991018969399703414 | -
