File Download
Supplementary

postgraduate thesis: START : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data

TitleSTART : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data
Authors
Issue Date2015
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Zhu, X. [朱信杰]. (2015). START : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5481898
AbstractSignal Track Analytical Research Tool (START), is a parallel system for analyzing large-scale genomic data. Currently, genomic data analyses are usually performed by using custom scripts developed by individual research groups, and/or by the integrated use of multiple existing tools (such as BEDTools and Galaxy). The goals of START are 1) to provide a single tool that supports a wide spectrum of genomic data analyses that are commonly done by analysts; and 2) to greatly simplify these analysis tasks by means of a simple declarative language (STQL) with which users only need to specify what they want to do, rather than the detailed computational steps as to how the analysis task should be performed. START consists of four major components: 1) A declarative language called Signal Track Query Language (STQL), which is a SQL-like language we specifically designed to suit the needs for analyzing genomic signal tracks. 2) A STQL processing system built on top of a large-scale distributed architecture. The system is based on the Hadoop distributed storage and the MapReduce Big Data processing framework. It processes each user query using multiple machines in parallel. 3) A simple and user-friendly web site that helps users construct and execute queries, upload/download compressed data files in various formats, man-age stored data, queries and analysis results, and share queries with other users. It also provides a complete help system, detailed specification of STQL, and a large number of sample queries for users to learn STQL and try START easily. Private files and queries are not accessible by other users. 4) A repository of public data popularly used for large-scale genomic data analysis, including data from ENCODE and Roadmap Epigenomics, that users can use in their analyses.
DegreeDoctor of Philosophy
SubjectGenomics - Data processing
Parallel programming (Computer science)
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/211136

 

DC FieldValueLanguage
dc.contributor.authorZhu, Xinjie-
dc.contributor.author朱信杰-
dc.date.accessioned2015-07-07T23:10:45Z-
dc.date.available2015-07-07T23:10:45Z-
dc.date.issued2015-
dc.identifier.citationZhu, X. [朱信杰]. (2015). START : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5481898-
dc.identifier.urihttp://hdl.handle.net/10722/211136-
dc.description.abstractSignal Track Analytical Research Tool (START), is a parallel system for analyzing large-scale genomic data. Currently, genomic data analyses are usually performed by using custom scripts developed by individual research groups, and/or by the integrated use of multiple existing tools (such as BEDTools and Galaxy). The goals of START are 1) to provide a single tool that supports a wide spectrum of genomic data analyses that are commonly done by analysts; and 2) to greatly simplify these analysis tasks by means of a simple declarative language (STQL) with which users only need to specify what they want to do, rather than the detailed computational steps as to how the analysis task should be performed. START consists of four major components: 1) A declarative language called Signal Track Query Language (STQL), which is a SQL-like language we specifically designed to suit the needs for analyzing genomic signal tracks. 2) A STQL processing system built on top of a large-scale distributed architecture. The system is based on the Hadoop distributed storage and the MapReduce Big Data processing framework. It processes each user query using multiple machines in parallel. 3) A simple and user-friendly web site that helps users construct and execute queries, upload/download compressed data files in various formats, man-age stored data, queries and analysis results, and share queries with other users. It also provides a complete help system, detailed specification of STQL, and a large number of sample queries for users to learn STQL and try START easily. Private files and queries are not accessible by other users. 4) A repository of public data popularly used for large-scale genomic data analysis, including data from ENCODE and Roadmap Epigenomics, that users can use in their analyses.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsCreative Commons: Attribution 3.0 Hong Kong License-
dc.subject.lcshGenomics - Data processing-
dc.subject.lcshParallel programming (Computer science)-
dc.titleSTART : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data-
dc.typePG_Thesis-
dc.identifier.hkulb5481898-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats