Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence.

Cheung, J; Estivill, X; Khaja, R; MacDonald, JR; Lau, K; Tsui, LC; Scherer, SW


Publisher Website: 10.1186/gb-2003-4-4-r25
Scopus: eid_2-s2.0-0037837485
PMID: 12702206
WOS: WOS:000182696200007

Bookmarks:
- CiteULike: 3
Citations:
- Scopus: 194
- Web of Science: 0
- PubMed Central: 94
Appears in Collections:
- President's Office: Journal/Magazine Articles

Title	Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence.
Authors	Cheung, J; Estivill, X; Khaja, R; MacDonald, JR; Lau, K; Tsui, LC; Scherer, SW
Issue Date	2003
Publisher	BioMed Central Ltd.
Citation	Genome Biology, 2003, v. 4 n. 4, p. R25. DOI: http://dx.doi.org/10.1186/gb-2003-4-4-r25
Abstract	BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.
Persistent Identifier	http://hdl.handle.net/10722/43556
ISSN	1465-6914
PubMed Central ID	PMC154576
ISI Accession Number ID	WOS:000182696200007
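
The percentages quoted in the abstract follow directly from the stated counts, and the detection criteria amount to a simple threshold test. The short Python sketch below is illustrative only and is not the authors' pipeline: the function names, the example alignments, and the reading of the criteria as "at least 5 kb and at least 90% identity" are assumptions made for this example.

```python
def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to `digits` places."""
    return round(100.0 * part / whole, digits)


# Figures quoted in the abstract (June 2002 public assembly).
ASSEMBLY_MB = 3043.1
duplicated_pct = percent(107.4, ASSEMBLY_MB)           # segmental duplications
misassigned_pct = percent(38.9, ASSEMBLY_MB)           # potential misassignments
psv_pct = percent(199_965, 2_327_473, digits=1)        # potential paralogous variants


def passes_thresholds(length_bp, identity_pct,
                      min_length_bp=5_000, min_identity_pct=90.0):
    """Assumed reading of the abstract's criteria: >= 5 kb and >= 90% identity."""
    return length_bp >= min_length_bp and identity_pct >= min_identity_pct


# Hypothetical alignments as (length in bp, percent identity); not data from the paper.
example_hits = [(12_000, 97.5), (4_800, 99.0), (6_500, 88.2)]
kept = [hit for hit in example_hits if passes_thresholds(*hit)]

print(duplicated_pct, misassigned_pct, psv_pct)  # 3.53 1.28 8.6
print(kept)                                      # [(12000, 97.5)]
```

Only the first hypothetical alignment is kept: the second is shorter than 5 kb and the third falls below 90% identity.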

DC Field	Value	Language
dc.contributor.author	Cheung, J	en_HK
dc.contributor.author	Estivill, X	en_HK
dc.contributor.author	Khaja, R	en_HK
dc.contributor.author	MacDonald, JR	en_HK
dc.contributor.author	Lau, K	en_HK
dc.contributor.author	Tsui, LC	en_HK
dc.contributor.author	Scherer, SW	en_HK
dc.date.accessioned	2007-03-23T04:48:55Z	-
dc.date.available	2007-03-23T04:48:55Z	-
dc.date.issued	2003	en_HK
dc.identifier.citation	Genome Biology, 2003, v. 4 n. 4, p. R25	en_HK
dc.identifier.issn	1465-6914	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/43556	-
dc.description.abstract	BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.	en_HK
dc.format.extent	856152 bytes	-
dc.format.extent	25088 bytes	-
dc.format.mimetype	application/pdf	-
dc.format.mimetype	application/msword	-
dc.language	eng	en_HK
dc.publisher	BioMed Central Ltd.	en_HK
dc.relation.ispartof	Genome biology	en_HK
dc.subject.mesh	Artifacts	en_HK
dc.subject.mesh	Chromosomes, human	en_HK
dc.subject.mesh	Computational biology	en_HK
dc.subject.mesh	Gene duplication	en_HK
dc.subject.mesh	Genetic diseases, inborn - genetics	en_HK
dc.title	Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence.	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1465-6906&volume=4&issue=4&spage=R25:1&epage=10&date=2003&atitle=Genome-wide+detection+of+segmental+duplications+and+potential+assembly+errors+in+the+human+genome+sequence	en_HK
dc.identifier.email	Tsui, LC: tsuilc@hkucc.hku.hk	en_HK
dc.identifier.authority	Tsui, LC=rp00058	en_HK
dc.description.nature	published_or_final_version	en_HK
dc.identifier.doi	10.1186/gb-2003-4-4-r25	en_HK
dc.identifier.pmid	12702206	-
dc.identifier.pmcid	PMC154576	-
dc.identifier.scopus	eid_2-s2.0-0037837485	en_HK
dc.identifier.volume	4	en_HK
dc.identifier.issue	4	en_HK
dc.identifier.spage	R25	en_HK
dc.identifier.epage	R25	en_HK
dc.identifier.isi	WOS:000182696200007	-
dc.identifier.scopusauthorid	Cheung, J=7202072292	en_HK
dc.identifier.scopusauthorid	Estivill, X=36047834200	en_HK
dc.identifier.scopusauthorid	Khaja, R=7801610375	en_HK
dc.identifier.scopusauthorid	MacDonald, JR=7401439417	en_HK
dc.identifier.scopusauthorid	Lau, K=36722697000	en_HK
dc.identifier.scopusauthorid	Tsui, LC=7102754167	en_HK
dc.identifier.scopusauthorid	Scherer, SW=35374654500	en_HK
dc.identifier.citeulike	838938	-
dc.identifier.issnl	1465-6906	-
