A space and time efficient algorithm for constructing compressed suffix arrays

Hon, WK; Lam, TW; Sadakane, K; Sung, WK; Yiu, SM

Publisher Website: 10.1007/s00453-006-1228-8
Scopus: eid_2-s2.0-34547375123
WOS: WOS:000246152800002

Bookmarks:
- CiteULike: 3
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: A space and time efficient algorithm for constructing compressed suffix arrays

Title	A space and time efficient algorithm for constructing compressed suffix arrays
Authors	Hon, WK; Lam, TW; Sadakane, K; Sung, WK; Yiu, SM
Keywords	Compression; Construction; Pattern matching; Text indexing
Issue Date	2007
Publisher	Springer New York LLC. The Journal's web site is located at http://link.springer.de/link/service/journals/00453/index.htm
Citation	Algorithmica (New York), 2007, v. 48 n. 1, p. 23-36. DOI: http://dx.doi.org/10.1007/s00453-006-1228-8
Abstract	With the first human DNA being decoded into a sequence of about 2.8 billion characters, much biological research has been centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in the main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still not an easy task because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, and the time complexity is O(n log n). Our construction algorithm is also time and space efficient for texts with large alphabets such as Chinese or Japanese. Precisely, when the alphabet size is |Σ|, the working space is O(n log |Σ|) bits, and the time complexity remains O(n log n), which is independent of |Σ|. © Springer 2007.
Persistent Identifier	http://hdl.handle.net/10722/88951
ISSN	0178-4617
2023 Impact Factor	0.9
2023 SCImago Journal Rankings	0.905
ISI Accession Number ID	WOS:000246152800002
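
To make the abstract's space argument concrete, here is a minimal Python sketch of a plain (uncompressed) suffix array, the intermediate structure the paper aims to avoid building. It is not the authors' construction algorithm: it uses a naive sort, and the names `build_suffix_array`, `locate`, and `plain_suffix_array_bits` are illustrative assumptions. It shows how a pattern is located by binary search over the sorted suffixes, and why storing n text positions of about log n bits each costs O(n log n) bits.

```python
def build_suffix_array(text):
    """Naive construction (illustration only, not the paper's method):
    sort all suffix start positions by the suffix that begins there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate(text, sa, pattern):
    """Return the sorted start positions of `pattern` in `text`.

    The suffixes are in lexicographic order, so their length-m prefixes
    are non-decreasing and two binary searches bracket all matches.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def plain_suffix_array_bits(n):
    """Bits needed to store n positions in the range 0..n-1,
    i.e. n entries of ceil(log2 n) bits each: the O(n log n) term."""
    bits_per_entry = max(1, (n - 1).bit_length())
    return n * bits_per_entry


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                   # [5, 3, 1, 0, 4, 2]
    print(locate(text, sa, "ana"))              # [1, 3]
    print(plain_suffix_array_bits(len(text)))   # 18
```

For a text of 2.8 billion characters each entry needs 32 bits under this accounting, whereas a DNA text over a four-letter alphabet needs only 2 bits per character; that gap is what an O(n)-bit index and an O(n)-bit construction are meant to close.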

DC Field	Value	Language
dc.contributor.author	Hon, WK	en_HK
dc.contributor.author	Lam, TW	en_HK
dc.contributor.author	Sadakane, K	en_HK
dc.contributor.author	Sung, WK	en_HK
dc.contributor.author	Yiu, SM	en_HK
dc.date.accessioned	2010-09-06T09:50:31Z	-
dc.date.available	2010-09-06T09:50:31Z	-
dc.date.issued	2007	en_HK
dc.identifier.citation	Algorithmica (New York), 2007, v. 48 n. 1, p. 23-36	en_HK
dc.identifier.issn	0178-4617	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/88951	-
dc.description.abstract	With the first human DNA being decoded into a sequence of about 2.8 billion characters, much biological research has been centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in the main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still not an easy task because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, and the time complexity is O(n log n). Our construction algorithm is also time and space efficient for texts with large alphabets such as Chinese or Japanese. Precisely, when the alphabet size is |Σ|, the working space is O(n log |Σ|) bits, and the time complexity remains O(n log n), which is independent of |Σ|. © Springer 2007.	en_HK
dc.language	eng	en_HK
dc.publisher	Springer New York LLC. The Journal's web site is located at http://link.springer.de/link/service/journals/00453/index.htm	en_HK
dc.relation.ispartof	Algorithmica (New York)	en_HK
dc.subject	Compression	en_HK
dc.subject	Construction	en_HK
dc.subject	Pattern matching	en_HK
dc.subject	Text indexing	en_HK
dc.title	A space and time efficient algorithm for constructing compressed suffix arrays	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0178-4617&volume=48:1&spage=23&epage=36&date=2007&atitle=A+Space+and+Time+Efficient+Algorithm+for+Constructing+Compressed+Suffix+Arrays+	en_HK
dc.identifier.email	Lam, TW:twlam@cs.hku.hk	en_HK
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	en_HK
dc.identifier.authority	Lam, TW=rp00135	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/s00453-006-1228-8	en_HK
dc.identifier.scopus	eid_2-s2.0-34547375123	en_HK
dc.identifier.hkuros	128171	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-34547375123&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	48	en_HK
dc.identifier.issue	1	en_HK
dc.identifier.spage	23	en_HK
dc.identifier.epage	36	en_HK
dc.identifier.isi	WOS:000246152800002	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Hon, WK=7004282818	en_HK
dc.identifier.scopusauthorid	Lam, TW=7202523165	en_HK
dc.identifier.scopusauthorid	Sadakane, K=7005716583	en_HK
dc.identifier.scopusauthorid	Sung, WK=13310059700	en_HK
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_HK
dc.identifier.citeulike	4736346	-
dc.identifier.issnl	0178-4617	-
