File Download
There are no files associated with this item.
Supplementary

Citations:
 Scopus: 0
 Appears in Collections:
Article: Constructing compressed suffix arrays with large alphabets
Title  Constructing compressed suffix arrays with large alphabets 

Authors  
Issue Date  2003 
Publisher  Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ 
Citation  Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2003, v. 2906, p. 240249 How to Cite? 
Abstract  Recent research in compressing suffix arrays has resulted in two breakthrough indexing data structures, namely, compressed suffix arrays (CSA) [7] and FMindex [5]. Either of them makes it feasible to store a fulltext index in the main memory even for a piece of text data with a few billion characters (such as human DNA). However, constructing such indexing data structures with limited working memory (i.e., without constructing suffix arrays) is not a trivial task. This paper addresses this problem. Currently, only CSA admits a spaceefficient construction algorithm [15]. For a text T of length n over an alphabet Ó, this algorithm requires O(∑n log n) time and (2Ho + 1 + ε)n bits of working space, where Ho is the 0th order empirical entropy of T and ε is any nonzero constant. This algorithm is good enough when the alphabet size ∑ is small. It is not practical for text data containing protein, Chinese or Japanese, where the alphabet may include up to a few thousand characters. The main contribution of this paper is a new algorithm which can construct CSA in O(n log n) time using (Ho + 2 + ε)n bits of working space. Note that the running time of our algorithm is independent of the alphabet size and the space requirement is smaller as it is likely that Ho > 1. This paper also makes contribution to the spaceefficient construction of FMindex. We show that FMindex can indeed be constructed from CSA directly in O(n) time. © SpringerVerlag Berlin Heidelberg 2003. 
Persistent Identifier  http://hdl.handle.net/10722/93476 
ISSN  2013 SCImago Journal Rankings: 0.310 
References 
Author Affiliations
 Kyushu University
 The University of Hong Kong
 National University of Singapore
DC Field  Value  Language 

dc.contributor.author  Hon, WK  en_HK 
dc.contributor.author  Lam, TW  en_HK 
dc.contributor.author  Sadakane, K  en_HK 
dc.contributor.author  Sung, WK  en_HK 
dc.date.accessioned  20100925T15:02:19Z   
dc.date.available  20100925T15:02:19Z   
dc.date.issued  2003  en_HK 
dc.identifier.citation  Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2003, v. 2906, p. 240249  en_HK 
dc.identifier.issn  03029743  en_HK 
dc.identifier.uri  http://hdl.handle.net/10722/93476   
dc.description.abstract  Recent research in compressing suffix arrays has resulted in two breakthrough indexing data structures, namely, compressed suffix arrays (CSA) [7] and FMindex [5]. Either of them makes it feasible to store a fulltext index in the main memory even for a piece of text data with a few billion characters (such as human DNA). However, constructing such indexing data structures with limited working memory (i.e., without constructing suffix arrays) is not a trivial task. This paper addresses this problem. Currently, only CSA admits a spaceefficient construction algorithm [15]. For a text T of length n over an alphabet Ó, this algorithm requires O(∑n log n) time and (2Ho + 1 + ε)n bits of working space, where Ho is the 0th order empirical entropy of T and ε is any nonzero constant. This algorithm is good enough when the alphabet size ∑ is small. It is not practical for text data containing protein, Chinese or Japanese, where the alphabet may include up to a few thousand characters. The main contribution of this paper is a new algorithm which can construct CSA in O(n log n) time using (Ho + 2 + ε)n bits of working space. Note that the running time of our algorithm is independent of the alphabet size and the space requirement is smaller as it is likely that Ho > 1. This paper also makes contribution to the spaceefficient construction of FMindex. We show that FMindex can indeed be constructed from CSA directly in O(n) time. © SpringerVerlag Berlin Heidelberg 2003.  en_HK 
dc.language  eng  en_HK 
dc.publisher  Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/  en_HK 
dc.relation.ispartof  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)  en_HK 
dc.title  Constructing compressed suffix arrays with large alphabets  en_HK 
dc.type  Article  en_HK 
dc.identifier.email  Lam, TW:twlam@cs.hku.hk  en_HK 
dc.identifier.authority  Lam, TW=rp00135  en_HK 
dc.description.nature  link_to_subscribed_fulltext   
dc.identifier.scopus  eid_2s2.035248823623  en_HK 
dc.identifier.hkuros  91576  en_HK 
dc.relation.references  http://www.scopus.com/mlt/select.url?eid=2s2.035248823623&selection=ref&src=s&origin=recordpage  en_HK 
dc.identifier.volume  2906  en_HK 
dc.identifier.spage  240  en_HK 
dc.identifier.epage  249  en_HK 
dc.publisher.place  Germany  en_HK 
dc.identifier.scopusauthorid  Hon, WK=7004282818  en_HK 
dc.identifier.scopusauthorid  Lam, TW=7202523165  en_HK 
dc.identifier.scopusauthorid  Sadakane, K=7005716583  en_HK 
dc.identifier.scopusauthorid  Sung, WK=13310059700  en_HK 