Conference Paper: Similarity search in sets and categorical data using the signature tree

Title: Similarity search in sets and categorical data using the signature tree
Authors: Mamoulis, N; Cheung, DW; Lian, W
Issue Date: 2003
Citation: Proceedings - International Conference On Data Engineering, 2003, p. 75-86
Abstract: Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification, but similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.
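The abstract's core idea, representing sets as bitmaps and grouping them under a hierarchical index, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`make_signature`, `hamming_distance`, `cover`, `min_distance_to_group`), the use of Hamming distance as the similarity measure, and the direct item-id-to-bit mapping are all assumptions made for illustration. The sketch only shows why OR-ing the signatures of a group gives a bitmap that can bound the distance from a query to every member of that group, which is the kind of property a tree index could use to skip whole subtrees.

```python
from typing import Iterable


def make_signature(items: Iterable[int], num_bits: int) -> int:
    """Encode a set of item ids as a bitmap stored in a Python int.

    Assumption: item ids are integers in [0, num_bits), one bit per item.
    """
    sig = 0
    for item in items:
        if not 0 <= item < num_bits:
            raise ValueError(f"item id {item} is outside [0, {num_bits})")
        sig |= 1 << item
    return sig


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def cover(signatures: Iterable[int]) -> int:
    """Bitwise OR of a group of signatures (a hypothetical index-node entry)."""
    result = 0
    for sig in signatures:
        result |= sig
    return result


def min_distance_to_group(query: int, group_cover: int) -> int:
    """Lower bound on the Hamming distance from query to any group member.

    Every member is a subset of the cover, so each query bit that the cover
    lacks is missing from every member and adds at least 1 to the distance.
    """
    return bin(query & ~group_cover).count("1")


if __name__ == "__main__":
    a = make_signature({0, 2, 5}, num_bits=8)
    b = make_signature({2, 3}, num_bits=8)
    q = make_signature({1, 2, 7}, num_bits=8)
    group = cover([a, b])
    print(hamming_distance(q, a))           # 4
    print(hamming_distance(q, b))           # 3
    print(min_distance_to_group(q, group))  # 2, never above the true distances
```

In this sketch, a search could discard a group whenever the bound already exceeds the best distance found so far. How the actual signature tree chooses groups and which bounds it uses is described in the paper, not here.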
Persistent Identifier: http://hdl.handle.net/10722/151844

DC Field | Value | Language
dc.contributor.author | Mamoulis, N | en_US
dc.contributor.author | Cheung, DW | en_US
dc.contributor.author | Lian, W | en_US
dc.date.accessioned | 2012-06-26T06:30:01Z | -
dc.date.available | 2012-06-26T06:30:01Z | -
dc.date.issued | 2003 | en_US
dc.identifier.citation | Proceedings - International Conference On Data Engineering, 2003, p. 75-86 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/151844 | -
dc.language | eng | en_US
dc.relation.ispartof | Proceedings - International Conference on Data Engineering | en_US
dc.title | Similarity search in sets and categorical data using the signature tree | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_US
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_US
dc.identifier.authority | Mamoulis, N=rp00155 | en_US
dc.identifier.authority | Cheung, DW=rp00101 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.scopus | eid_2-s2.0-0344496711 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0344496711&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.spage | 75 | en_US
dc.identifier.epage | 86 | en_US
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_US
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_US
dc.identifier.scopusauthorid | Lian, W=22433603900 | en_US
