Conference Paper: Similarity search in sets and categorical data using the signature tree
Title | Similarity search in sets and categorical data using the signature tree |
---|---|
Authors | Mamoulis, N; Cheung, DW; Lian, W |
Issue Date | 2003 |
Citation | Proceedings - International Conference On Data Engineering, 2003, p. 75-86 |
Abstract | Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification, but similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries. |
Persistent Identifier | http://hdl.handle.net/10722/151844 |
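The abstract's central idea is that sets are encoded as fixed-length bitmaps (signatures) and grouped under summarising signatures so that similarity queries can skip whole groups. The small Python sketch below illustrates that idea under stated assumptions. It is not the authors' signature tree. The item domain, the one-bit-per-item encoding, the use of Hamming distance, and the rule that a node's signature is the bitwise OR of its children's are all assumptions made here for illustration. All helper names are hypothetical. The "index" is a single flat level of groups, not a dynamic, balanced tree.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple


def signature(items: Iterable[str], domain: list[str]) -> int:
    """Encode a set as a bitmap: bit i is 1 iff domain[i] is in the set.

    Assumes a fixed, hypothetical item domain; unknown items raise KeyError.
    """
    index = {item: i for i, item in enumerate(domain)}
    sig = 0
    for item in items:
        sig |= 1 << index[item]
    return sig


def popcount(x: int) -> int:
    """Number of 1-bits in a non-negative int."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Differing bits = size of the symmetric difference of the two sets."""
    return popcount(a ^ b)


def jaccard(a: int, b: int) -> float:
    """|A & B| / |A | B|, taken as 1.0 for two empty sets."""
    union = popcount(a | b)
    return 1.0 if union == 0 else popcount(a & b) / union


def knn_scan(query: int, sigs: list[int], k: int) -> list[int]:
    """Linear-scan baseline: positions of the k signatures nearest to query."""
    return sorted(range(len(sigs)), key=lambda i: (hamming(query, sigs[i]), i))[:k]


def node_signature(child_sigs: Iterable[int]) -> int:
    """Summarise a group of signatures by OR-ing them (assumed node layout)."""
    node = 0
    for s in child_sigs:
        node |= s
    return node


def min_dist(query: int, node_sig: int) -> int:
    """Lower bound on hamming(query, c) for every child c of the node.

    Each child's bits are a subset of node_sig, so every query bit that lies
    outside node_sig differs from every child.
    """
    return popcount(query & ~node_sig)


def nn_grouped(query: int, groups: list[list[int]]) -> Tuple[Optional[Tuple[int, int]], float]:
    """1-NN over a one-level grouping, skipping groups whose bound cannot win."""
    bounds = sorted((min_dist(query, node_signature(g)), gi) for gi, g in enumerate(groups))
    best: Optional[Tuple[int, int]] = None
    best_d: float = float("inf")
    for lb, gi in bounds:
        if lb >= best_d:
            break  # bounds are sorted, so no remaining group can do better
        for ci, s in enumerate(groups[gi]):
            d = hamming(query, s)
            if d < best_d:
                best, best_d = (gi, ci), d
    return best, best_d


if __name__ == "__main__":
    domain = ["a", "b", "c", "d", "e"]
    q = signature({"a", "b"}, domain)
    data = [signature(s, domain) for s in ({"c", "d"}, {"a", "b", "c"}, {"a", "b"})]
    print(knn_scan(q, data, 2))  # [2, 1]
```

Because each child signature is contained in the OR-ed node signature, `min_dist` never overestimates the distance to any child. A group can therefore be skipped once its bound reaches the best distance found so far without losing the true nearest neighbour. The actual structure described in the paper is a dynamic hierarchical index. This sketch only mirrors the pruning idea on one level.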
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Mamoulis, N | en_US |
dc.contributor.author | Cheung, DW | en_US |
dc.contributor.author | Lian, W | en_US |
dc.date.accessioned | 2012-06-26T06:30:01Z | - |
dc.date.available | 2012-06-26T06:30:01Z | - |
dc.date.issued | 2003 | en_US |
dc.identifier.citation | Proceedings - International Conference On Data Engineering, 2003, p. 75-86 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/151844 | - |
dc.language | eng | en_US |
dc.relation.ispartof | Proceedings - International Conference on Data Engineering | en_US |
dc.title | Similarity search in sets and categorical data using the signature tree | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_US |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_US |
dc.identifier.authority | Mamoulis, N=rp00155 | en_US |
dc.identifier.authority | Cheung, DW=rp00101 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.scopus | eid_2-s2.0-0344496711 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0344496711&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.spage | 75 | en_US |
dc.identifier.epage | 86 | en_US |
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_US |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_US |
dc.identifier.scopusauthorid | Lian, W=22433603900 | en_US |