Article: Discovering minimal infrequent structures from XML documents
Title | Discovering minimal infrequent structures from XML documents |
---|---|
Authors | Lian, W; Mamoulis, N; Cheung, DW; Yiu, SM |
Issue Date | 2004 |
Publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ |
Citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2004, v. 3306, p. 291-302 |
Abstract | More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree structured) XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms were proposed to mine XML documents. These algorithms mainly focus on mining frequent tree structures from XML documents. However, none of them was designed for mining infrequent structures, which are also important in many applications, such as query processing and identification of exceptional cases. In this paper, we consider the problem of identifying infrequent tree structures from XML documents. Intuitively, if a tree structure is infrequent, all tree structures that contain it as a subtree are also infrequent. So, we propose to consider the minimal infrequent structure (MIS), which is an infrequent structure all of whose proper subtrees are frequent. We also derive a level-wise mining algorithm that makes use of the SG-tree (signature tree) and some effective pruning techniques to efficiently discover all MIS. We validate the efficiency and feasibility of our methods through experiments on both synthetic and real data. © Springer-Verlag 2004. |
Persistent Identifier | http://hdl.handle.net/10722/93190 |
ISSN | 0302-9743 (2023 SCImago Journal Rankings: 0.606) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lian, W | en_HK |
dc.contributor.author | Mamoulis, N | en_HK |
dc.contributor.author | Cheung, DW | en_HK |
dc.contributor.author | Yiu, SM | en_HK |
dc.date.accessioned | 2010-09-25T14:53:37Z | - |
dc.date.available | 2010-09-25T14:53:37Z | - |
dc.date.issued | 2004 | en_HK |
dc.identifier.citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2004, v. 3306, p. 291-302 | en_HK |
dc.identifier.issn | 0302-9743 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/93190 | - |
dc.description.abstract | More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree structured) XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms were proposed to mine XML documents. These algorithms mainly focus on mining frequent tree structures from XML documents. However, none of them was designed for mining infrequent structures, which are also important in many applications, such as query processing and identification of exceptional cases. In this paper, we consider the problem of identifying infrequent tree structures from XML documents. Intuitively, if a tree structure is infrequent, all tree structures that contain it as a subtree are also infrequent. So, we propose to consider the minimal infrequent structure (MIS), which is an infrequent structure all of whose proper subtrees are frequent. We also derive a level-wise mining algorithm that makes use of the SG-tree (signature tree) and some effective pruning techniques to efficiently discover all MIS. We validate the efficiency and feasibility of our methods through experiments on both synthetic and real data. © Springer-Verlag 2004. | en_HK |
dc.language | eng | en_HK |
dc.publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ | en_HK |
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_HK |
dc.title | Discovering minimal infrequent structures from XML documents | en_HK |
dc.type | Article | en_HK |
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_HK |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK |
dc.identifier.email | Yiu, SM:smyiu@cs.hku.hk | en_HK |
dc.identifier.authority | Mamoulis, N=rp00155 | en_HK |
dc.identifier.authority | Cheung, DW=rp00101 | en_HK |
dc.identifier.authority | Yiu, SM=rp00207 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-35048823909 | en_HK |
dc.identifier.hkuros | 103245 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-35048823909&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 3306 | en_HK |
dc.identifier.spage | 291 | en_HK |
dc.identifier.epage | 302 | en_HK |
dc.publisher.place | Germany | en_HK |
dc.identifier.scopusauthorid | Lian, W=22433603900 | en_HK |
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_HK |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK |
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK |
dc.identifier.issnl | 0302-9743 | - |