Discovering minimal infrequent structures from XML documents

Lian, W; Mamoulis, N; Cheung, DW; Yiu, SM

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Discovering minimal infrequent structures from XML documents

Title	Discovering minimal infrequent structures from XML documents
Authors	Lian, W; Mamoulis, N; Cheung, DW; Yiu, SM
Issue Date	2004
Publisher	Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/
Citation	Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2004, v. 3306, p. 291-302
Abstract	More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree-structured) nature of XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms have been proposed to mine XML documents. These algorithms mainly focus on mining frequent tree structures from XML documents. However, none of them was designed for mining infrequent structures, which are also important in many applications, such as query processing and identification of exceptional cases. In this paper, we consider the problem of identifying infrequent tree structures from XML documents. Intuitively, if a tree structure is infrequent, all tree structures that contain it as a subtree are also infrequent. So, we propose to consider the minimal infrequent structure (MIS), which is an infrequent structure all of whose proper subtrees are frequent. We also derive a level-wise mining algorithm that makes use of the SG-tree (signature tree) and some effective pruning techniques to efficiently discover all MIS. We validate the efficiency and feasibility of our methods through experiments on both synthetic and real data. © Springer-Verlag 2004.
Persistent Identifier	http://hdl.handle.net/10722/93190
ISSN	0302-9743
2023 SCImago Journal Rankings	0.606
References	References in Scopus
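
The definition of a minimal infrequent structure in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm: as a simplifying assumption, each document is reduced to a flat set of parent/child edge labels rather than a tree, so "contains" becomes a subset test, and the SG-tree and the pruning techniques are omitted. The names `minimal_infrequent_sets`, `docs` and `min_support`, and the sample edge labels, are hypothetical.

```python
from itertools import combinations


def minimal_infrequent_sets(docs, min_support):
    """Level-wise search for minimal infrequent edge sets.

    docs: list of sets of edge labels (a flat stand-in for XML trees).
    min_support: fraction of documents a set must occur in to be frequent.
    Returns a list of frozensets that are infrequent while every proper
    subset is frequent.
    """
    if not docs:
        return []
    n = len(docs)

    def is_frequent(candidate):
        # Support = share of documents that contain every edge of the candidate.
        return sum(1 for d in docs if candidate <= d) / n >= min_support

    minimal_infrequent = []
    frequent = set()

    # Level 1: single edges. Their only proper subset is the empty set,
    # so an infrequent single edge is minimal by definition.
    for edge in sorted({e for d in docs for e in d}):
        single = frozenset([edge])
        if is_frequent(single):
            frequent.add(single)
        else:
            minimal_infrequent.append(single)

    k = 2
    while frequent:
        # Build size-k candidates by joining frequent sets of size k - 1.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        next_frequent = set()
        for cand in candidates:
            # Keep only candidates whose (k - 1)-subsets are all frequent;
            # by anti-monotonicity all smaller subsets are then frequent too.
            if all(frozenset(s) in frequent for s in combinations(cand, k - 1)):
                if is_frequent(cand):
                    next_frequent.add(cand)
                else:
                    minimal_infrequent.append(cand)
        frequent = next_frequent
        k += 1
    return minimal_infrequent


# Hypothetical sample data: four documents described by their edges.
docs = [
    {"book/title", "book/author"},
    {"book/title", "book/author"},
    {"book/title", "book/price"},
    {"book/author", "book/price"},
]
result = minimal_infrequent_sets(docs, min_support=0.5)
```

With this sample data every single edge occurs in at least half of the documents, so the minimal infrequent sets are the two pairs that combine `book/price` with one of the other edges. Supersets of those pairs are never generated, which mirrors the observation in the abstract that anything containing an infrequent structure is itself infrequent.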

DC Field	Value	Language
dc.contributor.author	Lian, W	en_HK
dc.contributor.author	Mamoulis, N	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.contributor.author	Yiu, SM	en_HK
dc.date.accessioned	2010-09-25T14:53:37Z	-
dc.date.available	2010-09-25T14:53:37Z	-
dc.date.issued	2004	en_HK
dc.identifier.citation	Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2004, v. 3306, p. 291-302	en_HK
dc.identifier.issn	0302-9743	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/93190	-
dc.description.abstract	More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree-structured) nature of XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms have been proposed to mine XML documents. These algorithms mainly focus on mining frequent tree structures from XML documents. However, none of them was designed for mining infrequent structures, which are also important in many applications, such as query processing and identification of exceptional cases. In this paper, we consider the problem of identifying infrequent tree structures from XML documents. Intuitively, if a tree structure is infrequent, all tree structures that contain it as a subtree are also infrequent. So, we propose to consider the minimal infrequent structure (MIS), which is an infrequent structure all of whose proper subtrees are frequent. We also derive a level-wise mining algorithm that makes use of the SG-tree (signature tree) and some effective pruning techniques to efficiently discover all MIS. We validate the efficiency and feasibility of our methods through experiments on both synthetic and real data. © Springer-Verlag 2004.	en_HK
dc.language	eng	en_HK
dc.publisher	Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/	en_HK
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_HK
dc.title	Discovering minimal infrequent structures from XML documents	en_HK
dc.type	Article	en_HK
dc.identifier.email	Mamoulis, N:nikos@cs.hku.hk	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	en_HK
dc.identifier.authority	Mamoulis, N=rp00155	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-35048823909	en_HK
dc.identifier.hkuros	103245	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-35048823909&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	3306	en_HK
dc.identifier.spage	291	en_HK
dc.identifier.epage	302	en_HK
dc.publisher.place	Germany	en_HK
dc.identifier.scopusauthorid	Lian, W=22433603900	en_HK
dc.identifier.scopusauthorid	Mamoulis, N=6701782749	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_HK
dc.identifier.issnl	0302-9743	-
