Input validation for semi-supervised clustering

Yip, KY; Ng, MK; Cheung, DW

File Download

There are no files associated with this item.

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/ICDMW.2006.101
Scopus: eid_2-s2.0-78449278494


Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers
- Mathematics: Conference papers

Conference Paper: Input validation for semi-supervised clustering

Title	Input validation for semi-supervised clustering
Authors	Yip, KY; Ng, MK; Cheung, DW
Issue Date	2006
Citation	Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, pp. 479-483. DOI: http://dx.doi.org/10.1109/ICDMW.2006.101
Abstract	Semi-supervised clustering is practical in situations where some domain knowledge could help the clustering process but is not suitable or not sufficient for supervised learning. There have been a number of studies on semi-supervised clustering, but almost all of them assume that the input knowledge is correct or largely correct. In this paper we show that even a small proportion of incorrect input knowledge can make a semi-supervised clustering algorithm perform worse than it would with no inputs at all. This is a real concern, since in real applications it is reasonable to expect problematic "knowledge inputs" that are wrong or inappropriate for the clustering task. We propose a general methodology for detecting potentially incorrect inputs and verifying them. Based on this methodology, we outline several methods for validating the inputs of the semi-supervised clustering algorithm MPCK-Means. Experimental results show that the input validation step is both critical and effective: incorrect inputs lowered the clustering accuracy of MPCK-Means, but the lost accuracy was recovered when validation was performed. © 2006 IEEE.
Persistent Identifier	http://hdl.handle.net/10722/93088
ISSN	1550-4786 (2020 SCImago Journal Rankings: 0.545)
References	References in Scopus
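The abstract does not spell out the validation methods themselves. As a much simpler illustration of what "detecting potentially incorrect inputs" can mean for a pairwise-constraint algorithm such as MPCK-Means (which takes must-link and cannot-link constraints), the sketch below flags cannot-link pairs that contradict the transitive closure of the must-link pairs. This is a standard feasibility check, not the authors' methodology; the names `UnionFind` and `conflicting_cannot_links` are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


class UnionFind:
    """Disjoint-set forest used to group points connected by must-links."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def conflicting_cannot_links(
    n_points: int, must_links: List[Pair], cannot_links: List[Pair]
) -> List[Pair]:
    """Return cannot-link pairs whose endpoints are joined by a chain of must-links.

    Hypothetical helper: any pair returned means the constraint set is
    contradictory, so at least one of the inputs involved must be wrong.
    """
    uf = UnionFind(n_points)
    for a, b in must_links:
        uf.union(a, b)  # must-link is transitive: merge the two groups
    return [(a, b) for a, b in cannot_links if uf.find(a) == uf.find(b)]


if __name__ == "__main__":
    # Points 0-1-2 are chained by must-links, so cannot-link (0, 2) conflicts.
    print(conflicting_cannot_links(5, [(0, 1), (1, 2), (3, 4)], [(0, 2), (2, 3)]))
```

Such a check only shows that a constraint set is inconsistent; it cannot say which of the must-links or the cannot-link is the wrong one, and it cannot catch incorrect constraints that happen to be mutually consistent.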

DC Field	Value	Language
dc.contributor.author	Yip, KY	en_HK
dc.contributor.author	Ng, MK	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.date.accessioned	2010-09-25T14:50:32Z	-
dc.date.available	2010-09-25T14:50:32Z	-
dc.date.issued	2006	en_HK
dc.identifier.citation	Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, p. 479-483	en_HK
dc.identifier.issn	1550-4786	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/93088	-
dc.language	eng	en_HK
dc.relation.ispartof	Proceedings - IEEE International Conference on Data Mining, ICDM	en_HK
dc.title	Input validation for semi-supervised clustering	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/ICDMW.2006.101	-
dc.identifier.scopus	eid_2-s2.0-78449278494	en_HK
dc.identifier.hkuros	135467	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-78449278494&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.spage	479	en_HK
dc.identifier.epage	483	en_HK
dc.identifier.scopusauthorid	Yip, KY=34574226200	en_HK
dc.identifier.scopusauthorid	Ng, MK=34571761900	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.issnl	1550-4786	-