Conference Paper: Input validation for semi-supervised clustering

Title: Input validation for semi-supervised clustering
Authors: Yip, KY; Ng, MK; Cheung, DW
Issue Date: 2006
Citation: Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, p. 479-483
Abstract: Semi-supervised clustering is practical in situations where some domain knowledge exists that could help the clustering process but is not suitable or not sufficient for supervised learning. There have been a number of studies on semi-supervised clustering, but almost all of them assume the input knowledge is correct or largely correct. In this paper we show that even a small proportion of incorrect input knowledge could make a semi-supervised clustering algorithm perform worse than having no inputs at all. This is a real concern, since in real applications it is reasonable to expect some "knowledge inputs" to be wrong or inappropriate for the clustering task. We propose a general methodology for detecting potentially incorrect inputs and performing verifications. Based on the methodology, we outline some methods for validating the inputs of the semi-supervised clustering algorithm MPCK-Means. Experimental results show that the input validation step is both critical and effective: the clustering accuracy of MPCK-Means was lowered by incorrect inputs, but the lost accuracy was recovered when validation was performed. © 2006 IEEE.
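The abstract does not spell out the validation methods themselves. As a minimal illustration of one simple kind of check, assuming the inputs are pairwise must-link and cannot-link constraints (the constraint types MPCK-Means accepts), the sketch below flags cannot-link pairs whose endpoints are already joined by a chain of must-links. Such a set of constraints cannot all be satisfied, so at least one of them must be wrong and is a candidate for verification. This is a generic consistency check, not the authors' method, and the names `UnionFind` and `find_conflicting_constraints` are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest used to compute the transitive closure of must-links."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def find_conflicting_constraints(n_points, must_links, cannot_links):
    """Return cannot-link pairs contradicted by the must-link closure.

    Points are integer indices 0..n_points-1; constraints are (i, j) tuples.
    Each returned pair lies in the same must-link component, so the
    constraints involved cannot all be correct.
    """
    uf = UnionFind(n_points)
    for a, b in must_links:
        uf.union(a, b)
    return [(a, b) for a, b in cannot_links if uf.find(a) == uf.find(b)]


if __name__ == "__main__":
    must = [(0, 1), (1, 2), (3, 4)]
    cannot = [(0, 2), (2, 3)]
    # (0, 2) conflicts via 0-1-2; (2, 3) spans two different components.
    print(find_conflicting_constraints(5, must, cannot))
```

Note that this only catches logically contradictory inputs. Incorrect constraints that happen to be mutually consistent pass this check, which is why a data-driven verification step of the kind the paper proposes is still needed.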
Persistent Identifier: http://hdl.handle.net/10722/93088
ISSN: 1550-4786
2020 SCImago Journal Rankings: 0.545

DC Field | Value | Language
dc.contributor.author | Yip, KY | en_HK
dc.contributor.author | Ng, MK | en_HK
dc.contributor.author | Cheung, DW | en_HK
dc.date.accessioned | 2010-09-25T14:50:32Z | -
dc.date.available | 2010-09-25T14:50:32Z | -
dc.date.issued | 2006 | en_HK
dc.identifier.citation | Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, p. 479-483 | en_HK
dc.identifier.issn | 1550-4786 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/93088 | -
dc.language | eng | en_HK
dc.relation.ispartof | Proceedings - IEEE International Conference on Data Mining, ICDM | en_HK
dc.title | Input validation for semi-supervised clustering | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK
dc.identifier.authority | Cheung, DW=rp00101 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/ICDMW.2006.101 | -
dc.identifier.scopus | eid_2-s2.0-78449278494 | en_HK
dc.identifier.hkuros | 135467 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78449278494&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.spage | 479 | en_HK
dc.identifier.epage | 483 | en_HK
dc.identifier.scopusauthorid | Yip, KY=34574226200 | en_HK
dc.identifier.scopusauthorid | Ng, MK=34571761900 | en_HK
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK
dc.identifier.issnl | 1550-4786 | -
