Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/ICDMW.2006.101
- Scopus: eid_2-s2.0-78449278494
Conference Paper: Input validation for semi-supervised clustering
Title | Input validation for semi-supervised clustering |
---|---|
Authors | Yip, KY; Ng, MK; Cheung, DW |
Issue Date | 2006 |
Citation | Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, p. 479-483 |
Abstract | Semi-supervised clustering is practical in situations in which there exists some domain knowledge that could help the clustering process, but which is not suitable or not sufficient for supervised learning. There have been a number of studies on semi-supervised clustering, but almost all of them assume the input knowledge is correct or largely correct. In this paper we show that even a small proportion of incorrect input knowledge could make a semi-supervised clustering algorithm perform worse than having no inputs. This is a real concern since in real applications it is reasonable to have problematic "knowledge inputs" that are wrong or inappropriate for the clustering task. We propose a general methodology for detecting potentially incorrect inputs and performing verifications. Based on the methodology, we outline some methods for validating the inputs of the semi-supervised clustering algorithm MPCK-Means. Experimental results show that the input validation step is both critical and effective as the clustering accuracy of MPCK-Means was lowered by incorrect inputs, but the lost accuracy was resumed when validation was performed. © 2006 IEEE. |
Persistent Identifier | http://hdl.handle.net/10722/93088 |
ISSN | 1550-4786 (2020 SCImago Journal Rankings: 0.545) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yip, KY | en_HK |
dc.contributor.author | Ng, MK | en_HK |
dc.contributor.author | Cheung, DW | en_HK |
dc.date.accessioned | 2010-09-25T14:50:32Z | - |
dc.date.available | 2010-09-25T14:50:32Z | - |
dc.date.issued | 2006 | en_HK |
dc.identifier.citation | Proceedings - IEEE International Conference on Data Mining, ICDM, 2006, p. 479-483 | en_HK |
dc.identifier.issn | 1550-4786 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/93088 | - |
dc.language | eng | en_HK |
dc.relation.ispartof | Proceedings - IEEE International Conference on Data Mining, ICDM | en_HK |
dc.title | Input validation for semi-supervised clustering | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK |
dc.identifier.authority | Cheung, DW=rp00101 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/ICDMW.2006.101 | - |
dc.identifier.scopus | eid_2-s2.0-78449278494 | en_HK |
dc.identifier.hkuros | 135467 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78449278494&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.spage | 479 | en_HK |
dc.identifier.epage | 483 | en_HK |
dc.identifier.scopusauthorid | Yip, KY=34574226200 | en_HK |
dc.identifier.scopusauthorid | Ng, MK=34571761900 | en_HK |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK |
dc.identifier.issnl | 1550-4786 | - |