Conference Paper: A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document categorization

Title: A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document categorization
Authors: Tam, V; Santoso, A; Setiono, R
Issue Date: 2002
Citation: Proceedings - International Conference On Pattern Recognition, 2002, v. 16 n. 4, p. 235-238
Abstract: Associating documents to relevant categories is critical for effective document retrieval. Here, we compare the well-known k-Nearest Neighborhood (kNN) algorithm, the centroid-based classifier and the Highest Average Similarity over Retrieved Documents (HASRD) algorithm, for effective document categorization. We use various measures such as the micro and macro F1 values to evaluate their performance on the Reuters-21578 corpus. The empirical results show that kNN performs the best, followed by our adapted HASRD and the centroid-based classifier for common document categories, while the centroid-based classifier and kNN outperform our adapted HASRD for rare document categories. Additionally, our study clearly indicates that each classifier performs optimally only when a suitable term weighting scheme is used. All these significant results lead to many exciting directions for future exploration. © 2002 IEEE.
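
Note on the evaluation measures: the abstract reports both micro and macro F1, and the two can diverge sharply when some categories are rare. The sketch below is a minimal illustration of that difference, not the authors' evaluation code. It assumes single-label predictions for simplicity (Reuters-21578 documents can carry several labels), and the function names and the toy labels "common" and "rare" are hypothetical.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data: a classifier that always predicts the frequent category.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the micro average stays high because the frequent category dominates the pooled counts, while the macro average drops because the rare category contributes an F1 of zero with equal weight. This is why results for common and rare categories are worth reporting separately.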
Persistent Identifier: http://hdl.handle.net/10722/158425
ISSN: 1051-4651
2023 SCImago Journal Rankings: 0.584

DC Field | Value | Language
dc.contributor.author | Tam, V | en_US
dc.contributor.author | Santoso, A | en_US
dc.contributor.author | Setiono, R | en_US
dc.date.accessioned | 2012-08-08T08:59:33Z | -
dc.date.available | 2012-08-08T08:59:33Z | -
dc.date.issued | 2002 | en_US
dc.identifier.citation | Proceedings - International Conference On Pattern Recognition, 2002, v. 16 n. 4, p. 235-238 | en_US
dc.identifier.issn | 1051-4651 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/158425 | -
dc.description.abstract | Associating documents to relevant categories is critical for effective document retrieval. Here, we compare the well-known k-Nearest Neighborhood (kNN) algorithm, the centroid-based classifier and the Highest Average Similarity over Retrieved Documents (HASRD) algorithm, for effective document categorization. We use various measures such as the micro and macro F1 values to evaluate their performance on the Reuters-21578 corpus. The empirical results show that kNN performs the best, followed by our adapted HASRD and the centroid-based classifier for common document categories, while the centroid-based classifier and kNN outperform our adapted HASRD for rare document categories. Additionally, our study clearly indicates that each classifier performs optimally only when a suitable term weighting scheme is used. All these significant results lead to many exciting directions for future exploration. © 2002 IEEE. | en_US
dc.language | eng | en_US
dc.relation.ispartof | Proceedings - International Conference on Pattern Recognition | en_US
dc.title | A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document categorization | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Tam, V:vtam@eee.hku.hk | en_US
dc.identifier.authority | Tam, V=rp00173 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.scopus | eid_2-s2.0-29144522357 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-29144522357&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 16 | en_US
dc.identifier.issue | 4 | en_US
dc.identifier.spage | 235 | en_US
dc.identifier.epage | 238 | en_US
dc.publisher.place | United States | en_US
dc.identifier.scopusauthorid | Tam, V=7005091988 | en_US
dc.identifier.scopusauthorid | Santoso, A=6601931777 | en_US
dc.identifier.scopusauthorid | Setiono, R=7005033162 | en_US
dc.identifier.issnl | 1051-4651 | -
