Conference Paper: Chinese Document Classification with Bi-directional Convolutional Language Model

Title: Chinese Document Classification with Bi-directional Convolutional Language Model
Authors: Liu, B; Yin, G
Issue Date: 2020
Publisher: Association for Computing Machinery.
Citation: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, Virtual Event, Xi'an, China, 25-30 July 2020, p. 1785-1788
Abstract: By setting a typeface, each character of the Chinese text can be converted to a glyph pixel matrix. We propose to conduct text classification with such glyph features using bi-directional convolution. Although the pixel embedding can be applied to all languages, it is much more convenient to be used to represent Chinese scripts due to the square shape of Chinese characters. We extract both the forward and backward n-gram features of the text via bi-directional convolutional operations and then concatenate them. A subsequent 1-dimensional max-over-time pooling is applied to the bi-directional feature maps, and then three fully connected layers are used for conducting text classification. The proposed model has a light-weight architecture that only contains a single-layer convolutional neural network. Experiments on several Chinese text classification datasets demonstrate surprisingly excellent results for the training speed and superior performance of the proposed model in comparison with traditional methods.
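The pipeline outlined in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each character is stood in for by a random 4x4 binary bitmap rather than a glyph rendered from a real typeface; the backward pass is modelled as a second filter bank applied to the reversed character sequence; and the n-gram width, filter count, hidden sizes, class count, and every function name (conv1d, max_over_time, dense, classify) are hypothetical. Weights are random, so the output only demonstrates the data flow and shapes.

```python
import random

random.seed(0)

# Assumed toy sizes: 4x4 glyph bitmaps, bigram filters, 8 filters per direction.
PIXELS, N, FILTERS, CLASSES = 16, 2, 8, 3

def conv1d(seq, filters, n):
    """Apply each filter to every window of n consecutive glyph vectors (ReLU)."""
    out = []
    for t in range(len(seq) - n + 1):
        window = [x for vec in seq[t:t + n] for x in vec]  # flattened n-gram
        out.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                    for f in filters])
    return out  # one row per window, one column per filter

def max_over_time(feature_map):
    """Keep the largest activation of each channel across all positions."""
    return [max(channel) for channel in zip(*feature_map)]

def dense(x, weights, relu=True):
    """Fully connected layer without bias, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def classify(glyphs, params):
    fwd = conv1d(glyphs, params["fwd"], N)        # forward n-gram features
    bwd = conv1d(glyphs[::-1], params["bwd"], N)  # backward: reversed sequence
    both = [f + b for f, b in zip(fwd, bwd)]      # concatenate along channels
    pooled = max_over_time(both)                  # 1-D max-over-time pooling
    hidden = dense(dense(pooled, params["fc1"]), params["fc2"])
    return dense(hidden, params["fc3"], relu=False)  # third layer gives logits

params = {
    "fwd": rand_matrix(FILTERS, N * PIXELS),
    "bwd": rand_matrix(FILTERS, N * PIXELS),
    "fc1": rand_matrix(16, 2 * FILTERS),
    "fc2": rand_matrix(16, 16),
    "fc3": rand_matrix(CLASSES, 16),
}

# Toy "document": 5 characters, each a flattened random 4x4 binary bitmap.
doc = [[float(random.randint(0, 1)) for _ in range(PIXELS)] for _ in range(5)]
logits = classify(doc, params)
print(len(logits))  # one score per class
```

Because max-over-time pooling takes a per-channel maximum, the position at which the forward and backward maps are aligned before concatenation does not affect the pooled vector in this sketch.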
Persistent Identifier: http://hdl.handle.net/10722/294826
ISBN: 9781450380164
ISI Accession Number ID: WOS:000722377700227

 

DC Field | Value | Language
dc.contributor.author | Liu, B | -
dc.contributor.author | Yin, G | -
dc.date.accessioned | 2020-12-21T11:49:07Z | -
dc.date.available | 2020-12-21T11:49:07Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, Virtual Event, Xi'an, China, 25-30 July 2020, p. 1785-1788 | -
dc.identifier.isbn | 9781450380164 | -
dc.identifier.uri | http://hdl.handle.net/10722/294826 | -
dc.description.abstract | By setting a typeface, each character of the Chinese text can be converted to a glyph pixel matrix. We propose to conduct text classification with such glyph features using bi-directional convolution. Although the pixel embedding can be applied to all languages, it is much more convenient to be used to represent Chinese scripts due to the square shape of Chinese characters. We extract both the forward and backward n-gram features of the text via bi-directional convolutional operations and then concatenate them. A subsequent 1-dimensional max-over-time pooling is applied to the bi-directional feature maps, and then three fully connected layers are used for conducting text classification. The proposed model has a light-weight architecture that only contains a single-layer convolutional neural network. Experiments on several Chinese text classification datasets demonstrate surprisingly excellent results for the training speed and superior performance of the proposed model in comparison with traditional methods. | -
dc.language | eng | -
dc.publisher | Association for Computing Machinery. | -
dc.relation.ispartof | Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval | -
dc.rights | Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Copyright © Association for Computing Machinery. | -
dc.title | Chinese Document Classification with Bi-directional Convolutional Language Model | -
dc.type | Conference_Paper | -
dc.identifier.email | Yin, G: gyin@hku.hk | -
dc.identifier.authority | Yin, G=rp00831 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1145/3397271.3401248 | -
dc.identifier.scopus | eid_2-s2.0-85090150507 | -
dc.identifier.hkuros | 320600 | -
dc.identifier.spage | 1785 | -
dc.identifier.epage | 1788 | -
dc.identifier.isi | WOS:000722377700227 | -
dc.publisher.place | New York, NY | -
