A cross-lingual transfer learning method for online COVID-19-related hate speech detection

Liu, Lin; Xu, Duo; Zhao, Pengfei; Zeng, Daniel Dajun; Hu, Paul Jen Hwa; Zhang, Qingpeng; Luo, Yin; Cao, Zhidong

Publisher Website: 10.1016/j.eswa.2023.121031
Scopus: eid_2-s2.0-85166967683
WOS: WOS:001059475500001

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- HKU Musketeers Foundation Institute of Data Science: Journal/Magazine Articles

Title	A cross-lingual transfer learning method for online COVID-19-related hate speech detection
Authors	Liu, Lin; Xu, Duo; Zhao, Pengfei; Zeng, Daniel Dajun; Hu, Paul Jen Hwa; Zhang, Qingpeng; Luo, Yin; Cao, Zhidong
Keywords	COVID-19; Cross-lingual; Deep learning; Hate speech detection; Natural language processing
Issue Date	2023
Citation	Expert Systems with Applications, 2023, v. 234, article no. 121031. DOI: http://dx.doi.org/10.1016/j.eswa.2023.121031
Abstract	During the COVID-19 pandemic, online social media platforms such as Twitter have facilitated the exchange of information among people. However, the prevalence of “infodemic” content such as online hate speech has exacerbated social rifts, discrimination, prejudice and even hate crimes. Timely and effective detection of hate speech will help create a healthy public opinion environment. Most current COVID-19-related hate speech research focuses on a single language, such as English. In this paper, we introduce a cross-lingual transfer learning method that aims to contribute to hate speech detection in low-resource languages. We propose a deep-learning-based model that classifies hate speech using a pre-trained language model for multilingual text embedding. Data augmentation and cross-lingual contrastive learning are then used to further improve the performance of cross-lingual knowledge transfer. To evaluate our method, we collected three publicly available annotated COVID-19-related hate speech datasets from Twitter, two in English and one in German. Furthermore, a Chinese dataset based on Weibo was constructed to expand the multilingual data. The experimental results across the three languages illustrate the effectiveness of our method for cross-lingual hate speech detection. With English, Chinese and German as transfer target languages, our method reaches test F1-scores of up to 0.728, 0.799 and 0.612 respectively, which are on average better than those of the other baselines.
Persistent Identifier	http://hdl.handle.net/10722/330484
ISSN	0957-4174
2023 Impact Factor	7.5
2023 SCImago Journal Rankings	1.875
ISI Accession Number ID	WOS:001059475500001

DC Field	Value	Language
dc.contributor.author	Liu, Lin	-
dc.contributor.author	Xu, Duo	-
dc.contributor.author	Zhao, Pengfei	-
dc.contributor.author	Zeng, Daniel Dajun	-
dc.contributor.author	Hu, Paul Jen Hwa	-
dc.contributor.author	Zhang, Qingpeng	-
dc.contributor.author	Luo, Yin	-
dc.contributor.author	Cao, Zhidong	-
dc.date.accessioned	2023-09-05T12:11:06Z	-
dc.date.available	2023-09-05T12:11:06Z	-
dc.date.issued	2023	-
dc.identifier.citation	Expert Systems with Applications, 2023, v. 234, article no. 121031	-
dc.identifier.issn	0957-4174	-
dc.identifier.uri	http://hdl.handle.net/10722/330484	-
dc.description.abstract	During the COVID-19 pandemic, online social media platforms such as Twitter have facilitated the exchange of information among people. However, the prevalence of “infodemic” content such as online hate speech has exacerbated social rifts, discrimination, prejudice and even hate crimes. Timely and effective detection of hate speech will help create a healthy public opinion environment. Most current COVID-19-related hate speech research focuses on a single language, such as English. In this paper, we introduce a cross-lingual transfer learning method that aims to contribute to hate speech detection in low-resource languages. We propose a deep-learning-based model that classifies hate speech using a pre-trained language model for multilingual text embedding. Data augmentation and cross-lingual contrastive learning are then used to further improve the performance of cross-lingual knowledge transfer. To evaluate our method, we collected three publicly available annotated COVID-19-related hate speech datasets from Twitter, two in English and one in German. Furthermore, a Chinese dataset based on Weibo was constructed to expand the multilingual data. The experimental results across the three languages illustrate the effectiveness of our method for cross-lingual hate speech detection. With English, Chinese and German as transfer target languages, our method reaches test F1-scores of up to 0.728, 0.799 and 0.612 respectively, which are on average better than those of the other baselines.	-
dc.language	eng	-
dc.relation.ispartof	Expert Systems with Applications	-
dc.subject	COVID-19	-
dc.subject	Cross-lingual	-
dc.subject	Deep learning	-
dc.subject	Hate speech detection	-
dc.subject	Natural language processing	-
dc.title	A cross-lingual transfer learning method for online COVID-19-related hate speech detection	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.eswa.2023.121031	-
dc.identifier.scopus	eid_2-s2.0-85166967683	-
dc.identifier.volume	234	-
dc.identifier.spage	article no. 121031	-
dc.identifier.epage	article no. 121031	-
dc.identifier.isi	WOS:001059475500001	-
