A natural language processing approach to automatic plagiarism detection

Leung, CH; Chan, YY

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1145/1324302.1324348
Scopus: eid_2-s2.0-62949198590

Supplementary

Citations:
- Scopus: 0
Appears in Collections:
- Faculty of Education: Conference papers

Conference Paper: A natural language processing approach to automatic plagiarism detection

Title	A natural language processing approach to automatic plagiarism detection
Authors	Leung, CH; Chan, YY
Keywords	Natural language process; Plagiarism detection; Syntactic and semantic analysis
Issue Date	2007
Citation	SIGITE'07 - Proceedings of the 2007 ACM Information Technology Education Conference, 2007, p. 213-218. DOI: http://dx.doi.org/10.1145/1324302.1324348
Abstract	Plagiarism is a long-standing problem, and advances in information technology have made it worse because electronic versions of many published materials are now available to everyone. The Web is an important and common source for plagiarism. Plagiarism detection programs such as Turnitin were developed to deal with this problem. To determine whether an article is copied from the Web or other electronic sources, a plagiarism detection program has to calculate the similarity between two articles. However, it is often difficult to detect plagiarism accurately once the copied content has been modified. For example, a word can simply be replaced with a synonym (e.g. "program" with "software") and the entire sentence structure can be changed. Most plagiarism detection programs can only check whether two words are lexically identical and count how many words in a paper match. Thus, if the copied material is modified deliberately, plagiarism becomes difficult to detect. Natural language processing can help to resolve this kind of problem: the underlying syntactic structure and semantic meaning of two sentences can be compared to reveal their similarity. The matching procedure has several steps. First, a thesaurus (a lexical hierarchy) is consulted to find the synonyms, broader terms and narrower terms of the words used in the paper being checked; WordNet is a typical example of a thesaurus that can be used for this purpose. Then, the paper is compared with the documents in the database. If the paper is suspected of containing content from the database, its sentences may be parsed to construct parse trees and semantic representations for more detailed comparison. The system uses context-free grammar and case grammar to represent the syntactic structure and semantic meaning of sentences. It is found that this approach can identify plagiarism that traditional methods cannot detect. Copyright 2007 ACM.
Persistent Identifier	http://hdl.handle.net/10722/134693
References	References in Scopus
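
The thesaurus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SYNONYMS` table is a tiny hand-written stand-in for WordNet, the Jaccard word-overlap score is an assumed similarity measure, and all function names are hypothetical. It only shows why purely lexical matching scores a synonym-substituted sentence lower than thesaurus-aware matching does.

```python
# Toy thesaurus standing in for WordNet. Every entry is an illustrative
# assumption: each word maps to one canonical synonym.
SYNONYMS = {
    "software": "program",
    "application": "program",
    "identify": "detect",
    "find": "detect",
}


def tokens(sentence):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    words = (w.strip('.,;:"()').lower() for w in sentence.split())
    return [w for w in words if w]


def canonical(word):
    """Map a word to its canonical synonym, or return it unchanged."""
    return SYNONYMS.get(word, word)


def jaccard(a, b):
    """Jaccard overlap of two word collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_similarity(s1, s2):
    """Overlap of exact word forms, as in purely lexical matching."""
    return jaccard(tokens(s1), tokens(s2))


def thesaurus_similarity(s1, s2):
    """Overlap after replacing each word with its canonical synonym."""
    return jaccard(
        [canonical(w) for w in tokens(s1)],
        [canonical(w) for w in tokens(s2)],
    )


original = "The program can detect copied text."
modified = "The software can identify copied text."

print(lexical_similarity(original, modified))    # 0.5
print(thesaurus_similarity(original, modified))  # 1.0
```

With real data, the lookup table would be replaced by queries against a full thesaurus, including broader and narrower terms, which this sketch does not model.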

DC Field	Value	Language
dc.contributor.author	Leung, CH	en_HK
dc.contributor.author	Chan, YY	en_HK
dc.date.accessioned	2011-07-05T08:24:35Z	-
dc.date.available	2011-07-05T08:24:35Z	-
dc.date.issued	2007	en_HK
dc.identifier.citation	SIGITE'07 - Proceedings of the 2007 ACM Information Technology Education Conference, 2007, p. 213-218	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/134693	-
dc.description.abstract	Plagiarism is a long-standing problem, and advances in information technology have made it worse because electronic versions of many published materials are now available to everyone. The Web is an important and common source for plagiarism. Plagiarism detection programs such as Turnitin were developed to deal with this problem. To determine whether an article is copied from the Web or other electronic sources, a plagiarism detection program has to calculate the similarity between two articles. However, it is often difficult to detect plagiarism accurately once the copied content has been modified. For example, a word can simply be replaced with a synonym (e.g. "program" with "software") and the entire sentence structure can be changed. Most plagiarism detection programs can only check whether two words are lexically identical and count how many words in a paper match. Thus, if the copied material is modified deliberately, plagiarism becomes difficult to detect. Natural language processing can help to resolve this kind of problem: the underlying syntactic structure and semantic meaning of two sentences can be compared to reveal their similarity. The matching procedure has several steps. First, a thesaurus (a lexical hierarchy) is consulted to find the synonyms, broader terms and narrower terms of the words used in the paper being checked; WordNet is a typical example of a thesaurus that can be used for this purpose. Then, the paper is compared with the documents in the database. If the paper is suspected of containing content from the database, its sentences may be parsed to construct parse trees and semantic representations for more detailed comparison. The system uses context-free grammar and case grammar to represent the syntactic structure and semantic meaning of sentences. It is found that this approach can identify plagiarism that traditional methods cannot detect. Copyright 2007 ACM.	en_HK
dc.language	eng	en_US
dc.relation.ispartof	SIGITE'07 - Proceedings of the 2007 ACM Information Technology Education Conference	en_HK
dc.subject	Natural language process	en_HK
dc.subject	Plagiarism detection	en_HK
dc.subject	Syntactic and semantic analysis	en_HK
dc.title	A natural language processing approach to automatic plagiarism detection	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Chan, YY: yychan8@hkucc.hku.hk	en_HK
dc.identifier.authority	Chan, YY=rp00894	en_HK
dc.description.nature	link_to_subscribed_fulltext	en_US
dc.identifier.doi	10.1145/1324302.1324348	en_HK
dc.identifier.scopus	eid_2-s2.0-62949198590	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-62949198590&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.spage	213	en_HK
dc.identifier.epage	218	en_HK
dc.identifier.scopusauthorid	Leung, CH=7402612553	en_HK
dc.identifier.scopusauthorid	Chan, YY=7403676264	en_HK
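
The abstract also mentions case grammar as the semantic representation used for detailed comparison. The sketch below is a hedged illustration of that idea only. The `CaseFrame` class, its three roles, and the `SYNONYMS` table are assumptions, and the frames are written by hand, whereas a real system would derive them from parsed sentences. It shows how an active sentence and its reworded passive counterpart can reduce to the same frame.

```python
from dataclasses import dataclass

# Hypothetical synonym table; a real system would consult a thesaurus.
SYNONYMS = {"software": "program", "identify": "detect"}


@dataclass(frozen=True)
class CaseFrame:
    """Simplified case-grammar frame: who does what to what."""
    action: str
    agent: str
    obj: str


def normalize(word):
    """Lowercase a word and map it to its canonical synonym if one exists."""
    word = word.lower()
    return SYNONYMS.get(word, word)


def same_meaning(f1, f2):
    """True when every role of the two frames matches after normalization."""
    return all(
        normalize(getattr(f1, role)) == normalize(getattr(f2, role))
        for role in ("action", "agent", "obj")
    )


# "The program detects plagiarism." (active voice)
active = CaseFrame(action="detect", agent="program", obj="plagiarism")
# "Plagiarism is identified by the software." (passive voice, synonyms)
passive = CaseFrame(action="identify", agent="software", obj="plagiarism")
# "Plagiarism detects the program." (same words, roles swapped)
swapped = CaseFrame(action="detect", agent="plagiarism", obj="program")

print(same_meaning(active, passive))  # True
print(same_meaning(active, swapped))  # False
```

Comparing roles instead of word order is what lets the reworded passive sentence match, while the role-swapped sentence, which shares the same vocabulary, does not.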
