Article: RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese

Title: RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese
Authors: Leal, Sidney Evaldo; Lukasova, Katerina; Carthery-Goulart, Maria Teresa; Aluísio, Sandra Maria
Keywords: Brazilian Portuguese; Eye-tracking corpus; Natural language processing; Predictability norms; Sentence complexity prediction
Issue Date: 17-Aug-2022
Publisher: Springer
Citation: Language Resources and Evaluation, 2022, v. 56, n. 4, p. 1333-1372
Abstract

This article presents RastrOS, a new eye-tracking corpus of eye movement data from university students during silent reading of paragraphs of texts in Brazilian Portuguese (BP). The article shows the potential of the corpus for natural language processing (NLP) using it to evaluate the sentence complexity prediction task in BP and it also focuses on the description of NLP resources and methods developed to create the corpus. Specifically, we present: (i) the method used to select the corpus paragraphs from large corpora, using linguistic metrics and clustering algorithms; (ii) the platform for collecting the Cloze test, which is also responsible for creating the project datasets, and (iii) the hybrid semantic similarity method, based on word embedding models and contextualised word representations, used to generate semantic predictability norms. RastrOS can be downloaded from the open science framework repository with the computational infrastructure mentioned above. Datasets with predictability norms of 393 participants and eye-tracking data of 37 participants are available in the OSF repository for this work (https://osf.io/9jxg3/).
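As a rough illustration of what "predictability norms" can mean in practice, the sketch below computes two toy scores from Cloze responses: the share of participants who produced the target word, and a mean cosine similarity between each response and the target. It is a minimal sketch under stated assumptions, not the hybrid method described in the article: the function names, the exact-match normalisation, and the tiny hand-made vectors are all hypothetical, and a real setup would load trained word embeddings or contextualised representations instead.

```python
from collections import Counter
from math import sqrt


def cloze_predictability(responses, target):
    """Share of Cloze responses that match the target word (case-insensitive)."""
    if not responses:
        return 0.0
    normalized = [r.strip().lower() for r in responses]
    return Counter(normalized)[target.strip().lower()] / len(normalized)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_predictability(responses, target, vectors):
    """Mean cosine similarity between each response and the target.

    Assumption: responses missing from `vectors` are skipped, which is one of
    several possible conventions for out-of-vocabulary words.
    """
    target_vec = vectors.get(target.strip().lower())
    if target_vec is None:
        return 0.0
    sims = [
        cosine(vectors[r], target_vec)
        for r in (resp.strip().lower() for resp in responses)
        if r in vectors
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical two-dimensional vectors, for illustration only.
toy_vectors = {"casa": [1.0, 0.0], "lar": [1.0, 0.0], "prédio": [0.0, 1.0]}
toy_responses = ["casa", "Casa", "lar", "prédio"]

exact_score = cloze_predictability(toy_responses, "casa")
semantic_score = semantic_predictability(toy_responses, "casa", toy_vectors)
```

With these toy inputs, the exact-match score counts only the two "casa" responses, while the semantic score also credits "lar" because its toy vector coincides with the target's. This is the kind of gap a similarity-based norm is meant to capture.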


Persistent Identifier: http://hdl.handle.net/10722/357104
ISSN: 1574-020X
2023 Impact Factor: 1.7
2023 SCImago Journal Rankings: 0.786
ISI Accession Number ID: WOS:000842580500001

 

DC Field | Value | Language
dc.contributor.author | Leal, Sidney Evaldo | -
dc.contributor.author | Lukasova, Katerina | -
dc.contributor.author | Carthery-Goulart, Maria Teresa | -
dc.contributor.author | Aluísio, Sandra Maria | -
dc.date.accessioned | 2025-06-23T08:53:23Z | -
dc.date.available | 2025-06-23T08:53:23Z | -
dc.date.issued | 2022-08-17 | -
dc.identifier.citation | Language Resources and Evaluation, 2022, v. 56, n. 4, p. 1333-1372 | -
dc.identifier.issn | 1574-020X | -
dc.identifier.uri | http://hdl.handle.net/10722/357104 | -
dc.language | eng | -
dc.publisher | Springer | -
dc.relation.ispartof | Language Resources and Evaluation | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | Brazilian Portuguese | -
dc.subject | Eye-tracking corpus | -
dc.subject | Natural language processing | -
dc.subject | Predictability norms | -
dc.subject | Sentence complexity prediction | -
dc.title | RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese | -
dc.type | Article | -
dc.identifier.doi | 10.1007/s10579-022-09609-0 | -
dc.identifier.scopus | eid_2-s2.0-85136200724 | -
dc.identifier.volume | 56 | -
dc.identifier.issue | 4 | -
dc.identifier.spage | 1333 | -
dc.identifier.epage | 1372 | -
dc.identifier.eissn | 1574-0218 | -
dc.identifier.isi | WOS:000842580500001 | -
dc.identifier.issnl | 1574-020X | -
