The analysis of narratives often accompanies comprehensive language assessments of students. Analyzing narratives, however, is time-consuming and labor-intensive, and recent advances in large language models (LLMs) suggest that it may be possible to automate this process. In the current study, we employed LLMs to automatically evaluate narrative discourse elements. Using two data sets of narrative texts, we evaluated in-context learning and fine-tuning strategies for different GPT models and compared them to fine-tuning of BERT, an older-generation LLM. The results suggest that GPT models are more accurate than BERT, and that fine-tuning smaller GPT models on abundant labeled data outperforms in-context learning with larger GPT models given less labeled data. Compared against human inter-rater reliability, our results indicate a narrowing gap between the assessment accuracy of human raters and LLMs.
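As a rough illustration of the in-context learning setup described above, the sketch below prompts a GPT model with a few labeled utterances before asking it to label a new one. This is not the study's actual pipeline: the element labels, few-shot examples, and score_utterance helper are hypothetical, and the snippet assumes the OpenAI Python client with an API key in the environment.

```python
# Minimal sketch of few-shot (in-context) narrative element scoring.
# Labels and examples below are illustrative, not from the study's data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical set of narrative discourse elements
ELEMENTS = ["character", "setting", "initiating_event", "action", "consequence"]

# Hypothetical in-context examples: (utterance, element label)
FEW_SHOT = [
    ("One day a girl named Mia lived by the sea.", "character"),
    ("The boy wanted his kite back, so he climbed the tree.", "action"),
]

def score_utterance(utterance: str, model: str = "gpt-4") -> str:
    """Label one utterance with a narrative discourse element via few-shot prompting."""
    messages = [{
        "role": "system",
        "content": "Label each utterance with exactly one narrative element "
                   f"from this list: {', '.join(ELEMENTS)}.",
    }]
    # Supply the labeled examples as prior conversation turns
    for text, label in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": utterance})
    resp = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return resp.choices[0].message.content.strip()

print(score_utterance("The dog ran away because the gate was open."))
```

A fine-tuning strategy, by contrast, would train on many labeled utterances up front and needs no in-context examples at inference time, which is why it can favor smaller models when labeled data are plentiful.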
Project: Automated Narrative Scoring Using Large Language Models
DOI
https://doi.org/10.33009/ldbase.1714615997.1925
Project Active From
2022 to 2024
Most Recent Datasets in Project
- Dataset: TwoRaters
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The tworaters data set was drawn randomly from the larger corpus...
- Dataset: Training Data
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The training data set was drawn randomly from the larger corpus...
- Dataset: Test Data
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The test data set was drawn randomly from the larger corpus of n...