The analysis of narratives often accompanies comprehensive language assessments of students. Analyzing narratives, however, is time-consuming and labor-intensive, and recent advances in large language models (LLMs) suggest that it may be possible to automate this process. In the current study, we employed LLMs to automatically evaluate narrative discourse elements. Using two data sets of narrative texts, we evaluated in-context learning and fine-tuning strategies for different GPT models and compared them to fine-tuning of BERT, an older-generation LLM. The results suggest that GPT models are more accurate than BERT, and that fine-tuning smaller GPT models on abundant labeled data outperforms in-context learning with larger GPT models given less labeled data. Compared against human inter-rater reliability, our results indicate a narrowing gap between the assessment accuracy of human raters and LLMs.
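As a rough illustration of the in-context learning setup described above, the sketch below prompts a GPT model with a few labeled utterances before asking it to label a new one. This is not the study's actual pipeline: the element labels, few-shot examples, and score_utterance helper are hypothetical, and the snippet assumes the OpenAI Python client with an API key in the environment.

```python
# Minimal sketch of few-shot (in-context) narrative element scoring.
# Labels and examples below are illustrative, not from the study's data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical set of narrative discourse elements
ELEMENTS = ["character", "setting", "initiating_event", "action", "consequence"]

# Hypothetical in-context examples: (utterance, element label)
FEW_SHOT = [
    ("One day a girl named Mia lived by the sea.", "character"),
    ("The boy wanted his kite back, so he climbed the tree.", "action"),
]

def score_utterance(utterance: str, model: str = "gpt-4") -> str:
    """Label one utterance with a narrative discourse element via few-shot prompting."""
    messages = [{
        "role": "system",
        "content": "Label each utterance with exactly one narrative element "
                   f"from this list: {', '.join(ELEMENTS)}.",
    }]
    # Supply the labeled examples as prior conversation turns
    for text, label in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": utterance})
    resp = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return resp.choices[0].message.content.strip()

print(score_utterance("The dog ran away because the gate was open."))
```

A fine-tuning strategy, by contrast, would train on many labeled utterances up front and needs no in-context examples at inference time, which is why it can favor smaller models when labeled data are plentiful.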
Project: Automated Narrative Scoring Using Large Language Models
DOI
https://doi.org/10.33009/ldbase.1714615997.1925
Project Active From
2022 to 2024
Most Recent Datasets in Project
- Dataset: TwoRaters
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The tworaters data set was drawn randomly from the larger corpus...
- Dataset: Training Data
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The training data set was drawn randomly from the larger corpus...
- Dataset: Test Data
Last update: May 1, 2024
Description: Narrative language samples elicited using the ALPS Oral Narrative Retell and Oral Narrative Generation tasks from diverse K-3 students. The test data set was drawn randomly from the larger corpus of n...