Dataset: Longitudinal Study on Reading and Writing at the Word, Sentence, and Text Levels [v.1]


This dataset is longitudinal, comprising data from four school years (2007/2008 through 2010/2011) and following students from grade 1 to grade 4. Measures were chosen to cover a wide array of reading and writing skills at the word, sentence, and larger passage or text levels. Participants were tested on all measures once a year, approximately one year apart. Participants were first-grade students in the fall of 2007 whose parents consented to their participation in the longitudinal study. Participants attended six different schools in a metropolitan school district in Tallahassee, Florida. Data were gathered by trained testers during thirty- to sixty-minute sessions in a quiet room designated for testing at the schools. The test battery was scored in a lab by two or more raters, and discrepancies in the scoring were resolved by an additional rater.

Reading Measures

Decoding Measures. The Woodcock Reading Mastery Tests-Revised (WRMT-R; Woodcock, 1987) Word Attack subtest was used to assess accuracy in decoding non-words. The Test of Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1999) Phonemic Decoding Efficiency (PDE) subtest was also used to assess pseudo-word reading fluency and accuracy. Both subtests were used to form a word-level decoding latent factor. The WRMT-R Word Attack subtest consists of a list of non-words that the participant reads aloud. The list starts with letters and becomes increasingly difficult, progressing to complex non-words. Testing is discontinued after six consecutive incorrect items. The median reliability is reported to be .87 for Word Attack (Woodcock, McGrew, & Mather, 2001). The TOWRE PDE requires accurately reading as many non-words as possible in 45 seconds. The TOWRE test manual reports test-retest reliability to be .90 for the PDE subtest.

Sentence Reading Measures. Two forms of the Test of Silent Reading Efficiency and Comprehension (TOSREC, forms A and D; Wagner et al., 2010) were used as measures of silent reading fluency. Students were required to read brief statements (e.g., “a cow is an animal”) and verify the truthfulness of each statement by circling yes or no. Students were given three minutes to read and answer as many sentences as possible. The mean alternate-forms reliability for the TOSREC ranges from .86 to .95.

Reading Comprehension Measures. The Woodcock-Johnson-III (WJ-III) Passage Comprehension subtest (Woodcock et al., 2001) and the Woodcock Reading Mastery Tests-Revised Passage Comprehension subtest (WRMT-R; Woodcock, 1987) were used to provide two indicators of reading comprehension. For both passage comprehension subtests, students read brief passages and identified missing words. Testing was discontinued when the ceiling was reached (six consecutive wrong answers) or when the last page was reached. According to the test manuals, test-retest reliability is reported to be above .90 for the WRMT-R, and the median reliability coefficient for the WJ-III is reported to be .92.

Spelling Measures. The Spelling subtest from the Wide Range Achievement Test-3 (WRAT-3; Wilkinson, 1993) and the Spelling subtest from the Wechsler Individual Achievement Test-II (WIAT-II; The Psychological Corporation, 2002) were used to form a spelling factor. Both spelling subtests required students to spell words of increasing difficulty from dictation. The ceiling for the WRAT-3 Spelling subtest is ten consecutive misspelled words. If the first five words are not spelled correctly, the student is required to write his or her name and a series of letters and then continue spelling until ten consecutive items have been missed. The ceiling for the WIAT-II is six consecutive misspelled words. The reliability of the WRAT-3 Spelling subtest is reported to be .96 and the reliability of the WIAT-II Spelling subtest is reported to be .94.
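
The ceiling (discontinue) rules described for these subtests can be illustrated with a minimal Python sketch. It is not the publishers' scoring procedure: the function name `score_with_ceiling` and the True/False response coding are assumptions made for illustration, and other administration rules in the test manuals (such as the WRAT-3 name-and-letters step) are not modeled.

```python
from typing import Sequence


def score_with_ceiling(responses: Sequence[bool], ceiling: int) -> int:
    """Count correct items up to the point where testing would stop.

    `responses` holds True (correct) or False (incorrect) in the order the
    items were administered. Testing stops as soon as `ceiling` consecutive
    items have been missed; later items are treated as not administered.
    """
    correct = 0
    consecutive_misses = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            consecutive_misses = 0  # a correct answer resets the run of misses
        else:
            consecutive_misses += 1
            if consecutive_misses == ceiling:
                break  # ceiling reached: stop testing
    return correct


# Hypothetical response pattern: the final item follows six straight misses.
pattern = [True, True, False, True] + [False] * 6 + [True]
wiat_style = score_with_ceiling(pattern, ceiling=6)   # six-miss ceiling
wrat_style = score_with_ceiling(pattern, ceiling=10)  # ten-miss ceiling
```

With a six-miss ceiling the final item is never reached, whereas a ten-miss ceiling lets testing continue, so the same response pattern can yield different raw scores under the two rules.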

Written Expression Measures. The Written Expression subtest from the Wechsler Individual Achievement Test-II (WIAT-II; The Psychological Corporation, 2002) was administered. The Written Expression score is based on a composite of Word Fluency and Combining Sentences in first and second grades and on a composite of Word Fluency, Combining Sentences, and Paragraph tasks in third grade. In this study, the Combining Sentences task was used as an indicator of writing ability at the sentence level. For this task, students are asked to combine several sentences into one meaningful sentence. According to the manual, the test-retest reliability coefficient for the Written Expression subtest is .86.

Writing Prompts. A writing composition task was also administered. Participants were asked to write a passage on a topic provided by the tester. Students were instructed to scratch out any mistakes and were not allowed to use erasers. The task was administered in groups and lasted 10 minutes. The passages for years 1 and 2 required expository writing and the passage for year 3 required narrative writing. The topics were as follows: choosing a pet for the classroom (year 1), favorite subject (year 2), and a day off from school (year 3). The writing samples were transcribed into a computer database by two trained coders. In order to submit the samples to Coh-Metrix (described below), the coders also corrected the samples. Samples were corrected once for spelling and punctuation using a hard criterion (i.e., words were corrected individually for spelling errors regardless of context, and run-on sentences were broken into separate sentences). In addition, the samples were completely corrected using a soft criterion: corrections were made for spelling based on context (e.g., correcting “there” for “their”), punctuation, grammar, usage, and syntax (see Appendix A for examples of original and corrected transcripts). The samples that were corrected only for spelling and punctuation using the hard criterion were used for two reasons: (a) developing readers make many spelling errors that make their original samples illegible, and (b) the samples that were completely corrected do not stay true to the child’s writing ability. Accuracy of writing was not reflected in the corrected samples because of the elimination of spelling errors. However, as mentioned above, spelling ability was measured separately. Data on compositional fluency and complexity were obtained from Coh-Metrix. Compositional fluency refers to how much writing was done, and complexity refers to the density of writing and length of sentences (Berninger et al., 2002; Wagner et al., 2010).

Coh-Metrix Measures. The transcribed samples were analyzed using Coh-Metrix (McNamara et al., 2005; Graesser et al., 2004). Coh-Metrix is a computer scoring system that analyzes over 50 measures of coherence, cohesion, language, and readability of texts. Appendix B contains the list of variables provided by Coh-Metrix. In the present study, the variables were broadly grouped into the following categories: (a) syntactic, (b) semantic, (c) compositional fluency, (d) frequency, (e) readability, and (f) situation model. Syntactic measures provide information on pronouns, noun phrases, verb and noun constituents, connectives, type-token ratio, and number of words before the main verb. Connectives are words such as “so” and “because” that are used to connect clauses. Causal, logical, additive, and temporal connectives indicate cohesion and logical ordering of ideas. Type-token ratio is the number of unique words (types) divided by the total number of words (tokens). Semantic measures provide information on nouns, word stems, anaphors, content word overlap, Latent Semantic Analysis (LSA), concreteness, and hypernyms. Anaphors are words (such as pronouns) used to avoid repetition (e.g., “she” refers to a person who was previously described in the text). LSA refers to how conceptually similar each sentence is to every other sentence in the text. Concreteness refers to the level of imaginability of a word, or the extent to which words are not abstract. Concrete words have more distinctive features and can be easily pictured in the mind. Hypernym is also a measure of concreteness and refers to the conceptual taxonomic level of a word (for example, “chair” has 7 hypernym levels: seat -> furniture -> furnishings -> instrumentality -> artifact -> object -> entity). Compositional fluency measures include the number of paragraphs, sentences, and words, as well as their average length and the frequencies of content words.
Frequency indices provide information on the frequency of content words, including several transformations of the raw frequency score. Content words are nouns, adverbs, adjectives, main verbs, and other categories with rich conceptual content. Readability indices are related to fluency and include two traditional indices used to assess difficulty of text: Flesch Reading Ease Score and Flesch-Kincaid Grade Level. Finally, situation model indices describe what the text is about, including causality of events and actions, intentionality of performing actions, tenses of actions, and spatial information. Because Coh-Metrix has not been widely used to study the development of writing in primary grade children (Puranik et al., 2010), the variables used in the present study were determined in an exploratory manner described below. Out of the 56 variables, three were used in the present study: total number of words, total number of sentences, and average sentence length (average number of words per sentence). Nelson and Van Meter (2007) report that total word productivity is a robust measure of developmental growth in writing. Therefore, indicators for a paragraph-level factor included total number of words and total number of sentences. Average words per sentence was used as an indicator for a latent sentence-level factor, along with the WIAT-II Combining Sentences task.
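
To make the three retained variables concrete, the following Python sketch computes rough equivalents from a corrected transcript. It is only an approximation under stated assumptions: Coh-Metrix applies its own tokenization and sentence segmentation, which are not described here, and the function name `composition_metrics`, the regular expressions, and the sample text are all hypothetical.

```python
import re


def composition_metrics(text: str) -> dict:
    """Approximate total words, total sentences, and words per sentence.

    Assumptions (not Coh-Metrix's actual rules): a sentence ends at one or
    more of . ! ? and a word is a run of letters or apostrophes.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = len(sentences)
    n_words = len(words)
    return {
        "total_words": n_words,
        "total_sentences": n_sentences,
        # Guard against empty input to avoid division by zero.
        "avg_words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
    }


# Hypothetical corrected sample in the spirit of the year 1 prompt.
sample = "I want a dog. A dog is fun. We can play with it."
metrics = composition_metrics(sample)
```

In this sketch, the two counts correspond to the paragraph-level indicators and the average corresponds to the sentence-level indicator described above.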

Following the Sunshine State Standards, students are required to take the Florida Comprehensive Assessment Test (FCAT) yearly. The FCAT contains a Reading subtest that assesses informational and literary reading comprehension. The Writing subtest requires students to write a narrative, expository, or persuasive essay within a 45-minute session.

Data Collection Location(s): Tallahassee, Florida, United States
318 Children (Age Range: 5-11)
When were the data in this dataset collected? August 2007 to June 2011
