Instrument

1. Mathematics Test Item Development

The mathematics test is being developed by researchers at the Educational Testing Service (ETS). They first created a test blueprint based on the Common Core State Standards (CCSS) in Mathematics for Grade 8. After the blueprint was reviewed by the project team at Davis, Irvine, and ARDAC and by members of the project Technical Advisory Committee (TAC), ETS created two sets of items based on the approved blueprint: first a set of 40 items and then a set of 30 items. The project team reviewed the first set of 40 items and shared suggestions for revisions with ETS. ETS incorporated these suggestions into the first set and used them to inform development of the second set of 30 items. Further suggestions and feedback on both sets were then provided to the research team at ETS. The two sets were combined into a pool of 70 items, which was then divided into two parallel forms of 35 items each, labeled Forms A and B, with similar item content and item characteristics.

To finalize a standard form of the mathematics assessment, the research team reviewed the item statistics obtained from the Spring 2014 pilot. The Spring pilot administered a total of 65 items distributed across two forms (i.e., 30 unique items and 5 common items in each form). The two forms were assembled to provide balanced coverage of the standards. The pilot forms were randomly assigned within each classroom. This design was intended to examine the quality of all the items in the pool. Item difficulty was measured by the proportion of students who answered the item correctly (p-value). Item discrimination was measured by the item-total correlation. The item statistics were also examined by English language learner (ELL) group membership.
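The two item statistics described above can be illustrated with a minimal sketch. It assumes responses are scored 0/1 and stored as one row per student; the function names and the sample data are hypothetical and are not taken from the pilot. The correlation here is the uncorrected item-total correlation (the item is included in the total score); whether the pilot analysis used a corrected version is not stated in the report.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """Compute the p-value and item-total correlation for each item.

    responses: list of rows, one per student, each a list of 0/1 item scores.
    Assumption: every student attempted the same set of items.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # total score per student
    stats = []
    for j in range(n_items):
        scores = [row[j] for row in responses]
        stats.append({
            "item": j,
            "p_value": sum(scores) / n_students,        # proportion correct
            "item_total_r": pearson(scores, totals),    # uncorrected discrimination
        })
    return stats


# Hypothetical data: 4 students, 3 items.
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
sample_stats = item_statistics(sample)
```

To examine the statistics by ELL group membership, the same function could be applied separately to the rows for each group.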

At this stage, we considered items with p-values below 0.2 to be too difficult for students in general. These items also tended to have poor item-total correlations. In selecting items for the final standard form, we made efforts to exclude the items that were considered too difficult. This decision was based partly on the assumption that substantially difficult items might not allow us to detect the effects of the accommodations. In addition, we decided to include a maximum of 35 items in the standard form, considering that students took approximately 40 minutes on average to complete each pilot form.
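The difficulty screen can be sketched as follows. This is only an illustration of the p-value threshold described above: the item identifiers and values are hypothetical, and the actual selection also relied on the team's judgment about standards coverage and linguistic complexity, which this sketch does not model.

```python
def screen_items(p_values, min_p=0.2):
    """Split items into those kept and those flagged as too difficult.

    p_values: dict mapping a (hypothetical) item identifier to its p-value.
    Items with a p-value below min_p are flagged for possible exclusion.
    """
    kept = [item for item, p in p_values.items() if p >= min_p]
    flagged = [item for item, p in p_values.items() if p < min_p]
    return kept, flagged


# Hypothetical p-values for four items.
example_p_values = {"item_01": 0.62, "item_02": 0.15, "item_03": 0.20, "item_04": 0.08}
kept_items, flagged_items = screen_items(example_p_values)
```

Flagged items would then be candidates for exclusion rather than automatic removals, consistent with the "made efforts to exclude" wording above.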

After the initial screening of the items based on the pilot results, the research team reviewed the standards coverage and the linguistic complexity of the selected items. Given the focus on examining the effects of accommodations that provide direct linguistic support for ELL students, the team agreed that the standard assessment form should be more linguistically rich. Because the final standard form will include only 35 items, it is important to have a sufficient number of items for which we can provide linguistic accommodations such as linguistic modifications and glossaries. To evaluate the linguistic complexity of the initially selected items, we used a holistic rubric with a scale ranging from one (minimal linguistic complexity) to five (highest linguistic complexity). The details of the rubric are found in Abedi’s previous studies (e.g., Abedi, 2006). For the 14 items that were considered to have relatively low linguistic complexity (e.g., ratings of 1 and 2), the pilot items were modified to add some complexity in vocabulary and syntactic structure.


2. TIMER (a Spanish version of a reading proficiency test)

The task of creating a Spanish version of TIMER was undertaken by the Davis research team. As discussed in the study proposal, a short English reading test named TIMER was adopted from earlier work by UCLA/CRESST researchers. TIMER has two subscales, a sentence completion subscale and a “word/non-word” subscale which takes approximately 8 minutes to administer. In Phase II of the study, this test will be administered by computer to Grade 8 students before they take the mathematics test, both to determine their level of reading proficiency in English and, based on the results, to assign appropriate accommodations to them. The TIMER test was translated into Spanish by a bilingual (English/Spanish) UC Davis doctoral student who has mastered both languages. An independent check of the translation was conducted by the UC Davis research team, and discrepancies were corrected.