Happy Thursday! This week, we’re delving into factors of high-stakes testing. Specifically, we’re asking:
- Does administration location (testing center vs. virtual proctored) affect performance?
- How can we make high-stakes tests more equitable?
Testing in Jammies 😴
Alright, so, I feel like I should give a bit of a spoiler for this article: the overall findings are not significant, statistically speaking. However, they are Very Significant from a practical standpoint! Be sure to check out LSW Issue #58 for a quick review of why nonsignificant findings are still important 😊
Over the past couple of years (re: the pandemic), in-person testing has been difficult to carry out. Because of this challenge, research on how high-stakes tests perform in other settings has become critical. In this article, researchers set out to assess whether outcomes differ when high-stakes tests are administered through live remote proctoring (LRP) versus at a testing center (Cherry, O’Leary, Naumenko, Kuan, & Waters, 2021).
While past research has assessed cheating and user experience, the impact of administration mode on scores had yet to be examined (Karim et al., 2014; Lilley et al., 2016; Cherry et al., 2021). The study evaluated over 14,000 learners across “11 different professional licensing examinations” in the United States. All exams were taken on a laptop or desktop computer, and on-screen content did not differ between testing locations (test center or LRP). For LRP test-takers, the software included a rigorous check-in process similar to that of the test centers. The proctoring software also included a browser lock, video/audio recording, and a live proctor (Cherry et al., 2021).
When individual exams were analyzed separately, a few did show statistically significant differences between testing center and LRP scores. However, these differences varied in direction, and the effect sizes were very small. Aggregated across all 11 exams, scores for testing center and LRP administration were “very similar”: 82.96% and 83.21%, respectively. The one notable difference was testing time, with LRP sessions lasting almost 5 minutes longer, on average, than testing center sessions. Overall, results showed that learners behave and perform similarly across testing modes (Cherry et al., 2021).
Key Takeaway: For high-stakes testing, administration at a testing center and via live remote proctoring lead to similar results. LRP is a viable alternative to in-person exams.
Read More (Open Access): Cherry, G., O’Leary, M., Naumenko, O., Kuan, L.-A., & Waters, L. (2021). Do outcomes from high stakes examinations taken in test centres and via live remote proctoring differ? Computers and Education Open, 2.
A review: Are tests equitable?
A recent paper published in Intelligence reviews the literature on high-stakes testing and bias (Burgoyne, Mashburn, & Engle, 2021). High-stakes tests are tied to “professional opportunities and, in turn, economic outcomes” (Burgoyne et al., 2021). These tests are often used for personnel selection, promotion, and more, and many of them trace their roots to intelligence tests. Unfortunately, intelligence tests and other high-stakes tests have a history of bias, with scores differing by race, ethnicity, and gender (Burgoyne et al., 2021). Test scores may also predict job performance differently for different groups, a problem called predictive bias. When predictive bias is present in a high-stakes test, scores cannot be used in the same way across groups. The article includes a figure that illustrates predictive bias.
Research has shown that individual differences in attention control, working memory capacity, and fluid intelligence are related to performance outcomes, and that tests of these abilities are much more equitable. Further, new evidence suggests that tests of working memory capacity and attention control reduce adverse impact compared to traditional high-stakes tests (Burgoyne et al., 2021).
Adverse impact from high-stakes tests occurs when the selection decisions mentioned above (hiring, promotion, etc.) are based on a biased test. When that happens, protected groups, i.e., members of “a race, color, religion, sex, national origin group, or other protected class” in the United States, suffer economic consequences (Burgoyne et al., 2021).
To foster diversity, equity, and inclusion (DEI) in workplaces, high-stakes tests must be equitable. After reviewing the literature, the authors suggest “shifting the focus on high-stakes tests away from acculturated knowledge” and toward fluid intelligence (Burgoyne et al., 2021). This shift is unlikely to solve everything “as systemic and historical inequities continue,” but it is an important first step toward equitable testing (Burgoyne et al., 2021). While DEI in high-stakes testing is not quite where we want it to be, research is helping us get there.
Key Takeaway: To foster DEI, high-stakes tests should move away from crystallized intelligence and toward fluid intelligence. If this is not possible, domain-specific tests should be supplemented with tests of attention control, working memory capacity, and fluid intelligence.
Read More ($): Burgoyne, A. P., Mashburn, C. A., & Engle, R. W. (2021). Reducing adverse impact in high-stakes testing. Intelligence, 87.
"We suggest that shifting the focus of some high-stakes assessments away from crystallized intelligence or supplementing them with other cognitive constructs could mitigate group differences in performance without sacrificing criterion validity. In particular, we provide evidence that tests of attention control—the domain-general ability to maintain focus on task-relevant information and resist distraction—could provide a more equitable path forward."
- Burgoyne et al., 2021
Pets of Learning Science Weekly
Our adorable car decor this week comes from reader Debra. Sweet Clair is waiting to get through some tests too! She passed with flying colors 🌈
Send us your pet pics at email@example.com.
Wondering why we’re including animal photos in a learning science newsletter? It may seem weird, we admit. But we’re banking on the baby schema effect and the “power of Kawaii.” So, send us your cute pet pics -- you’re helping us all learn better!
The LSW Crew
Learning Science Weekly is written and edited by Kaitlyn Erhardt, Ph.D.
Have something to share? Want to see something in next week's issue? Send your suggestions: firstname.lastname@example.org