In multilingual classrooms across the United States, a single test score can change the trajectory of a child’s education, determining what support they receive, which classes they enter, and how their progress is understood over time. Behind those scores are teams of researchers, psychometricians, and linguists working quietly to make sure every result is as fair and accurate as possible.

One of those people is Victor Eyo Essien, a psychometrician and researcher at the Center for Applied Linguistics (CAL) in Washington, D.C. CAL is a nonprofit organization that partners with WIDA to develop large-scale English language proficiency assessments used across dozens of states and agencies in the U.S. and beyond.

At CAL, Victor supports psychometric work for WIDA ACCESS, a suite of English language proficiency tests given annually to English learners in kindergarten through grade 12, designed to monitor their progress in academic English across listening, speaking, reading, and writing.

Drawing on years of experience in large-scale assessments in Nigeria and advanced training in educational measurement, Victor is part of a newer generation of measurement professionals who see data not just as numbers, but as a tool for equity.

Join us as we explore Victor’s journey from national examinations in Nigeria to psychometric work in the United States, and how he thinks about fairness, data quality, and the future of language assessment.

Welcome, Victor. You’ve worked in assessment in both Nigeria and the United States. What first drew you into educational measurement and testing?

I came into educational measurement through a very practical door: students’ scores were deciding real futures, and I wanted to understand whether those scores were truly fair.

In Nigeria, I spent years working with high-stakes examinations both as an assistant examiner and later in more technical roles supporting large-scale assessments. I saw how much weight parents, schools, and policymakers placed on exam results, but I also saw the challenges: time pressure, test speededness, and the risk that some students weren’t really showing what they knew.

That curiosity led me into formal study in Educational Measurement, Research & Statistics, and eventually to psychometrics. Once I realized there was an entire field dedicated to designing, analyzing, and improving tests so that they measure what they should and do so fairly across groups, I was hooked.

Today, working at the Center for Applied Linguistics, I feel like I’m continuing that same mission in a different context: using data and measurement tools to support multilingual learners in U.S. schools.

Can you describe your role at the Center for Applied Linguistics and how it connects to the WIDA ACCESS assessments?

At CAL, I work with the psychometrics and quantitative research team that supports WIDA assessments, especially ACCESS, which is used to monitor the English language development of multilingual learners across WIDA Consortium member states.

My work sits in a few key areas:

  • Data preparation and quality checks: I help clean and structure very large datasets from test administration. That means checking for anomalies, verifying coding, and making sure the data are ready for serious analysis, because small errors at this stage can easily turn into misleading conclusions later.
  • Reliability and validity analyses: I support analyses related to test reliability, item performance, and scoring consistency. This can involve classical test theory and item response theory, depending on the question. The goal is to make sure the test behaves as a stable and trustworthy instrument across administrations and populations.
  • Equating and scaling support: Because WIDA ACCESS is administered annually, scores from different years must be placed on a comparable scale. I assist with parts of that process: preparing data, checking model outputs, and documenting results, so that a score from this year is interpretable relative to previous years.
  • Technical reporting and quality control: I also contribute to documentation and quality control steps that feed into annual technical reports. These reports help states and stakeholders see how the test is performing psychometrically and how it continues to meet federal and professional standards.
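
To make the classical-test-theory side of this concrete, here is a minimal sketch of computing Cronbach’s alpha, a standard internal-consistency reliability coefficient. The function, the toy response matrix, and the 0/1 scoring are purely illustrative, not CAL’s actual pipeline or real WIDA data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal-consistency reliability for an examinees-by-items score matrix.

    scores: 2-D array, rows = examinees, columns = items (e.g. 0/1 scored).
    """
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 6 examinees x 4 dichotomous items (invented for illustration).
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
])
alpha = cronbach_alpha(responses)  # roughly 0.66 for this toy matrix
```

Operational programs use far richer evidence than a single coefficient, but alpha remains a common first look at whether items hang together as a scale.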

Even though I’m still early in my U.S. career, I see my role as being part of the measurement backbone of the program, supporting the evidence that makes score interpretations defensible and fair.
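
As a rough illustration of the cross-year linking idea described above, here is a simplified mean-sigma linear transformation, one of several standard linking approaches. The numbers are invented; real equating designs (common items, model choice, scale maintenance) are considerably more involved.

```python
import numpy as np

def mean_sigma_linking(theta_new, theta_base):
    """Linear linking of new-form estimates onto a base scale.

    Assumes the two sets of estimates (e.g. common-item parameter estimates
    from two administrations) should share the same mean and spread after
    transformation. Returns slope A and intercept B such that
    A * theta_new + B is on the base scale.
    """
    theta_new, theta_base = np.asarray(theta_new), np.asarray(theta_base)
    A = np.std(theta_base, ddof=1) / np.std(theta_new, ddof=1)
    B = np.mean(theta_base) - A * np.mean(theta_new)
    return A, B

# Invented common-item estimates from a new and a base administration.
A, B = mean_sigma_linking([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])  # A = 2.0, B = 1.0
```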

You often talk about fairness in testing. What does fairness mean to you in the context of language assessment?

For me, fairness in language assessment has a few layers.

First, every student should have a genuine opportunity to show what they know. That means the items need to be well targeted, the instructions clear, and the testing conditions supportive. If students fail because of poorly designed items, technical issues, or construct-irrelevant barriers, that’s not really a reflection of their language ability.

Second, fairness means scores must be comparable and free from systematic bias. That’s where psychometric work comes in: examining differential item functioning, checking for patterns that could disadvantage groups, and adjusting where necessary. If we see that a task is consistently harder for a subgroup for reasons unrelated to the construct of academic English, we must ask difficult questions.
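
One common screen for the item-level bias checks Victor describes is the Mantel-Haenszel procedure, which compares item performance between a focal and a reference group after matching examinees on total score. The sketch below uses toy data and a bare-bones implementation; it is an illustration of the idea, not the operational WIDA analysis.

```python
import numpy as np

def mantel_haenszel_or(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one item, stratified by total score.

    correct:     0/1 array, whether each examinee answered the item correctly
    group:       0/1 array, 1 = focal group, 0 = reference group
    total_score: matching variable; each distinct value forms a stratum
    A ratio near 1.0 suggests the item functions similarly for both groups;
    values far from 1.0 would flag it for content review.
    """
    correct, group, total_score = map(np.asarray, (correct, group, total_score))
    num = den = 0.0
    for s in np.unique(total_score):
        m = total_score == s
        n = m.sum()
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")

# Toy data: both groups behave identically within each score stratum,
# so the odds ratio comes out at 1.0, i.e. no evidence of DIF.
correct     = [1, 0, 1, 0, 1, 0, 1, 0]
group       = [0, 0, 1, 1, 0, 0, 1, 1]
total_score = [1, 1, 1, 1, 2, 2, 2, 2]
ratio = mantel_haenszel_or(correct, group, total_score)
```

In practice a flagged item is not automatically removed; content experts review it to judge whether the difference reflects the construct or an irrelevant barrier.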

Third, fairness is also about transparent communication. When we produce technical reports, manuals, and interpretive guidance, we’re not just satisfying a compliance requirement—we’re helping educators and policymakers understand what scores can and cannot tell them. Misinterpretation can be just as damaging as poor measurement.

My background in Nigeria, working with national examinations that deeply affect access to opportunities, makes me very sensitive to these issues. Whether it’s mathematics items in West African exams or speaking tasks in WIDA ACCESS, the principle is the same: students deserve accurate, unbiased information about their learning.

How has your international experience from Nigeria to the U.S. shaped the way you approach data and psychometrics?

The biggest influence is that I almost never look at a dataset as just numbers. Each row is a student sitting somewhere in a classroom, whether that’s a secondary school in Nigeria or a multilingual learner in a U.S. public school.

In Nigeria, I was very close to the operational side of testing: marking scripts, understanding how exam administration works on the ground, and seeing first-hand the pressure that students and families face. That experience trained me to always ask: How will this analytic decision play out in real life?

In the U.S. context, I’ve seen how technical sophistication and scale can be combined with strong research traditions. At CAL, there is deep attention to construct definition, validity evidence, test development procedures, rater training, and alignment with standards.

Putting those two worlds together, I try to bring:

  • A practical, on-the-ground mindset from my years in West African examinations.
  • A research-driven, evidence-based mindset from my current work in the U.S.

That combination makes me very careful about assumptions. When I see a pattern in the data, I think about the student experience behind it, not just the model fit statistics.

Many people are talking about AI and machine learning in assessment. How do you see these tools intersecting with your work?

AI and machine learning are already influencing assessment, and I think their role will only grow, but they must be integrated thoughtfully.

From a measurement perspective, AI can support:

  • Automated item generation and practice materials, which can expand access and reduce construct-irrelevant variance like unfamiliarity with item types.
  • Scoring support, especially for written and spoken language, where human raters may need assistance with consistency and turnaround time.
  • Diagnostic feedback, helping students and teachers understand not just what score was obtained, but why.

At the same time, my psychometric training reminds me of important cautions:

  • AI systems inherit biases from their training data. If we’re not careful, we can scale existing inequities instead of solving them.
  • Transparency is critical. Stakeholders should know when AI is involved and what safeguards are in place.
  • Any AI-driven tool must still be anchored in sound validity arguments: we must show that decisions based on those outputs are appropriate and fair.

I’m particularly interested in how machine learning can support item selection, adaptive testing, and quality monitoring, areas that combine my love for data with my concern for fairness and efficiency in large-scale programs.

What impact do you hope your work with CAL and WIDA will have on multilingual learners?

My hope is that, in a very concrete way, students’ scores will tell a more accurate and just story about their abilities. If our analyses help keep the test stable and reliable across years, then a student’s growth in English proficiency will be captured more accurately. If our quality checks catch small problems early, we prevent them from turning into big issues that affect entire cohorts.

The learners we serve often navigate multiple languages, new school systems, and sometimes new countries. They deserve assessment systems that recognize their strengths and progress instead of reinforcing stereotypes or underestimating them. I may not meet the students whose data I analyze, but I think about them every time I run code, review an output, or read through a technical report.

For young researchers and students, especially international students who want to enter psychometrics or language assessment, what advice would you give?

A few things I’ve learned so far:

Build strong foundations: Take courses in statistics, measurement theory, and research methods seriously. You can’t skip the fundamentals and expect to do responsible work in high-stakes assessment.

Stay close to real practice: If you can, get experience with actual test administration, scoring, or program implementation. It will change how you read datasets and interpret results.

Look for mentors and collaborators: My journey has been shaped by mentors in Nigeria and in the U.S. who believed in my potential. Don’t be afraid to ask questions, seek feedback, and learn from people ahead of you.

Remember the human side of data: You are not just working with numbers. You’re working with people’s lives, opportunities, and identities. Let that responsibility guide your ethics and your attention to detail.

Be patient with yourself: Transitioning across countries, systems, and academic expectations is not easy. It’s okay to ask for help, take time to adjust, and grow step by step.

If you care about fairness, are curious about data, and enjoy solving complex problems, psychometrics and language assessment can be a very meaningful path.

About the Interviewer

Victor Essien is a researcher interested in developing and delivering better tests, psychometrics, and the application of artificial intelligence. He covers stories about how assessment, technology, and policy shape opportunities for learners around the world.
