tl;dr-ELT

too long; didn’t read – ELT

Both teachers & learners appreciate the value of writing as a way to develop ideas, practise language & show what someone can really do in English. However, when it comes to assessing writing, especially at scale, we often fall back on shortcuts such as length, accuracy & vocabulary range, which leads me to wonder whether technology is now capable of moving beyond these surface cues to provide more meaningful assessment.

A recent study in Scientific Reports explores exactly this by testing a new AI-based system designed to judge English writing more consistently across different essay topics, not just familiar ones.

The study

Ren, Fan & Wang (2025) introduce a new Automated Essay Scoring model called HFC-AES (Hybrid Feature-based Cross-Prompt Automated Essay Scoring). Their focus is a long-standing weakness in automated marking: essays written to unseen or unfamiliar prompts are often scored unreliably.

To investigate this, the researchers trained & tested their system on large collections of learner essays, including the ASAP dataset as well as TOEFL11 & ICLE. The essays were written by non-native English users responding to multiple prompts, with human examiner scores used as the reference point.

The model works in two main stages:

  • A topic-independent stage that looks at overall writing quality using a mix of simple indicators (like sentence length or error patterns) & deeper neural analysis
  • A topic-related stage that checks how closely the essay content actually matches the task, using attention-based modelling to track relevance & organisation
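The paper's actual model is neural, but the two-stage idea can be illustrated with a deliberately simple sketch: hand-crafted surface features stand in for the topic-independent stage, and a crude word-overlap score stands in for the attention-based topic-related stage. Every function name, weight & threshold below is invented for illustration, not taken from the study.

```python
import re
from collections import Counter

def surface_features(essay):
    """Toy topic-independent stage: simple indicators of overall quality."""
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

def relevance_score(essay, prompt):
    """Toy topic-related stage: share of essay words that also appear in
    the prompt (a crude stand-in for attention-based relevance modelling)."""
    essay_words = Counter(re.findall(r"[a-zA-Z']+", essay.lower()))
    prompt_words = set(re.findall(r"[a-zA-Z']+", prompt.lower()))
    overlap = sum(c for w, c in essay_words.items() if w in prompt_words)
    return overlap / max(sum(essay_words.values()), 1)

def toy_score(essay, prompt):
    """Combine both stages with invented weights into a 0-5 band."""
    f = surface_features(essay)
    quality = min(f["word_count"] / 300, 1.0) * 0.5 + f["type_token_ratio"] * 0.5
    return round(5 * (0.6 * quality + 0.4 * relevance_score(essay, prompt)), 1)
```

The point of the sketch is the division of labour: one component judges the writing regardless of topic, the other checks whether the essay actually answers the task — which is why off-topic but fluent writing can still lose marks.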

Performance was measured using Quadratic Weighted Kappa (QWK), a common way of checking how closely automated scores match human ones.
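For readers curious about the arithmetic, QWK can be computed directly from two sets of integer scores: disagreements are penalised by the square of their distance, relative to what chance agreement would produce. The plain-Python sketch below illustrates the standard formula; it is not code from the study, & the example scores are invented.

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK between two raters' integer scores on the same essays."""
    n = len(human)
    observed = Counter(zip(human, machine))   # joint score distribution
    hist_h = Counter(human)                   # human score marginals
    hist_m = Counter(machine)                 # machine score marginals
    span = (max_score - min_score) ** 2 or 1
    num = den = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = (i - j) ** 2 / span           # quadratic disagreement weight
            num += w * observed.get((i, j), 0) / n
            den += w * (hist_h.get(i, 0) / n) * (hist_m.get(j, 0) / n)
    return 1.0 - num / den if den else 1.0

human   = [2, 3, 4, 4, 5, 3]   # invented example scores
machine = [2, 3, 4, 3, 5, 3]   # one near-miss out of six essays
print(round(quadratic_weighted_kappa(human, machine, 2, 5), 3))  # → 0.909
```

A QWK of 1.0 means perfect agreement & values near 0 mean no better than chance, so the high figures reported in the study indicate the automated scores track human examiners closely.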

The findings

The results are striking. On average, the system’s scores matched human examiners very closely, and it performed better than several well-known automated marking tools, including ones based on BERT & GPT (two well-known types of AI language model, used to analyse & generate text respectively).

Some key takeaways:

  • Removing features linked to text organisation led to a clear drop in accuracy, showing that structure really matters
  • Features connected to task relevance were especially important for argumentative essays
  • Attention mechanisms helped most with essays that involved abstract thinking, weighing options or developing a position
  • The system was fast enough for practical use, scoring roughly 60–70 essays per minute

That said, the model still struggled with subtle elements of writing. Rhetorical questions, shifts in tone or carefully balanced arguments were sometimes undervalued, while fluent but shallow responses could receive higher scores than a human examiner might give.

Why this matters for ELT

What makes this study interesting for me isn’t the tech itself, but what it shows about writing. The model performs best when it can track coherence, organisation & relevance: the same things we often prioritise when marking by hand.

It also highlights a familiar tension. Accuracy & fluency are relatively easy to measure, for humans & machines alike. Depth of argument, stance & originality are much harder. Even advanced AI systems still find these aspects challenging.

To put it simply, imagine two essays with similar grammar & vocabulary. One develops an idea logically & stays focused on the task. The other sounds fluent but goes in circles. Systems like HFC-AES are becoming better at telling the difference, though they are not there yet.

Teacher Takeaways

  • Automated scoring is moving beyond grammar & word counts towards organisation & relevance
  • Task fulfilment is now a central focus for newer AI-based assessment tools
  • These systems can support feedback & large-scale marking, but they still miss nuance, voice & creativity

Rather than replacing teachers, research like this invites us to reflect on what we value most in student writing, & which aspects of that are hardest to capture, whether by humans or machines.

How do you decide what matters most when you assess writing in your classroom?


