A recent paper by Shingo Nahatame & Kazuhiro Yamaguchi, published in Language Learning, takes a fresh look at a classic ELT concern: what actually makes a text “hard” for L2 readers to process. Using eye‑tracking data from longer texts than most previous studies, the authors revisit readability through a Bayesian lens, which essentially means the models updated their expectations as evidence accumulated & estimated the probability that each linguistic feature genuinely affected reading effort.
The study
The researchers analysed open‑access eye‑tracking data from 41 Japanese university students reading 30 English passages (267–412 words) drawn from EIKEN exams. Participants read paragraph by paragraph while their eye movements were recorded. Three global measures of processing effort were examined:
- total reading time
- skipping count (words skipped on first pass)
- regression count (times readers moved their eyes back to earlier text)
The texts were also scored on readability indices (e.g. Flesch, CML2RI, SBERT) & analysed for more than 800 linguistic features (lexical, syntactic, cohesion, pragmatic). Bayesian variable‑selection models were then used to identify which features best predicted processing effort, controlling for text length & learners’ reading proficiency.
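If you’re curious what “Bayesian variable selection” actually involves, here is a minimal sketch, not the authors’ models, using scikit‑learn’s ARDRegression, a Bayesian regression that shrinks irrelevant coefficients towards zero. The data & feature names below are invented purely for illustration:

```python
# A minimal, illustrative sketch of Bayesian variable selection.
# NOT the paper's actual models: synthetic data, invented feature names.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
n_texts = 200

# Synthetic text features (standardised): only the first two truly matter.
features = ["word_length", "sentence_length", "bigram_freq", "noise_1", "noise_2"]
X = rng.standard_normal((n_texts, len(features)))
reading_time = 0.8 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n_texts)

# ARD places a separate prior over each coefficient; coefficients with
# no support in the data are shrunk towards zero as evidence accumulates.
model = ARDRegression().fit(X, reading_time)

for name, coef in zip(features, model.coef_):
    print(f"{name:>16}: {coef:+.3f}")
```

In the paper the same idea is applied to hundreds of real features at once; the point here is only the shrinkage mechanism that sorts genuine predictors from noise.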
The findings
Several complex linguistic features did predict processing effort, but their advantage over simple features was modest.
Key results included:
- Lexical sophistication dominated. Features like contextual distinctiveness (how “surprising” a word’s co‑occurrence patterns are) & age of acquisition predicted longer reading times & more regressions.
- Bigram frequency mattered. Texts containing more frequent two‑word combinations led to more skipping, suggesting chunking at the phrase level (see the sketch after this list).
- Some syntactic features played a role. More complex noun phrases & more adverbial clauses increased processing effort.
- But simple features held their own. Word length & sentence length predicted processing effort almost as well as the complex models. In fact, models using only word & sentence length often matched or outperformed those using readability indices.
- Random effects were huge. Differences between texts & readers explained a large share of variance, echoing work by Collins‑Thompson (2014) on the inherently individual nature of readability.
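To see the mechanics behind the bigram‑frequency finding, here is a rough sketch of how a text can be scored by how frequent its two‑word sequences are in a reference corpus. The toy “corpus” below is invented; real studies draw frequencies from large databases such as COCA:

```python
# Rough sketch: score a text by the corpus frequency of its bigrams.
# The tiny "corpus" is invented; this only shows the mechanics.
from collections import Counter

corpus = "the team met to decide if the plan could work the team met again".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))

def mean_bigram_frequency(text: str) -> float:
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    if not bigrams:
        return 0.0
    # Counter returns 0 for bigrams the corpus has never seen.
    return sum(bigram_counts[b] for b in bigrams) / len(bigrams)

print(mean_bigram_frequency("The team met to decide"))           # higher: familiar chunks
print(mean_bigram_frequency("The committee convened to evaluate"))  # lower: unfamiliar chunks
```

A text‑level score like this, averaged over a whole passage, is the kind of feature the models then relate to skipping behaviour.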
These results align with earlier findings from Crossley, Nahatame, Zhang & Gong: lexical features matter, but their predictive power is limited & often unstable across datasets. They also echo Kuperman et al.’s argument that aggregated word‑level features often outperform more global complexity measures.
To make this concrete:
A sentence like “The committee convened to evaluate the feasibility of the proposed initiative” contains later‑acquired, low‑frequency words & dense noun phrases. Eye‑tracking research suggests readers will spend longer on it, skip less & regress more.
By contrast, “The team met to decide if the plan could work” uses earlier‑acquired vocabulary & simpler structures, reducing processing load.
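The gap shows up even on the crudest metric. Here’s a quick, standard‑library‑only comparison of the two sentences on average word length (a sketch, not a validated readability tool):

```python
# Compare the two example sentences on the simplest possible metric:
# average word length in characters (punctuation stripped).
import string

def avg_word_length(sentence: str) -> float:
    words = [w.strip(string.punctuation) for w in sentence.split()]
    words = [w for w in words if w]
    return sum(len(w) for w in words) / len(words)

hard = "The committee convened to evaluate the feasibility of the proposed initiative"
easy = "The team met to decide if the plan could work"

print(f"complex sentence: {avg_word_length(hard):.2f} chars/word")
print(f"simple sentence:  {avg_word_length(easy):.2f} chars/word")
```

On this toy measure the first sentence averages roughly six characters per word against under four for the second, exactly the kind of simple signal the study found surprisingly hard to beat.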
What this study ultimately shows is that readability isn’t as mysterious as we sometimes assume. Even with hundreds of linguistic features in play, the strongest signals still come from the basics: the words learners meet on the page.
Teacher takeaways?
- When selecting texts, don’t underestimate simple metrics. Average word length often tells you as much as sophisticated readability tools.
- Lexis still drives difficulty. Later‑acquired, low‑frequency words slow readers down more than syntactic bells & whistles.
- Longer texts behave differently. Features that matter in short passages may fade in longer ones, where global patterns dominate.
How do you judge whether a text will be manageable for your learners?


