Why do AI IELTS Writing scores vary SO much? 😭 GPT vs Gemini vs Claude (NotebookLM)
I’m preparing for IELTS Academic and I’m getting really confused because my estimated Writing band changes depending on which AI I use 😭
Gemini is usually much more generous with my essays and often gives me around 7–7.5. GPT and Claude are way stricter and keep putting me around 6.5–7 no matter how much I improve. Recently I started using NotebookLM with the official PDFs/rubrics from the IELTS website uploaded, and interestingly it tends to agree more with Gemini than with GPT/Claude.
The problem is that this discrepancy is becoming mentally exhausting because I genuinely feel like I’m improving:
- my Task 1s have become much more objective and structured
- my timing improved
- I’m making fewer grammar mistakes
- my coherence is stronger
…but some AIs still keep giving me basically the same score range every single time.
So now I honestly don’t know what my real level is or which AI is the most reliable benchmark for IELTS Writing evaluation.
Has anyone else experienced this?
Which AI do you think evaluates IELTS Writing more realistically?
And how close do you think AI scoring actually is to real examiners nowadays?