New benchmark of 13,500 AI responses tests frontier models from OpenAI, Google, and Anthropic on the curriculum knowledge that underpins K–10 teaching across both countries
Sydney, Australia, February 2026: A new whitepaper has put AI's knowledge of Australian and New Zealand school curricula to the test for the first time. The study, titled Benchmarking Frontier Model Recall of Australian and New Zealand K–10 Curricula, evaluated leading AI models, including OpenAI's GPT-5.2, Google's Gemini 3 Pro, and Anthropic's Claude Sonnet 4.5, on their ability to answer questions about the curriculum frameworks used in K–10 classrooms in both countries. The results reveal a notable gap between the models' general academic capabilities and their knowledge of specific regional curricula.
The research, published by Australian education technology company CurricuLLM, is the first systematic benchmark of AI curriculum knowledge in this region. It evaluated eight AI systems across approximately 13,500 question-response pairs spanning the Australian Curriculum v9, the Victorian Curriculum, the Western Australian Curriculum, and the New Zealand Curriculum. Questions ranged from identifying what specific learning outcomes mean, to recalling what content is taught in a given subject and year level, to answering the kinds of open-ended questions a teacher might naturally ask.
“We wanted to answer a simple question: do the AI tools teachers are already using actually know our curriculum?” said Dan Hart, Founder of CurricuLLM. “These are incredibly capable models; they score 95% on American SAT exams. But when it comes to the specific curriculum content that structures teaching and learning in Australia and New Zealand, there's a significant knowledge gap. The data makes that clear.”
What the Research Found
The whitepaper revealed a striking gradient in AI performance depending on the type of curriculum knowledge being tested. On open-ended, conceptual questions (such as which subject covers a particular topic, or what is generally taught at a certain year level), mainstream models performed reasonably well, with Google's Gemini 3 Pro achieving 80%. However, on the structured curriculum knowledge that teachers depend on for lesson planning and compliance (identifying what specific learning outcomes mean, recalling content points, and matching outcomes to the correct subject and year level), performance dropped dramatically. Two of the seven mainstream models scored 0% when asked to identify the meaning of specific curriculum outcome entries, and most scored below 17% on content recall.
No mainstream AI model exceeded a 41% overall pass rate across all four curricula. Among the Australian frameworks specifically, the Victorian Curriculum proved hardest for AI models, with baseline pass rates as low as 15%. The study also found that models frequently produced answers based on outdated, superseded curriculum versions, a particularly concerning pattern given that all four curricula are currently undergoing multi-year revision processes.
A Purpose-Built Alternative
CurricuLLM, a specialist AI assistant purpose-built for Australian and New Zealand teachers, was tested alongside the mainstream models using the same benchmark. By grounding every response in authoritative, version-controlled curriculum data, CurricuLLM achieved an overall pass rate of 89%, outperforming the best mainstream model by 48 percentage points. On the structured knowledge categories where mainstream models struggled most, CurricuLLM scored 83% on outcome identification and 89% on reverse lookups. On open-ended teacher-style questions, it achieved 98%.
The whitepaper also notes an important qualitative difference between models. Anthropic's Claude models were the most likely to acknowledge when they didn't know an answer, declining to respond rather than providing incorrect information. By contrast, models from OpenAI and Google almost never refused a question, instead giving confident but often wrong answers, a pattern the paper describes as potentially more harmful for teachers, who may not realise the information is incorrect.
“This isn't a criticism of AI in education; quite the opposite. We believe AI will be transformative for teaching,” said Hart. “But the tools teachers use need to actually know the curriculum they're working with. A general-purpose chatbot trained predominantly on American content is going to have blind spots when it comes to the Australian Curriculum or the New Zealand Curriculum. That's not a controversial claim; it's just a data problem, and it's one that can be solved.”
Why This Matters for Schools
The findings arrive as AI adoption in schools accelerates across both countries. Research shows that 69% of New Zealand primary school teachers already use AI weekly for lesson planning and assessment, while Australian education policymakers have identified AI risk management as a top concern. With all four curricula undergoing revision, the gap between what mainstream AI models have memorised and what is actually current is widening, creating a growing risk that teachers relying on general-purpose AI tools may unknowingly plan against outdated or incorrect curriculum content.
“Teachers are already stretched thin. If an AI tool is going to be part of their workflow, it has to give them confidence that the curriculum information it provides is accurate and up to date,” said Hart. “We built CurricuLLM specifically to solve this problem, to give Australian and New Zealand teachers, for free, an AI assistant they can actually rely on. We hope this research helps schools make more informed decisions about which AI tools they adopt.”
About the Research
The benchmark evaluated seven frontier AI models (GPT-4.1-mini, GPT-4.1, GPT-5.2, Gemini 3 Pro, Gemini 3 Flash, Claude Sonnet 4.5, and Claude Haiku 4.5) alongside CurricuLLM, with approximately 1,700 questions per model spanning five categories of curriculum knowledge. Evaluation used a combination of deterministic matching and an independent AI judge, with human validation confirming approximately 80% judge accuracy. The full whitepaper is available at curricullm.com/research.
About CurricuLLM
CurricuLLM is a purpose-built AI assistant for Australian and New Zealand K–10 teachers. By grounding every response in authoritative, up-to-date curriculum data, CurricuLLM delivers the curriculum accuracy that teachers need for lesson planning, resource creation, and curriculum mapping. Founded by Dan Hart, CurricuLLM is on a mission to make AI in education work for teachers in our region. Learn more at curricullm.com.
Media Contact: Dan Hart (dan@curricullm.com)
Link: https://curricullm.com
