CurricuLLM

Research

Advancing the evidence base for AI-assisted teaching and learning

At CurricuLLM, we believe that effective AI in education must be grounded in rigorous research. Our work focuses on evaluating and improving how language models interact with structured curriculum content, ensuring that the tools teachers rely on are accurate, transparent, and evidence-based.

We actively seek partnerships with universities, education departments, and research institutions to advance the evidence base for AI-assisted teaching and learning. If you are interested in collaborating, we would love to hear from you.

Benchmarking Frontier Model Recall of Australian and New Zealand K–10 Curricula

Published February 2026

A Comparative Evaluation of Parametric Knowledge and Retrieval-Augmented Generation

Large language models (LLMs) are increasingly used in educational contexts, yet their factual knowledge of specific national and sub-national curricula remains largely untested. We present a curriculum knowledge benchmark that systematically evaluates how accurately frontier LLMs can recall structured educational content from four Australian and New Zealand K–10 curriculum frameworks: the Australian Curriculum v9, the Victorian Curriculum, the Western Australian Curriculum, and the New Zealand Curriculum.
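To make the benchmark idea concrete, here is a minimal sketch of how recall of structured curriculum content might be scored. The item format, content codes, descriptions, and exact-match scoring rule below are all illustrative assumptions, not the paper's actual methodology or real curriculum data.

```python
def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as recall errors."""
    return " ".join(text.lower().split())

def score_recall(items: list[dict], answers: dict[str, str]) -> float:
    """Fraction of benchmark items whose ground-truth content
    description the model reproduced (after normalisation)."""
    correct = sum(
        normalise(answers[item["code"]]) == normalise(item["description"])
        for item in items
    )
    return correct / len(items)

# Hypothetical benchmark items keyed by a made-up content code.
items = [
    {"code": "EX-N-01", "description": "Compare and order whole numbers."},
    {"code": "EX-N-02", "description": "Identify factors and multiples."},
]
# Hypothetical model answers: one correct, one a paraphrase that
# exact matching (deliberately strict here) marks as incorrect.
answers = {
    "EX-N-01": "Compare and order whole numbers.",
    "EX-N-02": "Find the factors of a number.",
}
print(score_recall(items, answers))  # → 0.5
```

A real evaluation would likely relax exact matching (e.g. partial-credit or semantic scoring), since models often paraphrase content descriptors; the strict rule here just keeps the sketch self-contained.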

Dan Hart

CurricuLLM

More Research

Evaluating Large Language Model Translation Performance Across 22 Languages for Education Communications

Published June 2025

This project evaluated the performance of large language models (LLMs) tasked with translating education-related communications across 22 languages, with the goal of enabling robust multilingual communication in educational settings. We used reference-based evaluation, establishing machine translation metrics to compare candidate translations against post-edited references, which are assumed to be accurate. The findings shed light on the expected performance of current LLMs for this specific use case and provide a methodology for similar evaluations in the future.
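As an illustration of reference-based evaluation, the sketch below computes a simplified character n-gram F-score (in the spirit of chrF, a standard machine-translation metric) between a candidate translation and a post-edited reference. This is a self-contained approximation for illustration, not the metric implementation used in the study.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams, with whitespace normalised."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(candidate: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average character n-gram precision and recall,
    combined into an F-beta score (beta=2 weights recall higher)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        if cand:
            precisions.append(overlap / sum(cand.values()))
        if ref:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# A perfect candidate scores 1.0; an unrelated one scores near 0.
print(chrf("School starts at 9 am.", "School starts at 9 am."))  # → 1.0
```

In practice one would use an established implementation such as the sacreBLEU library, which also handles tokenisation and corpus-level aggregation consistently across languages.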

Dipankar Srirag, Aditya Joshi, Caroline Thompsett, Dan Hart

UNSW
