Last term I sat down with three colleagues to moderate a sample of Year 11 Modern History essays. Four markers, five essays, an hour and a half. The point of moderation is to check that we're applying the rubric the same way, so when each of us takes the rest of our class set home that night, students across the faculty are marked to the same standard.
On the third essay, we landed on three different bands.
Same rubric. Same essay. Four NSW-qualified History teachers, all experienced. One marker had Band 4. Two had Band 5. One had Band 6.
We did what moderation is supposed to do. We talked it through. We looked at the source integration, the directive verb, the level of analysis. After fifteen minutes we settled on a shared Band 5. The marker who'd given Band 6 conceded; the marker who'd given Band 4 conceded. Three of us recalibrated our internal anchor for what Band 5 looked like in that paragraph type.
Then we went home and marked the other 27 essays in our own classes. Alone. On a Sunday night.
That's the moderation problem. Moderation patches the drift on the small sample you moderated. It doesn't stop the drift happening on the rest of the pile.
What AITSL Standard 5.3 actually asks for
Standard 5.3 in the Australian Professional Standards for Teachers requires teachers to participate in assessment moderation to support consistent and comparable judgements of student learning. It's not a fringe expectation. It's the standard.
The claim behind it is simple. Two students who hand in the same quality of work should receive the same band. Whether they sit in your class or in your colleague's. Whether their essay was marked on Tuesday morning or Sunday night. Whether they were the third essay in the pile or the twenty-eighth.
In practice, that consistency is hard to deliver. Faculty moderation is the main tool we have for it. AITSL's own framing of the standard (in the Embark Progress reference on assessment and feedback) describes moderation as collaboration between colleagues that aligns standards for judgement and ensures assessments are fair and reliable.
That description is correct. It's also incomplete. Moderation aligns standards on the moderated sample. It does not solve the variance inside an individual marker's pile across a marking session.
Where the variance actually comes from
I've talked to enough faculty markers across enough schools to know the patterns repeat. The variance has four main sources, and moderation in the staffroom catches none of them.
Time of day. The first essay in a marking session gets your full attention. The fifteenth gets a re-read. The thirtieth, at 11pm on a Sunday, gets the band you decided on in the first 30 seconds and a comment you've already typed twice tonight in slightly different words. Most experienced markers know this is happening. Some have re-marked their own piles a week later and seen the drift on paper.
Fatigue compression. Tired markers push the band distribution toward the middle. Strong essays come down a notch. Weak essays come up a notch. The students at the top and bottom of the cohort take the bigger hit.
Position in the pile. A Band 5 essay marked after three Band 6s reads weaker than the same essay marked after three Band 3s. Marker calibration shifts with what came before it.
Knowledge of the student. A student who's improved across the term gets the benefit of context in their teacher's marking. A student in a colleague's class doesn't. That's not always wrong, but it's not the rubric doing the work; it's marker memory.
Moderation catches the drift across markers in a moderated sample. It misses the drift within a marker across an evening.
What that costs
It costs students equity. Two students writing the same quality of work do not get the same mark; their marks correlate with where in the pile they sat and what time their teacher got to them. Standard 5.3 says this shouldn't happen. In most schools, it happens routinely.
It costs teachers confidence. A teacher who's honest about their own fatigue knows that the marks they assigned at 10pm aren't the marks they'd assign the same essays at 9am. Most of us privately recalibrate by re-reading the borderline ones. It still doesn't fix the rest of the pile.
It costs the school its position at appeal. When a student or parent contests a mark, the strongest defence is consistent application of the rubric across the cohort. That defence weakens if the school can't show how the rubric was applied to each individual script.
What MarkMate actually does about this
MarkMate was built because the moderation problem doesn't have a moderation-only solution. The drift inside an individual marker's pile needs something that doesn't get tired, doesn't compress the distribution at midnight, and applies the rubric the same way to script 1 and script 30.
MarkMate handles the base layer. It reads each piece of work against the marking criteria you've supplied. It assigns rubric scores and generates annotated feedback. It does this consistently across the whole class set. No drift between script 1 and script 30. No compression at 11pm.
The teacher stays in the loop. You review the scores. You override where you disagree. You add context where the student needs it: the kid who's had a hard fortnight, the kid you've been pushing on voice all term, the kid whose ambition outran their execution this time.
The change isn't that the teacher stops marking. The change is that the teacher's professional judgement is applied on top of a consistent base layer, instead of being the only thing standing between a tired marker and that marker's drift.
Moderation still happens. It gets easier. Instead of debating a spread of bands on the same essay, the faculty conversation moves to the cases where individual teachers overrode MarkMate's score and why. The variance going into the moderation discussion is smaller. The discussion itself gets sharper.
What it doesn't replace
A few things MarkMate doesn't and shouldn't do.
It doesn't replace the teacher's judgement on context. A student writing below their usual standard for a reason still needs a teacher who knows the reason. MarkMate doesn't know.
It doesn't replace moderation. Moderation is still where the faculty calibrates the rubric collectively, where new staff get inducted into the standards, and where the appeals-ready documentation conversation happens.
It doesn't replace a school's record-keeping. It generates the rubric scores and the annotated feedback, and it logs them per script. The school still owns the cohort record, the appeals process, and the conversation with parents.
For school leaders thinking about procurement
If you're a Head of Faculty, Deputy, or Principal reading this, the question you're probably weighing is whether this fits with the school's compliance posture and the kind of moderation defence you'd want at appeal. Two things to look at.
The Compliance & Safety page covers what data MarkMate sends to the AI provider (no student names or identifiers), how marking is logged per script for appeal defence, and the boundary between what MarkMate does and what teachers do.
The class link feature lets students self-check drafts under teacher supervision before submission, which is the procurement-friendly use case for senior cohorts where moderation discussions about AI use are already part of faculty practice.
