Complete Guide to Classroom Assessment

I have been a classroom teacher for over a decade. Assessment is the part of teaching that most professional development addresses least helpfully. Workshops focus on rubric design while ignoring the evidence on what actually changes student learning. This guide covers the assessment practices that research supports and that I have seen work across different subjects and age groups.

Part of our Evidence-Based Teaching Guide.

Formative vs. Summative: The Core Distinction

Formative assessment informs instruction while learning is still in progress. Summative assessment evaluates learning after instruction ends. Most teachers over-rely on summative and under-invest in formative. The distinction matters because formative assessment, when done well, is one of the highest-effect interventions in education.

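Hattie’s Visible Learning (2009; updated in 2023 as Visible Learning: The Sequel, a synthesis of more than 2,100 meta-analyses) ranks feedback, the core of formative assessment, among the most powerful influences on achievement, with an effect size of about 0.70 (the exact figure varies between editions). That is well above the 0.40 “hinge point,” the average effect across all the influences Hattie reviewed, which he treats as roughly one year of typical progress.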
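To make “effect size” concrete: the figures in syntheses like Hattie’s are standardized mean differences such as Cohen’s d, which divides the gap between two group means by their pooled standard deviation. The Python sketch below computes d for a single comparison. The function name `cohens_d` and the scores are hypothetical, and Hattie’s published numbers come from aggregating many meta-analyses, not from one calculation like this.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical end-of-unit scores for a class that received structured
# feedback and a comparison class that did not.
feedback_class = [72, 78, 80, 85, 90]
comparison_class = [68, 72, 76, 80, 84]
d = cohens_d(feedback_class, comparison_class)
print(f"d = {d:.2f}")  # prints d = 0.76
```

Against the 0.40 benchmark, a d around 0.7 would count as a large effect, though with classroom-sized samples a single estimate like this is noisy.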

Effective Formative Assessment Techniques

Exit tickets: a single question at the end of class that reveals whether students grasped the lesson’s core concept. Takes 3 minutes. Gives you tomorrow’s starting point. Most useful when you actually read them before the next class.

Cold calling with think time: pose a question, wait 5–7 seconds (true wait time, not 2 seconds), then call on a student. Dylan Wiliam (Embedded Formative Assessment, 2011) reports that waiting more than 3 seconds significantly improves the quality of answers and increases the number of students who participate.

Mini whiteboards or a digital response tool: whole-class simultaneous response. Students write their answers (or type them on a device) and reveal them at the same time. You see every student’s understanding in 60 seconds instead of hearing from two students per question.

Diagnostic questions: multiple-choice questions with carefully designed distractors that reveal specific misconceptions. Asking “What is 0.3 + 0.4?” tells you less than asking students to choose between 0.07, 0.7, and 7, because each wrong answer maps to a specific misunderstanding of decimal place value.
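Storing the distractor-to-misconception mapping alongside the question lets you sort a whole class’s answers by misconception rather than just right or wrong. A minimal Python sketch: the misconception labels for 0.07 and 7 are illustrative guesses, not labels from a published question bank, and `QUESTION` and `diagnose` are hypothetical names.

```python
# Hypothetical diagnostic item: each option maps to what choosing it suggests.
QUESTION = {
    "prompt": "What is 0.3 + 0.4?",
    "options": {
        "0.07": "thinks adding tenths produces hundredths",
        "0.7": "correct",
        "7": "ignores the decimal point and adds 3 + 4",
    },
}


def diagnose(responses: dict[str, str]) -> dict[str, list[str]]:
    """Group students by the diagnosis their chosen answer points to."""
    groups: dict[str, list[str]] = {}
    for student, answer in responses.items():
        diagnosis = QUESTION["options"].get(answer, "answer not among the options")
        groups.setdefault(diagnosis, []).append(student)
    return groups


responses = {"Ana": "0.7", "Ben": "7", "Cara": "0.07", "Dev": "7", "Eli": "0.7"}
for diagnosis, students in diagnose(responses).items():
    print(f"{len(students)} student(s), {diagnosis}: {', '.join(students)}")
```

Grouping by diagnosis points straight at what to reteach the next day, and to whom.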

Summative Assessment That Measures What Matters

Test validity, whether your test measures what you intend to measure, is the most underexamined issue in classroom assessment. A common problem is a test heavy on factual recall that is labeled as measuring critical thinking. Bloom’s Taxonomy is a useful design tool: tag each question by cognitive level and aim for at least one-third of questions at the application level or above.
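Checking the one-third target is simple arithmetic once each question carries a level tag. A minimal Python sketch, assuming the six levels of the 2001 revised taxonomy and a hypothetical hand-tagged unit test; `share_at_or_above` is an illustrative name.

```python
# Levels of the revised (2001) Bloom's Taxonomy, lowest to highest.
LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def share_at_or_above(tags: list[str], threshold: str = "apply") -> float:
    """Fraction of questions tagged at or above the threshold level."""
    cutoff = LEVELS.index(threshold)
    higher = sum(1 for tag in tags if LEVELS.index(tag) >= cutoff)
    return higher / len(tags)


# Hypothetical 12-question unit test, tagged by hand.
test_tags = ["remember"] * 6 + ["understand"] * 3 + ["apply"] * 2 + ["analyze"]
share = share_at_or_above(test_tags)
print(f"{share:.0%} at application level or above")  # prints 25% ...
print("meets one-third target" if share >= 1 / 3 else "below one-third target")
```

Here three of twelve questions reach application or above, so this test would need rebalancing before it could claim to measure more than recall.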

Rubrics should describe performance, not assign points. “Demonstrates understanding of cause and effect with specific evidence from the text” is useful feedback. “4 out of 5 points” is not. Analytic rubrics (a separate rating for each criterion) give more diagnostic information than holistic rubrics (one overall rating).
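An analytic rubric is essentially a lookup table: for each criterion, a description of what performance at each level looks like. Returning the matching descriptions, rather than a point total, is what makes the result usable as feedback. A minimal Python sketch: only the first “evidence” descriptor comes from the example above; the other criteria, levels, and wording, and the `feedback` function, are hypothetical.

```python
# Hypothetical analytic rubric: every criterion is judged separately,
# and each level carries a description of performance rather than points.
RUBRIC = {
    "evidence": {
        "secure": "Demonstrates understanding of cause and effect with specific evidence from the text",
        "developing": "Links causes to effects but supports the links with general rather than specific evidence",
        "beginning": "Describes events without explaining how one caused another",
    },
    "organization": {
        "secure": "Each paragraph advances one clear step of the argument",
        "developing": "Argument is present but some paragraphs mix several points",
        "beginning": "Points appear in no clear order",
    },
}


def feedback(judgments: dict[str, str]) -> list[str]:
    """Turn per-criterion level judgments into descriptive feedback lines."""
    return [f"{criterion}: {RUBRIC[criterion][level]}" for criterion, level in judgments.items()]


for line in feedback({"evidence": "secure", "organization": "developing"}):
    print(line)
```

A holistic rubric would collapse both judgments into one rating, hiding that this student’s use of evidence is secure while the organization needs work.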

Grading Practices Worth Reconsidering

Averaging grades across a semester weights early performance as heavily as final performance, which penalizes learning. A student who failed early quizzes while still learning and then scored 95% on the final has demonstrated mastery, but a standard average does not reflect that. Standards-based grading, which reports achievement against each learning objective rather than averaging all scores, reflects it more accurately.
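The arithmetic is easy to see with one student’s scores on a single objective. A minimal Python sketch with hypothetical scores. Reporting the most recent score is only one of the rules standards-based systems use (others weight recent evidence more heavily without discarding earlier scores), so `most_recent` is an illustrative choice, not a definition of standards-based grading.

```python
from statistics import mean

# Hypothetical scores (percent) on one learning objective, in order:
# three early quizzes taken while learning, then the final.
scores = [40, 55, 70, 95]


def semester_average(scores: list[float]) -> float:
    """Traditional grading: every score carries equal weight."""
    return mean(scores)


def most_recent(scores: list[float]) -> float:
    """One standards-based rule: report the latest evidence on the objective."""
    return scores[-1]


print(f"Semester average: {semester_average(scores):.0f}%")  # prints 65%
print(f"Most recent evidence: {most_recent(scores):.0f}%")  # prints 95%
```

The average lands at 65% even though the final shows mastery, because the early misses are counted as if they were still true.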

Late penalties reduce grades for behavior rather than learning. Some districts report academic grades separately from work-habits grades to keep the assessment evidence clean. Check your local policy and the research before adopting this approach.

Feedback Timing and Quality

Feedback returned within a day is usually far more useful than feedback returned a week later, because by then students have moved on cognitively. Prioritize speed over polish: one specific written sentence delivered quickly beats a detailed rubric returned late. The broader case for feedback that students act on while learning is still in progress comes from Black & Wiliam’s review (Assessment and Classroom Learning, 1998).

Peer and Self-Assessment

Peer assessment increases student engagement with criteria and produces more revision than teacher-only feedback, when structured properly. Students need anchor examples and sentence starters (“This argument is strong because… It could be stronger if…”). Self-assessment is most effective when tied to specific criteria and done immediately after completing work, not days later.

Sources: Hattie, Visible Learning (2009) and Visible Learning: The Sequel (2023). Wiliam, Embedded Formative Assessment (2011). Black & Wiliam, “Assessment and Classroom Learning,” Assessment in Education (1998). Bloom et al., Taxonomy of Educational Objectives (1956), revised by Anderson & Krathwohl (2001).