When Practice Becomes Prophecy

John A, April 13, 2026

8 minutes read

An exercise designed to reproduce exam conditions, but unable to hold those conditions together, is not a simulation; it is a demonstration of the gap. In 2026, Nigeria’s Joint Admissions and Matriculation Board (JAMB) ran exactly that exercise. According to a bulletin from JAMB spokesperson Fabian Benjamin, the mock Unified Tertiary Matriculation Examination (UTME) had two explicit purposes: letting candidates familiarize themselves with the computer-based testing (CBT) environment and testing the board’s own readiness for the live exam. Server failures, power outages, and delayed starts hit some centers during both the primary mock session and the follow-up hands-on practice slot, so neither purpose was met. Candidates received neither a reliable rehearsal of CBT conditions nor any clear assurance that the infrastructure behind the real exam was ready. The conditions meant to make the simulation authentic were the very things that failed.

This breakdown shows that time limits, interface, and supervision are not background scenery but variables that determine what knowledge is retrievable and which skills are trained. The deeper question—why conditions exert this kind of weight on performance, not just comfort—turns out to have a research-grounded answer.

The Science Behind Exam Conditions

Regulators have started to treat exam conditions themselves as objects of study. A review of international and academic evidence commissioned for Ofqual’s consultation on on-screen assessment concluded that mode effects are likely to exist because on-screen and pen-and-paper formats are inherently different, with digital tests often associated with greater cognitive demands, particularly for reading. The report stresses that successful implementation of format changes will require careful test design, ongoing evaluation, and a commitment to maintaining standards, because changing the mode of delivery can change the cognitive work an exam requires even when the syllabus content is identical. Ezekiel Sweiry, a researcher at Ofqual, makes the logic explicit, noting that “Any variation in cognitive load between modes could be said to represent a shift in construct, with a greater emphasis on working memory demands in the mode associated with greater load.” If mode can alter the construct being assessed, then practice that ignores mode is, by definition, practicing for a different task.

Encoding specificity theory gives this a mechanistic foundation. A 2021 article in Psychonomic Bulletin & Review states the principle directly: effective retrieval cues reinstate parts of the information stored in the memory trace, and reinstating the original context tends to benefit memory. That finding is anchored in Smith and Vela’s 2001 meta-analytic review, which found environmental context effects on memory to be reliably detectable across studies. The same review found that these effects could be moderated by factors such as strong item-specific cues or deliberate mental reinstatement of the learning context. The practical consequence is that the closer practice conditions come to the target assessment environment, the less cognitive reconstruction students are forced to perform under pressure when it counts. The finding that digital mode alone can increase cognitive load fits this pattern: it is a concrete demonstration of how a change in conditions, even with constant content, can shift what students are effectively being asked to do.

Some of the most exam-critical capabilities are almost entirely condition-dependent: timing decisions, sustained attention, and recovery from setbacks develop through repeated exposure to the specific pressures of real assessments. Yet those conditions don’t assemble themselves—they have to be deliberately designed, and the capacity to design them well is distributed very unevenly.

The Design Gap and Inequality

High-fidelity simulation starts with design, not just difficulty. To function as a genuine stand-in for a live exam, a mock needs specification-level decisions about time allowed, overall length, the mix of item formats, the ordering and sectioning of questions, and how marks are allocated and weighted. The AERA/APA/NCME Standards for Educational and Psychological Testing treat these as core elements of formal test specifications, stating that developers should spell out features such as item formats, the ordering of items and sections, test length and time limits, and should document the rationale when scores apply differential weighting to items. When these structural features are improvised or omitted, the simulation changes what it represents: the construct sampled, the strategies rewarded, and the kinds of pressure students experience no longer align cleanly with the live assessment.
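
To make the idea of specification-level decisions concrete, the structural features listed above can be written down as data and compared mechanically. The sketch below is illustrative only: the names PaperSpec, Section, and fidelity_gaps, and the example papers, are hypothetical and do not come from the AERA/APA/NCME Standards or from any exam board. It assumes a paper can be described by a time limit plus an ordered list of sections, each with an item format, an item count, and a mark allocation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Section:
    """One section of a paper (hypothetical structure for illustration)."""
    item_format: str  # e.g. "multiple_choice", "extended_response"
    num_items: int    # section length
    marks: int        # marks allocated to the section


@dataclass(frozen=True)
class PaperSpec:
    """Structural specification of a paper; section order is significant."""
    time_limit_minutes: int
    sections: Tuple[Section, ...]


def fidelity_gaps(mock: PaperSpec, live: PaperSpec) -> List[str]:
    """Return the structural features on which the mock departs from the live spec."""
    gaps: List[str] = []
    if mock.time_limit_minutes != live.time_limit_minutes:
        gaps.append("time limit")
    if len(mock.sections) != len(live.sections):
        gaps.append("number of sections")
    # Compare sections position by position, so reordering shows up as gaps too.
    pairs = zip(mock.sections, live.sections)
    for position, (m, l) in enumerate(pairs, start=1):
        if m.item_format != l.item_format:
            gaps.append(f"section {position}: item format")
        if m.num_items != l.num_items:
            gaps.append(f"section {position}: length")
        if m.marks != l.marks:
            gaps.append(f"section {position}: mark allocation")
    return gaps


# Hypothetical example: the mock is shorter in time and swaps the item
# format of its second section, but keeps length and marks unchanged.
live = PaperSpec(120, (Section("multiple_choice", 20, 20),
                       Section("extended_response", 5, 60)))
mock = PaperSpec(90, (Section("multiple_choice", 20, 20),
                      Section("short_answer", 5, 60)))

print(fidelity_gaps(mock, live))
```

A check like this captures only structure. It says nothing about difficulty calibration or topic emphasis, which still depend on the judgment of whoever writes the paper.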

Timing adds a second layer to fidelity. The same well-designed mock paper serves different functions depending on when it is used. Early in a course, it primarily acts as a diagnostic, highlighting gaps in coverage and familiarizing students with broad expectations. Deployed four to six weeks before an exam, that identical paper becomes a conditioning tool: it forces students to operate at the pace, concentration level, and emotional intensity that the live assessment will require. Design fidelity and deployment timing interact: a paper that mirrors the live exam but arrives too early and a last-minute exercise that bears little structural resemblance to the real thing both underuse the potential of simulation.

Historically, most mock papers have been authored by individual teachers working from their reading of the syllabus and experience of past papers. That’s a bounded model: without direct visibility of internal test specifications or the processes by which examiner emphasis shifts across sessions, the inferred version of the exam and the actual one can drift apart—quietly, and in ways that are hard to detect from within a single classroom.

The consequences are uneven. Schools with multiple experienced subject specialists, established internal assessment teams, and time allocated for collaborative paper design tend to produce higher-fidelity mocks: structurally closer to the real thing, better calibrated in difficulty, and more up to date with recent trends. Institutions without that capacity are more likely to rely on improvised or inherited materials that only loosely resemble the target assessment. The question of who else can own the problem of maintaining authentic conditions—and at what scale—doesn’t have an obvious answer inside any one school.

Authenticity as Professional Infrastructure

Standardized exam conditions don’t preserve themselves. In live, high-stakes settings, the procedural authenticity of an assessment—consistent identity verification, continuous oversight, documented evidence trails—requires deliberate professional infrastructure to sustain. The scale of investment that organizations put into this is itself a signal: conditions are not administrative detail layered onto an exam; they are part of what the exam’s results are taken to mean.

Remote delivery makes that responsibility more complex, which is why some organizations specialize in reproducing controlled exam-room conditions at a distance. VICTVS Ltd operates secure, high-stakes examination delivery across more than 180 countries, using a global network of fully qualified invigilators and running, under license from FIFA, the official remote platform for the FIFA Football Agent Exam. Its technology reproduces core features of an in-person test center: candidates upload ID documents for verification, remain under continuous high-definition video and audio communication with an invigilator, and have their sessions recorded in detail for evidential review. That level of investment—purpose-built systems, trained professionals, auditable records—signals that standardized conditions are treated as a primary engineering task, not a logistical convenience. Underpinning this are clear expectations about professional conduct and integrity, framed as living the truth through consistent, principled action—which in examination terms means training invigilators to monitor for misconduct, report concerns immediately, and resist pressures such as bribery, question leakage, or impersonation, so that the remote setting retains the structural authenticity of a physical exam hall.

Recent experience from other testing organizations shows what happens when standardization in remote settings is weak. In a 2022 test-security communication, ETS, which administers exams such as TOEFL and GRE, reported a more than 200% increase in score cancellations across at-home testing in the 2021 financial year compared with 2020, as integrity concerns escalated with the expansion of remote delivery. That kind of spike underlines why systems that embed identity checks, continuous professional supervision, and high-quality recording, such as those used by VICTVS, are framed not as optional extras but as necessary controls to preserve score validity.

Closing the Gap—Examiner-Authored Simulation at Scale

High-stakes exam providers are now treating delivery mode as a fairness constraint, not a cosmetic detail. Matthew Glanville, Director of Assessment at the International Baccalaureate, explains the IB’s position as it prepares to move Diploma Programme and Career-related Programme examinations onto laptops and desktop computers for more than 180,000 students across over 5,500 schools in more than 160 countries, with paper and digital formats running in parallel throughout the rollout: “We’re supporting schools by running paper and digital exams in parallel—helping everyone build confidence in the logistics and teaching approaches. Ensuring fairness and comparability is our top priority.” The College Board takes the same position for the Digital SAT, channeling students toward full-length practice tests inside its Bluebook app so that interface, tools, and timing mirror test-day conditions. For individual schools, keeping simulations aligned with evolving exam formats becomes a dedicated design task—one that moves faster than most internal assessment calendars can accommodate.

These shifting targets make it harder for schools to keep their simulations structurally accurate and current at the same time. Revision Village, an online revision platform for IB Diploma and IGCSE students and teachers, addresses this through its twice-yearly Prediction Exams for IB Mathematics. For each May and November session, IB examiners and experienced teachers author full mock papers that reflect recent past-paper trends in topic emphasis, style, weighting, and difficulty, then release them about a month before the live exams so students can sit them under timed conditions. The papers mirror the structure and mark allocation of the real assessments and are delivered through web browsers and through the Revision Village app that students already use for practice, recreating both the cognitive and procedural demands of the live exam. With more than 350,000 IB students from over 1,500 schools in more than 135 countries using the platform, that scale has a specific consequence: condition-matched rehearsal stops being something that only well-resourced schools can construct internally and becomes something students can access regardless of their institution’s assessment capacity. That’s the access shift the design-gap argument points toward.

When the Practice Becomes the Point

The JAMB mock UTME began as a proof of readiness and ended as something more instructive: a demonstration that exam conditions are not ambient background but the functional substrate through which assessment meaning is produced. When the servers failed and the power cut out, candidates didn’t just lose a rehearsal—they lost the specific pressure environment that the rehearsal was built to replicate, and no amount of content knowledge could substitute for that. That same logic explains why live exams attract professional invigilation infrastructure, why the IB is running parallel paper and digital formats rather than switching overnight, and why examiner-authored mocks timed to arrive before a live session carry different value than end-of-topic quizzes. Conditions are not decorative details wrapped around exam content; they help define what the exam is.

The assumption being challenged is that high-stakes exams simply reveal subject knowledge and that students who know enough will naturally perform. The evidence points instead to performance as a context-bound capability: something that emerges when knowledge is activated under particular structures, time limits, and pressures, and that therefore has to be trained under similar constraints. Preparation that ignores those constraints risks cultivating understanding that doesn’t transfer when it matters. A student who has covered every topic but never performed under the exam’s actual conditions hasn’t finished preparing—they’ve been rehearsing for a version of the assessment that won’t be the one they sit.
