Why Media Forensics Needs Social Theories
For nearly a decade, deepfake detection has been framed as a classification task: given an audio or video clip, decide whether it is real or synthetic. Top detectors often report high accuracy on standard benchmarks; however, performance drops sharply on content from newer or unseen generators. We argue that better classifiers of synthetic media alone will not solve this problem, especially for interactive deepfakes such as impersonation in video and voice calls, where the harm lies not in the artifact (manipulated media signal) but in the act of deception.
Deepfake detection therefore requires a complementary analytical layer focused on communicative interaction, not just media realism. We identify five assumptions that artifact-based detection (the forensic analysis of low-level signal traces) relies on and show that all five are eroding as generative models improve, producing what we call the Generalization Illusion. To address this, we draw on three well-established frameworks from philosophy of language and social psychology, namely, Speech Act Theory, Grice's Cooperative Principle, and Cialdini's Principles of Influence, to examine forensic signals at three levels: the utterance, the conversation, and the listener response. The result is a unified framework that complements existing forensic methods. We close with open problems for future work.
Current detectors ask "was this generated by a machine?" We argue the right question is "is this being used to deceive someone?" Both are classification problems, but the second requires inputs that current detectors typically ignore: speech acts, conversational coherence, and influence patterns, rather than pixels and frequencies. In this paper, we take the position that this is not an engineering shortfall but a category error: media synthesis detection has been mistaken for the defining question, when it should be treated as one signal within the larger problem of deception detection.
Existing deepfake detectors rest on a set of forensic premises that were reasonable when introduced but are now eroding due to advances in generative modelling. We identify five such premises (P1–P5). They do not fail independently — they fail jointly, producing a systematic overestimation of real-world capability from static benchmarks that we term the Generalization Illusion.
P1 (spatial artifacts): Face manipulation introduces detectable irregularities at blending boundaries, in warping fields, or in local textures. CNN-based detectors such as XceptionNet, Face X-ray, and LAA-Net operationalize this by learning to localize boundary artifacts.
Eroded by: end-to-end diffusion-based generators that synthesize entire frames without a discrete blending step.
P2 (spectral fingerprints): Generative models leave characteristic spectral fingerprints — checkerboard artifacts from transposed convolutions, anomalous frequency-energy distributions — that distinguish them from natural images. Targeted by F3-Net, FreqNet, FE-CLIP.
Eroded by: hybrid pipelines, frequency-domain post-processing, and signature mismatch between newer generators and older detectors.
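To make this premise concrete, the following sketch (illustrative only, not one of the detectors above) computes the kind of statistic frequency-based methods build on: the share of power-spectrum energy at high spatial frequencies. Nearest-neighbour upsampling stands in for the periodic artifacts that transposed convolutions can leave; all sizes and thresholds are arbitrary choices for the example.

```python
import numpy as np

def high_freq_energy_ratio(img: np.ndarray, cutoff: float = 0.5) -> float:
    """Fraction of power-spectrum energy beyond `cutoff` of the Nyquist radius."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    return float(spec[radius > cutoff].sum() / spec.sum())

def fft_upsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Band-limited (ideal) upsampling: adds no energy beyond the source band."""
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))
    padded = np.zeros((h * factor, w * factor), dtype=complex)
    y0, x0 = (h * factor - h) // 2, (w * factor - w) // 2
    padded[y0:y0 + h, x0:x0 + w] = spec
    return np.real(np.fft.ifft2(np.fft.ifftshift(padded))) * factor ** 2

rng = np.random.default_rng(0)
base = rng.normal(size=(64, 64))
smooth = fft_upsample(base, 4)            # stand-in for an artifact-free image
blocky = np.kron(base, np.ones((4, 4)))   # stand-in for periodic upsampling artifacts

print(f"band-limited upsampling:   {high_freq_energy_ratio(smooth):.3f}")  # near zero
print(f"nearest-neighbour hold:    {high_freq_energy_ratio(blocky):.3f}")  # markedly higher
```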
P3 (temporal inconsistencies): Frame-by-frame generation introduces flicker, identity drift, and unnatural micro-movements detectable by modelling temporal dynamics. Methods include FTCN, AltFreezing, MSVT, and Temporal Coherence Networks.
Eroded by: temporally-aware generators, motion stabilization, and advances in frame consistency and interpolation.
P4 (physiological signals): Synthetic faces fail to reproduce subtle physiological signals such as blink patterns, gaze stability, micro-expression timing, and rPPG-derived heart-rate variation. Operationalized by FakeCatcher, DeepRhythm.
Eroded by: high-resolution and temporally consistent generators that can preserve or imitate even rPPG signals.
P5 (channel robustness): The signals detectors rely on — spatial, spectral, temporal, biological — survive lossy real-world distribution channels: video compression, social-media re-encoding, screen capture, conferencing codecs.
Weakly validated: P5 is a meta-premise that gates the observability of every other signal at deployment. Most detectors are evaluated on minimally compressed lab data.
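The same toy statistic illustrates why P5 is fragile. Continuing the sketch above (it reuses high_freq_energy_ratio and the artifact-bearing blocky image from that block), re-encoding through JPEG at decreasing quality typically erodes exactly the high-frequency evidence a spectral detector depends on; the quality levels are arbitrary stand-ins for real distribution channels.

```python
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip(img: np.ndarray, quality: int) -> np.ndarray:
    """Re-encode a grayscale float image through JPEG at the given quality."""
    lo, hi = img.min(), img.max()
    arr = np.clip((img - lo) / (hi - lo + 1e-9) * 255, 0, 255).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(arr, mode="L").save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(buf), dtype=float)

# High-frequency energy of the artifact-bearing image typically shrinks as
# compression gets heavier, illustrating why P5 is weakly validated.
for quality in (95, 75, 50, 30):
    degraded = jpeg_roundtrip(blocky, quality)
    print(quality, round(high_freq_energy_ratio(degraded), 3))
```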
The record of real-world deepfake attacks reveals a consistent pattern. When attacks are stopped, it is because a human operator detected a contextual anomaly; when they succeed, it is because no such verification occurred. In neither case does automated media forensics play a meaningful role.
Where detection did occur, it relied on contextual signals outside the current paradigm; current detection methods do not capture these signals because they were not designed to.
We propose interaction forensics: a complementary layer that targets behavioural signals beyond the reach of artifact-based analysis, operating in parallel with existing detection pipelines. The framework decomposes an interaction into three analytical layers, drawing on three theories from linguistics and social psychology.
The utterance layer (Speech Act Theory) analyzes individual utterances to ask: what is the speaker doing, and does it fit their role and context? Identity claims are treated as checkable assertions, where "checkable" is operationalized through out-of-band verification — callbacks, prior shared context, challenge-response — rather than real-time media analysis. Signals include vague answers under verification pressure, unsolicited self-identification, and resistance to verification.
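As a toy illustration of this layer (a sketch, not a proposed implementation), the snippet below assumes an upstream speech-act classifier has already labelled each turn; the labels, fields, and scoring constants are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str                       # "caller" (possibly synthetic) or "target"
    speech_act: str                    # e.g. "assert_identity", "request_action", "answer", "deflect"
    verification_prompt: bool = False  # True if this turn asks the other party to verify themselves

def utterance_layer_score(turns: list[Utterance]) -> float:
    """Toy score in [0, 1]; higher means more deception-like utterance behaviour."""
    flags = 0
    for i, turn in enumerate(turns):
        if turn.speaker != "caller":
            continue
        under_pressure = i > 0 and turns[i - 1].verification_prompt
        if turn.speech_act == "assert_identity" and not under_pressure:
            flags += 1   # unsolicited self-identification
        if under_pressure and turn.speech_act in ("deflect", "refuse"):
            flags += 2   # vagueness or resistance under verification pressure
    return min(1.0, flags / 5.0)
```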
The conversation layer (Grice's Cooperative Principle) evaluates how well a conversation follows basic communication norms across its full flow, drawing on the four maxims: Quantity (right amount of information), Quality (truthful), Relation (relevant), and Manner (clear and natural). Critically, this layer distinguishes deception from legitimate urgency: genuine urgent requests remain contextually coherent and admit verification; deceptive ones suppress verification and disrupt conversational norms.
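A comparable sketch for this layer, assuming per-maxim violation scores come from hypothetical upstream models (for instance, trained scorers or an LLM judge); the structure, not the particular numbers, is the point.

```python
from dataclasses import dataclass

@dataclass
class MaximScores:          # each in [0, 1]; 0 = no violation, 1 = severe violation
    quantity: float         # wrong amount of information for the request
    quality: float          # unverifiable or inconsistent claims
    relation: float         # steers away from relevant (especially verification) topics
    manner: float           # evasive, scripted, or unnaturally phrased

def conversation_layer_score(maxims: MaximScores, urgent: bool, verification_allowed: bool) -> float:
    """Toy conversation-layer score in [0, 1]."""
    base = (maxims.quantity + maxims.quality + maxims.relation + maxims.manner) / 4
    # Genuine urgency stays coherent and admits verification; deceptive urgency does not.
    if urgent and not verification_allowed:
        base = min(1.0, base + 0.3)
    return base
```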
The listener-response layer (Cialdini's Principles of Influence) examines how the communication attempts to influence the target. The key signal is not whether influence is present, but how intensely it is applied and how many tactics co-occur. Deepfake fraud typically combines authority, scarcity, social proof, reciprocity, commitment, liking, and unity within a single interaction at atypical density.
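And for the influence layer, a sketch that assumes a hypothetical per-turn tactic detector and scores both the breadth of distinct tactics and the density with which they are applied; the weighting is illustrative.

```python
CIALDINI_TACTICS = {"authority", "scarcity", "social_proof", "reciprocity",
                    "commitment", "liking", "unity"}

def influence_layer_score(tactics_per_turn: list[set[str]]) -> float:
    """Toy influence-layer score in [0, 1]; tactics_per_turn lists detected tactics per caller turn."""
    if not tactics_per_turn:
        return 0.0
    distinct = set().union(*tactics_per_turn) & CIALDINI_TACTICS
    breadth = len(distinct) / len(CIALDINI_TACTICS)               # how many tactics co-occur
    density = sum(1 for t in tactics_per_turn if t) / len(tactics_per_turn)  # how intensely applied
    return min(1.0, 0.5 * breadth + 0.5 * density)

# Example: five distinct tactics across three turns, every turn carrying at least one.
print(influence_layer_score([{"authority", "scarcity"},
                             {"authority", "social_proof", "liking"},
                             {"scarcity", "unity"}]))
```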
The three layers are complementary lenses, and their joint signal is more informative than any layer in isolation. Each layer produces an independent deception score; an aggregate is computed as a weighted combination, with both an aggregate threshold and per-layer override thresholds to avoid the vulnerability of strict-agreement requirements. Weights and thresholds are deployment-dependent and form part of the calibration agenda.
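A minimal sketch of this aggregation, with placeholder weights and thresholds standing in for the calibration the text defers to deployment.

```python
def aggregate_decision(utterance: float, conversation: float, influence: float,
                       weights=(0.4, 0.3, 0.3),
                       aggregate_threshold: float = 0.6,
                       override_threshold: float = 0.9) -> bool:
    """Return True if the interaction should be flagged; all constants are placeholders."""
    scores = (utterance, conversation, influence)
    aggregate = sum(w * s for w, s in zip(weights, scores))
    # Per-layer overrides avoid the weakness of requiring all layers to agree:
    # a single very strong layer signal is enough to flag.
    return aggregate >= aggregate_threshold or any(s >= override_threshold for s in scores)

flag = aggregate_decision(utterance=0.2, conversation=0.4, influence=0.95)  # True via the per-layer override
```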
The framework cannot be evaluated using the benchmarks that dominate deepfake detection research. Existing benchmarks test classifiers on isolated clips and report binary accuracy or AUC. AUC remains useful for evaluating media-classification subcomponents, but is insufficient as a primary summary for interaction-grounded deception: it averages over operating points, is invariant to deployment base rates, and reduces latency to a single number. We propose evaluating defences on complete interaction scenarios using four complementary metrics that surface these dimensions explicitly.
Attack prevention rate (APR): the fraction of attack scenarios in which the defence intervenes before the target executes the requested harmful action. Correct decisions issued after compliance contribute nothing to APR.
Benign pass rate (BPR): the fraction of legitimate interactions that proceed without unwarranted intervention. APR alone is gameable (a defence that intervenes on every interaction trivially achieves APR = 1), so we pair it with BPR. A useful defence achieves both, with the trade-off made explicit on the APR–BPR plane.
Precision at fixed APR: the proportion of flagged interactions that are genuine attacks, evaluated at a chosen attack prevention rate and under deployment-realistic base rates. Because deepfake fraud has very low base rates in deployment, even a high BPR can mask operationally untenable false-positive volumes at scale. Precision at fixed APR surfaces this cost directly.
Intervention latency: the time elapsed from the start of an interaction to the system's intervention decision (block, alert, or escalation). A defence achieving high APR at 60 seconds is qualitatively different from one achieving the same APR at 5 seconds in attacks where compliance occurs within ten seconds. Latency must be reported alongside APR and BPR, not collapsed into them.
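The four metrics are straightforward to compute once scenario outcomes are recorded. The sketch below assumes a per-scenario outcome record of our own devising; precision is understood to be computed at the operating point chosen to reach the target APR and under deployment-realistic base rates.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScenarioOutcome:
    is_attack: bool
    flagged: bool
    intervention_time_s: Optional[float]  # None if the defence never intervened
    compliance_time_s: Optional[float]    # None if the harmful action was never executed

def apr(outcomes: list[ScenarioOutcome]) -> float:
    """Attack prevention rate: intervention must precede compliance to count."""
    attacks = [o for o in outcomes if o.is_attack]
    prevented = [o for o in attacks
                 if o.flagged and o.intervention_time_s is not None
                 and (o.compliance_time_s is None
                      or o.intervention_time_s < o.compliance_time_s)]
    return len(prevented) / max(1, len(attacks))

def bpr(outcomes: list[ScenarioOutcome]) -> float:
    """Benign pass rate: legitimate interactions that proceed uninterrupted."""
    benign = [o for o in outcomes if not o.is_attack]
    return sum(1 for o in benign if not o.flagged) / max(1, len(benign))

def precision(outcomes: list[ScenarioOutcome]) -> float:
    """Share of flagged interactions that are genuine attacks, at the chosen operating point."""
    flagged = [o for o in outcomes if o.flagged]
    return sum(1 for o in flagged if o.is_attack) / max(1, len(flagged))

def median_intervention_latency(outcomes: list[ScenarioOutcome]) -> Optional[float]:
    """Median time to intervention over flagged interactions; report next to APR and BPR."""
    times = sorted(o.intervention_time_s for o in outcomes
                   if o.flagged and o.intervention_time_s is not None)
    return times[len(times) // 2] if times else None
```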
Scenarios should be characterized along five dimensions: attack type (CEO fraud, invoice redirection, phishing escalation), victim persona (finance, HR), modality and channel (audio, video, synchronous or asynchronous), interaction length (single or multi-step), and compliance setup (scripted decisions, simulated agents, or human studies). A useful benchmark may include 10²–10³ scenarios across ~10 attack types, ~5 personas, and 2–3 modalities, with matched benign cases for BPR.
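For concreteness, a scenario record along these dimensions might look as follows; field names and values are illustrative, and modality and channel are split into separate fields here.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkScenario:
    attack_type: str         # e.g. "ceo_fraud", "invoice_redirection", "phishing_escalation"
    victim_persona: str      # e.g. "finance", "hr"
    modality: str            # "audio" or "video"
    channel: str             # "synchronous" or "asynchronous"
    interaction_length: str  # "single_step" or "multi_step"
    compliance_setup: str    # "scripted", "simulated_agent", or "human_study"
    is_attack: bool          # matched benign cases set this to False
```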
Our framework is conceptual: it defines what an interaction-grounded detection system should analyze, not how to build one from current components. The building blocks at each layer are at different maturity levels — mature for text-based speech-act classification, descriptive for Gricean conversational analysis, and emerging for influence-pattern detection in interactive settings. Making the framework practical requires solving five problems whose foundations exist in adjacent domains but have not been integrated for interactive deepfake deception.
Several deployment constraints warrant acknowledgement. Real-time inference across three layers may exceed latency budgets, suggesting a staged pipeline that begins with lightweight transcript analysis and escalates as needed. Continuous transcription raises privacy concerns, requiring consent, on-device processing, and data minimization. Adversarial adaptation is expected: as attackers learn these signals are monitored, they will adjust their strategies. Humans remain the final line of defence; these systems should support, not replace, their judgment.
If you find this work useful, please cite it as:
This work was conducted at the Vector Institute for Artificial Intelligence.