Latest Techniques for Detecting GPT-4 and Claude-Generated Text

The Evolving Landscape of AI Text Generation

The pace of advancement in large language models has been staggering. From GPT-3, which first demonstrated that AI could produce convincing prose, to GPT-4, GPT-4o, and Anthropic's Claude family, each generation has produced text that is harder to distinguish from human writing. For those tasked with detecting AI-generated content — educators, publishers, compliance officers — understanding these advances and the detection methods designed to counter them is essential.

This article traces the evolution of AI text generation, examines why newer models present greater detection challenges, and surveys the cutting-edge techniques being developed to meet those challenges.

From GPT-3 to GPT-4o: A Detection Perspective

GPT-3 and GPT-3.5 (2020-2022)

GPT-3, with its 175 billion parameters, was a breakthrough in text generation quality. However, its output exhibited several detectable characteristics:

  • Repetitive phrasing patterns, particularly in longer texts
  • Difficulty maintaining coherent arguments over extended passages
  • A tendency toward generic, surface-level content
  • Noticeable patterns in sentence structure and transitional language

GPT-3.5, the model behind the original ChatGPT release, improved on these issues but retained many detectable patterns. Early detection tools reported accuracy rates above 95% against GPT-3.5 output.

GPT-4 (March 2023)

GPT-4 represented a significant leap in text quality. Its output showed markedly improved characteristics:

  • More nuanced argumentation and reasoning
  • Better maintenance of coherence over long passages
  • Greater vocabulary diversity and more natural-sounding prose
  • Improved ability to adopt specific tones, styles, and registers

These improvements made GPT-4 text harder to detect, with some studies showing initial detection accuracy dropping by 10-15 percentage points compared to GPT-3.5. However, detection tools adapted quickly, and current tools have largely closed this gap.

GPT-4o and GPT-4 Turbo (2024)

OpenAI's GPT-4o ("omni") model, released in May 2024, brought multimodal capabilities while further refining text generation; the "Turbo" variants, optimized for speed and cost, produce text that is virtually indistinguishable from standard GPT-4 output. Key changes from a detection perspective include:

  • More human-like variation in sentence length and structure
  • Better integration of domain-specific knowledge
  • Reduced tendency toward formulaic introductions and conclusions
  • Improved handling of nuance and ambiguity

Claude 3.5 and Claude 4 (2024-2025)

Anthropic's Claude models have introduced additional detection challenges. Claude's training approach, which includes Constitutional AI (reinforcement learning guided by an explicit set of written principles rather than solely by human preference feedback), produces text with distinctive characteristics:

  • A tendency toward balanced, carefully hedged statements
  • Strong adherence to factual accuracy (reducing hallucination-based detection)
  • Natural-sounding qualifications and caveats
  • A writing style that many readers describe as thoughtful and measured

Claude-generated text can be particularly challenging to detect because its training explicitly optimizes for helpful, harmless, and honest communication — qualities that also characterize good human writing.

Why Newer Models Are Harder to Detect

Reduced Statistical Fingerprints

Each generation of language model reduces the statistical fingerprints that detection tools rely on. Earlier models produced text with clearly lower perplexity and burstiness than human writing. Newer models have narrowed this gap significantly:

  • Perplexity distributions of GPT-4o text overlap substantially with human-written text
  • Burstiness patterns in Claude output can closely mimic human variation
  • Token probability distributions become more human-like with each generation
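As a rough, self-contained sketch (not any particular detector's implementation), perplexity and burstiness can be computed from per-token log-probabilities and sentence lengths. The log-probability values below are hypothetical stand-ins for what a real scoring model would produce:

```python
import math

def perplexity(logprobs):
    """Perplexity = exp of the mean negative log-probability per token.
    Lower perplexity means the text is more predictable to the model."""
    return math.exp(-sum(logprobs) / len(logprobs))

def burstiness(sentence_lengths):
    """Coefficient of variation of sentence lengths: human writing
    tends to vary sentence length more than older model output."""
    mean = sum(sentence_lengths) / len(sentence_lengths)
    var = sum((x - mean) ** 2 for x in sentence_lengths) / len(sentence_lengths)
    return (var ** 0.5) / mean

# Hypothetical per-token log-probs from a scoring model.
uniform_text = [-2.0, -2.1, -1.9, -2.0, -2.0]   # very predictable
varied_text = [-1.0, -4.5, -0.5, -3.8, -2.2]    # spikier, more "human"

print(round(perplexity(uniform_text), 2))
print(round(perplexity(varied_text), 2))
print(round(burstiness([12, 11, 12, 13]), 3))   # uniform sentence lengths
print(round(burstiness([5, 22, 9, 31]), 3))     # bursty sentence lengths
```

In practice the log-probabilities come from running the text through a scoring language model; the point here is only that both statistics reduce to simple summaries of that output.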

Instruction-Following Capabilities

Modern models can follow detailed instructions about writing style, which means users can explicitly request characteristics that defeat simple detection methods. Instructions like "vary your sentence length," "include personal anecdotes," or "write in a casual tone" can produce text that lacks the uniform, polished quality that detection tools look for.

Fine-Tuning and Custom Models

The availability of fine-tuning APIs allows users to create custom model variants that may produce text with different statistical properties than the base models. While this is not yet widespread in academic dishonesty, it represents a growing concern.

Current Detection Approaches

1. Statistical Methods

Statistical detection methods analyze the mathematical properties of text without relying on a trained classifier:

  • Perplexity and burstiness analysis: A foundational approach in AI detection, measuring how predictable and uniform the text is
  • Log-likelihood analysis: Examining the probability that a language model would generate each token in sequence
  • DetectGPT (probability curvature): Developed by researchers at Stanford, this method perturbs text slightly and measures whether the resulting changes decrease the log-probability more than expected for human-written text
  • Entropy analysis: Measuring the information content at various levels (word, sentence, paragraph)

Statistical methods have the advantage of being model-agnostic — they can potentially detect text from any AI model, including models not yet released. Their main limitation is that they become less effective as AI text becomes more statistically similar to human text.
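The DetectGPT curvature idea in particular can be sketched in a few lines. This is a toy illustration, not the authors' code: `logprob_fn` and `perturb_fn` here are hypothetical stand-ins (the paper uses a scoring language model and a T5 mask-filling model for perturbations):

```python
import random

def detectgpt_score(text, logprob_fn, perturb_fn, n_perturbations=20):
    """DetectGPT's curvature statistic: model-generated text tends to
    sit near a local maximum of the model's log-probability, so small
    perturbations lower its score more than they do for human text."""
    original = logprob_fn(text)
    perturbed = [logprob_fn(perturb_fn(text)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)  # larger gap => more likely AI

# Toy stand-ins for demonstration only.
def toy_logprob(text):
    # Pretend the model strongly prefers the exact phrase "the cat sat".
    return 0.0 if text == "the cat sat" else -5.0

def toy_perturb(text):
    words = text.split()
    words[random.randrange(len(words))] = random.choice(["dog", "stood", "a"])
    return " ".join(words)

random.seed(0)
print(detectgpt_score("the cat sat", toy_logprob, toy_perturb))
```

The toy model peaks exactly at the unperturbed phrase, so every perturbation drops the score and the curvature statistic comes out large, which is the signature DetectGPT looks for.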

2. Neural Classification

Neural classifiers are trained on large datasets of human-written and AI-generated text to learn distinguishing features:

  • Transformer-based classifiers: Models like RoBERTa fine-tuned on detection tasks can achieve high accuracy on known model outputs
  • Ensemble methods: Combining multiple classifiers (statistical + neural) to produce more robust predictions
  • Adversarial training: Training classifiers against adversarially modified AI text to improve robustness
  • Multi-model training: Training on output from multiple AI models to generalize better to unseen models

AIDetector.ch's detection engine combines statistical features with neural classification in an ensemble architecture. This multi-signal approach helps maintain accuracy even as individual signals become weaker with newer models.
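One simple way to combine such signals, shown here as a generic weighted average rather than AIDetector.ch's actual (non-public) architecture, is:

```python
def ensemble_score(signals, weights):
    """Weighted combination of detector signals, each in [0, 1],
    where 1 means 'looks AI-generated'. Weights let stronger
    signals (e.g. a neural classifier) dominate weaker ones."""
    assert set(signals) == set(weights)
    total = sum(weights.values())
    return sum(signals[k] * weights[k] for k in signals) / total

# Hypothetical per-signal scores for one document.
signals = {"perplexity": 0.8, "neural_classifier": 0.6, "burstiness": 0.7}
weights = {"perplexity": 1.0, "neural_classifier": 2.0, "burstiness": 1.0}
print(round(ensemble_score(signals, weights), 3))
```

The design appeal of an ensemble is graceful degradation: when one signal weakens against a newer model, the combined score still carries information from the others.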

3. Watermarking

Watermarking is a proactive approach where the AI model itself embeds a detectable signal in its output:

  • Statistical watermarking: As proposed by Kirchenbauer et al. (2023), this approach biases the model's token selection toward a "green list" of tokens, creating a statistically detectable pattern that is invisible to human readers
  • Semantic watermarking: Encoding signals in the semantic content of the text rather than in token-level statistics
  • Instruction-based watermarking: Embedding specific phrases or patterns that serve as watermarks

Watermarking has significant promise but faces practical challenges: it requires cooperation from model providers, it can be removed by paraphrasing, and it raises questions about the integrity of AI-generated text for legitimate uses.
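A Kirchenbauer-style statistical watermark can be illustrated with a toy detector. This sketch hashes the previous token to derive a pseudorandom "green list" and computes a z-score on the green-token count; the vocabulary, hashing scheme, and generator below are simplified assumptions, not the paper's exact construction:

```python
import hashlib
import math

def green_list(prev_token, vocab, fraction=0.5):
    """Pseudorandomly partition the vocabulary, seeded by the
    previous token, so the split is reproducible at detection time."""
    def h(tok):
        return hashlib.sha256(f"{prev_token}|{tok}".encode()).digest()[0]
    return {tok for tok in vocab if h(tok) < 256 * fraction}

def watermark_z_score(tokens, vocab, fraction=0.5):
    """z-statistic for the count of 'green' tokens; watermarked text
    should contain significantly more greens than chance predicts."""
    greens = sum(
        1 for prev, cur in zip(tokens, tokens[1:])
        if cur in green_list(prev, vocab, fraction)
    )
    n = len(tokens) - 1
    expected = fraction * n
    return (greens - expected) / math.sqrt(n * fraction * (1 - fraction))

vocab = [f"w{i}" for i in range(50)]
# Simulate a watermarking generator that always samples from the green list.
tokens = ["w0"]
for _ in range(60):
    greens = sorted(green_list(tokens[-1], vocab)) or vocab
    tokens.append(greens[len(tokens) % len(greens)])
print(watermark_z_score(tokens, vocab) > 2.0)  # strong watermark signal
```

Because the detector only needs the hashing scheme, not the model, anyone holding the key can verify the watermark; conversely, paraphrasing the text rewrites the token sequence and erases the green-token bias, which is exactly the removal weakness noted above.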

OpenAI revealed in 2024 that it had developed an effective watermarking system for ChatGPT but had not deployed it, citing concerns about competitive impact and potential bias against non-native English speakers.

4. Stylometric Analysis

A newer approach combines AI detection with traditional stylometry — the statistical analysis of writing style:

  • Comparing a suspected text against a student's known writing profile
  • Identifying deviations in vocabulary richness, sentence complexity, and stylistic preferences
  • Detecting sudden changes in writing quality or style within a document

This approach is particularly promising for educational contexts where baseline writing samples are available.
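A minimal stylometric comparison might extract a few coarse features and measure relative deviation from a known baseline. The features and the deviation metric here are illustrative choices, not a standard profile:

```python
import re

def style_features(text):
    """A few coarse stylometric features: mean sentence length,
    type-token ratio (vocabulary richness), and mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "mean_sentence_len": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_word_len": sum(len(w) for w in words) / len(words),
    }

def style_deviation(baseline, sample):
    """Sum of relative deviations from the author's baseline profile;
    larger values suggest the sample departs from their usual style."""
    return sum(abs(sample[k] - baseline[k]) / baseline[k] for k in baseline)

baseline = style_features("I like short lines. I write plainly. It works.")
sample = style_features(
    "Moreover, the multifaceted considerations delineated above "
    "necessitate a comprehensive reconceptualization of the paradigm."
)
print(style_deviation(baseline, sample) > 0.5)  # flags a style shift
```

Real stylometric systems use far richer feature sets (function-word frequencies, syntactic patterns, character n-grams), but the workflow is the same: build a profile from known writing, then score new submissions against it.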

The Arms Race: Generation vs. Detection

The relationship between text generation and detection is often described as an arms race. As detection tools improve, techniques to evade them also advance:

  • Paraphrasing attacks: Running AI text through paraphrasing tools to alter its statistical properties
  • Human-AI collaboration: Using AI to generate a draft, then substantially rewriting it — making the result genuinely collaborative
  • Prompt engineering: Crafting prompts that instruct the model to write in a more "human" style
  • Back-translation: Translating AI text to another language and back to alter its statistical fingerprint

However, this framing is somewhat misleading. Detection does not need to be perfect to be useful. Even imperfect detection serves as a deterrent and as one data point in a broader integrity assessment.

The Future of AI Detection Technology

Several developments are likely to shape the next generation of detection tools:

  • Provenance-based approaches: Rather than analyzing text properties, tracking the origin and creation process of documents through metadata and digital signatures
  • Collaborative detection networks: Institutions sharing anonymized detection data to build more robust classifiers
  • Regulatory requirements: The EU AI Act and potential Swiss legislation may require AI models to include detectable markers in their output
  • Multimodal detection: As AI models produce increasingly multimodal content (text + images + code), detection tools will need to analyze across modalities
  • Personalized baselines: Building individual writing profiles for each student to detect deviations from their established style

Practical Recommendations

For Swiss educators and institutions dealing with the latest AI models:

  1. Stay current: Detection tools are updated regularly. Use platforms like AIDetector.ch that keep pace with new models.
  2. Layer your approach: No single detection method is sufficient. Combine automated detection with process documentation, oral assessment, and pedagogical design.
  3. Educate yourself: Understanding how AI models work helps you evaluate detection results and spot AI-generated text manually.
  4. Focus on learning: Ultimately, the goal is not to catch cheaters but to ensure that students are actually learning. Design assessments that make genuine engagement more rewarding than AI assistance.

Sources

  • OpenAI, "GPT-4 Technical Report," arXiv:2303.08774, 2023.
  • Anthropic, "The Claude Model Card and Evaluations," Model Documentation, 2024.
  • Kirchenbauer, J. et al., "A Watermark for Large Language Models," Proceedings of ICML, 2023.
  • Mitchell, E. et al., "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature," Proceedings of ICML, 2023.
  • Sadasivan, V.S. et al., "Can AI-Generated Text be Reliably Detected?" arXiv preprint arXiv:2303.11156, 2023.
  • Tian, E. & Cui, A., "Towards Detection of AI-Generated Text using Zero-Shot and Statistical Methods," Princeton University, 2023.
  • OpenAI, "Understanding Our Approach to AI Text Watermarking," Blog Post, 2024.
  • European Commission, "Regulation (EU) 2024/1689 — Artificial Intelligence Act," 2024.