Fake Voice Detection

🌐 Website: echo-reality-check.vercel.app

Overview

Our platform detects whether a given voice recording is real (human) or synthetically generated by AI. As synthetic voices become increasingly hard to distinguish from real human speech, the risk of audio-based impersonation and misinformation is growing rapidly.

What It Does

Users can upload an audio clip (a phone call, voice note, podcast snippet, or social media upload), and the system analyzes it using models trained on both real and synthetic datasets. Within seconds, the platform reports whether the voice is real or AI-generated, along with a confidence score and the key signal indicators behind the result.

How It Works

  • Models: We use an ensemble learning approach that combines multiple models specialized in audio anomaly detection and deepfake recognition. The ensemble outputs a confidence score indicating how strongly the system believes the audio is real or fake.
  • Input Methods: Users can submit audio either by uploading it through the UI or through the API integration. Both routes support near real-time inference and return structured prediction data.
  • Final Verdict
    • Classifies the clip as either "Real" or "Fake".
    • Confidence Score: A percentage indicating how confident the model is in its prediction.
    • Feature-Based Explanation: Highlights the specific audio features that influenced the decision, such as unnatural frequency shifts, missing micro-modulations, or robotic harmonics.
    • Deviation Metrics: Shows how much the audio differs from the learned human baseline across the most deviated features.
  • Speed: The platform is optimized for speed. Typical analysis time is under 5 seconds, making it usable in near real-time applications such as voice verification, emergency fraud checks, or content screening pipelines.
  • Dataset: The models are trained on a dataset containing both real human voices and AI-generated samples.
  • Features Used
    • We initially trained our models on a set of 72 extracted audio features, such as Mel-spectrograms, MFCCs, and spectral centroid.
    • These features feed into both the detection models and the explanation engine, making results both accurate and interpretable.
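
To make the verdict, confidence score, and deviation metrics described above more concrete, here is a minimal Python sketch. It is not the platform's actual implementation: the combination rule (averaging each model's fake probability, often called soft voting), the z-score deviation measure, the 0.5 threshold, and every name in it (`Verdict`, `final_verdict`, `baseline`, the example feature names and values) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Verdict:
    label: str            # "Real" or "Fake"
    confidence: float     # percentage for the predicted label
    top_deviations: list  # [(feature_name, z_score), ...], most deviated first


def ensemble_fake_probability(model_probs):
    """Soft voting: average each model's P(fake). Assumed combination rule."""
    if not model_probs:
        raise ValueError("need at least one model output")
    return mean(model_probs)


def deviation_metrics(features, baseline, top_k=3):
    """Rank features by absolute z-score against a human baseline.

    `baseline` maps feature name -> (mean, std); it is hypothetical here.
    """
    scores = []
    for name, value in features.items():
        mu, sigma = baseline[name]
        z = (value - mu) / sigma if sigma > 0 else 0.0
        scores.append((name, z))
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_k]


def final_verdict(model_probs, features, baseline, threshold=0.5):
    """Combine model outputs into a label, a confidence, and deviations."""
    p_fake = ensemble_fake_probability(model_probs)
    is_fake = p_fake >= threshold
    # Confidence is the averaged probability of whichever label was chosen.
    confidence = 100.0 * (p_fake if is_fake else 1.0 - p_fake)
    return Verdict(
        label="Fake" if is_fake else "Real",
        confidence=round(confidence, 1),
        top_deviations=deviation_metrics(features, baseline),
    )


if __name__ == "__main__":
    # Made-up numbers: three model outputs and three of the extracted features.
    baseline = {
        "spectral_centroid": (1800.0, 200.0),
        "mfcc_1": (-250.0, 50.0),
        "zcr": (0.08, 0.02),
    }
    features = {"spectral_centroid": 2400.0, "mfcc_1": -240.0, "zcr": 0.09}
    print(final_verdict([0.9, 0.7, 0.8], features, baseline))
```

Averaging probabilities keeps the confidence score easy to interpret, and z-scores give each feature a unit-free distance from the human baseline. A real ensemble could just as well weight its models or use a learned meta-classifier.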

💻 Try It Out

  1. Visit echo-reality-check.vercel.app to check our POC.
  2. Click on one of the 15 available audio samples.
  3. Get an instant prediction with a confidence score.
  4. Compare how well the model performs on real vs. fake voices.