Active Speaker Detection

The pipeline checks whether the voice in the audio track is synchronised with the video of the person speaking. It analyses audio and visual features continuously across the video and predicts whether the sound matches the person shown.

In effect, the system measures the synchronisation between the audio and visual streams and flags discrepancies that may indicate deepfake content.

Input Data

  • Video file (MP4, MOV, AVI, etc.)

Output Data

  • code - The status code of the check:
    • 0 - successful result.
    • 1 - the person is not speaking.
    • 2 - no face is present.
  • description - A text description of the outcome for the processed video.
  • result - The score produced by the pipeline for each consecutive frame in the video file.

JSON Response Example

{
  "Active Speaker Detection": {
    "code": 0,
    "description": "Successful check",
    "result": 96.32
  }
}
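
A client might interpret this response along the following lines. This is a minimal sketch, not part of the service itself: it assumes the response arrives as a JSON object keyed by "Active Speaker Detection", and that "description" and "result" may be missing when the code is non-zero. The names SpeakerCheck, CODE_MEANINGS and parse_speaker_check are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any

# Status codes as listed under "Output Data".
CODE_MEANINGS = {
    0: "successful result",
    1: "the person is not speaking",
    2: "no face is present",
}


@dataclass
class SpeakerCheck:
    """Hypothetical container for one Active Speaker Detection result."""
    code: int
    description: str
    result: Any  # passed through unchanged; None if the field is absent

    @property
    def ok(self) -> bool:
        # Only code 0 denotes a successful check.
        return self.code == 0

    @property
    def meaning(self) -> str:
        return CODE_MEANINGS[self.code]


def parse_speaker_check(payload: str) -> SpeakerCheck:
    """Parse a JSON response string and validate the status code."""
    data = json.loads(payload)
    section = data["Active Speaker Detection"]
    code = section["code"]
    if code not in CODE_MEANINGS:
        raise ValueError(f"unexpected status code: {code!r}")
    return SpeakerCheck(
        code=code,
        # Assumption: these fields may be absent for non-zero codes.
        description=section.get("description", ""),
        result=section.get("result"),
    )


if __name__ == "__main__":
    example = (
        '{"Active Speaker Detection": '
        '{"code": 0, "description": "Successful check", "result": 96.32}}'
    )
    check = parse_speaker_check(example)
    print(check.ok, check.meaning, check.result)
```

Checking the code before reading the result keeps the two failure cases (no speech, no face) from being mistaken for a low score.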