Active Speaker Detection
The pipeline detects whether the voice in the audio track aligns with the video of the person speaking. It analyses audio and visual features continuously across the video to predict whether the sound matches the person shown.
The system effectively measures the synchronisation between audio and visual elements, identifying discrepancies that may indicate deepfake content.
Input Data
- Video file (MP4, MOV, AVI, etc.)
Output Data
- code
  - 0 - the successful result.
  - 1 - if the person is not speaking.
  - 2 - if no face is present.
- description - The description of the processed video.
- result - The result of the pipeline is the score for each consecutive frame in the video file.
JSON Response Example
"Active Speaker Detection": { "code": 0, "description": "Successful check", "result": 96.32}
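As an illustration of how a client might read this response, the following is a minimal Python sketch. It assumes the fragment above arrives wrapped in an enclosing JSON object; the function name `interpret_asd` and the `CODE_MEANINGS` table are hypothetical helpers and not part of the service. The code meanings are copied from the Output Data section.

```python
import json

# Code meanings as listed under Output Data.
CODE_MEANINGS = {
    0: "successful result",
    1: "person is not speaking",
    2: "no face is present",
}


def interpret_asd(response_text):
    """Return (ok, meaning, result) for the "Active Speaker Detection" entry.

    Hypothetical helper: the enclosing JSON object is an assumption.
    """
    payload = json.loads(response_text)
    entry = payload["Active Speaker Detection"]
    code = entry["code"]
    meaning = CODE_MEANINGS.get(code, "unknown code")
    # Only read the score when the check succeeded.
    result = entry.get("result") if code == 0 else None
    return code == 0, meaning, result


# Assumption: the example fragment wrapped in braces to form valid JSON.
sample = (
    '{"Active Speaker Detection": '
    '{"code": 0, "description": "Successful check", "result": 96.32}}'
)

ok, meaning, result = interpret_asd(sample)
print(ok, meaning, result)
```

Checking `code` before reading `result` keeps the caller from treating a missing or meaningless score as valid when no face is present or the person is not speaking.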