Issue
I have trained a CNN model in Google Colab for facial expression detection on the FER2013 dataset, which contains 7 emotion classes ('Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral').
I used the Flask framework to build a web application, and OpenCV's haarcascade_frontalface_default.xml to detect faces. With this I am able to stream live video from my laptop's webcam and detect facial expressions in real time.
But when I add audio corresponding to the expressions along with the text message, the live video stream lags (gets stuck in between) and the video and audio fall out of sync. How can I keep them in sync, or deliver the text and audio for the detected expression every 3 seconds (i.e., while the video stream is running, take a frame every 3 seconds and deliver the audio message corresponding to that particular frame together with the text message)?
Any help would be much appreciated, thanks in advance.
I referred to the link below for app.py and index.html when creating the web app.
https://levelup.gitconnected.com/how-to-build-a-real-time-emotion-detection-web-app-ce7e3ed7b7de
That version shows the text message corresponding to the facial expression (frame by frame), but I want the audio as well (e.g., 'happy.mp3' should play for a happy expression). The audio files I used are between 2 and 4 KB in size.
My Python file: app.py
from flask import Flask, render_template, Response
import cv2
import numpy as np
from playsound import playsound
from tensorflow.keras.models import model_from_json
from tensorflow.keras.preprocessing import image

# load model
model = model_from_json(open(r'C:\Users\HP\emotion_model.json', 'r').read())
# load weights
model.load_weights(r'C:\Users\HP\emotion_model.h5')

face_haar_cascade = cv2.CascadeClassifier(r'C:\Users\HP\haarcascade_frontalface_default.xml')

app = Flask(__name__)
camera = cv2.VideoCapture(0)

def gen_frames():  # generate frames from the camera, one by one
    while True:
        # capture frame by frame
        success, frame = camera.read()
        if not success:
            break
        else:
            gray_img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces_detected = face_haar_cascade.detectMultiScale(gray_img, 1.32, 5)
            for (x, y, w, h) in faces_detected:
                print('WORKING')
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), thickness=4)
                roi_gray = gray_img[y:y + h, x:x + w]  # crop the face region (slice order fixed: rows take h, columns take w)
                roi_gray = cv2.resize(roi_gray, (48, 48))
                img_pixels = image.img_to_array(roi_gray)
                img_pixels = np.expand_dims(img_pixels, axis=0)
                img_pixels /= 255
                predictions = model.predict(img_pixels)
                # find the index of the highest-scoring class
                max_index = np.argmax(predictions[0])
                emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
                predicted_emotion = emotions[max_index]
                print(predicted_emotion)
                cv2.putText(frame, predicted_emotion, (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
                # the code below is commented out because it makes the live video stream
                # get stuck (audio and video with the text message fall out of sync)
                # if predicted_emotion == 'Happy':
                #     playsound(r'C:\Users\HP\emotions_audio\happy.mp3')
                # elif predicted_emotion == 'Angry':
                #     playsound(r'C:\Users\HP\emotions_audio\angry.mp3')
            resized_img = cv2.resize(frame, (1000, 700))
            ret, buffer = cv2.imencode('.jpg', resized_img)  # encode the resized frame; the original encoded the unresized one
            frame = buffer.tobytes()
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')  # concatenate frames one by one into a multipart stream
@app.route('/video_feed')
def video_feed():
    # video streaming route; put this in the src attribute of an img tag
    return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/')
def index():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
My HTML file in the templates folder: index.html
<!doctype html>
<html lang="en">
<head>
    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="description" content="">
    <link rel="stylesheet" href="css/bootstrap.min.css">
    <link rel="stylesheet" href="css/fontAwesome.css">
    <link rel="stylesheet" href="css/templatemo-main.css">
    <!-- Bootstrap CSS -->
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css"
          integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
    <title>Real Time Emotion Detection</title>
</head>
<body style="background-color:#002147;">
    <div class="parallax-content baner-content" id="home">
        <div class="container">
            <div class="row">
                <div class="col-lg-8 offset-lg-2">
                    <h3 class="mt-5"><font color="white" style="font-family:verdana; font-size:300%;"><center>Real-Time Emotion Detection</center></font></h3>
                    <center><img src="{{ url_for('video_feed') }}" width="80%"></center>
                </div>
            </div>
        </div>
    </div>
</body>
</html>
Solution
This happens because playsound executes synchronously: before the next frame can be processed, the playsound call must finish completely, which introduces the lag you experience.
playsound has a block flag:
import playsound
playsound.playsound('test.mp3', block=False)
This works on Linux but not on Windows; I have tested it myself.
If that is your case, I strongly suggest using separate threads or processes for the video processing and the audio playback.
Something like this:
import playsound
import threading
import time

def sound(play_sound: threading.Event):
    while True:
        play_sound.wait()               # block until the video thread signals
        print('playing sounds...')
        time.sleep(3)                   # playsound.playsound('test.mp3')
        print('playing sounds...done')
        play_sound.clear()              # ready to accept the next signal

def video(play_sound: threading.Event):
    i = 1
    while i < 10:
        print(f'processing frame {i}')
        # processing camera info here
        if i % 4 == 0:                  # your facial detection activates
            if not play_sound.is_set():
                play_sound.set()        # signal the sound thread
        time.sleep(1)
        i += 1

def main():
    play_sound = threading.Event()
    sound_thread = threading.Thread(target=sound, args=(play_sound,))
    video_thread = threading.Thread(target=video, args=(play_sound,))
    # make the sound thread a daemon, otherwise the program
    # won't terminate together with the video thread
    sound_thread.daemon = True
    video_thread.start()
    sound_thread.start()
    # it is sufficient to wait for the video thread to terminate
    video_thread.join()

if __name__ == '__main__':
    main()
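
To connect this back to the question, here is a minimal sketch of how the same Event pattern could be wired into gen_frames() so that the clip for the detected emotion plays at most once every 3 seconds. The audio_files mapping, the current_emotion and last_played globals, and the maybe_play() helper are illustrative names of mine, not part of the original code; adjust the paths and the interval to your setup.

import threading
import time
from playsound import playsound

# hypothetical mapping from predicted emotion to audio clip; adjust to your files
audio_files = {
    'Happy': r'C:\Users\HP\emotions_audio\happy.mp3',
    'Angry': r'C:\Users\HP\emotions_audio\angry.mp3',
}

play_sound = threading.Event()
current_emotion = None   # written by the video side, read by the sound thread
last_played = 0.0        # timestamp of the last audio trigger

def sound_worker():
    while True:
        play_sound.wait()                    # sleep until signalled
        path = audio_files.get(current_emotion)
        if path:
            playsound(path)                  # blocks only this worker thread
        play_sound.clear()

threading.Thread(target=sound_worker, daemon=True).start()

def maybe_play(predicted_emotion):
    """Call from gen_frames() right after predicted_emotion is computed."""
    global current_emotion, last_played
    # at most one clip every 3 seconds, and never while one is still playing
    if time.time() - last_played >= 3 and not play_sound.is_set():
        current_emotion = predicted_emotion
        last_played = time.time()
        play_sound.set()                     # returns immediately

Because maybe_play() only sets the event when no clip is playing and at least 3 seconds have passed, the on-screen text still updates every frame while the audio arrives at the requested 3-second cadence; triggers that arrive while a clip is playing are dropped rather than queued.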
Answered By - SystemSigma_