Speech Emotion Recognition Model Using Python and Machine Learning in 9 Steps

A project on speech emotion recognition using machine learning and Python.

Speech is the most natural way of expressing ourselves as humans.

Feature Extraction:

From each audio clip we extract five feature sets with librosa (MFCCs, chroma, mel spectrogram, spectral contrast, and tonnetz) and average each over time, giving one fixed-length 193-dimensional feature vector per file.
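With librosa's default settings (an assumption on my part, since only n_mfcc is set explicitly in the extraction code), the five feature blocks add up to the 193-dimensional vector used later:

```python
# Per-block dimensions, assuming librosa's defaults except n_mfcc=40
n_mfcc = 40      # set explicitly in the extraction code
n_chroma = 12    # chroma_stft default: 12 pitch classes
n_mel = 128      # melspectrogram default: n_mels=128
n_contrast = 7   # spectral_contrast default: n_bands=6 gives 6 + 1 rows
n_tonnetz = 6    # tonnetz: 6 tonal centroid dimensions
total = n_mfcc + n_chroma + n_mel + n_contrast + n_tonnetz
print(total)  # 193
```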


Steps to Follow:


1. Importing essential Libraries

importing libraries
import glob
import os
import librosa
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

2. Now we have to parse the WAV audio files in our dataset and extract features from each one. For that I have written the following functions:

feature extraction
def extract_feature(file_name):
    X, sample_rate = librosa.load(file_name)
    stft = np.abs(librosa.stft(X))
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
    return mfccs, chroma, mel, contrast, tonnetz

def parse_audio_files(parent_dir, sub_dirs, file_ext="*.wav"):
    features, labels = np.empty((0, 193)), np.empty(0)
    for label, sub_dir in enumerate(sub_dirs):
        for fn in glob.glob(os.path.join(parent_dir, sub_dir, file_ext)):
            try:
                mfccs, chroma, mel, contrast, tonnetz = extract_feature(fn)
            except Exception:
                print("Error encountered while parsing file:", fn)
                continue
            ext_features = np.hstack([mfccs, chroma, mel, contrast, tonnetz])
            features = np.vstack([features, ext_features])
            # the emotion code is the third hyphen-separated field of the file name
            labels = np.append(labels, fn.split('\\')[-1].split('-')[2])
    return np.array(features), np.array(labels, dtype=int)
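As a quick sanity check on the labelling line, here is how the emotion code is pulled out of a hypothetical RAVDESS-style file name (the path below is made up for illustration):

```python
# Hypothetical Windows path to a RAVDESS-style recording; the emotion
# code is the third hyphen-separated field of the file name.
fn = 'D:\\Audio_Speech_Actors_01-24\\Actor_01\\03-01-05-01-02-01-01.wav'
label = fn.split('\\')[-1].split('-')[2]
print(label)  # '05'
```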

3. Next we need to one-hot encode our dataset labels, converting the categorical emotion codes into numeric vectors:

hot encoding
def one_hot_encode(labels):
    n_labels = len(labels)
    n_unique_labels = len(np.unique(labels))
    one_hot_encode = np.zeros((n_labels, n_unique_labels + 1))
    one_hot_encode[np.arange(n_labels), labels] = 1
    one_hot_encode = np.delete(one_hot_encode, 0, axis=1)
    return one_hot_encode
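The indexing trick above can be checked on a toy label array. Note that the labels start at 1 (as in the file names), which is why an extra column of zeros is created and then deleted:

```python
import numpy as np

labels = np.array([1, 2, 3, 1])           # toy labels starting at 1
n_labels = len(labels)
n_unique = len(np.unique(labels))
one_hot = np.zeros((n_labels, n_unique + 1))
one_hot[np.arange(n_labels), labels] = 1  # fancy indexing: one cell per row
one_hot = np.delete(one_hot, 0, axis=1)   # drop the unused column for label 0
print(one_hot)
```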

4. Now we save all the extracted features and encoded labels into the variables X and y, and write them to disk:

save extracted features
main_dir = 'D:\\Audio_Speech_Actors_01-24'
sub_dir = os.listdir(main_dir)
print("\ncollecting features and labels...")
print("\nthis will take some time...")
features, labels = parse_audio_files(main_dir, sub_dir)
print("done")
np.save('X', features)
labels = one_hot_encode(labels)
np.save('y', labels)

5. With feature extraction done in the steps above, we load the saved data and split it into training and testing sets:

splitting and training the data
X = np.load('X.npy')
y = np.load('y.npy')
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.33, random_state=20)
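For intuition, here is a NumPy-only sketch of what train_test_split does with test_size=0.33: shuffle the sample indices once, hold out a third of them, and slice features and labels with the same indices so they stay aligned (toy data below):

```python
import numpy as np

X = np.arange(30).reshape(10, 3)     # 10 toy samples, 3 features each
y = np.arange(10)
rng = np.random.default_rng(20)      # fixed seed, like random_state=20
idx = rng.permutation(len(X))        # shuffle indices once...
n_test = int(len(X) * 0.33)          # ...then hold out 33% for testing
test_idx, train_idx = idx[:n_test], idx[n_test:]
train_x, test_x = X[train_idx], X[test_idx]
train_y, test_y = y[train_idx], y[test_idx]
print(train_x.shape, test_x.shape)   # (7, 3) (3, 3)
```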

6. One of the most important parts of this model is its deep neural network architecture, which is as follows:

DNN architecture
n_dim = train_x.shape[1]
n_classes = train_y.shape[1]
n_hidden_units_1 = n_dim
n_hidden_units_2 = 400
n_hidden_units_3 = 200
n_hidden_units_4 = 100

# defining the model
def create_model(activation_function='relu', optimiser='adam', dropout_rate=0.2):
    model = Sequential()
    # layer 1
    model.add(Dense(n_hidden_units_1, input_dim=n_dim, activation=activation_function))
    # layer 2
    model.add(Dense(n_hidden_units_2, activation=activation_function))
    model.add(Dropout(dropout_rate))
    # layer 3
    model.add(Dense(n_hidden_units_3, activation=activation_function))
    model.add(Dropout(dropout_rate))
    # layer 4
    model.add(Dense(n_hidden_units_4, activation=activation_function))
    model.add(Dropout(dropout_rate))
    # output layer
    model.add(Dense(n_classes, activation='softmax'))
    # model compilation
    model.compile(loss='categorical_crossentropy', optimizer=optimiser, metrics=['accuracy'])
    return model
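To see what the softmax output layer and the categorical cross-entropy loss compute, here is a minimal NumPy sketch for a single sample (the logit values are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical pre-activation outputs
probs = softmax(logits)              # probabilities summing to 1
target = np.array([1.0, 0.0, 0.0])   # one-hot true class
# categorical cross-entropy reduces to -log(probability of the true class)
loss = -np.sum(target * np.log(probs))
```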

7. Now it's time to fit our model.

fitting the model
model = create_model()                         
#train the model
history = model.fit(train_x, train_y, epochs=200, batch_size=4)
Epoch 1/200
964/964 [==============================] - 2s - loss: 2.2671 - acc: 0.1494
Epoch 2/200
964/964 [==============================] - 1s - loss: 1.9933 - acc: 0.2106
Epoch 3/200
964/964 [==============================] - 1s - loss: 1.9295 - acc: 0.2106
Epoch 4/200
964/964 [==============================] - 1s - loss: 1.8740 - acc: 0.2355

Epoch 199/200
964/964 [==============================] - 1s - loss: 0.1319 - acc: 0.7021
Epoch 200/200
964/964 [==============================] - 1s - loss: 0.4685 - acc: 0.6302

8. Now we make predictions with our model and measure its accuracy:

Predicting the accuracy
from sklearn.metrics import accuracy_score

predict = model.predict(test_x, batch_size=4)
# predict holds class probabilities; take argmax of both sides before scoring
accuracy = accuracy_score(y_true=np.argmax(test_y, axis=1), y_pred=np.argmax(predict, axis=1))
print("Accuracy: {:.2f}%".format(accuracy * 100))
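Since model.predict returns class probabilities and test_y is one-hot encoded, both must be collapsed to class indices with argmax before comparing. A toy example (the values are made up):

```python
import numpy as np

predict = np.array([[0.1, 0.9],      # hypothetical predicted probabilities
                    [0.8, 0.2],
                    [0.3, 0.7]])
test_y = np.array([[0, 1],           # one-hot ground truth
                   [1, 0],
                   [1, 0]])
y_pred = np.argmax(predict, axis=1)  # class indices [1, 0, 1]
y_true = np.argmax(test_y, axis=1)   # class indices [1, 0, 0]
accuracy = (y_pred == y_true).mean() # 2 of 3 correct
print("Accuracy: {:.2f}%".format(accuracy * 100))
```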

9. Now we will perform some exploratory analysis of the model's predictions to get a better understanding of where it succeeds and fails:

eda of the trained model
emotions = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']
y_pred = np.argmax(predict, 1)
predicted_emo = []
for i in range(0, test_y.shape[0]):
    emo = emotions[y_pred[i]]
    predicted_emo.append(emo)

actual_emo = []
y_true = np.argmax(test_y, 1)
for i in range(0, test_y.shape[0]):
    emo = emotions[y_true[i]]
    actual_emo.append(emo)

cm = confusion_matrix(actual_emo, predicted_emo)
index = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']
columns = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']
cm_df = pd.DataFrame(cm, index, columns)
plt.figure(figsize=(10, 6))
sns.heatmap(cm_df, annot=True)
plt.show()
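What confusion_matrix computes can be reproduced by hand on a toy pair of label lists (the emotions below are made up for illustration): rows are actual classes, columns are predicted classes, and each cell counts how often one was mistaken for the other.

```python
import numpy as np

actual_emo    = ['happy', 'sad', 'happy', 'angry']
predicted_emo = ['happy', 'happy', 'happy', 'angry']
classes = sorted(set(actual_emo))    # ['angry', 'happy', 'sad']
cm = np.zeros((len(classes), len(classes)), dtype=int)
for a, p in zip(actual_emo, predicted_emo):
    cm[classes.index(a), classes.index(p)] += 1
print(cm)
# the 'sad' row shows the one sad clip misclassified as happy
```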

The result of the code above is a confusion-matrix heatmap of our model's predictions, created using the Seaborn library in Python.


Future Scope:

Thank You for Reading!

Computer Science Engineer || Tech Enthusiast || Blogger || Digital Marketer || ML || DevOps || Cloud computing || Automation || Flutter App Development
