Integration of Wavelet and Recurrence Quantification Analysis in Emotion Recognition of Bilinguals

Background: This study offers a robust framework for classifying autonomic signals into five affective states during picture viewing. To this end, the following emotion categories were studied: five classes of the arousal-valence plane (5C), three classes of arousal (3A), and three classes of valence (3V). For the first time, linguality information was also incorporated into the recognition procedure. Specifically, the main objective of this paper was to present a fundamental approach for evaluating and classifying the emotions of monolingual and bilingual college students. Methods: Drawing on nonlinear dynamics, recurrence quantification measures of the wavelet coefficients were extracted. To optimize the feature space, different feature selection approaches, including generalized discriminant analysis (GDA), principal component analysis (PCA), kernel PCA, and linear discriminant analysis (LDA), were examined. Finally, considering linguality information, classification was performed using a probabilistic neural network (PNN). Results: Using LDA and the PNN, the highest recognition rates of 95.51%, 95.7%, and 95.98% were attained for 5C, 3A, and 3V, respectively. Considering the linguality information further improved the classification rates. Conclusion: The proposed methodology can provide a valuable tool for discriminating affective states in practical applications within the area of human-computer interfaces.

Int Clin Neurosci J. Vol 7, No 1, Winter 2020. journals.sbmu.ac.ir/Neuroscience
Concerning feature extraction, the literature has mainly focused on traditional approaches, namely time- and frequency-based analysis. [20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35] In a few studies based on common features, promising classification scores (>90%) have been accomplished. 22,33 Although these approaches were efficacious for affect recognition, the number of emotion categories was too limited. Moreover, these methodologies are suited to the processing of linear and stationary signals and are therefore not consistent with the chaotic and nonstationary nature of biomedical data. Some efforts have been made using wavelet transforms. 36,37 Applying a wavelet-based approach in combination with the Fisher classifier, an accuracy of 92.6% was achieved. 36 As wavelets may not extract all the information in bio-signals, some researchers have examined nonlinear indices. 7,9,[38][39][40][41][42] Feeding nonlinear features to a quadratic discriminant classifier yielded a remarkable increase in the recognition rate, to about 90%. 38 This manuscript examines the effectiveness of integrating nonlinear and wavelet-based indices for affect recognition from ANS signals.
Affective computing is still in its infancy, and so far, a fruitful and productive computing agenda has not been realized. The diversity of intelligent algorithms, as well as of computer users' characteristics, behaviors, reactions, and differences, may be the reason. As each subject perceives emotions differently, the system should be designed so that it can naturally adapt to the operators' emotional states. Consequently, more effort is required to find new correspondences between individuals and emotions. In this context, some characteristics of the users have been considered in the literature. The impact of age, gender, and culture has been extensively examined in the field of emotion [43][44][45][46] but rarely considered in the design of affective machines. 47 To our knowledge, linguality information (being mono- or bilingual) has not been tested in affective processing; however, some evidence of differences in emotional processing between monolinguals and bilinguals has been reported in the literature. 48,49 This possibility was investigated in this study.
The current paper presents an efficient affect recognition system that considers linguality differences. We postulate a preliminary hypothesis that being bilingual or monolingual influences the processing of emotional stimuli. To this end, nonlinear features of wavelet coefficients were extracted from ANS signals. The paper is organized as follows. First, the data recording process and the experimental procedure are concisely described. Second, the strategies adopted for feature extraction, feature selection, and classification are provided. Subsequently, the experimental results are presented. Finally, the conclusions are offered.

Materials and Methods
ECG and PR signals were collected while participants viewed emotion-evocative visual stimuli. The signals were filtered with a notch filter at 50 Hz and segmented according to the blocks of affective content. The normalized data were decomposed into different wavelet levels (using Daubechies 4 at level 8). Subsequently, recurrence quantification analysis (RQA) was performed on each set of wavelet coefficients. The RQA measures were RR, L_max, V_max, DET, LAM, L, and ENTR, which stand for recurrence rate, maximum diagonal line length, maximum vertical line length, determinism, laminarity, mean diagonal line length, and the entropy of the diagonal line lengths, respectively. Next, different feature selection approaches, including principal component analysis (PCA), linear discriminant analysis (LDA), generalized discriminant analysis (GDA), and kernel PCA (K-PCA), were evaluated. Finally, considering linguality information, classification was performed using a probabilistic neural network (PNN). The proposed emotion recognition system is shown in Figure 1.

Data
In this experiment, 47 college students participated, comprising 31 women (19-25 years; mean age of 21.90 ± 1.7 years) and 16 men (19-23 years; mean age of 21.1 ± 1.48 years). All subjects were Iranian, including 15 monolinguals and 32 bilinguals. Their ECG and PR were recorded. For emotion elicitation, pictures from the International Affective Picture System (IAPS) were used. 50 The images were grouped into three categories of emotional schemes. 8 At the beginning, each participant's agreement to be included in the trial was obtained through a signed consent form. All participants reported no history of neurological, cardiovascular, epileptic, or hypertensive disease. They had not consumed caffeine, salt, or fatty foods for 2 hours before data recording. The volunteers were instructed to stay motionless during the experiment, particularly their fingers, hands, and legs.
The procedure lasted for about 15 minutes. The stimuli were shown after the initial baseline measurement (rest). For two minutes (initial rest), the participants kept their eyes open while viewing a blank screen. Afterward, 28 blocks of images (balanced among volunteers) were presented in random order to prevent habituation. Each block comprised 5 pictures of the same emotion and was displayed for about 15 seconds to ensure the stability of the emotion over time. Immediately afterward, a blank screen was shown for 10 seconds to let the physiological variations return to baseline and to ensure consistency in the presentation of different emotional images. Then, a white plus sign appeared for 3 seconds to prepare the subjects for the next block and to remind them to concentrate and look at the middle of the monitor. The participants also self-assessed their emotions. Figure 2 illustrates the protocol.
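As a quick consistency check, the stated session length can be recomputed from the protocol timings. The sketch below assumes that the 15 seconds refers to a whole five-picture block (not to each picture) and that nothing else is timed; the constant and function names are illustrative only.

```python
# Durations in seconds, taken from the protocol description.
REST_S = 2 * 60      # initial rest with a blank screen
N_BLOCKS = 28        # number of picture blocks
BLOCK_S = 15         # assumption: 15 s covers the whole 5-picture block
BLANK_S = 10         # blank screen after each block
FIXATION_S = 3       # white plus sign before the next block


def session_duration_s():
    """Total duration: initial rest plus 28 repetitions of block, blank, fixation."""
    return REST_S + N_BLOCKS * (BLOCK_S + BLANK_S + FIXATION_S)


total_s = session_duration_s()
print(total_s, "s =", round(total_s / 60, 1), "min")
```

Under these assumptions the total is 904 s, which agrees with the reported duration of about 15 minutes.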
The ECG was acquired from the lead I using disposable

Feature Extraction

Wavelet Transforms
Decomposition of a signal into several scales, each representing a particular coarseness of the signal, is made possible by the multi-scale nature of the wavelet transform. Each stage comprises two digital filters and two down-samplers by two. The first filter, g[.], is a high-pass (HP) filter, and the second filter, h[.], is a low-pass (LP) one. The filter outputs are down-sampled by a factor of two to give the detail coefficient (D1) and the approximation coefficient (A1), respectively. The first approximation coefficient (A1) is then further decomposed in the same way as in the previous stage.
All wavelet transforms can be specified in terms of an LP filter h that satisfies the following condition:
$$H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 1 \tag{1}$$
where $H(z)$ denotes the z-transform of the filter h. The complementary HP filter is defined as:
$$G(z) = zH(-z^{-1}) \tag{2}$$
With the initial condition $H_0(z) = 1$, a sequence of filters of increasing length (indexed by i) can be obtained:
$$H_{i+1}(z) = H(z^{2^i})H_i(z), \qquad G_{i+1}(z) = G(z^{2^i})H_i(z) \tag{3}$$
In the time domain, this is expressed by (4):
$$h_{i+1}(k) = [h]_{\uparrow 2^i} * h_i(k), \qquad g_{i+1}(k) = [g]_{\uparrow 2^i} * h_i(k) \tag{4}$$
where k is the sampled discrete time and $[\cdot]_{\uparrow m}$ denotes up-sampling by a factor of m. The normalized wavelet and scale basis functions $\varphi_{i,l}(k)$ and $\psi_{i,l}(k)$ are given by (5):
$$\varphi_{i,l}(k) = 2^{i/2}\, h_i(k - 2^i l), \qquad \psi_{i,l}(k) = 2^{i/2}\, g_i(k - 2^i l) \tag{5}$$
where i is the scale, l is the translation parameter, and the factor $2^{i/2}$ is the inner product normalization. The decomposition of the discrete wavelet transform 51 of a signal x(k) is given by (6):
$$a^{(i)}(l) = x(k) * \varphi_{i,l}(k), \qquad d^{(i)}(l) = x(k) * \psi_{i,l}(k) \tag{6}$$
where $d^{(i)}(l)$ and $a^{(i)}(l)$ denote the detail and approximation coefficients at resolution i, respectively.
In the present study, an eight-level wavelet decomposition was performed using the Daubechies wavelet (db4), since it is morphologically similar to the signal pattern.
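The cascade described above (filter with h and g, down-sample by two, then split the approximation again) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the periodic border handling, the filter orientation, and the function names `dwt_step` and `wavedec` are assumptions, so the values will not match a toolbox output sample by sample. The coefficients are the usual 8-tap db4 scaling filter.

```python
import math

# db4 low-pass (scaling) filter, 8 taps; the taps sum to sqrt(2).
DB4_LO = [
    0.23037781330885523, 0.7148465705525415,
    0.6308807679295904, -0.02798376941698385,
    -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# High-pass filter built as the quadrature mirror of the low-pass filter.
DB4_HI = [(-1) ** k * DB4_LO[-1 - k] for k in range(len(DB4_LO))]


def dwt_step(x):
    """One stage: filter with h and g, keeping every second output sample.

    Assumption: the signal is extended periodically at the borders.
    """
    n = len(x)
    if n % 2:
        raise ValueError("length must be even at every level")
    taps = range(len(DB4_LO))
    approx = [sum(DB4_LO[k] * x[(i + k) % n] for k in taps) for i in range(0, n, 2)]
    detail = [sum(DB4_HI[k] * x[(i + k) % n] for k in taps) for i in range(0, n, 2)]
    return approx, detail


def wavedec(x, levels=8):
    """Return (A_levels, [D1, ..., D_levels]) by repeatedly splitting the approximation."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    return approx, details


# Usage on a 512-sample toy signal (a slow plus a fast sinusoid).
signal = [math.sin(2 * math.pi * 3 * t / 512) + 0.5 * math.sin(2 * math.pi * 60 * t / 512)
          for t in range(512)]
a8, details = wavedec(signal, levels=8)
print([len(d) for d in details], len(a8))
```

With periodic extension the transform is orthogonal, so the energy of the input equals the summed energy of all coefficients, which is a convenient sanity check for any implementation.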

RQA Parameters
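The RQA measures are typically estimated from the recurrence point density and from the diagonal and vertical line structures of the recurrence plot (RP). 52 We calculated the following RQA descriptors: RR, DET, L, $L_{\max}$, ENTR, LAM, and $V_{\max}$. RR is the percentage of recurrence points in an RP, i.e., their density, and is given by (7):
$$RR(\varepsilon) = \frac{1}{N^2}\sum_{i,j=1}^{N} R_{i,j}(\varepsilon) \tag{7}$$
where N is the number of points on the phase space trajectory and $R_{i,j}(\varepsilon)$ is the recurrence matrix for the threshold ε.
DET is a function of the histogram P(ε, l) of diagonal lines of length l:
$$P(\varepsilon, l) = \sum_{i,j=1}^{N} \bigl(1 - R_{i-1,j-1}(\varepsilon)\bigr)\bigl(1 - R_{i+l,j+l}(\varepsilon)\bigr)\prod_{k=0}^{l-1} R_{i+k,j+k}(\varepsilon) \tag{8}$$
In the following equations, the symbol ε is omitted (i.e., P(l) instead of P(ε, l)). The determinism is given by:
$$DET = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=1}^{N} l\,P(l)} \tag{9}$$
where the numerator counts the recurrence points that form diagonal lines of at least length $l_{\min}$, and the denominator counts all recurrence points. The threshold $l_{\min}$ excludes the diagonal lines that are formed by the tangential motion of the phase space trajectory.
L is the average time that two segments of the trajectory remain close to each other:
$$L = \frac{\sum_{l=l_{\min}}^{N} l\,P(l)}{\sum_{l=l_{\min}}^{N} P(l)} \tag{10}$$
This parameter can also be interpreted as the mean prediction time.
The longest diagonal line in the RP, $L_{\max}$, is related to the exponential divergence of the phase space trajectory:
$$L_{\max} = \max\bigl(\{l_i\}_{i=1}^{N_l}\bigr) \tag{11}$$
where $N_l = \sum_{l \ge l_{\min}} P(l)$ is the total number of diagonal lines. ENTR refers to the Shannon entropy of the distribution of diagonal line lengths and is defined as (12):
$$ENTR = -\sum_{l=l_{\min}}^{N} p(l)\ln p(l), \qquad p(l) = \frac{P(l)}{N_l} \tag{12}$$
ENTR reflects the complexity of the diagonal line structure. Laminarity is defined as follows:
$$LAM = \frac{\sum_{v=v_{\min}}^{N} v\,P(v)}{\sum_{v=1}^{N} v\,P(v)} \tag{13}$$
It is the ratio between the recurrence points forming vertical structures and the entire set of recurrence points, where P(v), the histogram of vertical lines of length v, is defined as (14):
$$P(v) = \sum_{i,j=1}^{N} \bigl(1 - R_{i,j}\bigr)\bigl(1 - R_{i,j+v}\bigr)\prod_{k=0}^{v-1} R_{i,j+k} \tag{14}$$
In other words, LAM reflects the occurrence of laminar states in the system, not the length of these laminar phases. The maximum length of the vertical lines ($V_{\max}$) is defined by:
$$V_{\max} = \max\bigl(\{v_l\}_{l=1}^{N_v}\bigr) \tag{15}$$
where $N_v$ is the absolute number of vertical lines.
In the current study, recurrence quantification analysis was performed using a MATLAB toolbox. 52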
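To make the line-based measures concrete, the following plain-Python sketch computes the seven descriptors from a thresholded recurrence matrix. It is an illustration under stated assumptions, not the toolbox used in the study: the series is scalar (no time-delay embedding), the main diagonal is excluded, $l_{\min} = v_{\min} = 2$, and all function names are hypothetical.

```python
import math
from collections import Counter


def recurrence_matrix(x, eps):
    """R[i][j] = 1 when |x_i - x_j| <= eps; the main diagonal is set to 0 (assumption)."""
    n = len(x)
    return [[int(i != j and abs(x[i] - x[j]) <= eps) for j in range(n)] for i in range(n)]


def run_lengths(bits):
    """Lengths of the maximal runs of 1s in a 0/1 sequence."""
    runs, run = [], 0
    for b in bits:
        if b:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return runs


def rqa_measures(R, l_min=2, v_min=2):
    """RR, DET, L, Lmax, ENTR, LAM and Vmax from a square 0/1 recurrence matrix."""
    n = len(R)
    points = sum(map(sum, R))
    diag = []
    for k in range(-(n - 1), n):  # every diagonal, above and below the main one
        diag += run_lengths([R[i][i + k] for i in range(max(0, -k), min(n, n - k))])
    vert = []
    for j in range(n):  # every column
        vert += run_lengths([R[i][j] for i in range(n)])
    long_d = [l for l in diag if l >= l_min]
    long_v = [v for v in vert if v >= v_min]
    entr = 0.0
    if long_d:
        probs = [c / len(long_d) for c in Counter(long_d).values()]
        entr = 0.0 - sum(p * math.log(p) for p in probs)
    return {
        "RR": points / n ** 2,
        "DET": sum(long_d) / points if points else 0.0,
        "L": sum(long_d) / len(long_d) if long_d else 0.0,
        "Lmax": max(long_d, default=0),
        "ENTR": entr,
        "LAM": sum(long_v) / points if points else 0.0,
        "Vmax": max(long_v, default=0),
    }


# Usage: a strictly periodic series gives long diagonals and no vertical lines.
series = [0, 1, 2, 3] * 5
measures = rqa_measures(recurrence_matrix(series, eps=0.5))
print(measures)
```

For the periodic toy series every recurrence point lies on a long diagonal, so DET is 1 and LAM is 0, which matches the interpretation of these two measures given in the text.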

Feature Selection
A high-dimensional feature space brings the risk of the curse of dimensionality, high computational cost, and poor classification accuracy. In the current study, LDA, PCA, K-PCA, and GDA were evaluated to select the relevant features from the extracted set.

LDA
LDA is a supervised feature selection technique. It seeks a linear mapping M that maximizes the class discrimination in the low-dimensional representation of the data. 53 This is done by finding the linear mapping M that maximizes the Fisher criterion. Let C be the set of possible classes and $\mathrm{cov}_{X_c - \bar{X}_c}$ the covariance matrix of the zero-mean data points $x_i$ assigned to class c ∈ C. Furthermore, let $\mathrm{cov}_{X - \bar{X}}$ and $\mathrm{cov}_{\bar{X}_c}$ denote the covariance matrices of the zero-mean data X and of the cluster means, respectively. The within-class scatter $S_w$ and the between-class scatter $S_b$ are then:
$$S_w = \sum_{c \in C} p_c\, \mathrm{cov}_{X_c - \bar{X}_c}, \qquad S_b = \sum_{c \in C} \mathrm{cov}_{\bar{X}_c} = \mathrm{cov}_{X - \bar{X}} - S_w$$
where $p_c$ is the class prior of class label c. LDA optimizes the ratio between $S_w$ and $S_b$ in the low-dimensional representation of the data:
$$\phi(M) = \frac{M^T S_b M}{M^T S_w M}$$
The Fisher criterion is maximized by solving the generalized eigenproblem (19):
$$S_b v = \lambda S_w v \tag{19}$$

PCA
Using PCA, a linear transformation T of the data, formed by the principal components (i.e., principal eigenvectors), is constructed to maximize $T^T \mathrm{cov}_{X - \bar{X}} T$. 54 Consequently, PCA is defined by (20):
$$\mathrm{cov}_{X - \bar{X}}\, v = \lambda v \tag{20}$$
where λ denotes the principal eigenvalues. The low-dimensional representations $y_i$ of the data points $x_i$ are calculated by mapping them onto the linear basis T, i.e., $Y = (X - \bar{X})T$. The chief drawback of PCA is that the size of the covariance matrix is proportional to the dimensionality of the data points. 55

Kernel PCA
Kernel PCA (K-PCA) is the reformulation of traditional linear PCA in a high-dimensional space constructed using a kernel function; it is the nonlinear counterpart of PCA. 56 For the data points $x_i$, the entries of the kernel matrix K are defined by (21):
$$k_{ij} = \kappa(x_i, x_j) \tag{21}$$
where κ is a kernel function. 57 The kernel matrix K is then centered using the following modification of its entries:
$$k_{ij} \leftarrow k_{ij} - \frac{1}{n}\sum_{l} k_{il} - \frac{1}{n}\sum_{l} k_{jl} + \frac{1}{n^2}\sum_{l,m} k_{lm}$$
where n is the number of data points. The eigenvectors of the covariance matrix $\alpha_i$ are scaled versions of the eigenvectors of the kernel matrix $v_i$:
$$\alpha_i = \frac{1}{\sqrt{\lambda_i}}\, v_i$$
The low-dimensional representation Y of the data is obtained by projecting onto the eigenvectors of the covariance matrix:
$$y_i = \Bigl\{ \sum_{j} \alpha_1^{j}\, \kappa(x_j, x_i), \ldots, \sum_{j} \alpha_d^{j}\, \kappa(x_j, x_i) \Bigr\}$$
where $\alpha_1^{j}$ is the jth entry of the vector $\alpha_1$ and d is the target dimensionality. The performance of K-PCA depends strongly on the choice of the kernel function κ (e.g., Gaussian, polynomial, or linear). 57 Similar to PCA, its main drawback is the size of the kernel matrix, 55 which is square in the number of data points.
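The first two K-PCA steps, building the kernel matrix and centering it, can be sketched as follows. This is only an illustration: the Gaussian kernel and its width are assumptions (the paper does not state which kernel or parameters were used), the helper names are hypothetical, and the sketch stops before the eigendecomposition.

```python
import math


def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel; the width value is an illustrative assumption."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * width ** 2))


def kernel_matrix(points, kernel=gaussian_kernel):
    """k_ij = kappa(x_i, x_j) for every pair of data points."""
    return [[kernel(a, b) for b in points] for a in points]


def center_kernel(K):
    """Subtract row and column means and add back the grand mean.

    K is assumed symmetric, so the column means equal the row means.
    """
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


# Usage on four toy 2-D feature vectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
K = kernel_matrix(points)
Kc = center_kernel(K)
print([round(sum(row), 12) for row in Kc])
```

After centering, every row and column of the matrix sums to zero, which corresponds to zero-mean data in the space induced by the kernel.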

GDA
The GDA technique (or kernel LDA) is the reformulation of LDA through a kernel function. 58 The eigenvectors of the kernel matrix K are gathered in a matrix P that fulfills $K = P \Lambda P^T$, where Λ is the diagonal matrix of the eigenvalues $\lambda_i$. Many eigenvectors $p_i$ correspond to small eigenvalues $\lambda_i$, and a smoothed kernel K is obtained using (25). GDA maximizes the Fisher criterion, which corresponds to maximizing the Rayleigh quotient defined with an n × n block-diagonal matrix $W = \mathrm{diag}(W_1, \ldots, W_l)$.
$W_c$ is an $n_c \times n_c$ matrix whose entries are all $1/n_c$, where $n_c$ is the number of instances in class c and l is the number of classes. The matrix V contains the principal eigenvectors of the matrix $P^T W P$. The normalized eigenvectors are calculated by (28). Accordingly, the data points are projected onto the normalized eigenvectors $\alpha_i$ in the high-dimensional space defined by the kernel function κ. 55

Classification
PNN has been shown to perform well in classification problems. 59 After the distances between the input vector and the training vectors are calculated and the resulting contributions are summed for each class, a competitive layer produces the output vector.
For PNN training, the smoothing parameter (sigma, σ) must be determined, which is a fundamental step. An optimum σ is usually found by trial and error.
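The three PNN stages described above (distance to each training vector, per-class summation, competition) can be sketched in plain Python. This is a minimal illustration, not the network used in the study: it assumes Gaussian pattern units, equal class priors with per-class averaging, and a hypothetical function name `pnn_predict`. The scores are kept in the log domain because a small smoothing parameter such as σ = 0.01 makes the raw exponentials underflow to zero.

```python
import math


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pnn_predict(x, train_X, train_y, sigma=0.01):
    """Classify x with a Gaussian Parzen-window PNN (equal priors assumed)."""
    log_kernels = {}
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))       # pattern layer: squared distance
        log_kernels.setdefault(yi, []).append(-d2 / (2.0 * sigma ** 2))
    # Summation layer: average the kernel responses of each class (in the log domain).
    scores = {c: _log_sum_exp(v) - math.log(len(v)) for c, v in log_kernels.items()}
    # Competitive layer: the class with the largest score wins.
    return max(scores, key=scores.get)


# Usage on a toy two-class problem.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["a", "a", "b", "b"]
print(pnn_predict((0.2, 0.4), train_X, train_y, sigma=0.01))
print(pnn_predict((5.1, 5.4), train_X, train_y, sigma=1.0))
```

A smaller σ makes the decision depend almost entirely on the nearest training vectors, while a larger σ smooths the class densities, which is why σ is normally tuned by trial and error.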

Results
First, we evaluated the differences in signal characteristics between the mono- and bilingual participants for each emotion. For example, Figure 3 shows the results for the ECG signal. The Kruskal-Wallis test was performed to determine whether significant differences exist between the groups. In addition, a multiple comparison test was implemented using the Tukey-Kramer approach. The significant differences are also marked in the figure.
As the figure shows, the largest number of features with a significant difference occurs for LV/LA. After that, the HV/LA stimuli provide the most significant differences between the mono- and bilingual groups. Among the RQA features, the most significant differences across the emotional classes were found for ENTR and L. Although the RQA parameters show no consistent decreasing or increasing pattern between the mono- and bilingual groups, the emotional classes strongly influenced their variations. Similar results were obtained for PR.
Next, we evaluated the classification performance. Two-thirds of the feature vectors were randomly chosen as the training set and the rest as the test set. Moreover, the classification procedure was implemented for three groups: 1) all subjects, 2) the monolingual group, and 3) the bilingual group. The classification accuracy was computed as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \tag{29}$$
where TP/FP denotes true/false positives and TN/FN denotes true/false negatives.
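The evaluation protocol, a random two-thirds/one-third split followed by the accuracy in percent, can be sketched as follows. The function names and the fixed random seed are illustrative assumptions; the paper does not state how the random split was generated.

```python
import random


def split_two_thirds(samples, seed=0):
    """Randomly assign two-thirds of the samples to training and the rest to testing."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = (2 * len(indices)) // 3
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


def accuracy_percent(tp, tn, fp, fn):
    """Accuracy in percent from the four confusion counts."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


# Usage with 30 placeholder feature vectors and made-up confusion counts.
train, test = split_two_thirds(list(range(30)))
print(len(train), len(test), accuracy_percent(tp=45, tn=45, fp=5, fn=5))
```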
We examined the accuracy of the classifier based on the dimensional structure of the emotional space. Accordingly, three emotion categorizations were considered: rest plus each quadrant of the emotion plane (5C), three classes of arousal (3A), and three classes of valence (3V). Furthermore, different feature selection methodologies, as well as different σ values, were evaluated. The classification rates for discriminating five classes of affective states from the ECG and PR are presented in Figure 4 and Figure 5.
The best recognition rates were achieved for σ = 0.01 (Figure 5). LDA, as a feature selection methodology, performed the best. In this case, the classification rates rose to 91.54% and 95.51% for ECG and PR, respectively. The emotion discrimination was also performed by taking the linguality information into account. Considering this information further improved the classification rate. A more comprehensive description of the accuracy rates for monolinguals and bilinguals is given in Figure 4b and Figure 5b. A similar procedure was performed for the three classes of arousal (Figure 6 and Figure 7).
Again, LDA outperformed the other feature selection methodologies. The maximum recognition rates for ECG and PR rose from 65.39% and 78.2% to 92.01% and 95.7%, respectively. Additional increments in the recognition rates were achieved using linguality information. Specifically, for ECG, increments of 7.28 ± 3.5% and 7.21 ± 3.12%, and for PR, increments of 2.93 ± 1.96% and 4.15 ± 1.78%, were achieved for the monolingual and bilingual groups, respectively.
For the three classes of valence, the results are shown in Figure 8 and Figure 9.
Similar results were obtained for 3V. In this context, maximum classification accuracies of 65.72% and 78.25% were obtained for ECG and PR, which rose to 92.06% and 95.98% with LDA. These rates were also increased by applying linguality information.

Discussion
Over the last decades, increasing attention has been devoted to the design of efficient emotion recognition systems. In this study, an integrative signal-based classification framework with linguality information was presented for emotion recognition. For the first time, the PR was applied to the problem of emotion recognition. A three-phase classification scheme was examined (5C, 3V, and 3A).
The suggested system was implemented using innovative ANS features in combination with different feature selection approaches. The feature set was constructed from the RQA measures of wavelet coefficients to account for the nonstationary and chaotic nature of biomedical signals. The PNN was evaluated for different σ values and emotional states.
The results of this study indicated that the most significant differences between the ANS characteristics of the mono- and bilingual groups occurred during low-arousal emotions (LV/LA and HV/LA). This result is consistent with our preliminary findings. In our previous work, 60 we examined the ANS responses of monolinguals and bilinguals. Another study 61 investigated whether significant differences exist in the physiological arousal of bilinguals when using their mother tongue versus a second language. It demonstrated no significant difference for ANS signals (i.e., ECG and GSR). Conceivably, the findings differ because that research compared the mother tongue with the second language, not bilingual with monolingual groups.
From the perspective of data classification, first, setting σ to 0.01 provided better classification accuracies than the other values of σ. Second, the proposed affect recognition system attained comparable performance for all emotion categories. Third, the linguality information, like other individual characteristics, 47 significantly improved classification performance. Fourth, the potential of nonlinear approaches for discriminating affective states using physiological signals was confirmed. This result is in line with previous findings that nonlinear features yield excellent classification results for the discrimination of affective states. 7,9,[38][39][40][41][42] Lastly, LDA outperformed the other dimensionality reduction techniques. The usefulness of LDA in emotion recognition has been verified previously. 42 In this study, using LDA, the best emotion classification was obtained from PR features in discriminating three affective states on the valence dimension (95.98%). The same system architecture provided a highest accuracy of 95.51% for distinguishing five emotional classes.
These results indicate that the overall classification accuracy of the proposed integrative emotion recognition approach is superior to previous achievements. To classify emotions from ECG signals, Jerritta et al 40,41 assessed the Hurst parameter as a nonlinear technique. Employing KNN and fuzzy KNN, a maximum classification rate of 92.87% was reached for discriminating six types of emotions. Using Hilbert-Huang and discrete Fourier transforms, a classification rate of 52% was attained. 42 Extracting dynamic features from the galvanic skin response in combination with SVM led to a best classification rate of 80.31% for grief. 62 Recently, Nardelli et al 9 examined various standard and nonlinear features for emotion recognition. Recognition rates of 84.26% and 84.72% were achieved on the arousal and valence dimensions, respectively.
The results of the current study confirm the effectiveness of the integrative methodology. However, we suppose that these findings could be further improved by considering other characteristics of the subjects. Future work could also focus on other ANS signals, such as the galvanic skin response.

Conclusion
As a result, this study yielded the following conclusions.