👨🤖 EEG responses to real vs. virtual faces

Are you for real? Decoding realistic AI-generated faces from neural activity

Pasted image 20250623001955.png|600

Decoding for unrealistic faces displayed a triple-peak pattern, emerging at around 100 ms and maintained until around 370 ms. Early decoding differences are consistent with rapid face detection and face-specific processing (Rossion et al., 2015; Dobs et al., 2019; Crouzet et al., 2010; Wardle et al., 2020). The latter two peaks (at around 170–200 ms and 270–320 ms) have similarly been shown to emerge in real versus artificial face perception (Wheatley et al., 2011; Balas and Koldewyn, 2013; Sagiv and Bentin, 2001; Schindler et al., 2017; Schindler et al., 2019; Wardle et al., 2020). Schindler et al. (2017) suggest that early-stage N170 processing is related to assessing the structural configuration of faces, as seen in greater occipital involvement, whilst the later-stage LPP, which increases linearly with face realism, suggests a deeper person-related, semantic involvement (see also Abdel Rahman, 2011; Taylor et al., 2016). Differences at the triple peak correspond to the N250 and P300 components typically associated with face familiarity (Collins et al., 2018) and semantic information (Tanaka, Curran, Porterfield, & Collins, 2006), the latter being especially important for behaviour (Hanso, Bachmann, & Murd, 2010).

In contrast, realistic/real decoding displayed a single peak emerging between around 170 ms and 240 ms, indicating a difference in processing between realistic and unrealistic faces: differences in perception between real and realistic faces were constrained to the 170 ms time period. Indeed, in comparing human faces to doll faces and artificial faces, others have shown that only human faces typically evoke sustained neural responses beyond the N170 component necessary for higher-order perception (Balas and Koldewyn, 2013; Wheatley et al., 2011). Balas and Koldewyn (2013) found that the N170 was better characterised by encoding deviations from facial appearance than by animacy perception. In other words, realistic faces were perceived as configurally different to real faces, but only unrealistic faces engaged the later processing necessary for higher-order animacy or familiarity perception.

Overall, earlier decoding for unrealistic faces, consistent with apparent low-level image differences (Fig. 1C), suggests that early and low- to mid-level processing differences may account for decodability between real and unrealistic faces. Decoding for realistic faces, by contrast, emerges later and is constrained to the 170 ms time period, suggesting that a face-specific configural process may be responsible for this distinction.
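For reference, a minimal sketch of the kind of time-resolved decoding analysis described above, using MNE-Python's `SlidingEstimator`. The toy data, channel counts, and the injected ~170 ms effect are assumptions for demonstration, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import SlidingEstimator, cross_val_multiscore

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 64, 120      # toy stand-in for real epochs
times = np.linspace(-0.1, 0.5, n_times)           # seconds relative to face onset
X = rng.standard_normal((n_trials, n_channels, n_times))
y = rng.integers(0, 2, n_trials)                  # 0 = real, 1 = unrealistic face

# Inject a small condition difference around ~170 ms so the decoder has
# something to find (purely synthetic; in the study the effect comes from EEG).
win = np.where((times > 0.15) & (times < 0.22))[0]
X[np.ix_(y == 1, np.arange(10), win)] += 0.5

# Fit a separate linear classifier at every time point (time-resolved decoding),
# scored with 5-fold cross-validated AUC; above-chance stretches correspond to
# the decoding peaks discussed above.
clf = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
decoder = SlidingEstimator(clf, scoring="roc_auc", n_jobs=1)
scores = cross_val_multiscore(decoder, X, y, cv=5).mean(axis=0)

peak = times[np.argmax(scores)]
print(f"Peak decoding AUC {scores.max():.2f} at {peak * 1000:.0f} ms")
```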

Pasted image 20250623014209.png

On the behavioral level, good deepfake discrimination capacity was found, confirming studies on the recognition of AI-generated faces [13,14,15,16]. Participants also performed well in recognizing the real faces [33,34]. A good degree (∼76%) of accuracy in classifying faces was observed, as reported in Table 1 and Table 2. Participants were slightly better at discriminating images with neutral emotional content than images with positive or negative emotional content. This seems to confirm the work of Montagrin et al., which highlighted the important role played by memory in facilitating the recognition of neutral faces in goal-relevant situations [35]. Statistical analysis of PSD-related features highlighted two main significant activations for deepfakes: delta activation in the frontal (Figure 5a and Figure 6a) and right occipital (Figure 4a, Figure 5a and Figure 6a) areas could be due to a dynamic attention-switching mechanism [36], meaning that participants spent more time interpreting synthetic faces, whereas high-beta activation in the left temporal area, which includes the FG [37], suggests that FFA activation is not determined by whether the presented face is real or synthetic.
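A minimal sketch of the band-power (PSD) feature extraction such an analysis relies on, using Welch's method from SciPy. The sampling rate, channel indices, and band edges here are illustrative assumptions, not the study's exact parameters.

```python
import numpy as np
from scipy.signal import welch

fs = 250.0                                        # sampling rate (Hz), assumed
rng = np.random.default_rng(1)
trial = rng.standard_normal((64, int(2 * fs)))    # toy trial: 64 channels x 2 s

def band_power(data, fs, fmin, fmax):
    """Mean Welch PSD power in [fmin, fmax], per channel."""
    freqs, psd = welch(data, fs=fs, nperseg=int(fs))
    band = (freqs >= fmin) & (freqs <= fmax)
    return psd[:, band].mean(axis=1)

frontal = [0, 1, 2, 3]          # e.g. Fp1/Fp2/F3/F4 (assumed montage indices)
left_temporal = [40, 41]        # e.g. T7/TP7 (assumed montage indices)

delta_frontal = band_power(trial[frontal], fs, 1.0, 4.0).mean()
high_beta_lt = band_power(trial[left_temporal], fs, 20.0, 30.0).mean()
print(f"frontal delta power:     {delta_frontal:.3f}")
print(f"left-temporal high-beta: {high_beta_lt:.3f}")
```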


Pasted image 20250623002533.png

Current EEG results provide evidence that high-quality virtual agent faces elicit an early-stage neural response, represented by the N170 component, that is indistinguishable from the response evoked by real human faces. This suggests that the early stage of face processing is primarily concerned with the presence of a face template rather than with fine detail or perceptual quality, at least as indexed by the N170.
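A minimal sketch of the kind of N170 amplitude comparison behind this claim, assuming per-subject ERP arrays. The 150–200 ms window and occipitotemporal channel indices are common conventions, not taken from the paper, and the data are simulated.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
n_subj, n_chan, n_times = 20, 64, 300
times = np.linspace(-0.1, 0.5, n_times)
erp_real = rng.standard_normal((n_subj, n_chan, n_times))     # toy ERPs (µV)
erp_virtual = rng.standard_normal((n_subj, n_chan, n_times))

ot_channels = [56, 57, 58, 59]                  # e.g. P7/P8/PO7/PO8 (assumed)
win = (times >= 0.15) & (times <= 0.20)         # N170 window, 150-200 ms

# Mean amplitude over the window and occipitotemporal channels, per subject
n170_real = erp_real[:, ot_channels][:, :, win].mean(axis=(1, 2))
n170_virtual = erp_virtual[:, ot_channels][:, :, win].mean(axis=(1, 2))

# A non-significant paired test here would be consistent with the claim that
# the N170 does not distinguish high-quality virtual faces from real ones.
t, p = ttest_rel(n170_real, n170_virtual)
print(f"t({n_subj - 1}) = {t:.2f}, p = {p:.3f}")
```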


Pasted image 20250623003157.png

Assessing neuronal responses in both the frequency and time domains, we found a non-linear relationship between SSVEP amplitudes and stylization level: the most stylized cartoon images and the real photographs evoked stronger responses than images with medium stylization. Moreover, the realness of even highly similar stylization levels could be decoded from the EEG data with task-related component analysis (TRCA). Importantly, we also accounted for confounding factors, such as the size of the stimulus face's eyes, which had not previously been adequately addressed.
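To make the SSVEP measure concrete, here is a minimal sketch of amplitude extraction at an assumed 6 Hz tagging frequency via the FFT; the signal is simulated, and TRCA itself (for which implementations exist in open-source EEG toolboxes) is omitted for brevity.

```python
import numpy as np

fs, f_stim = 250.0, 6.0                            # sampling rate, tag frequency
rng = np.random.default_rng(3)
t = np.arange(0, 4, 1 / fs)                        # one 4-second trial
signal = 0.8 * np.sin(2 * np.pi * f_stim * t) + rng.standard_normal(t.size)

# Single-sided amplitude spectrum; read off the bin nearest the tag frequency.
spectrum = np.abs(np.fft.rfft(signal)) * 2 / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
amp = spectrum[np.argmin(np.abs(freqs - f_stim))]
print(f"SSVEP amplitude at {f_stim:g} Hz: {amp:.2f} (a.u.)")
```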


Real-life relevant face perception is not captured by the N170 but reflected in later potentials: A comparison of 2D and virtual reality stimuli

An interesting article in which the authors investigated how EEG responses differ during the perception of real and virtual faces.

Pasted image 20250522021231.png
FIG. 1. Procedure of stimulus presentation: 0.5–0.8 s fixation, 1.5 s stimulus presentation, 1.5 s inter-stimulus interval (ISI). Exemplary stimuli of face conditions and perceptual control conditions (scrambled, blurred) are illustrated.

Pasted image 20250603201428.png
FIG. 2. Time-by-amplitude plot of the root mean squared ERP averaged over all electrodes for the selection of appropriate time windows for all ERP components. Grey highlighted sections mark the time windows for P1 (95–125 ms), N170 (165–195 ms), L1 (230–420 ms) and L2 (685–1,385 ms).

Pasted image 20250522010323.png
FIG. 3. Time-by-amplitude plot of the mean P1 and N170 amplitudes for all conditions (panels A1, B1). Mean topographies across conditions used for ERP averaging (panels A2, B2). The electrodes selected for analyses are indicated. For the P1, electrodes Oz, O1, O2, P7, PO7, P8, PO8, TP7, TP8 and those in close vicinity were used. For the N170, electrodes P7, P8, PO7, PO8, P10, P9, PO10, PO9, TP7, TP8 and those in close vicinity were used.

Pasted image 20250603201747.png
FIG. 7. Panel A illustrates the P1 topographies for all stimulus types in both modalities. Panel B depicts the mean P1 amplitudes for all stimulus types in both modalities. The error bars depict the confidence intervals for the mean values. Significant differences within each modality are marked (p<0.05).

Pasted image 20250603202728.png
FIG. 8. Panel A illustrates the N170 topographies for all stimulus types in both modalities. Panel B depicts the mean N170 amplitudes for all stimulus types in both modalities. The error bars depict the confidence intervals for the mean values. Significant differences within each modality are marked (p<0.05).

Extending previous laboratory studies, later components appear to reflect the mechanisms of realistic face processing. In contrast to earlier components, later potentials are linearly related to stimulus realism (Schindler et al., 2017), are modulated by socially relevant emotional expressions and affective contexts (Bublatzky et al., 2014; Stolz et al., 2019), and are especially sensitive to self-related emotions (Herbert et al., 2013).

Processing of genuinely self-relevant emotional and contextual information, such as threat towards oneself, seems not to be captured by the N170 component.

Thus, consistent with laboratory results, late components discriminate faces from controls under realistic conditions, exhibiting much more discriminatory potential than the N170.
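As a concrete illustration, a minimal sketch computing window-mean amplitudes for the components above, using the time windows given in FIG. 2; the grand-average ERPs here are simulated placeholders, not the study's data.

```python
import numpy as np

# Time windows from FIG. 2 (seconds)
windows = {"P1": (0.095, 0.125), "N170": (0.165, 0.195),
           "L1": (0.230, 0.420), "L2": (0.685, 1.385)}

rng = np.random.default_rng(4)
times = np.linspace(-0.2, 1.5, 850)
erp_face = rng.standard_normal(times.size)       # toy grand-average ERP (µV)
erp_control = rng.standard_normal(times.size)    # e.g. scrambled control

# Face-minus-control difference per component window; under the paper's
# account, L1/L2 differences should exceed those at P1/N170.
for name, (t0, t1) in windows.items():
    sel = (times >= t0) & (times <= t1)
    diff = erp_face[sel].mean() - erp_control[sel].mean()
    print(f"{name}: face - control = {diff:+.2f} µV")
```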

source: https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1050892/full

#EEG #face #N170 #ERP #VR #perception #faceRecognition