Discussion and Conclusion

In this work, we conducted a large-scale experiment on natural image reconstruction from ECoG signals using deep learning. In successful cases, the L1-VGG-GAN and cGAN models produced reconstructions that contain various class- or object-specific visual attributes of the presented images, suggesting that training reconstruction models with an adversarial loss is crucial for achieving better natural image reconstructions. Furthermore, our results with downsampled ECoG signals showed the importance of utilizing the rich temporal dynamics in ECoG signals for better natural image reconstruction.

In our experiments, we recorded ECoG signals from the macaque inferior temporal cortex (ITC). ITC is considered the highest region in the ventral visual pathway. Although the functional properties of neurons in the early visual cortex are relatively well investigated, those in the mid and higher visual cortex are still unclear. Therefore, it is notable that our results indicate the possibility of reconstructing diverse natural images from electrophysiological recordings of neuronal activity in ITC.

Although reconstructions by the L1-VGG-GAN and cGAN models were qualitatively better than those of the L1-based models, in our quantitative results the L1-based models outperformed the other two models on the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), which are pixel-level distortion metrics. On the other hand, the L1-VGG-GAN and cGAN models outperformed the L1-based models on the Fréchet Inception Distance (FID), which compares the distributions of ground-truth images and reconstructions. Similar results have been observed in various image restoration tasks. For example, when evaluated with PSNR or SSIM, models trained only with a pixel-level loss (e.g., mean squared error, MSE) usually outperform models trained with a combination of pixel-level, perceptual, and adversarial losses [124, 125]. However, when evaluated with human opinion scores or the perceptual index [125], GAN-based models usually outperform pixel-only models. We believe that the evaluation metric for image reconstruction models in brain decoding should be chosen according to the purpose of the study. If models should reconstruct "accurate" images in terms of pixel values, they should be evaluated with pixel-level distortion metrics, such as PSNR and SSIM. On the other hand, if models should reconstruct perceptually convincing images, they should be evaluated with human opinion scores, the perceptual index, and image synthesis metrics (e.g., FID).
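To make the notion of a pixel-level distortion metric concrete, the following is a minimal sketch of PSNR using the standard definition, 10 log10(MAX^2 / MSE). The function name `psnr`, the flat-list image representation, and the default peak value of 255 are assumptions made for illustration; this is not the evaluation code used in our experiments.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are flat sequences of pixel intensities of equal length
    (an illustrative representation, assumed here for simplicity).
    """
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)

    # Identical images have zero error, hence unbounded PSNR.
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR depends only on the per-pixel squared error, a blurry reconstruction close to the pixel-wise average can score higher than a sharper reconstruction with slightly misplaced details, which is consistent with the gap between pixel-level metrics and FID discussed above.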

There are several limitations in our methods. First, the training of our reconstruction models assumes that the brain signals of each trial reflect the presented image. Therefore, our methods are not directly applicable if models need to reconstruct imagined or perceived images, where explicit supervision is not easily available. We believe that unsupervised or semi-supervised learning can be employed for more generic image reconstruction scenarios in brain decoding. Second, while our models reconstruct a single image from brain signals, the image was presented to each subject continuously at every time step. The visual information in brain signals might depend on the time step. Therefore, to investigate what kind of visual information is contained at each time step, we need to train models that reconstruct a sequence of images from brain signals. This problem is also related to video reconstruction from brain signals [28].

Chapter 4 Deep Learning for Channel-Agnostic Brain Decoding across Multiple Subjects

4.1 Introduction

We can record complex spatiotemporal responses from the brain while the subject perceives or imagines something, using an electric or magnetic recording technique such as electroencephalography (EEG), magnetoencephalography (MEG), or electrocorticography (ECoG). The goal of brain decoding is to read out what was perceived or imagined from brain signals. Accurate decoding of motor states is crucial for creating practical BCI systems in the real world [126, 98, 127]. Furthermore, developing better brain decoding methods helps researchers investigate what kind of features are related to brain signals, by evaluating how well each decoding model predicts specific perceptual information from brain signals [128].

While a number of studies have proposed various decoding methods for brain signals [129, 130], most existing methods consider only static, single-subject cases, where a decoder is trained independently on each subject's dataset with the same recording equipment. In BCI applications, long calibration times and excessively repeated recording trials are burdensome for patients; thus, decoders should be transferable to novel patients and conditions. For cognitive science, across-subject decoding analyses are useful when the number of trials for each subject is limited.

In the literature, various methods for across-subject decoding have been proposed, such as common spatial patterns (CSPs) [131, 132, 29, 133] and transfer learning [134, 31, 135]. However, few studies have investigated decoding methods that are robust to shifts in recording channels. In practice, it is difficult to record brain signals with exactly the same equipment and conditions from a large number of subjects or from one subject over a long period. Moreover, even with the same equipment, channel locations or conditions can change from session to session, especially when the recording requires breaks and/or repeated removal of the electrodes. Also, if a decoder accepts only one fixed number of input channels, it is not applicable to datasets from novel subjects that have a different number of channels. Therefore, developing channel-agnostic decoding methods is crucial for creating scalable and transferable BCI systems.

In this work, we study brain decoding across multiple subjects with different numbers of recording channels and shifted channel locations. We formulate channel-agnostic brain decoding as a multi-instance learning problem [136, 34]. In multi-instance learning, each input is considered a set of independent instances (a bag), and the task is treated as a weakly supervised learning problem in which only one label is annotated for each entire bag. By using a multi-instance pooling operator, models can aggregate informative features over a variable number of input instances. This formulation naturally fits channel-agnostic brain decoding, where the goal is to train better-performing decoders that are robust to changes in the number and location of recording channels.
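The role of the pooling operator can be illustrated with a minimal sketch. It assumes each channel has already been mapped to a fixed-length feature vector, and it uses max and mean pooling as two common choices. The function name `pool_bag` and the list-of-lists representation are hypothetical and chosen only for illustration.

```python
def pool_bag(instance_features, mode="max"):
    """Aggregate a bag of per-instance feature vectors into one vector.

    instance_features: non-empty list of equal-length feature vectors,
    one per instance (here, one per recording channel).
    """
    if not instance_features:
        raise ValueError("a bag must contain at least one instance")

    # Transpose so that each column collects one feature across instances.
    columns = list(zip(*instance_features))

    if mode == "max":
        return [max(column) for column in columns]
    if mode == "mean":
        return [sum(column) / len(column) for column in columns]
    raise ValueError("mode must be 'max' or 'mean'")


# Two bags with different numbers of channels yield outputs of the same size.
bag_a = [[0.2, 1.0], [0.7, 0.1], [0.4, 0.3]]  # three channels
bag_b = [[0.5, 0.9], [0.1, 0.6]]              # two channels
pooled_a = pool_bag(bag_a)
pooled_b = pool_bag(bag_b)
```

The output length depends only on the feature dimension, not on the number of channels, and the result does not change when the channels are reordered. These two properties are what allow a single decoder to accept recordings with different channel counts and layouts.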

Based on the multi-instance learning formulation, we propose a novel channel-agnostic decoder architecture with three building blocks. The first block, channel-wise feature extraction, uses a channel-wise version of temporal convolutional networks (TCNs) [117, 118], which applies shared 1D convolution kernels independently to each channel. The second block, across-channel transform, uses the recently proposed multi-head self-attention [137] to model inter-channel interactions with channel permutation invariance. The third block, multi-channel pooling, aggregates features across a variable number of channels.
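The data flow through the three blocks can be sketched in a few lines of plain Python. This is a heavily simplified illustration under stated assumptions, not the proposed architecture itself: the TCN is replaced by a single layer of shared, undilated 1D kernels followed by ReLU and max-over-time; multi-head self-attention is replaced by a single head with identity projections (no learned weights); and the pooling is a mean over channels. All function names are hypothetical.

```python
import math


def conv1d_valid(signal, kernel):
    """Valid-mode 1D cross-correlation of one channel with one kernel."""
    k = len(kernel)
    return [sum(signal[t + i] * kernel[i] for i in range(k))
            for t in range(len(signal) - k + 1)]


def channel_wise_features(recording, kernels):
    """Block 1 (simplified): the same kernels are applied to every channel
    independently; ReLU and max-over-time give one feature per kernel."""
    return [[max(max(v, 0.0) for v in conv1d_valid(channel, kernel))
             for kernel in kernels]
            for channel in recording]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Block 2 (simplified): single-head scaled dot-product self-attention
    across channels, with identity query/key/value projections."""
    dim = len(features[0])
    output = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        output.append([sum(w * value[j] for w, value in zip(weights, features))
                       for j in range(dim)])
    return output


def pool_channels(features):
    """Block 3 (simplified): mean over channels, one value per feature."""
    return [sum(column) / len(features) for column in zip(*features)]


def decode_features(recording, kernels):
    """Chain the three blocks: channels x time -> fixed-length vector."""
    return pool_channels(self_attention(channel_wise_features(recording, kernels)))


# Recordings with different channel counts map to vectors of the same length.
kernels = [[1.0, -1.0], [0.5, 0.5]]
recording_a = [[0.0, 1.0, 0.0, -1.0, 0.0],
               [1.0, 1.0, 0.0, 0.0, 1.0],
               [0.5, -0.5, 0.5, -0.5, 0.5]]
recording_b = [[0.0, 2.0, 1.0, 0.0, 0.0],
               [1.0, 0.0, 1.0, 0.0, 1.0]]
vector_a = decode_features(recording_a, kernels)
vector_b = decode_features(recording_b, kernels)
```

In this sketch, self-attention without positional information is permutation-equivariant over channels, and the mean pooling that follows makes the final vector permutation-invariant. A classifier placed on top of this vector would therefore not depend on channel order or channel count.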

We conducted a thorough experiment to verify the design of our proposed decoder architecture in channel-agnostic brain decoding across multiple subjects. Our dataset contains ECoG signals recorded from two subjects with different numbers of channels and inconsistent channel locations. We trained our proposed models and baselines to predict six visual object classes from each subject's single-trial data. Our results indicate the importance of using across-channel transforms with channel permutation invariance and inter-channel interactions for achieving better classification results in channel-agnostic brain decoding across multiple subjects.
