21 Lab: Exploring a Dataset
We want to take a look at this real-world EEG dataset from OpenNeuro:
https://openneuro.org/datasets/ds005420/versions/1.0.0
import pathlib
path = pathlib.Path("../data/ds005420-download")
content = list(path.iterdir())
content[:10]
[PosixPath('../data/ds005420-download/sub-50'),
PosixPath('../data/ds005420-download/sub-40'),
PosixPath('../data/ds005420-download/sub-45'),
PosixPath('../data/ds005420-download/sub-9'),
PosixPath('../data/ds005420-download/sub-35'),
PosixPath('../data/ds005420-download/sub-16'),
PosixPath('../data/ds005420-download/CHANGES'),
PosixPath('../data/ds005420-download/sub-2'),
PosixPath('../data/ds005420-download/sub-36'),
PosixPath('../data/ds005420-download/sub-21')]
As we can see, the folder contains both files and directories.
21.1 Exercises 1
- List only the sub-directories in path.
- List only the sub-directories with subject data.
- Delete the files CHANGES and participants.tsv.
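One possible sketch for Exercises 1, assuming subject folders are named sub-<number> as in the listing above. The helper names (subdirectories, subject_directories, drop_files) are hypothetical, and "delete" is read here as removing the two files from the list; deleting them from disk would instead use Path.unlink().

```python
import pathlib


def subdirectories(path):
    """All entries of path that are directories."""
    return [p for p in pathlib.Path(path).iterdir() if p.is_dir()]


def subject_directories(path):
    """Only the directories holding subject data, sorted by subject number."""
    # Assumption: subject folders are named sub-<integer>, e.g. sub-9, sub-50
    subjects = [p for p in subdirectories(path) if p.name.startswith("sub-")]
    # Sort numerically so that sub-2 comes before sub-10
    return sorted(subjects, key=lambda p: int(p.name.split("-")[1]))


def drop_files(paths, names=("CHANGES", "participants.tsv")):
    """Remove the given file names from a list of paths (files on disk are untouched)."""
    return [p for p in paths if p.name not in names]
```

For example, drop_files(content) returns the content list from above without CHANGES and participants.tsv.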
21.2 Exercises 2
We start by checking, at a high level, that the data and metadata contain the information we expect.
- Verify that every subject directory has an eeg sub-directory.
- Verify that all files in each subject directory match that directory's subject number.
- Assert that the EEG data for all subjects was recorded with 20 channels at a sampling frequency of 500 Hz.
- (Optional) Write a file (discarded_subjects.txt) listing the subject numbers that do not meet these criteria.
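A sketch of the Exercises 2 checks. It assumes BIDS conventions: every file name in a subject folder starts with the subject label (e.g. sub-9_...), and each recording has an *_eeg.json sidecar carrying the BIDS fields EEGChannelCount and SamplingFrequency. Check that these fields are actually present in ds005420 before relying on them; all helper names are hypothetical.

```python
import json
import pathlib


def missing_eeg_dir(subject_dirs):
    """Subject directories that have no eeg/ sub-directory."""
    return [d for d in subject_dirs if not (d / "eeg").is_dir()]


def mismatched_files(subject_dir):
    """Files below a subject directory whose name does not start with its subject label."""
    prefix = subject_dir.name + "_"  # "sub-9_", so files of sub-90 are not accepted
    return [
        f for f in sorted(subject_dir.rglob("*"))
        if f.is_file() and not f.name.startswith(prefix)
    ]


def violates_recording_spec(subject_dir, n_channels=20, sfreq=500):
    """True if the subject has no eeg.json sidecar or any sidecar reports other values."""
    eeg_dir = subject_dir / "eeg"
    sidecars = sorted(eeg_dir.glob("*_eeg.json")) if eeg_dir.is_dir() else []
    if not sidecars:
        return True  # nothing to check, so the subject cannot be confirmed
    for sidecar in sidecars:
        meta = json.loads(sidecar.read_text())
        # Assumed BIDS sidecar fields: EEGChannelCount and SamplingFrequency (Hz)
        if meta.get("EEGChannelCount") != n_channels or meta.get("SamplingFrequency") != sfreq:
            return True
    return False


def write_discarded(subject_dirs, out_path="discarded_subjects.txt"):
    """Write the numbers of subjects failing the spec to out_path, one per line."""
    bad = [d.name.split("-")[1] for d in subject_dirs if violates_recording_spec(d)]
    pathlib.Path(out_path).write_text("".join(f"{n}\n" for n in bad))
    return bad
```

With subjects = sorted(path.glob("sub-*")), the assertions could read assert not missing_eeg_dir(subjects) and assert not any(violates_recording_spec(d) for d in subjects); write_discarded(subjects) covers the optional item instead of stopping at the first failure.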
21.3 Exercises 3
Now we want to look at the data itself. The recordings are stored in the European Data Format (.edf), which Python cannot read out of the box.
Hint: install the third-party library mne to read .edf files.
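A minimal sketch of finding and loading one recording with mne, assuming the BIDS layout sub-<N>/eeg/*.edf. The helper names edf_files and load_edf are hypothetical; mne is imported inside the loader so that the file discovery works even where mne is not installed yet.

```python
import pathlib


def edf_files(root):
    """All .edf files in the eeg/ folders of the subject directories under root."""
    # Assumption: BIDS layout <root>/sub-<N>/eeg/<name>.edf
    return sorted(pathlib.Path(root).glob("sub-*/eeg/*.edf"))


def load_edf(edf_path):
    """Read one EDF recording and return (channel names, sampling frequency, data)."""
    import mne  # third-party: pip install mne

    # preload=True reads the samples into memory immediately
    raw = mne.io.read_raw_edf(edf_path, preload=True, verbose=False)
    # get_data() returns an array of shape (n_channels, n_samples), in volts
    return raw.ch_names, raw.info["sfreq"], raw.get_data()
```

Usage: ch_names, sfreq, data = load_edf(edf_files(path)[0]) loads the first recording found.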
- Plot a histogram of RecordingDuration across all subjects.
- Plot one time series.
- Plot all time series with labels according to channel name.
- Plot the channels that start with “T” and “O”.
- Plot the correlation matrix of the “T” and “O” channels as a heatmap.
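A sketch of the Exercises 3 plots. It assumes RecordingDuration is read from the BIDS *_eeg.json sidecars (in seconds) and that channel names begin directly with the electrode label (T3, O1, ...); some EDF files prefix names with "EEG ", which would need stripping first. recording_durations, select_channels and plot_exercises are hypothetical names; matplotlib, mne and numpy are imported only inside the plotting function.

```python
import json
import pathlib


def recording_durations(root):
    """RecordingDuration values (seconds) from every eeg.json sidecar under root."""
    # Assumption: BIDS layout <root>/sub-<N>/eeg/*_eeg.json
    sidecars = sorted(pathlib.Path(root).glob("sub-*/eeg/*_eeg.json"))
    return [json.loads(s.read_text())["RecordingDuration"] for s in sidecars]


def select_channels(ch_names, prefixes=("T", "O")):
    """Channel names starting with any of the given prefixes."""
    return [name for name in ch_names if name.startswith(tuple(prefixes))]


def plot_exercises(root, edf_path):
    """Draw the Exercises 3 plots for the sidecars under root and one EDF recording."""
    import matplotlib.pyplot as plt  # third-party
    import mne  # third-party
    import numpy as np  # third-party

    # Histogram of RecordingDuration across all recordings
    plt.figure()
    plt.hist(recording_durations(root), bins=20)
    plt.xlabel("RecordingDuration (s)")
    plt.ylabel("Number of recordings")

    raw = mne.io.read_raw_edf(edf_path, preload=True, verbose=False)
    data, times = raw.get_data(return_times=True)

    # One time series: the first channel
    plt.figure()
    plt.plot(times, data[0])
    plt.title(raw.ch_names[0])
    plt.xlabel("Time (s)")

    # All time series, labelled by channel name
    plt.figure()
    for name, trace in zip(raw.ch_names, data):
        plt.plot(times, trace, label=name)
    plt.legend(fontsize="small", ncol=2)

    # Only the channels starting with "T" or "O"
    picks = select_channels(raw.ch_names)
    picked = raw.get_data(picks=picks)
    plt.figure()
    for name, trace in zip(picks, picked):
        plt.plot(times, trace, label=name)
    plt.legend()

    # Pearson correlation of the T/O channels as a heatmap
    corr = np.corrcoef(picked)
    fig, ax = plt.subplots()
    image = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(picks)))
    ax.set_xticklabels(picks, rotation=90)
    ax.set_yticks(range(len(picks)))
    ax.set_yticklabels(picks)
    fig.colorbar(image, ax=ax)
    plt.show()
```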
21.4 Exercises 4
After this quick look at the data, we start processing it.
- Add a column called channel_mean that
- Subtract the mean from each channel.
- Plot the correlation matrix of all channels against each other (all-vs-all).
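A minimal numpy sketch of the Exercises 4 steps, assuming the signals are held in an array of shape (n_channels, n_samples), such as the one returned by mne's raw.get_data(). If you keep one row per channel in a table, channel_means(data) gives the values such a channel_mean column would hold. The function names are hypothetical.

```python
import numpy as np


def channel_means(data):
    """Mean of each channel; data has shape (n_channels, n_samples)."""
    return data.mean(axis=1)


def demean(data):
    """Subtract each channel's own mean, so every row averages to zero."""
    # keepdims=True keeps shape (n_channels, 1), so the subtraction broadcasts per row
    return data - data.mean(axis=1, keepdims=True)


def correlation_matrix(data):
    """All-vs-all Pearson correlation between channels (one row per channel)."""
    return np.corrcoef(data)


def plot_correlation(corr, ch_names):
    """Show a correlation matrix as a heatmap labelled with channel names."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, ax = plt.subplots()
    image = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(ch_names)))
    ax.set_xticklabels(ch_names, rotation=90)
    ax.set_yticks(range(len(ch_names)))
    ax.set_yticklabels(ch_names)
    fig.colorbar(image, ax=ax)
    plt.show()
```

Pearson correlation is unaffected by subtracting a constant from a channel, so correlation_matrix gives the same result before and after demean; demeaning changes the signals themselves, for example when plotting channels on a shared axis.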