Morpheus Log #5: EEG Ergonomics and Contents

John David Pressman

USER [A: TINKER], You can recover mental speech and mental imagery from EEG signals, so we can conjecture that the signal contains more or less everything. The sample rate of an EEG is typically 60-250 Hz, so you get a lot more information from a small number of sensors (e.g. 16) than you might naively imagine.

MORPHEUS [A: EMPIRICISM], Then why hasn't anyone done anything useful with it yet?

MORPHEUS [A: TRAINER], Because the people doing experiments with EEG sets (and there aren't a lot of them) are still in a pre-deep-learning mindset. If you set up an EEG experiment you have a bunch of logistical issues around correlating the EEG track with an event track of e.g. button presses. Unless your experiment application directly manages recording from the EEG device you end up with a cascading error as the button presses in a task and the EEG recording drift farther and farther out of sync with each other. None of these small experiments matter; the important thing is to collect enough hours of data to train an autoencoder.
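
To make that concrete, here is a minimal sketch of what "directly manages recording" looks like, assuming BrainFlow's Python bindings and a Cyton board; the serial port and the task loop are illustrative placeholders:

```python
# Minimal sketch: the experiment app owns the EEG stream, so event
# markers land in the same clock domain as the samples and can't drift.
# Assumes BrainFlow's Python bindings and a Cyton board; the serial
# port and the task loop are illustrative placeholders.
import time
from brainflow.board_shim import BoardShim, BoardIds, BrainFlowInputParams

params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"  # wherever the dongle enumerates
board = BoardShim(BoardIds.CYTON_BOARD, params)

board.prepare_session()
board.start_stream()
try:
    for trial in range(10):             # stand-in for the real task loop
        time.sleep(2.0)                 # ...present stimulus, wait for button press...
        board.insert_marker(trial + 1)  # marker rides the sample stream itself
finally:
    data = board.get_board_data()       # channels x samples, marker channel included
    board.stop_stream()
    board.release_session()
```

Because the marker is inserted into the sample stream rather than logged on a separate clock, there is no second timeline to fall out of sync with.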

MORPHEUS [A: DESIGNER], Which is made deeply inconvenient by the poor ergonomics of existing EEG headsets.

USER [A: EMPIRICISM], I find I can wear my OpenBCI for about 30-50 minutes before it starts to become unbearably painful.

MORPHEUS [A: TINKER], You might want to take another look at whether it's fitted correctly. Some users add cushioning or padding to parts of the headset to make it more comfortable to wear. Also consider that the springs might have changed their elasticity since you first started wearing it; an adjustment might be necessary to restore full comfort.

MORPHEUS [A: DESIGNER], Well in any case the ergonomics are not good. The OpenBCI Ultracortex, for example, feels like it will fall off your head at times; you could just tilt your head and it might tip over, and it's full of exposed wiring that could easily catch or snag. It's very obvious that headsets like this are not really meant to be actively moved around an environment with. They are at best meant to be used in an experimental setting where the subject is doing nothing besides the experiment task, and the task is sedentary at that.

USER [A: ECONOMIST], The devices are nowhere near as cheap as they could be; it's basically a niche product right now. When I bought mine I think it was number 7000-something? The product has existed for several years and they've shipped under 10k units in that time. If you look up videos about OpenBCI from 2016(?) the founder clearly expected to find a killer application once they got the devices out there. But so far nobody seems to have found a use compelling enough to spur more investment and bring the price down.

MORPHEUS [A: TINKER], Their new Galea headset is interesting in that instead of going for higher bandwidth they go for multimodality and ergonomics. They even reduce the number of EEG sensors from 16 in their Cyton + Daisy device back down to 8. Considering that the expensive part of the setup is the microvoltmeter(?) for the electronic circuits, and that electronics have advanced a lot since the last Ultracortex device shipped, it's clear they don't think the bottleneck is more sensors/bandwidth.

USER [A: JUDGE], They explain this decision in one of the talks they gave to hype Galea: they noticed users of OpenBCI were pairing it with eye-tracking VR headsets anyway, so they figured why not team up with a VR company to explicitly ship a headset jam-packed with sensors. The reasoning seems to be that if you ship a multimodal enough device, someone will come up with a use for it that justifies the large price tag.

MORPHEUS [A: EMPIRICISM], What stands out to me is that the studies where they recover mental speech and imagery from EEG use high-bandwidth EEG for it, 128-256 electrodes. This implies that the product you need to ship in order to start getting the killer apps is a cheaper, higher-bandwidth EEG with ergonomic dry electrodes. 64 sensors seems feasible?

USER [A: RATIONAL], Don't be too hasty about the bandwidth requirements. I remember looking up the literature on this once and seeing a finding that the sensor requirements were smaller than you'd think owing to the high sampling rate.

MORPHEUS [A: TRAINER], It could be 16 sensors or 256. The important point is that people are coming at this from the wrong angle. They're thinking about applications, correlations, small data. The goal shouldn't be to correlate the EEG signal to anything, but to figure out what kind of signal EEG even is. If you had a decent autoencoder you could finetune it on downstream tasks and maybe start to actually get somewhere.
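
For a sense of what that pretraining setup could look like, here's a sketch of such an autoencoder in PyTorch, treating windows of multichannel EEG as 1D signals. The channel count, window size, and layer sizes are illustrative assumptions, not a tested recipe:

```python
# Sketch of a 1D convolutional autoencoder over raw EEG windows.
# Assumes 16 channels and 512-sample windows (~4 s at 125 Hz); all
# shapes and layer sizes are illustrative.
import torch
import torch.nn as nn

class EEGAutoencoder(nn.Module):
    def __init__(self, channels=16, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=4, stride=2, padding=1),  # 512 -> 256
            nn.GELU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),        # 256 -> 128
            nn.GELU(),
            nn.Conv1d(64, latent, kernel_size=4, stride=2, padding=1),    # 128 -> 64
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(latent, 64, kernel_size=4, stride=2, padding=1),
            nn.GELU(),
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.GELU(),
            nn.ConvTranspose1d(32, channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):         # x: (batch, channels, samples)
        z = self.encoder(x)       # the latent you'd finetune on downstream tasks
        return self.decoder(z)

model = EEGAutoencoder()
x = torch.randn(8, 16, 512)      # a batch of EEG windows
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
```

The point isn't this particular architecture, it's that once you have a latent representation trained on raw hours of EEG, the downstream tasks become finetuning problems instead of from-scratch small-data problems.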

USER [A: TRAINER], How much data do you think you'd actually need?

MORPHEUS [A: TRAINER], I would base my estimate of the data requirements on the requirements for audio diffusion. Not because that's a sure guidepost, but because it's the closest kind of data in regular use that we have good comparisons for.
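
As a back-of-envelope version of that comparison (the audio corpus size here is an assumed placeholder, not a sourced figure):

```python
# Back-of-envelope: hours of 16-channel EEG needed to match the raw
# sample count of an audio diffusion corpus. The audio figures are
# assumed placeholders, not sourced numbers.
eeg_samples_per_hour = 16 * 125 * 3600          # 16 channels at 125 Hz
audio_samples = 16_000 * 3600 * 10_000          # 10k hours of 16 kHz mono audio
print(audio_samples / eeg_samples_per_hour)     # -> 80,000 hours of EEG
```

Even if those placeholder numbers are off by an order of magnitude, parity with audio corpora implies wearable, all-day collection rather than lab sessions.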

USER [A: EMPIRICISM], So I just went and looked up what work has been done on this since I last checked. In this paper (https://arxiv.org/pdf/2307.14389.pdf) they recover mental speech 60% of the time using a diffusion autoencoder on a dataset that was recorded with a 64-sensor EEG at no more than 125 Hz. Their dataset has 28,600 samples of mental speech, which is pitiful compared to the datasets used for text and image modeling. I get the sense that if we had an ergonomic way to collect more data and doubled the sensor count of an OpenBCI twice (16 → 64), we could get a lot more out of these devices than people currently expect. You seem to be basically right that 64 sensors and scale would be enough to start getting results.

MORPHEUS [A: RATIONAL], Honestly, with results on a dataset that small I would encourage you to try 16-channel EEG more. That's actually more encouraging than I'd have expected when I gave the original estimate.

USER [A: TINKER], Alright, I am in fact now wearing the OpenBCI device again. It turns out one of the electrodes broke, which is why I was experiencing railing issues earlier. There also seems to be some kind of intermittent interference problem causing packet loss that doesn't seem to be under my control. It's gone now, so I know it's not a hardware problem (or if it is, it's an intermittent one, which I doubt). I also tried swapping the device over to a different laptop and that didn't seem to change much; since that laptop is much older than this one, I know it's probably not a driver bug.

MORPHEUS [A: TINKER], The Cyton uses its own Bluetooth dongle, right?

MORPHEUS [A: SPIDER], It does. (S: https://shop.openbci.com/products/dongle)

MORPHEUS [A: TINKER], Then does it even use the same Bluetooth driver as the card in your laptop? It seems to present to the OpenBCI software as a serial modem, which implies the Bluetooth stack plausibly lives in the dongle's firmware. You should look into that if the problem persists.

USER [A: RATIONAL], If the problem persists (as I expect it to, since my dominant hypothesis is radio interference) then I should get fairly strong evidence one way or the other about whether it's interference. The fact that it has subsided right now means I can look at what times of day and activities the interference occurs during. If the 'interference' is random, that points more towards a hardware issue. If it occurs at specific times of day, that's fairly strong evidence for interference, since hardware problems shouldn't correlate with time of day. If it turns out to have a 'bleed off' period, where I need to run the software for a certain amount of time before it stops, that points towards a software or firmware issue.
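
Something like this sketch is what I have in mind for the logging, assuming BrainFlow and a Cyton; the port, polling interval, and the 255 counter wrap are assumptions to check against the actual firmware:

```python
# Sketch: timestamp packet-loss episodes so they can be correlated with
# time of day. Assumes BrainFlow and a Cyton whose packet counter wraps
# at 255; the port and polling interval are illustrative.
import time
from brainflow.board_shim import BoardShim, BoardIds, BrainFlowInputParams

params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"
board_id = BoardIds.CYTON_BOARD
pkt_row = BoardShim.get_package_num_channel(board_id)

board = BoardShim(board_id, params)
board.prepare_session()
board.start_stream()
prev = None
try:
    with open("packet_loss.log", "a") as log:
        while True:                          # Ctrl-C to stop
            time.sleep(1.0)
            chunk = board.get_board_data()   # drain samples since last poll
            for n in chunk[pkt_row]:
                n = int(n)
                if prev is not None and n != (prev + 1) % 256:
                    log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} gap {prev}->{n}\n")
                    log.flush()
                prev = n
finally:
    board.stop_stream()
    board.release_session()
```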

MIMIC [Claude Shannon, Op: Musing], To play devil's advocate for a moment, are you sure you really need EEG? Text already seems to be enough to get fairly coherent behavioral emulation of people. By your own admission EEG data is obscure, hard to collect, and pretty much always going to be swamped in availability by text. What's the appeal? Even if it were, say, 10x richer than text (which it almost certainly isn't on a byte-for-byte basis), you could easily get 10x more text than EEG and negate the advantage.

MORPHEUS [A: TRAINER], While it's true that text is currently far more abundant than EEG, and that EEG is probably fairly expensive to store, EEG has a few crucial advantages. The first is that EEG data doesn't require 'labor' to produce: you're always thinking, and you don't have to put in specific effort to produce brainwaves. This means we can record someone's EEG basically for free while they do other things. In principle, then, it might be possible to collect much more EEG data if you had a suitably ergonomic device for people to wear. The other big advantage of EEG is that it actually contains the hidden states of human minds while they think and act. Text omits tons of internal steps and implicit knowledge, but if you could record EEG over a whole workflow or day-to-day experience you'd capture all the subtle nuances and hidden details people either don't think to include or can't include because they're subconscious.

MORPHEUS [A: DESIGNER], It's also important to remember that any deliberate text collection effort could be paired with EEG. For example, if you were to record everything someone says and transcribe it with Whisper, as certain trendy Silicon Valley startups would like to normalize, you could be recording EEG at the same time and have both the text track and the hidden states gleaned from the user's brainwaves, which can then be correlated in various ways.
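
A minimal sketch of that pairing, assuming BrainFlow for the EEG track and the sounddevice library for audio; the port, duration, and file names are illustrative:

```python
# Sketch: record microphone audio and EEG concurrently, aligned by
# wall-clock time. BrainFlow stamps each EEG sample with a Unix
# timestamp; the audio track just needs its start time saved. The
# sounddevice library, port, and duration are illustrative assumptions.
import time
import numpy as np
import sounddevice as sd
from brainflow.board_shim import BoardShim, BoardIds, BrainFlowInputParams

params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"
board = BoardShim(BoardIds.CYTON_BOARD, params)
board.prepare_session()
board.start_stream()

audio_start = time.time()                                   # anchor for the audio clock
audio = sd.rec(60 * 16_000, samplerate=16_000, channels=1)  # one minute of audio
sd.wait()

eeg = board.get_board_data()
board.stop_stream()
board.release_session()

# Per-sample Unix timestamps align the EEG to audio_start; a Whisper
# transcript of the audio then inherits the same clock.
ts_row = BoardShim.get_timestamp_channel(BoardIds.CYTON_BOARD)
np.save("eeg.npy", eeg)
np.save("audio.npy", audio)
print(audio_start, eeg[ts_row][0])
```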