Find correlated neural activity that is offset-delayed using cross-correlation

compared to convolution & autocorrelation

In experiments measuring neural network activity, such as in studies using GCaMP, it’s often of interest to find neurons that have related activity patterns.

Finding neurons that are coactivated can easily be done using principal components analysis. However, you may also be interested in finding neurons that share a single upstream source driving their activity, but whose activation timings are temporally offset.

Since these activation patterns don't align, and are in a sea of other activity, there may not seem to be an easy way to identify these coupled neurons. Thankfully there is a method that can vastly reduce the complexity of this task. It's a correlational method called cross-correlation, and it can be used not just to find related signals, but also to measure and correct for the offset in spike/signal timing.

Below, the script demonstrates how to use the cross-correlation function (here, in MATLAB using the xcorr() function) by loading three related but asynchronous signals into the workspace, computing their offset, and aligning them.

load relatedsig
ax(1) = subplot(3,1,1); plot(s1); ylabel('s_1'); axis tight
ax(2) = subplot(3,1,2); plot(s2); ylabel('s_2'); axis tight
ax(3) = subplot(3,1,3); plot(s3); ylabel('s_3'); axis tight; xlabel('Samples')

Next, we use the xcorr() function to compute the cross-correlations between the three pairs of signals; then normalize them so their maximum value is one.

[C21,lag21] = xcorr(s2,s1);
C21 = C21/max(C21);

[C31,lag31] = xcorr(s3,s1);
C31 = C31/max(C31);

[C32,lag32] = xcorr(s3,s2);
C32 = C32/max(C32);

The locations of the maximum values of the cross-correlations indicate time leads or lags.

[M21,I21] = max(C21);
t21 = lag21(I21);

[M31,I31] = max(C31);
t31 = lag31(I31);

[M32,I32] = max(C32);
t32 = lag32(I32);
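For readers without MATLAB, the peak-picking idea above can be sketched in plain Python. This is only an illustration on a made-up pulse, not the xcorr() implementation: the helper names `cross_correlation` and `peak_lag` and the toy signals are assumptions. It follows the same sign convention as xcorr(x,y), so a positive lag means the first argument is delayed relative to the second.

```python
def cross_correlation(x, y):
    """Return (lags, values) with values[k] = sum over n of x[n + lag] * y[n]."""
    n = max(len(x), len(y))
    # Zero-pad the shorter signal so both have the same length.
    x = list(x) + [0.0] * (n - len(x))
    y = list(y) + [0.0] * (n - len(y))
    lags = list(range(-(n - 1), n))
    values = []
    for lag in lags:
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += x[j] * y[i]
        values.append(total)
    return lags, values


def peak_lag(x, y):
    """Lag at which the cross-correlation of x and y is largest."""
    lags, values = cross_correlation(x, y)
    best = max(range(len(values)), key=lambda k: values[k])
    return lags[best]


# Hypothetical toy data: the same pulse, starting at sample 2 and at sample 7.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
early = [0.0] * 2 + pulse + [0.0] * 13
late = [0.0] * 7 + pulse + [0.0] * 8

print(peak_lag(late, early))   # 5: the first argument is delayed by 5 samples
print(peak_lag(early, late))   # -5: the first argument leads by 5 samples
```

The double loop is quadratic in the signal length, which is fine for a toy example; for long recordings an FFT-based routine would be the usual choice.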

Plot the cross-correlations. In each plot display the location of the maximum.


subplot(3,1,1)
plot(lag21,C21,[t21 t21],[-0.5 1],'r:')
text(t21+100,0.5,['Lag: ' int2str(t21)])
axis tight

subplot(3,1,2)
plot(lag31,C31,[t31 t31],[-0.5 1],'r:')
text(t31+100,0.5,['Lag: ' int2str(t31)])
axis tight

subplot(3,1,3)
plot(lag32,C32,[t32 t32],[-0.5 1],'r:')
text(t32+100,0.5,['Lag: ' int2str(t32)])
axis tight

We can see that s2 leads s1 by 350 samples; s3 lags s1 by 150 samples. Thus s2 leads s3 by 500 samples. Line up the signals by clipping the vectors with longer delays.

s1 = s1(-t21:end);
s3 = s3(t32:end);

ax(1) = subplot(3,1,1);
plot(s1)
axis tight

ax(2) = subplot(3,1,2);
plot(s2)
axis tight

ax(3) = subplot(3,1,3);
plot(s3)
axis tight


The signals are now synchronized and ready for further processing.
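The clipping step can be sketched in Python as well. This is a minimal sketch under stated assumptions: the helper `align_by_clipping` and the toy signals are hypothetical, delays are expressed relative to the earliest signal (s2 in the example above), and Python slicing is 0-based where MATLAB indexing is 1-based. Trimming everything to a common length is an extra convenience that the MATLAB snippet does not perform.

```python
def align_by_clipping(signals, delays):
    """Drop the first delays[name] samples of each signal, then trim to a common length."""
    clipped = {
        name: list(sig)[delays.get(name, 0):]
        for name, sig in signals.items()
    }
    # Extra step (not in the MATLAB snippet): cut all signals to the shortest one.
    shortest = min(len(sig) for sig in clipped.values())
    return {name: sig[:shortest] for name, sig in clipped.items()}


# Hypothetical toy signals: one pulse, delayed by 3 samples in s1 and 5 in s3.
pulse = [1.0, 3.0, 1.0]
signals = {
    "s1": [0.0] * 5 + pulse + [0.0] * 4,
    "s2": [0.0] * 2 + pulse + [0.0] * 7,
    "s3": [0.0] * 7 + pulse + [0.0] * 2,
}
delays = {"s1": 3, "s3": 5}   # samples by which each signal trails s2

aligned = align_by_clipping(signals, delays)
for name in sorted(aligned):
    print(name, aligned[name])
```

After clipping, the pulse sits at the same sample index in all three signals.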

Interested in seeing more stuff like this? Head over to my GitHub page.

Hot Hands, a Paradox, and one reason why it’s bad to combine within-subject data

The ‘Hot Hand’ phenomenon is a popular belief (applicable to many domains from sports to gambling) that players who were successful in their most recent attempt(s) have increased odds of being successful in their next attempt — they are on a so-called ‘hot streak’ or have a ‘hot hand’. The statistical validity of this belief can be investigated using actual data. Indeed, it has been. For example, Tversky and Gilovich (1989) investigated the hot hand belief in basketball.

Here, however, we are not scrutinizing the hot hand belief, but rather using this framework and dataset to reveal the presence of a Simpsonian paradox (better known as Simpson's paradox). The definition of this paradox will precipitate from the following example…

In the 1996/97 NBA jam seasons Michael Jordan shot a pair of free throws on 338 occasions. MJ made both 251 times, missed both 5 times, made only the first 34 times, and made only the second 48 times. These data are presented in the table above, as are the same data for Dennis Rodman, and also their combined numbers.

Let Phit and Pmiss denote the proportion of first shot hits followed by a hit, and the proportion of first shot misses followed by a hit, respectively. These proportions for Jordan and Rodman, along with their combined numbers are:

Jordan:
Phit = 251 / 285 = .881
Pmiss = 48 / 53 = .906

Rodman:
Phit = 54 / 91 = .593
Pmiss = 49 / 80 = .612

Combined:
Phit = 305 / 376 = .811
Pmiss = 97 / 133 = .729

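Notice that, contrary to the hot hand phenomenon, MJ actually shot worse on his second free throw after making the first (.881) than after missing it (.906). The same goes for Rodman (.593 versus .612).

So both players are actually worse on their second free throw on attempts when they've made their first shot.

Combining MJ and Rodman's free throw data, the opposite is true: the pooled hit rate after a made first shot (.811) is higher than after a missed one (.729). The reversal arises because Jordan, the far better shooter, contributes most of the attempts that follow a hit (285 of 376), while Rodman contributes most of the attempts that follow a miss (80 of 133). This is the Simpsonian paradox, better known as Simpson's paradox.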
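The arithmetic can be checked with a few lines of Python. Jordan's four counts are the ones stated in the text; Rodman's counts for "made only the first" and "missed both" are not stated directly and are derived here from the fractions 54/91 and 49/80 (91 - 54 = 37 and 80 - 49 = 31). The function name `second_shot_rates` is just for illustration.

```python
# Counts per player: (made both, made only first, made only second, missed both).
jordan = (251, 34, 48, 5)    # stated in the text
rodman = (54, 37, 49, 31)    # 37 and 31 derived from 54/91 and 49/80


def second_shot_rates(counts):
    """Return (Phit, Pmiss): second-shot hit rate after a first hit / first miss."""
    both, first_only, second_only, neither = counts
    p_hit = both / (both + first_only)
    p_miss = second_only / (second_only + neither)
    return p_hit, p_miss


# Pooling the data means adding the raw counts, not averaging the rates.
combined = tuple(j + r for j, r in zip(jordan, rodman))

for name, counts in [("Jordan", jordan), ("Rodman", rodman), ("Combined", combined)]:
    p_hit, p_miss = second_shot_rates(counts)
    verdict = "better after a hit" if p_hit > p_miss else "worse after a hit"
    print(f"{name:9s} Phit = {p_hit:.3f}  Pmiss = {p_miss:.3f}  ({verdict})")
```

Each player comes out "worse after a hit", yet the pooled row comes out "better after a hit", which is why combining within-subject data can mislead.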

Favorite News Sources Across the Political Aisle

What are the most liberal-leaning and conservative-leaning news outlets?

Sometime last presidential election season I had this very thought. All kinds of dirt was being thrown around about both candidates; however, lots of it was coming from news sources I had never heard of. I probably still wouldn’t have if Twitter and Reddit didn’t exist, providing these outlandish stories a platform for mass exposure (and mass outrage).

So I could never really tell if what I was reading was from a legit source, completely spun, or flat-out fake. For example I would see a headline like:

FBI Arrests Hillary on Corruption Charges

Linking to a news outlet calling themselves “The Discovery Examiner Guardian”, or something. I thought, well… if The DEG is like the NYT, Hil-dawg is probably in deep shit. If the DEG is like Breitbart then I’m 99% sure the opposite is true.

So who the F are these guys? Like, in general.

So I googled: What are the most liberal-leaning and conservative-leaning news outlets? To my dismay I found nothing satisfying. No ranking lists curated by experts, no data-driven politico-meter, nothing really. Just a bunch of anecdotes from internet people complaining that so-and-so news is like totally biased.

I guess it makes sense given that any corporation attempting to appear as “the news” is trying to woo as many people as possible into believing they are thee most credible straight-shootin’, just-the-facts-you-decide, fair-and-balanced, no-underlying-agenda organization around. So, as nice as it would be, of course Fox News isn’t going to post on their homepage something like, “We are an 8/10 on the left/right political spectrum”.

So I had an idea… Reddit created this mess; let’s see if they can help fix it.

Using the Reddit API wrapper (PRAW), I wrote a Python script to identify the favorite news sources of two subreddit communities on opposite ends of the political aisle.

This bot scraped the URLs from the top daily submissions to the main pro-Trump and anti-Trump subreddit communities, essentially determining these subreddits’ favorite news outlets. Nota bene: the validity of these data as a litmus test for liberal-leaning and conservative-leaning news rests on the assumption that people generally prefer to post and upvote stories that align with and support their personal world view.
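The tallying step can be sketched roughly as follows. This is not the code from my gist, just a minimal standard-library illustration: the helper names `outlet_of` and `favorite_outlets` and the sample URLs are hypothetical. With PRAW, the URLs would come from each submission's `url` attribute; that fetching step is left out here so the sketch runs without API credentials.

```python
from collections import Counter
from urllib.parse import urlparse


def outlet_of(url):
    """Reduce a submission URL to its host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def favorite_outlets(urls, top_n=10):
    """Count how often each outlet appears and return the most common ones."""
    hosts = (outlet_of(u) for u in urls)
    counts = Counter(h for h in hosts if h)   # skip URLs with no host
    return counts.most_common(top_n)


# Hypothetical URLs standing in for one subreddit's top daily submissions.
sample_urls = [
    "https://www.example-news.com/politics/story-1",
    "https://example-news.com/politics/story-2",
    "http://another-outlet.example/2017/01/some-headline",
    "https://www.example-news.com/opinion/story-3",
]

print(favorite_outlets(sample_urls))
# [('example-news.com', 3), ('another-outlet.example', 1)]
```

Running the same tally separately on each subreddit gives the two rankings to compare. Self posts and image links point back at Reddit or image hosts, so those hosts would likely need filtering out before reading the result as a list of news outlets.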

Without further ado…

UPPER PANEL: pro-Trump subreddit The_Donald
LOWER PANEL: anti-Trump subreddit EnoughTrumpSpam

I cross-posted this project on Reddit’s Data is Beautiful, where a Googler, Felipe Hoffa, saw my post and took it to the next level. Using Data Studio he expanded my original idea to all of Reddit, and made it interactive. It’s something really worth playing around with for a few minutes. So go check it out!

You can grab my code for scraping the Reddit data from this GitHub gist (with the caveat that it was hacked together in a few minutes of spare time, so take it as a draft with lots of room for improvement ;)