
Comments (26)

albada commented on July 19, 2024

Could it be the condition that neurons must have spiked at least once where this goes wrong? So that you are still including some silent neurons in the calculation?

neworderofjamie commented on July 19, 2024

Well spotted! rates ends up with fewer than 2000 spiking neurons after calling strip_binned_spiketrains. However, I don't quite understand how this ever worked, as the paper says that the ground-state simulations were run for 10.5 seconds, and the 10.5-second spike trains in the repository don't have enough data.

neworderofjamie commented on July 19, 2024

So it's working, but not leaving 2000 neurons.

albada commented on July 19, 2024

Yes that sounds right. I agree the relevant wording in the paper is ambiguous at best...

neworderofjamie commented on July 19, 2024

So were the stats calculated using a longer simulation?

albada commented on July 19, 2024

No, I don't think so. The paper states that for chi=1, the simulation duration was 10.5 s. I think the calculation started with 2000 neurons, from which the ones that spiked at least once were taken, leaving fewer than 2000 neurons in some cases.
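
For illustration, a minimal numpy-only sketch of that subsample-then-filter step (hypothetical: the array layout with neuron id in column 0 and spike time in ms in column 1 follows the spike files used later in this thread, and contiguous ids starting at 0 are only an assumption; the actual analysis goes through correlation_toolbox helpers):

```python
import numpy as np

# Hypothetical illustration: subsample 2000 neurons, then keep only those that
# spiked at least once. Column 0 = neuron id, column 1 = spike time (ms).
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy")

rng = np.random.default_rng(0)
all_ids = np.arange(16128)                      # FEF 6E size; ids 0..N-1 assumed
subsample = rng.choice(all_ids, size=2000, replace=False)

spiking_ids = np.unique(spikes[:, 0])           # neurons with >= 1 recorded spike
kept = np.intersect1d(subsample, spiking_ids)   # can end up with fewer than 2000
print(len(kept), "of 2000 subsampled neurons spiked at least once")
```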

neworderofjamie commented on July 19, 2024

Ahh, OK. So how come our recalculation of the stats using the same code and (presumably) the same data doesn't match? I don't see any non-determinism in that code.

mschmidt87 commented on July 19, 2024

I think that @albada is right in saying that for some populations, there were probably fewer than 2000 neurons entering the calculation, simply because these populations are small in the first place and then have low rates.

For what it's worth, I wouldn't call these deviations significant. Keep in mind that these values are very low, on the order of 10^-3 to 10^-4, on a scale of 0 to 1. If your simulations do not produce exactly the same spikes, these deviations can easily occur, but they're not significant, in my opinion.

I think, unless you're using the exact same configuration for your compute system (MPI processes, threads) and the same NEST version (+ some other dependencies that influence the random numbers), it's unlikely that you can produce the same spikes.

neworderofjamie commented on July 19, 2024

Hey @mschmidt87 - thanks for looking at this. My concern is that, as a test, we're calculating these metrics from the published spiking data using the published code, and we don't get the published correlation coefficients.

mschmidt87 commented on July 19, 2024

Okay, sorry, I missed that point; I thought you had run your own simulations. Let me take a look at it tomorrow, then, and I'll get back to you.

neworderofjamie commented on July 19, 2024

Thank you so much!

mschmidt87 commented on July 19, 2024

I can reproduce your finding for FEF 6E: I am getting the same value as you for the cross-correlation coefficient. I can also see a deviation for the LvR value (0.2683 from recalculating vs. 0.4178 in the json file). However, I can reproduce the population rates from the json files (a rough cross-check is sketched at the end of this comment), which makes me conclude that the spike data is correct and that I didn't use other data for the calculation of the json files.

I couldn't find the problem in the analysis code (I tried changing the part where we subsample etc., but with no success). Since I produced the json files with the code that is in the repo and the file hasn't been modified, I suspect that this might be a version problem with the dependencies.

I am now using Python 3.8 and numpy 1.18.1 as well as the latest master of correlation_toolbox.
Unfortunately, I didn't record the exact dependencies at the time I produced the data in the data repo, so I can't investigate this in an easy manner.
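
For anyone retracing this, a rough numpy-only sketch of the population-rate cross-check (not the repository's analysis code; the array layout, the 500 ms startup transient, and the FEF 6E population size are taken from the snippet and numbers later in this thread):

```python
import numpy as np

# Rough cross-check of a population's mean firing rate against the json value.
# Assumed layout: column 0 = neuron id, column 1 = spike time in ms.
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy")

tmin, tmax = 500.0, 10500.0                    # ms, 500 ms transient discarded
mask = (spikes[:, 1] >= tmin) & (spikes[:, 1] < tmax)
n_neurons = 16128                              # FEF 6E population size
rate = mask.sum() / (n_neurons * (tmax - tmin) / 1000.0)
print("mean population rate: %.4f spikes/s" % rate)
```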

neworderofjamie commented on July 19, 2024

Glad to hear you can reproduce it. But wouldn't it be possible for different spike data to result in the same mean rates yet different correlation and irregularity? (In the extreme case, you could generate spike trains from populations of Poisson sources with the same mean rates; see the sketch after this comment.)

Nonetheless, I can try to investigate older versions today; correlation_toolbox doesn't look to have changed a huge amount, at least. The data was pushed on 8/1/2018, so, assuming you created it around then, I can try to bisect numpy versions.
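
A toy sketch of that extreme case (all numbers made up for illustration): independent Poisson spike trains with identical mean rates reproduce the rate exactly while their pairwise correlation is essentially zero, so matching rates alone says little about the second-order statistics.

```python
import numpy as np

# Toy example: independent Poisson spike trains with a fixed mean rate.
rng = np.random.default_rng(1)
rate_hz, duration_s, n_neurons, bin_ms = 5.0, 10.0, 100, 1.0

n_bins = int(duration_s * 1000.0 / bin_ms)
p_spike = rate_hz * bin_ms / 1000.0
hist = (rng.random((n_neurons, n_bins)) < p_spike).astype(float)  # 1 ms histograms

print("mean rate: %.2f spikes/s" % (hist.sum() / (n_neurons * duration_s)))
cc = np.corrcoef(hist)
off_diag = cc[~np.eye(n_neurons, dtype=bool)]
print("mean pairwise correlation: %.5f" % off_diag.mean())
```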

neworderofjamie commented on July 19, 2024

I've tried numpy 1.13.3 and 1.11.3 (which seem like good guesses for era-appropriate bleeding edge and less bleeding edge) on both Python 3 and Python 2, and also the older version of correlation_toolbox.sort_gdf_by_id (https://github.com/INM-6/correlation-toolbox/blob/f91d9f2fff2a1b27c1ace4cd771464e57c109648/correlation_toolbox/helper.py#L44-L79), as this seems to be the only change with much chance of affecting this. The results remain unchanged, so I'm a little stumped.

mschmidt87 commented on July 19, 2024

Thanks for your tests. Of course, it is possible to produce the exact same rates with different 2nd-order statistics, but to achieve that with two different runs of the same simulation (with different RNG seeds), which is what I would have suspected, is extremely unlikely, i.e. it can be excluded.

I'll think about it a little more.

neworderofjamie commented on July 19, 2024

That is very true.

neworderofjamie commented on July 19, 2024

Hey @mschmidt87, any thoughts?

neworderofjamie commented on July 19, 2024

I've done a little bit of code archeology and found a change in the LvR calculation. If I calculate the LvR with and without this change:

```python
from correlation_toolbox import helper as ch
import numpy as np

# **NOTE** this is heavily based off the analysis code from the paper
def load(filename, duration_s, num, check):
    tmin = 500.
    subsample = 2000
    resolution = 1.
    tref = 2.0

    spikes = np.load(filename)

    # calc lvr
    i_min = np.searchsorted(spikes[:, 1], tmin)
    i_max = np.searchsorted(spikes[:, 1], duration_s * 1000.0)
    LvR = np.array([])
    data_array = spikes[i_min:i_max]
    for i in np.unique(data_array[:, 0]):
        intervals = np.diff(data_array[np.where(data_array[:, 0] == i)[0], 1])
        if intervals.size > 1:
            val = np.sum((1. - 4 * intervals[0:-1] * intervals[1:] /
                          (intervals[0:-1] + intervals[1:]) ** 2) *
                         (1 + 4 * tref / (intervals[0:-1] + intervals[1:])))
            LvR = np.append(LvR, val * 3 / (intervals.size - 1.))
        else:
            LvR = np.append(LvR, 0.0)
    # CHANGE HERE
    if check and len(LvR) < num:
        LvR = np.append(LvR, np.zeros(num - len(LvR)))
    return np.mean(LvR)


# Values from the json file on gnode
dataset_fef_6e_lvr = 0.4178813296671444
dataset_fef_5i_lvr = 0.9737456740769203

duration_s = 10.5

# Population sizes
num_fef_5i = 3721
num_fef_6e = 16128

# Load data
nest_fef_5i_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-5I.npy", duration_s, num_fef_5i, False)
nest_fef_6e_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy", duration_s, num_fef_6e, False)

print("FEF 5I LvR - NEST:%f, Dataset:%f" % (nest_fef_5i_lvr, dataset_fef_5i_lvr))
print("FEF 6E LvR - NEST:%f, Dataset:%f" % (nest_fef_6e_lvr, dataset_fef_6e_lvr))
```

With check=True:

FEF 5I LvR - NEST:0.973484, Dataset:0.973746
FEF 6E LvR - NEST:0.268638, Dataset:0.417881

With check=False:

FEF 5I LvR - NEST:0.973746, Dataset:0.973746
FEF 6E LvR - NEST:0.473715, Dataset:0.417881

The latter is significantly closer to the values in the published JSON. As this change was made after the original submission date, might the published LvR data have been calculated prior to it?
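
For reference, the per-neuron quantity accumulated in the loop above is the revised local variation of Shinomoto et al. (reference [82] in the paper), with $I_i$ the inter-spike intervals, $n$ their number, and $R$ the refractoriness constant (tref, 2 ms here):

$$\mathrm{LvR} = \frac{3}{n-1} \sum_{i=1}^{n-1} \left(1 - \frac{4\, I_i I_{i+1}}{(I_i + I_{i+1})^2}\right) \left(1 + \frac{4R}{I_i + I_{i+1}}\right)$$

Neurons with fewer than two intervals in the analysis window contribute a 0 instead, and the disputed change additionally pads the result array with zeros up to num.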

mschmidt87 commented on July 19, 2024

I think you are probably right that this padding of the results with 0.0 values for silent neurons had not been used for the publication and the published data.

The manuscript says in the methods:
" Spike-train irregularity is quantified for each population by the revised local variation LvR [82] averaged over a subsample of 2000 neurons. The cross-correlation coefficient is
computed with bin width 1 ms on single-cell spike histograms of a subsample of 2000 neurons
per population with at least one emitted spike per neuron. Both measures are computed on the
entire population if it contains fewer than 2000 neurons."

This does not mention the padding, and your calculations indicate that the recalculated results are closer to the published ones with check=False (in your code). Unfortunately, I can't recall why I added the padding at some point.

I guess it's a question of definition how silent neurons should be accounted for in the calculations, but I would say now that assigning them a value of 0 does not make much sense.

I think that @akorgor applied your code to more populations and can confirm your findings.

neworderofjamie commented on July 19, 2024

Thanks for looking into this some more, @mschmidt87! It's still weird that the value for FEF 5I fits exactly but the one for FEF 6E doesn't, though; again, that seems unlikely to be caused by two different runs of the same simulation.

neworderofjamie commented on July 19, 2024

@akorgor has kindly shared the full set of spike trains with me so I thought I'd share this (very bad) plot of the distribution of LvR metrics for 6E:

[Figure_1_2: plot of the distribution of LvR values for the 6E populations]

akorgor commented on July 19, 2024

I found that if, in the code snippet above on the LvR calculation, check is chosen to be True and num is chosen to be len(np.unique(spikes[:, 0])), the results from the code and from gnode match up to a difference of ~1e-15 to 1e-16. Thus, for computing the average LvR value, only the neurons that spiked at least once during the simulation contribute a 0, not the completely silent neurons as the definition of num in the code snippet would suggest. I could verify this for the areas FEF, V1, and V2 for the populations in all layers in the chi=1.0 10.5 s simulation.
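
Concretely, with the load() helper from the snippet above, that check amounts to something like this (file name as used earlier in this thread, duration_s = 10.5 as in the paper):

```python
import numpy as np

# Pad with zeros only up to the number of neurons that spiked at least once
# anywhere in the recording, not up to the full population size.
fname = "33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy"
num_ever_spiked = len(np.unique(np.load(fname)[:, 0]))

lvr = load(fname, duration_s=10.5, num=num_ever_spiked, check=True)
print("FEF 6E LvR: %.15f" % lvr)   # reported above to match the gnode value to ~1e-15
```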

akorgor commented on July 19, 2024

For the correlation coefficient calculation, however, I have not been able to find the reason for the deviations so far. The gnode values are larger than the recalculated ones for the majority of the populations (15/24). The differences range from -0.00025 to +0.00025 and do not seem to depend on the total number of neurons in the population or on the number of spiking neurons. Except for FEF 6E, there are always more than subsample=2000 neurons left after ch.strip_binned_spiketrains(hist).
Since I found that this calculation of the correlation coefficients is in general sensitive to subsampling in terms of neurons and time, I tried setting tmin=0, tmax=max(np.unique(spikes[:, 1])), and subsample=3000, but none of these three variations yielded the gnode values. @albada @mschmidt87, I would be happy to run further tests if you have ideas about where the deviations could come from.
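
For reference, a minimal numpy-only approximation of the procedure as the methods text describes it (1 ms single-cell spike histograms, up to 2000 neurons with at least one spike, mean pairwise correlation coefficient); the repository's actual implementation goes through correlation_toolbox helpers, so this is a sketch of the algorithm, not the code whose output is being compared:

```python
import numpy as np

# Simplified sketch: bin each neuron's spikes at 1 ms, keep up to 2000 neurons
# with >= 1 spike in the window (here simply the first 2000 ids, whereas the
# real analysis subsamples), and average the off-diagonal correlation coefficients.
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy")
tmin, tmax, bin_ms = 500.0, 10500.0, 1.0       # analysis window assumed

sel = spikes[(spikes[:, 1] >= tmin) & (spikes[:, 1] < tmax)]
ids = np.unique(sel[:, 0])[:2000]
edges = np.arange(tmin, tmax + bin_ms, bin_ms)

hist = np.stack([np.histogram(sel[sel[:, 0] == i, 1], bins=edges)[0] for i in ids])
cc = np.corrcoef(hist)
off_diag = cc[~np.eye(len(ids), dtype=bool)]
print("mean pairwise correlation coefficient: %.6f" % off_diag.mean())
```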

mschmidt87 commented on July 19, 2024

> I found that if, in the code snippet above on the LvR calculation, check is chosen to be True and num is chosen to be len(np.unique(spikes[:, 0])), the results from the code and from gnode match up to a difference of ~1e-15 to 1e-16. Thus, for computing the average LvR value, only the neurons that spiked at least once during the simulation contribute a 0, not the completely silent neurons as the definition of num in the code snippet would suggest. I could verify this for the areas FEF, V1, and V2 for the populations in all layers in the chi=1.0 10.5 s simulation.

Thank you, @akorgor, for that detective work (and obviously sorry for the misleading code and the confusion it created). I think these choices are definitely debatable; however, since the deviations from the data shown in the paper are not too large, I would suggest just editing the code in the repo such that the results match, and sticking to the algorithm used for the paper.

mschmidt87 commented on July 19, 2024

> @akorgor has kindly shared the full set of spike trains with me so I thought I'd share this (very bad) plot of the distribution of LvR metrics for 6E:

What does the lower panel show? I think it can't be the values from the published paper, because the values for 6E are mostly well below 1 (Fig. 3F).

neworderofjamie commented on July 19, 2024

> > @akorgor has kindly shared the full set of spike trains with me so I thought I'd share this (very bad) plot of the distribution of LvR metrics for 6E:
>
> What does the lower panel show? I think it can't be the values from the published paper, because the values for 6E are mostly well below 1 (Fig. 3F).

The lower panel shows the values in the json file on gnode (from the simulation where chi=1.9).
