prnicovich / clusdoc

Repository associated with Clus-DoC clustering and degree of colocalization analysis

clusdoc's Introduction

ClusDoC package for co-clustering analysis for single molecule localization microscopy (SMLM) data

This software was used for the following publications:

SV Pageon, PR Nicovich, M Mollazade, T Tabarin, K Gaus. "Clus-DoC: a combined cluster detection and colocalization analysis for single-molecule localization microscopy data." Molecular Biology of the Cell 27 (22), 3627-3636.

Pageon, Sophie V., et al. "Functional role of T-cell receptor nanoclusters in signal initiation and antigen discrimination." Proceedings of the National Academy of Sciences (2016): 201607436.

Requirements

MATLAB 2014b or later
Distributed computing, Image Processing, and statistical analyses toolboxes

Compiled MEX dependencies for 64-bit Windows are included in the repository. Their source files are in the .\private\mexSource folder. To run on any other architecture, you will need to compile these functions yourself and replace the binaries in the .\private\ folder.
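As a rough sketch of that compile step, the loop below calls MATLAB's mex on every C/C++ file in a source folder and writes the binaries to an output folder. The function name compileMexSources and its dryRun flag are hypothetical, and the sketch assumes each source file is a standalone MEX function, which has not been checked against the actual contents of mexSource. A compiler must already be configured (mex -setup).

```matlab
function builtFiles = compileMexSources(srcDir, outDir, dryRun)
% Hypothetical helper: compile every .c/.cpp file in srcDir into outDir.
% ASSUMPTION: each source file builds on its own with no extra flags.
if nargin < 3
    dryRun = false;   % dryRun = true only lists the files, without compiling
end
listing = [dir(fullfile(srcDir, '*.c')); dir(fullfile(srcDir, '*.cpp'))];
builtFiles = cell(numel(listing), 1);
for k = 1:numel(listing)
    srcFile = fullfile(srcDir, listing(k).name);
    if ~dryRun
        mex('-outdir', outDir, srcFile);   % place the binary in outDir
    end
    builtFiles{k} = srcFile;
end
end
```

From the repository root, a call might look like compileMexSources(fullfile('private', 'mexSource'), 'private'). Sources that depend on each other or need extra libraries would need their own mex command lines.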

Quick start

Copy all files into the desired folder, either by downloading the package or with git clone https://github.com/PRNicovich/ClusDoC.git.
Navigate to the local cloned repository in the MATLAB file path.
Execute by calling 'ClusDoC' at command prompt.
Once GUI window opens, click on 'Select Input File(s)' button. In subsequent pop-up, select file .\Test dataset\1.txt. File .\Test dataset\coordinates.txt should also load.
Select output folder by clicking 'Set Output Path' button. The default choice of .\Test dataset\ is sufficient.
Proceed with choosing ROIs or downstream analysis.

clusdoc's Issues

Note for reduced memory usage

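I discovered that the check at the entry to DBSCANHandler.m is the primary reason why large numbers of points cause memory usage to blow up and block analysis: it calls pdist on every point, which builds all pairwise distances at once. This modification, which skips the check above 20,000 points, gets past it in most situations on my machine.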

    if size(Data, 1) <= 20000
        % pdist returns all n*(n-1)/2 pairwise distances, so memory grows quadratically with n
        distRow = pdist(Data(:,1:2));
        nPossibleClustering = sum(distRow < DBSCANParams.epsilon);
        checkClusterTest = nPossibleClustering >= DBSCANParams.minPts;
    else
        % Too many points to check with the method above, so just assume it works.
        % Confirmed that this one test is the largest barrier to memory;
        % no other method choked.
        checkClusterTest = true;
    end
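For scale: pdist on n points returns n(n-1)/2 values, so 20,000 points already means about 2 x 10^8 doubles, roughly 1.6 GB. One possible alternative to skipping the test is to count the close pairs in blocks, so that only a blockSize-by-n slab is in memory at a time. The sketch below is not part of ClusDoC; the function name countPairsWithinEpsilon and the blockSize parameter are made up for illustration, and it compares squared distances, which can differ from pdist at the last floating-point bit.

```matlab
function nPairs = countPairsWithinEpsilon(xy, epsilon, blockSize)
% Hypothetical helper: count unordered point pairs closer than epsilon
% without ever forming the full pdist vector.
if nargin < 3
    blockSize = 1000;   % rows handled per iteration (assumed default)
end
n = size(xy, 1);
nPairs = 0;
for startIdx = 1:blockSize:n
    blockIdx = (startIdx:min(startIdx + blockSize - 1, n))';
    % Coordinate differences between this block and every point
    dx = bsxfun(@minus, xy(blockIdx, 1), xy(:, 1)');
    dy = bsxfun(@minus, xy(blockIdx, 2), xy(:, 2)');
    isClose = (dx.^2 + dy.^2) < epsilon^2;
    % Keep only pairs with j > i so each pair is counted once
    isUpper = bsxfun(@lt, blockIdx, 1:n);
    nPairs = nPairs + nnz(isClose & isUpper);
end
end
```

The result would stand in for nPossibleClustering, for example countPairsWithinEpsilon(Data(:,1:2), DBSCANParams.epsilon). It trades memory for time, since all pairs are still visited.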

Note for clusters with no overlaps

Replace lines 8-20 of ExportDBSCANDataToExcelFiles.m with:

    notemptyA = ~cellfun('isempty', A); % empty array cells are not ROIs, they were unused placeholders in the matrix
    notmissingA = ~cellfun(@(x) isstructmissing(x), A); % missing cells are ROIs that had no interactions, and do not appear in the cellROIPair table; can't use ismissing directly because it fails with structs
    not_empty_or_missingA = notemptyA & notmissingA;
    
    Percent_in_Cluster_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Percent_in_Cluster, A(not_empty_or_missingA), 'UniformOutput', false));
    Number_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Number, A(not_empty_or_missingA), 'UniformOutput', false));
    Area_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Area, A(not_empty_or_missingA), 'UniformOutput', false));
    Density_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Density, A(not_empty_or_missingA), 'UniformOutput', false));
    RelativeDensity_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.RelativeDensity, A(not_empty_or_missingA), 'UniformOutput', false));
    TotalNumber(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.TotalNumber, A(not_empty_or_missingA), 'UniformOutput', false));
    Circularity_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Mean_Circularity, A(not_empty_or_missingA),'UniformOutput', false));
    Number_Cluster_column(not_empty_or_missingA) = cell2mat(cellfun(@(x) x.Number_Cluster, A(not_empty_or_missingA), 'UniformOutput', false));

and then a few lines down:

    Matrix_Result = [Percent_in_Cluster_column(notemptyA)'*100 , Number_column(notemptyA)' , Area_column(notemptyA)' , Density_column(notemptyA)'*1e6 ,...
        RelativeDensity_column(notemptyA)', TotalNumber(notemptyA)', Circularity_column(notemptyA)', Number_Cluster_column(notemptyA)', Number_Cluster_column(notemptyA)'./(1e-6*cellROIPair(:,5))];

The problem is that A contains an empty entry for each such cluster in cellROIPair, but those entries are filtered out of all the parameter vectors. This change preserves a zero-value entry for each one, so that the code can continue.

This also depends on returning a missing for varargout{4} in DBSCANHandler if checkClusterTest is false, so that unprocessed ROIs and array placeholders can be differentiated.
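The helper isstructmissing is not defined in the snippet above. A minimal sketch of what it might look like, assuming the placeholder returned by DBSCANHandler is MATLAB's built-in missing value (available in recent releases), is:

```matlab
function tf = isstructmissing(x)
% Sketch of the undefined helper: true only for the "missing" placeholder.
% ASSUMPTION: unprocessed ROIs are stored as the built-in missing value,
% processed ROIs as result structs, and unused slots as empty arrays.
tf = isa(x, 'missing');   % false for structs and [], so no error is raised
end
```

With A = {struct('Number', 3), missing, []}, cellfun(@isstructmissing, A) marks only the second cell, while cellfun('isempty', A) marks only the third, which is the distinction the masks above rely on.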

Build notes for optics_dbscan_mex_src

There are no build notes or any build_mex.m in optics_dbscan_mex_src. I'm not sure how to build this without it.

Clus-DoC with thunderSTORM tables

I'm looking to use this tool in my research, as keeping our own tools up to date, customizable, and error-free has become extremely cumbersome. However, I acquire images on a Nikon camera and process them with ThunderSTORM. My issue is that I can't run the colocalization analysis: each of my files contains a single colour, so I'd have to feed in two files to get a multi-colour image of the same cell.

Is there a way to circumvent this I am missing, or is it possible to implement this?

Potentially an absolutely wonderful tool though.

Error when zero clusters identified

When running DBSCAN for All with clustering parameters that result in 0 clustered points, I get the following error message:

Output argument "SumofContour" (and maybe others) not assigned during call to "DBSCANHandler".

Error in ClusDoC>DBSCAN_All (line 2346)
                            [~, ClusterSmoothTable{roiInc, c}, ~, classOut, ~, ~, ~, Result{roiInc, c}] = ...
 
Error while evaluating UIControl Callback.

Is it possible to output empty instead of stopping the entire batch?
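MATLAB raises this "not assigned" error whenever a function returns along a path that never sets a requested output. A common way around it is to give every output an empty default before any early exit. The sketch below shows the pattern on a made-up function; handleRoiSketch and its arguments are hypothetical and do not reproduce DBSCANHandler's real signature or logic.

```matlab
function [clusterTable, sumOfContour] = handleRoiSketch(points, minPts)
% Hypothetical stand-in for a handler with several outputs.
% Every output gets an empty default first, so an early return is safe.
clusterTable = [];
sumOfContour = [];
if size(points, 1) < minPts
    return   % nothing to cluster: the caller receives empties, not an error
end
% Placeholder "results", only to make the sketch runnable
clusterTable = points;
sumOfContour = size(points, 1);
end
```

The calling loop can then test isempty on the outputs and move on to the next ROI instead of aborting the batch.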

Swapped columns (H, I) in Excel export

My colleague (@BioTurboNick) and I found that columns H and I in the Excel export are swapped. The problem code is the two items marked with #### below, from EvalStatisticsOnDBSCANandDoCResults.m:

Matrix_Result=[DensityDofC...
                    Density2...
                    AreaDofC...
                    Area2...
                    CircularityDofC...
                    Circularity2, ...
                    cell2mat(MeanNumMolsPerColocCluster(:)), ...
                    cell2mat(NumColocClustersPerROI(:)), ... ####
                    cell2mat(MeanNumMolsPerNonColocCluster(:)), ... ####
                    cell2mat(NumNonColocClustersPerROI(:))];

Lr threshold calculation may be in error; though impact may be minimal

I may have discovered an error in the Lr calculation and threshold. The result is that all localizations with any neighbors are deemed above the threshold, and only those with no neighbors are deemed below the threshold.

I discovered it as I'm rewriting ClusDoC in Julia, and was struggling with understanding the Lr function.

I don't know what the full effect is yet, I'm still investigating. It may just mean more noise ends up in the final numbers than intended. It may also end up having no real effect (see last sentence).

In brief, I believe what the Lr function is intended to do is to calculate the radius at which one would expect to find the number of neighbors actually observed around a localization, given an even distribution across the ROI.
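To make that reading concrete: with N localizations spread evenly over an ROI of area A, the expected number of neighbors within radius r is (N/A)πr², so observing n neighbors corresponds to r = sqrt(n·A / (N·π)). The sketch below only illustrates this interpretation. It is not the ClusDoC implementation, and the function name and example numbers are made up.

```matlab
function r = equivalentUniformRadius(nNeighbors, nTotal, roiArea)
% Radius at which an even distribution of nTotal points over roiArea
% would be expected to contain nNeighbors neighbors.
% ASSUMPTION: roiArea is an area (e.g. nm^2), not the side of a square.
density = nTotal ./ roiArea;               % points per unit area
r = sqrt(nNeighbors ./ (pi .* density));   % solve density*pi*r^2 = nNeighbors
end
```

For example, 10 observed neighbors among 1000 localizations in a 2000 x 2000 ROI gives equivalentUniformRadius(10, 1000, 2000^2), about 112.8 in the same length units as the ROI side.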

However, the current implementation of the Lr function includes an extra SizeROI factor, which I think inflates the calculated value of Lr. (was SizeROI at one point expected to be the side of a square instead of an area?)

The threshold as currently implemented is also calculating the number of neighbors expected within Lr_radius. However, there are traces of an old definition which says the threshold calculation was essentially equal to Lr_radius, which is exactly what I would expect the threshold to be based on the above understanding of what Lr is calculating: a radius.

Thresholding on the number of neighbors within Lr_radius does make sense if you don't need something robust to different populations of localizations. It seems that latter part was the intention, but it's only used for all-vs-all within the same ROI, so such normalization may not be necessary.

And it may well be that in most cases, localizations are sparse enough that the result of a corrected calculation would end up being the same. I'm seeing Lr_threshold values of <1 in some test datasets, which would suggest that's true in large, sparse ROIs. Very dense ROIs are more likely to be affected.

Some questions

I am a PhD student from Hainan University, China. We recently developed software for processing SMLM data and wrote an article for submission, and we used the test data (1.txt) to test our software. First, I would like to ask which two biomolecules were imaged by SMLM to obtain this two-color data. Second, we will mention the use of that data in our article, so how would you like us to credit your help (acknowledgement or authorship)?
Best wishes.

Error in bulk processing of ClusDoC

When processing data in bulk with the full ClusDoC tool, I get the following message after the program finishes DBSCAN on the first channel and is supposed to move on to the next:

DoC exited with errors
Matrix index is out of range for deletion.

Error in ExportDBSCANDataToExcelFiles (line 8)
    cellROIPair(cellfun('isempty', A), :) = []; % filter out empty ones

Error in DBSCANonDoCResults (line 139)
        ExportDBSCANDataToExcelFiles(cellROIPair, ResultCell, strcat(Path_name, '\DBSCAN Results'), Ch);

Error in ClusDoC>DoC_All (line 2515)
            [ClusterTableCh1, ClusterTableCh2, clusterIDOut, handles.ClusterTable] =
            DBSCANonDoCResults(handles.CellData, handles.ROICoordinates, ...
 
Error while evaluating UIControl Callback.