Comments (4)

SeppeDeWinter commented on August 29, 2024

Hi @wgao688

You can still find the tutorial here: https://github.com/aertslab/pycisTopic/blob/old/notebooks/Toy_melanoma-RTD.ipynb

We have run pycisTopic successfully with >100k cells, and we are also improving the topic modeling step: https://github.com/aertslab/pycisTopic/tree/polars_1xx

Besides the number of cells, topic modeling also scales with the number of regions. How many regions are you using? This number can be very large, especially when you use the tile matrix from snapATAC2 (if possible, I would suggest using a count matrix based on consensus peaks, either from snapATAC2 or pycisTopic).
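
A quick way to check the matrix dimensions without loading the whole matrix is to read the size line of a Matrix Market file. This is only a minimal sketch, assuming the count matrix was exported in coordinate format to an `.mtx` file. The helper name `read_mtx_shape` and the toy file are hypothetical, and which axis holds the regions depends on how the matrix was written.

```python
import os
import tempfile


def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_nonzero) from a Matrix Market coordinate file."""
    with open(path, "rt") as fh:
        banner = fh.readline()
        if not banner.startswith("%%MatrixMarket"):
            raise ValueError(f"{path} does not look like a Matrix Market file")
        for line in fh:
            # Comment lines start with '%'; the first other non-blank line
            # is the size line: rows, columns, stored entries.
            if line.startswith("%") or not line.strip():
                continue
            n_rows, n_cols, n_nonzero = (int(x) for x in line.split())
            return n_rows, n_cols, n_nonzero
    raise ValueError(f"{path} has no size line")


# Toy file standing in for a real export: 3 rows, 2 columns, 4 stored entries.
toy = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "% toy example\n"
    "3 2 4\n"
    "1 1 1\n"
    "2 1 1\n"
    "3 2 1\n"
    "1 2 1\n"
)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.mtx")
    with open(toy_path, "wt") as fh:
        fh.write(toy)
    shape = read_mtx_shape(toy_path)

print(shape)
```

Only the header is read, so this stays cheap even for very large matrices.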

All the best,

Seppe

from scenicplus.

ghuls commented on August 29, 2024

With the Polars 1xx branch it is now possible to make a Mallet corpus file from a binary count matrix file in Matrix Market format:

    Expose creation of Mallet corpus file from pycistopic CLI interface:
    
        pycistopic topic_modeling create_mallet_corpus
    
    Usage:
    
      Create binary accessibility matrix in Matrix Market format:
    
        import pycisTopic.fragments
        import scipy.io
    
        counts_fragments_matrix, cbs, region_ids = pycisTopic.fragments.create_fragment_matrix_from_fragments(
            "fragments.tsv.gz",
            "consensus_regions.bed",
            "cbs.tsv"
        )
    
        # Create binary matrix:
        binary_matrix = counts_fragments_matrix.copy()
        binary_matrix.data.fill(1)
    
        # Write binary matrix in Matrix Market format.
        scipy.io.mmwrite("binary_accessibility.mtx", binary_matrix)
    
      Create Mallet corpus file from binary accessibility matrix in Matrix Market format:
    
        $ pycistopic topic_modeling create_mallet_corpus -i "binary_accessibility.mtx" -o "corpus.mallet"
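
To see what the binarization step does before running it on real data, here is a small self-contained sketch on a toy matrix. The values and the 3 x 4 shape are made up for illustration, and only `scipy.sparse`, `scipy.io.mmwrite` and `scipy.io.mmread` are used; nothing here calls pycisTopic.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Toy count matrix (hypothetical values), 3 rows x 4 columns.
counts = scipy.sparse.csr_matrix(
    np.array([[0, 2, 0, 5], [1, 0, 0, 0], [0, 0, 3, 1]])
)

# Binarize: every stored count becomes 1, zeros stay implicit.
# Note: explicitly stored zeros would also become 1, so call
# counts.eliminate_zeros() first if the matrix may contain them.
binary = counts.copy()
binary.data.fill(1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "binary_accessibility.mtx")
    scipy.io.mmwrite(path, binary)
    # Read the file back and convert to CSR to check the round trip.
    reloaded = scipy.io.mmread(path).tocsr()

print(reloaded.toarray())
```

Because `copy()` is used, the original counts are left untouched and can still be used elsewhere.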

aertslab/pycisTopic@2d54473

wgao688 commented on August 29, 2024

Thanks Seppe and Gert, I will try the code above. I am working with about 300K peaks for 300,000 cells. Should I subset to the most highly variable peaks (e.g., 50K, 100K)?
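
To make the subsetting idea concrete, this is roughly what I have in mind. It is only a sketch of one possible approach, not something taken from pycisTopic: it assumes a binary regions x cells sparse matrix, ranks regions by the variance of a 0/1 variable, p * (1 - p), and keeps the top N. The function name `top_variable_regions` and the toy data are hypothetical.

```python
import numpy as np
import scipy.sparse


def top_variable_regions(binary, n_top):
    """Return sorted row indices of the n_top regions with the highest variance.

    Assumes `binary` is a regions x cells sparse matrix holding only 0/1 values.
    """
    n_cells = binary.shape[1]
    # Fraction of cells in which each region is accessible.
    p = np.asarray(binary.sum(axis=1)).ravel() / n_cells
    # Variance of a 0/1 variable with mean p.
    variance = p * (1.0 - p)
    # Stable sort so that ties keep their original order.
    order = np.argsort(-variance, kind="stable")
    return np.sort(order[:n_top])


# Toy data: 4 regions x 4 cells.
binary = scipy.sparse.csr_matrix(
    np.array([[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]])
)
keep = top_variable_regions(binary, n_top=2)
subset = binary[keep, :]

print(keep, subset.shape)
```

Regions that are open in every cell or in no cell get a variance of 0 and are dropped first.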

I also wish to handle batch effects due to individual donor and technology (multiome vs. ATAC only). I see from the SCENIC+ paper that you used harmonypy on the scaled cell–topic matrix for the mouse cerebellum dataset. I also saw in a previous GitHub question that you typically do not perform batch correction (#134). Is there anything specific that you recommend for batch correction?

wgao688 commented on August 29, 2024

@ghuls I am trying to run pycistopic topic_modeling create_mallet_corpus -i "binary_accessibility.mtx" -o "corpus.mallet" but I am not seeing the create_mallet_corpus option available. Is there something wrong with my download of the polars_1xx branch?

    git clone --branch polars_1xx --single-branch https://github.com/aertslab/pycisTopic.git
    cd pycisTopic/
    pip install -e .

Thanks
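
One thing that may help narrow this down is checking which copy of the package Python actually picks up, since an older install in the same environment could shadow the editable one. A minimal sketch using only the standard library; the distribution name `pycisTopic` and the module name `pycisTopic` are assumptions, and the helper names are hypothetical.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name):
    """Return name, version and console scripts of a distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    scripts = sorted(
        ep.name for ep in dist.entry_points if ep.group == "console_scripts"
    )
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        "console_scripts": scripts,
    }


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Both names below are assumptions about how the package is registered.
print(describe_install("pycisTopic"))
print(module_origin("pycisTopic"))
```

If the printed path does not point into the cloned `pycisTopic/` directory, the command is probably coming from a different install.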
