
itfsc's Introduction

iTFSC

Integrated transcription factor analysis for single-cell data (iTFSC) is an R package that consolidates transcription factor (TF) information across multiple tools to arrive at a more robust list of TFs. The package also supports downstream analysis and visualization, including differential expression analysis. The package will be validated on datasets from four different cancer types.

Requirements

To run this package, you will need to install the following dependencies (please use R version 4.1.1):

  • library(Seurat)
  • library(SeuratDisk)
  • library(SCENIC)
  • library(BITFAM)
  • library(dorothea)
  • library(piano)
  • library(ggplot2)
  • library(dplyr)
  • library(tidyr)
  • library(AUCell)
  • library(RcisTarget)
  • library(GENIE3)
  • library(base)
  • library(tibble)
  • library(ComplexHeatmap)
  • library(ggVennDiagram)
  • library(reshape2)
  • library(ggpubr)

If you have any issues installing SCENIC, please see its installation guide: http://htmlpreview.github.io/?https://github.com/aertslab/SCENIC/blob/master/inst/doc/SCENIC_Setup.html

Project details:

Description: develop an integrated transcription factor analysis tool for single-cell and bulk data. The tool will combine 4-5 existing transcription factor inference tools for single-cell data (e.g., SCENIC, DoRothEA, BITFAM) to give users the transcription factor probabilities generated by a combined analysis. One way to select the best transcription factors is simply to extract the TFs reported most often across the tools. Another is to use differential expression to identify the best transcription factors across different cell types and find common or high-probability ones. I am doing something similar for my research project, and I have always thought it would be helpful to have a package or tool that does this for me. The main idea is to ensure that the transcription factors we obtain are the ones actually involved, which we will check through reproducibility across tools and through further downstream analyses.
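The simplest consolidation strategy described above, keeping the TFs reported by the most tools, can be sketched in base R as a vote count. This is a minimal sketch, not the package's implementation: the per-tool TF lists and gene names are hypothetical placeholders for TFs extracted from each tool's output, and `tf_support()` and `consensus_tfs()` are illustrative helper names.

```r
# Hypothetical per-tool TF lists (illustrative only). In practice each vector
# would hold the TFs extracted from one tool's output (e.g. SCENIC regulons).
tf_lists <- list(
  SCENIC   = c("STAT3", "MYC", "FOXA1", "GATA3"),
  DoRothEA = c("STAT3", "MYC", "TP53", "GATA3"),
  BITFAM   = c("STAT3", "FOXA1", "GATA3", "E2F1")
)

# Number of tools reporting each TF. unique() ensures a TF listed twice by
# the same tool is only counted once.
tf_support <- function(tf_lists) {
  counts <- table(unlist(lapply(tf_lists, unique)))
  setNames(as.integer(counts), names(counts))
}

# TFs reported by at least `min_tools` tools (default: all of them).
consensus_tfs <- function(tf_lists, min_tools = length(tf_lists)) {
  support <- tf_support(tf_lists)
  names(support)[support >= min_tools]
}

consensus_tfs(tf_lists)                 # TFs shared by all three tools
consensus_tfs(tf_lists, min_tools = 2)  # majority vote across tools
```

Lowering `min_tools` trades robustness for sensitivity; the per-TF support counts are also the numbers a Venn diagram of shared TFs would summarize.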

Features: the tool will include the following features:

  • integrated and fast TF analysis using 4-5 existing tools
  • extract common TFs generated from all tools
  • differential expression analysis between cell types, using limma on the outputs of the different tools
  • GSEA on the results from differential expression analysis
  • (if time) apply the tools to bulk data (provided there are at least 100 patients)
  • (if time) use the transcription factors for the deconvolution of bulk data
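The differential analysis feature operates on a TF-by-cell activity matrix produced by one of the tools. The sketch below shows the shape of that step using a per-TF Welch t-test with Benjamini-Hochberg correction from base R. This is a stand-in for limma's moderated t-test (the method the feature list names), chosen so the example runs without extra packages; the activity values, cell-type labels and `diff_tf_activity()` are hypothetical.

```r
# Toy TF activity matrix (hypothetical): rows = TFs, columns = cells.
set.seed(1)
activity <- matrix(rnorm(4 * 20), nrow = 4,
                   dimnames = list(c("STAT3", "MYC", "FOXA1", "GATA3"),
                                   paste0("cell", 1:20)))
cell_type <- factor(rep(c("tumor", "immune"), each = 10))
# Make GATA3 more active in tumor cells so there is a signal to find.
activity["GATA3", cell_type == "tumor"] <- activity["GATA3", cell_type == "tumor"] + 3

# Compare group g1 vs g2 for every TF; returns one row per TF,
# sorted by BH-adjusted p-value.
diff_tf_activity <- function(activity, groups, g1, g2) {
  in1 <- groups == g1
  in2 <- groups == g2
  res <- t(apply(activity, 1, function(x) {
    tt <- t.test(x[in1], x[in2])  # Welch two-sample t-test (R's default)
    c(mean_diff = mean(x[in1]) - mean(x[in2]), p = tt$p.value)
  }))
  res <- as.data.frame(res)
  res$adj_p <- p.adjust(res$p, method = "BH")
  res[order(res$adj_p), ]
}

diff_tf_activity(activity, cell_type, "tumor", "immune")
```

Swapping the t-test for limma's moderated statistics would let the variance estimate borrow information across TFs, which helps when the cell groups are small.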

Example data

The RDS files for these datasets can be downloaded here: https://drive.google.com/drive/folders/1WL0TxDAQpPGzmGy8gltT-x-ezSw6Ndh1?ths=true

How the user will run the example data:

Expected results:

  • a robust list of TFs generated using three methods for extracting transcription factor activity scores from single-cell data
  • a Venn diagram showing how many TFs are shared across methods
  • heatmaps, as depicted in the workflow image below, showing the output of each individual method

Workflow

itfsc's People

Contributors

mahnoorngondal, mebonill

itfsc's Issues

Add parallel processing

design document update

Is your feature request related to a problem? Please describe.
Update the design document.

Describe the solution you'd like
Make a BioRender figure.

Describe alternatives you've considered
Use a .drawio diagram instead.

Additional context
None

Monica - Feedback

WBS Document:
Delegates tasks in a temporal manner with actionable items. The tasks are split into activities that are feasible and accomplishable.
Makes use of multiple packages (SCENIC, R Shiny, BITFAM, DoRothEA)
Details the output of each analysis with user-friendliness in mind
Data visualizations are also taken into account
Comment: Workflow is clear and concise, activities are split into manageable tasks.

Materials:
Working with scRNA-seq data, specifically 4 datasets (breast cancer, colon cancer, lung cancer, ovarian cancer).
Three different software requirement specification (SRS) documents. SRS doc 3 is the most detailed, introducing the software package and how a user would use it.

Comments:
Very well thought out project, WBS document and SDS documents are informative and clear.

I recommend making a single document listing the necessary inputs and outputs for all the different packages you will run. This will enable the user to identify the data types needed to use your package and see which outputs they will be comparing. This is a small detail, but a one-page takeaway might help users feel prepared to begin the analysis if they can check what they need.
Progress has been made on subsetting the data, writing a function and testing that it works.
Are there transcription factors that are known to be dysregulated across all cancers (cancer-agnostic), and some that are specific to each cancer you are investigating? It is very cool to test the accuracy of the different transcription factor inference programs and converge on the output that is shared across programs (computational). From a biology perspective, are you able to identify a transcription factor network that is common to all 4 cancer datasets you are using (a cancer transcription factor network signature)? Potentially look into ChIP-seq data as a validation step for your findings?

Code and Test File:
R code: reading in the data, normalizing the data, and an initial run of BITFAM
Test code: checks that the raw scRNA-seq counts are present and that the data were normalized
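The two checks described for the test file can be sketched as plain functions on a genes-by-cells matrix. This assumes the data were normalized with the standard log-normalization used by Seurat's `LogNormalize` method (each count divided by the cell's total count, multiplied by a scale factor of 10,000, then `log1p`); the helper names and the toy matrix are hypothetical.

```r
# Raw counts should be non-negative whole numbers.
is_raw_counts <- function(m) {
  is.numeric(m) && all(m >= 0) && all(m == round(m))
}

# Log-normalization: log1p(count / cell total * scale factor), per cell (column).
log_normalize <- function(counts, scale_factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# TRUE if `normalized` matches log-normalizing `counts` (within all.equal tolerance).
check_normalized <- function(counts, normalized, scale_factor = 1e4) {
  isTRUE(all.equal(normalized, log_normalize(counts, scale_factor)))
}

# Toy genes x cells matrix (hypothetical values).
counts <- matrix(c(0, 3, 5, 2, 0, 1), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cell1", "cell2")))
normalized <- log_normalize(counts)

stopifnot(is_raw_counts(counts), !is_raw_counts(normalized))
stopifnot(check_normalized(counts, normalized), !check_normalized(counts, counts))
```

In a real test, the raw and normalized matrices would be taken from the Seurat object; for small test data, converting sparse matrices with `as.matrix()` first keeps these base R helpers simple.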

  • The goals of this project can be achieved and progress has been made :)

cell quality check unit test

Cell quality control could be added to the unit test:
Cell quality control can be done using metrics such as the number of reads per cell, the number of genes detected per cell, and the percentage of reads mapping to mitochondrial genes. Its completion can be checked by setting a reasonable range for each of these metrics and then verifying that all retained cells fall within those ranges.
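A minimal sketch of that unit test, assuming the per-cell metrics are available as a data frame (in a Seurat object they typically correspond to the `nCount_RNA` and `nFeature_RNA` metadata columns plus a mitochondrial percentage column). The column names, thresholds and `passes_qc()` helper are hypothetical; the ranges are illustrative placeholders, not recommendations, since suitable values depend on the dataset and tissue.

```r
# Illustrative QC ranges (hypothetical values; tune per dataset).
qc_thresholds <- list(
  n_counts   = c(min = 500, max = 50000),
  n_genes    = c(min = 200, max = 6000),
  percent_mt = c(min = 0,   max = 20)
)

# TRUE for each cell (row) whose metrics all fall inside their ranges.
passes_qc <- function(metrics, thresholds) {
  ok <- rep(TRUE, nrow(metrics))
  for (m in names(thresholds)) {
    rng <- thresholds[[m]]
    ok <- ok & metrics[[m]] >= rng[["min"]] & metrics[[m]] <= rng[["max"]]
  }
  ok
}

# Toy per-cell metrics (hypothetical values).
metrics <- data.frame(
  n_counts   = c(3000, 120, 8000),
  n_genes    = c(1500,  90, 2500),
  percent_mt = c(4.5,  2.0, 35.0),
  row.names  = c("cell1", "cell2", "cell3")
)

keep <- passes_qc(metrics, qc_thresholds)
filtered <- metrics[keep, ]

# Unit-test style check: every retained cell satisfies all QC ranges.
stopifnot(all(passes_qc(filtered, qc_thresholds)))
```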

SCENIC implementation

I need assistance implementing SCENIC in my analysis.
SCENIC will be the last method I implement in my code, and I need help writing the code to run SCENIC on my dataset.
