snakemake-crispr-guides's People

Contributors

github-actions[bot], m-jahn

snakemake-crispr-guides's Issues

Test behavior with alternative options

  • different Cas enzymes
  • different scores: test how the predictions change
  • include warnings when a selected score does not suit the application (e.g. a Cas9-derived score combined with a Cas12 enzyme)
  • do online tools find similar guides?

avoid overlap between TSS window (5' UTR) and 3' terminus of upstream gene

  • right now a fixed window of e.g. 500 nt upstream and downstream of the TSS is used
  • this window sometimes overlaps with the upstream gene, so guides that fall into more than one TSS window are removed
  • ideally, the overlap is reduced by shrinking the allowed 5' UTR so that it ends at the terminus of the upstream gene
  • ... and the downstream TSS window is limited to the length of the gene if the gene is shorter than the window
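
The proposed clipping can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function name `clip_tss_window` and its parameters are hypothetical, and the gap to the upstream gene is assumed to be precomputed from the annotation.

```python
def clip_tss_window(tss, strand, gene_length, gap_to_upstream_gene=None, window=500):
    """Return (start, end) genomic coordinates of the guide search window.

    gap_to_upstream_gene: number of nt between the TSS and the terminus of
    the neighbouring gene on the upstream side (None if there is no neighbour).
    All names here are hypothetical and only illustrate the clipping rule.
    """
    # Shrink the upstream part so it stops at the neighbouring gene.
    if gap_to_upstream_gene is None:
        upstream = window
    else:
        upstream = min(window, max(gap_to_upstream_gene, 0))
    # Limit the downstream part to the gene length if the gene is short.
    downstream = min(window, gene_length)
    if strand == "+":
        return tss - upstream, tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return tss - downstream, tss + upstream


# A 300 nt gene on the plus strand whose upstream neighbour ends 120 nt before the TSS
print(clip_tss_window(1000, "+", 300, gap_to_upstream_gene=120))  # (880, 1300)
```

With this rule, two neighbouring genes no longer share window positions on the upstream side, so guides would not need to be dropped for falling into more than one TSS window.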

change license to MIT

  • change license to MIT
  • add explicit statement that third party tools have their own license terms

improve flexibility of handling `gff` input files

  • currently the names of chromosomes are hardcoded to NC.XXXX.[12], the typical RefSeq accession pattern
  • however, it should be possible to add custom chromosomes or plasmids with arbitrary names
  • BSgenome generation fails when the ID attribute is missing in gff files
  • the ID could be added automatically from locus_tag
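
One way to add the missing ID is to copy the locus_tag value into the attribute column before the gff file is passed on. The sketch below is only an assumption about how this could be done; `ensure_gff_id` is a hypothetical helper and not part of the pipeline.

```python
def ensure_gff_id(line):
    """Add an ID attribute derived from locus_tag to a GFF3 line if ID is missing."""
    # Leave comments, directives and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a regular feature line
    attrs = [a for a in fields[8].split(";") if a]
    keys = {a.split("=", 1)[0] for a in attrs}
    if "ID" not in keys:
        for attr in attrs:
            key, _, value = attr.partition("=")
            if key == "locus_tag":
                attrs.insert(0, f"ID={value}")
                break
    fields[8] = ";".join(attrs)
    return "\t".join(fields) + "\n"


record = "plasmid1\tcustom\tgene\t1\t300\t.\t+\t.\tlocus_tag=ABC_0001;Name=foo\n"
print(ensure_gff_id(record), end="")
```

Note that a gene and its CDS often share the same locus_tag, so a real implementation would probably need to make the generated IDs unique, for example by prefixing the feature type.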

Additional types of CDS annotations

When running the pipeline for Synechococcus PCC 11901, the number of CDS considered by the pipeline was lower than the number of CDS annotated in the genome.

This was due to some CDS annotations using GeneMarkS-2+ rather than RefSeq or protein homology. By including GeneMarkS-2+ in the gff_source_type dictionary in the get_genome script, I was able to include the previously missing CDS.
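
To find out which annotation sources are being skipped for a given genome, the CDS entries can be counted per source (column 2 of the gff file). The sketch below assumes a mapping from source to accepted feature types; the actual structure of gff_source_type in the get_genome script may differ, and `skipped_features` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical stand-ins for the gff_source_type setting, before and after the fix.
ALLOWED_BEFORE = {"RefSeq": {"gene", "CDS"}, "Protein Homology": {"CDS"}}
ALLOWED_AFTER = {**ALLOWED_BEFORE, "GeneMarkS-2+": {"CDS"}}


def skipped_features(gff_lines, allowed, feature="CDS"):
    """Count features of the given type whose source is not in the allowed mapping."""
    skipped = Counter()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        source = fields[1]
        if feature not in allowed.get(source, set()):
            skipped[source] += 1
    return skipped


example = [
    "NZ_X\tProtein Homology\tCDS\t1\t90\t.\t+\t0\tID=c1\n",
    "NZ_X\tGeneMarkS-2+\tCDS\t100\t190\t.\t+\t0\tID=c2\n",
]
print(skipped_features(example, ALLOWED_BEFORE))  # Counter({'GeneMarkS-2+': 1})
```

Running such a check on a new assembly would list any further source labels that the configuration does not yet cover.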

Are there any other methods of annotating CDS when assembling genomes that should be included here?

Add functionality to deal with guides targeting multiple loci (e.g. duplicated genes)

  • right now all "off-targets", meaning all guides that target more than one locus, are removed
  • this is not always wanted: duplicated genes can be targeted at multiple sites with the same guide
  • need to add functionality that lets users choose whether to filter out such "good" off-targets or not
  • multi-target guides should be flagged in the output tables and in the report
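
The requested behavior could look roughly like the sketch below: guides are grouped by their target loci, flagged when they hit more than one, and only removed if the user asks for it. The function `flag_multi_target` and the option `keep_multi_target` are hypothetical names used for illustration.

```python
from collections import defaultdict


def flag_multi_target(hits, keep_multi_target=True):
    """Group (guide, locus) hits and flag guides that target more than one locus."""
    loci = defaultdict(set)
    for guide, locus in hits:
        loci[guide].add(locus)
    table = []
    for guide, targets in loci.items():
        multi = len(targets) > 1
        if multi and not keep_multi_target:
            continue  # old behavior: drop every multi-target guide
        table.append(
            {"guide": guide, "targets": sorted(targets), "multi_target": multi}
        )
    return table


hits = [("AAA", "geneA"), ("AAA", "geneA_copy"), ("CCC", "geneB")]
for row in flag_multi_target(hits):
    print(row)
```

This sketch does not distinguish wanted multi-target guides (duplicated genes) from unwanted off-targets at unrelated loci; that decision would need additional information such as sequence similarity between the targeted genes.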

When using assembly "GCF_005577135.1", the species name is not parsed correctly, which causes failures in create_bsgenome.R and design_guides.R

When using "GCF_005577135.1", the species name is parsed as "Picosynechococcus sp. PCC 11901 chromosome". A valid organism name would be "Synechococcus sp. PCC 11901" or "Picosynechococcus sp. PCC 11901" (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2579791).
This causes the following call in create_bsgenome.R

txdb <- quiet_txdb( file = genome_gff, organism = genome_name, chrominfo = seqinfo_genome )$result

to break, which can be fixed by hardcoding the taxonomyId:

txdb <- quiet_txdb( file = genome_gff, organism = genome_name, chrominfo = seqinfo_genome, taxonomyId = 2579791 )$result

The same happens in design_guides.R in:

txdb <- makeTxDbFromGFF( file = genome_gff, organism = unname(genome(seqinfo_genome)[1]), chrominfo = seqinfo_genome )
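
Instead of hardcoding the taxonomyId, the trailing sequence descriptor could be stripped from the parsed name before it is handed to the R scripts. The sketch below is an assumption about where the extra "chromosome" comes from (the sequence description line); `clean_organism_name` is a hypothetical helper, and the list of suffixes is illustrative rather than exhaustive.

```python
import re

# Descriptors that commonly follow the organism name in sequence descriptions.
_SUFFIX = re.compile(
    r",?\s+(chromosome|plasmid|complete genome|complete sequence)\b.*$",
    re.IGNORECASE,
)


def clean_organism_name(description):
    """Strip trailing sequence descriptors such as 'chromosome' from a name."""
    return _SUFFIX.sub("", description).strip()


print(clean_organism_name("Picosynechococcus sp. PCC 11901 chromosome"))
# Picosynechococcus sp. PCC 11901
```

The same rule could be applied with a regular expression in R; whether the cleaned name is then accepted by the taxonomy lookup would still need to be tested.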
