pimms's Issues

:bug: windows-latest is Windows Server, not Windows desktop

Installing all NAGuideR dependencies fails if Bioconductor images are used, because the GitHub Actions runner windows-latest is Windows Server, which seems to be compatible with noarch Linux packages.

It will probably be good to support NAGuideR through a separate environment linked to the Snakemake rule, and to provide instructions on how to install the dependencies on Windows.

imputation strategies

To mask the missing inputs (which are to be recovered), one needs to set a value.

Possible options

  • [ ] lower detection limit
  • [ ] standardize data and impute the mean (feature-wise or sample-wise?)
  • [ ] specific token (representing "non-detected"). The lower detection limit in the data is a numeric "non-detected" token. One could try to find a learned representation for this (e.g. check BERT)
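
A minimal sketch of the first two options, assuming intensities are held as a list of rows (samples) with `None` marking a missing value. The function names and the toy matrix are made up for illustration, and the lowest observed value stands in for the lower detection limit; nothing here is taken from the project code.

```python
from statistics import mean, stdev


def fill_with_detection_limit(matrix):
    """Replace missing values (None) with the lowest observed value."""
    observed = [v for row in matrix for v in row if v is not None]
    lower_limit = min(observed)
    return [[lower_limit if v is None else v for v in row] for row in matrix]


def standardize_and_fill_mean(matrix):
    """Standardize each feature (column) on its observed values.

    Missing entries are set to 0.0, which is the feature mean after
    standardization.
    """
    result = [list(row) for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        mu = mean(observed)
        # fall back to 1.0 if there is a single value or no variance
        sd = (stdev(observed) if len(observed) > 1 else 0.0) or 1.0
        for i, row in enumerate(matrix):
            result[i][j] = 0.0 if row[j] is None else (row[j] - mu) / sd
    return result


# toy data: 3 samples x 3 features (hypothetical values)
intensities = [
    [25.0, None, 30.0],
    [27.0, 22.0, None],
    [26.0, 24.0, 28.0],
]
filled_limit = fill_with_detection_limit(intensities)
filled_mean = standardize_and_fill_mean(intensities)
```

Sample-wise standardization would be the same loop over rows instead of columns.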

build docs

  • structure and commit docs folder
  • create API docs with sphinx for first set of objects defined

Experiment02

  • consolidate experiment 02 notebooks
    • move bullet points with tasks to issues
    • make model specific issues with ideas for evaluating them
    • make it work as a script with parameters
    • put more code into configuration files
    • add some more tests for important classes/functions
  • Select training data for performance comparisons
    • Entire data PCA, t-SNE, UMAP -> find a cluster of samples
    • fix data, describe it, order it (500-1000 samples for training, validation and testing data)
  • Compare different setups and performances
    • On fixed data, create performance difference plots between models, etc
  • denoising evaluation

Application ideas

On the gene level, the model could be used for the following:

improved batch of samples

  • each condition is dealt with separately

fill in missing for ML follow-up task

  • replace current proteome imputation by vaep-model based imputation

project config

  • try to find examples for placing project configs
  • will have to be moved to a src folder potentially or read in as a file in src.__init__.py?

3_select_data.ipynb

Focus: peptides.txt

  • change data loading to new format (less memory needed with folder based structure, gene focused)
    • read non-filtered peptide dumps (
  • transfer code to library
  • get tensorboard in notebook running

FASTA file analysis

Does every gene have unique peptides?

  • add analysis to 01_FASTA_tryptic_digest.ipynb on how many genes have no unique peptide associated with them.
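
One way this analysis could look, as a sketch only: digest every sequence in silico, map each peptide to the genes it occurs in, and report genes for which no peptide maps to exactly one gene. The cleavage rule (after K or R, not before P), the function names, and the toy gene-to-sequence mapping are assumptions for illustration, not the notebook's implementation.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, missed_cleavages=2, min_length=6):
    """Return the set of tryptic peptides of a protein sequence.

    Simplified rule: cleave after K or R unless the next residue is P.
    """
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


def genes_without_unique_peptides(gene_sequences, **digest_kwargs):
    """List genes whose peptides are all shared with at least one other gene."""
    peptide_to_genes = defaultdict(set)
    for gene, sequences in gene_sequences.items():
        for sequence in sequences:
            for peptide in tryptic_digest(sequence, **digest_kwargs):
                peptide_to_genes[peptide].add(gene)
    genes_with_unique = {
        next(iter(genes))
        for genes in peptide_to_genes.values()
        if len(genes) == 1
    }
    return sorted(set(gene_sequences) - genes_with_unique)


# hypothetical genes with made-up sequences
gene_sequences = {
    "GENE_A": ["AAAAAAKCCCCCCK"],
    "GENE_B": ["AAAAAAK"],  # its only peptide also occurs in GENE_A
    "GENE_C": ["DDDDDDR"],
}
no_unique = genes_without_unique_peptides(gene_sequences)
```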

How many duplicated entries are in fasta files?

I3L1U9 and I3L3I0 have identical AA sequences

  • run analysis per gene for proteins of equal length
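
Counting duplicated entries could be sketched as below: group FASTA headers by their exact amino acid sequence and keep the groups with more than one header. The parser is deliberately minimal, and the headers and sequences in the example are placeholders, not real UniProt entries.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicated_sequences(text):
    """Map each sequence that occurs more than once to its headers."""
    headers_by_sequence = defaultdict(list)
    for header, sequence in parse_fasta(text):
        headers_by_sequence[sequence].append(header)
    return {s: h for s, h in headers_by_sequence.items() if len(h) > 1}


# placeholder entries; PROT_1 and PROT_3 share the same sequence
fasta_text = """>PROT_1
MKTAYIAK
QR
>PROT_2
MKTAYIAKQRW
>PROT_3
MKTAYIAKQR
"""
duplicates = duplicated_sequences(fasta_text)
```

Restricting the comparison to proteins of equal length within one gene would only need an extra grouping key in front of the sequence.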

Aggregating peptides

# up to two missed cleavage sites.
peptides = ("ILTERGYSFTTTAEREIVR",
                 "GYSFTTTAEREIVRDIK",
                           "EIVRDIKEK",
                               "DIKEKLCYVALDFEQEMATAASSSSLEK")
  • match consecutive sequences (the order is known and leads to consecutive overlaps)
  • aggregate peptides in case of overlap with peptides resulting from no miscleavage (having a min length of 6), otherwise keep them?
  • distinguish observed vs non-observed for aggregation (only consider peptides with evidence)
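
A possible sketch of the matching step, using the four peptides listed above: since the order is known, each peptide is merged into the previous one when a suffix of the running sequence equals a prefix of the next peptide. The function names and the `min_overlap` parameter are assumptions; the rule about non-miscleaved peptides of minimum length 6 and the observed vs. non-observed distinction are not implemented here.

```python
def overlap_length(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def aggregate_consecutive(peptides, min_overlap=1):
    """Merge ordered peptides that overlap with their predecessor."""
    aggregated = []
    for peptide in peptides:
        if aggregated:
            size = overlap_length(aggregated[-1], peptide)
            if size >= min_overlap:
                # extend the running sequence by the non-overlapping tail
                aggregated[-1] += peptide[size:]
                continue
        aggregated.append(peptide)
    return aggregated


peptides = (
    "ILTERGYSFTTTAEREIVR",
    "GYSFTTTAEREIVRDIK",
    "EIVRDIKEK",
    "DIKEKLCYVALDFEQEMATAASSSSLEK",
)
merged = aggregate_consecutive(peptides)
```

For these four peptides every neighbour overlaps, so `merged` holds a single sequence. Peptides without an overlap to their predecessor start a new entry.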

Next Steps

  • Create index by date

  • Cluster HeLa cell lines into two groups to see if the change of biological sample of the HeLa cell line is matched.

  • Download HeLa samples for comparison from PRIDE

  • copy files to /tmp/ on Computerome1

  • UniProt search space of possible peptide sequences (using defined constraints)

  • get latest MaxQuant parameter file for v1.6.1.*

  • fasta-trypsin-digest.ipynb (Johannes Müller)

  • gene-name (and more) lookup using knowledge graph package

  • Blast tool (to see how unique peptides match to the genome)

MQ files

  • use previous notebook (to process MQ-output) to analyze an entire MQ-OUTPUT folder
  • provide a set of peptides and check for different additional information in this specific MQ-OUTPUT ("Retention Time", group of proteins

contaminants

Idea: Replace MQ list of internal contaminants by explicit list of contaminants

In order to reduce the dependency on the internal contaminant list of a specific tool (or MQ version), specify the list of contaminants explicitly.
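
As a rough illustration of the idea, the explicit list could be a plain set of accessions that protein groups are checked against. The accessions below are hypothetical, and the semicolon-separated group format is an assumption about how the protein IDs are stored.

```python
def is_contaminant_group(protein_ids, contaminants):
    """True if any member of a semicolon-separated group is a listed contaminant."""
    return any(pid.strip() in contaminants for pid in protein_ids.split(";"))


# hypothetical explicit contaminant list, e.g. read from a versioned text file
contaminants = {"CONTAM_1", "CONTAM_2"}

protein_groups = ["PROT_A;PROT_B", "CONTAM_1;PROT_C", "PROT_D"]
kept = [g for g in protein_groups if not is_contaminant_group(g, contaminants)]
```

Keeping the list in a file under version control would make the filtering independent of the tool version used for the search.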
