
biotea's Introduction


BioTEA, where TEA is short for Transcript Enrichment Analysis, is a pipeline for differential gene expression analysis (DEA) of microarray and RNA-seq data. It can download data, preprocess it and run DEAs quickly, easily and reproducibly from the command line.

Check out the BioTEA Docker container, where the analysis code of BioTEA lives!

Read the publication:

Visentin, L.; Scarpellino, G.; Chinigò, G.; Munaron, L.; Ruffinatti, F.A. BioTEA: Containerized Methods of Analysis for Microarray-Based Transcriptomics Data. Biology 2022, 11, 1346.

Installation

IMPORTANT: BioTEA works on UNIX-like systems. To run it on Windows, use the Windows Subsystem for Linux (WSL).

  1. Install Docker. The exact process depends on your operating system and package manager.
  2. Install Python version 3.9 or later. Again, this depends on your package manager:
    • On Ubuntu, run apt update && apt install python3.9. Depending on when you read this guide, you may need to add the deadsnakes PPA; read its documentation for more information. Just be sure that python --version reports 3.9 or higher before you continue with the following steps.
    • On Arch Linux, run pacman -Syu python.
    • On macOS, follow the installation guide in the Python docs.
  3. Optional but strongly recommended: Make a Python virtual environment to use bioTEA in. You can search online for a way to do this in your OS.
  4. Install bioTEA with pip: pip install biotea.

IMPORTANT: Sometimes, critical bugs are fixed on the main branch but have not yet been released to PyPI. To get the development version of BioTEA, install it with pip install 'biotea @ git+https://github.com/CMA-Lab/bioTEA.git#subdirectory=src/bioTea'. If you run into problems, try this bleeding-edge version: your issue might already be fixed!

If installed correctly, biotea info should give some information on the tool.
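As a rough aid, the prerequisites listed above can be checked before running biotea info. The following is only a minimal sketch: the function name check_environment is hypothetical and not part of bioTEA, and it assumes that the docker and biotea executables are expected on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum Python version stated in the installation steps


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper, not part of bioTEA. The arguments exist only so the
    checks can be exercised without touching the real system.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.9 or later is required, found {found}")
    # Assumption: both tools are reachable through PATH.
    for tool in ("docker", "biotea"):
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result only means the executables were found; it does not prove that the Docker daemon is running.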

Usage

The publication provides an overview of the tool and its usage. It is a good place to start. For more information on the various commands, read the wiki.

If you run into problems using the BioTEA CLI, read the FAQ page on the wiki. If you still cannot solve the issue, file a bug report describing your problem in as much detail as you can, including the versions of bioTEA, the Python interpreter, the Docker engine and your OS.

If you think that the issue comes from the Docker container (i.e. the container is launched correctly but the analysis fails), you can create an issue in the bioTEA-box repository.

Contributing

To learn how you can contribute to the tool, read the CONTRIBUTING guide.

Version compatibility

The BioTEA CLI generally gets more frequent updates than the BioTEA box, so their versions drift apart. We strive to keep the BioTEA CLI and the box compatible when their major versions are identical. This means that any BioTEA CLI of version x.y.z can run any BioTEA box of version x.*.*.
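The major-version rule can be written as a small check. This is an illustrative sketch only: is_compatible and major_version are hypothetical names, not functions exposed by bioTEA, and the code assumes plain x.y.z version strings.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of an 'x.y.z' version string."""
    return int(version.strip().split(".")[0])


def is_compatible(cli_version: str, box_version: str) -> bool:
    """Apply the rule stated above: identical major versions are compatible."""
    return major_version(cli_version) == major_version(box_version)
```

For example, is_compatible("1.1.0", "1.0.4") is true because both versions share major version 1, while a 2.x CLI paired with a 1.x box would be rejected.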


biotea's Issues

Migrate away from LegacyVersion

Describe the bug
All commands crash due to LegacyVersion being removed in the packaging package. Biotea version 1.0.3.

To Reproduce
Steps to reproduce the behavior:

  1. Install biotea
  2. Run any command

Passing "0" as "--plot-number" causes an internal invalid argument error

Error below:

2022-05-16 15:10:22,175 [ERROR] bioTea.docker_wrapper: Invalid arguments passed to interface. Please open an issue with the bioTEA logs.
Traceback (most recent call last):
  File "/home/hedmad/Files/Repos/Edmund/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 258, in parse_arguments
    self.possible_args[key](value)
  File "/home/hedmad/Files/Repos/Edmund/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 197, in __call__
    raise ValueError(f"Argument check failed. Invalid argument {argument}")
ValueError: Argument check failed. Invalid argument 10000000000.0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hedmad/Files/Repos/Edmund/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 416, in run_biotea_box
    parsed_args = interface.parse_arguments(**arguments)
  File "/home/hedmad/Files/Repos/Edmund/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 261, in parse_arguments
    raise ValueError(f"Argument check failed for key {key}: {value}")
ValueError: Argument check failed for key n_plots: 10000000000.0

The type checker is probably faulty. This is an old bug I thought was fixed, regarding the difference between float and int in the interface.
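To illustrate the float-versus-int hypothesis, here is a minimal sketch of a check that accepts integer-valued floats such as 10000000000.0. It is not the code in docker_wrapper.py; the function name as_int is hypothetical.

```python
def as_int(value):
    """Coerce an int or an integer-valued float to int, rejecting anything else."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool):
        raise ValueError(f"Expected an integer, got {value!r}")
    if isinstance(value, int):
        return value
    # float.is_integer() is False for inf and nan, so those are rejected below.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f"Expected an integer, got {value!r}")
```

A checker along these lines would treat 10000000000.0 and 10000000000 as the same value, which is the distinction the traceback above seems to trip on.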

[BUG] BioTEA prepare cannot process (some) Agilent arrays

Describe the bug
Many Agilent arrays fail to be processed by BioTEA prepare. Some examples include:

  • GSE102238
  • GSE91035
  • GSE71729
  • GSE40098

To Reproduce
Steps to reproduce the behavior:

  1. Download the data of the above GEO datasets;
  2. Try and run BioTEA prepare against the data;
  3. BioTEA fails just after "reading input files..."

Desktop:

  • OS: Arch Linux
  • BioTEA Version (Run biotea info biotea): 1.1.0
  • Docker engine version (Run docker --version): N/A
  • BioTEA container version (if applicable): 1.0.4

[BUG] Sometimes, downloading packages from Bioconductor fails.

Describe the bug
Sometimes, the download of packages from Bioconductor (e.g. in the prep modules) fails, and the container crashes.

To Reproduce
Steps to reproduce the behavior:

  1. Install biotea
  2. Run the analysis presented in the paper
  3. When downloading pd.hugene.1.0.st.v1, the download sometimes cannot complete, and the resulting timeout causes a crash.

Expected behavior
Expected to download the package as normal.

Desktop:

  • OS: Arch Linux
  • BioTEA Version (Run biotea info biotea): 1.0.0
  • Docker engine version (Run docker --version): 20.10.17, build 100c70180f
  • BioTEA container version (if applicable): latest (1.0.1)
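One common mitigation for flaky downloads is to retry with an increasing delay. The sketch below shows the pattern in Python purely as an illustration: the download itself happens inside the container, so an actual fix would likely live there. The retry helper and its parameters are hypothetical.

```python
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Waits base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep argument is injectable so the behaviour can be tested without actually waiting.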

[BUG] Setting the batches variable causes an Invalid Argument error

Describe the bug
On a new analysis, setting the experimental_design > batches variable to anything other than None causes a ValueError:

Traceback (most recent call last):
  File "/home/hedmad/Files/panc_dec22/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 292, in parse_arguments
    self.possible_args[key](value)
  File "/home/hedmad/Files/panc_dec22/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 231, in __call__
    raise ValueError(f"Argument check failed. Invalid argument {argument}")
ValueError: Argument check failed. Invalid argument one, two, one, two

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hedmad/Files/panc_dec22/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 450, in run_biotea_box
    parsed_args = interface.parse_arguments(**arguments)
  File "/home/hedmad/Files/panc_dec22/env/lib/python3.10/site-packages/bioTea/docker_wrapper.py", line 295, in parse_arguments
    raise ValueError(f"Argument check failed for key {key}: {value}")
ValueError: Argument check failed for key batches: one, two, one, two

To Reproduce
Steps to reproduce the behavior:

  1. Create a new folder, with a virtual environment.
  2. Install bioTea
  3. Create a mock dataset with 4 random columns of data + a probe_id column.
  4. Create a new option file with biotea initialize
  5. Set the batches variable
  6. Run the analysis with biotea analyze

Expected behavior
The batches should be parsed as normal.

Desktop:

  • OS: Windows Subsystem for Linux running Arch
  • BioTEA Version: 1.0.2
  • Docker engine version: 20.10.21
  • BioTEA container version: 1.0.2
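For reference, the expected behaviour could look like the sketch below, which splits a comma-separated string such as "one, two, one, two" into one label per sample. This is an assumption about the intended format based on the error message above, not bioTEA's actual parser; parse_batches and n_samples are hypothetical names.

```python
def parse_batches(raw, n_samples):
    """Split a comma-separated batch string into one label per sample.

    Returns None when no batches are set. Raises ValueError on empty labels
    or when the number of labels does not match the number of samples.
    """
    if raw is None:
        return None
    labels = [item.strip() for item in raw.split(",")]
    if any(not label for label in labels):
        raise ValueError(f"Empty batch label in {raw!r}")
    if len(labels) != n_samples:
        raise ValueError(f"Expected {n_samples} batch labels, got {len(labels)}")
    return labels
```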

Make error messages prettier

This is part of a larger issue with R error messages, but a good start would be to go through the code, catch errors, and report them with clearer messages.

Generate local database files to use as sources of annotations

The problem: The internal database file is just a .csv file generated manually through scripting. Database packages were downloaded on-the-fly, their annotations dumped, merged and cleaned. However, there is currently no defined way to generate additional or updated files. Additionally, the internal database file is only useful for human chips.

Proposed Solution: A biotea annotations generate command that is used to generate these files. Additionally, biotea annotations apply should be updated to allow local files to act as annotation sources.
The new command should be executed by the box, as the annotation sources need to be manipulated in R (they are bioconductor packages).

Note: The way I did it was with R to dump the information, and Python to clean it up (as it is just much, much faster). Just to say that we might need further processing with Python after R. We therefore have to decide whether to do it all in the box, or R in the box and Python outside.

The plot-number option could have a better default value

The default value for the plot-number option in various modules is something like 10000000.

This was a bad patch to work around the fact that the default value has to be the same as the input value (there is no "or" check in the biotea CLI parsers). This should be addressed by adding something like an Inf string, or even false.
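A minimal sketch of the Inf-string idea follows. It is one possible shape for such a parser, not the current biotea CLI behaviour; parse_plot_number is a hypothetical name.

```python
import math


def parse_plot_number(raw: str):
    """Parse a plot-number option: a non-negative integer, or 'Inf' for no limit."""
    text = raw.strip()
    if text.lower() == "inf":
        # math.inf compares greater than any finite number of plots.
        return math.inf
    number = int(text)  # raises ValueError on non-numeric input
    if number < 0:
        raise ValueError(f"plot-number must not be negative, got {number}")
    return number
```

With a sentinel like this, the default can simply be "Inf" instead of an arbitrary large number, and "0" remains a valid way to request no plots.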
