pygeneplexus's Introduction

PyGenePlexus

A Python package of the GenePlexus analysis pipeline.

Quick start

Installation

Install the GenePlexus package via pip.

pip install geneplexus

Run the GenePlexus pipeline

Example script

See example/example_run.py for example usage of the API.

Command-line interface

geneplexus --input_file example/input_genes.txt --output_dir example_result
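
The same invocation can be scripted from Python. The following is a minimal sketch, not part of the package: build_geneplexus_command and run_geneplexus are hypothetical helpers that only assemble flags listed in the CLI help and hand them to subprocess, assuming the geneplexus executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_geneplexus_command(
    input_file: str,
    output_dir: str,
    network: Optional[str] = None,
    feature: Optional[str] = None,
    gsc: Optional[str] = None,
    overwrite: bool = False,
) -> List[str]:
    """Assemble an argv list for the geneplexus CLI (hypothetical helper)."""
    cmd = ["geneplexus", "--input_file", input_file, "--output_dir", output_dir]
    # Pass only the options that were set explicitly, so the CLI defaults
    # apply to everything else.
    for flag, value in (("--network", network), ("--feature", feature), ("--gsc", gsc)):
        if value is not None:
            cmd += [flag, value]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_geneplexus(cmd: List[str]) -> int:
    """Run the command and return its exit code (needs geneplexus installed)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without
    # the package installed.
    print(" ".join(build_geneplexus_command("example/input_genes.txt", "example_result")))
```

Passing the command as a list avoids shell quoting issues with file names that contain spaces.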

Full CLI options (view them with geneplexus --help)

Run the GenePlexus pipeline on an input gene list.

optional arguments:
  -h, --help            show this help message and exit
  -i , --input_file     Input gene list (.txt) file (one gene per line). (default: None)
  -d , --gene_list_delimiter
                        Delimiter used in the gene list. Use 'newline' if the genes are separated
                        by new lines, and use 'tab' if the genes are separated by tabs. Other
                        generic separators are also supported, e.g. ', '. (default: newline)
  -n , --network        Network to use. {format_choices(config.ALL_NETWORKS)} (default: STRING)
  -f , --feature        Types of feature to use. The choices are: {Adjacency, Embedding,
                        Influence} (default: Embedding)
  -g , --gsc            Geneset collection used to generate negatives and the model similarities.
                        The choices are: {GO, DisGeNet} (default: GO)
  -s , --small_edgelist_num_nodes
                        Number of nodes in the small edgelist. (default: 50)
  -dd , --data_dir      Directory in which the data are stored, if set to None, then use the
                        default data directory ~/.data/geneplexus (default: None)
  -od , --output_dir    Output directory with respect to the repo root directory. (default:
                        result/)
  -l , --log_level      Logging level. The choices are: {CRITICAL, ERROR, WARNING, INFO, DEBUG}
                        (default: INFO)
  -q, --quiet           Suppress log messages (same as setting log_level to CRITICAL). (default:
                        False)
  -z, --zip-output      If set, then compress the output directory into a Zip file. (default:
                        False)
  --clear-data          Clear data directory and exit. (default: False)
  --overwrite           Overwrite existing result directory if set. (default: False)
  --skip-mdl-sim        Skip model similarity computation. This computation is not yet available
                        when using custom networks due to the lack of pretrained models for
                        comparison. (default: False)
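
To illustrate what the --gene_list_delimiter option describes, here is a minimal sketch of splitting a gene list by 'newline', 'tab', or a literal separator. parse_gene_list is a hypothetical function written for this example; it is not the package's own parsing code and the real behavior may differ in details such as whitespace handling.

```python
from typing import List


def parse_gene_list(text: str, delimiter: str = "newline") -> List[str]:
    """Split raw gene list text by a delimiter keyword or a literal separator."""
    if delimiter == "newline":
        parts = text.splitlines()
    elif delimiter == "tab":
        parts = text.split("\t")
    else:
        # Any other value is treated as a literal separator, e.g. ", ".
        parts = text.split(delimiter)
    # Strip surrounding whitespace and drop empty entries (e.g. a trailing newline).
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(parse_gene_list("BRCA1, TP53, EGFR", delimiter=", "))
```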

Dev

Installation

Install the PyGenePlexus package in editable mode with dev dependencies

pip install -e ."[dev]"

Testing

Run the default test suite

pytest test/

By default, test data is cached, so data redownload is not tested after the first test run. To force a redownload, specify the --cache-clear option:

pytest test/ --cache-clear

Building Documentation

  1. Install doc dependencies: pip install -r docs/requirements.txt
  2. Build:
     cd docs
     make html
  3. Open doc: open build/html/index.html

pygeneplexus's People

Contributors

christophermancuso, dependabot[bot], pre-commit-ci[bot], remylau

pygeneplexus's Issues

add output for CV results

Write small text files giving the values for each CV fold whenever CV could be run. If it could not, maybe return a message saying that CV couldn't be completed.
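
One possible shape for this output, as a rough sketch only: write_cv_results, the file name cv_results.txt, and the tab-separated layout are all assumptions made for illustration and do not reflect an existing function in the package.

```python
import os
import tempfile
from typing import Optional, Sequence


def write_cv_results(output_dir: str, fold_scores: Optional[Sequence[float]]) -> str:
    """Write one line per CV fold, or a short message if CV was not run."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "cv_results.txt")  # hypothetical file name
    with open(path, "w") as f:
        if fold_scores:
            for i, score in enumerate(fold_scores, start=1):
                f.write(f"fold_{i}\t{score:.4f}\n")
        else:
            # No scores available, e.g. too few positive genes to split into folds.
            f.write("CV could not be completed\n")
    return path


if __name__ == "__main__":
    out = write_cv_results(tempfile.mkdtemp(), [0.5, 0.75, 1.0])
    print(open(out).read())
```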

S403

Security issue with pickle

Warn results overwriting

Potential options

  • Prompt for acknowledgment (but probably not good for use cases such as job submissions)
  • CLI option for always overwriting
  • Create new result dir, append some suffix, warn to log about saving to a slightly different dir
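
The second and third options can be combined. The sketch below is only one way to do it: resolve_output_dir and the numeric _1, _2 suffix scheme are assumptions for illustration, not the behavior the package ended up with.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def resolve_output_dir(path: str, overwrite: bool = False) -> str:
    """Pick the directory to write results to without silently clobbering."""
    if overwrite or not os.path.exists(path):
        return path
    base = path.rstrip("/\\")
    suffix = 1
    # Find the first unused name of the form <base>_1, <base>_2, ...
    while os.path.exists(f"{base}_{suffix}"):
        suffix += 1
    new_path = f"{base}_{suffix}"
    logger.warning("%s already exists; saving results to %s instead", path, new_path)
    return new_path


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    existing = tempfile.mkdtemp()
    print(resolve_output_dir(existing))
```

Because nothing is prompted interactively, this approach also works for batch job submissions.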

N803

Use lowercase argument name, probably won't fix

Deploy to PyPI

After the first release is ready, create a release v1.0.0 and then deploy to PyPI

python3 setup.py -q sdist bdist_wheel 
python3 -m pip install twine
twine upload --skip-existing dist/*

S301

Security issue with pickle

Change default data root dir to .geneplexusdata

Nice lib to use: https://pypi.org/project/pystow/

Great suggestion from Pat:

Set default data folder in user's home dir? At one point, I'd cd'd into the result folder, then ran the program again and was surprised to see the downloads starting over. I'd forgotten I was in a different folder. If the input data will never change across any run of PyGenePlexus, then will a user ever want to specify a different data directory? That is, could you create a data directory in the user's home folder that would be re-used for all runs of geneplexus? Other software uses dot-files to store configuration or data like this, e.g. .ipython or .java or .keras. Could the data directory by default be $HOME/.geneplexusdata and the user would rarely specify it? I think dot-directories for configuration would also work in Windows (but would have to check). To see existing dot-directories on your computer (or HPCC) use ls -ald $HOME/.*

Another possibility is to set an env variable like $GENEPLEXUSDATA (this is a lot of work for you but would mean the user would not have to specify the data dir every time, or could run the program in different places (e.g. home, shared, etc.) without specifying a data dir).
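
A lookup order that covers both suggestions could look like the following sketch. resolve_data_dir is a hypothetical helper, and the precedence (explicit argument, then $GENEPLEXUSDATA, then $HOME/.geneplexusdata) is an assumption based on the proposal above rather than the package's actual behavior.

```python
import os
from typing import Mapping, Optional


def resolve_data_dir(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the data directory: explicit value, then env var, then dot-dir."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    # GENEPLEXUSDATA is the variable name floated in the issue, not a
    # documented setting.
    if env.get("GENEPLEXUSDATA"):
        return env["GENEPLEXUSDATA"]
    return os.path.join(os.path.expanduser("~"), ".geneplexusdata")


if __name__ == "__main__":
    print(resolve_data_dir())
```

With this order, the same data directory is reused no matter which folder the program is started from.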

N802

Use lowercase function name, probably won't fix

warning for downloading data too much

Possibly add a warning or note that this is a research project, so users who need the data in many places should copy it locally instead of downloading it again. Could add this to

  • repo README
  • jupyter notebook tutorial
  • message when package is installed or a certain function is run

CLI option setting up custom

Currently, if a user wants to use GenePlexus with their custom network or gsc, they will need to set up the required custom files using the geneplexus.custom module first, before they can proceed to run the GenePlexus pipeline using the CLI.

The goal here is to make a CLI option that calls the necessary geneplexus.custom functions to set up custom files, and thus eliminates the need for one to manually prepare them.

Working notes

  • --custom option -> enables preprocessing custom network/gsc data
    • Preprocessing run log also saved to ${data_dir}/custom_logs/${net}_${feature}_${gsc}.log
      • Network stats: num_nodes, num_edges
      • GSC stats: num_genesets, med_size, avg_size, std_size, max_size, min_size
  • Required files
    • Edgelist_xxx.edg (custom network)
    • GSCOriginal_xxx.json (custom gsc)
  • Set up custom network and gsc
    • custom.edgelist_to_node -> NodeOrder_${net}.txt
    • custom.edgelist_to_matrix -> Data_${feature}_${net}.npy
    • custom.subset_gsc_to_network -> GSC_${gsc}_${net}_GoodSets.json, GSC_${gsc}_${net}_universe.txt

speed test Azure

make a new branch and speed test downloading all the data from Azure

Allow user to add their own network or GSC

  • function that takes edge list to numpy array and node order text file
  • add original GSCs to downloadable files
  • function that takes a network and original GSC and return a subset GSC specific to the network
  • add info on exact filename and data format conventions for user added files
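
The first and third items can be sketched in plain Python. The function names below are hypothetical stand-ins and are not the geneplexus.custom API. The sketch assumes an undirected, weighted edge list and uses nested lists where a real implementation would likely use a numpy array. The min_size filter is likewise an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]


def node_order_from_edges(edges: Sequence[Edge]) -> List[str]:
    """Sorted list of the unique node IDs found in the edge list."""
    return sorted({node for u, v, _ in edges for node in (u, v)})


def adjacency_from_edges(edges: Sequence[Edge], node_order: Sequence[str]) -> List[List[float]]:
    """Dense symmetric adjacency matrix whose rows follow node_order."""
    index = {node: i for i, node in enumerate(node_order)}
    n = len(node_order)
    mat = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        i, j = index[u], index[v]
        mat[i][j] = w
        mat[j][i] = w  # undirected network assumed
    return mat


def subset_gsc(
    gsc: Dict[str, List[str]], node_order: Sequence[str], min_size: int = 1
) -> Dict[str, List[str]]:
    """Keep only genes present in the network; drop sets that become too small."""
    nodes = set(node_order)
    subset = {}
    for name, genes in gsc.items():
        kept = [g for g in genes if g in nodes]
        if len(kept) >= min_size:
            subset[name] = kept
    return subset


if __name__ == "__main__":
    edges = [("B", "A", 1.0), ("A", "C", 0.5)]
    order = node_order_from_edges(edges)
    print(order)
    print(adjacency_from_edges(edges, order))
    print(subset_gsc({"s1": ["A", "X"], "s2": ["Y"]}, order))
```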
