grimmlab / permgwas


Efficient Permutation-based GWAS for Normal and Skewed Phenotypic Distributions

Home Page: https://doi.org/10.1093/bioinformatics/btac455

License: MIT License

Dockerfile 0.33% Python 99.67%

gwas linear-mixed-models permutations westfall-young skewed-data gpu-acceleration

permgwas's Introduction

permGWAS2

This is an improved version of permGWAS. The original version can be found at permGWAS Version1.

permGWAS2 is an open-source software tool written in Python to efficiently perform genome-wide association studies (GWAS) with permutation-based thresholds. It uses a batch-wise Linear Mixed Model to compute several univariate tests simultaneously. permGWAS2 supports multiple CPUs as well as GPUs.

In contrast to the original version, permGWAS2 allows for two different permutation strategies:

x (default): permute the fixed effects matrix including covariates and the SNP of interest (equivalent to permuting y and the covariance matrix)

y: permute only the phenotype vector (same method as in the original permGWAS)
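
The equivalence stated for the x strategy can be checked numerically on a toy example. The sketch below is not permGWAS2's batch-wise implementation; it assumes a plain generalized least squares fit with a fixed covariance matrix, and the names `gls_beta`, `snp`, `V`, and `perm` are hypothetical.

```python
import numpy as np

def gls_beta(X, y, V):
    """Generalized least squares estimate of the fixed effects."""
    V_inv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

rng = np.random.default_rng(0)
n = 8
snp = np.array([0, 1, 2, 0, 1, 2, 1, 0], dtype=float)  # toy additive genotypes
X = np.column_stack([np.ones(n), snp])   # intercept + SNP of interest
y = rng.normal(size=n)                   # toy phenotype
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)              # toy positive definite covariance

perm = rng.permutation(n)
inverse = np.argsort(perm)               # inverse permutation

# Strategy "x": permute the rows of the fixed effects matrix.
beta_x = gls_beta(X[perm], y, V)

# Same fit written as a permutation of y and of the covariance matrix.
beta_equiv = gls_beta(X, y[inverse], V[np.ix_(inverse, inverse)])

# Strategy "y": permute only the phenotype vector; X and V stay fixed.
beta_y = gls_beta(X, y[perm], V)

print(np.allclose(beta_x, beta_equiv))
```

In this toy setting `beta_x` and `beta_equiv` agree up to floating-point error, whereas `beta_y` comes from a different null model because the covariance structure is left unpermuted.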

Details on the architecture of permGWAS and permGWAS2, benchmarking results, and permutation-based thresholds can be found in our publications.

How to run permGWAS2

Publications & Citation

John, M., Korte, A. & Grimm, D. G. (2023). permGWAS2: Enhanced and Accelerated Permutation-based Genome-wide Association Studies. bioRxiv.

DOI: https://doi.org/10.1101/2023.11.28.569016

John, M., Ankenbrand, M. J., Artmann, C., Freudenthal, J. A., Korte, A. & Grimm, D. G. (2022). Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions. Bioinformatics.

DOI: https://doi.org/10.1093/bioinformatics/btac455


permgwas's Issues

Handling of missing genotype data

Hi.

I would like to confirm whether permGWAS handles missing genotype data.

When I provide a PLINK .bed file with missing genotypes I get this error:

Traceback (most recent call last):
  File "permGWAS.py", line 88, in <module>
    X, y, K, covs, positions, chrom, X_index = prep.load_and_prepare_data(args)
  File "/permGWAS/preprocessing/prepare_data.py", line 75, in load_and_prepare_data
    X = load_files.load_genotype_matrix(arguments.x, sample_index=sample_index[1])
  File "/permGWAS/preprocessing/load_files.py", line 140, in load_genotype_matrix
    raise Exception('Genotype not in additive encoding.')
Exception: Genotype not in additive encoding.

Thanks,
Yaniv
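
One plausible reading of the traceback above is that missing calls cause the matrix to fail an additive-encoding check. The sketch below is not permGWAS code; it assumes that the genotypes are already loaded into a NumPy array with missing calls stored as NaN and that additive encoding means the values 0, 1, and 2. The helpers `summarize_encoding` and `impute_with_rounded_mean` are hypothetical.

```python
import numpy as np

def summarize_encoding(X):
    """Count missing calls and test whether observed values are 0, 1 or 2."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed = X[~missing]
    is_additive = bool(np.isin(observed, (0.0, 1.0, 2.0)).all())
    return int(missing.sum()), is_additive

def impute_with_rounded_mean(X):
    """Replace each missing call by the rounded mean of its SNP column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-SNP mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.round(col_means[cols])
    return X

# Toy matrix: 3 samples x 2 SNPs with one missing call.
genotypes = np.array([[0.0, np.nan],
                      [2.0, 1.0],
                      [1.0, 1.0]])
n_missing, additive = summarize_encoding(genotypes)
imputed = impute_with_rounded_mean(genotypes)
print(n_missing, additive)
```

Rounded-mean imputation is only one simple option, chosen here for illustration; whether it suits a given data set is a separate question.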

nvidia/cuda version in Dockerfile

permGWAS/config/Dockerfile

Line 1 in 3a55ddf

FROM nvidia/cuda:11.1-base-ubuntu20.04

Shouldn't this be FROM nvidia/cuda:11.1.1-base-ubuntu20.04?

Sample ids of covariates and phenotype do not match.

Hi. I am trying to run your software on my dataset. My genotype data is a binary PLINK file. Without any covariates, the software runs just fine. As soon as I add covariates, I get the error message in the title. I have checked extensively for mismatches between the covariate and phenotype files and found none. I even used R to arrange the IDs so that they appear in the same order in both files, but this did not change anything. I also tried converting the PLINK file to H5 (in case the double ID columns were causing the issue), but that didn't work either. I do not know what else to try, so I am copying the head of both files below. Any help would be appreciated.

I would also like to ask how I should code missing data (empty, NA, etc.). Also, if my phenotype is binary, how should it be coded? PLINK asks for 2 for cases and 1 for controls. Does your software accept that?

PHENO:
FID,aod_FR
10E1,1
10E3,1
10E4,1
10E6,1
10N1,1
10N3,1
10N3b,1
10S1,1
10S6,1

COV:
FID,RainfallAve_1km_91to20,Elevation_1haAve,Ca_Mg,SO2_SO4_1,NO2_NO3__1,DDEG,DBH
10E1,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,78.7
10E3,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,49.8
10E4,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,52.1
10E6,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,68.1
10N1,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,71
10N3,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,59
10N3b,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,59
10S1,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,88.8
10S6,761.0971351,86.05000114,4.5,2.5,9.2,378.7459472,65
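
A mismatch of this kind can be narrowed down outside permGWAS by comparing the ID columns as plain strings. The sketch below is a hypothetical diagnostic, not part of the tool. It assumes comma-separated files whose first column holds the sample ID, as in the excerpts above, and `load_ids` and `compare_ids` are illustrative names.

```python
import io
import pandas as pd

def load_ids(source):
    """Read the first column as plain strings, without numeric inference."""
    ids = pd.read_csv(source, dtype=str).iloc[:, 0]
    return ids.dropna().str.strip()

def compare_ids(pheno_source, cov_source):
    """Report IDs found in only one file and IDs that occur more than once."""
    pheno_ids = load_ids(pheno_source)
    cov_ids = load_ids(cov_source)
    return {
        "only_in_pheno": sorted(set(pheno_ids) - set(cov_ids)),
        "only_in_cov": sorted(set(cov_ids) - set(pheno_ids)),
        "duplicated_in_pheno": sorted(pheno_ids[pheno_ids.duplicated()].unique()),
        "duplicated_in_cov": sorted(cov_ids[cov_ids.duplicated()].unique()),
    }

# Toy excerpts; note the trailing space after the first covariate ID.
pheno_csv = "FID,aod_FR\n10E1,1\n10E3,1\n10N1,1\n"
cov_csv = "FID,DBH\n10E1 ,78.7\n10E3,49.8\n10N1,71\n"
report = compare_ids(io.StringIO(pheno_csv), io.StringIO(cov_csv))
print(report)
```

Reading the IDs with `dtype=str` matters for labels such as 10E1, which Python's `float` would otherwise accept as scientific notation (`float("10E1")` is 100.0). Whether this plays a role in the error above is an open question.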

Usage with Singularity/Docker without root/sudo permission?

Hi there, I managed to analyse the test data on my local machine using Docker and Singularity.

With Singularity, unless I used sudo to run the container, I did not have the permissions needed to write the results files.

Here's what I did:
`docker build -t permgwasimage .

docker run -it -v /home/mshenton/permGWAS:/home/permgwascontainer --name permgwascontainer permgwasimage
`
I can successfully run GWAS.

`
docker save permgwasimage -o localpermgwas2.tar
singularity build localpermgwas2.sif docker-archive://localpermgwas2.tar

singularity shell --bind /home/mshenton/permGWAS:/home/permgwascontainer localpermgwas2.sif

Done performing GWAS on phenotype phenotype_value for 194 samples and 2001 SNPs.
Elapsed time: 0.993591 s
Save results.
Failure when running permGWAS2.0
[Errno 13] Permission denied: '/home/permgwascontainer3/results/p_values_phenotype_value(3).csv'
`

`
sudo singularity shell --bind /home/mshenton/permGWAS:/home/permgwascontainer localpermgwas2.sif

`
With sudo, I can successfully run GWAS. However, I'd like to use this on a server where I don't have sudo privileges. I can't use Docker there either. Any suggestions?

Thanks for the tool!

Best regards
Matt
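
Before starting a long run inside a container, it can help to check which candidate output directory the current user is actually allowed to write to. The sketch below is a generic Python check, not a permGWAS feature; `first_writable_dir` and the paths passed to it are hypothetical.

```python
import os
import tempfile

def first_writable_dir(candidates):
    """Return the first existing directory the current user can write to."""
    for path in candidates:
        # W_OK: may create files; X_OK: may enter the directory.
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None

# Example: prefer a bound results directory, fall back to the temp directory.
candidates = ["/home/permgwascontainer/results", tempfile.gettempdir()]
print(first_writable_dir(candidates))
```

If only the fallback is writable, the bound host directory is probably owned by a different user than the one running the container.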

could not convert string to float:'1\t13894\t0|t13894'

I ran: python3 permGWAS.py -x ./data/mydata.map -y ./data/mypheno.pheno
The following error occurred:
Failure when running permGWAS2.0
could not convert string to float:'1\t13894\t0|t13894'

The mypheno.pheno file is:
FID IID phenotype_value
11797 11797 0.96590
9936 9936 0.83560
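
The string in the error message looks like an entire line of the .map file that was not split into fields. One way to test this outside permGWAS is to parse the file with whitespace as the delimiter. The sketch below is a hypothetical check, not the tool's loader; it assumes the standard four-column PLINK .map layout (chromosome, variant ID, genetic distance, base-pair position), and `read_map` and `MAP_COLUMNS` are illustrative names.

```python
import io
import pandas as pd

MAP_COLUMNS = ["chrom", "variant_id", "genetic_distance", "position"]

def read_map(source):
    """Parse a PLINK .map file, accepting tabs or spaces between fields."""
    return pd.read_csv(
        source,
        sep=r"\s+",                  # any run of whitespace separates fields
        header=None,
        names=MAP_COLUMNS,
        dtype={"chrom": str, "variant_id": str},
    )

# Toy content: one tab-separated and one space-separated line.
toy_map = "1\t13894\t0\t13894\n1 13900 0 14000\n"
variants = read_map(io.StringIO(toy_map))
print(variants)
```

If a real file parses cleanly this way, the delimiter expected by the loader is a likely place to look next.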

input file formats for permGWAS are not clear

Dear Grimm,
After many attempts, I still can't run permGWAS. It runs on the test data, but with real data it gives errors at every step.
I have generated hundreds of data formats, but almost all of them fail when passed to the program.
My data set is huge and is available in PLINK format.
Unlike your sample data, my sample IDs are names, not numbers, which I suspect is causing trouble in the analysis, but it is practically impossible to change this for hundreds of analyses.
Second, it is not clear from the manual whether the phenotype file must contain the same individuals, in the same order, as the PLINK file, or whether the numbers may differ. What if some phenotype values are missing and filled with 'NA'? How does the program treat such missing values? It is not feasible to generate PLINK files containing only the individuals with phenotypes, if that is a requirement of the program.

After trying everything, this is the most recent error:

/data1# python3 permGWAS.py -x ./data/imputed_final_chr7Mod_js_GT3.bed -y ./data/ind5.pheno
GPU is available. Perform computations on device  cuda:0
Checked if all specified files exist. Start loading data.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1490, in array_func
    result = self.grouper._cython_operation(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 959, in _cython_operation
    return cy_op.cython_operation(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 657, in cython_operation
    return self._cython_op_ndim_compat(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 497, in _cython_op_ndim_compat
    return self._call_cython_op(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 541, in _call_cython_op
    func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 173, in _get_cython_function
    raise NotImplementedError(
NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 1692, in _ensure_numeric
    x = float(x)
ValueError: could not convert string to float: 'Alme_22'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 1696, in _ensure_numeric
    x = complex(x)
ValueError: could not convert string to complex: 'Alme_22'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "permGWAS.py", line 88, in <module>
    X, y, K, covs, positions, chrom, X_index = prep.load_and_prepare_data(args)
  File "/data1/preprocessing/prepare_data.py", line 65, in load_and_prepare_data
    y = load_files.load_phenotype(arguments)
  File "/data1/preprocessing/load_files.py", line 206, in load_phenotype
    y = y.sort_values(y.columns[0]).groupby(y.columns[0]).mean()
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1855, in mean
    result = self._cython_agg_general(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1507, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 1503, in grouped_reduce
    applied = sb.apply(func)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py", line 329, in apply
    result = func(self.values, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1503, in array_func
    result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1457, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 994, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/ops.py", line 1015, in _aggregate_series_pure_python
    res = func(group)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/groupby/groupby.py", line 1857, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 11556, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 11201, in mean
    return self._stat_function(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 11158, in _stat_function
    return self._reduce(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 4666, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 96, in _f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 158, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 421, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 727, in nanmean
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/nanops.py", line 1699, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric") from err
TypeError: Could not convert Alme_22 to numeric
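
The traceback ends in a `groupby(...).mean()` over every remaining column, which fails as soon as one of them holds text such as 'Alme_22'. The sketch below shows one way to average only the phenotype column. It is not the permGWAS loader; it assumes a whitespace-separated file with an ID column and a numeric value column, and `load_phenotype_mean` is a hypothetical helper.

```python
import io
import pandas as pd

def load_phenotype_mean(source, id_col, value_col):
    """Average the phenotype per sample ID, ignoring other text columns."""
    df = pd.read_csv(source, sep=r"\s+", dtype={id_col: str})
    # Non-numeric entries become NaN instead of raising an error.
    df[value_col] = pd.to_numeric(df[value_col], errors="coerce")
    return df.groupby(id_col)[value_col].mean()

# Toy file with named IDs, a replicate, and a missing value.
toy_pheno = (
    "FID IID phenotype_value\n"
    "Alme_22 Alme_22 0.5\n"
    "Alme_22 Alme_22 1.5\n"
    "Alme_23 Alme_23 NA\n"
)
means = load_phenotype_mean(io.StringIO(toy_pheno), "FID", "phenotype_value")
print(means)
```

Selecting the value column before calling `mean()` keeps string columns such as IID out of the aggregation.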

Even after making all the IIDs numeric, I get this error:

root@58c5c4cccb7c:/data1# python3 permGWAS.py -x ./data/imputed_final_chr7Mod_js_GT3.fam -y ./data/ind6.pheno
GPU is available. Perform computations on device  cuda:0
Checked if all specified files exist. Start loading data.
Samples of genotype and phenotype do not match.
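
To see which samples the genotype and phenotype files actually share, the IDs can be intersected directly. The sketch below is a hypothetical diagnostic, not permGWAS code. It assumes the standard six-column PLINK .fam layout and a phenotype file with an IID header column, and `shared_samples` and `FAM_COLUMNS` are illustrative names.

```python
import io
import pandas as pd

FAM_COLUMNS = ["FID", "IID", "father", "mother", "sex", "phenotype"]

def shared_samples(fam_source, pheno_source):
    """Return IDs present in both files and IDs missing from the .fam file."""
    fam = pd.read_csv(fam_source, sep=r"\s+", header=None, names=FAM_COLUMNS,
                      dtype=str, keep_default_na=False)
    pheno = pd.read_csv(pheno_source, sep=r"\s+", dtype=str,
                        keep_default_na=False)
    fam_ids = set(fam["IID"])
    pheno_ids = set(pheno["IID"])
    return sorted(fam_ids & pheno_ids), sorted(pheno_ids - fam_ids)

# Toy inputs with one shared sample and one phenotype-only sample.
toy_fam = "f1 Alme_22 0 0 0 -9\nf2 Alme_23 0 0 0 -9\n"
toy_pheno = "FID IID phenotype_value\nf1 Alme_22 0.5\nf9 Alme_99 NA\n"
shared, missing_from_fam = shared_samples(io.StringIO(toy_fam), io.StringIO(toy_pheno))
print(shared, missing_from_fam)
```

An empty intersection would be consistent with the message above, for example when one file uses names and the other uses numeric IDs.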

I am attaching sample data in case you want to test it.
Please have a look at the error and let me know whether it can be solved without the sample data.
Thanks,

Vinod

sample_data.zip
