Git Product home page Git Product logo

rrblup's Introduction

rrBLUP

rrBLUP is a R package which used for genomic prediction with the rrBLUP linear mixed model (Endelman 2011). We transform it into functions in Python.

Requirement

Code for rrBLUP was prepared using Python 3.8.15. The major requirements are listed below, and the full list can be found in requirement.txt:

  • numpy==1.24.0
  • scikit-learn==1.2.0
  • pandas==1.5.2
  • scipy==1.9.3
  • rpy2==3.5.5

We suggest creating a new virtual environment for a clean installation of all the relevant requirements by following commands:

conda create -n rrblup python=3.8.15
conda activate rrblup
conda install --yes --file requirement.txt

Use jupyter notebook in the new virtual environment by following commands:

conda install ipykernel
python -m ipykernel install --user --name rrblup --display-name='rrblup'

Data

We use the genomic dataset with protein as the trait from SoyNAM (soybase.org/SoyNAM/index.php), found in data folder. The default coding format in SoyNAM is : -1 for the missing value, 0 for 0/0 genotype, 1 for 0/1 genotype and 2 for 1/1 genotype. However, rrBLUP R package requires input genotype matrix coded as -1,0,1 for 0/0, 0/1, 1/1 genotype respectively. Therefore, we convert the data coding format in data/data_convertion.ipynb.

Usage

There are two functions A_mat and mixed_solve in rrBLUP.py, which are same as the functions A.mat and mixed.solve in R package rrBLUP. The notebook includes how to use rrBLUP in Python within demo.ipynb. Meanwhile we also provide a R script about how to use these functions in R within demo.R. Detail usages about these two functions are shown below.

A_mat(X, min_MAF = None, max_missing = None, impute_method = "mean", tol = 0.02,
      shrink = False, n_qtl = 100, n_iter = 5, return_imputed = False)
    
    '''
    Parameters:
    -----------
    X [array]:
        Matrix of unphased genotypes for n lines and m biallelic markers, coded as {-1,0,1}
        
    min_MAF [float, default = None]:
        Minimum minor allele frequency, default removes monomorphic markers
        
    max_missing [float, default = None]:
        Maximum proportion of missing data, default removes completely missing markers
        
    impute_method [str("mean" or "EM"), default = "mean"]:
        Method of genotype imputation, there are only two options, "mean" imputes with the mean of each marker and
        "EM" imputes with an EM algorithm
        
    tol [float, default = 0.02]:
        Convergence criterion for the EM algorithm
        
    shrink [Union[bool, str("EJ" or "REG")], default = False]:
        Method of shrinkage estimation, default disable shrinkage estimation; If string, there are only two options,
        "EJ" uses EJ algorithm described in Endelman and Jannink (2012) and "REG" uses REG algorithm described in
        Muller et al. (2015); If True, uses EJ algorithm
    
    n_qtl [int, default = 100]:
        Number of simulated QTL for the REG algorithm
    
    n_iter [int, default = 5]:
        Number of iterations for the REG algorithm
    
    return_imputed [bool, default = False]:
        Whether to return the imputed marker matrix
    
    
    Returns:
    --------
    A [array]:
        Additive genomic relationship matrix (n * n)
    
    (When return_imputed = True)
    imputed [array]:
        Imputed X matrix
    '''


mixed_solve(y, Z = None, K = None, X = None, method = "REML",
            bounds = [1e-09, 1e+09], SE = False, return_Hinv = False)
    
    '''
    Parameters:
    -----------
    y [array]:
        Vector of observations for n lines and 1 observation
    
    Z [array, default = None]:
        Design matrix of the random effects for n lines and m random effects, default to be the identity matrix
    
    K [array, default = None]:
        Covariance matrix of the random effects, if not passed, assumed to be the identity matrix
    
    X [array, default = None]:
        Design matrix of the fixed effects for n lines and p fixed effects, which should be full column rank,
        default to be a vector of 1's
    
    method [str("ML" or "REML"), default = "REML"]:
        Method of maximum-likelihood used in algorithm, there are only two options, "ML" uses full maximum-likelihood
        method and "REML" uses restricted maximum-likelihood method
    
    bounds [list, default = [1e-09, 1e+09]]:
        Lower and upper bound for the ridge parameter
    
    SE [bool, default = False]:
        whether to calculate and return standard errors
    
    return.Hinv [bool, default = False]:
        whether to return the inverse of H = Z*K*Z' + \lambda*I, which is useful for GWAS
    
    
    Returns:
    --------
    Vu [float]:
        Estimator for the marker variance \sigma^2_u
    
    Ve [float]:
        Estimator for the residual variance \sigma^2_e
    
    beta [array]:
        BLUE for the fixed effects \beta
    
    u [array]:
        BLUP for the random effects u
    
    LL [float]:
        maximized log-likelihood
    
    (When SE = True)
    beta.SE [float]:
        Standard error for the fixed effects \beta
    
    u.SE [float]:
        Standard error for the random effects u
    
    (When return_Hinv = True)
    Hinv [array]:
        Inverse of H = Z*K*Z' + \lambda*I
    '''

License

Our code is released under Apache License 2.0.

rrblup's People

Contributors

zhejiangzhuque avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.