QRIS: a machine learning framework to investigate the determinants of retrovirus integration specificity

Preprint Available MIT Licence

✍️Author: Houyu Zhang

📧Email: [email protected]

Copyright (c) 2021 YenLab@SKLEH. All rights reserved.

Introduction

Retroviral activity is linked to various diseases. A retrovirus integrates a copy of its genomic DNA into the host genome, where it persists as a provirus. This threatens the host cell by disrupting the host genome architecture and by transcribing the provirus for detrimental expansion. The host, in turn, has defense mechanisms that repress provirus transcription and thereby limit the harm, and which mechanism applies depends on where the retrovirus integrates. Integration site selection is therefore a vital process for the retrovirus's fate.

Many studies have sought to explain this integration specificity and have found diverse DNA motif preferences across retrovirus genera. However, the effect of genome-wide DNA structural properties, such as DNA shape, on retrovirus integration has been less clear. We systematically investigated this question in six retroviruses that are representative of four genera. To do so, we devised QRIS (Quantify the Retrovirus Integration Specificity), a machine learning framework that assesses the effect of DNA shape on large-scale retrovirus integration site data. We found that DNA shape can act independently of, or cooperatively with, the DNA motif to regulate retrovirus integration. Based on this, we classified these retroviruses into three categories: StrongFavor, WeakFavor, and Strongshape retroviruses. Interestingly, Strongshape retroviruses can gain specificity through DNA shape even without a particular DNA motif.

Our qualitative and quantitative evaluation of DNA shape and DNA motifs revealed their diverse roles in regulating retrovirus integration specificity. These findings may help to control lentiviral vectors more precisely in gene therapy and, in the future, to disrupt retrovirus integration during pathogenesis.
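To make the motif-versus-shape comparison more concrete, here is a minimal Python sketch of how one might build three alternative feature sets (motif only, shape only, motif plus shape) for the sequence around an integration site. It is an illustration under stated assumptions, not the QRIS implementation: the repository's own scripts are bash and R, and the names `one_hot`, `build_features`, and the `mode` values are hypothetical. The shape values are assumed to be precomputed elsewhere and are passed in as plain numbers.

```python
from typing import List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[int]:
    """Encode a DNA sequence as a flat one-hot vector (4 values per base)."""
    vec: List[int] = []
    for base in seq.upper():
        # Bases outside A/C/G/T (for example N) become four zeros.
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def build_features(seq: str, shape_values: Sequence[float], mode: str) -> List[float]:
    """Assemble a feature vector for one site.

    mode is a hypothetical switch:
      "motif"       -> one-hot sequence features only
      "shape"       -> precomputed DNA shape values only
      "motif+shape" -> both, concatenated
    """
    motif = [float(x) for x in one_hot(seq)]
    shape = [float(x) for x in shape_values]
    if mode == "motif":
        return motif
    if mode == "shape":
        return shape
    if mode == "motif+shape":
        return motif + shape
    raise ValueError(f"unknown mode: {mode!r}")
```

Training the same classifier on each feature set, with real integration sites as positives and shuffled sites as controls, and then comparing performance is one way to ask whether shape adds information beyond the motif. The classifier and evaluation metric are left open here because this README does not specify them.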

📁Scripts organization

The raw, intermediate, and final data and the scripts provided here are sufficient to fully reproduce the paper.

  • Scripts whose names begin with a series number each perform a specific analysis and follow the order of the retrovirus paper.

    Specifically, the bash scripts process the data, and the corresponding R scripts are used for downstream analysis, statistics, and plotting.

  • QRIS_Rawdata: stores raw data, shuffled data, and R objects of DNA shape values. These files are for QRIS_Step1.

  • QRIS_Results: stores machine learning results, which can be visualized using QRIS_Step2.

📑Citation

Zhang H Y, Yang L, Xu J, et al. QRIS: a machine learning framework to investigate the determinants of retroviral integration specificity (in Chinese). Sci Sin Vitae, 2022, 52
