
NeRNA: a negative data generation framework for machine learning applications of non-coding RNAs

About

If you use this workflow in your research, please consider citing:

Orhan, M. E., Demirci, Y. M., & Saçar Demirci, M. D. (2023). NeRNA: A negative data generation framework for machine learning applications of noncoding RNAs. Computers in Biology and Medicine, 159, 106861. https://doi.org/10.1016/j.compbiomed.2023.106861

NeRNA is a novel negative data generation framework developed on the KNIME Analytics Platform. The workflow takes non-coding RNA sequences as input and generates negative RNA sequences from them.

Supervised machine learning-based non-coding RNA (ncRNA) analysis methods have been developed to classify and identify novel sequences. In such analyses, the positive learning data sets usually consist of known ncRNA examples published in databases. In contrast, there are neither databases listing confirmed negative sequences for a specific ncRNA class nor standardized methodologies for generating high-quality negative examples. To address this challenge, we developed a novel negative data generation method, NeRNA (negative RNA).

Requirements

You can download the NeRNA workflow from the Knime Workflow folder or directly here.

First, the NeRNA framework is developed on the KNIME Analytics Platform, so KNIME must be installed. The second required tool is the RNAfold application from the ViennaRNA Package (please follow the RNAfold installation instructions on their website). The R software environment and the seqinr and stringr packages are required for the R scripts.

To configure R settings in KNIME:

  • In KNIME, go to File -> Preferences -> KNIME (left side of the pop-up) -> R.

  • Set the R path and the Rserve memory.

  • Run the following commands in R or RStudio to install the required packages.

# Install Rserve, which must be present before it can be loaded.
install.packages("Rserve")
# The seqinr and stringr packages are required by the R scripts.
install.packages("seqinr")
install.packages("stringr")
# Load and start Rserve.
library("Rserve")
Rserve(args = "--vanilla")

KNIME Workflow Overview


  • Select Sequence File, Sequence Type, and RNAfold Path: This node sets the location of the input sequence FASTA file and the path to RNAfold. The non-coding RNA type must also be selected here.
  • NeRNA Generation: This is the primary node of the NeRNA workflow. It contains two subgroups: CASE switch and NeRNA Generator.
    • CASE switch: This node adjusts the RNAfold and Sequence Converter parameters according to the selected sequence type. For example, for circRNA sequences the --circ parameter is passed to RNAfold, and for tRNA sequences the Sequence Converter node is modified.
    • NeRNA Generator: The main node of the negative RNA workflow.
      • RNAfold Calculation: This node calculates secondary structures for each sequence. Secondary structures are essential since negative sequences are generated based on these structural representations.

        Sequences for which RNAfold does not produce a structure are removed. Check the Std Output and R error Output of the RNAfold Calculation node.

      • Checking Wrong Calculation: This node checks for structures without a minimum free energy (MFE) value.

      • Check Missing Value: This node checks for sequences that were not calculated. These sequences are removed before the sequence conversion step.

      • Sequence Converter: This metanode reconfigures sequences based on their secondary structures and base pairing.

      • Negative Generator Binary Index Change: This metanode performs the main calculation of the NeRNA workflow. All sequences are converted to an octal representation, and then a novel methodology is applied to each sequence to create negative RNA sequences.

  • Column Filter: Filters out unused columns such as the iteration number.
  • Column Rename: This node renames the column for the FASTA Writer.
  • FASTA Writer: Writes a FASTA file using the file name and output location provided by the user.
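
The RNAfold-related steps above (choosing parameters by sequence type, then dropping sequences that have no computed structure or MFE) can be sketched outside KNIME as follows. This is a minimal Python sketch under stated assumptions, not the workflow's actual implementation: the function names `build_rnafold_command`, `parse_rnafold_output`, and `run_rnafold` are hypothetical, and the parser assumes RNAfold's usual text output for FASTA input, in which each record is a header line, a sequence line, and a dot-bracket structure line ending with the MFE in parentheses. `--circ` and `--noPS` are RNAfold command-line options.

```python
import re
import subprocess

# Structure line as printed by RNAfold, e.g. "(((...))) ( -1.20)".
STRUCTURE_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def build_rnafold_command(rnafold_path, sequence_type):
    """Return the RNAfold argument list for a sequence type (hypothetical helper)."""
    command = [rnafold_path, "--noPS"]  # --noPS: do not write PostScript drawings
    if sequence_type == "circRNA":
        command.append("--circ")  # treat the molecule as circular
    return command


def parse_rnafold_output(text):
    """Keep only records that have both a structure and an MFE value."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            i += 1
            continue
        match = STRUCTURE_LINE.match(lines[i + 2]) if i + 2 < len(lines) else None
        if match:
            records.append({
                "id": lines[i][1:],
                "sequence": lines[i + 1],
                "structure": match.group(1),
                "mfe": float(match.group(2)),
            })
            i += 3
        else:
            i += 1  # incomplete record: skip it, like the missing-value checks
    return records


def run_rnafold(fasta_path, rnafold_path="RNAfold", sequence_type="miRNA"):
    """Fold every sequence in a FASTA file; requires RNAfold to be installed."""
    with open(fasta_path) as handle:
        result = subprocess.run(
            build_rnafold_command(rnafold_path, sequence_type),
            stdin=handle, capture_output=True, text=True, check=True,
        )
    return parse_rnafold_output(result.stdout)
```

Keeping the command construction and the output parsing separate from the actual RNAfold call makes both parts easy to check without a ViennaRNA installation.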

Case Studies

The NeRNA workflow was tested on four non-coding RNA classes: microRNA, long non-coding RNA, circular RNA, and tRNA sequences.

In the case studies, machine learning and deep learning classifiers, namely Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Feed-forward Neural Network (FNN), were employed to test the novel negative sequences. In each test, equal numbers of negative and positive sequences were used to train the models, and the data sets were divided into learning and testing portions at a 70/30 ratio. Additionally, 1000-fold Monte Carlo cross-validation was used in the process.
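
The evaluation scheme described above (balanced classes, a 70/30 split, repeated random resampling) can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the code used in the case studies: the function name `monte_carlo_splits` is hypothetical, and splitting each class separately so that both portions stay balanced is a choice made here, not something the description above specifies.

```python
import random


def monte_carlo_splits(positives, negatives, n_iterations=1000,
                       train_fraction=0.7, seed=0):
    """Yield (train, test) lists of (sequence, label) pairs.

    Each iteration draws a fresh random split, which is what distinguishes
    Monte Carlo cross-validation from k-fold cross-validation.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))  # use equal numbers per class
    n_train = round(n * train_fraction)
    for _ in range(n_iterations):
        pos = rng.sample(positives, n)  # random subset in random order
        neg = rng.sample(negatives, n)
        train = [(s, 1) for s in pos[:n_train]] + [(s, 0) for s in neg[:n_train]]
        test = [(s, 1) for s in pos[n_train:]] + [(s, 0) for s in neg[n_train:]]
        yield train, test


# Usage: toy identifiers stand in for real positive and negative sequences.
positives = [f"P{i}" for i in range(10)]
negatives = [f"N{i}" for i in range(12)]
for train, test in monte_carlo_splits(positives, negatives, n_iterations=3):
    print(len(train), len(test))
```

A classifier would be trained on `train` and scored on `test` in every iteration, with the scores averaged over all iterations.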

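| RNA type | Organisms | Number | Sequence Length Min | Sequence Length Max | Sequence Length Average | Source |
|---|---|---|---|---|---|---|
| miRNA hairpins | Homo sapiens | 1917 | 41 | 180 | 81.89 | miRBase |
| miRNA hairpins | Mus musculus | 1234 | 39 | 147 | 82.6 | miRBase |
| miRNA hairpins | Bos taurus | 1064 | 43 | 149 | 76.23 | miRBase |
| miRNA hairpins | Gallus gallus | 882 | 48 | 169 | 87.36 | miRBase |
| miRNA hairpins | Oreochromis niloticus | 812 | 40 | 100 | 61.05 | miRBase |
| miRNA hairpins | Equus caballus | 715 | 52 | 145 | 104.61 | miRBase |
| miRNA hairpins | Glycine max | 684 | 54 | 473 | 135.92 | miRBase |
| miRNA hairpins | Monodelphis domestica | 680 | 44 | 111 | 64.92 | miRBase |
| miRNA hairpins | Medicago truncatula | 672 | 54 | 910 | 165.26 | miRBase |
| miRNA hairpins | Pan troglodytes | 655 | 69 | 148 | 89.94 | miRBase |
| tRNA | 101* | 1110 | 54 | 99 | 77.56 | Psi-C Database |
| lncRNA | Homo sapiens | 1000 | 202 | 29066 | 1496.97 | LNCipedia |
| circRNA | Mus musculus | 1000 | 51 | 29991 | 1566.49 | circBase |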
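
The Number and Sequence Length columns above are simple summaries of a FASTA file. As a rough illustration of how such values can be computed for your own input before running the workflow, here is a minimal Python sketch; the helper names `read_fasta` and `length_summary` are hypothetical and are not part of NeRNA.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    sequences = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line  # sequences may span several lines
    return sequences


def length_summary(sequences):
    """Return count, min, max, and average sequence length."""
    lengths = [len(seq) for seq in sequences.values()]
    return {
        "number": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "average": round(sum(lengths) / len(lengths), 2),
    }


# Usage with a toy two-sequence FASTA string.
example = ">a\nACGU\n>b\nACGUAC\nGU\n"
print(length_summary(read_fasta(example)))
```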

Positive sequences, NeRNA-generated negative sequences, and the classification results of the case studies are available in the Case Studies folder. NeRNA Structure Result contains the secondary structures of 5 negative and 5 normal example sequences. The secondary structures of the RNAs were constructed using the StructureEditor tool.

Result

Comparison Analysis

Negative data generated by NeRNA are compared with negative RNA sources used in the literature.
