HEDGES

A package for encoding and decoding arbitrary byte data to and from strands of DNA using a robust error-correcting code (ECC).

HEDGES Error-Correcting Code for DNA Storage Corrects Indels and Allows Sequence Constraints

William H. Press, John A. Hawkins, Stephen Knox Jones Jr, Jeffrey M. Schaub, and Ilya J. Finkelstein

Proc Natl Acad Sci. accepted for publication (June, 2020)

Installation

The following instructions should work across platforms, except that installing virtualenv with apt-get is Ubuntu-specific. On other platforms, install virtualenv by whatever means is appropriate, if desired.

First, clone the repository to a local directory:

git clone https://github.com/whpress/hedges.git

Optionally, you can install into a virtual environment (recommended):

sudo apt-get install -y virtualenv
cd hedges
virtualenv envhedges
. envhedges/bin/activate

Now install required packages:

pip install numpy==1.13.3 && pip install -r requirements.txt && python setup.py install

What is supplied

What is supplied is not a single program, but a kit that users can adapt to their own applications. The kit consists of

  1. C++ source code that compiles (in Linux or Windows) to the Python-includable module NRpyDNAcode. Precompiled binaries are supplied for Python 2.7 in Linux and Windows, but recompilation may be necessary if these don't work. This module implements the HEDGES "inner code" as described in the paper.

  2. C++ source code that compiles (in Linux or Windows) to the Python-includable module NRpyRS. Precompiled binaries are supplied for Python 2.7 in Linux and Windows, but recompilation may be necessary if these don't work. This module implements the Schifra Reed-Solomon Error Correcting Code Library. See http://www.schifra.com for details and license restrictions. This module is not needed for the HEDGES inner code; it is needed only to implement the "outer code" as described in the paper. Some users will instead want to use their own outer codes.

  3. Python program print_module_test_files.py, which verifies that the above modules can be loaded and prints their usage. Most users will not need to use any of the routines in these files directly, but should instead use the Python functions in the following file:

  4. Python program test_program.py . This defines various user-level functions for implementing the HEDGES inner and Reed-Solomon outer codes as described in the paper. The example reads arbitrary bytes from the file WizardOfOzInEsperanto.txt, encodes a specified number of packets (each with 255 DNA strands), corrupts the strands with a specified level of random substitutions, insertions, and deletions, decodes the strands, and verifies the error correction. To better validate the installation, the default code rate and corruption level are chosen to be stressful to HEDGES and are greater than those of an intended use case.
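Before running the supplied programs, it can be useful to confirm that the two compiled modules from items 1 and 2 are importable at all. The following is a minimal sketch, not part of the kit: the helper name check_modules is hypothetical, and it only tests whether import succeeds, whereas print_module_test_files.py also prints each module's usage.

```python
from __future__ import print_function
import importlib


def check_modules(names=("NRpyDNAcode", "NRpyRS")):
    """Try to import each named module and return {name: True/False}.

    Hypothetical helper for illustration; it is not supplied with HEDGES.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Missing file, wrong directory, or a binary built for
            # a different Python version or platform.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules().items()):
        print("%-12s %s" % (name, "loaded" if ok else "NOT loaded (recompile?)"))
```

Run it from the directory that contains the .so or .pyd files, since that is where Python will look for them.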

Testing and familiarization

Run the program test_program.py . It should produce output comparable (but not identical) to the files sample_linux_test_output.txt and sample_windows_test_output.txt. The output will not be identical, because different random numbers are used to create DNA errors in each run.

If the above works, then try varying some of the parameters. In particular, you can change coderatecode to increase or decrease the code rate; the values (srate,drate,irate) to change the fractions of substitutions, deletions, and insertions generated for the test; and totstrandlen to change the total strand length of the DNA (including left and right primers). The many other parameters are either self-explanatory or described in the paper. Most users will not initially need to change them.
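To build intuition for what the three error rates mean, here is a small pure-Python sketch of a corruption step. It is an illustration under assumptions, not the code in test_program.py: the function name corrupt_strand and the error model (one random draw per base, choosing among deletion, insertion, substitution, or no change) are hypothetical, and the supplied test may generate errors differently.

```python
import random

BASES = "ACGT"


def corrupt_strand(strand, srate, drate, irate, rng=random):
    """Return a corrupted copy of a DNA string (illustrative model only).

    For each base, one random draw decides its fate:
      - with probability drate the base is deleted,
      - with probability irate a random base is inserted before it,
      - with probability srate it is replaced by a different base,
      - otherwise it is copied unchanged.
    """
    out = []
    for base in strand:
        r = rng.random()
        if r < drate:
            continue  # deletion: drop this base
        if r < drate + irate:
            out.append(rng.choice(BASES))  # insertion before the base
            out.append(base)
        elif r < drate + irate + srate:
            # substitution: pick one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    print(corrupt_strand("ACGT" * 10, srate=0.02, drate=0.01, irate=0.01, rng=rng))
```

Because deletions and insertions change the strand length, the corrupted output generally no longer lines up position by position with the original, which is the situation the HEDGES inner code is designed to handle.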

Recompiling the C++ modules

The modules are built using the Numerical Recipes C++ class library nr3python.h . This is included here and also freely available for unlimited distribution at http://numerical.recipes/nr3python.h . Generally, you will not need to understand this library, but, if you are curious, a tutorial on its use is at http://numerical.recipes/nr3_python_tutorial.html . You should also consult this tutorial if you have difficulty recompiling the modules. Note that while other Numerical Recipes routines are copyrighted and require a license, no restricted routines are used in the two modules supplied here.

In Linux, go to the directory LinuxC++Compile containing the source code and run the script compile_all.sh . Then copy the two files produced, NRpyDNAcode.so and NRpyRS.so, to the directory containing test_program.py. The most common source of errors is the compiler's inability to find required Python and Numpy include and library files that are part of your Python installation. Unfortunately, we can't help you with that.

In Windows, go to the directory WindowsC++Compile and open the Visual Studio Community 2019 solution NRpyDNAcode.sln . This should build the two files (in the x64\Release directory) NRpyDNAcode.pyd and NRpyRS.pyd . Copy these to the directory containing test_program.py. If this doesn't work, and you need to build the Windows modules from scratch, then keep these points in mind: You want to compile to produce .dll files (not .exe files), and you then simply rename these to .pyd. As in Linux, a common source of errors is the compiler's inability to find required Python and Numpy include and library files that are part of your Python installation. You'll need to locate them and set appropriate include directories.
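One way to locate the include directories mentioned above, on either platform, is to ask the Python installation itself. The sketch below is a suggestion rather than part of the supplied build scripts; the helper name include_dirs is hypothetical. It relies only on the standard sysconfig module and on numpy.get_include(), and it must be run with the same Python interpreter that will import the modules.

```python
from __future__ import print_function
import sysconfig


def include_dirs():
    """Return the header directories for this Python and, if installed, NumPy."""
    dirs = {"python": sysconfig.get_paths()["include"]}
    try:
        import numpy
        dirs["numpy"] = numpy.get_include()
    except ImportError:
        dirs["numpy"] = None  # NumPy is not installed for this interpreter
    return dirs


if __name__ == "__main__":
    for name, path in sorted(include_dirs().items()):
        print("%-7s %s" % (name, path))
```

The printed paths can then be passed to the compiler as include directories, for example with -I in Linux or through the project's include directory settings in Visual Studio.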

LICENSE (MIT License)

Copyright 2020 by William H. Press, John A. Hawkins, Stephen K. Jones jr, and Ilya J. Finkelstein

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

hedges's Issues

A puzzle in the code

As I read your code, I had a problem. In the shoveltheheap method of NRpyDNAcode.cpp, the variable qq is always 0. I didn't see the update of qq. I'm not very clear about the role of variable qq.
