
bio's Introduction

.NET Bio

.NET Bio is an open source library of common bioinformatics functions, intended to simplify the creation of life science applications.

The core library implements a range of file parsers and formatters for common file types, connectors to commonly-used web services such as NCBI BLAST, and standard algorithms for the comparison and assembly of DNA, RNA and protein sequences. Sample tools and code snippets are also included.

Using .NET Bio in your application

.NET Bio binaries are distributed through NuGet:

  • .NET Bio Core: includes all the core APIs and web service support.
PM> Install-Package NetBio.Core -Version 3.0.0-alpha
PM> Install-Package NetBio.Padena -Version 3.0.0-alpha
PM> Install-Package NetBio.Pamsam -Version 3.0.0-alpha

Building .NET Bio from source

There are several solution files (.sln) you can use to build .NET Bio on Windows, Mac or Linux.

  • DotNetBio.sln builds the .NET Standard 2.0 assemblies necessary for Windows, Linux or Mac OS X. This is the easiest version to build and the one we recommend you start with. It can be built with Visual Studio, Xamarin Studio, or MonoDevelop.
  • DotNetBio-Full builds some optional command-line tools that showcase some of the framework classes in .NET Bio.

Project Goals

.NET Bio has been built with specific goals in mind:

Extensibility: .NET Bio is designed to be easy for a programmer to extend with new functions; please refer to the developer documentation available on this site. Developers who extend .NET Bio are encouraged to contribute their code back to the project so that the community as a whole can benefit from their work.

Flexibility: Whatever .NET-supported language you choose, the code you write will work with .NET Bio, so the accessibility of Visual Basic®, the power of C#, the speed and conciseness of functional languages such as F#, and the ad-hoc scripting capabilities of Python are all available, as are many others. As a library of common code, .NET Bio can be used to build whatever application type meets your needs, whether integrating with applications such as Microsoft Excel, building command-line or GUI applications from scratch, or creating cloud services or workflow components.

Community: .NET Bio is a community-owned open source project and welcomes participation and contributions from programmers with an interest in the life sciences. We provide forums for discussions and help, documentation and sample applications, and tools to report bugs and request new features.

History of the project

The original home for the project was bio.codeplex.com. We decided not to carry over the history prior to version 2.0 of the project, but you can still get the original source code from the older (deprecated) site if necessary.

bio's People

Contributors

acesnik, cpatmoore, evolvedmicrobe, jjby, markjulmar

bio's Issues

Bio.Platform.Helpers.PlatformServices causes exceptions to be thrown on fresh build

If you build and copy Bio.Desktop, you can't parse a FASTA file because Bio.Platform.Helpers.PlatformServices.MaxSequenceSize is set to 0. This leads to a very confusing "sequence is longer than 2GB" error when this line of code executes in the FASTA parser:

if ((((long)bufferPosition + line.Length) >= PlatformManager.Services.MaxSequenceSize))
{
    throw new ArgumentOutOfRangeException(
        string.Format(CultureInfo.CurrentUICulture, Properties.Resource.SequenceDataGreaterthan2GB, name));
}

Presumably this is set somehow by nuget or other deployment mechanisms? Do we know how we can make it work for the general build case?
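
To see why a limit of 0 rejects every record, the condition can be reproduced in isolation. This is only a minimal sketch: `SequenceSizeCheck.ExceedsLimit` is a hypothetical helper that mirrors the comparison quoted above and is not part of .NET Bio.

```csharp
using System;

public static class SequenceSizeCheck
{
    // Mirrors the parser's comparison: a result of true means the parser
    // would throw the "greater than 2GB" exception for this line.
    public static bool ExceedsLimit(int bufferPosition, int lineLength, long maxSequenceSize)
    {
        return ((long)bufferPosition + lineLength) >= maxSequenceSize;
    }
}
```

With `maxSequenceSize` left at 0, `ExceedsLimit(0, 14, 0)` is true for the very first line of any file, while the same call with `int.MaxValue` as the limit is false.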

Add LICENSE file

Please add a LICENSE file with the full license text. Additionally it would be great to specify the license (briefly) in the project readme, something along the lines of "Licensed under the Apache 2.0 license".

Mitochondrial genetic code variations

Bio.Algorithms.Translation.Codons contains the standard genetic code. Should there be a second codon map for mitochondrial genes? AGA and AGG are terminating codons in mitochondria, and UGA codes for tryptophan.
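
A second codon map could be sketched as follows. The two table strings are the standard code and the vertebrate mitochondrial code (NCBI translation table 2, which also reads AUA as methionine), written in the usual U, C, A, G ordering. The `CodonTables` class and its `Translate` method are hypothetical names for illustration and do not reflect how Bio.Algorithms.Translation.Codons is implemented.

```csharp
using System;

public static class CodonTables
{
    // Base order used to index the 64-character table strings.
    private const string Bases = "UCAG";

    // Standard genetic code.
    public const string Standard =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Vertebrate mitochondrial code: UGA = Trp, AUA = Met, AGA/AGG = stop.
    public const string VertebrateMitochondrial =
        "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG";

    // Translates one codon (RNA or DNA letters) using the given table.
    // '*' marks a terminating codon.
    public static char Translate(string codon, string table)
    {
        if (codon == null || codon.Length != 3)
            throw new ArgumentException("A codon must have exactly three bases.", nameof(codon));

        int index = 0;
        foreach (char c in codon.ToUpperInvariant())
        {
            char b = c == 'T' ? 'U' : c;   // accept DNA spelling as well
            int pos = Bases.IndexOf(b);
            if (pos < 0)
                throw new ArgumentException("Unsupported base: " + c, nameof(codon));
            index = index * 4 + pos;
        }
        return table[index];
    }
}
```

Keeping each code as a 64-character string makes it cheap to add further variants without duplicating the lookup logic.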

Clustal Parser

I have noticed that when I try to parse a ClustalW file, it only parses the first set of alignments and ignores all the rest. Is this a common issue and is there a way around it?

Suffix arrays for alignment

This is a wonderful library of functions for bioinformatic analysis. I'm excited to try it out!

Do you have any plans to implement an alignment algorithm similar to STAR, which uses suffix arrays for string searches?

Can't build with Mono Tarball

A build on Windows or Mac with either Xamarin or Visual Studio works fine. Builds on Linux using any pre-built CLR package (like our Travis CI builds) also work. However, if Mono is installed from a tarball using

./configure
make

then the build fails with an error message indicating that the PCL targets cannot be found.

BoyerMoore Updated with example

In testing pattern searches with BoyerMoore, I can't get it to return any matches unless the search string starts with a *. This also results in position 0 always being the match location. I'm going to try to do further testing, but even the simple example from the cookbook doesn't behave as I would expect.

The type initializer for 'Bio.Alphabets' threw an exception

Hi,

I am getting the following error even in basic code using dotnetbio on Windows with Visual Studio 2015 Community Edition.

Code:
// Create two sequences; normally you'd have this already.
ISequence dna1 = new Sequence(Alphabets.AmbiguousDNA, "ACTGAAGGATATTA");

Error message
The type initializer for 'Bio.Alphabets' threw an exception.

Thx for your help,
Ambi.

Burrows-Wheeler Aligner (BWA) implementation?

Is it possible to consider an implementation of the Burrows-Wheeler Aligner (BWA), considering it's one of the most popular open-source sequence mapping packages available and is currently only available via a native Linux implementation?

https://github.com/lh3/bwa

There have been occasional Windows ports, but none were official, nor fully functional.

FASTA parser doesn't support comments

I realize comments aren't documented in (and thus probably not supported by) the NCBI docs linked in the parser file, but I feel this is simple to support, and it makes the parser a little less strict, as comments are simply ignored. Happy to send a PR for the feature if it's something you're interested in.
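
A minimal sketch of the proposed behavior, assuming comments follow the older FASTA convention of lines that begin with ';'. `LenientFasta` is a hypothetical stand-alone reader written for illustration; it is not the .NET Bio FASTA parser and returns plain strings rather than ISequence objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LenientFasta
{
    // Returns (header, sequence) pairs. Blank lines and lines starting
    // with ';' are skipped; residue lines before the first '>' are ignored.
    public static List<KeyValuePair<string, string>> Read(TextReader reader)
    {
        var records = new List<KeyValuePair<string, string>>();
        string header = null;
        var body = new StringBuilder();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0 || line[0] == ';')
                continue;                       // comment or blank line

            if (line[0] == '>')
            {
                if (header != null)
                    records.Add(new KeyValuePair<string, string>(header, body.ToString()));
                header = line.Substring(1);
                body.Clear();
            }
            else if (header != null)
            {
                body.Append(line);              // sequence data may span lines
            }
        }

        if (header != null)
            records.Add(new KeyValuePair<string, string>(header, body.ToString()));
        return records;
    }
}
```

Because comment lines are dropped before any other handling, a comment in the middle of a record does not split the sequence.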

Issues with installing the BioTools

I used the tools before they became a part of OuterCurve. Now you've moved them to GitHub.
I tried to compile the libraries to build the tools but I get this error. What type of project do I need to create for NuGet to install correctly?

I'm trying to install the .NET Bio libraries, https://github.com/dotnetbio/bio, from NuGet, but when I enter the correct commands into the package console I get this error:

PM> Install-Package NETBioCore.PCL
Install-Package : The current environment doesn't have a solution open.
At line:1 char:1
+ Install-Package NETBioCore.PCL
    + CategoryInfo          : InvalidOperation: (:) [Install-Package], InvalidOperationException
    + FullyQualifiedErrorId : NuGetNoActiveSolution,NuGet.PackageManagement.PowerShellCmdlets.InstallPackageCommand

Order & Orientation of Sequences matters for DeNovo Assembler

I've seen a situation pop up numerous times where the length of the final consensus sequence changes depending on the input: ordering the sequences from longest to shortest versus shortest to longest, or changing the orientation of the sequences, gives a different result from the following code.
Bio.Algorithms.Assembly.OverlapDeNovoAssembler assem = new Bio.Algorithms.Assembly.OverlapDeNovoAssembler();
assem.OverlapAlgorithm.GapOpenCost = -10;
assem.OverlapAlgorithm.GapExtensionCost = -2;
assem.OverlapAlgorithm.SimilarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousDna);
var assembly = assem.Assemble(reads) as Bio.Algorithms.Assembly.OverlapDeNovoAssembly;

Compare the results using assembly.Contigs.First().Consensus.Count.
I've tried to make some simulated data to provide a test case, but can't seem to find one that works. However, I can verify that this happens with as few as two sequences.

GFF version 3 crash

There's a check for GFF parsing that requires version 2. The parser works fine for version 3 (default version for Ensembl now), since the attributes are stored as free text. I'd suggest changing the version check or removing it if the parser also works fine for version 1.
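
A relaxed check might look like the sketch below. It assumes the version is declared in a `##gff-version` pragma line and that GFF3 files may use a dotted form such as 3.1.26, of which only the major number matters here. `GffVersionCheck` is a hypothetical helper, not the parser's existing code, and version 1 is deliberately left out of the accepted set because the issue has not confirmed that it parses correctly.

```csharp
using System;
using System.Globalization;

public static class GffVersionCheck
{
    private const string Pragma = "##gff-version";

    // Returns the declared major version, or null when the line is not
    // a well-formed version pragma.
    public static int? ParseVersion(string line)
    {
        if (line == null || !line.StartsWith(Pragma, StringComparison.Ordinal))
            return null;

        string rest = line.Substring(Pragma.Length).Trim();
        int dot = rest.IndexOf('.');
        if (dot >= 0)
            rest = rest.Substring(0, dot);      // keep only the major number

        int version;
        return int.TryParse(rest, NumberStyles.None, CultureInfo.InvariantCulture, out version)
            ? version
            : (int?)null;
    }

    // Accepts versions 2 and 3; extend once version 1 has been verified.
    public static bool IsSupported(int version)
    {
        return version == 2 || version == 3;
    }
}
```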

propose - wrapper powershell module for bio.dll

Currently the way to explore the usage of bio.dll is to create a console application that uses bio.dll in some way. This can be made simpler by creating a wrapper powershell module for bio.dll that would allow a new user to try out the various features of the library in an easy way. This would enable the exploration of the library to be much simpler than creating a console application, and encourage new users to try out and create sample/prototype scripts for their end-use.

What do you think about this?

Three-letter amino acid abbreviations

A low-priority issue, but it would be nice to eventually add the three-letter abbreviations for amino acids to the aminoAcidValueMap in ProteinAlphabet.
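
As an illustration of what such a map could contain, here are the 20 standard amino acids with their three-letter abbreviations. `AminoAcidNames` is a hypothetical stand-alone class; how the entries would be merged into aminoAcidValueMap in ProteinAlphabet is left open.

```csharp
using System;
using System.Collections.Generic;

public static class AminoAcidNames
{
    // One-letter symbol -> three-letter abbreviation (20 standard residues).
    public static readonly IReadOnlyDictionary<char, string> ThreeLetter =
        new Dictionary<char, string>
        {
            { 'A', "Ala" }, { 'R', "Arg" }, { 'N', "Asn" }, { 'D', "Asp" },
            { 'C', "Cys" }, { 'Q', "Gln" }, { 'E', "Glu" }, { 'G', "Gly" },
            { 'H', "His" }, { 'I', "Ile" }, { 'L', "Leu" }, { 'K', "Lys" },
            { 'M', "Met" }, { 'F', "Phe" }, { 'P', "Pro" }, { 'S', "Ser" },
            { 'T', "Thr" }, { 'W', "Trp" }, { 'Y', "Tyr" }, { 'V', "Val" }
        };

    // Converts a one-letter protein string such as "MKV" to "Met-Lys-Val".
    public static string ToThreeLetter(string protein, string separator = "-")
    {
        if (protein == null) throw new ArgumentNullException(nameof(protein));

        var parts = new List<string>(protein.Length);
        foreach (char c in protein.ToUpperInvariant())
        {
            string name;
            if (!ThreeLetter.TryGetValue(c, out name))
                throw new ArgumentException("Unknown amino acid symbol: " + c, nameof(protein));
            parts.Add(name);
        }
        return string.Join(separator, parts);
    }
}
```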

Speed bottleneck: Avoid re-parsing bam header when reading many intervals

I've got some code which parses many intervals from a bam file, with calls to the ParseRange() method. The results are correct, but slower than expected.

Reading through the source code (and running the profiler), it looks like the slowdown comes from time spent re-parsing the bam header and index in each call to ParseRange(). Most of my time was spent in BamParser.GetHeader() or BamIndexStorage.Read(). For the particular use case I'm working on, the overhead of re-reading and re-parsing is high. (My bam file has >10000 contigs. I'm parsing several thousand intervals. There are ~100 aligned reads for a typical interval, though some have far higher coverage.)

I'd like to speed this up, preferably in a non-hacky way. I've got a change to my copy of the code that does the trick for my use case. I added an optional "useCaching" flag to the ParseRange() method (false by default). If the flag is set to true, the parser re-uses the existing SAMAlignmentHeader and BAMIndex objects (if they exist; if they're null we parse as usual). With the current patch it's up to the caller to avoid setting this flag to true when switching between bam files, though that could be an easy sanity check to add. (And of course, this caching would not be appropriate when parsing a bam file that has been updated on disk between one ParseRange() call and the next.) I can polish that up for a pull request if that sounds like a sane way to proceed and not too much of a corner case to worry about.
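
The two caveats above (switching files and files changing on disk) could be handled by keying the cached objects on the file path and its last-write time. The sketch below shows that idea in a generic form; `PerFileCache<T>` is a hypothetical class, not the patch described in this issue, and the caller is assumed to pass in something like File.GetLastWriteTimeUtc(path).

```csharp
using System;

public sealed class PerFileCache<T> where T : class
{
    private readonly Func<string, T> load;
    private string cachedPath;
    private DateTime cachedStamp;
    private T cachedValue;

    public PerFileCache(Func<string, T> load)
    {
        if (load == null) throw new ArgumentNullException(nameof(load));
        this.load = load;
    }

    // Number of times the loader actually ran; useful for profiling.
    public int LoadCount { get; private set; }

    // Returns the cached value unless the path or timestamp has changed,
    // in which case the value is re-loaded (for example, header re-parsed).
    public T Get(string path, DateTime lastWriteUtc)
    {
        if (cachedValue == null || cachedPath != path || cachedStamp != lastWriteUtc)
        {
            cachedValue = load(path);
            cachedPath = path;
            cachedStamp = lastWriteUtc;
            LoadCount++;
        }
        return cachedValue;
    }
}
```

One instance per cached object type (one for the header, one for the index) would keep repeated range queries against the same unchanged file from re-reading either.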

.NET Bio flashback

I have to revisit an old .NET bio project from a few years back. However, in the meantime I have left the Windows world behind. So I installed MonoDevelop (on Ubuntu 16.04) to see if it would work. I was trying to migrate the old code but it gives problems with simple things that worked fine. For example this:

    Sequence tsequence = new Sequence(Bio.Alphabets.RNA, line1.ToString());

Now throws an exception saying "The type initializer for 'Bio.Alphabets' threw an exception."
Apparently I need to "add a reference to a Bio.Platform.Helpers library."
Any ideas?

BamParser.Parse() returns null objects in rare instances

Today I used dotnetbio to parse a bam file. In rare instances (for a total of 3 reads out of 1 million), BamParser.Parse() returned a null object rather than a SAMAlignedSequence object.

I stepped through in the debugger and I think these lines, in BamParser.cs, are the cause of my issue:
if (alignedSeq.RefEndPos - 1 < start && alignedSeq.RName != "*")
{
    return null;
}

Here I'm reading the whole bam, so start == 0. This is a read which is unmapped, but it still has its Pos set to 1 and RName set because its mate was mapped with Pos==1. For the affected read, alignedSeq.RefEndPos is 0. By subtracting 1 from alignedSeq.RefEndPos we get -1, which is less than start == 0, so we return null.

This one-line change fixes the bug, and I believe it's correct in general - I confirmed the unit tests still pass:
if (alignedSeq.RefEndPos - 1 < start && alignedSeq.CIGAR != "*")
{
    return null;
}
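
The difference between the two conditions can be checked with the values described above (RefEndPos of 0, start of 0, RName set, CIGAR of "*"). `RangeFilter` is a hypothetical helper that isolates the two predicates; it is not code from BamParser.cs.

```csharp
public static class RangeFilter
{
    // Original condition: true means the read is dropped (null is returned).
    public static bool SkipsOriginal(int refEndPos, int start, string rName)
    {
        return refEndPos - 1 < start && rName != "*";
    }

    // Proposed condition: tests CIGAR instead of RName, so an unmapped read
    // that inherited its mate's RName is no longer dropped.
    public static bool SkipsProposed(int refEndPos, int start, string cigar)
    {
        return refEndPos - 1 < start && cigar != "*";
    }
}
```

For the affected read, `SkipsOriginal(0, 0, "chr1")` is true while `SkipsProposed(0, 0, "*")` is false (the reference name "chr1" is only a placeholder). A mapped read that ends before the requested start, for example `SkipsProposed(5, 10, "50M")`, is still skipped.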

Sequence type parameter required

        // [2] Prepare data
        string seq = @"GACGCCGCCGCCACCACCGCCACCGCCGCAGCAGAAGCAGCGCACCGCAGGAGGGAAG";
        seq.ToString();
        Sequence sequence = new Sequence(Alphabets.DNA, seq);

        // [3] Create and configure service handler
        EbiWuBlastHandler blastService = new EbiWuBlastHandler();
       // NCBIBlastHandler blastService = new NCBIBlastHandler();

        ConfigParameters configParams = new ConfigParameters();
        configParams.UseBrowserProxy = true;
        blastService.Configuration = configParams;

        // [4] Define query.
        BlastParameters searchParams = new BlastParameters();
        searchParams.Add("Program", "blastn");

        searchParams.Add("Database", "em_rel");
        searchParams.Add("Expect", "1e-10");
        //searchParams.Add("Email", "YourAddress@YourInstitution");

        // [5] create and submit request
        string jobID;
        try
        {
            jobID = blastService.SubmitRequest(sequence, searchParams);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Service is not available.");
            Console.WriteLine(ex);
            return;
        }

        // [6] Wait for Ready status
        ServiceRequestInformation info = blastService.GetRequestStatus(jobID);

        if (info.Status != ServiceRequestStatus.Waiting
               && info.Status != ServiceRequestStatus.Ready)
        {
            Console.WriteLine("Service is not ready or waiting.");
            return;
        }

        int maxAttempts = 10;
        int attempt = 1;
        while (attempt <= maxAttempts
                && info.Status != ServiceRequestStatus.Error
                && info.Status != ServiceRequestStatus.Ready)
        {
            ++attempt;
            info = blastService.GetRequestStatus(jobID);
            Thread.Sleep(
                info.Status == ServiceRequestStatus.Waiting
                || info.Status == ServiceRequestStatus.Queued
                ? 20000 * attempt
                : 0);
        }

        // [7] Get results
        IList<BlastResult> results2 =
              blastService.FetchResultsSync(jobID, searchParams) as List<BlastResult>;

I'm running this code from the MBF programming guide, but it fails with an error saying that the sequence type parameter is required, and the program prints "Service is not available."
