jbwa

Java Bindings (JNI) for bwa

Author: Pierre Lindenbaum PhD. @yokofakun (Institut du Thorax, Nantes, France)

BWA is written by Heng Li (Broad Institute).

Motivation

BWA (http://bio-bwa.sourceforge.net/) contains a small C example (https://github.com/lh3/bwa/blob/master/example.c) for running bwa-mem as a library (bwamem-lite). I created some JNI bindings to see whether I could bind the C bwa library to Java and get the same output as bwamem-lite.

Compilation

I've tested this code under Linux with:

  • Oracle Java JDK 8
  • GNU Make 3.81
  • gcc 4.8.2
  • wget

The Apache2-licensed branch of BWA will be downloaded (https://github.com/lh3/bwa/tree/Apache2).

Typing make should download the BWA sources, compile them, and run some tests.

License

The project is licensed under the Apache2 license.

Example (Two FASTQs)

System.loadLibrary("bwajni");
//load the index
BwaIndex index=new BwaIndex(new File(args[0]));
//load the bwa engine
BwaMem mem=new BwaMem(index);
//get reads from two fastqs
KSeq kseq1=new KSeq(new File(args[1]));
KSeq kseq2=new KSeq(new File(args[2]));
//build a list of two fastqs, forward and reverse
List<ShortRead> L1=new ArrayList<ShortRead>();
List<ShortRead> L2=new ArrayList<ShortRead>();
//while something can be done
for(;;)
        {
        //read the pair of fastq
        ShortRead read1=kseq1.next();
        ShortRead read2=kseq2.next();
        //should we analyze and dump the data?
        if(read1==null || read2==null || L1.size()>100)
                {
                if(!L1.isEmpty())
                        for(String sam:mem.align(L1,L2)) //get the SAM records
                                {
                                System.out.print(sam);
                                }
                if(read1==null || read2==null) break;
                L1.clear();
                L2.clear();
                }
        L1.add(read1);
        L2.add(read2);
        }
kseq1.dispose();
kseq2.dispose();
index.close();
mem.dispose();

Testing

Here is the output of the Java version:

java  -Djava.library.path=src/main/native -cp src/main/java com.github.lindenb.jbwa.jni.Example2 \
	human_g1k_v37.fasta  tmp1.fq  tmp2.fq

HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192       121     1       229568362       37      13S87M  =       229568362       0       GCTCTTCCGATCTGGCACGTTGAAGGTCTCAAACATGATCTGGGTCATCTTCTCGCGGTTGGCCTTGGGATTGAGGGGGGCCTCGGTGAGCAGGGNGGGG       AB?DDDDDDDBDCDDDDDDDDDDCDDDDCCC>(DCDDDDDDBDDDCCCCBDDDFFEEJIHIJIIHJIJJJJJJIJJJJJJJJJJJJJHHHHHDA2#FCCC    NM:i:1  AS:i:85 XS:i:61
HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192       181     1       229568362       0       *       =       229568362       0       GCTCTTCCGATCTCCCCACCCTGCTCACCGAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAGNNNNNNNNNNNNNNNNNNAACGTGCC       ?DDDDDDDDDDDDDDB?9BDDDDDDDBBB?8,,######################################?12##################FFFFFCCC    AS:i:0  XS:i:0  
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423        69      X       16753128        0       *       =       16753128        0       AGATNGGAAGAGCACACGTCTGAACTCCAGTCACCAAGGAGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACAAATACGGATGAGACATG       CCCF#2ADHHHHHJJJJJJJJJJJJJJ>9:1*1C3C8D600)0*0*/00-.8B)--5B().).=).?CFFFBBBDB########################    AS:i:0  XS:i:0  
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423        137     X       16753128        0       58S34M8S        =       16753128        0       AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAACAAAAAAAGAGATGAACAAGCAAA       CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJHIHIIJJJJJJJHJJIIJJJHFFFFEEEEEEEDDDD##################################    NM:i:0  AS:i:34    XS:i:29 
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463        97      12      110765491       60      70M30S  =       110765491       70      AATTNGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCAAGATCGGAAGAGCACACGTCTGAACTCCAG       CCCF#4BDHGHHHJJJJJJJJJIJHIJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJIIIIHIJJJJIIIJIJJGHEHFFFEDDEEAA@BDDDCDDDD:C@    NM:i:1  AS:i:68 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463        145     12      110765491       60      30S70M  =       110765491       -70     CTCTTTCCCTACACGACGCTCTTCCGATCTAATTTGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCA       DDDDDDDDDCAB=DDBDEEFFFFHHHJJJGHHGGFJJJJJIIIIJJJJJIJJJJJIJIIIJJJJJJJJJJJJIJJHHHFHEEJJIJJHHHHHFFFFFCBC    NM:i:0  AS:i:70 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297        81      4       114279632       60      100M    =       114279455       -277    GATTCCTACTGCACCCATGGAGAATGTGCCTTTTACTGAAAGCAAATCCAAAATTCCTGTAAGGACTATGCCCACTTCCACCCCAGCACCTCCATNTGCA       DCDDDDCACCDBCBCDDDCDDCCA?EEDDDFFDFFFHHHGHHHJJJJJJJJIJJIJIJJIJIJJJJJJJJJJIGJJIIHFIJJJJHGDHHDHDA2#FCCB    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297        161     4       114279455       60      100M    =       114279632       277     CGTGCAAACGGGTGATATACCTCCTCTCTCTGGTGTAAAGCAGATATCCTGCCCCGACTCTTCTGAACCAGCTGTACAAGTCCAGTTAGATTTTTCCACA       CCBFFFFFHHHHFHIJJJJJIIJJJJJJJJJJJHIGIJIIJJJJJJJJJJHIJJJJJJJHHHHHHFDDDFDDEEEDDDADCCDDDCCDCCDEDDDCACCC    NM:i:0  AS:i:100  XS:i:0   
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320        81      2       179597667       60      100M    =       179597628       -139    GGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCCTTAAACCAGGACACCCTCATGGGGAGGGAGCCTGNAATT       ABDDDDDBDDDDDDEDDEDDDEECEEFFFFFFHGHHHHJJIJJJJJIIJJIJJJJJJJJJIIJIHGJJJJJHHEJJIHJJJJJJJJJHHHHHDA2#FCCC    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320        161     2       179597628       60      100M    =       179597667       139     CCCTGCATCATTCATGTCTACTCTGATGATCTCCAAAGAGGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCC       CCCFFFFFHHHHHJJJJJJJJJJJJIJJJJJJJJJJJJJJIIJJIIHJJJJIJJGIIIJJJIIJIIIHGIJJJJJIIEHHHHHHFBFFDEFECDECCDDA    NM:i:0  AS:i:100  XS:i:0   
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408        97      2       220283746       60      100M    =       220283863       217     CAGCNGCTCAAGGCCAAGTGAGGGCCCGGCACCCCAGACTCCTCTTTCTGCGGGCAGGGCACAGGAGGCTAGGCCTGGGGGCTGGGGTCCCGCTGTCAGC       CCCF#2ADHHHHHFIJIIHIGIJJJJJJJJIIJJJJIJJJJJJJIIIJJIGFFFDDDDDDDBDDD?BDBDCBBDDCDDDDDBDDDBB>BBDDDDB@CDCD    NM:i:2  AS:i:93 XS:i:23
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408        145     2       220283863       60      100M    =       220283746       -217    GCCCGGGACCCTCTCCTGCCCCATGTGGAGAAAGGGTCCTCCACCTGTGTGTTTCAAGGGGCCGTGACCTCCAGGTCTCTCCCCCTGCGATCCCATCTTG       BDDBDBC?DDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDCADDDDDBEEEEEFFFFHHIJJJIHGJJJIJJJJJIIIIJIJJJJJHHHGHFFFFFCCC    NM:i:0  AS:i:100  XS:i:0   
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325        97      22      46114322        60      100M    =       46114410        188     AAAGNCCGGAATTGGTACAAGCCATGTTTCCCAAACTGAACAATCAAGAAAGGTAACCCCCCAACCAGCGTGGTCTGGAGTATTTAGCATTCCATATAGG       CCCF#2ADHHHHHJJGHIJJJJJJJJIGJJJJJJJJJJJJJJJJJJJJGHIJJHIJJIIJJHFFFFDDCD?BDDDCCDCD>ACDEEDDDEDDEDCCCCCD    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325        145     22      46114410        60      100M    =       46114322        -188    ATTCCATATAGGGTATTCGATGCACGTGACTGAAAAGCTGTGTGGTTTCTGAGTTGGCACAGAATCTCTAAATACATGTTTCTGTGTTGGTAATGGTTTT       DDCDEDCCDDDDCDDEEDEFFFFFHHHHIJJJJJJJIJJJJIIJJJIIGGJJJJJIJJJJJJJJIIHJJJJJIIJJJJJJJIIJIJIHFHHHFFFFFCCC    NM:i:0  AS:i:100  XS:i:0   
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488       97      3       38763808        60      100M    =       38763855        147     CCACNATACGGTAGCAAGTCTTGCGCACCTGCCAGCCCACATCCCATGGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGT       CCCF#4ADHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJEIJJIJJJHHHFFFFFFFEEEEEEEDABBDDDBBCCDBD>BDDDDEDDDD>    NM:i:2  AS:i:93 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488       145     3       38763855        60      100M    =       38763808        -147    GGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGTGGGGAGAGGTGACTGATGGTGGGTGATGGCCAGTGGGCAAAGGGGAT       DDCDDDB?DCCCDECDDCDDDCDDEEDEFFFFFFHHHJJIJJJIJIIJIJJJIJJIJJJJJJJJIJJJJJJJJJJIJJJJJJJJJJJHHHHHFFFFFCCC    NM:i:1  AS:i:95 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375       97      7       35293037        60      100M    =       35293129        192     CAGCNAGGGGCACAGACGGATGCGCAGCATCCCCAGTCCTCGGCGGACAGCCGGGTAGCCCAACTTACCCAGGGGTTTGATTGTGTTCTCCGTCGCCTCC       CCCF#2ADHHHHHJIIJJJJIJJJJJJJJJIJJJJJIJJJJJJJJDDDDDDDDDDBBDDDDDDDDDDDDDDDDDDDBBBDDDDDDDDCEDCB?ABDBDD1    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375       145     7       35293129        60      100M    =       35293037        -192    TCGCCTCCTTCTCCTTAGAGCCGCCGCTCGACATGAGCGCGGCAATGGAGAAGGCGTTGGCCCGGGAGGAGAGTTGGGGCTTGGGGGACGCCGTGAACTC       DDBBBDDCA8DDDCC@DDDBDDDDDDDDDDEDDDDDDDDDDDDEDDDDCCDDDDFFFHHJJJJJJJJJHJJJJJJJJJJJJJJJJJJHHHHHFFFFDCBB    NM:i:1  AS:i:95 XS:i:20
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300       97      2       40401764        60      100M    =       40401971        307     CAAGNTACATAAGATGTAGGTTTGGATTGATGGTTAAGGGTATTTGGGGAAAAATAAGGAACATTAAAAAAATAAGTCTTACCAAACAGGTATTTTCCTT       CCCF#4=DHHHHHIJJHIJJHIJJJHIJJIIJJEGHJJJJDGIJJJJJJGHHIJJIIJJJIIIIJIJJHHFDEDECDDEEDDDDDDDDDDCCDEEEDDCD    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300       145     2       40401971        60      100M    =       40401764        -307    TTGTGAAGCCACCTAAAAAAGAAAAAAACAACAACAAATGTTATAATTTGACACTCTACATAACAAATACCAGTGACATCAGACTGCCTGACAACCCACC       @CC@DDDDDDDDDDDDDDDDDDFHHHHEIIHIIIJJJIJJJJJJJJJIHDIJJJJJIIJJJJIJJJJHFJJJJJJIJJJJJJJJJJJHHHHHFFFFDBCB    NM:i:0  AS:i:100  XS:i:0   

And the output of the native C version:

bwa mem human_g1k_v37.fasta tmp1.fq tmp2.fq 2> /dev/null | grep -v -E '^@'

HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192       121     1       229568362       37      13S87M  =       229568362       0       GCTCTTCCGATCTGGCACGTTGAAGGTCTCAAACATGATCTGGGTCATCTTCTCGCGGTTGGCCTTGGGATTGAGGGGGGCCTCGGTGAGCAGGGNGGGG       AB?DDDDDDDBDCDDDDDDDDDDCDDDDCCC>(DCDDDDDDBDDDCCCCBDDDFFEEJIHIJIIHJIJJJJJJIJJJJJJJJJJJJJHHHHHDA2#FCCC    NM:i:1  AS:i:85 XS:i:61
HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192       181     1       229568362       0       *       =       229568362       0       GCTCTTCCGATCTCCCCACCCTGCTCACCGAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAGNNNNNNNNNNNNNNNNNNAACGTGCC       ?DDDDDDDDDDDDDDB?9BDDDDDDDBBB?8,,######################################?12##################FFFFFCCC    AS:i:0  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423        69      X       16753128        0       *       =       16753128        0       AGATNGGAAGAGCACACGTCTGAACTCCAGTCACCAAGGAGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACAAATACGGATGAGACATG       CCCF#2ADHHHHHJJJJJJJJJJJJJJ>9:1*1C3C8D600)0*0*/00-.8B)--5B().).=).?CFFFBBBDB########################    AS:i:0  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:1424:2423        137     X       16753128        0       58S34M8S        =       16753128        0       AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAACAAAAAAAGAGATGAACAAGCAAA       CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJHIHIIJJJJJJJHJJIIJJJHFFFFEEEEEEEDDDD##################################    NM:i:0  AS:i:34    XS:i:29
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463        97      12      110765491       60      70M30S  =       110765491       70      AATTNGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCAAGATCGGAAGAGCACACGTCTGAACTCCAG       CCCF#4BDHGHHHJJJJJJJJJIJHIJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJIIIIHIJJJJIIIJIJJGHEHFFFEDDEEAA@BDDDCDDDD:C@    NM:i:1  AS:i:68 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:2908:2463        145     12      110765491       60      30S70M  =       110765491       -70     CTCTTTCCCTACACGACGCTCTTCCGATCTAATTTGGGGAACAGCTTTCCAAAGTCATCTCCCTTATTTGCATTGCAGTCTGGATCATAAATATTGGGCA       DDDDDDDDDCAB=DDBDEEFFFFHHHJJJGHHGGFJJJJJIIIIJJJJJIJJJJJIJIIIJJJJJJJJJJJJIJJHHHFHEEJJIJJHHHHHFFFFFCBC    NM:i:0  AS:i:70 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297        81      4       114279632       60      100M    =       114279455       -277    GATTCCTACTGCACCCATGGAGAATGTGCCTTTTACTGAAAGCAAATCCAAAATTCCTGTAAGGACTATGCCCACTTCCACCCCAGCACCTCCATNTGCA       DCDDDDCACCDBCBCDDDCDDCCA?EEDDDFFDFFFHHHGHHHJJJJJJJJIJJIJIJJIJIJJJJJJJJJJIGJJIIHFIJJJJHGDHHDHDA2#FCCB    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:4663:2297        161     4       114279455       60      100M    =       114279632       277     CGTGCAAACGGGTGATATACCTCCTCTCTCTGGTGTAAAGCAGATATCCTGCCCCGACTCTTCTGAACCAGCTGTACAAGTCCAGTTAGATTTTTCCACA       CCBFFFFFHHHHFHIJJJJJIIJJJJJJJJJJJHIGIJIIJJJJJJJJJJHIJJJJJJJHHHHHHFDDDFDDEEEDDDADCCDDDCCDCCDEDDDCACCC    NM:i:0  AS:i:100  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320        81      2       179597667       60      100M    =       179597628       -139    GGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCCTTAAACCAGGACACCCTCATGGGGAGGGAGCCTGNAATT       ABDDDDDBDDDDDDEDDEDDDEECEEFFFFFFHGHHHHJJIJJJJJIIJJIJJJJJJJJJIIJIHGJJJJJHHEJJIHJJJJJJJJJHHHHHDA2#FCCC    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:6872:2320        161     2       179597628       60      100M    =       179597667       139     CCCTGCATCATTCATGTCTACTCTGATGATCTCCAAAGAGGCTGTGCCTTCCACAAATGCTATCCTGTATCTGTCAGAAGCAGCTATTTCTTTGCCATCC       CCCFFFFFHHHHHJJJJJJJJJJJJIJJJJJJJJJJJJJJIIJJIIHJJJJIJJGIIIJJJIIJIIIHGIJJJJJIIEHHHHHHFBFFDEFECDECCDDA    NM:i:0  AS:i:100  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408        97      2       220283746       60      100M    =       220283863       217     CAGCNGCTCAAGGCCAAGTGAGGGCCCGGCACCCCAGACTCCTCTTTCTGCGGGCAGGGCACAGGAGGCTAGGCCTGGGGGCTGGGGTCCCGCTGTCAGC       CCCF#2ADHHHHHFIJIIHIGIJJJJJJJJIIJJJJIJJJJJJJIIIJJIGFFFDDDDDDDBDDD?BDBDCBBDDCDDDDDBDDDBB>BBDDDDB@CDCD    NM:i:2  AS:i:93 XS:i:23
HWI-1KL149:20:C1CU7ACXX:4:1101:9215:2408        145     2       220283863       60      100M    =       220283746       -217    GCCCGGGACCCTCTCCTGCCCCATGTGGAGAAAGGGTCCTCCACCTGTGTGTTTCAAGGGGCCGTGACCTCCAGGTCTCTCCCCCTGCGATCCCATCTTG       BDDBDBC?DDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDCADDDDDBEEEEEFFFFHHIJJJIHGJJJIJJJJJIIIIJIJJJJJHHHGHFFFFFCCC    NM:i:0  AS:i:100  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325        97      22      46114322        60      100M    =       46114410        188     AAAGNCCGGAATTGGTACAAGCCATGTTTCCCAAACTGAACAATCAAGAAAGGTAACCCCCCAACCAGCGTGGTCTGGAGTATTTAGCATTCCATATAGG       CCCF#2ADHHHHHJJGHIJJJJJJJJIGJJJJJJJJJJJJJJJJJJJJGHIJJHIJJIIJJHFFFFDDCD?BDDDCCDCD>ACDEEDDDEDDEDCCCCCD    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:9815:2325        145     22      46114410        60      100M    =       46114322        -188    ATTCCATATAGGGTATTCGATGCACGTGACTGAAAAGCTGTGTGGTTTCTGAGTTGGCACAGAATCTCTAAATACATGTTTCTGTGTTGGTAATGGTTTT       DDCDEDCCDDDDCDDEEDEFFFFFHHHHIJJJJJJJIJJJJIIJJJIIGGJJJJJIJJJJJJJJIIHJJJJJIIJJJJJJJIIJIJIHFHHHFFFFFCCC    NM:i:0  AS:i:100  XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488       97      3       38763808        60      100M    =       38763855        147     CCACNATACGGTAGCAAGTCTTGCGCACCTGCCAGCCCACATCCCATGGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGT       CCCF#4ADHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJEIJJIJJJHHHFFFFFFFEEEEEEEDABBDDDBBCCDBD>BDDDDEDDDD>    NM:i:2  AS:i:93 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11401:2488       145     3       38763855        60      100M    =       38763808        -147    GGACTCTTCGTGGTATCCAGTTTGCAGCAGGGACAGTGGCGAATGCATCCTGTGGGGAGAGGTGACTGATGGTGGGTGATGGCCAGTGGGCAAAGGGGAT       DDCDDDB?DCCCDECDDCDDDCDDEEDEFFFFFFHHHJJIJJJIJIIJIJJJIJJIJJJJJJJJIJJJJJJJJJJIJJJJJJJJJJJHHHHHFFFFFCCC    NM:i:1  AS:i:95 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375       97      7       35293037        60      100M    =       35293129        192     CAGCNAGGGGCACAGACGGATGCGCAGCATCCCCAGTCCTCGGCGGACAGCCGGGTAGCCCAACTTACCCAGGGGTTTGATTGTGTTCTCCGTCGCCTCC       CCCF#2ADHHHHHJIIJJJJIJJJJJJJJJIJJJJJIJJJJJJJJDDDDDDDDDDBBDDDDDDDDDDDDDDDDDDDBBBDDDDDDDDCEDCB?ABDBDD1    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:11658:2375       145     7       35293129        60      100M    =       35293037        -192    TCGCCTCCTTCTCCTTAGAGCCGCCGCTCGACATGAGCGCGGCAATGGAGAAGGCGTTGGCCCGGGAGGAGAGTTGGGGCTTGGGGGACGCCGTGAACTC       DDBBBDDCA8DDDCC@DDDBDDDDDDDDDDEDDDDDDDDDDDDEDDDDCCDDDDFFFHHJJJJJJJJJHJJJJJJJJJJJJJJJJJJHHHHHFFFFDCBB    NM:i:1  AS:i:95 XS:i:20
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300       97      2       40401764        60      100M    =       40401971        307     CAAGNTACATAAGATGTAGGTTTGGATTGATGGTTAAGGGTATTTGGGGAAAAATAAGGAACATTAAAAAAATAAGTCTTACCAAACAGGTATTTTCCTT       CCCF#4=DHHHHHIJJHIJJHIJJJHIJJIIJJEGHJJJJDGIJJJJJJGHHIJJIIJJJIIIIJIJJHHFDEDECDDEEDDDDDDDDDDCCDEEEDDCD    NM:i:1  AS:i:98 XS:i:0
HWI-1KL149:20:C1CU7ACXX:4:1101:12054:2300       145     2       40401971        60      100M    =       40401764        -307    TTGTGAAGCCACCTAAAAAAGAAAAAAACAACAACAAATGTTATAATTTGACACTCTACATAACAAATACCAGTGACATCAGACTGCCTGACAACCCACC       @CC@DDDDDDDDDDDDDDDDDDFHHHHEIIHIIIJJJIJJJJJJJJJIHDIJJJJJIIJJJJIJJJJHFJJJJJJIJJJJJJJJJJJHHHHHFFFFDBCB    NM:i:0  AS:i:100  XS:i:0
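
One way to check the two listings against each other is a line-by-line comparison. The sketch below is not part of jbwa: the class name SamCompare and its methods are hypothetical. It strips trailing whitespace before comparing, on the assumption that such whitespace (visible at the end of some lines of the Java listing as pasted here) is not significant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jbwa: compares two lists of SAM lines.
class SamCompare {
    /** Removes trailing whitespace, assumed here to be insignificant. */
    static String normalize(String line) {
        return line.replaceAll("\\s+$", "");
    }

    /** Returns the 1-based numbers of the lines that differ after normalization. */
    static List<Integer> diff(List<String> a, List<String> b) {
        List<Integer> out = new ArrayList<>();
        int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            // A missing line on either side counts as a difference.
            String x = i < a.size() ? normalize(a.get(i)) : null;
            String y = i < b.size() ? normalize(b.get(i)) : null;
            if (x == null || !x.equals(y)) out.add(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up records standing in for the Java and C outputs.
        List<String> fromJava = Arrays.asList("r1\t69\tX\t100\tAS:i:0  XS:i:0  ", "r2\t137\tX\t100");
        List<String> fromC = Arrays.asList("r1\t69\tX\t100\tAS:i:0  XS:i:0", "r2\t137\tX\t100");
        System.out.println(diff(fromJava, fromC)); // prints "[]"
    }
}
```

An empty list means the two outputs agree apart from trailing whitespace.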

Example (One FASTQ)

(compare to https://github.com/lh3/bwa/blob/master/example.c )

System.loadLibrary("bwajni");
BwaIndex index=new BwaIndex(new File("hg19.fa"));
BwaMem mem=new BwaMem(index);
KSeq kseq=new KSeq(new File("input.fastq.gz"));
ShortRead read=null;
while((read=kseq.next())!=null)
        {
        for(AlnRgn a: mem.align(read))
                {
                if(a.getSecondary()>=0) continue;
                System.out.println(  read.getName()+"\t"+  a.getStrand()+"\t"+  a.getChrom()+"\t"+
                        a.getPos()+"\t"+ a.getMQual()+"\t"+ a.getCigar()+"\t"+  a.getNm() );
                }
        }
kseq.dispose();
index.close();
mem.dispose();

Testing

Here is the output of the Java version:

gunzip -c input.fastq.gz | head -n 4000 |\
java  -Djava.library.path=src/main/native -cp src/main/java \
   com.github.lindenb.jbwa.jni.Example human_g1k_v37.fasta -| tail 


HWI-1KL149:20:C1CU7ACXX:4:1101:3077:33410       +       3       38647538        60      89M11S  1
HWI-1KL149:20:C1CU7ACXX:4:1101:3396:33445       +       8       52567289        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:10013:33288      -       1       156104115       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:10390:33496      -       6       123824853       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:13537:33483      +       2       157367092       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:14139:33390      +       20      31413797        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:14514:33458      +       2       179401813       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:15292:33282      +       15      63335820        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:16960:33276      -       12      110782784       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:17355:33322      +       6       126077895       60      100M    1

And the output of the native C version:

gunzip -c input.fastq.gz | head -n 4000 |\
bwa-0.7.4/bwamem-lite human_g1k_v37.fasta - | tail 

HWI-1KL149:20:C1CU7ACXX:4:1101:3077:33410       +       3       38647538        60      89M11S  1
HWI-1KL149:20:C1CU7ACXX:4:1101:3396:33445       +       8       52567289        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:10013:33288      -       1       156104115       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:10390:33496      -       6       123824853       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:13537:33483      +       2       157367092       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:14139:33390      +       20      31413797        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:14514:33458      +       2       179401813       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:15292:33282      +       15      63335820        60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:16960:33276      -       12      110782784       60      100M    1
HWI-1KL149:20:C1CU7ACXX:4:1101:17355:33322      +       6       126077895       60      100M    1

GUI

As a test, I also created a Swing-based interface for BWA:

java  -Djava.library.path=src/main/native  -cp src/main/java \
	com.github.lindenb.jbwa.jni.BwaFrame human_g1k_v37.fasta

WEB-SERVICE

As an example I've implemented a WebService for BWA.

Server

The server is launched with 'make test.ws.server':

java   -Djava.library.path=src/main/native -cp src/main/java com.github.lindenb.jbwa.ws.server.BWAServiceImpl \
   -R human_g1k_v37.fasta -p 8081
Apr 26, 2013 9:54:34 PM com.github.lindenb.jbwa.ws.server.BWAServiceImpl main
INFO: Loading index for /commun/data/pubdb/broadinstitute.org/bundle/1.5/b37/human_g1k_v37.fasta
Apr 26, 2013 9:54:48 PM com.github.lindenb.jbwa.ws.server.BWAServiceImpl main
INFO: Service is published: http://localhost:8081/

Once published, the server provides a WSDL-based service description:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Published by JAX-WS RI at http://jax-ws.dev.java.net. RI's version is JAX-WS RI 2.2.4-b01. -->
<!-- Generated by JAX-WS RI at http://jax-ws.dev.java.net. RI's version is JAX-WS RI 2.2.4-b01. -->
<definitions xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"
    xmlns:wsp="http://www.w3.org/ns/ws-policy"
    xmlns:wsp1_2="http://schemas.xmlsoap.org/ws/2004/09/policy"
    xmlns:wsam="http://www.w3.org/2007/05/addressing/metadata"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="http://server.ws.jbwa.lindenb.github.com/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://server.ws.jbwa.lindenb.github.com/"
    name="BWAServiceImplService">
  <types>
    <xsd:schema>
      <xsd:import namespace="http://server.ws.jbwa.lindenb.github.com/" schemaLocation="http://localhost:8081/?xsd=1"/>
    </xsd:schema>
  </types>
  <message name="align">
    <part name="parameters" element="tns:align"/>
  </message>
  <message name="alignResponse">
    <part name="parameters" element="tns:alignResponse"/>
  </message>
  <message name="Exception">
    <part name="fault" element="tns:Exception"/>
  </message>
  <message name="getReferenceName">
    <part name="parameters" element="tns:getReferenceName"/>
  </message>
  <message name="getReferenceNameResponse">
    <part name="parameters" element="tns:getReferenceNameResponse"/>
  </message>
  <portType name="BWAService">
    <operation name="align">
      <input wsam:Action="http://server.ws.jbwa.lindenb.github.com/BWAService/alignRequest" message="tns:align"/>
      <output wsam:Action="http://server.ws.jbwa.lindenb.github.com/BWAService/alignResponse" message="tns:alignResponse"/>
      <fault message="tns:Exception" name="Exception" wsam:Action="http://server.ws.jbwa.lindenb.github.com/BWAService/align/Fault/Exception"/>
    </operation>
    <operation name="getReferenceName">
      <input wsam:Action="http://server.ws.jbwa.lindenb.github.com/BWAService/getReferenceNameRequest" message="tns:getReferenceName"/>
      <output wsam:Action="http://server.ws.jbwa.lindenb.github.com/BWAService/getReferenceNameResponse" message="tns:getReferenceNameResponse"/>
    </operation>
  </portType>
  <binding name="BWAServiceImplPortBinding" type="tns:BWAService">
(...)
  </binding>
  <service name="BWAServiceImplService">
    <port name="BWAServiceImplPort" binding="tns:BWAServiceImplPortBinding">
      <soap:address location="http://localhost:8081/"/>
    </port>
  </service>
</definitions>

The Client

When the server is up and running, a client is generated for this service:

wsimport -keep -d tmp -p com.github.lindenb.jbwa.ws.client "http://localhost:8081/?wsdl"

The Makefile contains a target named 'test.ws.client' that reads a FASTQ, invokes the web service and dumps the result as XML:

gunzip -c test.fastq.gz |\
 java  -cp tmp  com.github.lindenb.jbwa.ws.client.BWAServiceClient 

Output:

<?xml version="1.0" encoding="UTF-8"?>
<bwa-service reference="human_g1k_v37.fasta">
  <Alignment xmlns="" xmlns:ns2="http://server.ws.jbwa.lindenb.github.com/">
    <chrom>1</chrom>
    <cigar>13S87M</cigar>
    <MQual>37</MQual>
    <nm>1</nm>
    <position>0</position>
    <readBases>CCCCNCCCTGCTCACCGAGGCCCCCCTCAATCCCAAGGCCAACCGCGAGAAGATGACCCAGATCATGTTTGAGACCTTCAACGTGCCAGATCGGAAGAGC</readBases>
    <readName>HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192 1:N:0:CAAGGAGC</readName>
    <secondary>-1</secondary>
    <strand>45</strand>
  </Alignment>
  <Alignment xmlns="" xmlns:ns2="http://server.ws.jbwa.lindenb.github.com/">
    <chrom>2</chrom>
    <cigar>82M18S</cigar>
    <MQual>0</MQual>
    <nm>5</nm>
    <position>0</position>
    <readBases>CCCCNCCCTGCTCACCGAGGCCCCCCTCAATCCCAAGGCCAACCGCGAGAAGATGACCCAGATCATGTTTGAGACCTTCAACGTGCCAGATCGGAAGAGC</readBases>
    <readName>HWI-1KL149:20:C1CU7ACXX:4:1101:13638:2192 1:N:0:CAAGGAGC</readName>
    <secondary>0</secondary>
    <strand>43</strand>
  </Alignment>
(...)
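
The strand values 45 and 43 in this XML look like the numeric character codes of '-' and '+', presumably because the strand is a char that is serialized as a number. The sketch below only illustrates that conversion; the class name StrandCode is hypothetical and not part of jbwa.

```java
// Hypothetical helper, not part of jbwa: maps the numeric <strand> value
// back to the symbol printed by the command-line examples.
class StrandCode {
    /** Interprets the number as a character code (45 is '-', 43 is '+'). */
    static char toSymbol(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(toSymbol(45)); // prints "-"
        System.out.println(toSymbol(43)); // prints "+"
    }
}
```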

Contributors

akiezun, lbergelson, lindenb, seppinho, shuang-broad, tomwhite, yhcheng

jbwa's Issues

jbwa and missing alignment score

By default, bwa mem filters out alignments with an alignment score (AS tag) below 30.
BwaMem.align does not return the AS tag, so results can differ from those of bwa.

Solution: (a) return ar.a[i].score (or ar.a[i].truesc) parameter in BwaMem_align and (b) filter these alignments on java side. (see mem_alnreg_t in bwamem.h for details)

FYI, I see this behavior especially for circular mtDNA reads and supplementary alignments (we use it for https://mtdna-server.uibk.ac.at). I can provide an example if necessary. @lindenb @SHuang-Broad
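
Part (b) of the proposed solution, filtering on the Java side, could look like the sketch below. It is only an illustration under stated assumptions: the ScoredAln class stands in for an alignment that already carries its score (which jbwa's BwaMem.align does not currently return, per this issue), and all names here are hypothetical. The threshold of 30 is the default mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not jbwa API: drop alignments below a minimum score.
class ScoreFilter {
    /** Stand-in for an alignment that exposes its alignment score. */
    static final class ScoredAln {
        final String cigar;
        final int score;
        ScoredAln(String cigar, int score) {
            this.cigar = cigar;
            this.score = score;
        }
    }

    /** Default minimum score, as described in the issue text. */
    static final int DEFAULT_MIN_SCORE = 30;

    /** Keeps only alignments whose score is at least minScore. */
    static List<ScoredAln> keep(List<ScoredAln> in, int minScore) {
        List<ScoredAln> out = new ArrayList<>();
        for (ScoredAln a : in) {
            if (a.score >= minScore) out.add(a);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up alignments for illustration only.
        List<ScoredAln> all = Arrays.asList(
            new ScoredAln("100M", 100),
            new ScoredAln("29M71S", 29),
            new ScoredAln("30M70S", 30));
        System.out.println(keep(all, DEFAULT_MIN_SCORE).size()); // prints "2"
    }
}
```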

Typo in Makefile

Hi Pierre

On line 16, it should read

FASTQ2=test/R2.fq

, i.e. 2 instead of 1.

Publishing to maven central

Since we're using jbwa as a dependency for gatk4, we'd like to make jbwa available through Maven Central at some point in the not too distant future. We have currently published a version on our own Artifactory repository so that we can use it during development, but we can't rely on that when we make releases to Maven Central.

Would you be willing to publish it to central? We could offer help but there would be some work you'd have to do to get accounts set up.

Alternatively, we could publish it ourselves under whatever name you prefer (com.github.lindenb:jbwa being one possible option), or we could fork the repo and publish our fork under our own name (org.broadinstitute:jbwa).

Would any of these options work for you?

can we pass bwa index pointer directly?

I see that the index is passed to the JNI layer via the BwaIndex object. I think only the pointer is actually used, though. Can it be passed directly rather than passing a Java object that needs to be unwrapped in the C layer? I mean changing this API

    private native AlnRgn[] align(BwaIndex bwaIndex,byte bases[])  throws IOException;
    private native String[] align2(BwaIndex bwaIndex,final ShortRead ks1[],final ShortRead ks2[])  throws IOException;

to

    private native AlnRgn[] align(long bwaIndexPtr,byte bases[])  throws IOException;
    private native String[] align2(long bwaIndexPtr,final ShortRead ks1[],final ShortRead ks2[])  throws IOException;
