10xgenomics / lariat
Linked-Read Alignment Tool
Home Page: https://support.10xgenomics.com/genome-exome/software/pipelines/latest/algorithms/overview
License: MIT License
The submodule link for this repo is not publicly accessible. Is this intentional?
https://github.com/10XDev/biogo.bam
I'm not sure how anyone else is posting issues related to this project, given that it is currently impossible to build from source. Can this be resolved, please?
For future reference, the default Lariat alignment scoring parameters are
AS:f = -2 * mismatches - 3 * indels - 5 * clipped - 0.5 * clipped_bases - 4 * improper_pair
The default BWA scoring parameters are
AS:i = A * matches - B * mismatches - O * opens - E * extends - L * clipped - U * improper_pair
A = 1
B = 4
O = 6
E = 1
L = 5
U = 17
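The two formulas above can be compared side by side with a minimal Go sketch. It assumes that `clipped` counts clipped read ends and that `improper_pair` contributes 1 when the pair is improper and 0 otherwise; the `Events` struct, its field names, and the function names are illustrative and are not taken from the Lariat or BWA sources.

```go
package main

import "fmt"

// Events counts the features of one alignment that the two formulas use.
// The struct and its field names are illustrative, not Lariat or BWA identifiers.
type Events struct {
	Matches      int
	Mismatches   int
	Indels       int  // Lariat term
	GapOpens     int  // BWA term (O)
	GapExtends   int  // BWA term (E)
	Clipped      int  // assumed: number of clipped read ends
	ClippedBases int  // Lariat term only
	ImproperPair bool // assumed: contributes 1 when true, 0 otherwise
}

// BWAParams holds the penalties named in the AS:i formula.
type BWAParams struct{ A, B, O, E, L, U int }

// DefaultBWA uses the default values listed in the issue.
var DefaultBWA = BWAParams{A: 1, B: 4, O: 6, E: 1, L: 5, U: 17}

// flag converts a boolean into a 0/1 multiplier.
func flag(b bool) int {
	if b {
		return 1
	}
	return 0
}

// LariatScore evaluates the AS:f formula with the weights quoted in the issue.
func LariatScore(e Events) float64 {
	return -2*float64(e.Mismatches) -
		3*float64(e.Indels) -
		5*float64(e.Clipped) -
		0.5*float64(e.ClippedBases) -
		4*float64(flag(e.ImproperPair))
}

// BWAScore evaluates the AS:i formula for the given parameters.
func BWAScore(p BWAParams, e Events) int {
	return p.A*e.Matches - p.B*e.Mismatches - p.O*e.GapOpens -
		p.E*e.GapExtends - p.L*e.Clipped - p.U*flag(e.ImproperPair)
}

func main() {
	// A made-up alignment, used only to exercise both formulas.
	e := Events{Matches: 100, Mismatches: 2, Indels: 1, GapOpens: 1,
		GapExtends: 3, Clipped: 1, ClippedBases: 10}
	fmt.Println("Lariat AS:f =", LariatScore(e))
	fmt.Println("BWA AS:i =", BWAScore(DefaultBWA, e))
}
```

Note that the Lariat score has no positive term for matches, so under this reading it is never greater than zero, while the BWA score grows with the number of matching bases.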
if i == 0 || (i > 0 && position_list[i].pos-position_list[i-1].pos > 50000) {
lariat/go/src/inference/lariat.go
Line 1367 in fca4756
Thanks!
There is no version info displayed in lariat output.
Please add version info to the lariat output, or add a "-version" parameter.
Thanks!
I believe there should be updates to the README:
Please contact us if you're interested in using Lariat independently of the Long Ranger pipeline.
The easiest way to do this is as follows:
Starting with Long Ranger 2.2, users can use the "align" pipeline to run lariat on its own (without the variant calling, phasing, and SV calling steps):
https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines
Steps to reproduce
git clone --recursive https://github.com/10XGenomics/lariat.git
make -C lariat/go
Log
make -C src/gobwa/bwa libbwa.a
make[1]: Entering directory '/home/sjackman/src/lariat/go/src/gobwa/bwa'
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS utils.c -o utils.o
…
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS malloc_wrap.c -o malloc_wrap.o
ar -csru libbwa.a utils.o kthread.o kstring.o ksw.o bwt.o bntseq.o bwa.o bwamem.o bwamem_pair.o bwamem_extra.o malloc_wrap.o
make[1]: Leaving directory '/home/sjackman/src/lariat/go/src/gobwa/bwa'
go install -ldflags "-X inference.__VERSION__ '29a7f74'" lariat
src/inference/bamwriter.go:8:2: no buildable Go source files in /home/sjackman/src/lariat/go/src/gobwa
make: *** [Makefile:9: lariat] Error 1
In the .fasth files being input to lariat, I noticed that there are 3 variations on the 10X barcode field.
The readme says:
The 10X barcode string is of the form
ACGTACGTACGTAC-1
I found that barcodes not in the whitelist appear as AGCTAGCTAGCTAGCT
Barcodes that are in the whitelist and found in the fastq seem to be marked as
AGCTAGCTAGCTAGCT-1,AGCTAGCTAGCTAGCT
Barcodes that mismatch by one base at the beginning of the barcode seem to be considered still as a match and seem to be specified by:
AGCTAGCTAGCTAGCT-1,GGCTAGCTAGCTAGCT
where the barcode at the beginning is the actual barcode it maps to, and the one after the "-1," is the barcode read from the fastq, which mismatches by one base only at the beginning.
Can someone please confirm this specification?
Thanks!
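The three observed variations, plus the README form, could be told apart with a small Go sketch like the one below. It encodes only the unconfirmed reading described in this issue; the `Barcode` type and the `ParseBarcode` and `Mismatches` names are hypothetical and do not come from the lariat source.

```go
package main

import (
	"fmt"
	"strings"
)

// Barcode splits the barcode field into its two possible parts.
// Type and field names are hypothetical, chosen for this sketch.
type Barcode struct {
	Corrected string // whitelisted barcode with its suffix, e.g. "AGCT...-1"; empty if absent
	Raw       string // barcode as read from the fastq; empty if absent
}

// ParseBarcode interprets the field according to the reading in this issue:
//   "SEQ"          -> raw barcode only (not in the whitelist)
//   "SEQ-1"        -> corrected barcode only (README form)
//   "SEQ-1,RAWSEQ" -> corrected barcode followed by the raw read
func ParseBarcode(field string) Barcode {
	parts := strings.SplitN(field, ",", 2)
	if len(parts) == 2 {
		return Barcode{Corrected: parts[0], Raw: parts[1]}
	}
	if strings.Contains(field, "-") {
		return Barcode{Corrected: field}
	}
	return Barcode{Raw: field}
}

// Mismatches returns the number of differing bases between the corrected
// and raw sequences, or -1 when the two cannot be compared.
func (b Barcode) Mismatches() int {
	if b.Corrected == "" || b.Raw == "" {
		return -1
	}
	seq := b.Corrected
	if i := strings.LastIndex(seq, "-"); i >= 0 {
		seq = seq[:i] // drop the "-1" style suffix
	}
	if len(seq) != len(b.Raw) {
		return -1
	}
	n := 0
	for i := 0; i < len(seq); i++ {
		if seq[i] != b.Raw[i] {
			n++
		}
	}
	return n
}

func main() {
	for _, f := range []string{
		"AGCTAGCTAGCTAGCT",
		"AGCTAGCTAGCTAGCT-1,AGCTAGCTAGCTAGCT",
		"AGCTAGCTAGCTAGCT-1,GGCTAGCTAGCTAGCT",
	} {
		b := ParseBarcode(f)
		fmt.Printf("%q corrected=%q raw=%q mismatches=%d\n",
			f, b.Corrected, b.Raw, b.Mismatches())
	}
}
```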
Hello,
I am wondering whether it is normal for longranger wgs (in the lariat step) to take this long to complete: it has now been running for two weeks.
This is the command:
longranger wgs --id=lngrgrwgs_to_NRG --fastqs=../input_fastqs --reference=refdata-genome_for_longranger --vcmode=freebayes --somatic --localcores=8 --localmem=70 --sex=f
The input files are 150 GB (two lanes of 33-37 GB each, with a 2.5 GB I1.fq.gz file). The reference is 4.5 Gb and has been formatted (number of sequences, max sequence length) to fit longranger mkref requirements. The output is now 1.4 TB, and the stdout shows
2018-03-06 00:03:45 [runtime] (run:local) ID.lngrgrwgs_to_NRG.PHASER_SVCALLER_CS.PHASER_SVCALLER._LINKED_READS_ALIGNER.BARCODE_AWARE_ALIGNER.fork0.chnk127.main
Only the _log file and the journal folder (which is empty) are being updated. Htop shows that there are 8 jobs running. The _log file shows
2018-03-19 05:17:48 [jobmngr] Attempted to reserve 4 threads, but only 0 were available.
Is this a normal computation time for this process?
Thanks,
Dario
I have noticed that in both the example WGS dataset NA12878 and our own WGS data, Longranger does not detect any split/chimeric reads. In my experience, this has caused problems in SV detection, specifically in getting the exact breakpoints right. It seems that split-read information is supposed to be incorporated into SV calls, since there are fields in both Loupe and the large_sv_calls.bedpe file to report it.
Since Longranger/Lariat uses BWA for alignment, and BWA-MEM and BWA-SW are both capable of aligning split reads, my question is: do the parameters currently in use take advantage of this capability?
$ git clone --recursive https://github.com/10XGenomics/lariat.git
Cloning into 'lariat'...
remote: Counting objects: 59, done.
remote: Total 59 (delta 0), reused 0 (delta 0), pack-reused 59
Unpacking objects: 100% (59/59), done.
Checking connectivity... done.
No submodule mapping found in .gitmodules for path 'go/src/code.google.com/p/biogo.bam'
You can't license this project under MIT if it links to BWA at the object level. BWA is GPLv3.
Please fix the license text to correspond to the README, which states that this project is licensed under GPLv3. The license text for GPLv3 can be found here: http://www.gnu.org/licenses/gpl-3.0.txt
I've now seen twice that Lariat-aligned BAMs contain alignments where the SA tag reports a position of -1. (For an example, see hall-lab/extract_sv_reads#8.) My expectation was that the position in the SA tag is 1-based, although the specification is not explicit on this point. For BWA-MEM-aligned BAMs, I've never encountered a negative position in this tag.
Since it's unclear to me, I thought I'd open an issue to ask: what does a position of -1 mean in the SA tags of Lariat-aligned BAMs?
Thanks in advance.
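To find affected records, the SA tag value can be split into its entries and checked for positions below 1. The Go sketch below assumes the SAM optional-field layout `rname,pos,strand,CIGAR,mapQ,NM;` and the 1-based expectation stated above; the `SAEntry` type, the function names, and the sample tag value are made up for illustration and are not taken from a real BAM or from the Lariat source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SAEntry is one element of an SA tag value (illustrative type).
type SAEntry struct {
	RName  string
	Pos    int
	Strand string
	CIGAR  string
	MapQ   int
	NM     int
}

// ParseSA parses the value of an SA tag (without the "SA:Z:" prefix).
func ParseSA(value string) ([]SAEntry, error) {
	var entries []SAEntry
	for _, rec := range strings.Split(value, ";") {
		if rec == "" {
			continue // the list ends with a trailing ';'
		}
		f := strings.Split(rec, ",")
		if len(f) != 6 {
			return nil, fmt.Errorf("SA entry %q: want 6 fields, got %d", rec, len(f))
		}
		pos, err := strconv.Atoi(f[1])
		if err != nil {
			return nil, fmt.Errorf("SA entry %q: bad pos: %v", rec, err)
		}
		mapq, err := strconv.Atoi(f[4])
		if err != nil {
			return nil, fmt.Errorf("SA entry %q: bad mapQ: %v", rec, err)
		}
		nm, err := strconv.Atoi(f[5])
		if err != nil {
			return nil, fmt.Errorf("SA entry %q: bad NM: %v", rec, err)
		}
		entries = append(entries, SAEntry{
			RName: f[0], Pos: pos, Strand: f[2], CIGAR: f[3], MapQ: mapq, NM: nm,
		})
	}
	return entries, nil
}

// SuspectPositions returns the entries whose position is below 1,
// which would not be a valid 1-based coordinate.
func SuspectPositions(entries []SAEntry) []SAEntry {
	var out []SAEntry
	for _, e := range entries {
		if e.Pos < 1 {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Made-up tag value for demonstration only.
	entries, err := ParseSA("chr1,-1,+,50M,0,0;chr2,100,-,30S20M,60,1;")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, e := range SuspectPositions(entries) {
		fmt.Printf("suspect SA position: %s:%d\n", e.RName, e.Pos)
	}
}
```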