eyzhao / hrdetect-pipeline (License: Other)
Hello,
Very useful algorithm. Do you know why insertions were not included in the SV signature analysis? There are only four SV types: deletion, insertion, duplication, and inter-chromosomal translocation. Is there a reason to exclude insertions? Also, should intra-chromosomal translocations be classified as "Inversion" or as "TRA"? Thanks.
commit 7088953
Merge: 100c662 2530cc2
Author: Eric Zhao [email protected]
Date: Sat Jan 20 09:27:43 2018 -0800
Running the R script files with SignIT and hrdtools installed (hrdtools at c91fea411c, SignIT at d178f93f).
Rscript ../get_mutation_catalog.R -v final.vcf -c mutation_catalog.tsv
Error in .local(file, genome, param, ...) : 'genome' argument is missing
Calls: readVcf -> readVcf -> .local
Execution halted
readVcf's genome argument has no default and was missing. Added it:
vcf <- readVcf(args[['v']], args[['r']])
Moreover, contrary to what the script says, there is no hg19 default for the -r option; it has to be specified:
Rscript ../get_mutation_catalog.R -v final.vcf -c mutation_catalog.tsv -r hg19
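For reference, the corrected call might look like this (a minimal sketch; the args list is assumed to come from the script's existing option parsing):

```r
library(VariantAnnotation)

# readVcf's 'genome' argument has no default, so pass the build from -r
# explicitly; args[['v']] and args[['r']] come from the option parser
vcf <- readVcf(args[['v']], genome = args[['r']])  # e.g. -r hg19
```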
########
Rscript ../signit_exposures.R -c mutation_catalog.tsv -o signit_output.Rds
Error in get_exposures(file = args[["catalog"]], reference_signatures = reference, :
unused argument (file = args[["catalog"]])
Execution halted
get_exposures has no "file =" argument. Passing the value positionally leads to the next error:
Error in get_exposures(args[["catalog"]], reference_signatures = reference, :
Mutation catalog is not properly formatted.
Must have two columns: mutation_type (character) and count (integer)
Execution halted
The problem is that get_exposures expects a data frame, not a file name. The file must be read with read.table(..., as.is = TRUE), because the first column has to be character, not factor.
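A sketch of the workaround, assuming get_exposures accepts the catalog data frame as its first argument (as the error messages suggest):

```r
# Read the catalog as a data frame; as.is = TRUE keeps mutation_type as
# character rather than factor, which get_exposures requires
catalog <- read.table(args[['catalog']], header = TRUE, sep = '\t', as.is = TRUE)

# Pass the data frame itself, not the file name
exposures <- get_exposures(catalog, reference_signatures = reference)
```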
##########
Rscript ../signit_summary_table.R -i signit_output.Rds -o signit_exposure_summary.tsv
Error: could not find function "%>%"
Execution halted
caused by
if (is.null(args[['signit']])) {
library(signit)
} else {
library(devtools)
library(tidyverse)
library(rjags)
library(nnls)
library(dbscan)
library(Rtsne)
load_all(args[['signit']])
}
not all of the libraries are loaded with library(signit). It appears only tidyverse is really needed.
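One possible fix (a sketch, not a tested patch): attach tidyverse unconditionally, since %>% is what the error complains about and load_all() does not attach a package's dependencies:

```r
library(tidyverse)  # provides %>% in both branches

if (is.null(args[['signit']])) {
  library(signit)
} else {
  library(devtools)
  load_all(args[['signit']])  # loads SignIT from source without attaching deps
}
```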
Rscript ../flank_indels.R -v final.vcf -o indels_annotated.tsv
Error in .local(file, genome, param, ...) : 'genome' argument is missing
Calls: %>% ... get_flanking_sequence -> parse_indel -> readVcf -> readVcf -> .local
Execution halted
Same issue: the "hg19" genome argument is missing in readVcf.
Error in `[[<-`(`*tmp*`, name, value = c("T", "C", "A", "T", "A", "G", :
  15620 elements in value to replace 15613 elements
Calls: %>% ... get_flanking_sequence -> parse_indel
Execution halted
This is caused by entries such as "A,C" in the ALT column of VCFs (more than one somatic allele). I don't know whether !isSNV in vcf_to_snv_indel.R correctly leaves these out.
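A hedged sketch of one way to drop multi-allelic records before parse_indel sees them (accessors from VariantAnnotation/S4Vectors; whether this matches the pipeline's own filtering is an assumption):

```r
library(VariantAnnotation)

vcf <- readVcf(args[['v']], genome = 'hg19')

# alt(vcf) is list-like, with one set of ALT alleles per record;
# keep only biallelic records (a single ALT allele, so no "A,C" entries)
vcf <- vcf[elementNROWS(alt(vcf)) == 1]
```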
Rscript ../run_test.R -l segments.tsv -o hrd_score.tsv -r hg19
Error in function_list[k] : could not find function "GRanges"
Calls: %>% ... eval -> _fseq -> freduce -> withVisible ->
Execution halted
missing library(GenomicRanges)
error message:
Miniconda3-latest-Linux-x86_64.sh: line 370: /Users/chenj9/Desktop/Work/bin/hrdetect-pipeline-master/dependencies/miniconda3/pkgs/python-3.7.0-hc3d631a_0/bin/python: cannot execute binary file
I've been testing the HRDetect pipeline as well as the scarHRD tool. From what I can tell, both use the same definition of genomic scars when calculating the HRD score (ref: https://breast-cancer-research.biomedcentral.com/articles/10.1186/s13058-014-0475-x). However, as you can see in the picture below, the range of the HRD scores reported by these two tools is very different.
Does HRDetect calculate the HRD score simply as the unweighted sum of TAI, LST and LOH? Alternatively, is there any reason that would explain these differences?
Best,
-Mathias
Lines 45-48 of calculate_sv_catalog.R imply that the first inter-SV distance is calculated from the start of the chromosome. The first distance should rather be 0, i.e. c(0, pos[1:n()-1]) should be c(pos[1], pos[1:n()-1]) in the following:
mutate(
prev_pos = c(0, pos[1:n()-1]),
distance = pos - prev_pos
)
About 10% of our SVs change classification (clustered vs non-clustered) with the change.
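The effect of the proposed change can be seen on a toy example (dplyr sketch; the positions are made up):

```r
library(dplyr)

# Three SV positions on one chromosome; with the proposed fix the first
# inter-SV distance is 0 instead of the distance from the chromosome start
tibble(pos = c(1000, 5000, 20000)) %>%
  mutate(
    prev_pos = c(pos[1], pos[1:n()-1]),
    distance = pos - prev_pos    # 0, 4000, 15000
  )
```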
From Snakefile:
'Rscript ' + PROJECT_DIR + '/scripts/basic-functions/row_bind_tables.R -i {input} -o {output}'
The script row_bind_tables.R is missing from the distribution.
In hrdscore/run_test.R, you are missing the usual gsub("chr", "", ...) instruction found elsewhere in your script files. As a result, if "chr" strings are present in segments.tsv, the TAI score won't be computed, because there will be no overlap with the subtelomere regions.
From your hrdtools package:
subtelomeres <- GRanges(ddply(get_subtelomere_regions(), 'chromosome', function(z) {
return(data.frame(start = c(0, z$end), end = c(z$start, .Machine$integer.max)))
}))
with the default function get_subtelomere_regions <- function(chr_label = F) {...}
results in:
GRanges object with 52 ranges and 0 metadata columns:
seqnames ranges strand
[1] 1 [ 0, 10000] *
[2] 1 [249223345, 2147483647] *
[3] 10 [ 0, 82827] *
[4] 10 [135508492, 2147483647] *
[5] 11 [ 0, 116986] *
... ... ... ...
[48] 9 [141144172, 2147483647] *
[49] X [ 0, 182990] *
[50] X [155250482, 2147483647] *
[51] Y [ 0, 132990] *
[52] Y [ 59353488, 2147483647] *
This will not overlap segments with the "chr" string. There is also the X/23 and Y/24 issue.
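A sketch of the normalization the segments would need before run_test (my own suggestion, not code from the repository; loh_ranges is the GRanges object built from segments.tsv):

```r
library(GenomicRanges)

# Strip the "chr" prefix and map numeric sex chromosomes so the segment
# seqnames match the subtelomere annotation (1..22, X, Y)
seqlevels(loh_ranges) <- sub('^chr', '', seqlevels(loh_ranges))
seqlevels(loh_ranges) <- sub('^23$', 'X', seqlevels(loh_ranges))
seqlevels(loh_ranges) <- sub('^24$', 'Y', seqlevels(loh_ranges))
```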
Also could you provide the source or a reference for your subtelomere Rdata file?
In the get_mutation_catalog.R script, mut.to.sigs.input will not work because the library is not loaded. Missing library(deconstructSigs).
Take this segments.tsv for illustration, where both segments are > 10 Mbp and have different copy numbers but the same lohtype:
chr start end copy_number lohtype
1 752001 16851000 3 ASCNA
1 16858001 27015000 5 ASCNA
Run these instructions, adapted from hrdscore/run_test.R:
segment.file<-"segments.tsv"
loh_ranges <- read_tsv(segment.file) %>% as.data.frame %>% GRanges
run_test(
loh_ranges,
loh.col = 'lohtype',
cnv.col = 'copy_number',
genome = "hg19"
) %>%
as_tibble()
The above returns:
[1] "Importing data..."
[1] 2
[1] 2
[1] "LOH Score: 0"
[1] "TAI Score: 1"
[1] "LST Score: 0"
[1] "Total HRD Score: 1"
loh tai lst total
1 0 1 0 1
However, LST should be 1. This is because lst_test_bool from hrdtools only looks for differences in lohtype, irrespective of copy-number differences. This is a problem when consecutive segments are both ASCNA (or both ALOH, DLOH, etc.).
lst_test_bool <- function(gr, loh.col) {
    df <- as.data.frame(gr)
    # is_long (presumably a logical flagging segments of at least 10 Mbp, per
    # the LST definition) is computed earlier in the original function; the
    # line is elided in this excerpt
    next_is_long <- c(is_long[2:length(is_long)], FALSE)
    same_lohtype_as_next <- c(df[1:(dim(df)[1] - 1), loh.col] == df[2:dim(df)[1], loh.col], FALSE)
    same_chr_as_next <- as.logical(c(df[1:(dim(df)[1] - 1), 'seqnames'] == df[2:dim(df)[1], 'seqnames'], TRUE))
    return(is_long & next_is_long & !same_lohtype_as_next & same_chr_as_next)
}
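A possible correction (a sketch only; it assumes the copy-number column name is passed in alongside loh.col, which would change the function's signature):

```r
lst_test_bool <- function(gr, loh.col, cnv.col) {
  df <- as.data.frame(gr)
  is_long <- df$width >= 1e7   # assumed: segments of at least 10 Mbp
  next_is_long <- c(is_long[2:length(is_long)], FALSE)
  n <- dim(df)[1]
  # Compare lohtype AND copy number; all() collapses the row-wise comparison
  # so a difference in either column marks a state transition
  same_state_as_next <- c(sapply(1:(n - 1), function(i) {
    all(df[i, c(loh.col, cnv.col)] == df[i + 1, c(loh.col, cnv.col)])
  }), FALSE)
  same_chr_as_next <- as.logical(c(df[1:(n - 1), 'seqnames'] == df[2:n, 'seqnames'], TRUE))
  return(is_long & next_is_long & !same_state_as_next & same_chr_as_next)
}
```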
Thanks for your work.
When I read the file hrdetect-pipeline\Snakefile, I found that the most important scripts, /scripts/hrdetect/hrdetect_input_table.R and /scripts/hrdetect/compute_hrdetect.R, are missing.
Could you share these?
Hi,
Thanks a lot for this contribution; the workflow is bound to be very useful in many settings. However, I believe it would be valuable to also provide a bundle/zip with example input files, to show what the input looks like for a successful run and what it produces. I could not see this info anywhere.
An additional question: are all the input files (SNVs, deletions, CNAs with LOH info, SVs) strictly necessary to make a prediction/detection, or is the implementation able to assign a prediction based on a subset of the input data (e.g. SNVs, indels, and CNAs without LOH info)?
regards,
Sigve
Hello,
Thanks for getting this package together. I have been trying to install the miniconda version as well as each of the R libraries by itself but I am having no luck with either.
You are already aware of the SignIT issue here:
eyzhao/SignIT#3
I am anticipating that installing all the required R libraries will be enough to run the necessary scripts for HRDetect, but I can't get SignIT to install.
library(tidyverse)
library(devtools)
library(rstan)
library(signit) ***
library(argparse)
library(BSgenome)
library(BSgenome.Hsapiens.NCBI.GRCh38)
library(copynumber)
library(dbscan)
library(docopt)
library(doParallel)
library(futile.logger)
library(GenomicRanges)
library(nnls)
library(optparse)
library(plyr)
library(readr)
library(rjags)
library(RSvgDevice)
library(Rtsne)
library(snow)
library(stringr)
library(VariantAnnotation)
library(hrdtools)
So I tried the miniconda pipeline, and had no luck there either.
Running make dependencies, the installation will not progress even after 24 hrs unless I change the channel order in the dependencies/environment.yaml file to the following:
- r
- conda-forge
- bioconda
- main
dependencies/Makefile has an svn checkout section that doesn't work:
packages/hrdtools/DESCRIPTION:
mkdir -p packages/hrdtools && \
svn checkout https://svn01.bcgsc.ca/svn/personal/ezhao/projects/signatures/trunk/hrdtools/ packages/hrdtools
instead, I replaced this with the following to make it work:
packages/hrdtools/DESCRIPTION:
mkdir -p packages/hrdtools && \
git clone https://github.com/eyzhao/hrdtools.git packages/hrdtools
Now even after this, I get a final error during the pipeline installation:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > conda activate dependencies
#
# To deactivate an active environment, use:
# > conda deactivate
#
echo "export R_LIBS=/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/library && export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib" > miniconda3/envs/dependencies/etc/conda/activate.d/env_vars.sh
mkdir -p packages/hrdtools && \
git clone https://github.com/eyzhao/hrdtools.git packages/hrdtools
Cloning into 'packages/hrdtools'...
remote: Enumerating objects: 97, done.
remote: Total 97 (delta 0), reused 0 (delta 0), pack-reused 97
Unpacking objects: 100% (97/97), done.
source /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/bin/activate dependencies && \
echo 'Installing R packages into this R install:' `which R` && \
export R_LIBS=/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/library && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib
Installing R packages into this R install: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/bin/R
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/bin/Rscript r_dependencies.R && \
source /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/bin/deactivate && \
echo 'installed packages' > r_packages
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/libR.so)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/./libRblas.so)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicuuc.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicuuc.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicui18n.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicui18n.so.58)
make: *** [r_packages] Error 1
Another thing I noticed is that dependencies/Makefile has a hardcoded path to JAGS:
$ head dependencies/Makefile
jags_lib := /gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib
makefile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
makefile_dir := $(patsubst %/,%, $(dir $(makefile_path)))
Thanks in advance for your insights.
Take this segments.2.tsv file for example
chr start end copy_number lohtype
1 752001 16851000 3 ASCNA
1 16858001 17015000 5 ASCNA
1 17017001 17113000 6 ASCNA
1 17114001 72747000 3 ASCNA
1 72815001 120532000 3 ASCNA
1 120532001 120773000 5 ASCNA
1 120902001 121352000 3 ASCNA
1 142536001 143959000 7 ASCNA
After loading the functions from hrdtools, load the segments:
loh_ranges <- read_tsv("segments.2.tsv") %>% as.data.frame %>% GRanges
Instructions from run_test:
loh.col='lohtype'; cnv.col='copy_number'
loh_ranges = sort(loh_ranges)
loh_ranges <- contig(loh_ranges, loh.col, cnv.col)
loh_ranges <- filter_ranges(loh_ranges, loh.col, cnv.col)
gives the output
loh_ranges
GRanges object with 3 ranges and 4 metadata columns:
      seqnames                 ranges strand | lohtype copy_number neighbours_equivalent length_ratio
  [1]        1 [        1, 121352000]      * |   ASCNA           3                 FALSE    0.8543246
  [2]        1 [ 16858001, 120532000]      * |   ASCNA           5                  TRUE    1.3885738
  [3]        1 [121352001, 143959000]      * |   ASCNA           7                 FALSE    4.5859247
Notice that the first segment now overlaps the second one.
I partly pinpointed the problem to the function add_neighbours_column . You might need to change the line
return( i != 1 && i != dim(chr_data)[1] && chr_data[i-1, col_names] == chr_data[i+1, col_names] )
to
return( i != 1 && i != dim(chr_data)[1] && all( chr_data[i-1, col_names] == chr_data[i+1, col_names] ) )
because chr_data[i-1, col_names] == chr_data[i+1, col_names] returns a vector, and && only evaluates its first element. For example:
TRUE && ( c(1, 2) == c(1, 3) )
[1] TRUE
Not sure if this is the fix.
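The difference is easy to demonstrate in an R session (note that, as far as I know, R 4.3 and later turn a length-greater-than-one operand of && into an error rather than silently using the first element):

```r
# && inspects only the first element of its operands
TRUE && (c(1, 2) == c(1, 3))        # TRUE on older R: first elements match

# all() checks every element, which is what the neighbour test needs
TRUE && all(c(1, 2) == c(1, 3))     # FALSE
```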
HRDetect
Herty Liany
[email protected]
Dear Eric,
I tried to git clone HRDetect from the GitHub repository:
https://github.com/eyzhao/hrdetect-pipeline.git
It is successful; however, when I attempted to run "make" to build the source project, it failed with the message:
Cloning into 'git/hrdtools'
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Makefile:30: recipe for target 'git/hrdtools' failed
make: *** [git/hrdtools] Error 128
I am running in a Linux environment.
Thanks very much for your help.
Regards,
Herty