eyzhao / hrdetect-pipeline (License: Other)
Hello,
Very useful algorithm. Do you know why insertions were not included in the SV signature analysis? There are only four SV types: deletion, insertion, duplication, and inter-chromosomal translocation. Is there a reason to exclude insertions? Also, should intra-chromosomal translocations be classified as "Inversion" or as "TRA"? Thanks.
commit 7088953
Merge: 100c662 2530cc2
Author: Eric Zhao [email protected]
Date: Sat Jan 20 09:27:43 2018 -0800
Running the R script files with SignIT and hrdtools installed (hrdtools at c91fea411c, SignIT at d178f93f).
Rscript ../get_mutation_catalog.R -v final.vcf -c mutation_catalog.tsv
Error in .local(file, genome, param, ...) : 'genome' argument is missing
Calls: readVcf -> readVcf -> .local
Execution halted
readVcf's genome argument has no default and was missing. Added it:
vcf <- readVcf(args[['v']], args[['r']])
Moreover, contrary to what the script says, there is no hg19 default for the -r option; it has to be specified:
Rscript ../get_mutation_catalog.R -v final.vcf -c mutation_catalog.tsv -r hg19
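For reference, the corrected call might look like this (a minimal sketch; the args list is assumed to come from the script's existing option parsing):

```r
library(VariantAnnotation)

# readVcf's 'genome' argument has no default, so pass the build from -r
# explicitly; args[['v']] and args[['r']] come from the option parser
vcf <- readVcf(args[['v']], genome = args[['r']])  # e.g. -r hg19
```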
########
Rscript ../signit_exposures.R -c mutation_catalog.tsv -o signit_output.Rds
Error in get_exposures(file = args[["catalog"]], reference_signatures = reference, :
unused argument (file = args[["catalog"]])
Execution halted
get_exposures has no "file =" argument. Passing the value positionally leads to the next error:
Error in get_exposures(args[["catalog"]], reference_signatures = reference, :
Mutation catalog is not properly formatted.
Must have two columns: mutation_type (character) and count (integer)
Execution halted
The problem is that get_exposures expects a data frame, not a file name. The file must be read with read.table(..., as.is = TRUE), because the first column has to be character, not factor.
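A sketch of the workaround, assuming get_exposures accepts the catalog data frame as its first argument (as the error messages suggest):

```r
# Read the catalog as a data frame; as.is = TRUE keeps mutation_type as
# character rather than factor, which get_exposures requires
catalog <- read.table(args[['catalog']], header = TRUE, sep = '\t', as.is = TRUE)

# Pass the data frame itself, not the file name
exposures <- get_exposures(catalog, reference_signatures = reference)
```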
##########
Rscript ../signit_summary_table.R -i signit_output.Rds -o signit_exposure_summary.tsv
Error: could not find function "%>%"
Execution halted
caused by
if (is.null(args[['signit']])) {
library(signit)
} else {
library(devtools)
library(tidyverse)
library(rjags)
library(nnls)
library(dbscan)
library(Rtsne)
load_all(args[['signit']])
}
not all of the libraries are loaded with library(signit). It appears only tidyverse is really needed.
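One possible fix (a sketch, not a tested patch): attach tidyverse unconditionally, since %>% is what the error complains about and load_all() does not attach a package's dependencies:

```r
library(tidyverse)  # provides %>% in both branches

if (is.null(args[['signit']])) {
  library(signit)
} else {
  library(devtools)
  load_all(args[['signit']])  # loads SignIT from source without attaching deps
}
```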
Rscript ../flank_indels.R -v final.vcf -o indels_annotated.tsv
Error in .local(file, genome, param, ...) : 'genome' argument is missing
Calls: %>% ... get_flanking_sequence -> parse_indel -> readVcf -> readVcf -> .local
Execution halted
Same issue: the "hg19" genome argument is missing in readVcf.
Error in `[[<-`(`*tmp*`, name, value = c("T", "C", "A", "T", "A", "G", :
  15620 elements in value to replace 15613 elements
Calls: %>% ... get_flanking_sequence -> parse_indel
Execution halted
This is caused by entries such as "A,C" in the ALT column of VCFs (more than one somatic allele). I don't know whether !isSNV in vcf_to_snv_indel.R correctly leaves these out.
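A hedged sketch of one way to drop multi-allelic records before parse_indel sees them (accessors from VariantAnnotation/S4Vectors; whether this matches the pipeline's own filtering is an assumption):

```r
library(VariantAnnotation)

vcf <- readVcf(args[['v']], genome = 'hg19')

# alt(vcf) is list-like, with one set of ALT alleles per record;
# keep only biallelic records (a single ALT allele, so no "A,C" entries)
vcf <- vcf[elementNROWS(alt(vcf)) == 1]
```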
Rscript ../run_test.R -l segments.tsv -o hrd_score.tsv -r hg19
Error in function_list[k] : could not find function "GRanges"
Calls: %>% ... eval -> _fseq -> freduce -> withVisible ->
Execution halted
missing library(GenomicRanges)
error message:
Miniconda3-latest-Linux-x86_64.sh: line 370: /Users/chenj9/Desktop/Work/bin/hrdetect-pipeline-master/dependencies/miniconda3/pkgs/python-3.7.0-hc3d631a_0/bin/python: cannot execute binary file
I've been testing the HRDetect pipeline as well as the scarHRD tool. From what I can tell, both use the same definition of genomic scars when calculating the HRD score (ref: https://breast-cancer-research.biomedcentral.com/articles/10.1186/s13058-014-0475-x). However, as you can see in the picture below, the range of the HRD scores reported by these two tools is very different.
Does HRDetect calculate the HRD score simply as the unweighted sum of TAI, LST and LOH? Alternatively, is there any reason that would explain these differences?
Best,
-Mathias
Lines 45-48 of calculate_sv_catalog.R imply that the first inter-SV distance is calculated from the start of the chromosome. The first distance should rather be 0, i.e. c(0, pos[1:n()-1]) should be c(pos[1], pos[1:n()-1]) in the following:
mutate(
prev_pos = c(0, pos[1:n()-1]),
distance = pos - prev_pos
)
About 10% of our SVs change classification (clustered vs non-clustered) with the change.
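The effect of the proposed change can be seen on a toy example (dplyr sketch; the positions are made up):

```r
library(dplyr)

# Three SV positions on one chromosome; with the proposed fix the first
# inter-SV distance is 0 instead of the distance from the chromosome start
tibble(pos = c(1000, 5000, 20000)) %>%
  mutate(
    prev_pos = c(pos[1], pos[1:n()-1]),
    distance = pos - prev_pos    # 0, 4000, 15000
  )
```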
From Snakefile:
'Rscript ' + PROJECT_DIR + '/scripts/basic-functions/row_bind_tables.R -i {input} -o {output}'
The script row_bind_tables.R is missing from the distribution.
In hrdscore/run_test.R, you are missing the usual gsub("chr", "", ...) instruction found elsewhere in your script files. As a result, if "chr" strings are present in segments.tsv, the TAI score won't be computed, because there will be no overlap with the subtelomere regions.
From your hrdtools package:
subtelomeres <- GRanges(ddply(get_subtelomere_regions(), 'chromosome', function(z) {
return(data.frame(start = c(0, z$end), end = c(z$start, .Machine$integer.max)))
}))
with the default function get_subtelomere_regions <- function(chr_label = F) {...}
results in:
GRanges object with 52 ranges and 0 metadata columns:
seqnames ranges strand
[1] 1 [ 0, 10000] *
[2] 1 [249223345, 2147483647] *
[3] 10 [ 0, 82827] *
[4] 10 [135508492, 2147483647] *
[5] 11 [ 0, 116986] *
... ... ... ...
[48] 9 [141144172, 2147483647] *
[49] X [ 0, 182990] *
[50] X [155250482, 2147483647] *
[51] Y [ 0, 132990] *
[52] Y [ 59353488, 2147483647] *
This will not overlap segments with the "chr" string. There is also the X/23 and Y/24 issue.
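A sketch of the normalization the segments would need before run_test (my own suggestion, not code from the repository; loh_ranges is the GRanges object built from segments.tsv):

```r
library(GenomicRanges)

# Strip the "chr" prefix and map numeric sex chromosomes so the segment
# seqnames match the subtelomere annotation (1..22, X, Y)
seqlevels(loh_ranges) <- sub('^chr', '', seqlevels(loh_ranges))
seqlevels(loh_ranges) <- sub('^23$', 'X', seqlevels(loh_ranges))
seqlevels(loh_ranges) <- sub('^24$', 'Y', seqlevels(loh_ranges))
```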
Also could you provide the source or a reference for your subtelomere Rdata file?
In the get_mutation_catalog.R script, mut.to.sigs.input will not work because the library is not loaded. Missing library(deconstructSigs).
Take this segments.tsv for illustration, where both segments are > 10 Mbp and have different copy numbers but the same lohtype:
chr start end copy_number lohtype
1 752001 16851000 3 ASCNA
1 16858001 27015000 5 ASCNA
Run these instructions, adapted from hrdscore/run_test.R:
segment.file<-"segments.tsv"
loh_ranges <- read_tsv(segment.file) %>% as.data.frame %>% GRanges
run_test(
loh_ranges,
loh.col = 'lohtype',
cnv.col = 'copy_number',
genome = "hg19"
) %>%
as_tibble()
The above returns:
[1] "Importing data..."
[1] 2
[1] 2
[1] "LOH Score: 0"
[1] "TAI Score: 1"
[1] "LST Score: 0"
[1] "Total HRD Score: 1"
loh tai lst total
1 0 1 0 1
However, LST should be 1. This is because lst_test_bool from hrdtools only looks for differences in lohtype, irrespective of copy-number differences. This is a problem when consecutive segments are both ASCNA (or both ALOH, DLOH, etc.).
lst_test_bool <- function(gr, loh.col) {
    df <- as.data.frame(gr)
    # is_long (presumably a logical flagging segments of at least 10 Mbp, per
    # the LST definition) is computed earlier in the original function; the
    # line is elided in this excerpt
    next_is_long <- c(is_long[2:length(is_long)], FALSE)
    same_lohtype_as_next <- c(df[1:(dim(df)[1] - 1), loh.col] == df[2:dim(df)[1], loh.col], FALSE)
    same_chr_as_next <- as.logical(c(df[1:(dim(df)[1] - 1), 'seqnames'] == df[2:dim(df)[1], 'seqnames'], TRUE))
    return(is_long & next_is_long & !same_lohtype_as_next & same_chr_as_next)
}
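A possible correction (a sketch only; it assumes the copy-number column name is passed in alongside loh.col, which would change the function's signature):

```r
lst_test_bool <- function(gr, loh.col, cnv.col) {
  df <- as.data.frame(gr)
  is_long <- df$width >= 1e7   # assumed: segments of at least 10 Mbp
  next_is_long <- c(is_long[2:length(is_long)], FALSE)
  n <- dim(df)[1]
  # Compare lohtype AND copy number; all() collapses the row-wise comparison
  # so a difference in either column marks a state transition
  same_state_as_next <- c(sapply(1:(n - 1), function(i) {
    all(df[i, c(loh.col, cnv.col)] == df[i + 1, c(loh.col, cnv.col)])
  }), FALSE)
  same_chr_as_next <- as.logical(c(df[1:(n - 1), 'seqnames'] == df[2:n, 'seqnames'], TRUE))
  return(is_long & next_is_long & !same_state_as_next & same_chr_as_next)
}
```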
Thanks for your work.
When I read the file hrdetect-pipeline\Snakefile, I found that the most important scripts, /scripts/hrdetect/hrdetect_input_table.R and /scripts/hrdetect/compute_hrdetect.R, are missing.
Could you share these?
Hi,
Thanks a lot for this contribution; the workflow is bound to be very useful in many settings. However, I believe it would be valuable to also provide a bundle/zip with example input files, to show what the input looks like for a successful run and what it produces. I could not see this info anywhere.
An additional question: are all the input files (SNVs, deletions, CNAs with LOH info, SVs) strictly necessary to make a prediction/detection, or is the implementation able to assign a prediction based on a subset of the input data (e.g. SNVs, indels, and CNAs without LOH info)?
regards,
Sigve
Hello,
Thanks for getting this package together. I have been trying to install the miniconda version as well as each of the R libraries by itself but I am having no luck with either.
You are already aware of the SignIT issue here:
eyzhao/SignIT#3
I am anticipating that installing all the required R libraries will be enough to run the necessary scripts for HRDetect, but I can't get SignIT to install.
library(tidyverse)
library(devtools)
library(rstan)
library(signit) ***
library(argparse)
library(BSgenome)
library(BSgenome.Hsapiens.NCBI.GRCh38)
library(copynumber)
library(dbscan)
library(docopt)
library(doParallel)
library(futile.logger)
library(GenomicRanges)
library(nnls)
library(optparse)
library(plyr)
library(readr)
library(rjags)
library(RSvgDevice)
library(Rtsne)
library(snow)
library(stringr)
library(VariantAnnotation)
library(hrdtools)
So I tried the miniconda pipeline, and had no luck there either.
Running make dependencies, the installation will not progress even after 24 hrs unless I change the channel order in the dependencies/environment.yaml file to the following:
- r
- conda-forge
- bioconda
- main
dependencies/Makefile has an svn checkout section that doesn't work:
packages/hrdtools/DESCRIPTION:
mkdir -p packages/hrdtools && \
svn checkout https://svn01.bcgsc.ca/svn/personal/ezhao/projects/signatures/trunk/hrdtools/ packages/hrdtools
instead, I replaced this with the following to make it work:
packages/hrdtools/DESCRIPTION:
mkdir -p packages/hrdtools && \
git clone https://github.com/eyzhao/hrdtools.git packages/hrdtools
Now even after this, I get a final error during the pipeline installation:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > conda activate dependencies
#
# To deactivate an active environment, use:
# > conda deactivate
#
echo "export R_LIBS=/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/library && export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib" > miniconda3/envs/dependencies/etc/conda/activate.d/env_vars.sh
mkdir -p packages/hrdtools && \
git clone https://github.com/eyzhao/hrdtools.git packages/hrdtools
Cloning into 'packages/hrdtools'...
remote: Enumerating objects: 97, done.
remote: Total 97 (delta 0), reused 0 (delta 0), pack-reused 97
Unpacking objects: 100% (97/97), done.
source /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/bin/activate dependencies && \
echo 'Installing R packages into this R install:' `which R` && \
export R_LIBS=/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/library && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib
Installing R packages into this R install: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/bin/R
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/bin/Rscript r_dependencies.R && \
source /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/bin/deactivate && \
echo 'installed packages' > r_packages
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/libR.so)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/./libRblas.so)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicuuc.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicuuc.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicui18n.so.58)
/home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/R: /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../.././libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /home/apandey/Downloads/hrdetect-pipeline/dependencies/miniconda3/envs/dependencies/lib/R/bin/exec/../../lib/../../libicui18n.so.58)
make: *** [r_packages] Error 1
Another thing I noticed is that dependencies/Makefile has a hardcoded path to JAGS:
$ head dependencies/Makefile
jags_lib := /gsc/software/linux-x86_64-centos6/JAGS-4.3.0/lib
makefile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
makefile_dir := $(patsubst %/,%, $(dir $(makefile_path)))
Thanks in advance for your insights.
Take this segments.2.tsv file for example
chr start end copy_number lohtype
1 752001 16851000 3 ASCNA
1 16858001 17015000 5 ASCNA
1 17017001 17113000 6 ASCNA
1 17114001 72747000 3 ASCNA
1 72815001 120532000 3 ASCNA
1 120532001 120773000 5 ASCNA
1 120902001 121352000 3 ASCNA
1 142536001 143959000 7 ASCNA
After loading the functions from hrdtools, load the segments:
loh_ranges <- read_tsv("segments.2.tsv") %>% as.data.frame %>% GRanges
Instructions from run_test:
loh.col='lohtype'; cnv.col='copy_number'
loh_ranges = sort(loh_ranges)
loh_ranges <- contig(loh_ranges, loh.col, cnv.col)
loh_ranges <- filter_ranges(loh_ranges, loh.col, cnv.col)
gives the output
loh_ranges
GRanges object with 3 ranges and 4 metadata columns:
      seqnames                 ranges strand | lohtype copy_number neighbours_equivalent length_ratio
  [1]        1 [        1, 121352000]      * |   ASCNA           3                 FALSE    0.8543246
  [2]        1 [ 16858001, 120532000]      * |   ASCNA           5                  TRUE    1.3885738
  [3]        1 [121352001, 143959000]      * |   ASCNA           7                 FALSE    4.5859247
Notice that the first segment now overlaps the second one.
I partly pinpointed the problem to the function add_neighbours_column . You might need to change the line
return( i != 1 && i != dim(chr_data)[1] && chr_data[i-1, col_names] == chr_data[i+1, col_names] )
to
return( i != 1 && i != dim(chr_data)[1] && all( chr_data[i-1, col_names] == chr_data[i+1, col_names] ) )
because chr_data[i-1, col_names] == chr_data[i+1, col_names] returns a vector, and && only evaluates its first element. For example:
TRUE && ( c(1, 2) == c(1, 3) )
[1] TRUE
Not sure if this is the fix.
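The difference is easy to demonstrate in an R session (note that, as far as I know, R 4.3 and later turn a length-greater-than-one operand of && into an error rather than silently using the first element):

```r
# && inspects only the first element of its operands
TRUE && (c(1, 2) == c(1, 3))        # TRUE on older R: first elements match

# all() checks every element, which is what the neighbour test needs
TRUE && all(c(1, 2) == c(1, 3))     # FALSE
```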
HRDetect
Herty Liany
[email protected]
Dear Eric,
I tried to git clone HRDetect from the GitHub repository:
https://github.com/eyzhao/hrdetect-pipeline.git
It is successful; however, when I attempted to run "make" to build the source project, it failed with the message:
Cloning into 'git/hrdtools'
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Makefile:30: recipe for target 'git/hrdtools' failed
make: *** [git/hrdtools] Error 128
I am running in a Linux environment.
Thanks very much for your help.
Regards,
Herty