alexandrovlab / sigprofilerextractorr

An R wrapper for SigProfilerExtractor that allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.

License: BSD 2-Clause "Simplified" License

R 100.00%

bioinformatics somatic-variants cancer-genomics mutational-signatures mutation-analysis

sigprofilerextractorr's Introduction

SigProfilerExtractorR

An R wrapper for running the SigProfilerExtractor framework.

INTRODUCTION

The purpose of this document is to provide a guide for using the SigProfilerExtractor framework to extract de novo mutational signatures from a set of samples and to decompose the de novo signatures into COSMIC signatures. An extensive Wiki page detailing the usage of this tool can be found at https://osf.io/t6j7u/wiki/home/. The underlying tool is written in Python; users who prefer working in a Python environment can find and install it from: https://github.com/AlexandrovLab/SigProfilerExtractor

Installation
Functions
Citation
Copyright
Contact Information

Installation

PREREQUISITES

devtools (R)

>> install.packages("devtools")

reticulate* (R)

>> install.packages("reticulate")

*Reticulate has a known bug that prevents Python print statements from flushing to standard output. As a result, some of the typical progress messages are delayed.

QUICK START GUIDE

This section will guide you through the minimum steps required to extract mutational signatures from genomes:

First, install the python package using pip. The R wrapper still requires the python package:

pip install SigProfilerExtractor

Open an R session and ensure that your R interpreter recognizes the path to your python installation:

$ R
>> library(reticulate)
>> use_python("path_to_your_python")
>> py_config()
python:         /anaconda3/bin/python
libpython:      /anaconda3/lib/libpython3.6m.dylib
pythonhome:     /anaconda3:/anaconda3
version:        3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy:          /anaconda3/lib/python3.6/site-packages/numpy
numpy_version:  1.16.1

If you do not see your python path listed, restart your R session and rerun the above commands in order.

Install SigProfilerExtractorR using devtools:

>> library(devtools)
>> install_github("AlexandrovLab/SigProfilerExtractorR")

Load the package in the same R session and install your desired reference genome as follows (available reference genomes are: GRCh37, GRCh38, mm9, and mm10):

>> library(SigProfilerExtractorR)
>> install("GRCh37", rsync=FALSE, bash=TRUE)

This will install the human GRCh37 assembly as a reference genome.

SUPPORTED GENOMES

Other available reference genomes are GRCh38, mm9, and mm10 (and any genomes supported by SigProfilerMatrixGenerator). Information about supported genomes can be found at https://github.com/AlexandrovLab/SigProfilerMatrixGeneratorR

Quick Example:

Signatures can be extracted from VCF files or from a tab-delimited mutational table using the sigprofilerextractor function.

>> help(sigprofilerextractor)

This will show the details of the sigprofilerextractor function.

>> library(SigProfilerExtractorR)
>> path_to_example_data <- importdata("matrix")
>> data <- path_to_example_data # here you can provide the path of your own data
>> sigprofilerextractor("matrix", 
                     "example_output", 
                     data, 
                     minimum_signatures=2,
                     maximum_signatures=3,
                     nmf_replicates=5,
                     min_nmf_iterations = 1000,
                     max_nmf_iterations =100000,
                     nmf_test_conv = 1000,
                     nmf_tolerance = 0.00000001)

The "example_output" folder will be generated in the working directory. Note that the parameters used in the above example are not optimal for extracting accurate signatures; they are used only to keep the quick example fast.

Functions

The available functions are:

importdata
sigprofilerextractor
estimate_solution

importdata

Imports the path of example data.

importdata(datatype)

datatype: Type of example data. There are two types: 1. "vcf", 2. "matrix".

importdata Example

library(SigProfilerExtractorR)
path_to_example_table = importdata("matrix")
data = path_to_example_table 
# This "data" variable can be passed as the "input_data" argument of the sigprofilerextractor function.

# To get help on the parameters and outputs of the "importdata" function, please use the following:
help(importdata)

sigprofilerextractor

Extracts mutational signatures from an array of samples.

sigprofilerextractor(input_type, output, input_data, reference_genome="GRCh37",
                     opportunity_genome = "GRCh37", context_type = "default",
                     exome = FALSE, minimum_signatures=1, maximum_signatures=10,
                     nmf_replicates=100, resample = T, batch_size=1, cpu=-1,
                     gpu=F, nmf_init="random", precision= "single",
                     matrix_normalization= "gmm", seeds= "random",
                     min_nmf_iterations= 10000, max_nmf_iterations=1000000,
                     nmf_test_conv= 10000, nmf_tolerance= 1e-15,
                     nnls_add_penalty=0.05, nnls_remove_penalty=0.01,
                     initial_remove_penalty=0.05, get_all_signature_matrices= FALSE)

Category	Parameter	Variable Type	Parameter Description
Input Data
	input_type	String	The type of input: "vcf": used for VCF format inputs. "matrix": used for table format inputs using a tab-separated file. "bedpe": used for a bedpe file with each SV annotated with its type, size bin, and clustered/non-clustered status. "seg:TYPE": used for a multi-sample segmentation file for copy number analysis. The accepted callers for TYPE are {"ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE", "BATTENBERG", "FACETS", "PURPLE", "TCGA"}. For example, when using a segmentation file from BATTENBERG, set input_type to "seg:BATTENBERG".
	output	String	The name of the output folder. The output folder will be generated in the current working directory.
	input_data	String	Path to the input folder for input_type "vcf" or "bedpe"; path to the input file for input_type "matrix" or "seg:TYPE".
	reference_genome	String	The name of the reference genome. The default reference genome is "GRCh37". This parameter is applicable only if the input_type is "vcf".
	opportunity_genome	String	The build or version of the reference genome for the reference signatures. The default opportunity genome is GRCh37. If the input_type is "vcf", the opportunity_genome automatically matches the input reference genome value. Only the genomes available in COSMIC are supported (GRCh37, GRCh38, mm9, mm10 and rn6). If a different opportunity genome is selected, the default genome GRCh37 will be used.
	context_type	String	A string of mutation context names separated by commas (","). The items in the list define the mutational contexts to be considered when extracting the signatures. The default value is "96,DINUC,ID", where "96" is the SBS96 context, "DINUC" is the dinucleotide context, and "ID" is the indel context.
	exome	Boolean	Defines if the exomes will be extracted. The default value is "False".
NMF Replicates
	minimum_signatures	Positive Integer	The minimum number of signatures to be extracted. The default value is 1.
	maximum_signatures	Positive Integer	The maximum number of signatures to be extracted. The default value is 25.
	nmf_replicates	Positive Integer	The number of iterations to be performed to extract each number of signatures. The default value is 100.
	resample	Boolean	Default is True. If True, add poisson noise to samples by resampling.
	seeds	String	It can be used to get reproducible resamples for the NMF replicates. A path of a tab separated .txt file containing the replicated id and preset seeds in a two columns dataframe can be passed through this parameter. The Seeds.txt file in the results folder from a previous analysis can be used for the seeds parameter in a new analysis. The Default value for this parameter is "random". When "random", the seeds for resampling will be random for different analysis.
NMF Engines
	matrix_normalization	String	Method of normalizing the genome matrix before it is analyzed by NMF. Default value is "gmm". Other options are "log2", "custom", or "none".
	nmf_init	String	The initialization algorithm for W and H matrix of NMF. Options are 'random', 'nndsvd', 'nndsvda', 'nndsvdar' and 'nndsvd_min'. Default is 'random'.
	precision	String	Values should be single or double. Default is single.
	min_nmf_iterations	Integer	Value defines the minimum number of iterations to be completed before NMF converges. Default is 10000.
	max_nmf_iterations	Integer	Value defines the maximum number of iterations to be completed before NMF converges. Default is 1000000.
	nmf_test_conv	Integer	Value defines the number of iterations to be completed between convergence checks. Default is 10000.
	nmf_tolerance	Float	Value defines the tolerance to achieve to converge. Default is 1e-15.
Execution
	cpu	Integer	The number of processors to be used to extract the signatures. The default value is -1 which will use all available processors.
	gpu	Boolean	Defines if the GPU resource will be used if available. Default is False. If True, the GPU resources will be used in the computation. Note: All available CPU processors are used by default, which may cause a memory error. This error can be resolved by reducing the number of CPU processes through the cpu parameter.
	batch_size	Integer	Will be effective only if the GPU is used. Defines the number of NMF replicates to be performed by each CPU during the parallel processing. Default is 1.
Solution Estimation Thresholds
	stability	Float	Default is 0.8. The cutoff threshold of the average stability. Solutions with average stabilities below this threshold will not be considered.
	min_stability	Float	Default is 0.2. The cutoff threshold of the minimum stability. Solutions with minimum stabilities below this threshold will not be considered.
	combined_stability	Float	Default is 1.0. The cutoff threshold of the combined stability (sum of average and minimum stability). Solutions with combined stabilities below this threshold will not be considered.
	allow_stability_drop	Boolean	Default is False. Defines if solutions with a drop in stability with respect to the highest stable number of signatures will be considered.
Decomposition
	cosmic_version	Float	Takes a positive float among 1, 2, 3, 3.1, 3.2, 3.3, and 3.4. Default is 3.4. Defines the version of the COSMIC reference signatures.
	nnls_add_penalty	Float	Takes any positive float. Default is 0.05. Defines the strong (add) threshold cutoff to assign signatures to a sample.
	nnls_remove_penalty	Float	Takes any positive float. Default is 0.01. Defines the weak (remove) threshold cutoff to assign signatures to a sample.
	initial_remove_penalty	Float	Takes any positive float. Default is 0.05. Defines the initial weak (remove) threshold cutoff to assign COSMIC signatures to a sample.
	make_decomposition_plots	Boolean	Default is True. If True, de novo to COSMIC signature decomposition plots will be created as part of the results.
	collapse_to_SBS96	Boolean	Default is True. If True, SBS288 and SBS1536 de novo signatures will be mapped to SBS96 reference signatures. If False, those will be mapped to reference signatures of the same context.
Others
	get_all_signature_matrices	Boolean	If True, the Ws and Hs from all the NMF iterations are generated in the output.
	export_probabilities	Boolean	Default is True. If False, the probability matrix is not created.
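
The four NMF engine parameters in the table (min_nmf_iterations, max_nmf_iterations, nmf_test_conv, nmf_tolerance) describe a loop with periodic convergence checks. The sketch below illustrates how such parameters can interact. It is a generic Python illustration, not the SigProfilerExtractor implementation: the function `run_until_converged`, its `step` and `loss` callbacks, the toy problem, and the relative-change criterion are all assumptions made for the example.

```python
def run_until_converged(step, loss, state, min_iterations, max_iterations,
                        test_conv, tolerance):
    """Iterate `step` until the loss stops changing (illustrative only).

    Convergence is tested every `test_conv` iterations, is never accepted
    before `min_iterations` have been completed, and the loop always stops
    at `max_iterations`. The relative-change test is an assumption.
    """
    previous = loss(state)
    for iteration in range(1, max_iterations + 1):
        state = step(state)
        if iteration % test_conv == 0:
            current = loss(state)
            # Relative change since the last check; avoid division by zero.
            change = abs(previous - current) / max(abs(previous), 1e-300)
            previous = current
            if iteration >= min_iterations and change <= tolerance:
                return state, iteration, True
    return state, max_iterations, False


def toy_step(x):
    """One gradient-descent step on (x - 3)^2 with step size 0.1."""
    return x - 0.1 * 2 * (x - 3)


def toy_loss(x):
    """Squared distance from the optimum at x = 3."""
    return (x - 3) ** 2


solution, iterations, converged = run_until_converged(
    toy_step, toy_loss, state=0.0,
    min_iterations=100, max_iterations=10000,
    test_conv=50, tolerance=1e-8,
)
```

Read this way, a larger nmf_test_conv means fewer convergence checks, and a tighter nmf_tolerance delays the stop until the loss has flattened further.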

sigprofilerextractor Example

library(SigProfilerExtractorR)   
# to get input from vcf files.  
path_to_example_folder_containing_vcf_files = importdata("vcf")
data = path_to_example_folder_containing_vcf_files # you can put the path to your folder containing the vcf samples.  
sigprofilerextractor("vcf", "example_output", data, minimum_signatures=1, maximum_signatures=10)


# Wait until the execution is finished. The process may take a couple of hours, depending on the size of the data.
# Check the current working directory for the "example_output" folder.


# to get input from table format (mutation catalog matrix)
path_to_example_table = importdata("matrix")
data = path_to_example_table # you can put the path to your tab delimited file containing the mutational catalog matrix/table
sigprofilerextractor("matrix", "example_output", data, opportunity_genome="GRCh38", minimum_signatures=1,maximum_signatures=10)
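
The two calls above differ in input_type and in whether input_data is a folder or a file. As a hedged illustration of the rules given in the parameter table, the Python sketch below checks an input_type string before a long run is started. The helper names `parse_input_type` and `expects_folder` are hypothetical, and the exact, case-sensitive matching is an assumption; only the accepted values are taken from the table.

```python
# Callers listed in the parameter table for "seg:TYPE".
ACCEPTED_SEG_CALLERS = {
    "ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE",
    "BATTENBERG", "FACETS", "PURPLE", "TCGA",
}


def parse_input_type(input_type):
    """Return (kind, caller); raise ValueError for an unknown input_type."""
    if input_type in ("vcf", "matrix", "bedpe"):
        return input_type, None
    kind, separator, caller = input_type.partition(":")
    if kind == "seg" and separator and caller in ACCEPTED_SEG_CALLERS:
        return "seg", caller
    raise ValueError(f"unsupported input_type: {input_type!r}")


def expects_folder(input_type):
    """True if input_data should be a folder ("vcf", "bedpe"), False for a file."""
    kind, _ = parse_input_type(input_type)
    return kind in ("vcf", "bedpe")
```

For instance, `parse_input_type("seg:BATTENBERG")` yields the pair `("seg", "BATTENBERG")`, while a value that is not listed in the table, such as "table", raises a ValueError.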

sigprofilerextractor Output

To learn about the output, please visit https://osf.io/t6j7u/wiki/home/

Estimation of the Optimum Solution (estimate_solution)

Estimates the optimum solution (rank) among solutions with different numbers of signatures (ranks).

estimate_solution(base_csvfile, 
          All_solution, 
          genomes, 
          output, 
          title,
          stability, 
          min_stability, 
          combined_stability)

Parameter	Variable Type	Parameter Description
base_csvfile	String	Default is "All_solutions_stat.csv". Path to a csv file that contains the statistics of all solutions.
All_solution	String	Default is "All_Solutions". Path to a folder that contains the results of all solutions.
genomes	String	Default is "Samples.txt". Path to a tab-delimited file that contains the mutation counts of all genomes for the different mutation types.
output	String	Default is "results". Path to the output folder.
title	String	Default is "Selection_Plot". This sets the title of the selection_plot.pdf
stability	Float	Default is 0.8. The cutoff threshold of the average stability. Solutions with average stabilities below this threshold will not be considered.
min_stability	Float	Default is 0.2. The cutoff threshold of the minimum stability. Solutions with minimum stabilities below this threshold will not be considered.
combined_stability	Float	Default is 1.0. The cutoff threshold of the combined stability (sum of average and minimum stability). Solutions with combined stabilities below this threshold will not be considered.
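
The three stability cutoffs in the table act as a filter on candidate solutions. The Python sketch below shows that filter only, under the assumption that "below this threshold" means strictly less than the cutoff. The function name `passes_stability_thresholds` and the candidate values are illustrative, and the sketch does not attempt to reproduce how the tool chooses the final solution among the candidates that pass.

```python
def passes_stability_thresholds(average, minimum,
                                stability=0.8, min_stability=0.2,
                                combined_stability=1.0):
    """True if a solution clears all three cutoffs described in the table."""
    return (average >= stability
            and minimum >= min_stability
            and average + minimum >= combined_stability)


# Illustrative (average stability, minimum stability) per number of signatures.
candidates = {4: (0.94, 0.90), 5: (0.80, 0.12), 8: (0.68, -0.49)}

eligible = [
    rank for rank, (average, minimum) in sorted(candidates.items())
    if passes_stability_thresholds(average, minimum)
]
```

With the default cutoffs, the candidate with 5 signatures is excluded by its minimum stability and the candidate with 8 signatures by its average stability.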

estimate_solution Example

estimate_solution(base_csvfile="All_solutions_stat.csv", 
          All_solution="All_Solutions", 
          genomes="Samples.txt", 
          output="results", 
          title="Selection_Plot",
          stability=0.8, 
          min_stability=0.2, 
          combined_stability=1.25)

estimate_solution Output

The files below will be generated in the output folder:

File Name	Description
All_solutions_stat.csv	A csv file that contains the statistics of all solutions.
selection_plot.pdf	A plot that depicts the Stability and Mean Sample Cosine Distance for different solutions.

GPU support

If CUDA out of memory exceptions occur, it will be necessary to reduce the number of CPU processes used (the cpu parameter).

For more information, help, and examples, please visit: https://osf.io/t6j7u/wiki/home/

Citation

Islam SMA, Díaz-Gay M, Wu Y, Barnes M, Vangara R, Bergstrom EN, He Y, Vella M, Wang J, Teague JW, Clapham P, Moody S, Senkin S, Li YR, Riva L, Zhang T, Gruber AJ, Steele CD, Otlu B, Khandekar A, Abbasi A, Humphreys L, Syulyukina N, Brady SW, Alexandrov BS, Pillay N, Zhang J, Adams DJ, Martincorena I, Wedge DC, Landi MT, Brennan P, Stratton MR, Rozen SG, and Alexandrov LB (2022) Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. doi: 10.1016/j.xgen.2022.100179.

Copyright

This software and its documentation are copyright 2018 as a part of the sigProfiler project. The SigProfilerExtractor framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Contact Information

Please address any queries or bug reports to Mark Barnes at [email protected] or Marcos Díaz-Gay at [email protected].

sigprofilerextractorr's Issues

Runtime error when using example data

Hi,

I pip-installed your tool and ran the example, but got the following error:

Any help would be appreciated!

Thanks

======== Error:

Starting matrix generation for SNVs and DINUCs...Starting matrix generation for SNVs and DINUCs...Starting matrix generation for SNVs and DINUCs...Starting matrix generation for SNVs and DINUCs...Starting matrix generation for SNVs and DINUCs...Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 4.77 seconds.
Matrices generated for 2 samples with 5997 errors. Total of 1909 SNVs, 49 DINUCs, and 0 INDELs were successfully analyzed.
Extracting signature 1 for mutation type 96
The matrix normalizig cutoff is 9600

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
prepare(preparation_data)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/shared/pipeline-user/run_data/Exome_data_neogenomics/Tumor_only_neo_batch4_wrapper_run/tumor_only_neobatch4/sigprof/run_sig.py", line 9, in <module>
main_function()
File "/shared/pipeline-user/run_data/Exome_data_neogenomics/Tumor_only_neo_batch4_wrapper_run/tumor_only_neobatch4/sigprof/run_sig.py", line 8, in main_function
sig.sigProfilerExtractor("vcf", "example_output", "vcftest", minimum_signatures=1, maximum_signatures=3,reference_genome="GRCh38",exome=True)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/SigProfilerExtractor/sigpro.py", line 818, in sigProfilerExtractor
i = i)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/SigProfilerExtractor/subroutines.py", line 789, in decipher_signatures
results = parallel_runs(excecution_parameters, genomes=genomes, totalProcesses=totalProcesses, verbose = False)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/site-packages/SigProfilerExtractor/subroutines.py", line 685, in parallel_runs
pool = multiprocessing.Pool()
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/context.py", line 119, in Pool
context=self.get_context())
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
self._repopulate_pool()
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
w.start()
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/shared/pipeline-user/bcbio/anaconda/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
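
The error message names the usual remedy: the script that starts the extraction should protect its entry point with `if __name__ == "__main__":`. In the traceback, run_sig.py calls main_function() at module level, so each worker process that re-imports the script reaches that call again. A minimal sketch of the guarded layout follows. It uses only the Python standard library, and `square` is a stand-in for the real work, so it shows the idiom rather than the poster's actual script.

```python
import multiprocessing


def square(value):
    """Stand-in for the real work done in each worker process."""
    return value * value


def main_function():
    # Worker processes are created here. With the "spawn" start method each
    # worker re-imports this file, so this function must not run on import.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Only the process that was launched directly executes this branch;
    # copies of the module re-imported by worker processes skip it.
    print(main_function())
```

In a script such as run_sig.py, the call to the extractor would sit inside `main_function`, leaving the guarded call as the only statement that runs at module level.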

Decomposition function does not decomposite the de novo signatures to cosmic global signatures.

Running the function 'sigprofilerextractor', the suggested solution is 6 de novo signatures:

	Stability (Avg Silhouette)	Minimum Stability	Matrix Frobenius%	Mean Sample L1%	Maximum Sample L1%	Mean Sample L2%	Maximum Sample L2%	Significant Decrease of L2	Mean Sample KL	Maximum Sample KL
1	1	1	57.73%	51.11%	147.37%	50.98%	145.87%	TRUE	0.2305	2.0591
2	1	1	26.30%	28.52%	127.72%	25.53%	95.14%	TRUE	0.0984	1.5946
3	0.99	0.98	20.30%	24.68%	128.24%	20.89%	92.43%	FALSE	0.0834	1.561
4	0.94	0.9	16.82%	21.43%	121.02%	16.11%	96.94%	TRUE	0.0703	1.3847
5	0.8	0.12	15.70%	18.83%	83.93%	13.67%	62.86%	FALSE	0.0542	0.6204
6*	0.88	0.32	14.62%	16.52%	49.01%	11.80%	33.24%	TRUE	0.0425	0.2881
7	0.87	0.3	11.86%	14.77%	50.16%	10.35%	31.06%	FALSE	0.0348	0.3028
8	0.68	-0.49	11.95%	14.49%	49.32%	9.97%	34.42%	FALSE	0.0333	0.3079
9	0.78	-0.03	9.32%	12.23%	26.99%	8.27%	27.23%	TRUE	0.0252	0.1273
10	0.6	-0.02	9.84%	12.23%	27.57%	8.50%	26.69%	FALSE	0.0238	0.1286

However, in the decomposition folder, all samples are assigned to 8 signatures (Signature A to F, SBS1, SBS5). Here are the first few lines from the file 'Decomposed_Solution_Signatures_SBS96.txt':

MutationsType	SBS1	SBS5	Signature A	Signature B	Signature C	Signature D	Signature E	Signature F
A[C>A]A	0.000886	0.011998	0.001695	0.000153	0	0.000733	9.86E-05	0.002613
A[C>A]C	0.00228	0.009438	0.002864	0.003437	0.00096	0.004188	0.000455	1.77E-06
A[C>A]G	0.000177	0.00185	0.000913	0.000123	0	0.000608	0	1.93E-06
A[C>A]T	0.00128	0.006609	0.012232	0.019794	0	0.011603	0	1.31E-06
C[C>A]A	0.00186	0.010098	0.002824	0.002846	0.002816	0.016223	1.9E-05	0.024425
C[C>A]C	0.00122	0.005699	0.001623	0.002203	0.000484	0.027418	0.000285	0.000546
C[C>A]G	0.000115	0.00172	0.001503	0.000929	0.000289	0.00955	0	1.53E-06
C[C>A]T	0.00114	0.010098	0.015617	0.059091	0.000421	0.129235	0.001499	0.024569

The file 'comparison_with_global_ID_signatures.csv' shows

De novo extracted	Global NMF Signatures	Cosine Similarity
Signature 96-A	Signature 96-A	1
Signature 96-B	Signature 96-B	1
Signature 96-C	Signature 96-C	1
Signature 96-D	Signature 96-D	1
Signature 96-E	Signature 96-E	1
Signature 96-F	Signature 96-F	1

Why are the de novo signatures not decomposed into COSMIC SBS signatures?

Sigprofileextractor GRCh38 Exome Reference File Not Found Error After Installation

Hi,

I am planning to use VCF files for my samples to look for gene signatures in GRCh38, and followed your code to install the GRCh38 reference:

> library(SigProfilerExtractorR)
> install("GRCh38", rsync=FALSE, bash=TRUE)
Beginning installation. This may take up to 40 minutes to complete.
The transcriptional reference data for GRCh38 has been saved.
All reference files have been created.
Verifying and benchmarking installation now...
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 2.12 seconds.
Matrices generated for 1 samples with 0 errors. Total of 9631 SNVs, 0 DINUCs, and 0 INDELs were successfully analyzed.
Installation was succesful.
SigProfilerMatrixGenerator took 7.2261998653411865 seconds to complete.
To proceed with matrix_generation, please provide the path to your vcf files and an appropriate output path.
Installation complete.

However when I run sigprofilerextractor I am given a file not found error message. Is there another way to get the exome.interval_list?:

> sigprofilerextractor("vcf",".","vcf",reference_genome = "GRCh38",exome = TRUE)

************** Reported Current Memory Use: 0.61 GB *****************

Starting matrix generation for SNVs and DINUCs...Error: FileNotFoundError: [Errno 2] No such file or directory: '/home/python/Python-3.10.11_install/lib/python3.10/site-packages/SigProfilerMatrixGenerator/references/chromosomes/exome/GRCh38/GRCh38_exome.interval_list'

simultaneous analysis of genomes and exomes

Hello,

I would like to analyze in the same batch WGS and WES data. Is it possible to do this using the SigProfilerExtractor?
How is exome normalisation performed?

Thank you in advance for your help.
Maria

Error in refgen while using SigProfilerExtractorR

Hi all,

I am trying to use the function as the following:

sigprofilerextractor("table", output="output",filteredT , refgen = "GRCh37",
genome_build = "GRCh37", minsigs = 1, maxsigs = 3,
replicates = 5, mtype = c("96,DINUCS,ID"), init = "random",
exome = F, cpu = -1)

I received this error below;

Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: sigProfilerExtractor() got an unexpected keyword argument 'refgen'

BTW, I installed reference genome for human by using install("GRCh37", rsync=FALSE, bash=TRUE)

after about 40 minutes wait time, the installation was successful.
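
The TypeError says that `refgen` is not a keyword of the function being called, and none of the keyword names in the call above appear in the signature documented in the Functions section. The Python sketch below pairs each older name with the documented parameter it most plausibly corresponds to. The mapping is inferred by comparing names and is an assumption, not something confirmed by the project; `LEGACY_TO_CURRENT` and `translate_keywords` are hypothetical names.

```python
# Assumed correspondence between older keyword names and the documented ones.
LEGACY_TO_CURRENT = {
    "refgen": "reference_genome",
    "genome_build": "opportunity_genome",
    "minsigs": "minimum_signatures",
    "maxsigs": "maximum_signatures",
    "replicates": "nmf_replicates",
    "mtype": "context_type",
    "init": "nmf_init",
}


def translate_keywords(keywords):
    """Return a copy of `keywords` with assumed legacy names replaced."""
    return {LEGACY_TO_CURRENT.get(name, name): value
            for name, value in keywords.items()}


old_call = {"refgen": "GRCh37", "minsigs": 1, "maxsigs": 3,
            "replicates": 5, "cpu": -1}
new_call = translate_keywords(old_call)
```

Names that are already current, such as cpu, pass through unchanged. The first argument "table" in the call is also not among the documented input_type values; the documented value for table-format input is "matrix".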

trouble to extract mouse 'mm10' signatures through SigProfilerExtractorR

Hi, I am new to the tool. I've tried to follow the 'easy to follow' instruction to install and run SigProfilerExtractorR on my VCF called against mouse (mm10). The simple R code that I used is:

sigprofilerextractor("vcf", "output", data, minsigs=1, maxsigs=3, replicates=10, cpu=-1, refgen = 'mm10',genome_build="mm10", mtype=c("DINUC,ID,96"))

But, the job_metadata file gives the following:

-------Vital Parameters Used for the execution -------
input_type: vcf
inputdata: vcftest
startProcess: 1
endProcess: 3
totalIterations: 10
cpu: -1
hierarchy: False
refgen: GRCh38
genome_build: GRCh38
mtype: ['default']
init: random

At least 2 things are out of expectation here: 1) I specified 'mm10', which I successfully installed I believe, but here says 'GRCh38'; 2) I specified the mutation types that I want, but it still works on the 'default' mtype.

I don't have this issue when I run SigProfilerMatrixGeneratorR/PlottingR. Can anybody help? Plus, not sure if it is due to the above I always get an error message as below:

Time taken to collect 10 iterations for 1 signatures is 0.55 seconds
Optimization time is 0.01352691650390625 seconds
The reconstruction error is 0.0445, average process stability is 1.0 and
the minimum process stability is 1.0 for 1 signatures

Extracting signature 2 for mutation type 96
Error in py_call_impl(callable, dots$args, dots$keywords) :
IndexError: index 1 is out of bounds for axis 1 with size 1

Detailed traceback:
File "/usr/local/lib/python3.6/site-packages/sigproextractor/sigpro.py", line 571, in sigProfilerExtractor
gpu=gpu,)
File "/usr/local/lib/python3.6/site-packages/sigproextractor/subroutines.py", line 984, in decipher_signatures
results = parallel_runs(genomes=genomes, totalProcesses=totalProcesses, iterations=totalIterations, n_cpu=cpu, verbose = False, resample=resample, seeds = seeds, init=init, normalization_cutoff=normalization_cutoff, gpu=gpu)
File "/usr/local/lib/python3.6/site-packages/sigproextractor/subroutines.py", line 461, in parallel_runs
result_list = pool.map(pool_nmf, seeds)
File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value

Thanks!

cosmic v2

Hi there, just wanted to let you know that when trying to run sigprofilerextractorR using cosmic v2, the reference file is not found:
/opt/anaconda3/lib/python3.8/site-packages/SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v2.0_SBS_GRCh37.txt

because that file is actually named "COSMIC_v2_SBS_GRCh37.txt"

Renaming that file to be "v2.0" instead of "v2" made the program run as expected

I imagine this would also be a problem for the "v1" labeled files too

Cheers,
~ Annabel
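
The mismatch reported above is between two spellings of the same version, "v2.0" and "v2". As a hedged illustration, the Python sketch below looks a reference file up under both spellings instead of renaming files on disk. The helpers `candidate_names` and `find_reference_file` are hypothetical, the file name pattern is copied from the post, and nothing here reflects how SigProfilerAssignment actually resolves its reference files.

```python
import os
import tempfile


def candidate_names(version, genome="GRCh37", context="SBS"):
    """Yield both spellings of a reference file name, e.g. v2.0 and v2."""
    as_float = float(version)
    spellings = [str(as_float)]               # 2 -> "2.0", 3.4 -> "3.4"
    if as_float.is_integer():
        spellings.append(str(int(as_float)))  # 2.0 -> "2"
    for spelling in spellings:
        yield f"COSMIC_v{spelling}_{context}_{genome}.txt"


def find_reference_file(directory, version, genome="GRCh37", context="SBS"):
    """Return the first candidate path that exists, or None."""
    for name in candidate_names(version, genome, context):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


# Demonstration in a temporary folder that contains only the "v2" spelling.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "COSMIC_v2_SBS_GRCh37.txt"), "w").close()
    found_name = os.path.basename(find_reference_file(folder, 2.0))
    missing = find_reference_file(folder, 3.4)
```

Trying both spellings leaves the installed files untouched, which avoids having to repeat a manual rename after the package is reinstalled.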

ModuleNotFoundError: No module named 'SigProfilerExtractor'

This is not an issue per se, but I think someone else will probably look here to find a solution.
It seems there is a typo in the readme:

pip install sigproextractor

this should be:

pip install SigProfilerExtractor

otherwise there is an error in R, because Python does not find the correct package.

Thanks!

error in de_novo_fit_penalty

Hi, I'm trying to use sigprofiler to extract mutations from mm10 catalogues.
I'm getting an error when I run the code:
Line:

sigprofilerextractor("vcf",
"sigprofiler",
"/Volumes/archive/cancergeneticslab/ConorM/MouseAnalysis/totalVCFhets/",
"mm10")
Error:

TypeError: sigProfilerExtractor() got an unexpected keyword argument 'de_novo_fit_penalty'

Other info:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

py_config()
python: /usr/bin/python3.6
libpython: /usr/lib64/libpython3.6m.so
pythonhome: //usr://usr
version: 3.6.8 (default, Aug 13 2020, 07:46:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
numpy: /usr/local/lib64/python3.6/site-packages/numpy
numpy_version: 1.17.4
sys: [builtin module]

NOTE: Python version was forced by use_python function

Trying to set de_novo_fit_penalty manually did not help. Any suggestions?
Thanks
Conor
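
The call in this post does not pass de_novo_fit_penalty, yet the Python function rejects it. That pattern suggests the keyword is forwarded by the R wrapper to an installed Python package whose function does not accept it, although this is an inference from the error text rather than a confirmed diagnosis. The Python sketch below shows one way to list the keywords a function would reject, using `inspect.signature` from the standard library. `installed_function` is a stand-in with a hypothetical signature, and the forwarded value 0.02 is illustrative.

```python
import inspect


def unsupported_keywords(function, keywords):
    """Return the names in `keywords` that `function` would reject."""
    parameters = inspect.signature(function).parameters
    # A **kwargs parameter means any keyword name is accepted.
    if any(parameter.kind is inspect.Parameter.VAR_KEYWORD
           for parameter in parameters.values()):
        return []
    return sorted(name for name in keywords if name not in parameters)


def installed_function(input_type, output, input_data,
                       reference_genome="GRCh37"):
    """Stand-in for an installed function (hypothetical signature)."""


forwarded = {"reference_genome": "mm10", "de_novo_fit_penalty": 0.02}
rejected = unsupported_keywords(installed_function, forwarded)
```

Running the same check against the real function in the Python environment that reticulate uses would show whether the wrapper and the installed package are out of step.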

No NAMESPACE file

Hi,
The package is supposed to have a NAMESPACE file; during installation it throws an error saying that a NAMESPACE file is required.

I tried to install with the simple command but the error comes every time I do it.

devtools::install_github("AlexandrovLab/SigProfilerExtractorR")
Downloading GitHub repo AlexandrovLab/SigProfilerExtractorR@master
from URL https://api.github.com/repos/AlexandrovLab/SigProfilerExtractorR/zipball/master
Installing SigProfilerExtractorR
'/data/users/prakha/Main/envs/RP/lib/R/bin/R' --no-site-file --no-environ
--no-save --no-restore --quiet CMD INSTALL
'/tmp/Rtmpq3kdOM/devtoolsc4fe179727219/AlexandrovLab-SigProfilerExtractorR-01dca1c'
--library='/data/users/prakha/Main/envs/RP/lib/R/library' --install-tests

installing source package ‘SigProfilerExtractorR’ ...
ERROR: a 'NAMESPACE' file is required
removing ‘/data/users/prakha/Main/envs/RP/lib/R/library/SigProfilerExtractorR’
Installation failed: Command failed (1)

Can you help with this error?

unexpected keyword argument 'collapse_to_SBS96'

Hi,

I am trying to run the sigprofilerextractor() function as follows:

sigprofilerextractor("vcf",
output,
inputdata,
"GRCh37"
#exome = TRUE #when WES turn TRUE
#minimum_signatures= 1,
#maximum_signatures= 25,
#nmf_replicates= 100,
#min_nmf_iterations = 10000,
#max_nmf_iterations =1000000,
#nmf_test_conv = 10000,
#nmf_tolerance = 0.00000001,
)

The type of inputdata file is vcf format.

I am having the following error when trying to run:
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: sigProfilerExtractor() got an unexpected keyword argument 'collapse_to_SBS96'

error in de_novo_fit_penalty like in issue #7

Hi, I'm experiencing the same issue as #7 after having updated SigProfilerExtractor to v1.1.13
Thanks for your help.

Best
Francois

no cosmic decomposition for strand bias

Hi all,
I have an issue with the COSMIC decomposition step of the suggested signatures.
While the decomposition works well with the SBS96 matrix, it doesn't seem to operate on the strand bias matrix.
I used the SBS384 matrix generated by SigProfilerMatrixGeneratorR, but I didn't get any mapping to COSMIC signatures.

Here's the De_Novo_map_to_COSMIC_SBS384.csv file :
De_Novo_map_to_COSMIC_SBS384.csv

and the log file :
Cosmic_SBS384_Decomposition_Log.txt

thanks for your help