
hotspot3d's Introduction

HotSpot3D

This 3D proximity tool can be used to identify mutation hotspots from linear protein sequence and correlate the hotspots with known or potentially interacting domains, mutations, or drugs. Mutation-mutation and mutation-drug clusters can also be identified and viewed.

Usage

    Program:     HotSpot3D - 3D mutation proximity analysis program.

     Stable:     v0.6.0 

       Beta:     up to v1.8.2
    
     Author:     Beifang Niu, John Wallis, Adam D Scott, Sohini Sengupta, & Amila Weerasinghe

Usage: hotspot3d [options]

       Preprocessing
         drugport  --  0) Parse drugport database (OPTIONAL)
         uppro     --  1) Update proximity files
         prep      --  2) Run preprocessing steps 2a-2f
             calroi    --  2a) Generate region of interest (ROI) information
             statis    --  2b) Calculate p_values for pairs of mutations
             anno      --  2c) Add region of interest (ROI) annotation
             trans     --  2d) Add transcript annotation
             cosmic    --  2e) Add COSMIC annotation to proximity file
             prior     --  2f) Prioritization

       Analysis
         main      --  Run analysis steps a-e (beta)
             search    --  a) 3D mutation proximity searching
             cluster   --  b) Determine mutation-mutation and mutation-drug clusters
             sigclus   --  c) Determine significance of clusters (BETA/OPTIONAL)
             summary   --  d) Summarize clusters (OPTIONAL)
             visual    --  e) Visualization of 3D proximity (OPTIONAL)

Support

For user support please email [email protected] & [email protected]

Update

To reinstall the same version (in some cases, you may need --sudo):

cpanm --reinstall HotSpot3D-#.tar.gz

Install (Ubuntu 14.04.01)

Make sure that you have cpanm:

cpan App::cpanminus

For configuration, we recommend using local::lib:

cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)

Dependencies include the modules: LWP::Simple, Test::Most, List::Util, List::MoreUtils, Parallel::ForkManager

cpanm LWP::Simple

cpanm Test::Most

cpanm List::Util

cpanm List::MoreUtils

cpanm Parallel::ForkManager

Install the HotSpot3D package:

git clone https://github.com/ding-lab/hotspot3d

cd hotspot3d

For the latest stable version:

git checkout v0.6.0

cpanm HotSpot3D-0.6.0.tar.gz

For the latest beta version:

git pull origin v1.8.2

cpanm HotSpot3D-1.8.2.tar.gz

Final note: installations at some organizations may use an internal Perl. To use the system Perl in /usr/bin instead, edit the first line of ~/perl5/bin/hotspot3d.

from: #!/org/bin/perl

to: #!/usr/bin/perl

We have also developed a web service for HotSpot3D. If you prefer not to install HotSpot3D locally, you can submit your mutation file for online analysis at http://niulab.scgrid.cn/HotSpot3D/ .

Configure Environment

It is helpful to add your perl5 lib and bin directories to your environment (PERL5LIB, PERL5BIN, and PATH).

You can add the following lines to your ~/.bash_profile. Then run 'source ~/.bash_profile'.

	export PERL5LIB=~/perl5/lib/perl5/:${PERL5LIB}

	export PERL5BIN=~/perl5/bin/:${PERL5BIN}

	export PATH=~/perl5/bin/:${PATH}
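To make this step repeatable, the sketch below appends the three export lines to a profile file only when each exact line is missing, so running it twice does not duplicate them. The function name `add_perl5_env` is hypothetical and not part of HotSpot3D; it assumes a POSIX shell and a `grep` that supports `-q`, `-x`, and `-F`.

```sh
# Hypothetical helper, not part of HotSpot3D: append each perl5 export
# line to a profile file only if that exact line is not already present.
add_perl5_env() {
    profile="${1:-$HOME/.bash_profile}"
    touch "$profile" || return 1
    # Single quotes keep ~ and ${...} literal, so they expand at login time.
    for line in \
        'export PERL5LIB=~/perl5/lib/perl5/:${PERL5LIB}' \
        'export PERL5BIN=~/perl5/bin/:${PERL5BIN}' \
        'export PATH=~/perl5/bin/:${PATH}'
    do
        # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
        grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
    done
}
```

For example: `add_perl5_env ~/.bash_profile && . ~/.bash_profile`.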

Add COSMIC v67 information to 3D proximity results:

	mkdir preprocessing_dir/cosmic

	cp COSMIC/cosmic_67_for_HotSpot3D_missense_only.tsv.bz2 ./preprocessing_dir/cosmic/

	cd ./preprocessing_dir/cosmic/ 

	bzip2 -d cosmic_67_for_HotSpot3D_missense_only.tsv.bz2

Example - Preprocessing

Download from Synapse

  1. Go to https://www.synapse.org/#!Synapse:syn8699796, and check out the wiki for any updates/details.

  2. Select the Files tab, then go into the AverageResidueDistance data directory (syn8717211).

  3. The DrugPort processing results are located here (syn9704835) for those interested.

  4. Select the reference/assembly version of interest (GRCh37 with Ensembl version 74 (syn9701918), or GRCh38 with Ensembl version 87 (syn9704851)).

  5. You will need to download the hugo.uniprot.pdb.transcript.csv (syn9704852).

  6. Two download options are available. prioritization.tar.gz (syn9704853) contains all human proteins that have been preprocessed; it is a large file that can take an hour or more to download, depending on your internet speed. Alternatively, you can download the prioritization/ directory (syn9705109) or any specific protein proximity files within it. The proximity files are compressed for faster, more targeted downloading.

NOTE: Proximity data contains only pairs of mutations within 20 Angstroms of each other. This should be sufficient for many HotSpot3D applications.

Generate on your own

  1. (Optional) Run the drugport module to parse DrugPort data and generate a DrugPort parsing results flat file:

     hotspot3d drugport --pdb-file-dir=pdb_files_dir
    
  2. Run the 3D proximity calculation, which also updates any existing preprocessed data (by default, this launches LSF jobs):

     hotspot3d uppro --output-dir=preprocessing_dir --pdb-file-dir=pdb_files_dir --drugport-file=drugport_parsing_results_file 1>hotspot3d.uppro.out 2>hotspot3d.uppro.err
    
  3. Run automated preprocessing for the other measurements and annotations (alternatively, run steps 2a-2f individually):

     hotspot3d prep --output-dir=preprocessing_dir
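Steps 2 and 3 above can be chained in a small shell function, sketched here under stated assumptions. `run_preprocessing` is a hypothetical helper, not part of HotSpot3D; it uses only the options shown above, forwards any extra arguments (such as `--drugport-file=...`) to `uppro`, and sends stdout and stderr to separate log files. It assumes `uppro` has finished its calculations when it returns. Because `uppro` launches LSF jobs by default, in that mode you would wait for the submitted jobs to complete before running `prep`.

```sh
# Hypothetical wrapper, not part of HotSpot3D: run uppro, then prep,
# stopping if uppro reports failure.
# Usage: run_preprocessing OUTPUT_DIR PDB_DIR [extra uppro options...]
run_preprocessing() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_preprocessing OUTPUT_DIR PDB_DIR [uppro options]" >&2
        return 2
    fi
    out_dir="$1"
    pdb_dir="$2"
    shift 2
    # Remaining arguments (e.g. --drugport-file=FILE) are passed to uppro.
    # stdout and stderr go to separate, correctly named log files.
    hotspot3d uppro --output-dir="$out_dir" --pdb-file-dir="$pdb_dir" "$@" \
        1>hotspot3d.uppro.out 2>hotspot3d.uppro.err || return 1
    # Assumes uppro has finished; in default LSF mode, wait for the jobs first.
    hotspot3d prep --output-dir="$out_dir"
}
```

For example: `run_preprocessing preprocessing_dir pdb_files_dir --drugport-file=drugport_parsing_results_file`.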
    

Example - Analysis

3D proximity searching based on prioritization results and visualization

  1. Proximity searching (acquire proximity information for input mutations):

     hotspot3d search --maf-file=your.maf --prep-dir=preprocessing_dir
    
  2. Cluster pairwise data:

     hotspot3d cluster --pairwise-file=3D_Proximity.pairwise --maf-file=your.maf
    
  3. Cluster significance calculation:

     hotspot3d sigclus --prep-dir=preprocessing_dir --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
    
  4. Clustering summary:

     hotspot3d summary --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
    
  5. Visualization (works with PyMOL):

     hotspot3d visual --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters --pdb=3XSR
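The five analysis steps can likewise be chained. The sketch below is a hypothetical helper, `run_analysis`, not part of HotSpot3D; it assumes the default output names used in the examples above (`3D_Proximity.pairwise` and `3D_Proximity.pairwise.singleprotein.collapsed.clusters`). Because `sigclus` is marked BETA/OPTIONAL, a failure there is reported but does not stop the remaining steps, and the visualization step runs only when a PDB ID is given.

```sh
# Hypothetical wrapper, not part of HotSpot3D: run search, cluster,
# sigclus, summary and (optionally) visual with the default file names.
# Usage: run_analysis MAF_FILE PREP_DIR [PDB_ID]
run_analysis() {
    maf="$1"
    prep_dir="$2"
    pdb_id="$3"   # optional; visual is skipped when empty
    pairwise=3D_Proximity.pairwise
    clusters=3D_Proximity.pairwise.singleprotein.collapsed.clusters

    hotspot3d search --maf-file="$maf" --prep-dir="$prep_dir" || return 1
    hotspot3d cluster --pairwise-file="$pairwise" --maf-file="$maf" || return 1
    # sigclus is BETA/OPTIONAL: report a failure but keep going.
    hotspot3d sigclus --prep-dir="$prep_dir" --pairwise-file="$pairwise" \
        --clusters-file="$clusters" || echo "sigclus failed; continuing" >&2
    hotspot3d summary --clusters-file="$clusters" || return 1
    if [ -n "$pdb_id" ]; then
        hotspot3d visual --pairwise-file="$pairwise" \
            --clusters-file="$clusters" --pdb="$pdb_id" || return 1
    fi
}
```

For example: `run_analysis your.maf preprocessing_dir 3XSR`.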
    

Annotations

Check out scripts/ for various annotation scripts to add more details to the .clusters file.

The HGNC download can be found here: http://www.genenames.org/cgi-bin/genefamilies/.

Information on the Ensembl .gtf can be found here: http://useast.ensembl.org/info/website/upload/gff.html, and downloads can be found at the Ensembl ftp site, ftp://ftp.ensembl.org/pub/.

See the scripts/README.annotations for more details.

Tips

Mutation file: a standard .maf with custom coding transcript and protein annotations (e.g., ENST00000275493 and p.L858R)

Only a handful of .maf columns are required:

	Hugo_Symbol
	
	Chromosome
	
	Start_Position
	
	End_Position
	
	Variant_Classification
	
	Reference_Allele
	
	Tumor_Seq_Allele1
	
	Tumor_Seq_Allele2
	
	Tumor_Sample_Barcode

And two non-standard columns:

	a transcript ID column
	
	a protein peptide change column (HGVS p. single-letter abbreviations, e.g., p.T790M)
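Before running `search`, it can help to confirm that these columns are present. The sketch below is a hypothetical `check_maf_columns` helper, not part of HotSpot3D; it treats the first line of a tab-delimited .maf that does not start with `#` as the header and prints each missing column. Because the transcript and protein-change column names vary between files, you pass them as extra arguments.

```sh
# Hypothetical helper, not part of HotSpot3D: print each required .maf
# column missing from the header (first line not starting with '#').
# Usage: check_maf_columns MAF_FILE [EXTRA_COLUMN...]
check_maf_columns() {
    maf="$1"
    [ -r "$maf" ] || { echo "cannot read $maf" >&2; return 2; }
    shift
    tab=$(printf '\t')
    # Drop '#' comment lines and Windows carriage returns; keep the header.
    header=$(grep -v '^#' "$maf" | head -n 1 | tr -d '\r')
    missing=0
    for col in Hugo_Symbol Chromosome Start_Position End_Position \
               Variant_Classification Reference_Allele Tumor_Seq_Allele1 \
               Tumor_Seq_Allele2 Tumor_Sample_Barcode "$@"; do
        # Pad with tabs so only whole column names match.
        case "$tab$header$tab" in
            *"$tab$col$tab"*) ;;
            *) echo "missing: $col"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

For example, `check_maf_columns your.maf Transcript_ID HGVSp_Short` prints nothing and returns 0 when every column is present.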

Current Annotation Support:

	Transcript ID - Ensembl coding transcript IDs (ENST)

	Gene name - HUGO symbol

Clustering with different pairs data:

	For monomers, you need to include the option '--meric-type monomer'

	For homomers, you need to include the option '--meric-type homomer'

	For heteromers, you need to include the option '--meric-type heteromer'

	For both homomers & heteromers simultaneously, you need to include the option '--meric-type multimer'

	To disregard *mer status, you can include the option
	'--meric-type unspecified'; this is also the default when the option is omitted.

	For DrugPort only, do not input the .pairwise file; input only the DrugPort pairs file.

	For *mer+DrugPort include the .pairwise file with the DrugPort pairs file, 
	and include the appropriate --meric-type as described above.
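The accepted `--meric-type` values above can be checked before calling `cluster`. The sketch below is a hypothetical helper, `meric_type_option`, not part of HotSpot3D; it prints the option for one of the listed values and fails for anything else. It does not cover the DrugPort-only case, which depends on which input files you pass.

```sh
# Hypothetical helper, not part of HotSpot3D: print the cluster option for
# one of the --meric-type values listed above, or fail for anything else.
meric_type_option() {
    case "$1" in
        monomer|homomer|heteromer|multimer|unspecified)
            printf '%s %s\n' '--meric-type' "$1" ;;
        *)
            echo "unknown meric type: $1" >&2
            return 1 ;;
    esac
}
```

For example: `hotspot3d cluster --pairwise-file=3D_Proximity.pairwise --maf-file=your.maf $(meric_type_option homomer)`. The substitution is left unquoted on purpose so that it splits into the two arguments `--meric-type` and `homomer`.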

Clustering based on different distance measures:

	Some pairs are found on multiple structures.
	In HotSpot3D v0.6.2 and earlier,
	clustering used only the shortest distance among the different structures
	(shortest structure distance, SSD).
	In HotSpot3D v0.6.3 and later,
	clustering can instead use the average distance among the different structures
	(average structure distance, ASD), which is now the default.
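To make the difference between SSD and ASD concrete, the sketch below computes both for each mutation pair from a list of per-structure distances. The input layout (pair ID, structure ID, and distance in Angstroms, whitespace-separated) and the `ssd_asd` function name are hypothetical and chosen for illustration; they are not the HotSpot3D .pairwise format or its implementation.

```sh
# Illustration only: the input layout (pair ID, structure ID, distance in
# Angstroms, whitespace-separated) is hypothetical, not the .pairwise format.
# For each pair, print the shortest (SSD) and average (ASD) distance over
# all structures on which the pair was found, in first-seen order.
ssd_asd() {
    awk '
        NF < 3 { next }                      # skip blank or short lines
        {
            pair = $1; d = $3 + 0
            if (!(pair in n)) { order[++k] = pair; min[pair] = d }
            if (d < min[pair]) min[pair] = d
            sum[pair] += d
            n[pair]++
        }
        END {
            for (i = 1; i <= k; i++) {
                p = order[i]
                printf "%s\tSSD=%.2f\tASD=%.2f\n", p, min[p], sum[p] / n[p]
            }
        }
    ' "$@"
}
```

For instance, `printf 'pair1 1ABC 4.0\npair1 2DEF 8.0\n' | ssd_asd` reports SSD=4.00 and ASD=6.00 for pair1, so that pair would fall within a 5 Angstrom distance cutoff under SSD but not under ASD.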

Citation

If you use HotSpot3D in your research, please cite:

  • Protein-structure-guided discovery of functional mutations across 19 cancer types; Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, Wyczalkowski MA, Liang WW, Zhang Q, McLellan MD, Sun SQ, Tripathi P, Lou C, Ye K, Mashl RJ, Wallis J, Wendl MC, Chen F, Ding L; Nat Genet 2016 Aug;48(8):827-37

hotspot3d's People

Contributors

adamds, amilacsw, beifang, envest, kuanlinhuang, mhbailey, rmashl, shoy91


hotspot3d's Issues

request for prioritization.tar.gz (syn9704853)

Hello, could you provide prioritization.tar.gz (syn9704853)? It is much more convenient for users to download a single file from Synapse. Also, syn9705109 contains both compressed .gz files and uncompressed files, so I have to download both when I use synapserutils::syncFromSynapse('syn9705109'). The .gz files are much smaller than the uncompressed ones; storing the two kinds in separate directories would let users download only the .gz files and save a lot of time. I have struggled with this issue for several days, and the download keeps failing for unknown reasons.

I look forward to your advice!

Issue with sigclus

Hello,
I've been trying to run 'sigclus' with the demo data provided as well as the pre-processed proximity files. However, I've run into a number of issues.
Firstly, if you try running sigclus with the pre-processed proximity files, you get an error like:
Can't use an undefined value as an ARRAY reference at /home/daniel/perl5/lib/perl5/TGI/Mutpro/Main/Significance.pm line 461
Digging into this, it looks like Significance.pm assumes a single entry in the final column of the proximity file, but there can sometimes be multiple entries.
If you force a single entry in the final column and rerun sigclus, the behaviour is erratic:
Half the time you'll get an error like 'Illegal division by zero at /home/daniel/perl5/lib/perl5/TGI/Mutpro/Main/Significance.pm line 504.'
And the other half of the time the program will spit out p-values for some of the clusters, but not others.
Most concerningly, if you try running 'sigclus' enough times, it will eventually spit out results - the error behaviour is not consistent.

Is this just a consequence of this function being in 'beta' mode currently? Or am I doing something wrong?

Homology modeled structure?

Is it possible to read in a homology-modeled structure that is not in the RCSB PDB using the --pdb-file-dir argument?

Thanks

Possible p-value bug

$ hotspot3d

Version: V0.6.3

$ hotspot3d search --maf-file=$maf --prep-dir=$dir --output-prefix=$prefix 1>t.out 2>t.err

resulted in a lot of

Use of uninitialized value $threed_cut in numeric le (<=) at /mypath/perl5/lib/perl5/TGI/Mutpro/Main/Proximity.pm line 502, line 2835.

I observed that Proximity.pm uses both the key 'pvalue_cutoff' and the key 'p_value_cutoff'.
This patch works for me:

--- Proximity.pm.orig   2016-10-04 16:14:08.000000000 -0500
+++ Proximity.pm    2016-10-09 21:06:56.707320000 -0500
@@ -63,13 +63,13 @@
     );
     if ( $help ) { print STDERR help_text(); exit 0; }
     unless( $options ) { die $this->help_text(); }
-   if ( not defined $this->{'p_value_cutoff'} ) {
+   if ( not defined $this->{'pvalue_cutoff'} ) {
         if ( not defined $this->{'3d_distance_cutoff'} and not defined $this->{'pvalue_cutoff'} ) {
             warn "HotSpot3D::Cluster warning: no pair distance limit given, setting to default p-value cutoff = 0.05\n";
-            $this->{'p_value_cutoff'} = $PVALUEDEFAULT;
+            $this->{'pvalue_cutoff'} = $PVALUEDEFAULT;
             $this->{'3d_distance_cutoff'} = $MAXDISTANCE;
         } else {
-            $this->{'p_value_cutoff'} = 1;
+            $this->{'pvalue_cutoff'} = 1;
         }
     } else {
         if ( not defined $this->{'3d_distance_cutoff'} ) {

url?

Hi,

I am trying to run the demo code

hotspot3d uppro --measure average --output-dir prep/ --pdb pdb/ --gene-file demo.maf --parallel local --max-processes 6 1>demo.uppro.out 2>demo.uppro.err

and it gave the error:

HotSpot3D::HugoGeneMethods::makeHugoGeneObjects ERROR: no data from url request.

Any ideas how to fix this?

Thanks

How long does DrugPort parsing take when running this step?

If the network is unstable, parsing keeps running indefinitely or gets interrupted. We want to acquire the drug-parsing data quickly; do you have a faster method? We tried to collect the drug and PDB file data with a web crawler, but it requires about 18 GB of space, and if we change the drug or PDB data, we may need to download everything again. Do you have a better way to avoid repeated downloads?
Thank you

Summary Bug

Hi,

When I try to use the summary command with a clusters file, it raises the following error: "Not a valid clusters file!". I think this is because the column "Weight" was called "Recurrence" in previous versions: when I only rename Weight to Recurrence, I can run the command normally.

sub readClustersFile {
	my ( $this ) = @_;
	my $infh = new FileHandle;
	unless( $infh->open( $this->{'clusters_file'} , "r" ) ) { die "Could not open clusters file $! \n" };
	my @cols;
	while ( my $line = <$infh> ) {
		chomp( $line );
		if ( $line =~ /^Cluster/ ) {
			my $i = 0;
			my %cols = map{ ( $_ , $i++ ) } split( /\t/ , $line );
			unless( defined( $cols{"Cluster"} )
				and defined( $cols{"Gene/Drug"} )
				and defined( $cols{"Mutation/Gene"} )
				and defined( $cols{"Degree_Connectivity"} )
				and defined( $cols{"Closeness_Centrality"} )
				and defined( $cols{"Geodesic_From_Centroid"} )
				and defined( $cols{"Recurrence"} ) ) {
				die "Not a valid clusters file!\n";

Thanks for the work.

Best,
Luís Nunes
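The workaround described in this report (renaming the Weight column to Recurrence) can be scripted. The sketch below is a hypothetical `rename_weight_column` helper, not part of HotSpot3D; it copies a clusters file and renames a `Weight` column only on the header line, which, like the `readClustersFile` check above, it identifies as the line beginning with `Cluster`.

```sh
# Workaround sketch (hypothetical helper, not part of HotSpot3D): copy a
# clusters file, renaming a "Weight" column to "Recurrence" on the header
# line (the line starting with "Cluster"). Other lines are copied unchanged.
# Usage: rename_weight_column INPUT_CLUSTERS OUTPUT_CLUSTERS
rename_weight_column() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^Cluster/ { for (i = 1; i <= NF; i++) if ($i == "Weight") $i = "Recurrence" }
         { print }' "$1" > "$2"
}
```

For example: `rename_weight_column 3D_Proximity.pairwise.singleprotein.collapsed.clusters fixed.clusters`, then pass `fixed.clusters` to `hotspot3d summary --clusters-file`.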

Use GRCh38

Hello,

I would like to use the new transcripts for GRCh38. Do you know if this is possible by changing the download path in Trans.pm, or would I need to make more changes?

Thank you

Cluster Question

Good afternoon,

I don't know if this is already possible with your algorithm, but I was wondering whether we could cluster by protein feature, for example, by outputting clusters associated with the active site of a protein. Since you already use UniProt annotations, perhaps they could be used for this.

Thanks.

Best,
Luís

Adding a runnable demo

Please add a runnable demo including an example data to show how to use hotspot3d. Thank you.

Webservice not working?

Hi,

Just following up on the issue I raised under Issue #26.

I'm trying to run this mutation file (tab delimited .txt in mutation file format) on your website but it is saying "Submission failed" every time I try to submit.
concatenated_APP_gene_maf.txt

I also saved the Demo Data into a text file and tried to run it. Also didn't work.
demo_maf.txt

I believe my files contain data in the right format. Could you help us process these files or resolve this issue? Thanks so much.

hotspot3d search question

Hello,
I used my maf file to run "hotspot3d search" with this command line:

~/perl5/bin/hotspot3d search --maf-file my.maf --drugport-file=drugport_results --p-value-cutoff 0.05 --3d-distance-cutoff 10 --transcript-id-header Transcript_ID --amino-acid-header HGVSp_Short --prep-dir preprocessing_dir/

But the output :
......
Argument "208B" isn't numeric in addition (+) at /home/perl5/lib/perl5/TGI/Mutpro/Main/Proximiy.pm line 691, line 2605.
Argument "208B" isn't numeric in addition (+) at /home/perl5/lib/perl5/TGI/Mutpro/Main/Proximiy.pm line 691, line 2606.
searching done...
Creating 3D_Proximity.pairwise
Creating 3D_Proximity.cosmic
Creating 3D_Proximity.roi
Could not open drug target output file

Could you tell me how to solve this problem?
(I have been trying to get hotspot3d to run correctly for two weeks.)
Thanks.

PyMOL question: Does PyMOL show the changed structure or the original structure for mutations in the clusters?

Good morning,

   I have a question about PyMOL. I understand how HotSpot3D predicts the clusters. However, in the paper “Pathogenic Germline Variants in 10,389 Adult Cancers”, Figure 7B shows the protein structures of RET and MET, and all the mutations are very close to each other. As far as I know, those mutations are not clustered in the original proteins. I want to know whether PyMOL can show the changed structure when mutations alter the protein's 3D structure.

Thanks.

Best,
Weifeng

hotspot3d search doesn't work

I tried to run hotspot3d following the README_demo.
cmd line:
hotspot3d search --maf-file /Demo/demo.maf --prep-dir/Hotspot3D/ --3d-distance-cutoff 10 1>demo.search.out 2>demo.search.err
Error: Could not open sites file
According to the hotspot3d manual, at least one of the maf file and the sites file is required, so I want to know why it doesn't work.
