gifford-lab / gem3

GEM software suite (GEM, GPS, KMAC, KSM, RMD, etc.)

HTML 0.19% Java 97.51% Scheme 0.11% Perl 2.13% Shell 0.05%

gem3's Issues

GEM not producing X_All_Read_Distributions.png

Hello,

I've been trying to run GEM on my dataset, and it's not producing one particular plot (X_All_Read_Distributions.png).
Here is the command that I used:

java -Xmx10G -jar gem.jar --d Read_Distribution_default.txt --g mm10.chrom.sizes --genome ../mm10 --expttest ../../2020-126__ChIP-Seq_Insm1-Neurod1-Pax6_Magnuson_Dudek/05-samtools/3600-KD-0006/Aligned.out.filtered.sort.bam --ctrltest ../../2020-126__ChIP-Seq_Insm1-Neurod1-Pax6_Magnuson_Dudek/05-samtools/3600-KD-0009/Aligned.out.filtered.sort.bam --f SAM --k_min 6 --k_max 13 --seed 1000

Please let me know if you need additional information to pinpoint the issue.

Error running GEM

Hello,
After running several GEM jobs, I received the following error and GEM stopped running:

IP: 16932832 Ctrl: 16296675 IP/Ctrl: 1,04

Sorting reads and selecting enriched regions ...
java.lang.ArrayIndexOutOfBoundsException: -1
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.calcSlope(KPPMixture.java:3078)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.getSlope(KPPMixture.java:3057)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.calcIpCtrlRatio(KPPMixture.java:1995)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:360)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:166)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)

I should note that I got this error with two particular BAM files but not with other BAM files run with the same command; the latter produced good results.
I’d appreciate any advice.
Regards,

Error: Exception in thread "main" java.lang.NullPointerException

Hi, I am running GEM with the S. irio genome using this command:

java -Xmx10G -jar Read_Distribution_default.txt --s genome size readdistribution.txt --genome genome --expt $file.sam --f SAM --k_min 6 --k_max 20 --k_seqs 600 --k_neg_dinu_shuffle --t $threads --outNP

and got this error:

Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.SAMReader.countReads(SAMReader.java:86)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)

the Si_genome.chrom.size.txt looks like this:

Please help! Thanks!
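
The trace alone does not say which object is null inside AlignmentFileReader.populateArrays. One thing that may be worth ruling out (an assumption, not something confirmed in this thread) is a mismatch between the chromosome names in the alignment file and the names in the file passed to --g. A minimal Python sketch of that check, assuming a two-column "name, size" chrom.sizes file and a SAM file with @SQ header lines; all function names are hypothetical:

```python
def chrom_sizes_names(lines):
    """Collect chromosome names from 'name<TAB>size' lines."""
    names = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split()[0])
    return names


def sam_header_names(lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def name_mismatches(sizes_lines, sam_lines):
    """Return (names only in the SAM header, names only in chrom.sizes)."""
    sizes = chrom_sizes_names(sizes_lines)
    sam = sam_header_names(sam_lines)
    return sorted(sam - sizes), sorted(sizes - sam)


# Toy data standing in for real files (made-up names and sizes).
toy_sizes = ["chr1\t1000\n", "chr2\t800\n"]
toy_sam = [
    "@HD\tVN:1.6\n",
    "@SQ\tSN:1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:800\n",
    "read1\t0\t1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n",
]
only_sam, only_sizes = name_mismatches(toy_sizes, toy_sam)
print("only in SAM header:", only_sam)
print("only in chrom.sizes:", only_sizes)
```

With real data, pass open file handles instead of the toy lists. Two empty lists mean the names agree, in which case the cause lies elsewhere.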

How to get the coordinates from KSM results

After I ran java -Xmx10G -jar ~/opt/gem.v3.4/gem.jar KSM --fasta test.pk.fa --g ~/opt/gem.v3.4/hg19.chrom.sizes --ksm test.ksm.lst --out test.pk.gemscan

I got "test.pk.gemscan.motifInstances.txt". However, I could not convert the results back to genomic coordinates using the positions stored in the FASTA file.

For instance:

chr1:6259452-6259796(.)
TACAACGTGCTGGGATCGGCAGGGGCTCTGGCCCGCTCCGGCCGACCTGCCAGCCCACCCCAGCAGGACGCTGCAGGGCGCCGTCCCCAGCGAGCCTGGGTAGATGCCGGGCTCGGCGAGGCCCACGTGCCTCCCCTGGAGCCGAGGCCTCACGCGGAGCCATACTAACCACAGGAGCCATGGCGGCAGCGGAGTTAGAAAGGGAGGTGAGCGAACTACGCAGACGCAAAGAGCCCGCAGCGCGCAAGGCACGCAGGGTCCAGGCCGCACTAATCACTTTGCCACGCCCCTCGTCCGCCACCTTTTCTCTTGGTTATGTACGATAGGGGAGCGATTGGTTTTTC

I got hits at the same location for two motifs:

Motif SeqID Motif_Name SeqName Match SeqPos Coord Strand Score
0 0 noc_gem_test_2.m0_c1723 chr1:6259452-6259796(.) ATCCCAGCAC:ATCCCAGC,ATCCNAGCA,TCCCAGCA,ATCCCAGNA,ATCCCAGNNC, 331 1:6259812:- - 1.24
1 0 noc_gem_test_2.m1_c1799 chr1:6259452-6259796(.) ATCCCAGCAC:ATCCCA,CAGCAC, 330 1:6259794:- - 1.37

But one SeqPos is 330 while the other is 331.

Their offset values in the KSM motif file appear to be the same (2) for ATCCCAGCAC.

So how exactly can I get the correct coordinates? Both hits should be:

chr1:6259459-6259468

I attached the files for your reference.

Thank you!

x.zip
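
Until the exact meaning of SeqPos and the KSM offset is clarified, one way to sidestep both is to search for the matched k-mer (or its reverse complement) directly in the FASTA sequence and add the start coordinate from the header. The Python sketch below does that. It assumes the header has the form chrom:start-end and that start is 0-based, BED style; the expected answer chr1:6259459-6259468 is consistent with that. The function names are hypothetical and this is not part of GEM.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_header(header):
    """Split 'chr1:6259452-6259796(.)' into ('chr1', 6259452, 6259796)."""
    m = re.match(r"^>?(\S+?):(\d+)-(\d+)", header)
    if m is None:
        raise ValueError(f"unrecognized header: {header!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def locate_match(header, sequence, match, zero_based_start=True):
    """Return (chrom, start, end, strand) hits with 1-based inclusive coordinates."""
    chrom, start, _ = parse_header(header)
    # 1-based genomic coordinate of sequence[0]
    first_base = start + 1 if zero_based_start else start
    seq = sequence.upper()
    match = match.upper()
    hits = []
    for strand, pattern in (("+", match), ("-", reverse_complement(match))):
        idx = seq.find(pattern)
        while idx != -1:
            hits.append((chrom, first_base + idx,
                         first_base + idx + len(pattern) - 1, strand))
            idx = seq.find(pattern, idx + 1)
    return hits


# First 20 bases of the sequence from the post, and the matched k-mer.
print(locate_match("chr1:6259452-6259796(.)",
                   "TACAACGTGCTGGGATCGGC", "ATCCCAGCAC"))
# prints [('chr1', 6259459, 6259468, '-')]
```

A palindromic k-mer is reported once per strand, so deduplicate on (start, end) if that matters.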

NullPointerException error message

Hello,
I received the following error message when I ran a GEM job:

	Exception in thread "main" java.lang.NullPointerException
		at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.plotAllReadDistributions(KPPMixture.java:3248)
		at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:269)
		at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)

Does anyone know why I received this message?

It is worth noting that I ran 3 GEM jobs and received this error message for 2 of them. These 2 jobs also produced warnings that the read distribution was not updated because too few significant binding events were identified, and they did not print a "Finished" message to standard output, whereas the 3rd job (the one without any error messages) ran to completion.

Thanks a lot,
Gavriel

Exception in thread "main" java.lang.NullPointerException: for ChIP-nexus/ChIP-exo BAM file

Hi,

I am trying to run the following command on the ChIP-nexus/ChIP-exo BAM file.
java -jar -Xmx400G ~/bin/gem/gem.jar --expt nexus_filtered_combined.bam --d ~/bin/gem/Read_Distribution_ChIP-exo.txt
--g ~/bin/gem/mm10.chrom.sizes --f SAM --out test --outNP --mrc 150

I get the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.SAMReader.countReads(SAMReader.java:86)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)

I used --mrc 150 because we have our pipeline to remove duplicates.

What is it that I may be doing wrong?

comparing two KSMs

Thanks for developing this package. I find the KSM representation really superior to traditional PWMs. However, I wonder whether there is a tool to compare two KSMs and calculate their similarity. Previously, when using PWMs, I could easily calculate a Pearson correlation coefficient or a distance score between any pair of PWMs, but with KSMs I'm not sure how to do that. Any thoughts?

Loss of events when processing BED files in GEM for RMD input

Hello,

I am trying to use GEM and RMD to obtain co-binding matrices from ATAC-seq data.
I used HINT to obtain my TF binding calls, and am having the following issues:

When trying to run RMD using BED-format peaks I get the following error:
TF#0: loading ASCL1_O_LO_mpbs.bedException in thread "main" java.lang.NumberFormatException: For input string: "Start"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
at java.base/java.lang.Integer.parseInt(Integer.java:658)
at java.base/java.lang.Integer.parseInt(Integer.java:776)
at edu.mit.csail.cgs.deepseq.utilities.CommonUtils.load_narrowPeak(CommonUtils.java:160)
at edu.mit.csail.cgs.deepseq.analysis.TFBS_SpaitialAnalysis.loadBindingEventsSimple(TFBS_SpaitialAnalysis.java:341)
at edu.mit.csail.cgs.deepseq.analysis.TFBS_SpaitialAnalysis.main(TFBS_SpaitialAnalysis.java:132)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:327)

I therefore decided to process my BED files with GEM. This way, RMD proceeds correctly, but during the GEM step I lose about an order of magnitude of calls. For instance, my most prominently bound factor goes from 15k calls in the HINT prediction file to 1800 after processing the BED file through GEM.

Ideally, I would like to keep the most events possible for the analysis.

Any help will be greatly appreciated.

Kind regards,

Ricardo
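
The input string "Start" in the NumberFormatException suggests that the BED file begins with a column-header line, which load_narrowPeak then tries to parse as a coordinate. If so, removing non-record lines before passing the file to RMD may avoid the error without routing the peaks through GEM. This is a guess from the trace, and the loader's name hints that it may also expect narrowPeak-style columns. A minimal Python sketch with hypothetical function names:

```python
def is_bed_record(line):
    """True if the line has at least 3 tab-separated fields with integer start/end."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return False
    return fields[1].isdigit() and fields[2].isdigit()


def strip_non_records(lines):
    """Keep only lines that look like BED records (drops headers, track lines, blanks)."""
    return [line for line in lines if is_bed_record(line)]


# Toy input with a header row like the one the error message hints at.
toy = [
    "Chrom\tStart\tEnd\tName\n",
    "chr1\t100\t200\tASCL1\n",
    "track name=example\n",
    "chr2\t5\t50\tASCL1\n",
]
for record in strip_non_records(toy):
    print(record, end="")
```

For a real file, read the lines from the original BED and write the filtered lines to a new file, keeping the original untouched.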

GEM doesn't produce any graphical output

Hi,
I am using GEM to analyze ChIP-exo data from BAM files.
I get an error at the end of the run:
Loading genome sequences ...
/cluster/genomes/Genome is not a valid path. Use default path.
chr6_dbb_hap3.fa genome sequence file is not found at directory /cluster/genomes/Genome_chrfa_only.

The log file reads like this:
2019-10-16 15:07:45
Options:
--d Read_Distribution_ChIP-exo.txt --g hg19.chrom.sizes -- hg19/ --expt unique.bam --f BAM --out peaks --k_min 6 --k_max 36 --smooth 3 --mrc 20
Original read count stats:
_IP Bases: 2419980 HitCounts: 1.8277762E7
At Poisson p-value 1.0e-03, in a 301bp window, expect 3 reads.
The genome is segmented into 191530 regions for analysis.
Running with 8 threads ...

Events discovered
Significant: 53487
Insignificant: 10344
Filtered: 1618
Finish binding event prediction: 1.6m
Refine read distribution from 2000 binding events.
Running with 8 threads ...

Events discovered
Significant: 30452
Insignificant: 54181
Filtered: 4569
Finish binding event prediction: 31.5s
Refine read distribution from 2000 binding events.

The protein whose data I am analyzing has a binding site of 36 base pairs.

Could you kindly help me find where I am going wrong?
Best regards,
Amit

Exception in thread "main" java.lang.NullPointerException: Can't have a null genome

Hi there

I am running KMAC followed by a KSM analysis on DAP-seq data.
I can generate the k-mer set motifs with KMAC using sequences corresponding to DAP-seq peaks,
but when I try to scan these same sequences with the KSMs I run into the error below.

Please help me :)
Romain

Error:

Scanning KSM motifs ...
... iswap_spe_w100k10k30_0.m0 ...
... iswap_spe_w100k10k30_0.m1 ...
... iswap_spe_w100k10k30_0.m2 ...
... iswap_spe_w100k10k30_0.m3 ...
... iswap_spe_w100k10k30_0.m4 ...
... iswap_spe_w100k10k30_0.m5 ...
... iswap_spe_w100k10k30_0.m6 ...
... iswap_spe_w100k10k30_0.m7 ...
... iswap_spe_w100k10k30_0.m8 ...
... iswap_spe_w100k10k30_0.m9 ...
Note: for motif instances on the minus strand, the SeqPos is the position on the reverse compliment of the input sequence.
Exception in thread "main" java.lang.NullPointerException: Can't have a null genome
at edu.mit.csail.cgs.datasets.general.Region.<init>(Region.java:134)
at edu.mit.csail.cgs.datasets.general.Region.fromString(Region.java:695)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.findMotifInstances(MotifScan.java:254)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.main(MotifScan.java:32)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:315)

My commands are:

java -Xmx8G -jar ${GEM}/gem.jar KMAC \
--pos_seq $POS \
--k_win 100 \
--k_min 10 \
--k_max 30 \
--k_top 10 \
--out_name $RUN \
--print_aligned_seqs ON \
--gc -1 \
--neg_seq $NEG

java -Xmx8G -jar ${GEM}/gem.jar KSM --fasta $POS --ksm ksm_list.txt --out $RUN.KSM.scan

comparing KSM motif scan scores with PWM and TFFM scores in ROC curves

Hi, I am comparing KSM scores with the scores of a single PWM matrix and a single TFFM matrix in a ROC curve for a given plant TF.
I am not sure how to deal with the scores in the KSM motif scan output. I did the following:
For each searched sequence (for a positive and for a negative test set), I collect the scores for all k-mers in all motifs (KMAC detected 13 motifs in the training set, n=800, DAP-seq peaks).
Then I take either the sum of all scores for a given searched sequence, or the mean, or just keep the best score.
Then I compare the curves and AUCs for the KSM, PWM, and TFFM (see plots below).
My first question is: is it correct to compute the sum, mean, or best score as I do?
Second question: is it then correct to compare these sum, mean, or max KSM scores against the PWM or TFFM scores obtained with a single matrix?
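
To make the procedure concrete, here is a minimal Python sketch of the aggregation step and a rank-based AUC. It only restates the steps described in the post and does not settle which aggregation is appropriate. Scoring sequences without any hit as 0.0 is an assumption, and the function names and toy scores are made up.

```python
from statistics import mean

AGGREGATORS = {"sum": sum, "mean": mean, "max": max}


def aggregate_scores(hits, seq_ids, how="max", no_hit_score=0.0):
    """One score per sequence from its list of motif-hit scores.

    hits maps a sequence ID to the scores of all hits in that sequence.
    Sequences without hits get no_hit_score (an assumption).
    """
    agg = AGGREGATORS[how]
    return [agg(hits[s]) if hits.get(s) else no_hit_score for s in seq_ids]


def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example: two positive and two negative sequences.
toy_hits = {"pos1": [1.2, 0.8], "pos2": [2.0], "neg1": [0.5]}
for how in ("sum", "mean", "max"):
    pos = aggregate_scores(toy_hits, ["pos1", "pos2"], how)
    neg = aggregate_scores(toy_hits, ["neg1", "neg2"], how)
    print(how, auc(pos, neg))
```

Because the AUC depends only on the ranking of sequences, comparing it across KSM, PWM, and TFFM does not require the three score scales to be comparable. The choice of aggregation still changes the ranking.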

GC content of genome reported by GEM

Hello,
I am running GEM on my data and I have a question regarding a section of the standard output from the GEM job that I ran.

The relevant section is as follows:

Loading genome sequences ...
Done, 2.3m
GC content=0.41

Use di-nucleotide shuffled sequences as negative sequences.

Estimated GC content is 0.43. Set [--gc -1] to use the estimated GC content.
Provided GC content is 0.41.

My questions:

When GEM says "GC content=0.41", is GEM referring to the GC content across all the genome sequences I read in? If so, then that is an error because I know the GC content of the genome I work on is 0.56 (i.e. 56%).
When GEM uses di-nucleotide shuffled sequences as negative sequences, is it shuffling sequences across the whole genome? Are these shuffled sequences the same length as the positive sequences (i.e. 61 bp)? Is the "Estimated GC content is 0.43" line referring to the estimated GC content of the shuffled sequences?

Thank you very much,
Gavriel
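
One way to check the 0.56 figure independently of GEM is to count bases in the genome FASTA directly. The Python sketch below computes (G + C) / (A + C + G + T) and ignores N and other ambiguity codes. Whether GEM uses the same definition is not stated in its output, so a difference in the treatment of N or masked bases is only one possible explanation. The function name is hypothetical.

```python
def gc_fraction(fasta_lines):
    """GC fraction over unambiguous bases; header lines (starting with '>') are skipped."""
    gc = 0
    at = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()  # soft-masked (lower-case) bases are counted too
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / (gc + at)


# Toy FASTA: 6 G/C bases out of 8 unambiguous bases.
print(gc_fraction([">chr1\n", "ACGTNN\n", "ggcc\n"]))  # prints 0.75
```

For a real genome, pass an open file handle; the function reads line by line, so memory use stays small.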

How to interpret KMAC output

Hello,

A few questions about KMAC output for ChIP-seq data and apologies if these have been answered elsewhere. I have 5000 top, IDR-filtered ChIP-seq hits for a TF of interest and 5000 negative regions that are negative for that TF but are accessible as measured by ATAC-seq and active as measured by H3K27ac in my tissue of interest (see image below):

How do I interpret the motif spatial distributions? Are these results implying that the second motif is often found in the reverse-complemented orientation 2 base pairs upstream of the first motif? They look like they may be connected or part of a larger motif (especially the first few k-mers of the second motif). Is this a reasonable interpretation?

Why are the PWM hit ratios so much worse than the KSM ratios? Does it even make sense to use the PWMs to identify TF identities using STAMP if those PWMs appear more frequently in the negative regions than positive?

How do you recommend I draw a reasonable threshold for what motifs are significant? AUC > ?

Thanks for your time,

How to build gem.jar from GitHub?

I am playing with some of the code and would like to rebuild the gem.jar file. I could not find the right Ant target to use. Could you please give me some pointers on how to build the JAR? Thanks!

Can KMAC discover motifs from SELEX 20/40bp data?

Hi,

I do SELEX on transcription factors and I typically discover motifs via MEME using enriched sub-sequences (10bp/12bp). Can KMAC discover motifs in 20bp (or 40bp) SELEX data?

thanks much,
jose

Dockerfile for CID/MICC

Hi,

I made a Dockerfile for CID/MICC and it's now on dockerhub.
I couldn't find a license file for your software. Is it OK to make the image public?

Thanks,

Error: java.lang.NullPointerException

Hi,
I am running GEM on my ChIP-exo data, and my organism is Candida glabrata.

The command I used is:
java -Xmx10G -jar gem.jar --d Read_Distribution_ChIP-exo.txt --smooth 3 --g cglabrata.chrom.sizes --genome /projects/academic/lrusche/chip_exocg/GCA_014217725.1_ASM1421772v1_genomic.fna --s 12300000 --expt Hst1_1.bed --ctrl No_tag.bed --f BED --out gem_Hst1_1 --nrf --outBED --k_min 6 --k_max 13

However, I got this error:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.BEDFileReader.countReads(BEDFileReader.java:124)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)

Could you please tell me how to fix it? Thank you!

error running GEM

Hi, I am running GEM with the S. irio genome using this command:
java -Xmx10G -jar gem.jar --d Read_Distribution_default.txt --g mm8.chrom.sizes --genome /home/scc/software/biosoft/data/bowtie2/hg19.idx --s 2000000000 --expt /home/liusiyu/S1.sam --ctrl /home/liusiyu/input.sam --f SAM --out /home/liusiyu/S1_GEM --k_min 6 --k_max 13
and got this error:
java.io.FileNotFoundException: GEM_Log.txt (Permission denied)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:158)
at java.base/java.io.FileWriter.<init>(FileWriter.java:82)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)

Options:
--d Read_Distribution_default.txt --g mm8.chrom.sizes --genome /home/scc/software/biosoft/data/bowtie2/hg19.idx --s 2000000000 --expt /home/liusiyu/S1.sam --ctrl /home/liusiyu/input.sam --f SAM --out /home/liusiyu/S1_GEM --k_min 6 --k_max 13
java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.log(KPPMixture.java:3661)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:197)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:166)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)
Please help! Thanks!

License info for GEM

Hi,

I contribute to bioconda, https://bioconda.github.io, and I would like to create a package for "gem" to ease its distribution. However, to create the recipe I need information about the license and, if possible, a license file. Can you please provide this information?

Thank you,
Natasha

An issue about the multiple controls

Thank you for this algorithm, which can analyze ChIP-like datasets such as ChIP-seq, ChIP-exo, and even DAP-seq. While using GEM to call peaks on my DAP-seq results, I wondered whether multiple negative controls can be supplied, in the same way as the multi-condition and multi-replicate inputs described on the website (https://groups.csail.mit.edu/cgs/gem/#readdistrib). If multiple controls are accepted, is there a statistical calculation behind how the control is used during peak calling? Basically, I would like to understand how the algorithm uses the negative control when calling peaks. I would greatly appreciate your suggestions and any recommended references for a deeper understanding of the algorithm.

Peak recall consistency over multiple runs?

Hi,
I've tried running GEM on a few datasets now, and I've noticed that the peak set returned for a particular experiment changes with each run, even when GEM is run with exactly the same parameter flags and values. The recalled set doesn't change by much, but it is different with each run.

Is this an expected result? If so, is there any way to modify the behavior of GEM to ensure it gives exactly the same result each time it's run on an experiment?

KMAC not recognizing --neg_seq option

Hello,

I am trying to include a set of negative sequences in my KMAC run using the --neg_seq option, as in the command listed below:

java -Xmx8G -jar gem.jar KMAC --pos_seq peaks_forkmac.fa --neg_seq background_peaks_forkmac.fa --k_win 500 --k_min 5 --k_max 13 --k_top 10 --out_name kmac_peaks_kwin500_kmin5_kmax13_vsbg

However, it is not recognizing this file and is instead still using the default set of di-nucleotide shuffled sequences as negative sequences. The first several lines of the output are below:

KMAC motif discovery (version 1.3)

Options:
KMAC --pos_seq peaks_forkmac.fa --neg_seq background_peaks_forkmac.fa --k_win 500 --k_min 5 --k_max 13 --k_top 10 --out_name kmac_peaks_kwin500_kmin5_kmax13_vsbg

Loaded 36718 input positive sequences.
Use di-nucleotide shuffled sequences as negative sequences.

Am I using this option correctly? Thank you in advance for your help.

KSM output usage questions

We are looking for known motifs/binding sites that are enriched in one set of genomic sequences versus another. After running your KSM analysis using hundreds of known motifs, we have a VERY large matrix of motifs and match instances.

1- We would like to know how you recommend proceeding to measure the frequency of motif matches, and then enrichment.

2- Based on your descriptions in the paper (Guo et al., Genome Research, 2018), is it correct that the number of unique SeqIDs in the second column of the output file for each motif would be the Kmer group hit count for that motif? In other words, would that be the number of places that motif was found in the query sequence?

3- Also, we understood from the paper that the last "Score" column in the output matrix is the odds ratio of that motif's appearance in the query sequence versus the negative training sequences. But how would you recommend using this score to sort the motif match instances to find the most significant ones? For example, how should we decide on a cutoff?

4- We are also interested in how your algorithm can detect positional dependencies of motifs.
How can we extract the most frequent flanking Kmers for any given motif?

Thank you!
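
For question 2, the count of unique SeqIDs per motif can be computed directly from the motif-instances table. The Python sketch below assumes the whitespace-separated column layout shown in the header row of another issue on this page (Motif, SeqID, Motif_Name, SeqName, Match, SeqPos, Coord, Strand, Score). It counts distinct sequences hit per motif and does not claim that this equals the hit count defined in the paper. Function names and the toy rows are made up.

```python
from collections import defaultdict


def sequences_hit_per_motif(lines):
    """Map each Motif_Name to the number of distinct SeqIDs with at least one hit."""
    seqs_by_motif = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and the header row (its SeqID field is not a number).
        if len(fields) < 9 or line.startswith("#") or not fields[1].isdigit():
            continue
        seq_id, motif_name = fields[1], fields[2]
        seqs_by_motif[motif_name].add(seq_id)
    return {motif: len(ids) for motif, ids in seqs_by_motif.items()}


def hit_fraction(counts, n_sequences):
    """Fraction of input sequences hit by each motif."""
    return {motif: n / n_sequences for motif, n in counts.items()}


toy_rows = [
    "Motif SeqID Motif_Name SeqName Match SeqPos Coord Strand Score",
    "0 0 m0 chr1:100-200(.) ACGT:ACGT, 10 1:110:+ + 1.24",
    "0 0 m0 chr1:100-200(.) ACGT:ACGT, 50 1:150:+ + 1.10",
    "0 1 m0 chr2:300-400(.) ACGT:ACGT, 20 2:320:+ + 1.31",
    "1 0 m1 chr1:100-200(.) TTGA:TTGA, 30 1:130:- - 1.37",
]
counts = sequences_hit_per_motif(toy_rows)
print(counts)
print(hit_fraction(counts, n_sequences=2))
```

Running the same count on the foreground and background scans gives two hit fractions per motif, which can then be compared with a test of your choice.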

Commas instead of dots

Dear GEM3 developer,

I'm facing a problem with KMAC and KSM. It seems that KMAC generates KSM motifs with commas instead of dots for:

the fourth row of the header
the values in column HgpLg10
the values list at the end of the file

When I try to run KSM on a KMAC output
java -jar ~/gem/gem.jar KSM --fasta ES_Oct4_61bp.fasta --ksm Oct4.ksm_list.txt --out Oct4.KSM.scan

java returns the following error:

Exception in thread "main" java.lang.NumberFormatException: For input string: "3,10"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at edu.mit.csail.cgs.deepseq.discovery.kmer.GappedKmer.loadKSM(GappedKmer.java:313)
at edu.mit.csail.cgs.deepseq.discovery.kmer.GappedKmer.loadKSM(GappedKmer.java:270)
at edu.mit.csail.cgs.deepseq.utilities.CommonUtils.loadKsmFile(CommonUtils.java:800)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.findMotifInstances(MotifScan.java:180)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.main(MotifScan.java:32)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:315)

This is true for both Java 1.7 and 1.8.

Here are the first lines of the KSM m0:

#Oct4.m0
#KSM version: 1
#5000/5000
#3,10
#k-mer/r.c. Offset PosCt wPosCt NegCt HgpLg10 CIDs PosHits (base85 encoding) NegHits (base85 encoding)
ATGCAAA/TTTGCAT 1 529 529 61 -113,4 * "9Ouf2^CY>4An'M.q7d0..h4N!)?Fl&QU1%L2uE?i^;'&VL8q!('/**(g>>"9<rP,m+/^$j&[J!!!R-!XM%VTRafXYQ+ZV!*fXP&-/J5!'h8*"Ft.e^]4?G!!=VL"<_(P#m^MI!YSodJ3a/E!<<\)!!!!a!!!Q1!._ibJIMm8!!!!1JI2AW+@/Oq!!iW<"9\_?!!<3d+Ue>^!!!-%#ljr+5QhX[!/)JN!!!!a!$DPT!!:RK&cht46jNdc!!!!-!!"5DJcGiR![7UE!!u0r!0@53!5JR7QiROj!!E9E!<<*%TKiJ[J7JpB!'gYe!'gMaz#QOi)!!!$E!!#h,!!#gs!WW3#"9Lgg!.Y%L![7XF5QCcq!)NY"z!!!-%!<H(!!E9%(]XO9!!!Q1!!iiE!!!-&&BFhOTE##n#QOi*!$G5@"A'^9!!'!#PeL!!!!"!.Y%L#QP,1!WW3#!/1E'!!!!Iz!$DmS!<E0#z#QOj5J,k*"!WX>C!!3-##QOo+Jj9;9!!!!1!'hZW!"csH%MJgF!!!$"&-)\1+96p'!!!!#'EAsM+:&5Q(dJ(Q+92ZI#XAA4!!#OkJ,fS2z63%&u!!3-#!!!''!'gMa!0@0!!3-#J,oWMJ.R52z!!!"L#QOjTJ-5iP0E?V'!!3-#!!!!%!!!"Lz!.Y%L!!!!A!$D7B!!!!)!"^7Q!<Q't!!!!A!.Y%L!13dz!!%NM!!!!"!!"-,!!!!1&:a!<@cqz#Q~ zz!"],A!!!-%!!E9&z!!!!%!!!"Lz!!E9%zz!!!!#z!!Ei5zJ,fQLzz!!!!a!"],1zz!!39'zzz!!#7az!<<*"!!!!bzzzz"9Ac.!!!9)z!!!Q1z"98EEzzzzzzz!!!'#!!!'#z!!!-%zz!!!'#!!!$"!!!!A!!!9)z!!!!"!'gMazz!!iRTzzzzzz!!!!%zzz!!iQ)z!"],1zz#QOi)z!!!Q1z!!!!)zzzz#QTATzzzzz!!!!#zz!!!!"zzzzz&-)\3#QOi)!$D7Azzzzzzzzzzz!!!9)!!!!1zzzzzzzzzzz!!!$"zzzzz!!!-%zzzzz#QOi)zzzzz!.Y(Mz!!!!#!!!!a~
ATGCNAAT/ATTNGCAT 1 433 467 19 -118,5 0,-1 N.A. N.A.
ATGCANAT/ATNTGCAT 1 402 444 19 -111,5 0,2 N.A. N.A.
TATGCANA/TNTGCATA 0 367 408 26 -93,5 -3,4,5 N.A. N.A.
ATGNAAAT/ATTTNCAT 1 372 401 15 -103,0 6,0 N.A. N.A.

Moreover, I was wondering if it could be possible to save the aligned sequences (with N) that appear in the 'Aligned bound sequences' heatmap in an additional text file.

Otherwise, KMAC is one of the most convincing motif discovery tools. I really appreciate that it is able to identify motifs so quickly on ChIP-seq/DAP-seq/Genome data.
Best,
Simon
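
The decimal commas suggest that the JVM formats numbers with a non-English default locale when KMAC writes the KSM file, while the trace shows the loader calling Double.parseDouble, which only accepts a dot. A common workaround for this kind of mismatch, offered here as an assumption rather than a confirmed fix for GEM, is to start the JVM with an English locale through the standard user.language and user.country system properties and then regenerate the KSM files with KMAC. A small Python sketch that builds such a command line; the helper name and the jar path are hypothetical:

```python
import shlex


def java_command(jar, tool_args, language="en", country="US", heap=None):
    """Build a java command line that pins the JVM default locale."""
    cmd = ["java", f"-Duser.language={language}", f"-Duser.country={country}"]
    if heap:
        cmd.append(f"-Xmx{heap}")
    cmd += ["-jar", jar]
    cmd += list(tool_args)
    return cmd


cmd = java_command(
    "gem.jar",  # hypothetical path to the jar
    ["KSM", "--fasta", "ES_Oct4_61bp.fasta",
     "--ksm", "Oct4.ksm_list.txt", "--out", "Oct4.KSM.scan"],
)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

KSM files already written with commas would still fail to load, so the KMAC step needs to be rerun under the same locale settings first.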

--micc parameter

Hello

I am running CID with different settings of '--micc'

On my data I get the same number of interactions with --micc 2, 3, 4, or 5 (but lower than when not setting --micc at all, which I think uses the default value of 1).
In all the above cases, the lowest value in column 7 is "2".

EDIT: I tried --micc 1, 2, 4, 5, 8, 10, 15, and 20; all outputs with --micc other than 1 are the same.

Thank you

gem3 --subsf format question

Hi,
Thank you for the amazing tool. I'm a PhD student learning to analyze ChIP-seq data.
I am trying to use --subsf to run GEM on only a subset of the genome.

Command line used:
java -Xmx10G -jar /scratch/kimj50/ChIP-seq/GEM/gem/gem.jar \
--d /scratch/kimj50/ChIP-seq/GEM/gem/Read_Distribution_default.txt \
--g /scratch/kimj50/ChIP-seq/GEM/gem/ce10.chrom.sizes \
--genome /scratch/kimj50/ChIP-seq/GEM/references \
--f SAM \
--exptN2 DPY-27_N2_emb_ext75_SE172_input_SE176.sam DPY-27_N2_emb_ext10_CJ132_input_CJ19.sam DPY-27_N2_emb_ext1_SE30_input_EE6.sam DPY-27_N2_emb_ext2_CJ39_input_CJ43.sam \
--ctrlN2 input_SE176_N2_emb_ext75.sam input_CJ19_N2_emb_ext10.sam input_EE6_N2_emb_ext1.sam input_CJ43_N2_emb_ext2.sam \
--out N2_groseq \
--k_min 6 --k_max 13 \
--min 50 \
--subsf groseq_emb_X_GEMformat.bed

groseq_emb_X_GEMformat.bed: contains regions of chrX
chrX:323289-335587
chrX:371074-377975
chrX:380383-383809
....

The .GEM_events output file contains regions that are not on chrX, which makes me think I got the format wrong.
Thanks! - Jun
