gifford-lab / gem3
GEM software suite (GEM, GPS, KMAC, KSM, RMD, etc.)
Hello,
I've been trying to run GEM on my dataset, but it is not producing one particular plot (X_ALL_Read_distributions.png).
Here is the command I used:
java -Xmx10G -jar gem.jar --d Read_Distribution_default.txt --g mm10.chrom.sizes --genome ../mm10 --expttest ../../2020-126__ChIP-Seq_Insm1-Neurod1-Pax6_Magnuson_Dudek/05-samtools/3600-KD-0006/Aligned.out.filtered.sort.bam --ctrltest ../../2020-126__ChIP-Seq_Insm1-Neurod1-Pax6_Magnuson_Dudek/05-samtools/3600-KD-0009/Aligned.out.filtered.sort.bam --f SAM --k_min 6 --k_max 13 --seed 1000
Please let me know if you need additional information to pinpoint the issue.
Hello,
After running several GEM jobs, I received the following error and GEM stopped running:
IP: 16932832 Ctrl: 16296675 IP/Ctrl: 1,04
Sorting reads and selecting enriched regions ...
java.lang.ArrayIndexOutOfBoundsException: -1
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.calcSlope(KPPMixture.java:3078)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.getSlope(KPPMixture.java:3057)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.calcIpCtrlRatio(KPPMixture.java:1995)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:360)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:166)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)
Note that I only get this error with two particular BAM files; other BAM files run fine with the same command and give good results.
I'd appreciate any advice.
Regards,
Hi, I am running gem with the S.irio genome with this command
java -Xmx10G -jar Read_Distribution_default.txt --s genome size readdistribution.txt --genome genome --expt $file.sam --f SAM --k_min 6 --k_max 20 --k_seqs 600 --k_neg_dinu_shuffle --t $threads --outNP
and got this error:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.SAMReader.countReads(SAMReader.java:86)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
the Si_genome.chrom.size.txt looks like this:
lcl|chrSI_scaffold1 72289
lcl|chrSI_scaffold2 42255
lcl|chrSI_scaffold3 19911
lcl|chrSI_scaffold4 116093
lcl|chrSI_scaffold5 264581...
Please help! Thanks!
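The same NullPointerException in AlignmentFileReader.populateArrays shows up in several posts on this page. One thing that may be worth ruling out (this is a guess, not a confirmed diagnosis from the GEM authors) is a mismatch between the sequence names in the chrom.sizes file, such as lcl|chrSI_scaffold1, and the reference names used in the alignment file. A minimal Python sketch of that check follows; the function names are hypothetical and nothing here is part of GEM.

```python
def chrom_sizes_names(lines):
    """Collect sequence names from chrom.sizes-style lines (name, whitespace, length)."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[0])
    return names


def sam_reference_names(lines):
    """Collect reference names (RNAME, third column) from SAM records.

    Header lines (starting with '@') and unmapped reads (RNAME '*') are skipped.
    """
    names = set()
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] != "*":
            names.add(fields[2])
    return names


def missing_references(sam_lines, sizes_lines):
    """Return reference names used by reads but absent from the chrom.sizes file."""
    return sorted(sam_reference_names(sam_lines) - chrom_sizes_names(sizes_lines))
```

To use it, pass two open file handles, for example missing_references(open("reads.sam"), open("Si_genome.chrom.size.txt")). An empty list means every reference name in the reads is also present in the sizes file, in which case the cause lies elsewhere.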
After I run java -Xmx10G -jar ~/opt/gem.v3.4/gem.jar KSM --fasta test.pk.fa --g ~/opt/gem.v3.4/hg19.chrom.sizes --ksm test.ksm.lst --out test.pk.gemscan
I got "test.pk.gemscan.motifInstances.txt". However, I could not convert the results back to genomic coordinates using the positions stored in the FASTA file.
For instance:
chr1:6259452-6259796(.)
TACAACGTGCTGGGATCGGCAGGGGCTCTGGCCCGCTCCGGCCGACCTGCCAGCCCACCCCAGCAGGACGCTGCAGGGCGCCGTCCCCAGCGAGCCTGGGTAGATGCCGGGCTCGGCGAGGCCCACGTGCCTCCCCTGGAGCCGAGGCCTCACGCGGAGCCATACTAACCACAGGAGCCATGGCGGCAGCGGAGTTAGAAAGGGAGGTGAGCGAACTACGCAGACGCAAAGAGCCCGCAGCGCGCAAGGCACGCAGGGTCCAGGCCGCACTAATCACTTTGCCACGCCCCTCGTCCGCCACCTTTTCTCTTGGTTATGTACGATAGGGGAGCGATTGGTTTTTC
I got matches at the same location for two motifs:
Motif SeqID Motif_Name SeqName Match SeqPos Coord Strand Score
0 0 noc_gem_test_2.m0_c1723 chr1:6259452-6259796(.) ATCCCAGCAC:ATCCCAGC,ATCCNAGCA,TCCCAGCA,ATCCCAGNA,ATCCCAGNNC, 331 1:6259812:- - 1.24
1 0 noc_gem_test_2.m1_c1799 chr1:6259452-6259796(.) ATCCCAGCAC:ATCCCA,CAGCAC, 330 1:6259794:- - 1.37
But one SeqPos is 330 while the other is 331.
The offset for ATCCCAGCAC appears to be the same (2) in both KSM motif files.
How can I get the correct coordinates? Both matches should map to:
chr1:6259459-6259468
I have attached the files for your reference.
Thank you!
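For minus-strand matches, the KSM scan output elsewhere on this page notes that SeqPos is a position on the reverse complement of the input sequence. The strand flip itself is plain arithmetic, sketched below in Python. Two things are assumptions and are not settled here: that the position is a 0-based start of the match on the reverse complement, and how the KSM offset should be subtracted beforehand. The function names are hypothetical.

```python
def rc_to_forward_span(seq_len, rc_pos, match_len):
    """Map a 0-based match start on the reverse complement to the
    0-based, half-open [start, end) span on the forward strand."""
    if rc_pos < 0 or rc_pos + match_len > seq_len:
        raise ValueError("match does not fit inside the sequence")
    start = seq_len - rc_pos - match_len
    return start, start + match_len


def to_genomic(region_start, span, one_based_region=True):
    """Convert a forward-strand span to 1-based, inclusive genomic coordinates.

    region_start is the start given in the FASTA header. Set
    one_based_region=False if that start is 0-based (BED-style).
    """
    start, end = span
    first_base = region_start if one_based_region else region_start + 1
    return first_base + start, first_base + end - 1
```

For a 7 bp sequence, a 4 bp match starting at position 1 of the reverse complement covers forward positions 2 to 5 (0-based), so to_genomic(100, rc_to_forward_span(7, 1, 4)) gives (102, 105). Whether the region start in the FASTA header is 0-based or 1-based depends on the tool that extracted the sequences.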
Hello,
I received the following error message when I ran a GEM job:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.plotAllReadDistributions(KPPMixture.java:3248)
at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:269)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)
Does anyone know why I received this message?
It is worth noting that I ran 3 GEM jobs and I received this error message for 2 of the jobs. These 2 jobs also produced warning messages that the read distribution was not updated because too few significant binding events were identified. These 2 jobs also did not generate a "Finished" message in the standard output; whereas, the 3rd job (the job without any error messages) did completely finish.
Thanks a lot,
Gavriel
Hi,
I am trying to run the following command on the ChIP-nexus/ChIP-exo BAM file.
java -jar -Xmx400G ~/bin/gem/gem.jar --expt nexus_filtered_combined.bam --d ~/bin/gem/Read_Distribution_ChIP-exo.txt \
--g ~/bin/gem/mm10.chrom.sizes --f SAM --out test --outNP --mrc 150
I get the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.SAMReader.countReads(SAMReader.java:86)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
I used --mrc 150 because our own pipeline already removes duplicates.
What might I be doing wrong?
Thanks for developing this package. I find the KSM representation really superior to traditional PWMs. However, I wonder whether there is a tool for comparing two KSMs and calculating their similarity. With PWMs, I could easily calculate a Pearson correlation coefficient or a distance score between any pair of PWMs, but with KSMs I am not sure how to do that. Any thoughts?
Hello,
I am trying to use GEM and RMD to obtain co-binding matrices from ATAC-seq data.
I used HINT to obtain my TF binding calls, and am having the following issues:
I therefore decided to process my BED files with GEM. This way, RMD proceeds correctly, but during the GEM step I lose about an order of magnitude of calls. For instance, my most prominently bound factor goes from 15k calls in the HINT prediction file to 1,800 after processing the BED file through GEM.
Ideally, I would like to keep the most events possible for the analysis.
Any help will be greatly appreciated.
Kind regards,
Ricardo
Hi,
I am analyzing ChIP-exo data from BAM files.
I get an error at the end of the run:
Loading genome sequences ...
/cluster/genomes/Genome is not a valid path. Use default path.
chr6_dbb_hap3.fa genome sequence file is not found at directory /cluster/genomes/Genome_chrfa_only.
The protein I am analyzing has a binding site of 36 base pairs.
Could you kindly tell me where I am going wrong?
best regards.
Amit
Hi there
I am running a KMAC followed by KSM analysis on DAPseq data.
I can generate the k-mer set motif with KMAC using sequences corresponding to DAP-seq peaks,
but when I then scan these same sequences with the KSM, I run into the error below.
Please help me :)
Romain
Error:
Scanning KSM motifs ...
... iswap_spe_w100k10k30_0.m0 ...
... iswap_spe_w100k10k30_0.m1 ...
... iswap_spe_w100k10k30_0.m2 ...
... iswap_spe_w100k10k30_0.m3 ...
... iswap_spe_w100k10k30_0.m4 ...
... iswap_spe_w100k10k30_0.m5 ...
... iswap_spe_w100k10k30_0.m6 ...
... iswap_spe_w100k10k30_0.m7 ...
... iswap_spe_w100k10k30_0.m8 ...
... iswap_spe_w100k10k30_0.m9 ...
Note: for motif instances on the minus strand, the SeqPos is the position on the reverse compliment of the input sequence.
Exception in thread "main" java.lang.NullPointerException: Can't have a null genome
at edu.mit.csail.cgs.datasets.general.Region.<init>(Region.java:134)
at edu.mit.csail.cgs.datasets.general.Region.fromString(Region.java:695)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.findMotifInstances(MotifScan.java:254)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.main(MotifScan.java:32)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:315)
My commands are:
java -Xmx8G -jar ${GEM}/gem.jar KMAC \
--pos_seq $POS \
--k_win 100 \
--k_min 10 \
--k_max 30 \
--k_top 10 \
--out_name $RUN \
--print_aligned_seqs ON \
--gc -1 \
--neg_seq $NEG
java -Xmx8G -jar ${GEM}/gem.jar KSM --fasta $POS --ksm ksm_list.txt --out $RUN.KSM.scan
Hi, I am comparing KSM scores with scores of a single PWM matrix and a single TFFM matrix in a ROC for a given plant TF.
I am not sure how to deal with the scores in the KSM motif scan output. I did as follows:
For each searched sequence (in a positive and a negative test set), I collect the scores for all k-mers in all motifs (KMAC detected 13 motifs in the training set, n=800, DAP-seq peaks).
Then I take either the sum of all scores for a given searched sequence, the mean, or just the best score.
Then I compare the curves and AUCs for the KSM, PWM and TFFM (see plots below).
My first question is: is it correct to compute the sum, mean or best as I do?
Second question: is it correct to then compare these sum, mean or max KSM score against the PWM or TFFM scores obtained with a single matrix?
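To make the procedure described above concrete, here is a minimal Python sketch of collapsing the per-sequence scores with sum, mean or max and then computing a ROC AUC from the collapsed values. It only illustrates the bookkeeping and does not say which aggregation is statistically appropriate. Scoring sequences without any hit as 0.0 is an assumption, and the function names are hypothetical.

```python
def aggregate(scores, how="max"):
    """Collapse all motif-match scores of one sequence into a single value.

    A sequence with no match gets 0.0 (an assumption, not a GEM convention).
    """
    if not scores:
        return 0.0
    if how == "sum":
        return float(sum(scores))
    if how == "mean":
        return sum(scores) / len(scores)
    if how == "max":
        return float(max(scores))
    raise ValueError("how must be 'sum', 'mean' or 'max'")


def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because AUC depends only on the ranking of sequences, it can be compared across scoring methods with different scales. The choice of sum, mean or max changes that ranking, though, so it is worth reporting which one was used.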
Hello,
I am running GEM on my data and I have a question regarding a section of the standard output from the GEM job that I ran.
The relevant section is as follows:
Loading genome sequences ...
Done, 2.3m
GC content=0.41
Use di-nucleotide shuffled sequences as negative sequences.
Estimated GC content is 0.43. Set [--gc -1] to use the estimated GC content.
Provided GC content is 0.41.
My questions:
When GEM says "GC content=0.41", is GEM referring to the GC content across all the genome sequences I read in? If so, then that is an error because I know the GC content of the genome I work on is 0.56 (i.e. 56%).
When GEM uses di-nucleotide shuffled sequences as negative sequences, is it shuffling sequences across the whole genome? Are these shuffled sequences the same length as the positive sequences (i.e. 61 bp)? Is the "Estimated GC content is 0.43" line referring to the estimated GC content of the shuffled sequences?
Thank you very much,
Gavriel
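An independent check can help narrow down which set of sequences a reported GC value refers to. The Python sketch below computes the GC fraction of any FASTA input, for example the genome or the 61 bp positive sequences. It makes no claim about how GEM computes its own values; ignoring N and other ambiguous bases is a choice made here, and the function names are hypothetical.

```python
def read_fasta_sequences(lines):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def gc_content(sequences):
    """Fraction of G/C among unambiguous bases (A, C, G, T), case-insensitive."""
    gc = 0
    total = 0
    for seq in sequences:
        for base in seq.upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
    return gc / total if total else 0.0
```

For example, gc_content(read_fasta_sequences(open("genome.fa"))) gives the genome-wide fraction, which can then be compared with the 0.41 and 0.43 values in the log.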
Hello,
A few questions about KMAC output for ChIP-seq data and apologies if these have been answered elsewhere. I have 5000 top, IDR-filtered ChIP-seq hits for a TF of interest and 5000 negative regions that are negative for that TF but are accessible as measured by ATAC-seq and active as measured by H3K27ac in my tissue of interest (see image below):
How do I interpret the motif spatial distributions? Are these results implying that the second motif is often found in the reverse complemented orientation 2 base pairs upstream from the first motif? They look like they may be connected/part of a larger motif (especially the first few k-mers for the second motif), is this a reasonable interpretation?
Why are the PWM hit ratios so much worse than the KSM ratios? Does it even make sense to use the PWMs to identify TF identities using STAMP if those PWMs appear more frequently in the negative regions than positive?
How do you recommend I draw a reasonable threshold for what motifs are significant? AUC > ?
Thanks for your time,
I am playing with some of the code and would like to rebuild the gem.jar file. I could not find the right Ant target to use. Could you please give me some pointers on how to build the JAR? Thanks!
Hi,
I do SELEX on transcription factors and I typically discover motifs via MEME using enriched sub-sequences (10bp/12bp). Can KMAC discover motifs in 20bp (or 40bp) SELEX data?
thanks much,
jose
Hi,
I made a Dockerfile for CID/MICC and it's now on Docker Hub.
I couldn't find a license file for your software. Is it OK to make it public?
Thanks,
Hi,
I am running GEM with my chip-exo data and my organism is Candida glabrata.
The code I was using is:
java -Xmx10G -jar gem.jar --d Read_Distribution_ChIP-exo.txt --smooth 3 --g cglabrata.chrom.sizes --genome /projects/academic/lrusche/chip_exocg/GCA_014217725.1_ASM1421772v1_genomic.fna --s 12300000 --expt Hst1_1.bed --ctrl No_tag.bed --f BED --out gem_Hst1_1 --nrf --outBED --k_min 6 --k_max 13
However, I got this error:
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)
at edu.mit.csail.cgs.deepseq.utilities.BEDFileReader.countReads(BEDFileReader.java:124)
at edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)
at edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)
at edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:126)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Could you please tell me how to fix it? Thank you!
Hi, I am running gem with the S.irio genome with this command
java -Xmx10G -jar gem.jar --d Read_Distribution_default.txt --g mm8.chrom.sizes --genome /home/scc/software/biosoft/data/bowtie2/hg19.idx --s 2000000000 --expt /home/liusiyu/S1.sam --ctrl /home/liusiyu/input.sam --f SAM --out /home/liusiyu/S1_GEM --k_min 6 --k_max 13
and got this error:
java.io.FileNotFoundException: GEM_Log.txt (Permission denied)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:158)
at java.base/java.io.FileWriter.<init>(FileWriter.java:82)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Options:
--d Read_Distribution_default.txt --g mm8.chrom.sizes --genome /home/scc/software/biosoft/data/bowtie2/hg19.idx --s 2000000000 --expt /home/liusiyu/S1.sam --ctrl /home/liusiyu/input.sam --f SAM --out /home/liusiyu/S1_GEM --k_min 6 --k_max 13
java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.log(KPPMixture.java:3661)
at edu.mit.csail.cgs.deepseq.discovery.KPPMixture.<init>(KPPMixture.java:197)
at edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:146)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)
Exception in thread "main" java.lang.NullPointerException
at edu.mit.csail.cgs.deepseq.discovery.GEM.runMixtureModel(GEM.java:166)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:345)
Please help! Thanks!
Hi,
I contribute to bioconda, https://bioconda.github.io, and I would like to create a package for "gem" to ease its distribution. However, to create the recipe I need to know the license and, if possible, have the license file. Can you please provide this information?
Thank you,
Natasha
Thank you for this algorithm, which can analyze ChIP-like datasets such as ChIP-seq, ChIP-exo, and even DAP-seq. While using GEM for peak calling on my DAP-seq results, I wondered whether it is possible to supply multiple negative controls, in the same way that multiple conditions and multiple replicates are described on the website (https://groups.csail.mit.edu/cgs/gem/#readdistrib). If multiple controls are accepted, is there a statistical calculation behind how the controls are used during peak calling? Basically, I would like to understand how the algorithm uses the negative control when calling peaks. I would greatly appreciate your suggestions and any recommended references for a deeper understanding of the algorithm.
Hi
I've tried running GEM over a few datasets now, and I've noticed that the peak set returned for a particular experiment changes with each run, even when running GEM with exactly the same parameter flags and values. The set doesn't change by much, but it is different with each run.
Is this an expected result? If so, is there any way to modify the behavior of GEM to ensure it gives exactly the same result each time it's run on an experiment?
Hello,
I am trying to include a set of negative sequences in my KMAC run using the --neg_seq option, as in the command listed below:
java -Xmx8G -jar gem.jar KMAC --pos_seq peaks_forkmac.fa --neg_seq background_peaks_forkmac.fa --k_win 500 --k_min 5 --k_max 13 --k_top 10 --out_name kmac_peaks_kwin500_kmin5_kmax13_vsbg
However, it is not recognizing this file and is instead still using the default set of di-nucleotide shuffled sequences as negative sequences. The first several lines of the output are below:
KMAC motif discovery (version 1.3)
Options:
KMAC --pos_seq peaks_forkmac.fa --neg_seq background_peaks_forkmac.fa --k_win 500 --k_min 5 --k_max 13 --k_top 10 --out_name kmac_peaks_kwin500_kmin5_kmax13_vsbg
Loaded 36718 input positive sequences.
Use di-nucleotide shuffled sequences as negative sequences.
Am I using this option correctly? Thank you in advance for your help.
We are looking for known motifs/binding sites that are enriched in one set of genomic sequences versus another. After running your KSM analysis using hundreds of known motifs, we have a VERY large matrix of motifs and match instances.
1- We would like to know how you recommend proceeding to measure the frequency of motif matches, and then enrichment.
2- Based on your descriptions in the paper (Guo et al., Genome Research, 2018), is it correct that the number of unique SeqIDs in the second column of the output file for each motif would be the Kmer group hit count for that motif? In other words, would that be the number of places that motif was found in the query sequence?
3- Also, we understood from the paper that the last "Score" column in the output matrix is the odds ratio of that motif's appearance in the query sequence versus the negative training sequences. But how would you recommend using this score to rank the motif match instances and find the most significant ones? For example, how should we decide on a cutoff?
4- We are also interested in how your algorithm can detect positional dependencies of motifs.
How can we extract the most frequent flanking Kmers for any given motif?
Thank you!
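One way to read the counting in question 2 is as the number of distinct sequences with at least one match per motif. The Python sketch below does that counting and forms a simple hit-rate ratio between two sequence sets. It assumes a tab-separated motifInstances table with the nine columns Motif, SeqID, Motif_Name, SeqName, Match, SeqPos, Coord, Strand and Score. It is not the statistic used in the paper, and the function names and the pseudocount are hypothetical choices.

```python
from collections import defaultdict


def sequences_hit_per_motif(lines, min_score=None):
    """Count distinct SeqIDs with at least one match, per Motif_Name.

    Assumes tab-separated rows with Motif_Name in column 3, SeqID in
    column 2 and Score in column 9. Comment and header rows are skipped.
    """
    hits = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[0] == "Motif":
            continue
        if min_score is not None and float(fields[8]) < min_score:
            continue
        hits[fields[2]].add(fields[1])
    return {motif: len(seq_ids) for motif, seq_ids in hits.items()}


def hit_rate_ratio(hits_a, n_a, hits_b, n_b, pseudocount=0.5):
    """Ratio of the fraction of sequences hit in set A to that in set B.

    The pseudocount keeps the ratio finite when one set has no hits.
    """
    rate_a = (hits_a + pseudocount) / (n_a + 2 * pseudocount)
    rate_b = (hits_b + pseudocount) / (n_b + 2 * pseudocount)
    return rate_a / rate_b
```

Running the scan separately on the two sequence sets and comparing the per-motif counts this way gives a first-pass ranking. A significance test on the resulting 2x2 counts, for example Fisher's exact test, would be a natural next step.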
Dear GEM3 developer,
I'm facing a problem with KMAC and KSM. It seems that KMAC generates KSM motifs with commas instead of dots for:
When I try to run KSM on a KMAC output
java -jar ~/gem/gem.jar KSM --fasta ES_Oct4_61bp.fasta --ksm Oct4.ksm_list.txt --out Oct4.KSM.scan
java returns the following error:
Exception in thread "main" java.lang.NumberFormatException: For input string: "3,10"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at edu.mit.csail.cgs.deepseq.discovery.kmer.GappedKmer.loadKSM(GappedKmer.java:313)
at edu.mit.csail.cgs.deepseq.discovery.kmer.GappedKmer.loadKSM(GappedKmer.java:270)
at edu.mit.csail.cgs.deepseq.utilities.CommonUtils.loadKsmFile(CommonUtils.java:800)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.findMotifInstances(MotifScan.java:180)
at edu.mit.csail.cgs.deepseq.analysis.MotifScan.main(MotifScan.java:32)
at edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:315)
This is true for both Java 1.7 and 1.8.
Here are the first lines of the KSM file for m0:
#Oct4.m0
#KSM version: 1
#5000/5000
#3,10
#k-mer/r.c. Offset PosCt wPosCt NegCt HgpLg10 CIDs PosHits (base85 encoding) NegHits (base85 encoding)
ATGCAAA/TTTGCAT 1 529 529 61 -113,4 * "9Ouf2^CY>4An'M.q7d0..h4N!)?Fl&QU1%L2uE?i^;'&VL8q!('/**(g>>"9<rP,m+/^$j&[J!!!R-!XM%VTRafXYQ+ZV!*fXP&-/J5!'h8*"Ft.e^]4?G!!=VL"<_(P#m^MI!YSodJ3a/E!<<\)!!!!a!!!Q1!._ibJIMm8!!!!1JI2AW+@/Oq!!iW<"9\_?!!<3d+Ue>^!!!-%#ljr+5QhX[!/)JN!!!!a!$DPT!!:RK&cht46jNdc!!!!-!!"5DJcGiR![7UE!!u0r!0@53!5JR7QiROj!!E9E!<<*%TKiJ[J7JpB!'gYe!'gMaz#QOi)!!!$E!!#h,!!#gs!WW3#"9Lgg!.Y%L![7XF5QCcq!)NY"z!!!-%!<
H(!!E9%(]XO9!!!Q1!!iiE!!!-&&BFhOTE##n#QOi*!$G5@"A'^9!!'!#PeL!!!!"!.Y%L#QP,1!WW3#!/1E'!!!!Iz!$DmS!<E0#z#QOj5J,k*"!WX>C!!3-##QOo+Jj9;9!!!!1!'hZW!"csH%MJgF!!!$"&-)\1+96p'!!!!#'EAsM+:&5Q(dJ(Q+92ZI#XAA4!!#OkJ,fS2z63%&u!!3-#!!!''!'gMa!0@0!!3-#J,oWMJ.R52z!!!"L#QOjTJ-5iP0E?V'!!3-#!!!!%!!!"Lz!.Y%L!!!!A!$D7B!!!!)!"^7Q!<Q't!!!!A!.Y%L!13dz!!%NM!!!!"!!"-,!!!!1&:a
!<@cqz#Q~ zz!"],A!!!-%!!E9&z!!!!%!!!"Lz!!E9%zz!!!!#z!!Ei5zJ,fQLzz!!!!a!"],1zz!!39'zzz!!#7az!<<*"!!!!bzzzz"9Ac.!!!9)z!!!Q1z"98EEzzzzzzz!!!'#!!!'#z!!!-%zz!!!'#!!!$"!!!!A!!!9)z!!!!"!'gMazz!!iRTzzzzzz!!!!%zzz!!iQ)z!"],1zz#QOi)z!!!Q1z!!!!)zzzz#QTATzzzzz!!!!#zz!!!!"zzzzz&-)\3#QOi)!$D7Azzzzzzzzzzz!!!9)!!!!1zzzzzzzzzzz!!!$"zzzzz!!!-%zzzzz#QOi)zzzzz!.Y(Mz!!!!#!!!!a~
ATGCNAAT/ATTNGCAT 1 433 467 19 -118,5 0,-1 N.A. N.A.
ATGCANAT/ATNTGCAT 1 402 444 19 -111,5 0,2 N.A. N.A.
TATGCANA/TNTGCATA 0 367 408 26 -93,5 -3,4,5 N.A. N.A.
ATGNAAAT/ATTTNCAT 1 372 401 15 -103,0 6,0 N.A. N.A.
Moreover, I was wondering if it could be possible to save the aligned sequences (with N) that appear in the 'Aligned bound sequences' heatmap in an additional text file.
Otherwise, KMAC is one of the most convincing motif discovery tools. I really appreciate that it is able to identify motifs so quickly on ChIP-seq/DAP-seq/Genome data.
Best,
Simon
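Values such as "3,10" and "-113,4" look like numbers written with a decimal comma, which Java produces when it formats numbers under a locale that uses one. Running KMAC with the standard JVM properties -Duser.language=en -Duser.country=US may avoid this, although that has not been tested against GEM here. For KSM files that already exist, the Python sketch below rewrites only the fields that appear to be decimals. It assumes the data rows are tab-separated and that HgpLg10 is the sixth column, as in the header shown above. It deliberately leaves the CIDs column alone, because its commas separate list items. The function name is hypothetical.

```python
import re

# A whole field that is an optionally signed number with a decimal comma.
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def fix_decimal_commas(lines, numeric_columns=(5,)):
    """Return KSM lines with decimal commas replaced by dots.

    Comment lines are changed only when the text after '#' is a single
    decimal-comma number (e.g. '#3,10'). Data rows are changed only in
    the listed 0-based columns.
    """
    fixed = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("#"):
            body = text[1:]
            if DECIMAL_COMMA.match(body):
                text = "#" + body.replace(",", ".")
        else:
            fields = text.split("\t")
            for i in numeric_columns:
                if i < len(fields) and DECIMAL_COMMA.match(fields[i]):
                    fields[i] = fields[i].replace(",", ".")
            text = "\t".join(fields)
        fixed.append(text)
    return fixed
```

It is safest to write the result to a new file and keep the original, since the column positions are an assumption based on the header line in this post.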
Hello
I am running CID with different settings of '--micc'
On my data I get the same number of interactions with --micc 2 or 3 or 4 or 5 (but lower than not setting --micc at all, which I think uses the default value of 1).
In all the above cases, the lowest value in column 7 is "2"
EDIT: I tried with --micc 1, 2, 4, 5, 8, 10, 15, 20; all outputs with --micc other than 1 are the same.
Thank you
Hi,
Thank you for the amazing tool. I'm a PhD student learning to analyze ChIP-seq data.
I am trying to use --subsf to run GEM on only a subset of the genome.
Command line used:
java -Xmx10G -jar /scratch/kimj50/ChIP-seq/GEM/gem/gem.jar \
--d /scratch/kimj50/ChIP-seq/GEM/gem/Read_Distribution_default.txt \
--g /scratch/kimj50/ChIP-seq/GEM/gem/ce10.chrom.sizes \
--genome /scratch/kimj50/ChIP-seq/GEM/references \
--f SAM \
--exptN2 DPY-27_N2_emb_ext75_SE172_input_SE176.sam DPY-27_N2_emb_ext10_CJ132_input_CJ19.sam DPY-27_N2_emb_ext1_SE30_input_EE6.sam DPY-27_N2_emb_ext2_CJ39_input_CJ43.sam \
--ctrlN2 input_SE176_N2_emb_ext75.sam input_CJ19_N2_emb_ext10.sam input_EE6_N2_emb_ext1.sam input_CJ43_N2_emb_ext2.sam \
--out N2_groseq \
--k_min 6 --k_max 13 \
--min 50 \
--subsf groseq_emb_X_GEMformat.bed
groseq_emb_X_GEMformat.bed: contains regions of chrX
chrX:323289-335587
chrX:371074-377975
chrX:380383-383809
....
The .GEM_events output file contains regions that are not on chrX, which makes me think I got the format wrong.
Thanks! - Jun
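Independently of how GEM reads the --subsf file, the output can be checked or filtered after the run against the same region list. Below is a minimal Python sketch that parses region strings such as chrX:323289-335587 and tests whether an event position falls inside any of them. Treating "chrX" and "X" as the same chromosome and treating region ends as inclusive are assumptions, and the function names are hypothetical.

```python
def parse_region(text):
    """Parse 'chrX:323289-335587' into (chromosome, start, end)."""
    chrom, span = text.strip().split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def strip_chr(name):
    """Drop a leading 'chr' so that 'chrX' and 'X' compare equal."""
    return name[3:] if name.startswith("chr") else name


def in_regions(chrom, pos, regions):
    """True if (chrom, pos) lies inside any region, ends inclusive."""
    return any(
        strip_chr(chrom) == strip_chr(c) and start <= pos <= end
        for c, start, end in regions
    )
```

Applying in_regions to the chromosome and coordinate of each reported event shows how many events fall outside the intended regions, which helps tell a file-format problem from an option that is simply not being applied.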