huangyh09 / brie
BRIE: Bayesian Regression for Isoform Estimate in Single Cells
Home Page: https://brie.readthedocs.io
License: Apache License 2.0
Dear author,
Thanks a lot, I hope to hear from you.
quanlong Jiang
Hi, thank you for developing BRIE!
I've processed all my data with BRIE2, but I'm wondering how to look at the specific sites where differential splicing occurs, such as which exon is skipped or which intron is spliced out.
I noticed that in my brie_count.h5ad files I can get event IDs such as ENSMUSG00000033543.AS2, but I could not connect them with the annotation (SE.lenient.gff3.gz). What is the relationship between ENSMUSG.....AS2 and ENSMUSG.......in/out, so that I can figure out the specific splice sites?
Thank you so much !
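One way to relate an event ID to its isoforms is to follow the ID/Parent attributes in the GFF3 file. The sketch below is not BRIE code: it assumes the annotation follows the standard GFF3 ID/Parent convention, and the coordinates, the source column and the exon ID are made up for illustration.

```python
from collections import defaultdict

def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=a;Parent=b' into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}

def children_by_parent(gff3_lines):
    """Map each Parent ID to the (type, ID, start, end) records that point to it."""
    children = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        if "Parent" in attrs:
            children[attrs["Parent"]].append(
                (cols[2], attrs.get("ID", ""), int(cols[3]), int(cols[4])))
    return children

# Hypothetical records with made-up coordinates; a real file could be read
# with gzip.open("SE.lenient.gff3.gz", "rt").
example = [
    "chr1\tAS\tgene\t100\t900\t.\t+\t.\tID=ENSMUSG00000033543.AS2",
    "chr1\tAS\tmRNA\t100\t900\t.\t+\t.\tID=ENSMUSG00000033543.AS2.in;Parent=ENSMUSG00000033543.AS2",
    "chr1\tAS\tmRNA\t100\t900\t.\t+\t.\tID=ENSMUSG00000033543.AS2.out;Parent=ENSMUSG00000033543.AS2",
    "chr1\tAS\texon\t400\t500\t.\t+\t.\tID=se_exon;Parent=ENSMUSG00000033543.AS2.in",
]
tree = children_by_parent(example)
isoforms = [record[1] for record in tree["ENSMUSG00000033543.AS2"]]
```

Looking up an isoform ID in `tree` then lists its exons with coordinates, which is where the skipped exon of an SE event would show up if the file is organised this way.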
Dear Authors,
Thank you for your wonderful work.
I am struggling to understand the practical meaning of ELBO_gain and effect size in the volcano plot of ELBO_gain against effect size on logit(PSI), which BRIE2 uses for detecting differential splicing between EAE and control cells.
If you could explain what they mean in simple terms, that would be great.
Many thanks.
Luke
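For the effect-size axis, it may help to see how a shift on the logit(PSI) scale maps back to PSI itself. The sketch below only illustrates that arithmetic with made-up numbers; it is not taken from BRIE2 and says nothing about how ELBO_gain is computed.

```python
import math

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

def shifted_psi(psi_baseline, effect_size):
    """PSI after adding an effect size on the logit scale to a baseline PSI."""
    return inv_logit(logit(psi_baseline) + effect_size)

# Made-up numbers: the same logit-scale effect moves PSI by different
# amounts depending on where the baseline sits.
mid = shifted_psi(0.5, 1.0)    # baseline in the middle of the range
edge = shifted_psi(0.95, 1.0)  # baseline close to 1
```

An effect size of 1 on the logit scale moves a baseline PSI of 0.5 much further than a baseline of 0.95, so the effect size is best read as a change in log-odds of exon inclusion rather than as a PSI difference.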
Hello Yuanhua,
I built brie from source using python3 and ended up with the error
TypeError: object of type 'map' has no len()
when trying to run the brie-event command line. I built brie again with python2 and all went fine. Are you sure brie is compatible with python3? If not, maybe you should state that in the README and on the website (or fix it, although that may be complicated).
Also, there is a typo on the brie website at https://brie-rna.sourceforge.io/manual.html
The command line example
brie-event-filter -a AS_events/SE.gff3 -anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa
is wrong and does not run; it should be replaced by:
brie-event-filter -a AS_events/SE.gff3 --anno_ref=gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa
Best,
Milan
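Why the single-dash form fails can be reproduced in a few lines. This is a sketch under the assumption that the script parses its options with Python's optparse; the three options below are a hypothetical stand-in, not copied from the BRIE source.

```python
from optparse import OptionParser

# Hypothetical stand-in for the brie-event-filter options.
parser = OptionParser()
parser.add_option("-a", dest="anno_file")
parser.add_option("--anno_ref", dest="anno_ref")
parser.add_option("-r", dest="reference")

bad = ["-a", "AS_events/SE.gff3", "-anno_ref", "gencode.vM12.annotation.gtf",
       "-r", "GRCm38.p5.genome.fa"]
good = ["-a", "AS_events/SE.gff3", "--anno_ref=gencode.vM12.annotation.gtf",
        "-r", "GRCm38.p5.genome.fa"]

# "-anno_ref" is read as the short option "-a" with the value "nno_ref",
# so the GTF path is left over as a positional argument.
bad_opts, bad_args = parser.parse_args(bad)
good_opts, good_args = parser.parse_args(good)
```

With the single dash, `bad_opts.anno_file` ends up as "nno_ref" and `bad_opts.anno_ref` stays unset, which would explain a failure later in the run rather than a clear usage error.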
I got an error when I tested brie.
How can I install and use BRIE?
$ brie
Traceback (most recent call last):
File "/usr/local/bin/brie", line 11, in <module>
load_entry_point('brie==0.1.1', 'console_scripts', 'brie')()
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 565, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
return ep.load()
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2291, in load
return self.resolve()
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2297, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/local/lib/python3.6/site-packages/brie/brie.py", line 13, in <module>
from utils.gtf_utils import loadgene
ModuleNotFoundError: No module named 'utils'
Hello BRIE2,
Thanks for this amazing package!
I'm using a loom file as input, but it generates the error below. I also tried Python 3.7, which doesn't work either.
Environment: Python 3.8, Ubuntu 20.04 LTS
Could you please help me with this issue?
Thanks!
Best,
YJ
(base) hyjforesight@W10D-GW97ZC3:~$ brie-quant -i /home/hyjforesight/loom/cellsorted_WT_IEC_G123B.loom -o /home/hyjforesight/brie2/brie_quant_cell.h5ad -c /home/hyjforesight/WT_IEC_barcodes.tsv --interceptMode=gene --LRTindex=All --layers spliced,unspliced
2021-11-30 18:49:53.274751: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-30 18:49:53.275065: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "/home/hyjforesight/miniconda3/bin/brie-quant", line 8, in
sys.exit(main())
File "/home/hyjforesight/miniconda3/lib/python3.8/site-packages/brie/bin/quant.py", line 213, in main
quant(options.in_file, options.cell_file, options.gene_file,
File "/home/hyjforesight/miniconda3/lib/python3.8/site-packages/brie/bin/quant.py", line 53, in quant
_idx = brie.match(adata.obs.index, dat_tmp[1:, 0]).astype(float)
UnboundLocalError: local variable 'adata' referenced before assignment
I get the following error when executing brie-event-filter using the Ensembl toplevel genome sequence (ftp://ftp.ensembl.org/pub/release-82/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz) as the reference genome sequence:
$ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=Mus_musculus.GRCm38.dna.toplevel.fa
9908 Skipped Exon events are input for quality check.
Traceback (most recent call last):
File "venv/bin/brie-event-filter", line 9, in <module>
load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 369, in main
no_splice_site)
File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 118, in as_exon_check
up_ss3 = fastaFile.get_seq(chrom, _exon_loc[1]+1, _exon_loc[1]+2)
File "venv/local/lib/python2.7/site-packages/brie/utils/fasta_utils.py", line 19, in get_seq
return self.f.fetch(qref, start-1, stop)
File "pysam/libcfaidx.pyx", line 278, in pysam.libcfaidx.FastaFile.fetch (pysam/libcfaidx.c:5011)
KeyError: "sequence 'chrX' not present"
This error does not occur when I use the GENCODE genome sequence (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/GRCm38.p5.genome.fa.gz). I believe it is likely caused by the different naming of the chromosomes in the two files.
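A small helper can check for this kind of mismatch before running the filter. The sketch below is not part of BRIE; the function name is hypothetical, and it only covers the common 'chr' prefix and chrM/MT differences between UCSC/GENCODE-style and Ensembl-style names.

```python
def harmonize_chrom(name, available):
    """Return the spelling of `name` that exists in `available`, or None.

    Tries the name as given, then with a 'chr' prefix added or removed,
    and maps the mitochondrial names 'chrM' and 'MT' onto each other.
    """
    candidates = [name]
    if name.startswith("chr"):
        bare = name[3:]
        candidates.append("MT" if bare == "M" else bare)
    else:
        candidates.append("chrM" if name == "MT" else "chr" + name)
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None

# Sequence names as they might appear in an Ensembl-style FASTA index.
ensembl_names = {"1", "2", "X", "Y", "MT"}
```

With pysam, the set of available names can be taken from the `references` attribute of an opened `FastaFile`; a lookup that returns None points to a contig that is missing under either naming scheme.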
Hi Yuanhua,
Thanks for developing this amazing tool! I'm interested in detecting AS events in my paired-end Smart-seq2 dataset. I ran brie-count as follows:
brie-count -S cell_table_smartseq.tsv -a mouse_SE.lenient_50events.gff3
-o outs_smartseq -p 10 #--verbose
An error occurred:
[BRIE2] loading gene annotations ... Done.
[BRIE2] counting reads for 50 genes in 1 sam files with 10 cores...
[BRIE2] [====================] 100.0% cells done in 0.1 sec.
[BRIE2] 50 genes have been processed.
[BRIE2] saving matrix into h5ad ... Traceback (most recent call last):
File "/home/guanao/.local/bin/brie-count", line 33, in
sys.exit(load_entry_point('brie==2.2.2', 'console_scripts', 'brie-count')())
File "/home/guanao/.local/lib/python3.6/site-packages/brie/bin/count.py", line 304, in main
options.nproc, options.event_type, options.verbose)
File "/home/guanao/.local/lib/python3.6/site-packages/brie/bin/count.py", line 123, in smartseq_count
gene_note=np.array(gene_table, dtype='str'))
File "/home/guanao/.local/lib/python3.6/site-packages/brie/utils/io_utils.py", line 23, in convert_to_annData
_shape = Rmat[_input_keys[0]].shape
IndexError: list index out of range
I found that the example dataset (msEAE) uses single-end Smart-seq data, and I succeeded in running brie-count with the same preprocessing after converting my paired-end Smart-seq2 data into single-end data.
It seems that brie-count works well for single-end Smart-seq2 data but not for paired-end Smart-seq2 data. Any suggestions would be appreciated. Thank you so much!
Hello! Thank you for developing BRIE!
I am interested in alternative splicing events, so I applied BRIE2 to my research.
I have eight 10x Genomics samples that belong to four groups. I ran brie-count eight times and got one brie_count.h5ad file per sample, then merged them into a single brie_count.h5ad file to run brie-quant (mode2-quant). Am I handling the data the right way?
After that, I got a huge h5ad file with 93357 cells and 4266 splicing events. I made four density plots of the PSI distributions with ggplot2, but they all seem to have the same distribution (a large number of cells with PSI values close to 0, and the other cells close to 1). I am not sure they are right. Have you ever seen a PSI distribution like this before? I am looking forward to your advice. Thank you so much!
Tuo
Hi @huangyh09,
thanks for this amazing tool!
In the output file (clusters.tsv), I can see two columns of p-values. One is called p_value and the other is called condition_pval.
What is the difference? And are the p-values adjusted?
Thanks!
Hello,
When I run brie-count, several processes are spawned but only one is actually using the CPU (see screenshot of top output). If I use the parameter -p or --nproc, brie-count changes its message [BRIE2] counting reads for 28303 genes in 1 sam files with 8 cores... accordingly, and the correct number of processes is spawned, but only one is actually doing something. I'm trying to process Smart-seq data, so the problematic function is probably get_smartseq_matrix. I've tried the multiprocessing package with a stress test and it worked correctly (i.e. all spawned processes were using 100% CPU). Any help would be appreciated.
I have .fa, .gtf, .bam and barcodes.tsv files, and I have run briekit-event and briekit-event-filter to obtain SE.filter.gtf.
Now when I run brie-count, I can't understand the errors. Could you give me a hand?
brie-count -a AS_events/SE.lenient.gtf -s cellsorted_possorted_genome.bam -b barcode.tsv.gz -o test -p 15
Thanks!
Hi Yuanhua,
thanks for providing the tool for splicing analysis. I got an error when running brie for isoform expression.
brie -a AS_events/SE.gold.gtf -s sample.bam -f AS_events/human_features_v19.csv.gz -o AS_events/ -p 50
[Brie] loading annotation file... Done.
[Brie] loading reads for 12504 genes with 50 cores...
[Brie] [====================] 100.0% done in 58.5 sec.
[Brie] running Brie for 25008 isoforms on 12504 genes with 50 cores...
Traceback (most recent call last):
File "/opt/modules/i12g/anaconda/3-4.1.1/envs/python27/bin/brie", line 11, in
load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
File "build/bdist.linux-x86_64/egg/brie/brie.py", line 268, in main
File "build/bdist.linux-x86_64/egg/brie/models/model_brie.py", line 300, in brie_MH_Heuristic
File "build/bdist.linux-x86_64/egg/brie/models/model_brie.py", line 159, in Iso_read_check
IndexError: index 0 is out of bounds for axis 0 with size 0
It might be because my BAM file uses Ensembl chromosome names (1, 2, 3...) while the annotation SE.gold.gtf is from GENCODE, as in your instructions.
I tried to run brie-factor with the Ensembl annotation, but it gave warnings like "No PhastCons data for ENSG...in. Treated as Zero". As the PhastCons file is from UCSC, I assume the problem is again the chromosome names.
Besides, the same brie error for isoform expression persists if I use SE.gold.gtf generated from Ensembl and human_features generated from GENCODE (as I cannot generate human_features from Ensembl, see above).
Is there a way to get around this?
Best wishes,
Jun
Can I get the PSI for every alternative splicing event in every cell? I want to explore the correlation between splicing factors (SF) and splicing events, using the expression level of each splicing factor and the PSI of each splicing event in each cell.
How can I get the PSI for every alternative splicing event in every cell, and from which output file?
Thanks for your help.
Thank you for developing BRIE. You give multiple examples on https://brie.readthedocs.io/. All of these examples are based on mode 2 (cell features or aggregation). However, mode 1 needs a file of gene features (-g gene-feature-file), and I could not find any related examples or descriptions in the current documentation. Could you please tell me how to generate the gene-feature-file for mode 1? What is the format of that file?
Hi Yuanhua,
there is a potential bug in FastaFile.rev_seq() function:
rev_seq("atgc")
>>> 'cgta'
Some parts of the reference FASTA file have lowercase letters (repetitive regions), in which case the sequence is reversed but not complemented.
A possibly better option would be
def rev_seq(seq):
    # Upper-case first so that soft-masked (lowercase) bases are complemented too
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
    return "".join([complement[base] for base in seq.upper()[::-1]])
Best,
Jun
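If the lowercase soft-masking should be kept rather than upper-cased, a translation table handles both cases. This is an alternative sketch, not the code in FastaFile; the name rev_comp is hypothetical.

```python
# Translation table covering both cases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def rev_comp(seq):
    """Reverse-complement a DNA string, keeping lowercase bases lowercase."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Unlike a dictionary lookup, `str.translate` does not raise KeyError on unexpected symbols such as IUPAC ambiguity codes; they are left as they are, which may or may not be what the caller wants.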
Hi Yuanhua,
For the "-f" parameter in brie, which feature file should I use, the lenient or the filtered one? Is the "-f" parameter optional or required?
Thanks,
Gangcai
Hi!
Hope you can help me.
In your documentation (https://brie.readthedocs.io/en/latest/brie_count.html) you mention that "brie-quant is more generic and can be applied if the counting has been done, either by brie-count or other tools." I was wondering how I can format the h5ad output from a different tool so that it can be used with brie-quant.
The reason to use a different tool than brie-count is that I want to use 10x data.
Cheers!
Hi,
I have been trying to use BRIE for a couple of days and running into errors when trying to quantify splicing rates following the manual. In particular I was getting an error in the MH_heuristic function because idxF was empty. I have tracked down this to the map_data function:
"elif ids[idx1[i]] == tran_ids[idx2[j]]:"
this basically never happens because of the suffixes you add to the ids. I have modified my version to remove the suffixes but I am not sure this is how it should be done. Anyway, have a look please and see if you can fix it.
Thanks!
Hi, Yuanhua,
I'm using BRIE to calculate Psi for each event, and then performing differential AS detection with ΔPsi > 0.05 and P < 0.01 between two groups of cells. However, many of the significantly different AS events I found are very lowly expressed, even though their Psi values differ greatly (~0.01 vs ~0.9) between the two groups. The Psi values are extracted from the fractions.tsv output by brie, but I noticed that many events have zero counts yet very high or very low Psi, and their Psi_low and Psi_high are far apart (the range can span nearly 0 to 1). Why are their Psi values so different? And how can I filter out such low-coverage AS events? Is it OK to filter just on the counts value from the fractions.tsv file?
Thanks,
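One possible filter combines the read support with the width of the Psi interval, since an event with no reads and an interval from nearly 0 to 1 carries little information from the data. The sketch below is not part of BRIE: the thresholds are arbitrary, the table is made up, and the column names (counts, Psi_low, Psi_high) are assumed to match the fractions.tsv described in this post.

```python
import csv
import io

def keep_confident_events(handle, min_counts=10.0, max_ci_width=0.3):
    """Yield rows whose read support and Psi interval width pass the thresholds."""
    for row in csv.DictReader(handle, delimiter="\t"):
        counts = float(row["counts"])
        ci_width = float(row["Psi_high"]) - float(row["Psi_low"])
        if counts >= min_counts and ci_width <= max_ci_width:
            yield row

# Made-up table; column names are assumed to match fractions.tsv.
example = io.StringIO(
    "tran_id\tcounts\tPsi\tPsi_low\tPsi_high\n"
    "geneA.in\t0.0\t0.90\t0.02\t0.99\n"
    "geneB.in\t35.0\t0.80\t0.70\t0.88\n"
)
kept = [row["tran_id"] for row in keep_confident_events(example)]
```

For a real file, `example` would be replaced by `open("fractions.tsv")`, and the same filter would be applied to every cell before comparing groups.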
Hello! Thank you for developing BRIE!
Recently I tried to draw sashimi plots from BRIE2's h5ad files.
However, I noticed that I had to use BRIE1's results (samples.csv.gz) to draw the sashimi plot with MISO's sashimi_plot command.
samples.csv.gz includes the MCMC samples of the posterior distribution of Psi, whereas the h5ad files don't contain this information, given that BRIE2 doesn't use the Metropolis-Hastings algorithm to approximate the posterior.
So I would like to verify whether sashimi plots can only be generated from BRIE1's results.
Thank you so much!
Hi Yuanhua,
When following the manual, the 'brie' and 'brie-diff' commands give what look to be valid output files. However, when using 'brie-diff' on my own data, all of the Bayesian estimates were identical (2.0e+03), and all of the output values in 'weights.tsv' were 'nan' (I assume 'Not a number').
I have retried all of my installation steps, and re-run brie, but that created no change. 'fractions.tsv' and 'samples.csv' after running 'brie' were distinctly different outputs from cell to cell.
Best,
Gabriel
Hi,
I get an error when installing brie, whether I use 'pip install' or download the brie package and run 'python setup.py install'.
Traceback (most recent call last):
File "setup.py", line 11, in <module>
import brie
File "/home/yangxx/software/brie-v0.1.3/brie/__init__.py", line 8, in <module>
from .utils.bias_utils import FastaFile, BiasFile
File "/home/yangxx/software/brie-v0.1.3/brie/utils/bias_utils.py", line 6, in <module>
import pylab as pl
ImportError: No module named pylab
I've tried to find a 'pylab' package on PyPI, but no package has exactly the name 'pylab'. My Python version is 2.7.12. Do you have any idea about this issue?
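The pylab module is shipped as part of the matplotlib distribution, so installing matplotlib is what provides it. A quick check, written as a sketch for Python 3 (importlib.util.find_spec is not available on Python 2.7), could look like this; the helper name is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None

# pylab is a module inside the matplotlib distribution, so the package
# to install is matplotlib.
missing = [name for name in ("pylab", "matplotlib") if not module_available(name)]
if missing:
    print("Missing modules:", ", ".join(missing), "(try: pip install matplotlib)")
```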
Hi,
I run brie-count on a .bam file and use the provided SE.gold.gtf annotation. Unfortunately, I receive this error message:
What is the reason for it? I used this code in the command line:
brie-count -a $path2gtf/SE.gold.gff3\ -s $path2bam/Aligned.sortedByCoord.out.bam\ -b $path2barcodes/barcodes.tsv.gz \ -o $path2out/brieCount \ -p 15
I would appreciate some feedback!
Cheers,
Friedrich
Hi Yuanhua,
Is it possible to use Ensembl annotation file to generate the factor file? I had tried briekit-factor, however it seems that the chromosome names are not compatible with the phastCons annotation from UCSC.
Thanks,
Gangcai
Hi,
I have a question about the annotation file. I mapped the reads with STAR using the GENCODE reference GTF, and I downloaded the splicing event annotations directly from BRIE (https://sourceforge.net/projects/brie-rna/files/annotation/). Will that have any influence on the results, given that some of the annotations may differ? Thanks.
Hi Yuanhua,
I am using v2.2 to count other types of AS, A3SS for example:
brie-count -a ~/genome/mm10/mouseAS/A3SS.gff3 -o . -S bamfile.list -p 8 -t Any
Did I miss anything or put something wrong?
Many thanks!
Quanyi
Dear Yuanhua,
Thank you for developing BRIE! I have eight samples that were sequenced with 10x Genomics, so I have 8 BAM files and 8 barcodes.tsv.gz files.
In your introduction to BRIE2, you give an example like: brie-count -a AS_events/SE.gold.gtf -s possorted.bam -b barcodes.tsv.gz -o out_dir -p 15. Is there a way to pass all eight BAM files and their eight barcodes.tsv.gz files in one command line, so that I get one h5ad result instead of eight? If not, can you suggest how to integrate the eight h5ad results? My own integration isn't ideal.
I am looking forward to your reply, thank you so much!
Tuo
Using BRIE v2.2.2, I'm getting this error when I use brie-count:
brie-count -p 12 -o ../brie -a ../stringtie2/merged_transcriptome.gtf -S ../data/bam-trimmed/SRR4047245_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047246_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047247_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047248_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047249_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047250_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047251_Aligned.sortedByCoord.out.bam
Traceback (most recent call last):
File "/home/grad17/igonzalez/.local/bin/brie-count", line 33, in
sys.exit(load_entry_point('brie==2.2.2', 'console_scripts', 'brie-count')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/grad17/igonzalez/.local/lib/python3.11/site-packages/brie/bin/count.py", line 303, in main
smartseq_count(options.gff_file, options.samList_file, options.out_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/grad17/igonzalez/.local/lib/python3.11/site-packages/brie/bin/count.py", line 26, in smartseq_count
sam_table = np.loadtxt(samList_file, delimiter = None, dtype=str, ndmin = 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/site-packages/numpy/lib/npyio.py", line 1356, in loadtxt
arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/site-packages/numpy/lib/npyio.py", line 1026, in _read
next_arr = _load_from_filelike(
^^^^^^^^^^^^^^^^^^^^
File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
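The byte 0x8b at position 1 is the second byte of the gzip signature (1f 8b), which BAM files also start with, and the traceback shows the -S argument being read with np.loadtxt as a text table. That suggests -S expects a text file that lists the BAM files rather than the BAM files themselves. The sketch below writes such a list; the two-column layout (path, cell ID) is an assumption to check against the brie-count help, and the helper names are hypothetical.

```python
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"

def looks_gzipped(path):
    """True if the file starts with the gzip magic bytes (BAM files do as well)."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC

def write_sam_list(bam_paths, list_path):
    """Write one 'path<TAB>cell_id' line per BAM file.

    The cell ID is the file name up to the first underscore.
    """
    with open(list_path, "w") as out:
        for path in bam_paths:
            cell_id = os.path.basename(path).split("_")[0]
            out.write(f"{path}\t{cell_id}\n")

# Small self-check with temporary files standing in for real data.
tmp_dir = tempfile.mkdtemp()
fake_bam = os.path.join(tmp_dir, "SRR4047245_Aligned.sortedByCoord.out.bam")
with open(fake_bam, "wb") as handle:
    handle.write(GZIP_MAGIC + b"\x08\x04")  # header bytes only, not a valid BAM
list_file = os.path.join(tmp_dir, "sam_list.tsv")
write_sam_list([fake_bam], list_file)
```

The resulting list file, not the BAM paths, would then be given to -S.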
Dear Authors,
I have sequenced, let's say, 1000 cells. I ran brie2 in two ways: Strategy 1, run brie2-quant on all 1000 cells; Strategy 2, run brie2-quant on cells 1-500 and cells 501-1000 separately.
Afterwards, I compared the PSI values of the same event from Strategy 1 with those from Strategy 2. The average PSI of cells 1-500 and of cells 501-1000 were almost equal in Strategy 1, but obviously different in Strategy 2. I guess that the PSI distributions differ when brie2-quant is run separately on cells 1-500 and 501-1000 (as brie2 jointly models all cells at once). Am I correct?
This kind of difference may drive the separation of cells 1-500 and cells 501-1000 if I use Seurat to cluster the cells. Is that acceptable? What is your suggestion?
Looking forward to your reply and have a nice weekend.
Changchang Cao,
I am following the instructions of the brie manual (https://brie-rna.sourceforge.io/manual.html). At step 4 I run into the following error:
brie-factor -a AS_events/SE.gold.gtf -r GRCm38.p5.genome.fa -c mm10.60way.phastCons.bw -o mouse_features.csv -p 10
loading annotation file... Done.
extracting features for 5227 skipping exon triplets with 10 cores...
Traceback (most recent call last):
File "/myhome /conda-envs/ml_0/bin/brie-factor", line 11, in
sys.exit(main())
File "/myhome /conda-envs/ml_0/lib/python2.7/site-packages/brie/brie_factor.py", line 152, in main
RV = result[g].get()
File "/myhome /conda-envs/ml_0/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
OSError: [Errno 2] No such file or directory
I receive this error both with my own data and with the example data. I set up the bigWigSummary utility as described in the manual, and my multiprocessing package should be fine.
The first error message looks like the program has trouble finding the relative import from .utils.gtf_utils import loadgene.
Could you please help me find the issue?
Thanks
Stephanie
P.S
The code in step 3
brie-event-filter -a AS_events/SE.gff3 -anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa
should be changed to
brie-event-filter -a AS_events/SE.gff3 --anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa
Hi,
I tried brie on my own data, but it reported "Cannot fetch reads in regions: xxx" throughout the run and finished with the error shown below:
File "/home/yangxx/.local/bin/brie", line 9, in <module>
load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
File "build/bdist.linux-x86_64/egg/brie/brie.py", line 256, in main
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
UnboundLocalError: local variable 'reads' referenced before assignment
My aligner is STAR 2.5.2b, and I already sorted the BAM files during alignment and indexed them with samtools. I also tried re-sorting and re-indexing the output BAM files. All of these reported the same error. This is my command line:
brie -a $path/annotation/SE.gold.gtf -f $path/annotation/human_factors.SE.gold.csv -p 10 -s $path/STAR/SRR1033783.Aligned.sortedByCoord.out.bam -o $path/BRIE/SRR1033783
Is there something wrong with my input or parameters?
After running brie, I had three files: fractions.tsv, weights.tsv and samples.csv.gz.
I need to use Seurat for further analysis.
For that I need, at least, an expression matrix. The format could be as follows:
                      AAACATACCATCAG  AAACATACCGATAC  AAACATACGCATCA
ENSMUSG00000111425.1  1               2               0
ENSMUSG00000024442.5  0               1               0
OR
                  Cell_1  Cell_2  Cell_3
Gene_1_isoform_1  1       2       0
Gene_1_isoform_2  0       1       0
Gene_2_isoform_1  1       1       0
Does anyone know how to export a file like this or how to combine BRIE and Seurat?
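A matrix of this shape can be assembled from per-cell values with a few lines of Python. The sketch below is not a BRIE feature: it assumes one brie output folder per cell (a hypothetical layout), uses made-up Psi values in place of values read from each cell's fractions.tsv, and fills events missing in a cell with "NA".

```python
import csv
import io

def build_matrix(per_cell, value_missing="NA"):
    """Arrange {cell: {event: value}} into a header row plus one row per event."""
    cells = sorted(per_cell)
    events = sorted({event for values in per_cell.values() for event in values})
    rows = [[""] + cells]
    for event in events:
        rows.append([event] + [per_cell[cell].get(event, value_missing) for cell in cells])
    return rows

# Made-up Psi values; in practice each inner dict would be read from one
# cell's fractions.tsv.
per_cell = {
    "Cell_1": {"Gene_1.in": 0.9, "Gene_2.in": 0.1},
    "Cell_2": {"Gene_1.in": 0.4},
}
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(build_matrix(per_cell))
table = buffer.getvalue()
```

Writing `table` to a .tsv file gives a tab-separated events-by-cells table that can be loaded in R; whether Psi values are a suitable input for a given Seurat workflow is a separate question.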
Hi all,
When I count MXE events and run brie-quant on them, I get a file that lists the chromosome coordinates (format: chr6:10749202:10749268:+@chr6:10749622:10749698:+@chr6:10755142:10755232:+@chr6:10770077:10770204:+ 24905) along with p-values etc.
But I don't have the gene_id.
How do I get the gene_id?
thanks!
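One workaround is to parse the coordinates out of the event ID and look for genes that span them. The sketch below is not part of BRIE; the gene table is hypothetical and would normally be built from the 'gene' lines of the GTF used for alignment.

```python
def parse_event_id(event_id):
    """Split 'chr:start:end:strand@...' into a list of (chrom, start, end, strand)."""
    exons = []
    for part in event_id.split("@"):
        chrom, start, end, strand = part.split(":")
        exons.append((chrom, int(start), int(end), strand))
    return exons

def genes_overlapping(exons, genes):
    """Return IDs of genes on the same chromosome and strand that span the event."""
    chrom, strand = exons[0][0], exons[0][3]
    lo = min(exon[1] for exon in exons)
    hi = max(exon[2] for exon in exons)
    return [gene_id for gene_id, (g_chrom, g_start, g_end, g_strand) in genes.items()
            if g_chrom == chrom and g_strand == strand and g_start <= lo and hi <= g_end]

event = ("chr6:10749202:10749268:+@chr6:10749622:10749698:+"
         "@chr6:10755142:10755232:+@chr6:10770077:10770204:+")
exons = parse_event_id(event)
# Hypothetical gene table; real coordinates would come from a GTF file.
genes = {"GENE_A": ("chr6", 10700000, 10800000, "+"),
         "GENE_B": ("chr6", 10700000, 10800000, "-")}
hits = genes_overlapping(exons, genes)
```

More than one gene may span the same event in dense regions, so the result is a list rather than a single ID.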
"Naturally, protocols such as CEL-seq or STRT-seq that bias reads towards the ends of the transcript cannot provide information about exon-skipping events that may be very far from the ends of a transcript."
I saw this in the BRIE article, and I am not sure whether BRIE2 can be used on 10x Genomics scRNA-seq data, which is also poly(A)-tailed mRNA data.
I was wondering if there's any way to include cell technical variation (e.g., # of genes detected, percent mitochondrial, or read depth). These often have considerable impact on gene expression estimation but are not really biological.
Dear Yuanhua,
On the example you gave on https://brie.readthedocs.io/en/latest/quick_start.html under "Pre-Step: Read Counting" for 10x genomics data, I was unable to execute brie-count or brie-quant, as it kept giving the error that the command "brie-count" and "brie-quant" did not exist. Would you happen to know why it appears as un-executable when I run the command, or what I could do to remediate this error?
Thank you!
Hello,
I was wondering if there is any plan to add resource usage (i.e. number of threads) as an option to brie-quant. Right now the code has maximum resources hardcoded:
Lines 207 to 211 in 8f83c56
This is an issue for people using shared computer clusters. I didn't have time to test this, but would changing this line of code lead to fewer resources being used?
Thank you in advance for the reply
Hi,
Thank you for all the work and effort that went into BRIE/BRIE2. Very nice tool.
I am processing an SS2 dataset with brie-count as follows:
brie-count -a SE.gold.gtf -S samples.tsv -o test -p 3
This runs for a while with samtools-related warnings that look like this:
[E::sam_hrecs_error] Malformed key:value pair at line 69: "@RG ID:CM387
Cannot fetch reads in region: chr16:92613599-92668812
and ultimately fails with:
Cannot fetch reads in region: chrX:167304929-167330099
[BRIE2] [====================] 100.0% cells done in 6.6 sec.
[BRIE2] save matrix into h5ad ...
Traceback (most recent call last):
File "/Users/ctl/anaconda3/envs/scbrie2/bin/brie-count", line 8, in <module>
sys.exit(main())
File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 206, in main
count(options.gff_file, options.samList_file, options.out_dir,
File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 156, in count
adata = convert_to_annData(Rmat_dict=Rmat_dict,
File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/utils/io_utils.py", line 20, in convert_to_annData
X = Rmat['1'] + Rmat['2'] + Rmat['3']
KeyError: '1'
It looks like the anndata object can't be made because there are problems with the BAM files, but the BAM files seem fine.
Any help with this would be much appreciated.
Dear Yuanhua,
This is more a question than an issue. I noticed that the counts in the fractions.tsv file can take non-integer values, because the counts are the mean of Cnt_all.
Cnt_all is updated in your MH_propose function together with the Y and Psi values.
Could you please briefly describe the connection between the actual counts in the sequencing data and your Cnt_all? Unfortunately I am not able to figure it out by looking at your code or paper.
Many thanks,
Stephanie
Hi,
I am trying to use BRIE with a custom annotation file and with the GENCODE M10 annotation file. In both cases, running brie-event as follows:
brie-event -a gencode.vM10.annotation.gtf
returns errors and does not succeed in producing the SE files. Note that my custom annotation has the same format as the GENCODE file, and neither file causes problems when used with other software tools. The format is:
GENCODE:
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; (...)
Custom:
chr1 mm10_index exon 4777525 4777648 100 - . transcript_id "PB.1.1"; gene_id "Mrpl15"; (...)
In the case of my custom annotation, the error is the following:
File "/home/angeles/anaconda3/bin/brie-event", line 11, in <module>
sys.exit(main())
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 166, in main
sanitize=options.sanitize)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 59, in defineAllSplicing
DtoA_F, AtoD_F, DtoA_R, AtoD_R = prepareSplicegraph(anno_file, ftype)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 38, in prepareSplicegraph
DtoA_F, AtoD_F, DtoA_R, AtoD_R)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 80, in populateSplicegraph
data = readTable_gff(table_f, ftype)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 60, in readTable_gff
exons.append([str(int(vals[3])-1), vals[4]])
UnboundLocalError: local variable 'exons' referenced before assignment
In the case of the GENCODE file, the error is slightly different, although it seems to involve the same two Python scripts, event_maker.py and parseTables.py:
File "/home/angeles/anaconda3/bin/brie-event", line 11, in <module>
sys.exit(main())
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 166, in main
sanitize=options.sanitize)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 59, in defineAllSplicing
DtoA_F, AtoD_F, DtoA_R, AtoD_R = prepareSplicegraph(anno_file, ftype)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 38, in prepareSplicegraph
DtoA_F, AtoD_F, DtoA_R, AtoD_R)
File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 96, in populateSplicegraph
for i in range(len(startvals) - 1):
TypeError: object of type 'map' has no len()
It seems to me that BRIE cannot get the right information from the .gtf files, and the variables are therefore empty when referenced downstream, which then causes the error... but I wonder why that is happening?
Thanks in advance,
Ángeles
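The second traceback is consistent with code written for Python 2 being run under Python 3, where map() returns a lazy iterator instead of a list. The sketch below reproduces the message and shows the usual fix; the variable name startvals is taken from the traceback, while the coordinate values are made up.

```python
# In Python 3, map() returns a lazy iterator, so len() and indexing fail on it.
startvals = map(int, "3073253,3074322,3075000".split(","))
try:
    len(startvals)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Wrapping the call in list() restores the Python 2 behaviour.
startvals = list(map(int, "3073253,3074322,3075000".split(",")))
pairs = [(startvals[i], startvals[i + 1]) for i in range(len(startvals) - 1)]
```

The first traceback (UnboundLocalError on 'exons') is a different symptom: a variable that is only assigned in one branch is used before that branch has run, which can happen when the input lacks the record type the parser expects to see first.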
The following error occurs when executing brie-event-filter using the Ensembl GTF file from ftp://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/Mus_musculus.GRCm38.82.chr.gtf.gz:
$ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=GRCm38.p5.genome.fa
[fai_load] build FASTA index.
9908 Skipped Exon events are input for quality check.
0 Skipped Exon events pass the qulity control.
Traceback (most recent call last):
File "venv/bin/brie-event-filter", line 9, in <module>
load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 361, in main
g_idx, g_chr, g_start, g_stop = get_gene_idx(anno_out)
File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 39, in get_gene_idx
g_idx.append([now_g, last_g])
UnboundLocalError: local variable 'last_g' referenced before assignment
The following command was used to generate SE.gff3:
$ brie-event -a Mus_musculus.GRCm38.82.chr.gtf -o AS_events
Making GFF alternative events annotation...
- Input annotation files: Mus_musculus.GRCm38.82.chr.gtf
- Output dir: AS_events
('Reading table', 'Mus_musculus.GRCm38.82.chr.gtf')
Generating skipped exons (SE)
Generating retained introns (RI)
Generating mutually exclusive exons (MXE)
Generating alternative 3' splice sites (A3SS)
Generating alternative 5' splice sites (A5SS)
Took 0.69 minutes to make the annotation.
This error does not occur if the GENCODE GTF file from ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz is used to generate SE.gff3 and SE.gold.gff3 instead.
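The log line reporting that 0 events passed suggests that the failing function received an empty list. In Python, a variable that is only bound inside a loop stays unbound when the loop body never runs, which produces exactly this kind of UnboundLocalError. The sketch below shows the pattern with hypothetical function names; it is not the code in event_filter.py.

```python
def last_item_unsafe(items):
    """Fails when `items` is empty: `last` is only bound inside the loop."""
    for item in items:
        last = item
    return last

def last_item_safe(items):
    """Binds a default first, so an empty input gives None instead of an error."""
    last = None
    for item in items:
        last = item
    return last

try:
    last_item_unsafe([])
    failure = None
except UnboundLocalError as err:
    failure = type(err).__name__
```

The underlying question is then why no event passed the check with the Ensembl GTF; the traceback only shows the consequence.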
Hi all,
I'm an undergraduate research assistant who was asked to figure out how to install and use BRIE and BRIE-kit for my research lab, but as I am in no way a trained statistician, I have no clue where to start interpreting the brie-diff output.
If unclear, I'm referring to the ~.diff.tsv file and ~.diff.rank.tsv file that are both created after running the program. My data, for example, looks like this:
What can I do to visualize and understand these results?
Edit - I can access each column individually or in comparison in a program like RStudio, but without an understanding of statistics behind BRIE, I cannot use the data
Hi,
I did not see any examples of analyzing other splicing types (A5SS, A3SS, MXE, RI) with BRIE. I used brie-event-filter on the gff3 files of the other splicing types, but I cannot get a gold.gtf or gold.gff3 file.
Hi Yuanhua,
I want to perform differential splicing analysis between different groups of single cells. Could you provide an example for groups, as I did not find one in the manual?
Thanks
Hi Yuanhua,
my code is as follows:
brie-count -a genes.gff3 -S cell.bam -b barcodes.tsv.gz -o test2/ -p 15 -t any
This process ran for 15 hours; however, the result files don't contain the important .h5ad file.
Did I miss anything? Thanks.
Dearest Yuanhua,
Running brie with a specific bam file generates this error. Other bam files that I have tried work perfectly fine.
Traceback (most recent call last):
File "/home/user/anaconda3/envs/brie-env/bin/brie", line 11, in <module>
sys.exit(main())
File "/home/user/anaconda3/envs/brie-env/lib/python2.7/site-packages/brie/brie.py", line 268, in main
M=M, Mmin=initial, gap=gap, nproc=nproc)
File "/home/user/anaconda3/envs/brie-env/lib/python2.7/site-packages/brie/models/model_brie.py", line 451, in brie_MH_Heuristic
FPKM_all = Cnt_all / tranLen.reshape(-1, 1) / total_count * 10**9
RuntimeWarning: invalid value encountered in divide
I believe that this is some sort of divide by zero error that prevents the rest of the analysis from running. No output files are created.
I have yet to fully investigate your code myself, but would there be any conditions that could cause one of the divisors to be zero?
Regards,
Trevor
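In NumPy, "invalid value encountered in divide" is the warning issued for 0/0 (a non-zero value divided by zero gives a different "divide by zero" warning), so the message would be consistent with total_count being zero while the counts are zero as well, for example when no reads in this BAM file overlap the annotated events. That is a guess from the traceback, not a confirmed diagnosis. A guarded version of the same formula, written in plain Python with a hypothetical function name, could look like this:

```python
def fpkm(counts, lengths, total_count):
    """FPKM per transcript; returns zeros when no reads were counted at all."""
    if total_count <= 0:
        return [0.0 for _ in counts]
    return [count / length / total_count * 10**9 if length > 0 else 0.0
            for count, length in zip(counts, lengths)]

# Made-up numbers: two transcripts of 1000 and 2000 bases with 40 reads in total.
with_reads = fpkm([10, 30], [1000, 2000], 40)
without_reads = fpkm([0, 0], [1000, 2000], 0)
```

Checking whether the problematic BAM file yields any reads for the annotated regions would be a quick way to confirm or rule out this explanation.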