huangyh09 / brie

BRIE: Bayesian Regression for Isoform Estimate in Single Cells

License: Apache License 2.0

Python 99.18% Shell 0.82%

alternative-splicing isoform-quantification single-cell rna-seq variantional-inference differential-splicing differential-momentum-gene

brie's Issues

confused about Psi_95CI

Dear author,

I think the Psi 95% CI should be an interval, but only a single value is provided in the result. Is Psi_95CI the upper bound of the 95% CI?
You said "if a splicing event in a gene doesn’t have any read, it will return a posterior with Psi’s mean=0.5 and 95% confidence interval around 0.95 (most cases >0.9)". I think a PSI of 0.5 means the exon is excluded half of the time, which is confusing. Would it be better to assign a PSI of 1 when there are no reads?
Why does Psi_95CI > 0.3 keep the confident Psi?

Thanks a lot, I hope to hear from you.
quanlong Jiang
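
One possible reading of a single-valued Psi_95CI is that it reports the width of the interval rather than one of its bounds; a width near 0.95 for an event without reads would be consistent with the quoted sentence. The sketch below is not BRIE's code. It uses made-up posterior samples and a hypothetical helper, `interval_width`, only to show how an interval collapses to one number and why a poorly constrained event gets a wide one. The uniform and Beta distributions are stand-ins chosen for illustration.

```python
import random

def interval_width(samples, level=0.95):
    """Width of the central interval that covers `level` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return upper - lower

random.seed(0)
# No reads: a broad posterior, modelled here as uniform on [0, 1] (assumption).
no_reads = [random.random() for _ in range(10000)]
# Many reads: a posterior concentrated around PSI = 0.8 (assumption).
many_reads = [random.betavariate(80, 20) for _ in range(10000)]

width_no_reads = interval_width(no_reads)
width_many_reads = interval_width(many_reads)
```

Under this reading, a small width marks a confident estimate and a large width marks an estimate driven mostly by the prior.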

How to get the specific sites where differential splicing events occurs

Hi, thank you for developing BRIE!
I've processed all my data with Brie2, but I'm wondering how to look at the specific sites where differential splicing occurs, such as which exon is affected or which intron is spliced out.
I noticed that in my brie_count.h5ad files I can get transcript ids such as ENSMUSG00000033543.AS2, but I could not connect them with the annotation (SE.lenient.gff3.gz). I wonder what the relationship is between ENSMUSG.....AS2 and ENSMUSG.......in/out, so that I can figure out the specific splice sites.
Thank you so much !

ELBO_gain and effect size

Dear Authors,

Thank you for your wonderful work.
I am struggling to understand the practical meaning of ELBO_gain and effect size in the volcano plot of ELBO_gain against effect size on logit(PSI), which is used to detect differential splicing between EAE and control cells with BRIE2.
If you could explain what they mean in simple terms, that would be great.

Many thanks.
Luke

brie seems to be incompatible with python3

Hello Yuanhua,

I built brie from source using python3 and I ended up with the error
TypeError: object of type 'map' has no len()
when trying to run the brie-event command line. I built brie again with python2 and all went fine. Are you sure brie is compatible with python3? If not, maybe you should mention it in the README and on the website (or fix it, but that may be complicated).

Also, there is a typo in the brie website at https://brie-rna.sourceforge.io/manual.html
the command line example
brie-event-filter -a AS_events/SE.gff3 -anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa
is incorrect and does not run; it should be replaced by:
brie-event-filter -a AS_events/SE.gff3 --anno_ref=gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa

Best,
Milan
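
The TypeError above is the usual symptom of Python 2 code that calls `len()` on the result of `map()`: in Python 3, `map()` returns a lazy iterator instead of a list. The sketch below reproduces the message and shows the common fix of materialising the values. It does not come from the BRIE sources; `exon_lengths` and its inputs are hypothetical.

```python
def exon_lengths(starts, stops):
    """Return the TypeError message raised by len(map(...)) and the exon lengths."""
    # In Python 3, map() returns a lazy iterator, so len() fails on it.
    lazy = map(int, starts)
    try:
        len(lazy)
    except TypeError as err:
        message = str(err)
    # Materialising the values as lists restores the Python 2 behaviour.
    start_vals = list(map(int, starts))
    stop_vals = [int(s) for s in stops]
    lengths = [stop_vals[i] - start_vals[i] for i in range(len(start_vals))]
    return message, lengths

message, lengths = exon_lengths(["100", "300"], ["150", "420"])
```

Wrapping each `map(...)` call in `list(...)`, or replacing it with a list comprehension, is typically enough to make such code run on both Python versions.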

ModuleNotFoundError: No module named 'utils'

I had an error when I test brie.

How can I install and use BRIE?

python 3.6.0 (docker/python:3.6.0)

$ brie
Traceback (most recent call last):
  File "/usr/local/bin/brie", line 11, in <module>
    load_entry_point('brie==0.1.1', 'console_scripts', 'brie')()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 565, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2291, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2297, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/brie/brie.py", line 13, in <module>
    from utils.gtf_utils import loadgene
ModuleNotFoundError: No module named 'utils'

UnboundLocalError: local variable 'adata' referenced before assignment

Hello BRIE2,
Thanks for this amazing package!
I'm using the loom file as input, but it generates errors as below. Also tried on py3.7, doesn't work.
Environment: Python 3.8, Ubuntu 20.04 LTS
Could you please help me with this issue?
Thanks!
Best,
YJ

(base) hyjforesight@W10D-GW97ZC3:~$ brie-quant -i /home/hyjforesight/loom/cellsorted_WT_IEC_G123B.loom -o /home/hyjforesight/brie2/brie_quant_cell.h5ad -c /home/hyjforesight/WT_IEC_barcodes.tsv --interceptMode=gene --LRTindex=All --layers spliced,unspliced
2021-11-30 18:49:53.274751: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-30 18:49:53.275065: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "/home/hyjforesight/miniconda3/bin/brie-quant", line 8, in <module>
sys.exit(main())
File "/home/hyjforesight/miniconda3/lib/python3.8/site-packages/brie/bin/quant.py", line 213, in main
quant(options.in_file, options.cell_file, options.gene_file,
File "/home/hyjforesight/miniconda3/lib/python3.8/site-packages/brie/bin/quant.py", line 53, in quant
_idx = brie.match(adata.obs.index, dat_tmp[1:, 0]).astype(float)
UnboundLocalError: local variable 'adata' referenced before assignment

BRIE does not support use of the Ensembl toplevel genome sequence

I get the following error when executing brie-event-filter using the Ensembl toplevel genome sequence (ftp://ftp.ensembl.org/pub/release-82/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz) as the reference genome sequence:

$ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=Mus_musculus.GRCm38.dna.toplevel.fa
9908 Skipped Exon events are input for quality check.
Traceback (most recent call last):
  File "venv/bin/brie-event-filter", line 9, in <module>
    load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 369, in main
    no_splice_site)
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 118, in as_exon_check
    up_ss3 = fastaFile.get_seq(chrom, _exon_loc[1]+1, _exon_loc[1]+2)
  File "venv/local/lib/python2.7/site-packages/brie/utils/fasta_utils.py", line 19, in get_seq
    return self.f.fetch(qref, start-1, stop)
  File "pysam/libcfaidx.pyx", line 278, in pysam.libcfaidx.FastaFile.fetch (pysam/libcfaidx.c:5011)
KeyError: "sequence 'chrX' not present"

This error does not occur when I use the GENCODE genome sequence (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/GRCm38.p5.genome.fa.gz). I believe it is likely caused by the different naming of the chromosomes in each file.
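
A quick way to confirm a naming mismatch like this is to compare the sequence names in the FASTA headers with the first column of the event annotation. The sketch below is a standalone check on in-memory lines, not part of BRIE; the helper names and the toy records are assumptions.

```python
def fasta_names(fasta_lines):
    """Sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}

def gff_names(gff_lines):
    """First column of every non-empty, non-comment GFF/GTF line."""
    return {line.split("\t")[0] for line in gff_lines
            if line.strip() and not line.startswith("#")}

def missing_in_fasta(fasta_lines, gff_lines):
    """Annotation chromosome names that have no matching FASTA sequence."""
    return sorted(gff_names(gff_lines) - fasta_names(fasta_lines))

# Toy data: Ensembl-style FASTA names versus GENCODE-style annotation names.
fasta = [">1 dna:chromosome", "ACGT", ">X dna:chromosome", "ACGT"]
events = [
    "##gff-version 3",
    "chrX\tSE\tgene\t100\t900\t.\t+\t.\tID=ev1",
    "chr1\tSE\tgene\t5\t50\t.\t-\t.\tID=ev2",
]
missing = missing_in_fasta(fasta, events)
```

If every annotation name turns up in `missing`, the two files use different naming schemes and one of them has to be renamed or replaced.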

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Hi, I set up the environment as you described. And I tested with the example data and got the error.

Brie-count error with paired end smart-seq2 data

Hi Yuanhua,
Thanks for developing this amazing tool! I'm interested in detecting AS events in my paired-end smart-seq2 dataset and when I was running the brie-count function:

brie-count -S cell_table_smartseq.tsv -a mouse_SE.lenient_50events.gff3
-o outs_smartseq -p 10 #--verbose

An error occurred:
[BRIE2] loading gene annotations ... Done.
[BRIE2] counting reads for 50 genes in 1 sam files with 10 cores...
[BRIE2] [====================] 100.0% cells done in 0.1 sec.
[BRIE2] 50 genes have been processed.
[BRIE2] saving matrix into h5ad ... Traceback (most recent call last):
File "/home/guanao/.local/bin/brie-count", line 33, in <module>
sys.exit(load_entry_point('brie==2.2.2', 'console_scripts', 'brie-count')())
File "/home/guanao/.local/lib/python3.6/site-packages/brie/bin/count.py", line 304, in main
options.nproc, options.event_type, options.verbose)
File "/home/guanao/.local/lib/python3.6/site-packages/brie/bin/count.py", line 123, in smartseq_count
gene_note=np.array(gene_table, dtype='str'))
File "/home/guanao/.local/lib/python3.6/site-packages/brie/utils/io_utils.py", line 23, in convert_to_annData
_shape = Rmat[_input_keys[0]].shape
IndexError: list index out of range

I found that the example dataset (msEAE) uses single-end smart-seq data. When I preprocessed my data in the same way but treated the paired-end smart-seq2 data as single-end, brie-count ran successfully.

It seems that brie-count works well for single-end smart-seq2 data but not paired-end smart-seq2 data. Any suggestions would be appreciated. Thank you so much!

some questions about psi distribution

Hello! Thank you for developing Brie!
I am interested in alternative splicing events, so I applied BRIE2 to my research.
I have eight 10x Genomics samples belonging to four groups. I ran brie-count eight times, once per sample, and got eight brie_count.h5ad files. I then merged them into one brie_count.h5ad file to run brie-quant (mode2-quant). Am I handling the data the right way?
After that, I got a large h5ad file with 93357 cells and 4266 splicing events. I made four density plots of the PSI distributions with ggplot2 and found that they all look the same (most cells have PSI values close to 0, while the rest are close to 1). I am not sure this is right. Have you seen a PSI distribution like this before? I look forward to your advice. Thank you so much!
Tuo

p values outputfile

Hi @huangyh09,

thanks for this amazing tool!

In the output file (clusters.tsv), I can see two columns of p-values. One is called p_value and the other condition_pval.
What is the difference? And are the p-values adjusted?

Thanks!

Brie-count Multiprocessing using only 1 core

Hello,

When I run brie-count, several processes are spawned but only one is actually using the CPU (see screenshot of top output). If I use the parameter -p or --nproc, brie-count changes its message [BRIE2] counting reads for 28303 genes in 1 sam files with 8 cores... accordingly, and the correct number of processes is spawned, but only one is actually doing something. I'm trying to process smartseq data, so the problematic function is probably get_smartseq_matrix. I've tried the multiprocessing package with a stress test and it worked correctly (i.e. all spawned processes were using 100% CPU). Any help would be appreciated.

the errors about the brie-count

I have .fa, .gtf, .bam and barcodes.tsv files, and I have run briekit-event and briekit-event-filter to obtain SE.filter.gtf.
Now when I run brie-count I can't understand the errors. Could you give me a hand?

brie-count -a AS_events/SE.lenient.gtf -s cellsorted_possorted_genome.bam -b barcode.tsv.gz -o test -p 15

thanks!

Brie isoform expression quantification error

Hi Yuanhua,

thanks for providing the tool for splicing analysis. I got an error when running brie for isoform expression.

brie -a AS_events/SE.gold.gtf -s sample.bam -f AS_events/human_features_v19.csv.gz -o AS_events/ -p 50
[Brie] loading annotation file... Done.
[Brie] loading reads for 12504 genes with 50 cores...
[Brie] [====================] 100.0% done in 58.5 sec.
[Brie] running Brie for 25008 isoforms on 12504 genes with 50 cores...
Traceback (most recent call last):
File "/opt/modules/i12g/anaconda/3-4.1.1/envs/python27/bin/brie", line 11, in <module>
load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
File "build/bdist.linux-x86_64/egg/brie/brie.py", line 268, in main
File "build/bdist.linux-x86_64/egg/brie/models/model_brie.py", line 300, in brie_MH_Heuristic
File "build/bdist.linux-x86_64/egg/brie/models/model_brie.py", line 159, in Iso_read_check
IndexError: index 0 is out of bounds for axis 0 with size 0

It might be because my BAM file uses Ensembl chromosome names (1, 2, 3...) while the annotation SE.gold.gtf is from GENCODE, following your instructions.

I tried to run brie-factor with the Ensembl annotation, but it gave warnings like "No PhastCons data for ENSG...in. Treated as Zero". As the PhastCons file is from UCSC, I assume the problem is again the chromosome names.

Besides, the same brie error for isoform expression persists if I use SE.gold.gtf generated from Ensembl and human_features generated from GENCODE (as I can't generate human_features from Ensembl, see above).

Is there a way to get around this?

Best wishes,
Jun

get the PSI for every alternative splicing event in every cell

I want to know, can I get the PSI for every alternative splicing event in every cell ? Because I wanted to explore the correlation between splicing factors (SF) and splicing events through the expression levels of splicing factors in each cell and the PSI of splicing events in each cell.
How can I get the PSI for every alternative splicing event in every cell ? from which destination file ?
Thanks for your help.

Mode 1 and gene features

Thank you for developing Brie. You gave multiple examples on https://brie.readthedocs.io/. All of these examples were based on mode 2 (cell features or aggregation). However, when it comes to mode 1, a file on gene features is needed (-g gene-feature-file). I don't find any related examples or descriptions in the current document. Could you please tell me how to generate the gene-feature-file for mode 1? What is the format of that file?

FastaFile.rev_seq fail to complement for lower case input

Hi Yuanhua,

there is a potential bug in FastaFile.rev_seq() function:

rev_seq("atgc")
>>> 'cgta'

Some parts of the reference FASTA file are in lower case (repetitive regions), in which case the sequence is reversed but not complemented.
A possibly better option would be

def rev_seq(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', "N": "N"}
    return "".join([complement[base] for base in seq.upper()[::-1]])

Best,
Jun
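
An alternative to upper-casing the input is to complement both cases, so that soft-masked (lower-case) regions stay lower-case in the output. This is only a sketch of that idea with a translation table, not the fix adopted in BRIE; the handling of characters other than A, C, G, T and N (left unchanged) is an assumption.

```python
# Translation table covering both cases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def rev_seq(seq):
    """Reverse-complement a DNA string, preserving the case of each base."""
    return seq.translate(_COMPLEMENT)[::-1]

lower = rev_seq("atgc")
mixed = rev_seq("ATGc")
```

Whether preserving case or upper-casing is preferable depends on how the downstream code compares splice-site motifs.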

What's the difference between mouse_features.filtered.csv.gz and mouse_features.lenient.csv.gz, and which one to use?

Hi Yuanhua,
For the parameter "-f" in brie, which feature file should I use, the lenient or the filtered one? Is the "-f" parameter optional or required?

Thanks,
Gangcai

Brie-quant input

Hi!

Hope you can help me.

In your documentation (https://brie.readthedocs.io/en/latest/brie_count.html) you mention that "brie-quant is more generic and can be applied if the counting has been done, either by brie-count or other tools." I was wondering how can I format the h5ad output from a different tool to be able to use brie-quant.

The reason to use a different tool than brie-count is that I want to use 10x data.

Cheers!

Error in map_data function

Hi,

I have been trying to use BRIE for a couple of days and running into errors when trying to quantify splicing rates following the manual. In particular I was getting an error in the MH_heuristic function because idxF was empty. I have tracked down this to the map_data function:

"elif ids[idx1[i]] == tran_ids[idx2[j]]:"

this basically never happens because of the suffixes you add to the ids. I have modified my version to remove the suffixes but I am not sure this is how it should be done. Anyway, have a look please and see if you can fix it.

Thanks!

filter out low coverage event

Hi, Yuanhua,

I'm using BRIE to calculate Psi for each event and then detect differential AS between two groups of cells with ΔPsi > 0.05 and P < 0.01. However, many of the significant AS events I found are very lowly expressed, even though their Psi values differ strongly between the two groups (~0.01 vs ~0.9). The Psi values are extracted from the fractions.tsv output by brie, and I noticed that for many events the counts are zero while Psi is very high or very low, and Psi_low and Psi_high are far apart (the range is sometimes nearly 0 to 1). Why are their Psi values so different? And how can I filter out such low-coverage AS events? Is it OK to filter on the counts value from fractions.tsv alone?

Thanks,
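
One way to approach the filtering question is to require both a minimum read count and a narrow Psi interval before an event enters the comparison. The sketch below works on an in-memory table. The column names (`tran_id`, `counts`, `Psi_low`, `Psi_high`) follow the fields mentioned in this issue but are assumptions about the exact header, and the thresholds are arbitrary placeholders rather than recommended values.

```python
import csv
import io

def confident_events(tsv_text, min_counts=10.0, max_width=0.3):
    """Return ids of rows with enough reads and a narrow Psi interval."""
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        width = float(row["Psi_high"]) - float(row["Psi_low"])
        if float(row["counts"]) >= min_counts and width <= max_width:
            kept.append(row["tran_id"])
    return kept

# Toy table: ev1 has no reads, ev3 has reads but a wide interval.
example = (
    "tran_id\tcounts\tPsi\tPsi_low\tPsi_high\n"
    "ev1.in\t0.0\t0.91\t0.02\t0.99\n"
    "ev2.in\t35.0\t0.80\t0.70\t0.88\n"
    "ev3.in\t12.0\t0.40\t0.10\t0.75\n"
)
kept = confident_events(example)
```

Filtering on the interval width in addition to the counts also removes events whose estimate is dominated by the prior rather than by reads.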

visualize sashimi plot with brie2

Hello! Thank you for developing Brie!
Recently I tried to draw sashimi plots from Brie2's h5ad files.
However, I noticed that I had to use Brie1's results (samples.csv.gz) to draw the sashimi plot with MISO's sashimi_plot command.
samples.csv.gz contains the MCMC samples of the posterior distribution of Psi, whereas the h5ad files don't contain this information, given that Brie2 doesn't use the Metropolis-Hastings algorithm to approximate the posterior.
So I would like to verify whether sashimi plots can only be generated from Brie1's results.
Thank you so much!

'nan' output in weights.tsv

Hi Yuanhua,

When following the manual, the 'brie' and 'brie-diff' command gives what looks to be valid output files. However, when using 'brie-diff' on my own data, all of the bayesian estimates were identical (2.0e+03), and all of the output values in 'weights.tsv' were 'nan' (I assume 'Not a number').

I have retried all of my installation steps, and re-run brie, but that created no change. 'fractions.tsv' and 'samples.csv' after running 'brie' were distinctly different outputs from cell to cell.

Best,
Gabriel

Installation failed

Hi,

I get an error when installing brie, whether using 'pip install' or downloading the brie package and running 'python setup.py install'.

Traceback (most recent call last):
  File "setup.py", line 11, in <module>
    import brie
  File "/home/yangxx/software/brie-v0.1.3/brie/__init__.py", line 8, in <module>
    from .utils.bias_utils import FastaFile, BiasFile
  File "/home/yangxx/software/brie-v0.1.3/brie/utils/bias_utils.py", line 6, in <module>
    import pylab as pl
ImportError: No module named pylab

I tried to find the 'pylab' package on PyPI, but no package has exactly the name 'pylab'. My Python version is 2.7.12. Do you have any idea about this issue?

Error when running brie-count: IndexError: list index out of range

Hi,

I run brie-count on a .bam file and use the provided SE.gold.gtf annotation. Unfortunately, I receive this error message:

What is the reason for it? I used this code in the command line:

brie-count -a $path2gtf/SE.gold.gff3\ -s $path2bam/Aligned.sortedByCoord.out.bam\ -b $path2barcodes/barcodes.tsv.gz \ -o $path2out/brieCount \ -p 15

I would appreciate some feedback!

Cheers,
Friedrich

How to generate feature files using Ensembl annotation file?

Hi Yuanhua,
Is it possible to use Ensembl annotation file to generate the factor file? I had tried briekit-factor, however it seems that the chromosome names are not compatible with the phastCons annotation from UCSC.

Thanks,
Gangcai

splicing events annotations

Hi,
I have a question about the annotation file. I mapped the reads with STAR using the GENCODE reference GTF, and I downloaded the splicing event annotations directly from BRIE (https://sourceforge.net/projects/brie-rna/files/annotation/). Will that have any influence on the results? Could some of the annotations differ? Thanks.

Other AS types counting, no output

Hi Yuanhua,

I am using v2.2 to count other types of AS, A3SS for example:

brie-count -a ~/genome/mm10/mouseAS/A3SS.gff3 -o . -S bamfile.list -p 8 -t Any

Even with the -t Any option, this command still gives error messages and hangs seemingly forever:

[BRIE2] example head cells:
[['/..../116434_42_3.bam'
'116434_42_3']
['/..../116434_61_39.bam'
'116434_61_39']
['/..../116434_8_46.bam'
'116434_8_46']] ...
[BRIE2] loading gene annotations ... Done.
[BRIE2] counting reads for 6104 genes in 3140 sam files with 8 cores...
This is not exon-skipping event!
None
This is not exon-skipping event!
None
This is not exon-skipping event!
None

Did I miss anything or put something wrong?

Many thanks!

Quanyi

multiple 10x samples integration to run brie-count

Dear Yuanhua,
Thank you for developing BRIE! I have eight samples sequenced with 10x Genomics, so I have 8 BAM files and 8 barcodes.tsv.gz files.
In your introduction to BRIE2, you give an example like: brie-count -a AS_events/SE.gold.gtf -s possorted.bam -b barcodes.tsv.gz -o out_dir -p 15. Is there a way to input the eight BAM files and their eight barcodes.tsv.gz files in one command line, so that I get one h5ad file instead of eight? If not, can you suggest how to integrate the eight h5ad results? My own integration isn't ideal.
I look forward to your reply, thank you so much!

Tuo

UnicodeDecodeError

Using BRIE v2.2.2, I'm getting this error when I use brie-count:

brie-count -p 12 -o ../brie -a ../stringtie2/merged_transcriptome.gtf -S ../data/bam-trimmed/SRR4047245_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047246_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047247_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047248_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047249_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047250_Aligned.sortedByCoord.out.bam ../data/bam-trimmed/SRR4047251_Aligned.sortedByCoord.out.bam

Traceback (most recent call last):
File "/home/grad17/igonzalez/.local/bin/brie-count", line 33, in <module>
sys.exit(load_entry_point('brie==2.2.2', 'console_scripts', 'brie-count')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/grad17/igonzalez/.local/lib/python3.11/site-packages/brie/bin/count.py", line 303, in main
smartseq_count(options.gff_file, options.samList_file, options.out_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/grad17/igonzalez/.local/lib/python3.11/site-packages/brie/bin/count.py", line 26, in smartseq_count
sam_table = np.loadtxt(samList_file, delimiter = None, dtype=str, ndmin = 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/site-packages/numpy/lib/npyio.py", line 1356, in loadtxt
arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/site-packages/numpy/lib/npyio.py", line 1026, in _read
next_arr = _load_from_filelike(
^^^^^^^^^^^^^^^^^^^^
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
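
The byte 0x8b in position 1 is the second byte of the gzip magic number (0x1f 0x8b), which BAM files also start with because they are BGZF-compressed. The traceback shows the -S argument being read with np.loadtxt, so it appears to expect a text table listing the BAM files rather than the BAM files themselves. The sketch below checks for the magic bytes and writes such a table. The two-column layout (path, cell id) is inferred from the "example head cells" log in another issue on this page and is an assumption, as are the helper names and file names.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"

def looks_compressed(path):
    """True if the file starts with the gzip magic bytes (also true for BGZF/BAM)."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC

def write_sample_table(bam_paths, table_path):
    """Write one 'path<TAB>cell_id' line per BAM file; the cell id is the file stem."""
    with open(table_path, "w") as out:
        for path in bam_paths:
            cell_id = os.path.basename(path).split(".")[0]
            out.write(f"{path}\t{cell_id}\n")

workdir = tempfile.mkdtemp()
fake_bam = os.path.join(workdir, "SRR0000001.bam")
with gzip.open(fake_bam, "wb") as handle:  # stand-in for a real BAM file
    handle.write(b"not real alignments")
table = os.path.join(workdir, "cell_table.tsv")
write_sample_table([fake_bam], table)
```

Passing the resulting table to -S, instead of the BAM paths themselves, would avoid decoding compressed bytes as text.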

split cells into groups and run brie2

Dear Authors,

I have sequenced, let's say, 1000 cells. I ran brie2 in two ways: Strategy 1, run brie2-quant on all 1000 cells; Strategy 2, run brie2-quant on cells 1-500 and cells 501-1000 separately.

Afterwards, I compared the PSI values of the same event between Strategy 1 and Strategy 2. The average PSI for cells 1-500 and cells 501-1000 was almost equal in Strategy 1, but clearly different in Strategy 2. I guess that the PSI distributions differ when brie2-quant is run separately on cells 1-500 and 501-1000 (as brie2 jointly models all cells at once). Am I correct?

This kind of difference may drive the separation of cells 1-500 and cells 501-1000 if I use Seurat to cluster cells. Is that acceptable? What's your suggestion?

Looking forward to your reply and have a nice weekend.
Changchang Cao

brie-factor utility causes error

I am following the instructions of the brie manual (https://brie-rna.sourceforge.io/manual.html). At step 4 I run into the following error:

brie-factor -a AS_events/SE.gold.gtf -r GRCm38.p5.genome.fa -c mm10.60way.phastCons.bw -o mouse_features.csv -p 10

loading annotation file... Done.
extracting features for 5227 skipping exon triplets with 10 cores...
Traceback (most recent call last):
File "/myhome /conda-envs/ml_0/bin/brie-factor", line 11, in
sys.exit(main())
File "/myhome /conda-envs/ml_0/lib/python2.7/site-packages/brie/brie_factor.py", line 152, in main
RV = result[g].get()
File "/myhome /conda-envs/ml_0/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
OSError: [Errno 2] No such file or directory

I receive this error both with my and the example data. I treated the bigWigSummary utility as described in the manual and my multiprocessing packages should be fine.
The first error message looks like the program has trouble finding the relative imports from .utils.gtf_utils import loadgene .

Could you please help me finding the issue?

Thanks
Stephanie

P.S
The code in step 3
brie-event-filter -a AS_events/SE.gff3 -anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa

should be changed to
brie-event-filter -a AS_events/SE.gff3 --anno_ref gencode.vM12.annotation.gtf -r GRCm38.p5.genome.fa

Cannot fetch reads in region

Hi,

I applied brie to my own data, but it kept reporting "Cannot fetch reads in regions: xxx" and finished with the error shown below:

File "/home/yangxx/.local/bin/brie", line 9, in <module>
   load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
 File "build/bdist.linux-x86_64/egg/brie/brie.py", line 256, in main
 File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
   raise self._value
UnboundLocalError: local variable 'reads' referenced before assignment

My aligner is STAR 2.5.2b, and I already sorted the BAM files during alignment and indexed them with samtools. I also tried re-sorting and re-indexing the output BAM files. All attempts reported the same error. This is my command line:
brie -a $path/annotation/SE.gold.gtf -f $path/annotation/human_factors.SE.gold.csv -p 10 -s $path/STAR/SRR1033783.Aligned.sortedByCoord.out.bam -o $path/BRIE/SRR1033783.
Is there something wrong with my input or parameters?

Export count matrix containing isoforms

After finishing the brie, I had three files: fractions.tsv, weights.tsv and samples.csv.gz.
And I need to use Seurat for further analysis.
However, I need, at least, the expression matrix. The format could be as following:

                                 AAACATACCATCAG AAACATACCGATAC AAACATACGCATCA
ENSMUSG00000111425.1               1             2              0
ENSMUSG00000024442.5               0             1              0

                           Cell_1     Cell_2    Cell_3
Gene_1_isoform_1               1          2          0
Gene_1_isoform_2               0          1          0
Gene_2_isoform_1               1          1          0

Does anyone know how to export a file like this or how to combine BRIE and Seurat?

MXE chromosomecoordinates to gene_ids

Hi all,
When I count MXE events and run brie-quant on them, I get a file that lists the chromosome coordinates (format: chr6:10749202:10749268:+@chr6:10749622:10749698:+@chr6:10755142:10755232:+@chr6:10770077:10770204:+ 24905) together with p-values etc.
But I don't have the gene_id.

How do I get the gene_id?

thanks!
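
Since the event id already encodes chromosome, coordinates and strand, one workaround is to parse it and look up which annotated gene spans the event. The sketch below is not a BRIE feature; `parse_event_id`, `genes_for_event` and the toy gene table (with made-up gene names and boundaries) are hypothetical, and in practice the table would be built from the gene lines of the GTF used for mapping.

```python
def parse_event_id(event_id):
    """Split 'chr:start:end:strand@...' into (chrom, start, end, strand) tuples."""
    exons = []
    for part in event_id.split("@"):
        chrom, start, end, strand = part.split(":")
        exons.append((chrom, int(start), int(end), strand))
    return exons

def genes_for_event(event_id, genes):
    """Ids of genes on the same chromosome and strand that contain every exon."""
    exons = parse_event_id(event_id)
    chrom, strand = exons[0][0], exons[0][3]
    lowest = min(exon[1] for exon in exons)
    highest = max(exon[2] for exon in exons)
    return [gene_id
            for gene_id, (g_chrom, g_start, g_end, g_strand) in genes.items()
            if g_chrom == chrom and g_strand == strand
            and g_start <= lowest and highest <= g_end]

# Hypothetical gene table: gene_id -> (chrom, start, end, strand).
genes = {
    "GENE_A": ("chr6", 10700000, 10800000, "+"),
    "GENE_B": ("chr6", 10700000, 10800000, "-"),
}
event = ("chr6:10749202:10749268:+@chr6:10749622:10749698:+"
         "@chr6:10755142:10755232:+@chr6:10770077:10770204:+")
hits = genes_for_event(event, genes)
```

An event can match more than one gene where annotations overlap, so the function returns a list rather than a single id.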

Is BRIE2 applicable to 10x Genomics scRNA data

"Naturally, protocols such as CEL-seq or STRT-seq that bias reads towards the ends of the transcript cannot provide information about exon-skipping events that may be very far from the ends of a transcript."

I saw this in the BRIE article, and I am not sure whether BRIE2 can be used on 10x Genomics scRNA data, which is also poly(A)-tailed mRNA data.

Modeling cell-level technical variation

I was wondering if there's any way to include cell technical variation (e.g., # of genes detected, percent mitochondrial, or read depth). These often have considerable impact on gene expression estimation but are not really biological.

Brie-Count and Brie-Quant Errors

Dear Yuanhua,

On the example you gave on https://brie.readthedocs.io/en/latest/quick_start.html under "Pre-Step: Read Counting" for 10x genomics data, I was unable to execute brie-count or brie-quant, as it kept giving the error that the command "brie-count" and "brie-quant" did not exist. Would you happen to know why it appears as un-executable when I run the command, or what I could do to remediate this error?

Thank you!

brie-quant uses all available resources

Hello,

I was wondering if there was any plan to add resource usage (i.e. number of threads) as an option to use brie-quant. Right now the code has maximum resources hardcoded:

brie/brie/bin/quant.py

Lines 207 to 211 in 8f83c56

 ## maximum number of threads (to fix)
 # if options.nproc != -1:
 # tf.config.threading.set_inter_op_parallelism_threads(options.nproc)
 nproc = -1

This is an issue for people using shared computer clusters. I didn't have time to test this, but would changing this line of code lead to fewer resources being used?

Thank you in advance for the reply
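
Until the option exists, a possible workaround is to cap threads from outside the process through environment variables. The sketch below builds such an environment. OMP_NUM_THREADS, TF_NUM_INTRAOP_THREADS and TF_NUM_INTEROP_THREADS are variables that OpenMP and TensorFlow read for their defaults, but whether brie-quant actually stays within these limits is untested here and is an assumption; `limited_env` is a hypothetical helper.

```python
import os

THREAD_VARIABLES = (
    "OMP_NUM_THREADS",
    "TF_NUM_INTRAOP_THREADS",
    "TF_NUM_INTEROP_THREADS",
)

def limited_env(n_threads, base_env=None):
    """Copy an environment and cap the thread-related variables at n_threads."""
    env = dict(os.environ if base_env is None else base_env)
    for name in THREAD_VARIABLES:
        env[name] = str(n_threads)
    return env

env = limited_env(4, base_env={"PATH": "/usr/bin"})
# Passing env=env to subprocess.run() when launching brie-quant would make
# the child process inherit these limits (assumption: they are respected).
```

On a shared cluster, the scheduler's own CPU binding remains the more reliable guard, since environment variables are only hints to the libraries.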

`brie-count` fails with `KeyError '1'`

Hi,

Thank you for all the work and effort that went into BRIE/BRIE2. Very nice tool.

I am processing a SS2 dataset with brie-count as follows:

brie-count -a SE.gold.gtf -S samples.tsv -o test -p 3

This runs for a while with samtools-related warnings that look like this:

"E::sam_hrecs_error] Malformed key:value pair at line 69: "@RG	ID:CM387
Cannot fetch reads in region: chr16:92613599-92668812

and ultimately fails with:

Cannot fetch reads in region: chrX:167304929-167330099
[BRIE2] [====================] 100.0% cells done in 6.6 sec.
[BRIE2] save matrix into h5ad ...
Traceback (most recent call last):
  File "/Users/ctl/anaconda3/envs/scbrie2/bin/brie-count", line 8, in <module>
    sys.exit(main())
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 206, in main
    count(options.gff_file, options.samList_file, options.out_dir,
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/bin/count.py", line 156, in count
    adata = convert_to_annData(Rmat_dict=Rmat_dict,
  File "/Users/ctl/anaconda3/envs/scbrie2/lib/python3.9/site-packages/brie/utils/io_utils.py", line 20, in convert_to_annData
    X = Rmat['1'] + Rmat['2'] + Rmat['3']
KeyError: '1'

It looks like the anndata can't be made because there are problems with the BAM file, but the BAM files seem fine.

Any help with this would be much appreciated.

Counts in fractions.tsv

Dear Yuanhua,

This is more a question than an issue. I noticed that the counts in the fractions.tsv file can take non-integer values, because the counts are the mean of Cnt_all.

Cnt_all is updated in your MH_propose function together with the Y and Psi values.
Could you please briefly describe the connection between the actual counts in the sequencing data and your Cnt_all? Unfortunately I am not able to figure it out from your code or paper.

Many thanks,
Stephanie

brie-event cannot read .gtf file

Hi,

I am trying to use BRIE using a custom annotation file and the GENCODE M10 annotation file. In both cases, running brie-event as follows:

brie-event -a gencode.vM10.annotation.gtf

returns errors and does not succeed in producing the SE files. Note that my custom annotation has the same format as the GENCODE file, and neither file causes problems when used with other software tools. The format is:

GENCODE:
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; (...)

Custom:
chr1 mm10_index exon 4777525 4777648 100 - . transcript_id "PB.1.1"; gene_id "Mrpl15"; (...)

In the case of my custom annotation, the error is the following:

 File "/home/angeles/anaconda3/bin/brie-event", line 11, in <module>
    sys.exit(main())
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 166, in main
    sanitize=options.sanitize)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 59, in defineAllSplicing
    DtoA_F, AtoD_F, DtoA_R, AtoD_R = prepareSplicegraph(anno_file, ftype)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 38, in prepareSplicegraph
    DtoA_F, AtoD_F, DtoA_R, AtoD_R)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 80, in populateSplicegraph
    data =  readTable_gff(table_f, ftype)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 60, in readTable_gff
    exons.append([str(int(vals[3])-1), vals[4]])
UnboundLocalError: local variable 'exons' referenced before assignment

In the case of the GENCODE file, the error is slightly different, although it seems to involve the same two Python scripts, event_maker.py and parseTables.py:

  File "/home/angeles/anaconda3/bin/brie-event", line 11, in <module>
    sys.exit(main())
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 166, in main
    sanitize=options.sanitize)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 59, in defineAllSplicing
    DtoA_F, AtoD_F, DtoA_R, AtoD_R = prepareSplicegraph(anno_file, ftype)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/event_maker.py", line 38, in prepareSplicegraph
    DtoA_F, AtoD_F, DtoA_R, AtoD_R)
  File "/home/angeles/anaconda3/lib/python3.6/site-packages/brie/events/parseTables.py", line 96, in populateSplicegraph
    for i in range(len(startvals) - 1):
TypeError: object of type 'map' has no len()

It seems to me that BRIE cannot get the right information from the .gtf files, and the variables are therefore empty when referenced downstream, which then causes the error... but I wonder why that is happening?

Thanks in advance,

Ángeles
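
The second traceback (TypeError: object of type 'map' has no len()) is the message Python 3 gives when len() is called on the result of map(), which is a lazy iterator in Python 3 rather than a list as in Python 2. The paths in the traceback show python3.6. The sketch below reproduces that behaviour in isolation and shows one common workaround; the variable raw_starts and its contents are made up for illustration and are not taken from BRIE's parseTables.py.

```python
# Hypothetical comma-separated exon starts; not read from any real annotation.
raw_starts = "100,250,400"

# In Python 3, map() returns an iterator, so len() is not defined for it.
startvals = map(int, raw_starts.split(","))
try:
    len(startvals)
    message = ""
except TypeError as err:
    message = str(err)

# Materialising the iterator as a list restores the Python 2 behaviour.
startvals = list(map(int, raw_starts.split(",")))
pairs = [(startvals[i], startvals[i + 1]) for i in range(len(startvals) - 1)]

print(message)
print(pairs)
```

Whether this is the only Python 3 incompatibility in that code path is not something the traceback alone can confirm.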

BRIE does not support annotations other than GENCODE

The following error occurs when executing brie-event-filter using the Ensembl GTF file from ftp://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/Mus_musculus.GRCm38.82.chr.gtf.gz:

 $ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=GRCm38.p5.genome.fa
[fai_load] build FASTA index.
9908 Skipped Exon events are input for quality check.
0 Skipped Exon events pass the qulity control.
Traceback (most recent call last):
  File "venv/bin/brie-event-filter", line 9, in <module>
    load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 361, in main
    g_idx, g_chr, g_start, g_stop = get_gene_idx(anno_out)
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 39, in get_gene_idx
    g_idx.append([now_g, last_g])
UnboundLocalError: local variable 'last_g' referenced before assignment

The following command was used to generate SE.gff3:

$ brie-event -a Mus_musculus.GRCm38.82.chr.gtf -o  AS_events
Making GFF alternative events annotation...
  - Input annotation files: Mus_musculus.GRCm38.82.chr.gtf
  - Output dir: AS_events
('Reading table', 'Mus_musculus.GRCm38.82.chr.gtf')
Generating skipped exons (SE)
Generating retained introns (RI)
Generating mutually exclusive exons (MXE)
Generating alternative 3' splice sites (A3SS)
Generating alternative 5' splice sites (A5SS)
Took 0.69 minutes to make the annotation.

This error does not occur if the GENCODE GTF file from ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz is used to generate SE.gff3 and SE.gold.gff3 instead.
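
One plausible reading of the log above, offered as an assumption rather than a diagnosis: since 0 events pass the quality check, a loop that would normally assign last_g may never run, and the first use of the variable afterwards raises UnboundLocalError. The sketch below shows that general pattern with a hypothetical function, gene_ranges, which is not BRIE's get_gene_idx, plus a guard for the empty case.

```python
def gene_ranges(event_genes):
    """Group consecutive events by gene (hypothetical stand-in, not BRIE code)."""
    ranges = []
    for i, gene in enumerate(event_genes):
        if i == 0:
            start, last_g = 0, gene
        elif gene != last_g:
            ranges.append((start, i))
            start, last_g = i, gene
    # 'start' is only bound inside the loop, so an empty input fails here.
    ranges.append((start, len(event_genes)))
    return ranges


def gene_ranges_safe(event_genes):
    """Same grouping, but returns an empty list when there are no events."""
    if not event_genes:
        return []
    return gene_ranges(event_genes)


try:
    gene_ranges([])
    failed = False
except UnboundLocalError:
    failed = True

print(failed)
print(gene_ranges_safe([]))
print(gene_ranges_safe(["g1", "g1", "g2"]))
```

If this reading is right, the more useful question is why no Ensembl-derived events pass the filter, which the traceback does not answer.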

Data Interpretation

Hi all,

I'm an undergraduate research assistant who was asked to figure out how to install and use BRIE and BRIE-kit for my research lab, but as I am in no way a trained statistician, I have no clue where to start interpreting the brie-diff output.

If unclear, I'm referring to the ~.diff.tsv file and ~.diff.rank.tsv file that are both created after running the program. My data, for example, looks like this:

What can I do to visualize and understand these results?

Edit - I can access each column individually or in comparison in a program like RStudio, but without an understanding of the statistics behind BRIE, I cannot use the data.

Other splicing types filter (A5SS, A3SS, MXE, RI)

Hi,
I did not see any examples of analyzing other splicing types (A5SS, A3SS, MXE, RI) using BRIE. I used brie-event-filter to process the gff3 files of the other splicing types, but I cannot get gold.gtf or gold.gff3.

Comparing groups of single cells

Hi Yuanhua,
I want to perform differential splicing analysis between different groups of single cells. Could you provide an example for groups, as I did not find one in the manual?
Thanks

the output of brie-count doesn't contain the .h5ad file

Hi Yuanhua,

my code is as follows:

brie-count -a genes.gff3 -S cell.bam -b barcodes.tsv.gz -o test2/ -p 15 -t any

This process ran for 15 hours; however, the result files don't contain the important .h5ad file.

Am I missing anything? Thanks

Invalid divide value in brie_MH_Heuristic

Dearest Yuanhua,

Running brie with a specific bam file generates this error. Other bam files that I have tried work perfectly fine.

Traceback (most recent call last):                                                                      
  File "/home/user/anaconda3/envs/brie-env/bin/brie", line 11, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/brie-env/lib/python2.7/site-packages/brie/brie.py", line 268, in main
    M=M,  Mmin=initial, gap=gap, nproc=nproc)
  File "/home/user/anaconda3/envs/brie-env/lib/python2.7/site-packages/brie/models/model_brie.py", line 451, in brie_MH_Heuristic
    FPKM_all = Cnt_all / tranLen.reshape(-1, 1) / total_count * 10**9
RuntimeWarning: invalid value encountered in divide

I believe that this is some sort of divide by zero error that prevents the rest of the analysis from running. No output files are created.

I have yet to fully investigate your code myself, but would there be any conditions that could cause one of the divisors to be zero?

Regards,
Trevor
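
As a general NumPy note: "invalid value encountered in divide" is the warning raised for 0/0 (the result is NaN), whereas a non-zero value divided by zero gives "divide by zero encountered" instead. In the expression from the traceback, that would be consistent with total_count being zero while the counts are also zero, for example when no reads were assigned to any event. The sketch below reproduces this with made-up numbers; the arrays are hypothetical and only the expression itself is copied from the traceback.

```python
import warnings

import numpy as np

# Hypothetical inputs: two isoforms, one sample, and no reads assigned at all.
Cnt_all = np.array([[0.0], [0.0]])
tranLen = np.array([1200.0, 900.0])
total_count = Cnt_all.sum()  # 0.0

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same expression as in the traceback; 0 / 0 yields NaN plus a RuntimeWarning.
    FPKM_all = Cnt_all / tranLen.reshape(-1, 1) / total_count * 10**9

messages = [str(w.message) for w in caught]
print(FPKM_all)
print(messages)

# One possible guard (an assumption, not BRIE's behaviour): report zeros
# when there are no reads to normalise by.
if total_count > 0:
    FPKM_safe = Cnt_all / tranLen.reshape(-1, 1) / total_count * 10**9
else:
    FPKM_safe = np.zeros_like(Cnt_all)
```

Checking how many reads in the problematic BAM file overlap the annotated events would be one way to test this explanation.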

	## maximum number of threads (to fix)
	# if options.nproc != -1:
	# tf.config.threading.set_inter_op_parallelism_threads(options.nproc)

	nproc = -1

Even with the -t Any option, this command still gives an error message and hangs indefinitely:
