This repo contains the source code for the velocyto.py
library.
For more information consult the velocyto.py documentation.
RNA velocity estimation in Python
Home Page: http://velocyto.org/velocyto.py/
License: BSD 2-Clause "Simplified" License
This repo contains the source code for the velocyto.py
library.
For more information consult the velocyto.py documentation.
Using extract_intervals
on a GENCODE gtf file like these resulted in this error:
ubuntu@olgabot-loom:~$ velocyto extract_intervals gencode.vM15.annotation.gtf -p gencode.vM15.annotation.velocyto-intervals.txt
2017-10-24 23:14:54,320 - DEBUG - Reading /home/ubuntu/gencode.vM15.annotation.gtf
Traceback (most recent call last):
File "/home/ubuntu/anaconda/bin/velocyto", line 11, in <module>
load_entry_point('velocyto==0.9.6', 'console_scripts', 'velocyto')()
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/velocyto/commands/extract_intervals.py", line 95, in extract_intervals
exonno = regex_exonno.search(tags).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
ubuntu@olgabot-loom:~$ which python
/home/ubuntu/anaconda/bin/python
ubuntu@olgabot-loom:~$ python --version
Python 3.6.2 :: Anaconda custom (64-bit)
Hi,
velocyto seems stuck on the writing of the loom file -- currently it's been stuck on the line below for 3 hours.
ds = loompy.create(filename=outfile, matrix=spliced, row_attrs=ra, col_attrs=ca, dtype="float32")
In vlm.default_fit_preparation(), numpy1.13 gives the following error:
I'm fairly sure a workaround is to intsall numpy1.11
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func = <built-in method query of sklearn.neighbors.kd_tree.KDTree object>
args = (array([[ -2.5302447 , -3.73565713, 4.00488269...0.42542586,
1.07867484, 2.27431581]]), 1281.0, True)
kwargs = {}
132
133 def __len__(self):
134 return self._size
135
...........................................................................
/home/seqteam/programs/miniconda3/lib/python3.6/site-packages/sklearn/neighbors/kd_tree.cpython-36m-x86_64-linux-gnu.so in sklearn.neighbors.kd_tree.BinaryTree.query()
...........................................................................
/home/seqteam/programs/miniconda3/lib/python3.6/site-packages/sklearn/neighbors/kd_tree.cpython-36m-x86_64-linux-gnu.so in sklearn.neighbors.kd_tree.NeighborsHeap.init()
TypeError: 'numpy.float64' object cannot be interpreted as an integer
Hey!
There's a typo in the analysis.py gene_knn_imputation function.
On line 1107 it's currently connectivity = (self.knn > 0).atype(float)
But it should be with connectivity = (self.knn > 0).astype(float)
Hi,
I was using velocyto run_smartseq2 command to run some bam files. Most of them work well, but some bam files will get back "IndexError: list index out of range" this error when reading the bam (after summarizing the results of intron validation). It seems like this:
We used GSNAP for RNAseq alignment, maybe there are some special cigar. For example, we think this kind of reads (below) will cause the error but are not very sure and don't know why. Is there some easy ways to find which reads are not satisfied the requirement and how to deal with them?
Thank you very much!
Seems like the output of samtools sort should be directed into the output folder, not the folder containing the input bam files (which may be read-only).
Hello again,
I wonder if you can help with this issue. I am trying to obtain a loom file from 10x data. I know the program runs well in human 10x data (tried and tested) but when I try to run it on a non-model organims I am working with, it gives me an error which I think it is to do with the type of annotation. I wonder if you could have a look at the error below? Have you seen this error before? I would be very greateful if you could help with this.
Many thanks in advance and apologies for the many questions
PS: I am have also attached the velocyto log file. Hope you can see it
Cheers, Carmen
2018-05-16 20:42:58,966 - WARNING - The entry exon_number was not present in the gtf file. It will be infferred from the position.
2018-05-16 20:43:02,117 - DEBUG - Parsing Chromosome SM_V7_1 strand - [line 0]
2018-05-16 20:43:07,784 - DEBUG - Fixing corner cases of transcript models containg intron longer than 1000Kbp
/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float
to np.floating
is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type
.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/bin/velocyto", line 11, in
sys.exit(cli())
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/commands/run10x.py", line 104, in run10x
samtools_memory=samtools_memory, dump=dump, verbose=verbose, additional_ca=additional_ca)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/commands/_run.py", line 180, in _run
annotations_by_chrm_strand = exincounter.read_transcriptmodels(gtffile)
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/counter.py", line 512, in read_transcriptmodels
tm.chop_if_long_intron() # Change it in place
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/transcript_model.py", line 93, in chop_if_long_intron
long_feats = [i for i in self.list_features if len(i) > maxlen and i.kind == ord("i")]
File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/transcript_model.py", line 93, in
long_feats = [i for i in self.list_features if len(i) > maxlen and i.kind == ord("i")]
ValueError: len() should return >= 0
Hi,
I'm using Velocyto on data generated from 10x genomics cellranger pipeline, and project velocity onto embeddings produced from Seurat and scanpy. I found that some of my marker genes are barely detected in Velocyto pipeline, but are detected in hundreds of cells from the cellranger output, .
For example:
'GSX-2'
cellranger count (total 10k cells):
nonzero_cells 898.0
total_count 1233.0
Velocyto spliced count:
nonzero_cells 1.0
total_count 1.0
Velocyto un-spliced count:
nonzero_cells 0.0
total_count 0.0
Here is a plot of counts for ~20,000 genes:
It looks like many genes are detected with far less counts in the Velocyto pipeline.
Below is the CLI code I used to produce the loom file:
velocyto run10x -m ~/Data1/data/hrch38_rmsk.gtf ./14741X1 ~/Data1/data/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf
The bam files were generated using cellranger 2.0.2 using the same reference and sorted during velocyto run10x. I tried run10x without repeat mask and got similar result.
Do you know what's the reason for the much lower gene detection in Velocyto?
Thanks!
Yueqi
I'm seeing the following error message after running "pip install velocyto":
Collecting velocyto
Using cached velocyto-0.13.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-8iDgnS/velocyto/setup.py", line 14
package_data: dict = {}
^
SyntaxError: invalid syntax
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-8iDgnS/velocyto/
I have all of the proper dependencies installed, on a Ubuntu16.04 machine, running conda 4.4.7. Any ideas whats going wrong?
Hi,
when setting clusters from an external data frame (eg., read Seurat output using pandas with vlm.set_clusters(my_annotation.loc[vlm.ca['CellID'],'cell.type'])
), the dtype of vlm.cluster_labels is 'O'
, which cannot be dumped into hdf5 (TypeError: Object dtype dtype('O') has no native HDF5 equivalent
).
using vlm.cluster_labels=vlm.cluster_labels.astype(np.string_)
fixes this problem but creates byte strings which is maybe not optimal. some unicode compatibility problem?
great package by the way!
when filtering cells using vlm.filter_cells(totals > 500)
, only vlm.S
and vlm.U
get updated (resized), but vlm.initial_cell_size
and vlm.initial_Ucell_size
keep their original shapes. this leads to a
ValueError: operands could not be broadcast together with shapes (ncells_before,) (ngenes, ncells_after)
when calling vlm.normalize_by_total()
using vlm.initial_cell_size=vlm.initial_cell_size[(totals > 500)]
fixes this problem
Hi,
Not sure where the .loom
output file is located after running the following:
$velo multi10x -n 4 -l $log_folder -m $repmask $parentfolder $intervals
All I see in STDOUT is this:
2017-12-02 04:00:24,912 - DEBUG - Attempting to start 4 processes
Then I ran velocyto on each sample individually as a background process:
for sample_path in $parentfolder/*;
do
echo $sample_path
sample=`basename $sample_path`
$velo run10x -m $repmask $sample_path $intervals 1> logs_velocyto/${sample}.log 2> logs_velocyto/${sample}.err &
done
It seems to be running each sample in the background well enough right now:
2017-12-04 10:03:41,978 - DEBUG - Read first 100 million mapped reads
2017-12-04 10:05:21,292 - DEBUG - Marking up chromosome 17
2017-12-04 10:06:25,369 - DEBUG - Read first 110 million mapped reads
2017-12-04 10:08:30,929 - DEBUG - Marking up chromosome 18
Thus far I see a velocyto/
folder in the $sample_path
,
Where is the output going to be? Perhaps make this more explicit in the documentation.
Any comments on the "multi10x" subcommand?
I'm trying to install velocyto on macos High Sierra using conda 4.5.0 python version Python 3.6.5.
All the dependencies install fine but pip install velocyto
yields the following error:
Building wheels for collected packages: velocyto
Running setup.py bdist_wheel for velocyto ... error
...
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/harmbrugge/anaconda3/envs/3point6/include -arch x86_64 -I/Users/harmbrugge/anaconda3/envs/3point6/include -arch x86_64 -I/Users/harmbrugge/anaconda3/envs/3point6/lib/python3.6/site-packages/numpy/core/include -I/Users/harmbrugge/anaconda3/envs/3point6/include/python3.6m -c velocyto/speedboosted.c -o build/temp.macosx-10.7-x86_64-3.6/velocyto/speedboosted.o -fopenmp -ffast-math
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1
I think the problem is that i'm using an old gcc version and tried installing the latest version using homebrew: brew install cmake gcc
, but I'm not sure how to link the new gcc instalation with pip
Hi,
Thanks for Velocyto, it's very interesting.
I try to launch "velocyto run" on our data.
Our scRNA-seq protocol is inspired by the STRT-Seq protocol. The results are in UMI counts and the reads are from the 5 prime end of the mRNA (not like 10x where there are from the 3 prime end).
I used the Drop-Seq pipeline (and STAR) to obtain a mapped bam file where the cell barcode and UMI barcode are contained in the XC and XM tags.
I noticed that counter.py takes care of these specific tags from Drop-Seq.
Unfortunately, I have got very poor results when I launch "Velocyto run" on my bam which contains 96 cells. Only a few genes are detected and we do not have a lot of spliced, unspliced and ambiguous values.
Also, it may be part of this issue: In R, when I look at the rownames(ldat$spliced), there are a lot of duplicated genes...
Did I miss something?
Normally (without velocyto), I have more than 1000 genes detected per cell.
Thank you
Hi,
I ran the following commands
samtools sort -t CB -O BAM -o cellsorted_possorted_genome_bam.bam possorted_genome_bam.bam
velocyto run -b filtered_gene_bc_matrices/mm10/barcodes.tsv -m ../mm10_rmsk.gtf -o preprocessed_bam possorted_genome_bam.bam ../genes.gtf
However the resulting splicing matrix S in the loom file seems to be all 0. I have another dataset (different conditions but same platform -- 10x) but for that, the splicing matrix looks fine (though the PCA visualization of the top 2 components is different compared to what it looks like when doing it for the counts from the 10x software processing). For a third dataset, the splicing matrix is also all 0s.
Any help would be appreciated!
--Sid
Hello, I am relatively new to python. Sorry if I am missing any information.
Velocyto installation completed, but when running velocyto --help
to test installation I receive the following error:
$ velocyto --help
Traceback (most recent call last):
File "/anaconda3/bin/velocyto", line 7, in <module>
from velocyto.commands.velocyto import cli
File "/anaconda3/lib/python3.6/site-packages/velocyto/__init__.py", line 13, in <module>
from .estimation import fit_slope, _fit1_slope, clusters_stats
File "/anaconda3/lib/python3.6/site-packages/velocyto/estimation.py", line 7, in <module>
from .speedboosted import _colDeltaCor, _colDeltaCorLog10, _colDeltaCorSqrt
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so, 2): Symbol not found: _GOMP_parallel
Referenced from: /anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so
Expected in: flat namespace
in /anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so
I think this likely has something to do with my gcc version, but I'm not sure how to fix the problem.
$ python -V
Python 3.6.3 :: Anaconda custom (64-bit)
$ which gcc
/anaconda3/bin/gcc
$ gcc --version
gcc (GCC) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I'm having some problems getting velocyto installed properly with python-3.6.1. The package appears to install properly, but doesn't import. I get the following matplotlib issue:
Successfully installed velocyto
You are using pip version 9.0.1, however version 9.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@mskilab02 velocyto.py]# python
Python 3.6.1 (default, Mar 19 2018, 15:38:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import velocyto
/path/software/python-3.6.1/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/path/Software/src/velocyto.py/velocyto/__init__.py", line 15, in <module>
from .analysis import VelocytoLoom, scatter_viz, ixs_thatsort_a2b, load_velocyto_hdf5
File "/path/Software/src/velocyto.py/velocyto/analysis.py", line 1936, in <module>
colors20 = np.vstack((plt.cm.Vega20b(np.linspace(0., 1, 20))[::2], plt.cm.Vega20c(np.linspace(0, 1, 20))[1::2]))
AttributeError: module 'matplotlib.cm' has no attribute 'Vega20b'
Loompy appears to have installed correctly though...
>>> import loompy
/path/software/python-3.6.1/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Any troubleshooting tips? Should I be trying a different python? Thanks for any help.
Hello,
I'm trying to install velocyto into an ubuntu-based singularity image. However, I am failing at the install with the error in the title:
+ pip install velocyto
Collecting velocyto
Downloading velocyto-0.12.1.tar.gz (180kB)
100% |################################| 184kB 2.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-rCTsse/velocyto/setup.py", line 14
package_data: dict = {}
^
SyntaxError: invalid syntax
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-rCTsse/velocyto/
ABORT: Aborting with RETVAL=255
I can see this line in the setup.py file, but I'm not familiar enough with python to understand what this line is trying to do.
Concerning the compute environment, the recipe includes these steps with regard to python:
apt-get -y install python python3 python-pip python3-pip git
pip install --upgrade pip
pip3 install --upgrade pip
pip install numpy scipy cython numba matplotlib scikit-learn h5py click pysam
pip install velocyto
Do you have any ideas what is causing this?
Thanks!
Hey there!
I'm applying velocyto to aggregated data collected from several time points of a differentiation time course experiment. I was wondering if I should be cautious about situations where cells from different time points (eg. before and immediately after induction) may have similar gene expression, but could have different underlying velocities. Are kNNs determined based on total expression? Would the kNN smoothing be affected by this?
The reason I ask is because I'm noticing that I have cells in my non-induced population (blue) with a high velocity towards the more-differentiated populations without the accumulation of these cells at some terminal state (the cells shouldn't be dying and it seems unlikely that I managed to measure the cells just as this started to happen and they have).
Any chance you have any insight or suggestions here? Thanks! :)
Velocities produced with default pipeline (default_fit_preparation)
I ran this command, for SmartSeq2 data:
velocyto run_smartseq2 -o ./velocyto star/22606_3#7*/22606_3#7*.2pass.Aligned.sortedByCoord.dedup.bam $GTF
I was using a bam file with some reads mapped to ERCC spike-ins, so I have a GTF file with ERCC transcripts described like this (for example):
ERCC-00157 ERCC exon 1 1019 0.000000 + . gene_id "ERCC-00157"; transcript_id "DQ839618";
ERCC-00158 ERCC exon 1 1027 0.000000 + . gene_id "ERCC-00158"; transcript_id "DQ516795";
ERCC-00160 ERCC exon 1 743 0.000000 + . gene_id "ERCC-00160"; transcript_id "DQ883658";
ERCC-00162 ERCC exon 1 523 0.000000 + . gene_id "ERCC-00162"; transcript_id "DQ516750";
ERCC-00163 ERCC exon 1 543 0.000000 + . gene_id "ERCC-00163"; transcript_id "DQ668359";
ERCC-00164 ERCC exon 1 1022 0.000000 + . gene_id "ERCC-00164"; transcript_id "DQ516779";
ERCC-00165 ERCC exon 1 872 0.000000 + . gene_id "ERCC-00165"; transcript_id "DQ668363";
ERCC-00168 ERCC exon 1 1024 0.000000 + . gene_id "ERCC-00168"; transcript_id "DQ516776";
ERCC-00170 ERCC exon 1 1023 0.000000 + . gene_id "ERCC-00170"; transcript_id "DQ516773";
ERCC-00171 ERCC exon 1 505 0.000000 + . gene_id "ERCC-00171"; transcript_id "DQ854994";
And this seems to cause an error:
2018-03-25 21:03:50,535 - INFO - Load the annotation from /hps/nobackup/stegle/datasets/references/human/GRCh37/gencode.v19.annotation_ERCC.gtf
2018-03-25 21:03:59,379 - DEBUG - Parsing Chromosome ERCC-00002 strand + [line 0]
Traceback (most recent call last):
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/bin/velocyto", line 11, in <module>
load_entry_point('velocyto', 'console_scripts', 'velocyto')()
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/run_smartseq2.py", line 72, in run_smartseq2
samtools_memory=1, dump=dump, verbose=verbose, additional_ca=additional_ca)
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/_run.py", line 180, in _run
annotations_by_chrm_strand = exincounter.read_transcriptmodels(gtffile)
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/counter.py", line 467, in read_transcriptmodels
trname = regex_trname.search(tags).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
I got around this no problem by just removing the ERCC chromosomes from the gtf, but thought maybe this would be an issue for others, since I think quite a few people have used ERCC spike-ins with the SmartSeq2 protocol.
I am running extract_intervals on an mm10 annotation (from ENCODE but with the transcript_name field added at the end to solve that issue). It has 23337 genes and 30455 transcripts. It seems to take tons of memory - using up all the 64 gigs of ram I have and then eating into swap space, which obviously makes it run very slow - 20โ to process the first 2500 genes.
How much memory use is expected for the extract_intervals program? I assume there is a problem somewhere here.
Running pipreqs
on the repo resulted in more dependencies than in the installation requirements, and I'm not sure which you prefer to be conda vs pip.
conda install numpy scipy cython scikit-learn h5py click
pip install pysam loompy
pipreqs
(loom-env) โ velocyto.py git:(edit_on_github) /Users/olgabot/anaconda3/envs/loom-env/bin/pipreqs .
INFO: Successfully saved requirements file in ./requirements.txt
(loom-env) โ velocyto.py git:(edit_on_github) cat ./requirements.txt
click==6.7
h5py==2.7.0
scipy==0.19.1
setuptools==36.5.0.post20170921
numpy==1.13.3
numba==0.35.0+6.gaa35fb1
matplotlib==2.1.0
pysam==0.12.0.1
Cython==0.27.2
loompy==1.0.0
scikit_learn==0.19.1
typing==3.6.2
The documentation states that pip install velocyto should be enough. This is incorrect. As per conda, all the deps will need to be installed first.
omendivi$ velocyto
/anaconda3/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float
to np.floating
is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type
.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "/anaconda3/bin/velocyto", line 11, in
load_entry_point('velocyto', 'console_scripts', 'velocyto')()
File "/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 572, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 2755, in load_entry_point
return ep.load()
File "/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 2408, in load
return self.resolve()
File "/anaconda3/lib/python3.6/site-packages/pkg_resources/init.py", line 2414, in resolve
module = import(self.module_name, fromlist=['name'], level=0)
File "/usr/local/bin/velocyto.py/velocyto/init.py", line 13, in
from .estimation import fit_slope, _fit1_slope, clusters_stats
File "/usr/local/bin/velocyto.py/velocyto/estimation.py", line 7, in
from .speedboosted import _colDeltaCor, _colDeltaCorLog10, _colDeltaCorSqrt
ImportError: dlopen(/usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so, 2): Symbol not found: _GOMP_parallel
Referenced from: /usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so
Expected in: flat namespace
in /usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so
Hi,
I have extensively read the documentation and still have questions:
run
stage?Thanks!
import velocyto as vcy
vlm = vcy.VelocytoLoom("myloomfile.loom")
vlm.normalize("S", size=True, log=True)
vlm.default_filter_and_norm()
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:670: RuntimeWarning: divide by zero encountered in true_divide
adj_factor = predicted / y
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:707: RuntimeWarning: divide by zero encountered in true_divide
self.U_sz[:, ~self.small_U_pop] = self.U_sz[:, ~self.small_U_pop] * (np.median(self.U_sz[:, ~self.small_U_pop].sum(0)) / self.U_sz[:, ~self.small_U_pop].sum(0))
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:707: RuntimeWarning: invalid value encountered in multiply
self.U_sz[:, ~self.small_U_pop] = self.U_sz[:, ~self.small_U_pop] * (np.median(self.U_sz[:, ~self.small_U_pop].sum(0)) / self.U_sz[:, ~self.small_U_pop].sum(0))
Hi there,
First of all, I've already run velocyto on some 10x data and it's looking really interesting, so thanks a lot for developing the tool!
I'm now hoping to compare against some data produced with the sci-RNA-seq protocol (Cao et al. 2017. At the moment I'm processing the data with zUMIs, which seems to do a pretty good job of filtering the barcodes.
Now I'm trying to figure out whether it's best to try to use velocyto run directly or to adapt one of the existing shortcuts, such as run_dropest. As far as I can tell, for either one I would need to pass a list of the filtered barcodes as well as adding the barcodes as CB tag in the filtered bam files. This is not done by default with zUMIs but should be do-able.
One thing I'm not sure about is how to handle the UMIs. In sci-RNA-seq they are concatenated together with the cell barcodes in R1. Would I need to also add them as a tag in the filtered bam file, or as part of the reads themselves?
Thanks for your help - I would appreciate any suggestions.
Best,
Sarah
Hi,
I was using velocyto to analysis my own data (about 90k cells
). however, when I do estimate_transition_prob
, the python always crush down without any hits. Here is my code:
vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt",threads=8, n_neighbors=4000, knn_random=True, sampled_fraction=0.1)
and output:
2018-05-06 20:41:29,612 - DEBUG - Calculate KNN in the embedding space 2018-05-06 20:44:23,179 - DEBUG - Correlation Calculation 'knn_random'
my server is ubuntu 16.04 with 512GB RAM. Is there some easy ways to deal with it?
Thank you~
Hi sorry if I'm missing something obvious here, but when running run_smartseq2
is there a way of adding metadata information.
Im currently going through the API info but can find any reference to this.
Thanks.
I'm attempting to run Velocyto within a condo environment, a la issue #32. Upon calling the run
command, I get a missing argument error that appears to originate from the click module:
Invocation:
velocyto run bam.dir/Uber_sort.bam mm10_genes.gtf
Traceback (most recent call last):
File "/home/morgan02/.local/bin/velocyto", line 11, in
sys.exit(cli())
File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
TypeError: run() missing 1 required positional argument: 'test'
Following the instructions, I've been able to (seemingly sucessfully) install velocyto via pip
and installing from source via github, i.e.
git clone https://github.com/velocyto-team/velocyto.py.git
cd velocyto.py
pip install -e . # note the trailing dot
outputs
Installing collected packages: velocyto
Found existing installation: velocyto 0.13.8
Uninstalling velocyto-0.13.8:
Successfully uninstalled velocyto-0.13.8
Running setup.py develop for velocyto
Successfully installed velocyto
However, I cannot find the executable
$ velocyto --help
bash: velocyto: command not found
As I'm using multiple versions of Python, this isn't set-up as the "default" python. Where is this executable?
Hi there,
first of all, thank you very much for the great package and the awesome preprint, really enjoyed reading your analysis using RNA velocity and the theory behind it.
I was going through the python tutorial and was wondering whether you are planning to add a run mode, that would make it easy to run velocyto with BAM files from the Drop-seq pipeline.
The format for BAM files created with this pipeline is described here:
Dropseq Alignment Cookbook
Basically, the Cell barcode for each read is tagged with the XC tag and the molecular barcode with the XM tag. I think it would be really cool to use your approach on some Drop-seq data. Maybe there is already a way to use the package on these kinds of BAM files and I haven't noticed if so, please excuse and close the issue. If not, I think there are quite a few people using Drop-seq, that would be interested to apply velocyto to their data!
Best,
Florian
Hello! Thanks again for your help with the interval file. I was able to use it to create a .loom file for a practice version of our dataset and now I'm working through the analysis pipeline tutorial.
One of the ways to get started mentions using vlm.default_filter_and_norm()
. When I run that, I appear to get an index out of bounds error. I think might have to do with the score_detection_levels()
function reducing the number of genes from ~60,000 to 441 but then the score_cv_vs_mean()
function is still trying to use more than 441 genes...? But perhaps it's for another reason, not sure.
Would this just mean that you have to experiment with providing inputs to the vlm.default_filter_and_norm()
function or do you expect that the default function should have worked here?
Here's the traceback:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-66-9ec18bcc69cb> in <module>()
----> 1 vlm.default_filter_and_norm()
~/ebs/workspace/velocyto.py/velocyto/analysis.py in default_filter_and_norm(self, min_expr_counts, min_cells_express, N, min_avg_U, min_avg_S)
1587 self.filter_genes(by_detection_levels=True)
1588
-> 1589 self.score_cv_vs_mean(N=N, max_expr_avg=40)
1590 self.filter_genes(by_cv_vs_mean=True)
1591
~/ebs/workspace/velocyto.py/velocyto/analysis.py in score_cv_vs_mean(self, N, min_expr_cells, max_expr_avg, svr_gamma, winsorize, winsor_perc, plot)
274 ff = fitted_fun(log_m[:, None])
275 score = log_cv - ff
--> 276 nth_score = np.sort(score)[::-1][N]
277 if plot:
278 scatter_viz(log_m[score > nth_score], log_cv[score > nth_score], s=3, alpha=0.4, c="tab:red")
IndexError: index 1000 is out of bounds for axis 0 with size 441
Dear velocyto team,
I am currently following your tutorial online to learn how to work with velocyto to then apply the same code to my data. Unfortunately, I cannot get passed the vlm.normalize_median() step in your script. Maybe it is something obvious so apologies in advanced if this is the case. I should say that I am not working in ipython due to a system and administration issues with all the dependencies. Instead I am submitting my script to a computer farm. I am also following exactly what is in your code. The only exception are the modules that I import which are almost the same except for the ipython/display import.
import sys
import numpy as np
from scipy.stats import norm
from scipy.spatial.distance import pdist, squareform
import matplotlib
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import loompy
import velocyto as vcy
import logging
So... the error I keep getting is the following:
Traceback (most recent call last):
File "test.py", line 81, in
vlm.normalize_median()
File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/velocyto/analysis.py", line 898, in normalize_median
self.Sx_sz = self.Sx * (np.median(self.Sx.sum(0)) / self.Sx.sum(0))
File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4119, in median
overwrite_input=overwrite_input)
File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4033, in _ureduce
r = func(a, **kwargs)
File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4152, in _median
part = partition(a, kth, axis=axis)
File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 664, in partition
a.partition(kth, axis=axis, kind=kind, order=order)
ValueError: kth(=9106) out of bounds (1)
I am also attaching the script that I am running in case.
I would be very greateful if you have any ideas of what the problem is.
Kind regards,
test.txt
Carmen
Using the repeat masking GTF from the UCSC genome browser, velocyto extract_repeats
results in the following error:
$ velocyto extract_repeats mm10_rmsk.gtf
2017-10-31 17:50:58,709 - DEBUG - Sorting by `sort -k1,1 -k7,7 -k4,4n /home/ubuntu/mm10_rmsk.gtf > /home/ubuntu/mm10_rmsk_sorted.gtf`
2017-10-31 17:51:05,265 - DEBUG - Opening /home/ubuntu/mm10_rmsk_sorted.gtf, output will be at /home/ubuntu/mm10_rmsk_joined.gtf
Traceback (most recent call last):
File "/home/ubuntu/anaconda/bin/velocyto", line 11, in <module>
load_entry_point('velocyto==0.9.9', 'console_scripts', 'velocyto')()
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/velocyto/commands/extract_repeats.py", line 54, in extract_repeats
chrom, feature_class, feature_type, start, end, junk, strand, junk, tags = fields
ValueError: too many values to unpack (expected 9)
This looks like a malformed GTF issue or something else to do with reading each line of the gtf.
Hi, I am trying to run velocyto on data generated using SmartSeq2, using the current dev branch.
I ran the command:
velocyto run -l SmartSeq2 -o ./velocyto -U $BAMPATH $GTFPATH
And I get the following error:
Traceback (most recent call last):
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/bin/velocyto", line 11, in <module>
load_entry_point('velocyto', 'console_scripts', 'velocyto')()
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/run.py", line 114, in run
samtools_memory=samtools_memory, dump=dump, verbose=verbose, additional_ca=additional_ca)
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/_run.py", line 156, in _run
exincounter.peek(bamfile[0])
File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/counter.py", line 148, in peek
raise IOError("The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.")
OSError: The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.
I checked my BAM and GTF files are correct and compatible. The first few lines of my BAM are like this:
HS34_22606:3:2212:13187:68945 419 chr1 156025 0 122M = 156265 344 GTCTTGCTCTATCACCTAGGCTGGACTGCAGTGCTGCAATCACAGTTAACTGCAGCCTCAACCTCCAGGGCTTGAGCAATATTCCCATCTAATTTTTATCTTGTTTAAGAAGTGCAGTCTTG BBBBB<BFFBFFF/<F/</FF<F7FFFFFB/<BFBFFF/F<FFBFB</<<</<F<F</BF/</BFFB///<F/</B///<</</FBBB/B/FF<BF//B///<<F/B//B/<//7/BF///B MD:Z:33G65T11A10 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:6 HI:i:2 NM:i:3 AS:i:208
HS34_22606:3:2212:13187:68945 339 chr1 156265 0 21S104M = 156025 -344 CACGGACTCCAGGGGTGCCCAGCCACGAGTAGCTCCTTTTTTTTTTTTAATATTTAGTAGAGACAAAGTGTCACCATGTTGACCAGGTTGGTGGTGATCTCCTACACTCAGGCAGTTCTCTCACC 7//BB7/<7B77/////7////</B///B7//B<<///BBB/FBF/FF</////<B/F<B/FFFBFFFFFF</<<BBB</FFBBF//BF</<</FBFFFFFFB/FBFFFFBFF<BBBB/<BBB</ MD:Z:2A3T7A15T22T50 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:6 HI:i:2 NM:i:5 AS:i:208
HS34_22606:3:1314:4949:83531 99 chr1 340951 255 9M42S = 566538 225711 GGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT //BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:9 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:1 HI:i:1 NM:i:0 AS:i:119
HS34_22606:3:2310:16017:71190 137 chr1 526895 255 8S116M * 0 0 GGCGAAAAACGAGGACAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCATGGTAGTGGGCGCCTGTAGTCCCAGCTACTTGGGA /////<//<//<////<//B/F/</BB///<</F<<///<//<F<F<BBFFF/FFFFFFBF/FBFF<FFFFBBBB/FBB<7FF/<BBBFB<BFFFFF/7<<7/BBFF/7B7/7/B7B//7B/BB MD:Z:6T68T9G30 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:1 HI:i:1 NM:i:3 AS:i:108
HS34_22606:3:1316:9877:80870 355 chr1 564464 3 8S115M = 564664 324 GAGTACATGGGAGTCCGAACTAGTCTCAGGCTTCAACATCGAATACGCCGCAGGCCCCTTCGCCCTATTCTTCATAGCCGAATACACAAACATTATTATAATAAACACCCTCACCACTACAAT /BBBB<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFF/BFFFFFFFFFFFFFFFBFFFFFFFF<F MD:Z:115 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:2 HI:i:2 NM:i:0 AS:i:237
HS34_22606:3:1213:5571:77912 163 chr1 564659 3 125M = 564673 139 CCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGCTACGACCAACTCATACACCTCCTATGAAAAAACTTCCTACCACTCACCCTAGCATTACTTATATGATATGTCTCCATACCCATT BBBBBFBFFFFFFFF/FFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFB MD:Z:125 PG:Z:MarkDuplicates RG:Z:22606_3#76 NH:i:2 HI:i:1 NM:i:0 AS:i:248
I think I'm on the most up-to-date dev branch. The output of pip freeze for velocyto version is:
-e git+https://github.com/velocyto-team/velocyto.py.git@e7b2a82dc8d23f47d190f7c4d3f326892d5ba2b8#egg=velocyto
Thanks for making the software, and for releasing as a preprint for people to read about and try while still under development.
Hi,
Package install is very clean. Thanks! I installed as an conda environment because of the python 3.6 requirement. You may consider adding this to the docs for other users.
conda env create -f velocyto.yml
where velocyto.yml is:
name: velocyto_0.11
dependencies:
- python=3.6
- numpy
- scipy
- cython
- numba
- matplotlib
- scikit-learn
- h5py
- click
- pip:
- velocyto
To run velocyto after install, one needs to activate the environment (Linux version):
source activate velocyto_0.11
Has team velocyto tested the installation on the Mac operating system? Myself and colleagues have have run into roadblocks with the clang compiler. gcc-4.8 appeared to solve the issue for one of us, allowing her to successfully install velocyto, but that compiler is not compatible with the latest Mac OS 10.13.
Do you have general advice for installing velocyto on a Mac? Would really appreciate it!
Hi,
When running velocyto run on a bam file generated with the cellranger pipeline, I get the following error:
2017-11-30 04:28:37,293 - DEBUG - Save 3' junction/exon read counts
2017-11-30 04:28:44,852 - DEBUG - Collecting genes structural info statistics
2017-11-30 04:29:41,614 - DEBUG - Mapping statistics have been saved to D26/possorted_genome_bam_5ON9O_mapstats.hdf5
2017-11-30 04:29:41,616 - DEBUG - Generating output file D26/possorted_genome_bam_5ON9O.loom
2017-11-30 04:29:41,617 - DEBUG - Collecting row attributes
2017-11-30 04:29:41,732 - DEBUG - Generating data table
Traceback (most recent call last):
File "/home/users/jmehtone/miniconda2/envs/velocyto/bin/velocyto", line 11, in <module>
sys.exit(cli())
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/velocyto/commands/run.py", line 90, in run
intron_validation=introns, additional_ca=additional_ca)
File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/velocyto/commands/_run.py", line 205, in _run
total = spliced + unspliced + ambiguous
MemoryError
velocyto was installed to a fresh conda environment with python 3.6 following the installation guide from the docs. velocyto v0.11.0 and loompy v1.1.0 used. Intronic/exonic intervals and repeats annotations were generated as guided by the docs.
Any idea what might be causing this? The bam file consists of around 2k cells with around 200 million mapped reads. This is run on a computing server, so resources (CPU/RAM/disk space) should not be the problem. The bam file itself is ok as I have processed it with other tools.
Sorry for disturbing you again, but after you fixed the former problem, I get a new one.
Also when using velocyto run_smartseq2 command to run bam files, I get the error โAttributeError: 'list' object has no attribute 'add'โ. It likes this:
It seems to happen where you fix your code yesterday. Here is the link (below) of two of my samples which will get this error, if you would like to try to run them using smartseq2 logic.
https://1drv.ms/f/s!Ah6kTQEKGo66giSsQqtqeLOF8j8c
Thank you very much for making the software and helping us solve the problems.
Dear velocyto team,
I have installed velocyto and dependencies following the installation guide. However, the program crashes simply when doing velocyto --help. It is funny because I have succeeded in different computers but now I just don't know what the problem could be. Bellow is the error I am getting, it seems to be related to matplotlib.cm. Please let me know if you recognise this problem
velocyto --help
Traceback (most recent call last):
File "/homes/iimaz/miniconda3/bin/velocyto", line 7, in
from velocyto.commands.velocyto import cli
File "/homes/iimaz/miniconda3/lib/python3.6/site-packages/velocyto/init.py", line 15, in
from .analysis import VelocytoLoom, scatter_viz, ixs_thatsort_a2b, load_velocyto_hdf5
File "/homes/iimaz/miniconda3/lib/python3.6/site-packages/velocyto/analysis.py", line 1936, in
colors20 = np.vstack((plt.cm.Vega20b(np.linspace(0., 1, 20))[::2], plt.cm.Vega20c(np.linspace(0, 1, 20))[1::2]))
AttributeError: module 'matplotlib.cm' has no attribute 'Vega20b'
Thank you!
Ivan
Hey! I've installed velocyto and am looking to use it to create the ivls.txt file from a gtf. In particular, I'm trying to use the ensembl human genome gtf here: http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
I started the process with the following command:
velocyto extract_intervals Homo_sapiens.GRCh37.75.gtf -p GRCh37
And it starts running but then after about 2 hours, it failed with what I think is a dictionary lookup error for a gene name? Here is the traceback:
2017-11-16 23:45:55,547 - DEBUG - 47500 genes already processed
2017-11-16 23:51:16,930 - DEBUG - 50000 genes already processed
2017-11-16 23:56:46,434 - DEBUG - 52500 genes already processed
Traceback (most recent call last):
File "/home/ubuntu/ebs/anaconda3/bin/velocyto", line 11, in <module>
sys.exit(cli())
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/velocyto/commands/extract_intervals.py", line 152, in extract_intervals
othertrs.update(binnedtrs_by_chromstrand[chromstrand][trbin])
KeyError: 'HG1007_PATCH+'
I looked into the Homo_sapiens.GRCh37.75.gtf file and could not find the pattern 'HG1007_PATCH+' but I did find it several times without the '+' (just 'HG1007_PATCH'). I'm wondering if there is a chance the + got accidentally attached to the gene name when it was defined as a dictionary key at some point? Or if you have any suggestions for finding where this gene name edit might have happened?
Thanks!
Hi,
The prepare_markov function has two required parameters -- sigma_D and sigma_W. However I'm unclear on what parameters those correspond to in the manuscript. If I could have clarification on that, that would be much appreciated!
Thanks!
Command: velocyto run -o vel-run-test -m mm10_rmsk_joined.gtf test.bam mm10-1.2.0_gene_ivls.txt
File "/miniconda/lib/python3.6/site-packages/velocyto/commands/run.py", line 87, in run
additional_ca)
File "/miniconda/lib/python3.6/site-packages/velocyto/commands/_run.py", line 84, in _run
exincounter = vcy.ExInCounter(valid_bcs2idx)
UnboundLocalError: local variable 'valid_bcs2idx' referenced before assignment
Hi,
Thanks for developing such an exciting piece of technology! I am currently trying to run the velocyto CLI to generate a .loom file. However, when I submit the run10x command I receive the following error:
TypeError: run10x() missing 1 required positional argument: 'molrep'
However, according to the tutorial, molrep does not appear to be an option in the 10x command. If I try to use the molrep option as defined in the run command in the run10x command I receive this error:
Error: no such option: --molrep
If you have any suggestions on how to correct this issue it would be greatly appreciated.
I've installed velocyto.py from source in a cluster environment running on CentOS (6.9).
Im trying to use the cli run command like so:
velocyto run -b barcodes.tsv -o ~/output/ possorted_genome_bam.bam hg19.gtf
Yielding the following error:
2018-04-05 16:20:21,604 - DEBUG - No SAMPLEID specified, the sample will be called possorted_genome_bam_5N1Y2
2018-04-05 16:20:21,605 - DEBUG - Using logic: Default
2018-04-05 16:20:21,612 - DEBUG - Read 4913 cell barcodes from /target/gpfs2/groups/umcg-wijmenga/tmp02/projects/scRNAseq_10X_pilot/pilot4/pilot4_lane_2/outs/filtered_gene_bc_matrices/hg19/barcodes.tsv
2018-04-05 16:20:21,612 - DEBUG - Example of barcode: AAACCTGAGATGTGTA and cell_id: possorted_genome_bam_5N1Y2:AAACCTGAGATGTGTA-1
Traceback (most recent call last):
File "./velocyto", line 11, in <module>
load_entry_point('velocyto==0.13.8', 'console_scripts', 'velocyto')()
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/velocyto/commands/run.py", line 102, in run
samtools_memory=samtools_memory, additional_ca=additional_ca)
File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/velocyto/commands/_run.py", line 110, in _run
mb_available = int(subprocess.check_output('grep MemAvailable /proc/meminfo'.split()).split()[1]) / 1000
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['grep', 'MemAvailable', '/proc/meminfo']' returned non-zero exit status 1.
Do you have any idea what might cause this issue?
I was using the default filtering and normalization but got this error. Is there way to get around this? Thanks!
vlm.default_filter_and_norm()
/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/analysis.py:695: RuntimeWarning: divide by zero encountered in true_divide
adj_factor = predicted / y
Have you used bioconda to install pysam? It comes pre-built with many extensions so you don't need to. I always worry about installing C-dependent libraries with pip because it's been so problematic for me.
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install pysam
In the documentation, there is a command to perform velocyto run
on a single bam file, but how would you do this for 100s of cells at a time? Do you recommend aggregating all the bam files together and adding a unique ID to the "read_group" field?
Hi,
I was trying to follow the procedure to run velocyto. However I got the error message when using 10X genomics cellranger output data, until the last step to form loom file. Is there possible reason why this happen? Thanks!
The command I used was:
~/anaconda3/bin/velocyto run10x -m ~/reference/mm10_rmsk.sorted.gtf ~/cellrangerpath/count-WT/ ~/reference/refdata-cellranger-mm10-1.2.0/genes/genes.gtf
It was all good until writing to data:
...
2018-01-25 02:58:37,430 - DEBUG - 866 reads not considered because fully enclosed in repeat masked regions
2018-01-25 02:58:37,681 - DEBUG - Counting done!
2018-01-25 02:58:37,691 - DEBUG - Generating output file /cellrangerpath/count-WT/velocyto/count-WT.loom
2018-01-25 02:58:37,691 - DEBUG - Collecting row attributes
2018-01-25 02:58:37,763 - DEBUG - Generating data table
2018-01-25 02:58:38,975 - DEBUG - Writing loom file
Traceback (most recent call last):
File "/usr/home/anaconda3/bin/velocyto", line 11, in
sys.exit(cli())
File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/commands/run10x.py", line 100, in run10x
samtools_memory=samtools_memory, additional_ca=additional_ca)
File "/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/commands/_run.py", line 222, in _run
ds = loompy.create(filename=outfile, matrix=total, row_attrs=ra, col_attrs=ca, dtype="float32")
TypeError: create() got an unexpected keyword argument 'matrix'
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.