Comments (8)
Sorry for the delay; somehow I missed replying to this.
Try monitoring memory usage while it runs.
If the memory spike or the crash is immediate, it might be a numerical corner case that I fail to catch (e.g. sqrt(-0)?). Try setting psc=1e-10. Still the same? What about changing to transform="log" (with psc=1) or transform="linear"?
If all of those fail, it sounds like you might have a zero or constant row/column that is breaking the correlation calculation with a division by zero.
If you could check and report what you find here, I can try to isolate and fix the bug.
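A quick way to look for the zero or constant rows/columns mentioned above is sketched below. This is only a minimal NumPy sketch on a made-up toy matrix; the helper name `constant_rows_and_columns` is hypothetical and not part of velocyto.

```python
import numpy as np

def constant_rows_and_columns(mat):
    """Return indices of rows and columns whose values are all identical.

    An all-zero row or column is a special case of a constant one.
    """
    mat = np.asarray(mat, dtype=float)
    # A vector is constant exactly when its max equals its min.
    const_rows = np.flatnonzero(mat.max(axis=1) == mat.min(axis=1))
    const_cols = np.flatnonzero(mat.max(axis=0) == mat.min(axis=0))
    return const_rows, const_cols

# Toy matrix with made-up values, for illustration only.
toy = np.array([[1.0, 2.0, 3.0],
                [0.0, 0.0, 0.0],   # all-zero row
                [4.0, 2.0, 6.0]])
rows, cols = constant_rows_and_columns(toy)
print("constant rows:", rows.tolist())
print("constant columns:", cols.tolist())
```

On real data the same helper could presumably be pointed at the matrix passed as hidim (vlm.Sx_sz in this thread), assuming it is a dense array.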
from velocyto.py.
Is it solved now?
from velocyto.py.
Sorry about that. Because our server broke down, I can't re-run it right now. Before that, however, I subset my data to 40k cells and it worked well, but it still crashed when I increased the number of cells to 50k. Also, when I ran the source line by line, I found that _colDeltaCorSqrt leads to the crash, but I don't know how to solve it. I will report the results of your suggestions once the server is fixed.
P.S. Could you tell me your Cython version or the package versions in your environment? Maybe it's a compatibility problem.
from velocyto.py.
colDeltaCorSqrt should never be called; colDeltaCorSqrtpartial should be called instead when you specify knn_random=True.
Maybe the allocation really is too big for 40k cells and I should implement this with a sparse implementation. I have never tried it on such a big dataset.
P.S. could you tell me your cython version or environment package version? maybe it's a compatible problem.
Are you installing exactly by following the instructions? If so, then you are compiling the Cython and C code using whatever gcc version is installed on the system. If you think that your problem might be related to compilation, follow this answer: #53
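To put the allocation concern above into numbers, here is a rough back-of-the-envelope sketch. It assumes a single dense cells-by-cells float64 matrix, which the thread does not confirm; the real footprint also depends on the other arrays involved. The last column only reports whether the element count exceeds the signed 32-bit range. Whether the compiled code uses 32-bit indices is not stated anywhere in this thread.

```python
def dense_matrix_gib(n_cells, bytes_per_value=8):
    """Size in GiB of a dense n_cells x n_cells matrix (float64 by default)."""
    return n_cells * n_cells * bytes_per_value / 2**30

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

for n in (40_000, 50_000):
    n_elements = n * n
    # Columns: number of cells, matrix size in GiB, element count above INT32_MAX?
    print(n, round(dense_matrix_gib(n), 1), n_elements > INT32_MAX)
```

This is an arithmetic observation only, not a diagnosis of the crash.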
from velocyto.py.
Thanks for your reply~
Following your suggestions, I tried the calls below (40k cells are OK, but 50k cells give the error):
1. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", threads=24, n_jobs=24, psc=1e-10, n_neighbors=3000, knn_random=True, sampled_fraction=1)
2. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="linear", threads=24, n_jobs=24, n_neighbors=3000, knn_random=True, sampled_fraction=1)
3. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="log", threads=24, n_jobs=24, psc=1, n_neighbors=3000, knn_random=True, sampled_fraction=1)
None of them changes the result.
colDeltaCorSqrt should never be called, instead colDeltaCorSqrtpartial should be called when you specify knn_random=True
Yep, _colDeltaCorSqrtpartial was called. I think line 169 in velocyto.py/velocyto/estimation.py causes the error. If you could check or revise it, I can test the code with my data.
Also, the allocated memory is stable at around 80 GB (roughly) before it crashes.
from velocyto.py.
The function calculates the correlation between the velocity vector "v_1" of a cell and each finite difference between the expression of two cells, "x_i - x_1".
Since this is independent of the transform, I suspect that it is just a division-by-zero error in the correlation calculation, which is computed as:
A = v_1 - mean(v_1)
B = (x_i - x_1) - mean(x_i - x_1)
A . B / (norm(A) * norm(B))
This can happen in the following cases:
- There is a column (cell) of the velocity matrix or expression matrix that is zero
- There are two or more columns (cells) that are identical and therefore B is zero
The fact that it works with less data might mean that subsetting happened to remove the problematic cell.
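The degenerate case described above can be reproduced on toy vectors. The sketch below is plain NumPy with made-up values, not the compiled velocyto routine; the function name `centered_correlation` is hypothetical. It returns NaN explicitly where the real code would divide by zero.

```python
import numpy as np

def centered_correlation(v, d):
    """Correlation as written in the thread: A . B / (norm(A) * norm(B))."""
    a = v - v.mean()
    b = d - d.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # Undefined: one of the centered vectors is all zeros.
        return float("nan")
    return float(a @ b / denom)

v_1 = np.array([0.5, -1.0, 2.0, 0.0])  # velocity of cell 1 (made-up values)
x_1 = np.array([1.0, 3.0, 0.0, 2.0])   # expression of cell 1
x_2 = np.array([2.0, 1.0, 1.0, 5.0])   # a different cell
x_3 = x_1.copy()                       # an exact duplicate of cell 1

ok = centered_correlation(v_1, x_2 - x_1)
bad = centered_correlation(v_1, x_3 - x_1)
print(ok)   # a finite value between -1 and 1
print(bad)  # nan, because B is all zeros for identical cells
```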
To help find the problem, you could try substituting a couple of rows (in both vlm.Sx_sz and vlm.Sx_sz_t) with a random vector. If the error is still there in this test, then it was not caused by the situations above.
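The substitution test could look roughly like the sketch below. The arrays are small stand-ins for vlm.Sx_sz and vlm.Sx_sz_t, and both the helper `randomize_rows` and the indices in `suspect` are hypothetical. It works on copies so the original matrices stay untouched.

```python
import numpy as np

def randomize_rows(mat, row_ixs, rng):
    """Return a C-ordered copy of `mat` with the given rows replaced by random values."""
    out = np.array(mat, dtype=float, copy=True, order="C")
    out[row_ixs, :] = rng.random((len(row_ixs), out.shape[1]))
    return out

rng = np.random.default_rng(0)
Sx_sz = np.zeros((4, 5))     # stand-in for vlm.Sx_sz
Sx_sz_t = np.zeros((4, 5))   # stand-in for vlm.Sx_sz_t
suspect = [1, 3]             # hypothetical row indices to overwrite

Sx_sz_test = randomize_rows(Sx_sz, suspect, rng)
Sx_sz_t_test = randomize_rows(Sx_sz_t, suspect, rng)
print(Sx_sz_test)
```

To overwrite columns (cells) rather than rows, the assignment would index `out[:, ixs]` with a matching random block instead.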
Another possibility is that vlm.Sx_sz is somehow not in the correct memory layout (e.g. "Fortran" instead of "C"), perhaps because of some manual filtering or an extra step you might have done. In this case, doing vlm.Sx_sz = np.array(vlm.Sx_sz, copy=True, order='C') should solve it.
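Whether an array is in C layout can be checked through its flags before converting it. A minimal sketch on a toy array (the values are made up; on real data the array would be vlm.Sx_sz):

```python
import numpy as np

# Toy array deliberately stored in Fortran (column-major) order.
a = np.asfortranarray(np.arange(6, dtype=float).reshape(2, 3))
print(a.flags["C_CONTIGUOUS"])  # False for this 2-D Fortran-ordered array

# Same conversion as suggested above: copy into C (row-major) order.
b = np.array(a, copy=True, order="C")
print(b.flags["C_CONTIGUOUS"])  # True after the copy
```

The values are unchanged by the conversion; only the memory layout differs.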
Finally, if this does not work, the problem might be something completely different. I would then suspect that it is somehow related to the way the Cython/C code is compiled with OpenMP; I don't see any other reason this would fail just by looking at the code. Unfortunately, since this is compiled C code, debugging is going to be really difficult.
from velocyto.py.
Was this solved?
from velocyto.py.
Hi,
Sorry for the delay. I still get the same error.
However, I used theislab/scvelo, which also implements velocity, and it seems to work well for my data.
It's a really weird error.
Thank you again for your kind help!
from velocyto.py.