Hi, I was using velocyto to analyze my own data ( about

Python always crashes when running estimate_transition_prob (velocyto.py issue, closed)

velocyto-team commented on September 17, 2024

Python always crashes when running estimate_transition_prob

from velocyto.py.

Comments (8)

gioelelm commented on September 17, 2024

Sorry for the delay. Somehow I missed replying to this.

Try monitoring memory usage when you run it.
If the memory spike or the crash is immediate, it might be a numerical corner case that I fail to catch (e.g. sqrt(-0)?). Try setting psc=1e-10. Still the same? What about changing to transform="log" (with psc=1) or transform="linear"?

If all of those fail, it sounds like you might have a zero or constant row/column that breaks the correlation calculation with a division by zero.

If you could check and report here what you find, I can try to isolate and fix the bug.
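A minimal sketch of such a check, assuming the matrices of interest (for example vlm.Sx_sz) are dense NumPy arrays with genes as rows and cells as columns. The helper name find_degenerate is hypothetical and is not part of velocyto:

```python
import numpy as np

def find_degenerate(mat, axis):
    """Return indices of all-zero or constant vectors.

    axis=0 inspects columns (cells), axis=1 inspects rows (genes).
    A vector whose max equals its min is constant (all-zero included),
    so its centered norm is 0 and a correlation with it is undefined.
    """
    mat = np.asarray(mat, dtype=float)
    spread = mat.max(axis=axis) - mat.min(axis=axis)
    return np.flatnonzero(spread == 0)

# Toy matrix: column 1 is all zeros, column 2 is constant.
toy = np.array([[1.0, 0.0, 2.0],
                [3.0, 0.0, 2.0],
                [5.0, 0.0, 2.0]])
degenerate_cols = find_degenerate(toy, axis=0)
degenerate_rows = find_degenerate(toy, axis=1)
print("degenerate columns:", degenerate_cols)
print("degenerate rows:", degenerate_rows)
```

On real data the same call could be applied to vlm.Sx_sz, and it is also worth confirming np.isfinite(mat).all() before running the estimate.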

gioelelm commented on September 17, 2024

Is it solved now?

zacharylau10 commented on September 17, 2024

Sorry about that. Our server broke down, so I can't re-run it right now. Before that, however, I subset my data to 40k cells and it worked well, but it still crashed when I increased the number of cells to 50k. Also, when I ran the source line by line, I found that _colDeltaCorSqrt led to the crash, but I don't know how to solve it. I will report the results of your suggestions once the server is fixed.

P.S. Could you tell me your Cython version or the package versions in your environment? Maybe it's a compatibility problem.

gioelelm commented on September 17, 2024

colDeltaCorSqrt should never be called; colDeltaCorSqrtpartial should be called instead when you specify knn_random=True.

Maybe the allocation is really too big for 40k cells and I should replace this with a sparse implementation. I have never tried it on such a big dataset.
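To put rough numbers on the allocation concern, here is a back-of-the-envelope sketch. It assumes a single dense cells-by-cells float64 matrix, which the thread does not confirm. The int32 column is plain arithmetic only; whether the compiled code uses 32-bit indexing is not established here.

```python
import numpy as np

def dense_square_cost(n_cells, dtype=np.float64):
    """Size of one hypothetical dense (n_cells x n_cells) matrix."""
    n_elements = n_cells * n_cells
    gib = n_elements * np.dtype(dtype).itemsize / 2**30
    # True if the element count is representable as a signed 32-bit integer.
    fits_int32 = n_elements <= np.iinfo(np.int32).max
    return n_elements, gib, fits_int32

for n in (40_000, 50_000):
    n_elements, gib, fits_int32 = dense_square_cost(n)
    print(f"{n} cells: {n_elements:,} elements, "
          f"{gib:.1f} GiB, element count fits in int32: {fits_int32}")
```

Under these assumptions the 40k case stays below the signed 32-bit limit while the 50k case does not, so it may be worth checking how the flat index is typed in the compiled code.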

> P.S. Could you tell me your Cython version or the package versions in your environment? Maybe it's a compatibility problem.

Are you installing exactly as described in the instructions? If so, you are compiling the Cython and C code with whatever gcc version is installed on the system. If you think your problem might be related to compilation, follow the answer in #53.

zacharylau10 commented on September 17, 2024

Thanks for your reply!
Following your suggestions, I tried the calls below (40k cells are OK, but 50k cells produce the error):

1. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", threads=24, n_jobs=24, psc=1e-10, n_neighbors=3000, knn_random=True, sampled_fraction=1)
2. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="linear", threads=24, n_jobs=24, n_neighbors=3000, knn_random=True, sampled_fraction=1)
3. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="log", threads=24, n_jobs=24, psc=1, n_neighbors=3000, knn_random=True, sampled_fraction=1)

None of them changed the outcome.

> colDeltaCorSqrt should never be called; colDeltaCorSqrtpartial should be called instead when you specify knn_random=True.

Yes, _colDeltaCorSqrtpartial was called. I think line 169 in velocyto.py/velocyto/estimation.py causes the error. If you could check or revise it, I can test the code with my data.

Also, memory allocation is stable at around 80 GB (maybe) before it crashes.

gioelelm commented on September 17, 2024

The function calculates the correlation between the velocity vector "v_1" of a cell and each finite difference between the expression of two cells, "x_i - x_1".

Since this is transform independent, I suspect that it is just a division-by-zero error in the correlation calculation. This is calculated as:

A = v_1 - mean(v_1)
B = (x_i - x_1) - mean(x_i - x_1)
corr = (A . B) / (norm(A) * norm(B))

This can happen in the following cases:

- There is a column (cell) of the velocity matrix or expression matrix that is zero.
- There are two or more columns (cells) that are identical, so B is zero.
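The formula and its failure mode can be reproduced in plain NumPy. This is only an illustration of the arithmetic; it is not the compiled Cython routine, and the name centered_correlation is hypothetical:

```python
import numpy as np

def centered_correlation(v, d):
    """Correlation of v and d after mean-centering both vectors."""
    a = v - v.mean()
    b = d - d.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # Zero or constant input: the correlation is undefined.
        return np.nan
    return float(a @ b / denom)

v_1 = np.array([1.0, 2.0, 4.0])   # velocity of cell 1
x_1 = np.array([0.0, 1.0, 0.0])   # expression of cell 1
x_i = np.array([1.0, 3.0, 4.0])   # expression of another cell
x_dup = x_1.copy()                # a cell identical to cell 1

ok = centered_correlation(v_1, x_i - x_1)
undefined = centered_correlation(v_1, x_dup - x_1)
print("regular pair:", ok)
print("identical cells:", undefined)
```

For the identical pair the difference vector is all zeros, norm(B) is 0, and the unguarded formula would divide by zero.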

The fact that it works with less data might mean that the problem disappears because the problematic cell was removed by the subsetting.

To help find what the problem is, you could try substituting a couple of rows (in both vlm.Sx_sz and vlm.Sx_sz_t) with a random vector. If the error is still there in this test, then it was not caused by the situations above.
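One way to run that test, sketched under the assumption that both matrices are dense 2-D NumPy arrays. The helper with_random_rows is hypothetical and works on a copy so the original data stays untouched:

```python
import numpy as np

def with_random_rows(mat, rows, seed=0):
    """Return a C-ordered float copy of mat with the given rows
    overwritten by uniform random values."""
    rng = np.random.default_rng(seed)
    out = np.array(mat, dtype=float, copy=True, order="C")
    out[rows, :] = rng.random((len(rows), out.shape[1]))
    return out

# Toy demonstration on an all-zero matrix.
toy = np.zeros((4, 5))
patched = with_random_rows(toy, [0, 1])
print(patched)

# On real data (not executed here), with different seeds per matrix:
# vlm.Sx_sz = with_random_rows(vlm.Sx_sz, [0, 1], seed=0)
# vlm.Sx_sz_t = with_random_rows(vlm.Sx_sz_t, [0, 1], seed=1)
```

If the crash persists after the substitution, zero or duplicated vectors in those rows are unlikely to be the cause.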

Another possibility is that vlm.Sx_sz (because of some manual filtering or any extra step you might have done) is somehow not in the correct memory layout (e.g. "Fortran" instead of "C"). In this case, doing vlm.Sx_sz = np.array(vlm.Sx_sz, copy=True, order='C') should solve it.

Finally, if this does not work, the problem might be something completely different. In that case I would suspect it is somehow related to the way the Cython/C code is compiled with OpenMP; just by looking at the code, I don't see any other reason for it to fail. Unfortunately, since this is compiled C code, debugging is going to be really difficult.
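The layout can be inspected before copying. A small sketch using NumPy's array flags; ensure_c_order is a hypothetical helper name:

```python
import numpy as np

def ensure_c_order(arr):
    """Return arr unchanged if it is already C-contiguous,
    otherwise return a C-ordered copy."""
    arr = np.asarray(arr)
    if arr.flags["C_CONTIGUOUS"]:
        return arr
    return np.array(arr, copy=True, order="C")

# A Fortran-ordered array, as can result from transposes or some slicing.
fortran = np.asfortranarray(np.arange(6.0).reshape(2, 3))
fixed = ensure_c_order(fortran)
print("before:", fortran.flags["C_CONTIGUOUS"])
print("after:", fixed.flags["C_CONTIGUOUS"])
```

The same check could be applied to vlm.Sx_sz and any other matrix passed to the compiled code.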

gioelelm commented on September 17, 2024

Was this solved?

zacharylau10 commented on September 17, 2024

Hi,
Sorry for the delay. I still get the same error.
However, theislab/scvelo, which also implements velocity, seems to work well for my data.
It's a really weird error.
Thank you again for your kind help!
