Comments (8)

gioelelm commented on September 17, 2024

Sorry for the delay. Somehow I missed replying to this.

Try monitoring memory usage while you run it.
If the memory blow-up or the crash is immediate, it might be a numerical corner case that I fail to catch (e.g. sqrt(-0)?). Try setting psc=1e-10. Still the same? What about changing to transform="log" (with psc=1) or transform="linear"?
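One minimal way to watch memory from inside the same Python session is to read the process's peak resident set size before and after the call. This sketch uses only the standard-library resource module (Unix only); peak_rss_mb and run_and_report are hypothetical helper names. If the process crashes inside compiled code, the "after" line is never printed, so an external monitor such as top is still useful.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def run_and_report(func, *args, **kwargs):
    """Call func and print the peak RSS seen before and after (hypothetical helper)."""
    before = peak_rss_mb()
    result = func(*args, **kwargs)
    after = peak_rss_mb()
    print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
    return result


if __name__ == "__main__":
    # Stand-in workload (~80 MB); replace with the real call, e.g.
    # run_and_report(vlm.estimate_transition_prob, hidim="Sx_sz", ...)
    run_and_report(lambda: bytearray(80 * 1024 * 1024))
```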

If all of those fail, it sounds like you might have a zero or constant row/column that breaks the correlation calculation with a division by zero.
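A quick check for such rows and columns can be run on the dense matrices before the failing call. This is a sketch assuming a dense NumPy array such as vlm.Sx_sz, oriented genes x cells (so columns are cells, as later in this thread); constant_rows_and_columns is a hypothetical helper name.

```python
import numpy as np


def constant_rows_and_columns(matrix):
    """Return indices of rows and columns whose values are all identical.

    An all-zero row or column is a special case of a constant one. After
    mean-centering, such a vector has norm 0, which is what would make a
    Pearson correlation divide by zero.
    """
    m = np.asarray(matrix, dtype=float)
    # np.ptp (max - min) is 0 exactly when every entry along that axis is equal.
    const_rows = np.flatnonzero(np.ptp(m, axis=1) == 0)
    const_cols = np.flatnonzero(np.ptp(m, axis=0) == 0)
    return const_rows, const_cols


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0],
                  [5.0, 2.0, 5.0]])
    rows, cols = constant_rows_and_columns(x)
    print("constant rows:", rows, "constant columns:", cols)
```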

If you could check and report back here what you find, I can try to isolate and fix the bug.

from velocyto.py.

gioelelm commented on September 17, 2024

Is it solved now?

zacharylau10 commented on September 17, 2024

Sorry about that. Because our server broke down, I can't re-run it right now. Before that, however, I subset my data to 40k cells and it worked well, but it still crashes when I increase the number of cells to 50k. I also ran the source line by line and found that _colDeltaCorSqrt leads to the crash, but I don't know how to solve it. I will post the results of your suggestions once the server is fixed.

P.S. Could you tell me your Cython version or the versions of the other packages in your environment? Maybe it's a compatibility problem.

gioelelm commented on September 17, 2024

colDeltaCorSqrt should never be called; instead, colDeltaCorSqrtpartial is called when you specify knn_random=True.

Maybe the allocation is really too big for more than 40k cells and I should switch to a sparse implementation; I have never tried it on such a big dataset.
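For a rough sense of scale, suppose (an assumption, not something confirmed in this thread) that this step materializes a dense cells x cells float64 matrix. The arithmetic below estimates its size and also checks whether the number of entries exceeds the largest signed 32-bit integer, since n squared first crosses that limit at n = 46,341, between the 40k and 50k cell counts reported here. Whether any index in the compiled code is actually 32-bit is not established; dense_square_gb is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def dense_square_gb(n_cells, bytes_per_value=8):
    """Size in GB (10**9 bytes) of a dense n_cells x n_cells matrix."""
    return n_cells * n_cells * bytes_per_value / 1e9


for n in (40_000, 50_000):
    entries = n * n
    print(f"{n} cells: {entries:,} entries, "
          f"{dense_square_gb(n):.1f} GB as float64, "
          f"exceeds int32 max: {entries > INT32_MAX}")
```

With these numbers, 40k cells give 1.6 billion entries (12.8 GB) and 50k cells give 2.5 billion entries (20.0 GB); only the latter exceeds INT32_MAX.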

> P.S. could you tell me your cython version or environment package version? maybe it's a compatible problem.

Did you install exactly by following the instructions? If so, you are compiling the Cython and C code with whatever gcc version is installed on the system. If you think your problem might be related to compilation, follow this answer: #53
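To answer the version question concretely, a report like the following could be pasted into the issue. It is a standard-library sketch; environment_report is a hypothetical helper, and the default package list is an assumption about what is relevant here.

```python
import platform
import sysconfig
from importlib import metadata


def environment_report(packages=("velocyto", "Cython", "numpy", "scipy")):
    """Collect interpreter, compiler, and package versions for a bug report."""
    report = {
        "python": platform.python_version(),
        # Compiler that was used to build this Python interpreter.
        "python_compiler": platform.python_compiler(),
        # C compiler command recorded in Python's build configuration
        # (may be None on some platforms, e.g. Windows).
        "CC": sysconfig.get_config_var("CC"),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```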

zacharylau10 commented on September 17, 2024

Thanks for your reply!
Following your suggestion, I tried the calls below (40k cells are OK, but 50k cells produce the error):

1. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", threads=24, n_jobs=24, psc=1e-10, n_neighbors=3000, knn_random=True, sampled_fraction=1)
2. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="linear", threads=24, n_jobs=24, n_neighbors=3000, knn_random=True, sampled_fraction=1)
3. vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="log", threads=24, n_jobs=24, psc=1, n_neighbors=3000, knn_random=True, sampled_fraction=1)

None of these changes the result.

> colDeltaCorSqrt should never be called, instead colDeltaCorSqrtpartial should be called when you specify knn_random=True

Yes, _colDeltaCorSqrtpartial was called. I think line 169 in velocyto.py/velocyto/estimation.py causes the error. If you could check or revise it, I can test the code with my data.

Also, memory usage is stable at roughly 80 GB before it crashes.

gioelelm commented on September 17, 2024

The function calculates the correlation between the velocity vector v_1 of a cell and each finite difference x_i - x_1 between the expression of two cells.

Since this is transform-independent, I suspect that it is just a division-by-zero error in the correlation calculation. The correlation is calculated as:

A = v_1 - mean(v_1)
B = (x_i - x_1) - mean(x_i - x_1)
corr = (A . B) / (norm(A) * norm(B))
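A direct NumPy transcription of that correlation, written only to make the failure mode concrete (it is not velocyto's Cython kernel), is sketched below. delta_correlation is a hypothetical name; each argument is treated as a 1-D vector over genes, and the sketch returns NaN where the formula would divide by zero.

```python
import numpy as np


def delta_correlation(v_1, x_1, x_i):
    """Pearson correlation between velocity v_1 and displacement x_i - x_1.

    Computes corr = (A . B) / (norm(A) * norm(B)) with
    A = v_1 - mean(v_1) and B = (x_i - x_1) - mean(x_i - x_1).
    """
    a = v_1 - v_1.mean()
    d = x_i - x_1
    b = d - d.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A or B is the zero vector: the division-by-zero case.
        return float("nan")
    return float(a @ b / denom)


if __name__ == "__main__":
    v = np.array([1.0, -1.0, 0.5])
    x1 = np.array([2.0, 3.0, 1.0])
    print(delta_correlation(v, x1, x1 + v))     # displacement parallel to v
    print(delta_correlation(v, x1, x1.copy()))  # identical cells, so B is zero
```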

This can happen in the following cases:

  • There is a column (cell) of the velocity matrix or expression matrix that is zero
  • There are two or more columns (cells) that are identical and therefore B is zero
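For the second case, exactly duplicated cells can be found with np.unique over columns. A sketch assuming a dense genes x cells array such as vlm.Sx_sz; duplicate_column_groups is a hypothetical helper name.

```python
import numpy as np


def duplicate_column_groups(matrix):
    """Group the indices of columns (cells) that are exactly identical."""
    m = np.asarray(matrix)
    # inverse maps each column to the index of its unique representative.
    _, inverse, counts = np.unique(m, axis=1, return_inverse=True, return_counts=True)
    # Flatten defensively: the shape of inverse has differed across NumPy versions.
    inverse = np.ravel(inverse)
    return [np.flatnonzero(inverse == k).tolist()
            for k in np.flatnonzero(counts > 1)]


if __name__ == "__main__":
    x = np.array([[1, 2, 1, 3, 2],
                  [4, 5, 4, 6, 5]])
    print(duplicate_column_groups(x))
```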

The fact that it works with less data might mean that subsetting happens to remove the problematic cell.

To help find the problem, you could try replacing a couple of rows (in both vlm.Sx_sz and vlm.Sx_sz_t) with random vectors. If the error persists in this test, then it was not caused by the situations above.
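That test could be prepared as follows. This is a sketch; randomize_rows is a hypothetical helper, and it draws independent noise for each array on the assumption that the velocity used in this step is derived from the difference between vlm.Sx_sz_t and vlm.Sx_sz (identical noise in both would leave that difference at zero in the patched rows).

```python
import numpy as np


def randomize_rows(arrays, row_indices, seed=0):
    """Return C-ordered float copies with the given rows set to random values.

    Each array gets its own independent noise, so the difference between two
    patched arrays is also random (not zero) in those rows. The input arrays
    are left untouched.
    """
    rng = np.random.default_rng(seed)
    patched_arrays = []
    for arr in arrays:
        patched = np.array(arr, dtype=float, copy=True, order="C")
        patched[row_indices, :] = rng.random((len(row_indices), patched.shape[1]))
        patched_arrays.append(patched)
    return patched_arrays
```

On a throwaway copy of the data, one would then hypothetically run vlm.Sx_sz, vlm.Sx_sz_t = randomize_rows([vlm.Sx_sz, vlm.Sx_sz_t], [0, 1]) before calling estimate_transition_prob again.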

Another possibility is that vlm.Sx_sz (because of some manual filtering or another extra step you might have done) is not in the correct memory layout (e.g. "Fortran" instead of "C" order). In that case, doing vlm.Sx_sz = np.array(vlm.Sx_sz, copy=True, order='C') should solve it.
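NumPy's flags attribute reports the layout, so this can be checked before applying the fix. A short demonstration, using the transpose of a C-ordered array as a stand-in for a matrix that ended up Fortran-ordered:

```python
import numpy as np

# Transposing a C-ordered array yields a Fortran-ordered view.
x = np.arange(12, dtype=float).reshape(4, 3).T
print(x.flags["C_CONTIGUOUS"], x.flags["F_CONTIGUOUS"])  # False True

# The fix from the comment: an explicit C-ordered copy with the same values.
x_c = np.array(x, copy=True, order="C")
print(x_c.flags["C_CONTIGUOUS"], np.array_equal(x, x_c))  # True True
```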

Finally, if this does not work, the problem might be something completely different. In that case I would suspect it is related to how the Cython/C code is compiled with OpenMP; just by looking at the code, I don't see any other reason this would fail. Unfortunately, since this is compiled C code, debugging is going to be really difficult.

gioelelm commented on September 17, 2024

Was this solved?

zacharylau10 commented on September 17, 2024

Hi,
Sorry for the delay; I still get the same error.
However, I'm now using theislab/scvelo, whose velocity implementation seems to work well for my data.
It's a really weird error.
Thank you again for your kind help!
