
Comments (6)

msmcfarlin commented on August 15, 2024

Hi Jacob,

I am not a DivNet developer, so take my comments with that caveat in mind. Also, I'm not familiar with the network methods, so I'll leave any comment on those to a developer.

That taxa count does seem quite high for 69 samples, though without knowing your study system it's hard to say whether you should be suspicious of it. You might ask on the DADA2 page about using DADA2 with your dataset.

The error "cannot allocate vector of size..." occurs when R needs to allocate a vector (for example, inside divnet()) that is larger than the memory it has available. From what you said, it sounds like the memory limit on the node you're using is 190 GB, and your data, or your data plus whatever other objects are in your R environment, exceeds that.
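A quick way to sanity-check an error like this is to estimate the size of a dense numeric object: R stores numeric values as 8-byte doubles, so a dense matrix needs roughly rows × columns × 8 bytes. The Python sketch below (the function name is hypothetical) applies that arithmetic to a 69-sample × 70,000-ASV count table:

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Approximate size in GiB of a dense matrix of 8-byte doubles (R's numeric type)."""
    return n_rows * n_cols * bytes_per_value / 1024**3

# 69 samples x ~70,000 ASVs stored as a dense numeric count table
print(f"{dense_matrix_gib(69, 70_000):.3f} GiB")
```

That comes out to only about 0.04 GiB, so the raw count table by itself is nowhere near 190 GB; the large allocation is most likely something built during the fit rather than the input table.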

Some of these might help...

  • Request more memory from the cluster for your calculation.
  • Clean out unneeded objects in your R environment before running divnet(); some functions for this are listed here: https://www.programmingr.com/r-error-messages/cannot-allocate-vector-of-size/
  • Use tax_glom() to reduce the taxa count if you think that high taxa count is representative of the environment sampled.
  • Subset your data into a smaller data set.
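As a rough illustration of what collapsing taxa does, ASVs that share a label at the chosen rank have their counts summed, so the taxa dimension shrinks to the number of distinct labels. This is not phyloseq's tax_glom() itself, only a hypothetical Python sketch of the same idea; the function name glom_counts, the dictionary layout, and the "Unassigned" fallback label are assumptions, and tax_glom() handles unassigned taxa according to its own arguments.

```python
def glom_counts(counts, taxonomy, unassigned="Unassigned"):
    """Collapse ASVs that share a label at one taxonomic rank by summing their counts.

    counts:   {asv_id: [count in sample 1, count in sample 2, ...]}
    taxonomy: {asv_id: label at the chosen rank, e.g. genus}
    ASVs missing from `taxonomy` are pooled under the hypothetical `unassigned` label.
    """
    collapsed = {}
    for asv, row in counts.items():
        label = taxonomy.get(asv, unassigned)
        if label not in collapsed:
            collapsed[label] = [0] * len(row)
        # add this ASV's per-sample counts to its group's running totals
        collapsed[label] = [total + c for total, c in zip(collapsed[label], row)]
    return collapsed

# Hypothetical toy data: 3 ASVs across 3 samples.
counts = {"ASV1": [10, 0, 3], "ASV2": [5, 2, 0], "ASV3": [1, 1, 1]}
taxonomy = {"ASV1": "Synechococcus", "ASV2": "Synechococcus", "ASV3": "Pelagibacter"}
print(glom_counts(counts, taxonomy))
# {'Synechococcus': [15, 2, 3], 'Pelagibacter': [1, 1, 1]}
```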

Best,
-Mike

from divnet.

mooreryan commented on August 15, 2024

Yeah, I would also be suspicious of ~70,000 ASVs from 70 samples...it seems abnormally high.

But for argument's sake, let's assume you do have 70,000 good ASVs. The number of samples isn't what's causing the huge resource usage; it's the high number of taxa. Even the Rust version will take a while and use a decent amount of memory on a dataset with 70,000 taxa. If I have more than a few thousand taxa, I generally switch to the Rust version.
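To see why the taxa count rather than the sample count dominates: DivNet models the log-ratios of Q taxa with a multivariate normal, so it has to estimate a (Q − 1) × (Q − 1) covariance matrix. Assuming the implementation keeps at least one dense double-precision copy of that matrix (an assumption about the internals, not something stated in this thread), its size grows with the square of Q and does not involve the number of samples at all. The function name below is hypothetical.

```python
def covariance_gib(n_taxa: int, bytes_per_value: int = 8) -> float:
    """Size in GiB of one dense (n_taxa - 1) x (n_taxa - 1) matrix of doubles."""
    dim = n_taxa - 1
    return dim * dim * bytes_per_value / 1024**3

# The number of samples does not appear anywhere in this estimate.
for q in (1_000, 5_000, 70_000):
    print(f"{q:>6} taxa -> {covariance_gib(q):8.2f} GiB per copy")
```

At 70,000 taxa a single copy is already about 36.5 GiB, so a fitting routine that holds several such matrices or working copies at once could plausibly exceed a 190 GB node, while at 1,000 taxa the same matrix is under 0.01 GiB.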

Alternatively, you can try collapsing your ASVs to a higher taxonomic level with tax_glom() or something similar, to get the taxa count down to a more manageable level.

cramjaco commented on August 15, 2024

Thanks! Yeah, the Rust version has been crashing on me too. I'll take it up with those developers next. I'm beginning to think there may actually be 70,000 ASVs, since I found another dataset from the Chesapeake Bay that has 300k ASVs in it.
I'm not a huge fan of using tax_glom(), since I'd really prefer an ASV-level Shannon index rather than one computed at some other taxonomic level.
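For reference, an ASV-level Shannon index is H = −Σ p_i ln p_i over the ASV proportions p_i within one sample. The sketch below computes only the naive plug-in version from observed counts; DivNet instead produces a model-based estimate, so treat this as a baseline for comparison rather than a substitute. The function name is hypothetical.

```python
import math

def shannon_plugin(counts):
    """Naive plug-in Shannon index H = -sum(p_i * ln p_i) from one sample's ASV counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # terms with p_i = 0 contribute nothing (0 * ln 0 is taken as 0)
            p = c / total
            h -= p * math.log(p)
    return h

# Four equally abundant ASVs: H should equal ln(4).
print(shannon_plugin([10, 10, 10, 10]))
```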

cramjaco commented on August 15, 2024

Oh, wait, I'm talking to @mooreryan: you're the divnet-rs developer. I'm having the same problem in divnet-rs. It again overloads the memory allocation I give it on the cluster (~180TB), and then the job gets killed. Is it worth opening an issue over on divnet-rs, or should I just not try to calculate DivNet indexes on these highly "diverse" datasets?

cramjaco commented on August 15, 2024

Regarding @msmcfarlin's suggestion about subsetting: is it OK to run divnet or divnet-rs on each sample (or small set of samples) separately, assuming I'm not using the network features? That might get me around the memory issues.

mooreryan commented on August 15, 2024

If you would like, feel free to open an issue on the divnet-rs github and we can try and figure out what's going on there.
