How long/wide can a data frame be -- going from gathered to spread form?

spread causes system to run out of memory about tidyr HOT 4 CLOSED

tidyverse commented on June 17, 2024

spread causes system to run out of memory

from tidyr.

Comments (4)

JamesOwers commented on June 17, 2024

Just a quick note that I'm having memory issues with spread(..., drop=FALSE). If I use spread(..., drop=TRUE) then everything works fine: the process takes just a few seconds, and the result is 0.2 MB in size.

My input dataset is 0.4 MB and has 6,000 rows and 11 variables. It is the result of a filter on a 200 MB dataset. When running with spread(..., drop=FALSE), the rsession memory expands to over 20 GB.

Unfortunately I can't provide the exact dataset, but if there is anything I can provide to help, I'll be happy to do so.

hadley commented on June 17, 2024

I have not. It might be possible to replace the vectorised R code with optimised C++ code that would need less memory.

hadley commented on June 17, 2024

How many unique values are there in the variables that you are spreading? It is easy to create very very large data frames with spread.
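One rough way to answer this question for your own data is to count the distinct values in each identifier column (every column other than the key and value) and multiply the counts. The sketch below uses base R only. The data frame `long` and its column names are made up for illustration and do not come from the dataset in this thread. The product is only an upper bound on the number of rows that would exist if every combination of identifier values were filled in.

```r
# Hypothetical long-format data: two identifier columns, one key, one value.
long <- data.frame(
  id1   = c(1, 2, 3, 4),
  id2   = c("a", "b", "c", "d"),
  key   = c("x", "y", "x", "y"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Identifier columns are everything except the key and value columns.
id_cols <- setdiff(names(long), c("key", "value"))

# Number of distinct values in each identifier column.
n_unique <- vapply(long[id_cols], function(col) length(unique(col)), integer(1))

# Rows in the full cross product of identifier values (an upper bound).
n_combinations <- prod(n_unique)

# Rows actually present as distinct identifier combinations.
n_observed <- nrow(unique(long[id_cols]))

print(n_unique)
print(c(combinations = n_combinations, observed = n_observed))
```

If a few identifier columns each have thousands of distinct values, the product grows far faster than the number of rows actually observed, which is one way a small input could lead to a very large result.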

JamesOwers commented on June 17, 2024

There are some numeric variables with a few thousand unique values, but isn't spread just going to make a variable for each key? Also, by virtue of spread(..., drop=TRUE) working fine, the only variables remaining to spread only have one value: NA.
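A small experiment may help separate the two effects discussed here: the number of new columns (one per key) and the number of rows. The tidyr documentation describes drop = FALSE as keeping factor levels that don't appear in the data and "filling in missing combinations". The sketch below assumes this also applies to combinations of the remaining identifier columns; that is my reading, not something confirmed in this thread. The toy data frame `long` is hypothetical.

```r
library(tidyr)

# Hypothetical long-format data: two identifier columns, one key, one value.
long <- data.frame(
  id1   = c(1, 2, 3, 4),
  id2   = c("a", "b", "c", "d"),
  key   = c("x", "y", "x", "y"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Default behaviour: only identifier combinations present in the data are kept.
wide_dropped <- spread(long, key, value, drop = TRUE)

# drop = FALSE: missing combinations are filled in (with NA by default).
wide_full <- spread(long, key, value, drop = FALSE)

# Compare the row counts of the two results.
print(nrow(wide_dropped))
print(nrow(wide_full))
```

If the second result has many more rows than the first, the extra memory is coming from completed identifier combinations rather than from the new key columns.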
