Comments (5)
Interestingly, if ONE level is missing but the others are present, it works as expected:
data.frame(U=c("foo","bar")[c(1,1,2,2)],K=factor(letters[c(1:2,1:2)],levels=letters[1:3]),V=c(1:2,1:2)) %>% spread(K,V,drop=FALSE)
U a b c
1 bar 1 2 NA
2 foo 1 2 NA
But fails if no data is present:
data.frame(U=c("foo","bar")[c()],K=factor(c(),levels=letters[1:3]),V=numeric()) %>% spread(K,V,drop=FALSE)
Error in `colnames<-`(`*tmp*`, value = c("a", "b", "c")) :
'names' attribute [3] must be the same length as the vector [0]
from tidyr.
This is the provided example:
data.frame(U=c("foo","bar")[c()],K=factor(c(),levels=letters[1:3]),V=numeric()) %>% spread(K,V,drop=FALSE)
In spread_.data_frame, the operation fails because
col_id <- dplyr::id(col, drop = drop)
integer(0)
attr(,"n")
[1] 0
but
col_labels <- split_labels(col, col_id, drop = drop)
K
1 a
2 b
3 c
Hence in
> dim(ordered) <- c(attr(row_id, "n"), attr(col_id, "n"))
# dim(ordered) is now c(0, 0)
> ordered <- as.data.frame.matrix(ordered, stringsAsFactors = FALSE)
> colnames(ordered) <- as.character(col_labels[[1]])
# FAIL!
The colnames assignment fails because col_labels supplies three names (one per factor level), while ordered has zero columns.
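For what it's worth, the same mismatch can be reproduced with base R alone. This is only a sketch: it calls no tidyr internals, the variable names mirror the ones above, and `n_cols` and `res` are helper names introduced here for illustration.

```r
# Base R sketch of the failing step; no tidyr internals are called.
ordered <- integer(0)
dim(ordered) <- c(0L, 0L)                      # zero rows, zero columns
ordered <- as.data.frame(ordered, stringsAsFactors = FALSE)
n_cols <- ncol(ordered)                        # there are no columns to name

# Three labels (one per factor level) cannot be assigned to zero columns,
# so the assignment raises an error, which is captured here as a string.
res <- tryCatch(
  {
    colnames(ordered) <- c("a", "b", "c")
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(n_cols)
print(res)
```

The point is that the labels come from the factor levels, but the column count comes from the (empty) data, so the two can only agree when at least one row is present.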
This inconsistent handling of empty input (a recurring problem in R code) starts in tidyr::id:
.variables <- .variables[lengths != 0]
The id function nominally handles a data frame (for which every entry of lengths is the same number), but it can also handle a list (for which the lengths may differ). It's hard for me to see how that code, which filters out empty columns, would make sense for a data frame.
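To illustrate what that filter does to a zero-row data frame, here is a small base R sketch. It assumes that `lengths` holds one length per column; `col_lengths` and `kept` are names made up for this example, and the code is not taken from the tidyr source.

```r
# Zero-row data frame shaped like the failing example.
df <- data.frame(
  K = factor(character(), levels = letters[1:3]),
  V = numeric()
)

# Assumption: one length per column, as `lengths` appears to hold.
col_lengths <- vapply(df, length, integer(1))

# Every column has length 0, so the filter keeps nothing,
# even though K still declares three levels.
kept <- df[col_lengths != 0]
print(ncol(kept))
print(levels(df$K))
```

In a data frame all columns share one length, so this filter either keeps every column or drops every column; the declared levels of K are lost in the second case.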
It continues in tidyr::id_var with
if (length(x) == 0)
return(structure(integer(), n = 0L))
which precedes the factor handling code:
if (is.factor(x) && !drop) {
id <- as.integer(addNA(x, ifany = TRUE))
n <- length(levels(x))
}
I suggest that the order of these clauses in id_var be fixed to check factor first.
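A rough sketch of that reordering, in base R. The function name `id_var_sketch` and the fallback branch for non-factor input are my own assumptions, not the actual tidyr code; only the two clauses quoted above come from the source.

```r
# Hypothetical reordering of the two clauses quoted above; not the tidyr source.
id_var_sketch <- function(x, drop = FALSE) {
  # Factor check first, so declared levels survive zero-length input.
  if (is.factor(x) && !drop) {
    id <- as.integer(addNA(x, ifany = TRUE))
    n <- length(levels(x))
    return(structure(id, n = n))
  }
  # The empty-input shortcut now only applies to everything else.
  if (length(x) == 0) {
    return(structure(integer(), n = 0L))
  }
  # Assumed fallback: number the distinct values in sorted order.
  values <- sort(unique(x), na.last = TRUE)
  id <- match(x, values)
  structure(id, n = max(id))
}

empty_factor <- factor(character(), levels = letters[1:3])
n_factor <- attr(id_var_sketch(empty_factor), "n")
n_plain <- attr(id_var_sketch(character()), "n")
print(n_factor)
print(n_plain)
```

With the factor clause first, an empty factor with three levels reports n = 3 rather than 0, which would match the three labels that split_labels returns. This only covers the id_var step; the filter in id would presumably still need attention.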
from tidyr.
Gah, wrong issue.
from tidyr.
Here's my understanding of the issue, in code:
library(dplyr)
library(tidyr)
df_c <- data_frame(
x = c("a", "a", "b", "b"),
y = c("y", "z", "y", "z"),
z = 1:4
)
df_f <- df_c %>% mutate(x = factor(x, levels = c("b", "a")), y = factor(y))
# Correct: only differ in order
df_c %>% spread(y, z) %>% str()
df_f %>% spread(y, z) %>% str()
# Correct: only see y
df_c[1,] %>% spread(y, z) %>% str()
df_f[1,] %>% spread(y, z) %>% str()
# Correct: expands out both y and z
df_f[1,] %>% spread(y, z, drop = FALSE) %>% str()
# Correct: don't see any values
df_c[0,] %>% spread(y, z) %>% str()
# Incorrect: from the levels of the factor, should have columns y and z
df_f[0,] %>% spread(y, z, drop = FALSE) %>% str()
from tidyr.
I'm pretty sure I correctly identified the underlying problem. Please let me know if I missed anything.
from tidyr.