Comments (13)
Make sure you are using the latest version of pyreadr. If the problem persists, send a file that reproduces the issue. If I cannot reproduce it, I cannot fix it.
---
I have the latest pyreadr. However, I am not allowed to share the dataset.
---
That's unfortunate, because if I can't reproduce it there is nothing I can do for now.
If you can prepare a minimal synthetic dataset that reproduces the error, that would be ideal.
Otherwise, we will have to wait until somebody else hits the same issue and produces a file that reproduces it.
---
Hello,
I have the same issue, and I have made a file that reproduces it for you to check (however, I cannot find how to upload it here). I tried many things to get it to work, and probably even more during my long internet search. I suspect my file is not a "good" .RData file and tried to find out why, but so far without success. Could you have a look? Here is what I tried:
- Loading it into RStudio and saving it in different ways, e.g. saveRDS() with different parameters (compress, version, ascii).
- Changing the encoding with an R script, in case that helped, and then saving it as an .RData file:
fix.encoding <- function(df, originalEncoding = "UTF-8") {
  df <- data.frame(df)
  for (col in seq_len(ncol(df))) {
    if (is.character(df[, col])) {
      # mark the encoding of character columns
      Encoding(df[, col]) <- originalEncoding
    } else if (is.factor(df[, col])) {
      # for factors, the strings live in the levels
      Encoding(levels(df[, col])) <- originalEncoding
    }
    # other column types are skipped: Encoding<- only accepts character vectors
  }
  return(tibble::as_tibble(df))
}
- Opening it with the following Python code, which kind of works, but not really:
# read the raw bytes of the file and try to decode them as UTF-8 text
with open(file, 'rb') as f:
    text = f.read()
text = text.decode("utf-8")
- Removing the row names, or converting all my factors to characters, but I still get an error:
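# move the row names into a regular column named "VALUE"
df <- tibble::rownames_to_column(df, "VALUE")
and
# convert every factor column to character
i <- sapply(df, is.factor)
df[i] <- lapply(df[i], as.character)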
---
Thanks, I need the file to take a look. Zip it and upload it here by dragging and dropping it into this text box. If the file is too big, put it on Dropbox, Google Drive or a similar service, share it publicly and paste the link here. You can also look for services that let you upload a file without an account.
Without file it is impossible for me to take a look.
---
Sorry, I uploaded some corrupt files earlier. This one should work:
test8.RData.zip
---
I finally found a solution! However, I would prefer not to load each file into R, re-save it and only then use it in my code; I would rather use the original RData files directly. But I was trying all kinds of things as a proof of concept.
I load the file into R and run the following to convert the factors to characters (rlvnc2 is the name of the data frame; change it accordingly):
i <- sapply(rlvnc2, is.factor)
rlvnc2[i] <- lapply(rlvnc2[i], as.character)
Then I save it with R's standard save() function:
save(rlvnc2, file = "/file/path/test9.RData")
After that it works fine with pyreadr. But if I save it with saveRDS() instead, it doesn't work anymore. The original file (with factors instead of characters) doesn't work either.
---
OK, thanks, I can reproduce it. The issue comes from the underlying C library (librdata), so I have submitted a new issue about it.
I see that every factor in the file has a lot of levels; I wonder whether a non-UTF-8 character is hidden in one of them. On the other hand, it seems that you already tried to change the encoding of all factors and that didn't help.
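One way to test the hypothesis that a factor level hides a non-UTF-8 character is to dump the levels from R as raw bytes, for example with writeLines(levels(x), "levels.txt", useBytes = TRUE), and scan them from Python. A minimal sketch, assuming one level per line (a level containing a line break would be split in two); the helper names and the levels.txt file are hypothetical and not part of pyreadr:

```python
from pathlib import Path


def non_utf8_lines(raw):
    """Return (line_number, raw_line) for each line of `raw` that is not valid UTF-8."""
    bad = []
    # bytes.splitlines() splits on \n, \r\n and \r without decoding anything
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


def non_utf8_levels_in_file(path):
    """Same check for a file holding one factor level per line (hypothetical export)."""
    return non_utf8_lines(Path(path).read_bytes())
```

For example, a level written in Latin-1 such as b"\xe9t\xe9" ("été") is reported, while its UTF-8 form b"\xc3\xa9t\xc3\xa9" passes.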
---
To be sure, I tried changing the encoding again and saving with save() instead of saveRDS(), and I still get the error.
Good luck finding the exact problem. If you need any help with trial and error, let me know.
---
Interesting: when I save the file myself, it looks completely different in a hex editor. What version of R are you using, and on which platform (Windows, macOS, Linux, ...)?
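Instead of reading the bytes in a hex editor, the start of the file can be decoded directly, which shows the R version recorded by the writer and whether the file came from save() or saveRDS(). A minimal sketch, assuming the standard layout of R's serialization format (optional gzip, bzip2 or xz compression; a 5-byte magic such as RDX3 for save(); then the format marker and three big-endian integers); the function names are hypothetical and only the XDR binary format is decoded:

```python
import bz2
import gzip
import lzma
import struct


def decompress_r_file(data):
    """Undo the gzip/bzip2/xz compression R may apply; return data unchanged otherwise."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    if data[:3] == b"BZh":
        return bz2.decompress(data)
    if data[:6] == b"\xfd7zXZ\x00":
        return lzma.decompress(data)
    return data


def _r_version(code):
    """R packs versions as major * 65536 + minor * 256 + patch."""
    return f"{code >> 16}.{(code >> 8) & 0xFF}.{code & 0xFF}"


def describe_r_header(data):
    """Summarise the header of an .RData (save) or .rds (saveRDS) file.

    Only the XDR binary format is decoded; ASCII and native binary streams
    are only reported.
    """
    raw = decompress_r_file(data)
    info = {}
    # save() prefixes the serialization stream with a magic such as b"RDX3\n";
    # saveRDS() writes the serialization stream directly
    if raw[:3] in (b"RDX", b"RDA", b"RDB") and raw[4:5] == b"\n":
        info["container"] = "save() workspace, magic " + raw[:4].decode("ascii")
        raw = raw[5:]
    else:
        info["container"] = "saveRDS() stream"
    info["format"] = raw[:2]
    if raw[:2] != b"X\n":
        return info
    # format version, R version of the writer, minimal R version able to read it
    version, writer, reader = struct.unpack(">iii", raw[2:14])
    info["version"] = version
    info["written_by"] = _r_version(writer)
    info["min_reader"] = _r_version(reader)
    if version == 3:
        # version 3 also stores the writer's native encoding as a length-prefixed string
        (length,) = struct.unpack(">i", raw[14:18])
        info["native_encoding"] = raw[18:18 + length].decode("ascii")
    return info
```

Calling describe_r_header(open(path, "rb").read()) on each attachment would give a quick side-by-side view of how the two files were written.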
---
I think the original (non-working) file was created with an old version of R, either on a Linux-based computer or on Windows. I do not know its exact origin, because I only work with this file and it was created before I was involved.
The new file (after changing the factors to characters) was made with R version 4.0.2 and RStudio 2021.09.0 Build 351 "Ghost Orchid" Release (077589bc, 2021-09-20) for macOS.
test9.RData.zip
EDIT:
Now that I think about it... both files were made with R 4.0.2 on macOS. I made the reproducible example on my own computer. The original file is much bigger and also contains information that I am unable to share; test8 is just the first 4 rows of it, saved with R 4.0.2 on macOS.
After converting the factors to characters (as explained earlier), the same data frame works again (test9).
---
OK, anyway, saving the file again with R 4.0.2 gives exactly the same error. I think the C library is somehow reading one of the fields in the binary file from the wrong byte offset.
---
I posted a working version (test9) in a previous comment. It might be a good way to compare the two.
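Comparing the failing file (test8) with the working one (test9) can start from the first byte where the two decompressed streams diverge. A minimal sketch, assuming both files are either gzip-compressed (the default for save()) or uncompressed; first_difference is a hypothetical helper, not part of pyreadr:

```python
import gzip


def first_difference(a, b, context=8):
    """Return (offset, hex of a, hex of b) around the first differing byte, or None if equal."""
    # gzip-compressed inputs are compared after decompression
    if a[:2] == b"\x1f\x8b":
        a = gzip.decompress(a)
    if b[:2] == b"\x1f\x8b":
        b = gzip.decompress(b)
    limit = min(len(a), len(b))
    offset = next((i for i in range(limit) if a[i] != b[i]), None)
    if offset is None:
        if len(a) == len(b):
            return None
        offset = limit  # one stream is a prefix of the other
    start = max(0, offset - context)
    end = offset + context
    # show a small window of bytes on both sides of the divergence
    return offset, a[start:end].hex(" "), b[start:end].hex(" ")
```

Because test8 stores factors and test9 stores characters, the first difference may simply reflect that change rather than the field the C library misreads, so the offset is a starting point for inspection rather than a diagnosis.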
---
Related Issues (20)
- Unable to allocate memory
- index of pandas dataframe is lost when writing to Rds
- fail to run tests due to missing import pyreadr
- Allow Python 3's pathlib.Path as an alternative to str
- ImportError: DLL load failed while importing librdata: Can't find the specified module.
- Support for row_offset and row_limit ?
- ModuleNotFoundError: No module named 'pyreadr.librdata'
- Installation problem on MacOS 12.1 (m1max chip) with python 3.8.
- LibrdataError: The file contains an unrecognized object
- Invalid file, or file has unsupported features
- Usage of - df1 = result["df1"] # extract the pandas data frame for object df1
- DLL load failed while importing librdata: The specified module could not be found.
- Error: Unable to load time variables with missing values in python using pyreadr package from RData file
- Integer datatypes with missing values changes to object in python using pyreadr package, after importing data from RData file
- Wheel building on MacOS arm64
- Wheels for Python 3.11
- save multiple df at once in pyreadr.write_rdata
- A string file path was read as bytes in pyreadr.read_r()
- Error during installation of pyreadr using pip
- pyreadr.custom_errors.LibrdataError: Unable to read from file for large RDS files