
Comments (13)

ofajardo commented on May 27, 2024

Make sure you are using the latest version of pyreadr. If the problem persists, send a file that reproduces the issue. If I cannot reproduce it, I cannot fix it.
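One quick way to confirm which pyreadr version is actually installed in the active environment is sketched below. It uses only the standard library; the helper name `installed_version` is made up for this illustration and is not part of pyreadr.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version so it can be pasted into the issue.
print("pyreadr:", installed_version("pyreadr") or "not installed")
```

Including this output in a report removes any doubt about whether an older copy of the package is being picked up.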

from pyreadr.

Erikvvats commented on May 27, 2024

I have the latest pyreadr. However, I am not allowed to share the dataset.


ofajardo commented on May 27, 2024

That's unfortunate, because if I can't reproduce it there is nothing I can do for now.
If you can prepare a minimal synthetic dataset that reproduces the error, that would be ideal.
Otherwise, we will have to wait until somebody else runs into the same issue and provides a file that reproduces it.


FiniDG commented on May 27, 2024

Hello,

I have the same issue and have made a reproducible file for you to check (however, I cannot work out how to upload it here). I have tried a lot of things to get it to work during a long internet search. I think my file is not a "good" .RData file and tried to find the reason why, so far without success. Could you have a look? This is what I tried:

  1. Loaded it into RStudio and tried to save it in a different way: saveRDS() with different parameters (compress, version, ascii).
  2. Tried to change the encoding with the R function below, then saved the result as an .RData file:

    fix.encoding <- function(df, originalEncoding = "UTF-8") {
      df <- data.frame(df)
      for (col in seq_len(ncol(df))) {
        if (is.character(df[, col])) {
          # Declare the encoding of character columns
          Encoding(df[, col]) <- originalEncoding
        } else if (is.factor(df[, col])) {
          # For factors, declare the encoding of the level labels
          Encoding(levels(df[, col])) <- originalEncoding
        }
        # Other column types (numeric, logical, ...) are left untouched,
        # because Encoding() only accepts character vectors
      }
      return(tibble::as_tibble(df))
    }

  3. Tried to open it with this Python code, which kind of works, but not really:

    with open(file, 'rb') as f:
        text = f.read()
        text = text.decode("utf-8")

  4. Also tried to remove the row names, or to change all my factors to characters, but still got an error:

    df <- tibble::rownames_to_column(df, "VALUE")

     and

    i <- sapply(df, is.factor)
    df[i] <- lapply(df[i], as.character)


ofajardo commented on May 27, 2024

Thanks, but I need the file to take a look. Zip it and then upload it here by dragging and dropping it into this text box. If the file is too big, put it on Dropbox, Google Drive or similar, share it publicly, and paste the link here. You can also look for other services that let you upload a file without having an account.

Without the file it is impossible for me to take a look.


FiniDG commented on May 27, 2024

Sorry, I uploaded some corrupt files earlier. This one should work:
test8.RData.zip


FiniDG commented on May 27, 2024

I finally found a solution! However, I would prefer not to have to load the file into R, re-save it, and then use it in my code; I would rather just use the original RData files. I was trying all kinds of things as a proof of concept.

I load this file into R and run the following to convert the factors to characters (rlvnc2 is the name of the dataframe, change accordingly):

    i <- sapply(rlvnc2, is.factor)
    rlvnc2[i] <- lapply(rlvnc2[i], as.character)

Then I save it with R's standard save() function:

    save(rlvnc2, file = "/file/path/test9.RData")

After that it works fine with pyreadr. But if I save it with saveRDS() it doesn't work anymore. The original file (with factors instead of characters) doesn't work either.


ofajardo commented on May 27, 2024

OK, thanks, I can reproduce it. The issue comes from the C library, so I have submitted a new issue about this.

I see that every factor in the file has a lot of levels. I wonder if there is a non-UTF-8 character hidden somewhere in them. On the other hand, it seems that you already tried to change the encoding of all factors and that didn't work.
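A rough way to test the non-UTF-8 hypothesis is sketched below. It assumes the factor levels are already available as raw byte strings (how to export them is not covered here), and both the helper name `find_non_utf8` and the sample labels are invented for illustration.

```python
from typing import List, Tuple


def find_non_utf8(labels: List[bytes]) -> List[Tuple[int, int, bytes]]:
    """Return (label index, byte offset, label) for each label that is not valid UTF-8."""
    problems = []
    for index, raw in enumerate(labels):
        try:
            raw.decode("utf-8")  # strict decoding raises on the first invalid byte
        except UnicodeDecodeError as err:
            problems.append((index, err.start, raw))
    return problems


# Hypothetical levels: the second one is Latin-1 encoded (0xE9 is "é" in Latin-1).
levels = [b"alpha", b"caf\xe9", "caf\u00e9".encode("utf-8")]
print(find_non_utf8(levels))  # [(1, 3, b'caf\xe9')]
```

An empty result would suggest that the labels themselves are valid UTF-8 and that the cause lies elsewhere.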


FiniDG commented on May 27, 2024

To be sure, I tried changing the encoding again and saving with save() instead of saveRDS(), and I still get the error.

Good luck finding the exact problem. If you need any help with trial and error, let me know.


ofajardo commented on May 27, 2024

Interesting: when I save the file myself, it looks completely different when viewed in a hex editor. Which version of R are you using, and on which platform (Windows, Mac, Linux, ...)?
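When inspecting such files byte by byte, it can help to undo the outer compression first, since R compresses binary save() output with gzip by default. The sketch below guesses the compression from magic bytes and classifies the payload by its leading bytes. The function names and return tags are made up for illustration, and the header check reflects my understanding of R's format ("RDX2"/"RDX3" for save(), a bare "X" line for binary saveRDS()) rather than anything pyreadr exposes.

```python
import bz2
import gzip
import lzma


def decompress_r_file(raw: bytes) -> bytes:
    """Undo the outer compression of an R data file, guessed from magic bytes."""
    if raw.startswith(b"\x1f\x8b"):
        return gzip.decompress(raw)
    if raw.startswith(b"BZh"):
        return bz2.decompress(raw)
    if raw.startswith(b"\xfd7zXZ\x00"):
        return lzma.decompress(raw)
    return raw  # assumption: anything else was saved uncompressed


def describe_header(payload: bytes) -> str:
    """Classify a decompressed payload by its leading bytes."""
    if payload.startswith((b"RDX2\n", b"RDX3\n")):
        return "workspace:" + payload[:4].decode("ascii")  # written by save()
    if payload.startswith(b"X\n"):
        return "rds:xdr"  # single object written by saveRDS()
    return "unknown"


# Usage (path is an example taken from the thread, after unzipping):
# from pathlib import Path
# print(describe_header(decompress_r_file(Path("test8.RData").read_bytes())))
```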


FiniDG commented on May 27, 2024

I think that the original file (the one that isn't working) was made on a Linux or Windows computer with an old version of R. I do not know its exact origin, because I only work with this file and it was created before I was involved.

The new file (after changing the factors to characters) was made with R version 4.0.2 and RStudio 2021.09.0 Build 351 "Ghost Orchid" Release (077589bc, 2021-09-20) for macOS.
test9.RData.zip

EDIT:
Now that I think about it, both files were made with R version 4.0.2 on macOS, because I made the reproducible example on my own computer. The original-original file is much bigger, but it also contains information that I am unable to share. The example is just the first 4 lines of the original file, saved with R version 4.0.2 on macOS (test8).
After changing the factors to characters (as explained earlier), the same dataframe works again (test9).


ofajardo commented on May 27, 2024

OK, anyway, saving the file again with R 4.0.2 gives exactly the same error. I think the C library is somehow not reading one of the fields in the binary file from the correct byte.


FiniDG commented on May 27, 2024

I saved a working version for you in a previous post. Might be a good way to compare the two.
Screen Shot 2021-12-14 at 15 54 01
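A comparison of a working and a failing file could start from the first byte where they diverge. The sketch below is one minimal way to do that on two byte strings; the helper names are invented for illustration. Compressed streams can differ widely even for similar content, so it is usually more informative to compare the decompressed payloads than the files as stored on disk.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if both are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a strict prefix of the other
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of up to `width` bytes on each side of `offset`."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


# Tiny stand-in payloads; real input would be the two decompressed files.
working, failing = b"RDX3\nX\n\x00\x02", b"RDX3\nX\n\x00\x03"
where = first_difference(working, failing)
print(where, "|", hex_context(working, where), "|", hex_context(failing, where))
```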
