bczernecki commented on July 20, 2024

Hi @Armin-RS,
The error message is not very informative, but I believe something is wrong with the file for May 2021.
I downloaded the same file on my laptops (Mac, Linux) and got the same message: the file is corrupted. However, it extracts fine in a browser and in the system shell, so the problem seems to lie in how R extracts this file on Linux/Unix systems. I think it might be related to the zlib library, which the unzip function uses on non-Windows systems:

The internal C code uses zlib and is in particular based on the contributed minizip application in the zlib sources (from https://zlib.net/)

What I can recommend is to:

  • (1) check the same command on Windows
  • (2) slightly more complicated - overwrite R's unzip:

Solution for no. 2:

# first - modify the default unzip:

unzip = function (zipfile, files = NULL, list = FALSE, overwrite = TRUE, 
          junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE) {
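  if (identical(unzip, "internal")) {
    # Workaround: bypass R's internal zlib-based code and call the system
    # unzip binary (assumed to live at /usr/bin/unzip, as on most Linux/macOS).
    # Note: 'files', 'junkpaths' and 'overwrite' are ignored in this branch.
    if (list)
      stop("list = TRUE is not supported here; pass unzip = \"/usr/bin/unzip\"")
    if (!missing(exdir)) 
      dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
    # system2() returns the exit status, not a file listing
    status <- system2("/usr/bin/unzip",
                      c("-o", shQuote(path.expand(zipfile)), "-d", shQuote(exdir)))
    if (status != 0L)
      warning("unzip exited with status ", status)
    invisible(NULL)
  }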
  else {
    WINDOWS <- .Platform$OS.type == "windows"
    if (!is.character(unzip) || length(unzip) != 1L || !nzchar(unzip)) 
      stop("'unzip' must be a single character string")
    zipfile <- path.expand(zipfile)
    if (list) {
      res <- if (WINDOWS) 
        system2(unzip, c("-ql", shQuote(zipfile)), stdout = TRUE)
      else system2(unzip, c("-ql", shQuote(zipfile)), stdout = TRUE, 
                   env = c("TZ=UTC"))
      l <- length(res)
      res2 <- res[-c(2, l - 1, l)]
      res3 <- gsub(" *([^ ]+) +([^ ]+) +([^ ]+) +(.*)", 
                   "\\1 \\2 \\3 \"\\4\"", res2)
      con <- textConnection(res3)
      on.exit(close(con))
      z <- read.table(con, header = TRUE, as.is = TRUE)
      dt <- paste(z$Date, z$Time)
      formats <- if (max(nchar(z$Date) > 8)) 
        c("%Y-%m-%d", "%d-%m-%Y", "%m-%d-%Y")
      else c("%m-%d-%y", "%d-%m-%y", "%y-%m-%d")
      slash <- any(grepl("/", z$Date))
      if (slash) 
        formats <- gsub("-", "/", formats, fixed = TRUE)
      formats <- paste(formats, "%H:%M")
      for (f in formats) {
        zz <- as.POSIXct(dt, tz = "UTC", format = f)
        if (all(!is.na(zz))) 
          break
      }
      z[, "Date"] <- zz
      z[c("Name", "Length", "Date")]
    }
    else {
      args <- character()
      if (junkpaths) 
        args <- c(args, "-j")
      if (overwrite) 
        args <- c(args, "-oq", shQuote(zipfile))
      else args <- c(args, "-nq", shQuote(zipfile))
      if (length(files)) 
        args <- c(args, shQuote(files))
      if (exdir != ".") 
        args <- c(args, "-d", shQuote(exdir))
      if (WINDOWS) 
        system2(unzip, args, stdout = NULL, stderr = NULL, 
                invisible = TRUE)
      else system2(unzip, args, stdout = NULL, stderr = NULL)
      invisible(NULL)
    }
  }
}

# overwrite the unzip in utils package:
assignInNamespace("unzip", unzip, ns = "utils")

# load climate, which now uses the patched unzip:
library(climate)
m = meteo_imgw(interval="daily", rank="precip", year=2021, 
               coords=TRUE, status=TRUE, col_names="full")

from climate.

Armin-RS commented on July 20, 2024

Hi @bczernecki,

thank you for your very detailed reply!

I tried the same command on a Windows 10 computer and it failed with the same error message.

I also believe that something is wrong with these files: when I extract the June 2021 file on the Windows computer, it has fewer than 5000 lines, compared to the typical 11000 for most other months.
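A line-count comparison like this can be scripted. The sketch below assumes the monthly archives have already been extracted into one directory and that the data files end in .csv; count_lines is a hypothetical helper name, not part of the climate package.

```r
# Hypothetical helper: count the lines of every matching file in a directory.
count_lines <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE,
                      ignore.case = TRUE)
  # readLines() reads each file fully; fine for files of ~10k lines
  n <- vapply(files, function(f) length(readLines(f, warn = FALSE)),
              integer(1))
  data.frame(file = basename(files), lines = unname(n),
             stringsAsFactors = FALSE)
}
```

Calling count_lines() on the extraction directory gives one row per file, which makes a month with an unusually low line count easy to spot.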

For now, I worked around the problem by manually downloading, extracting and reformatting the 2021 precipitation files.
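For the extraction step of such a manual workaround, one option that avoids patching the utils namespace is to pass an external program through the documented unzip argument of utils::unzip. This is only a sketch: it assumes a system unzip binary is on the PATH, and extract_with_system_unzip is a hypothetical helper name.

```r
# Hypothetical helper: extract an archive with the system unzip program
# instead of R's internal code, and return the extracted file paths.
extract_with_system_unzip <- function(zipfile, exdir = tempfile("imgw_")) {
  unzip_bin <- unname(Sys.which("unzip"))
  if (!nzchar(unzip_bin))
    stop("no 'unzip' binary found on PATH")
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # With an external program, utils::unzip() does not report failures,
  # so list the target directory afterwards to see what was extracted.
  utils::unzip(zipfile, exdir = exdir, unzip = unzip_bin)
  list.files(exdir, full.names = TRUE)
}
```

The returned paths can then be read and reformatted as needed; an empty result means nothing was extracted.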

Is there anyone at IMGW whom I could notify about the corrupt files?

Thanks again and have a nice weekend,
Armin

bczernecki commented on July 20, 2024

The solution that I have provided above returned over 11k rows for May 2021, so it seems to be working fine. Here's the output for all that is available so far for 2021: 2021.xlsx

I have never run into this kind of situation before, but you can try contacting IMGW through the official form: https://imgw.pl/kontakt/reklamacje (it is in Polish, but writing in English shouldn't be a problem). I can also ask some of my friends working there who the right contact person for this question is.

Armin-RS commented on July 20, 2024

Further investigation (with "zip -F --out fixed.zip 2021_05_o.zip") showed that the CRC checksum of the ZIP archive is wrong. That's probably why R does not want to decompress the file.
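A similar integrity check can be run from R before trying to extract a file. The sketch below assumes a system unzip binary is on the PATH and relies on its -t (test) option, treating any non-zero exit status as a failed test; zip_is_intact is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if `unzip -t` finds no problems in the archive.
zip_is_intact <- function(zipfile, unzip_bin = unname(Sys.which("unzip"))) {
  if (!nzchar(unzip_bin))
    stop("no 'unzip' binary found on PATH")
  # A non-zero exit status triggers an R warning and is stored in the
  # "status" attribute of the captured output.
  out <- suppressWarnings(
    system2(unzip_bin, c("-tq", shQuote(path.expand(zipfile))),
            stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")
  is.null(status) || status == 0L
}
```

For example, zip_is_intact("2021_05_o.zip") would be expected to return FALSE if the archive has a checksum error.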

You are right, the May 2021 file has 11k rows. But the June 2021 file has only 4.7k rows, and I find it a bit odd that so many stations would suddenly stop reporting in June.

Anyway, as this is not a bug in the "climate" package, I will close this issue.
