Comments (6)

bogind commented on July 17, 2024

Hi,
You're not doing anything wrong. I just haven't had the time to add the functionality that lets you create one data frame from a folder.
That's the next thing on the plan, but it's not currently supported.

from easycsv.

alexfun commented on July 17, 2024

Hi, instead of assigning each file to the global environment, why don't you collect them in a list and call rbindlist on it?

fread_folder <- 
    function (directory = NULL, extension = "CSV", sep = "auto", 
              nrows = -1L, header = "auto", na.strings = "NA", stringsAsFactors = FALSE, 
              verbose = getOption("datatable.verbose"), skip = 0L, drop = NULL, 
              colClasses = NULL, integer64 = getOption("datatable.integer64"), 
              dec = if (sep != ".") "." else ",", check.names = FALSE, 
              encoding = "unknown", quote = "\"", strip.white = TRUE, 
              fill = FALSE, blank.lines.skip = FALSE, key = NULL, Names = NULL, 
              prefix = NULL, showProgress = interactive(), data.table = TRUE) 
    {
        # fread() and rbindlist() come from data.table
        if (!requireNamespace("data.table", quietly = TRUE)) {
            stop("data.table needed for this function to work. Please install it.", 
                 call. = FALSE)
        }
        # No directory supplied: open an OS-appropriate folder picker
        # (Identify.OS() and choose_dir() are easycsv helpers)
        if (is.null(directory)) {
            os = tolower(Identify.OS())
            if (os == "windows") {
                directory <- utils::choose.dir()
            } else if (os %in% c("linux", "macosx")) {
                directory <- choose_dir()
            } else {
                stop("Please supply a valid local directory")
            }
        }
        # Normalise Windows backslashes to forward slashes
        directory = gsub("\\", "/", directory, fixed = TRUE)
        # Validate 'extension', then map it to one or more file-name patterns
        extension = tolower(extension)
        if (!(extension %in% c("txt", "csv", "both"))) {
            stop("Please supply a valid value for 'extension'; allowed values are: 'TXT', 'CSV', 'BOTH'.")
        }
        endings = switch(extension,
                         txt  = "\\.txt$",
                         csv  = "\\.csv$",
                         both = c("\\.txt$", "\\.csv$"))
        tempdf_list = list()
        for (i in endings) {
            tempfiles = list.files(path = directory, pattern = i)
            if (length(tempfiles) < 1) next   # no files with this ending
            temppath = file.path(directory, tempfiles)
            if (!is.null(Names) &&
                (length(Names) != length(temppath) || !is.character(Names))) {
                stop("Names must be a character vector of the same length as the files to be read.")
            }
            for (count in seq_along(temppath)) {
                # Table name: user-supplied, else the file name without its extension
                DTname = if (!is.null(Names)) Names[count] else sub(i, "", tempfiles[count])
                if (!is.null(prefix) && is.character(prefix)) {
                    DTname = paste0(prefix, DTname)
                }
                # Pass the caller's options through (including dec and integer64)
                DTable <- data.table::fread(input = temppath[count], sep = sep, 
                                            nrows = nrows, header = header, na.strings = na.strings, 
                                            stringsAsFactors = stringsAsFactors, verbose = verbose, 
                                            skip = skip, drop = drop, colClasses = colClasses, 
                                            integer64 = integer64, dec = dec, 
                                            check.names = check.names, encoding = encoding, 
                                            quote = quote, strip.white = strip.white, 
                                            fill = fill, blank.lines.skip = blank.lines.skip, 
                                            key = key, showProgress = showProgress, data.table = data.table)
                # Collect into a named list instead of assign()-ing each table
                # to the global environment
                tempdf_list <- append(tempdf_list, stats::setNames(list(DTable), DTname))
            }
        }
        # Stack all tables into a single table
        tempdf = data.table::rbindlist(tempdf_list)
        if (!data.table) {
            tempdf = as.data.frame(tempdf)
        }
        return(tempdf)
    }
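
Stripped of the data.table options, the list-then-bind idea fits in a few lines. The sketch below is a hypothetical helper, read_folder_combined(), not part of easycsv. It uses base R (utils::read.csv and do.call(rbind, ...)) so it runs without data.table; in fread_folder the equivalents are data.table::fread and data.table::rbindlist. It assumes every CSV in the folder has the same columns.

```r
# Hypothetical helper (not part of easycsv): read every CSV in `directory`
# into a named list, then stack the list into a single data frame.
# Base R stand-in for data.table::fread + data.table::rbindlist.
read_folder_combined <- function(directory) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No .csv files found in ", directory)
  # One data frame per file, named after the file without its extension
  tables <- lapply(files, utils::read.csv, stringsAsFactors = FALSE)
  names(tables) <- sub("\\.csv$", "", basename(files))
  # Row-bind in file order; assumes all files share the same columns
  do.call(rbind, unname(tables))
}
```

Keeping the tables in a list, rather than assigning them into the global environment, means no existing variable is overwritten silently and the caller decides what to do with the result.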

bogind commented on July 17, 2024

@alexfun looks good. Do you want to create a pull request?

alexfun commented on July 17, 2024

@bogind I would be more than happy to submit the function above; however, I am not sure whether you had something in mind with the code that assigns a variable name based on the file name in the folder. If you like, I can add a new parameter, combine, taking one of the following values: c("data.frame", "global", "list"), so that:

  1. global preserves existing behaviour.
  2. list returns a named list of the CSVs, using the current naming convention.
  3. data.frame returns one data frame via rbindlist.
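
A minimal sketch of how such a combine argument might dispatch, assuming the per-file tables have already been read into a named list. combine_tables() and its envir argument are hypothetical, and base rbind stands in for data.table::rbindlist so the sketch runs on its own. global is listed first so that match.arg() treats it as the default.

```r
# Hypothetical sketch of the proposed `combine` argument (not easycsv API).
# `tables` is a named list of already-read data frames, one per file.
combine_tables <- function(tables, combine = c("global", "list", "data.frame"),
                           envir = globalenv()) {
  combine <- match.arg(combine)  # first choice, "global", is the default
  switch(combine,
    global = {
      # existing behaviour: one variable per file, named after the file
      for (nm in names(tables)) assign(nm, tables[[nm]], envir = envir)
      invisible(names(tables))
    },
    # return the named list as-is
    list = tables,
    # stack everything into one data frame (rbindlist in the real function)
    data.frame = do.call(rbind, unname(tables))
  )
}
```

With the dispatch in one switch(), the reading loop stays identical for all three modes; only the final step differs.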

bogind commented on July 17, 2024

@alexfun The combine parameter seems logical. I think the default behavior should be global.

alexfun commented on July 17, 2024

OK, I will write the code with global as the default behaviour and submit it to you for review.
