Git Product home page Git Product logo

readwritedlm2.jl's Introduction

ReadWriteDlm2

CSV IO Supports Decimal Comma, Date, DateTime, Time, Complex, Missing and Rational

Build status Build status codecov.io

ReadWriteDlm2 functions readdlm2(), writedlm2(), readcsv2() and writecsv2() are similar to those of stdlib.DelimitedFiles, but with additional support for Dates formats, Complex, Rational, Missing types and special decimal marks. ReadWriteDlm2 supports the Tables.jl interface.

  • For "decimal dot" users the functions readcsv2() and writecsv2() have the respective defaults: Delimiter is ',' (fixed) and decimal='.'.

  • The basic idea of readdlm2() and writedlm2() is to support the decimal comma countries. These functions use ';' as default delimiter and ',' as default decimal mark. "Decimal dot" users of these functions need to define decimal='.'.

  • Alternative package: CSV (supports also special decimal marks)

Installation

This package is registered and can be installed within the Pkg REPL-mode: Type ] in the REPL and then:

pkg> add ReadWriteDlm2

Basic Example(-> more): How To Use ReadWriteDlm2

julia> using ReadWriteDlm2, Dates           # activate modules ReadWriteDlm2, Dates

julia> a = ["text" 1.2; Date(2017,1,1) 1];  # create array with: String, Date, Float64 and Int eltype

julia> writedlm2("test.csv", a)             # test.csv(decimal comma): "text;1,2\n2017-01-01;1\n"

julia> readdlm2("test.csv")                 # read `CSV` data: All four eltypes are parsed correctly!
2×2 Array{Any,2}:
 "text"      1.2
 2017-01-01  1

julia> using DataFrames                     # Tables interface: auto Types for DataFrame columns

julia> DataFrame(readdlm2("test.csv", tables=true))
2×2 DataFrame
│ Row │ Column1    │ Column2 │
│     │ Any        │ Real    │
├─────┼────────────┼─────────┤
│ 1   │ text       │ 1.2     │
│ 2   │ 2017-01-01 │ 1       │

Function readdlm2()

Read a matrix from source. The source can be a text file, stream or byte array. Each line, separated by eol (default is '\n'), gives one row. The columns are separated by ';', another delim can be defined.

readdlm2(source; options...)
readdlm2(source, T::Type; options...)
readdlm2(source, delim::Char; options...)
readdlm2(source, delim::Char, T::Type; options...)
readdlm2(source, delim::Char, eol::Char; options...)
readdlm2(source, delim::Char, T::Type, eol::Char; options...)

Pre-processing of source with regex substitution changes the decimal marks from d,d to d.d. For default rs the keyword argument decimal=',' sets the decimal Char in the r-string of rs. When a special regex substitution tuple rs=(r.., s..) is defined, the argument decimal is not used ( -> Example). Pre-processing can be switched off with: rs=().

In addition to stdlib readdlm(), data is also parsed for Dates formats (ISO), theTime format HH:MM[:SS[.s{1,9}]] and for complex and rational numbers. To deactivate parsing dates/time set: dfs="", dtfs="". locale defines the language of day (E, e) and month (U, u) names.

The result will be a (heterogeneous) array of default element type Any. If header=true it will be a tuple containing the data array and a vector for the columnnames. Other (abstract) types for the data array elements could be defined. If data is empty, a 0×0 Array{T,2} is returned.

With tables=true[, header=true] option[s] a Tables interface compatible MatrixTable with individual column types is returned, which for example can be used as argument for DataFrame().

Additional Keyword Arguments readdlm2()

  • decimal=',': Decimal mark Char used by default rs, irrelevant if rs-tuple is not the default one
  • rs=(r"(\d),(\d)", s"\1.\2"): Regex (r,s)-tuple, the default change d,d to d.d if decimal=','
  • dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string for DateTime parsing
  • dfs="yyyy-mm-dd": Format string for Date parsing
  • locale="english": Language for parsing dates names, default is english
  • tables=false: Return Tables interface compatible MatrixTable if true
  • dfheader=false: dfheader=true is shortform for tables=true, header=true
  • missingstring="na": How missing values are represented, default is "na"

Function readcsv2()

readcsv2(source, T::Type=Any; opts...)

Equivalent to readdlm2() with delimiter ',' and decimal='.'.

Documentation For Base readdlm()

More information about Base functionality and (keyword) arguments - which are also supported by readdlm2() and readcsv2() - is available in the documentation for readdlm().

Compare Default Functionality readdlm() - readdlm2() - readcsv2()

Module Function Delimiter Dec. Mark Element Type Ext. Parsing
DelimitedFiles readdlm() ' ' '.' Float64/Any No (String)
ReadWriteDlm2 readdlm2() ';' ',' Any Yes
ReadWriteDlm2 readcsv2() ',' '.' Any Yes
ReadWriteDlm2 readdlm2(opt:tables=true) ';' ',' Column spec. Yes, + col T
ReadWriteDlm2 readcsv2(opt:tables=true) ',' '.' Column spec. Yes, + col T

Function writedlm2()

Write A (a vector, matrix, or an iterable collection of iterable rows, a Tables source) as text to f (either a filename or an IO stream). The columns are separated by ';', another delim (Char or String) can be defined.

writedlm2(f, A; options...)
writedlm2(f, A, delim; options...)

By default, a pre-processing of values takes place. Before writing as strings, decimal marks are changed from '.' to ','. With a keyword argument another decimal mark can be defined. To switch off this pre-processing set: decimal='.'.

In writedlm2() the output format for Date and DateTime data can be defined with format strings. Defaults are the ISO formats. Day (E, e) and month (U, u) names are written in the locale language. For writing Complex numbers the imaginary component suffix can be selected with the imsuffix= keyword argument.

Additional Keyword Arguments writedlm2()

  • decimal=',': Character for writing decimal marks, default is a comma
  • dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string, DateTime write format
  • dfs="yyyy-mm-dd": Format string, Date write format
  • locale="english": Language for writing date names, default is english
  • imsuffix="im": Complex - imaginary component suffix "im"(=default), "i" or "j"
  • missingstring="na": How missing values are written, default is "na"

Function writecsv2()

writecsv2(f, A; opts...)

Equivalent to writedlm2() with fixed delimiter ',' and decimal='.'.

Compare Default Functionality writedlm() - writedlm2() - writecsv2()

Module Function Delimiter Decimal Mark
DelimitedFiles writedlm() '\t' '.'
ReadWriteDlm2 writedlm2() ';' ','
ReadWriteDlm2 writecsv2() ',' '.'

More Examples

writecsv2() And readcsv2()

julia> using ReadWriteDlm2

julia> a = Any[1 complex(1.5,2.7);1.0 1//3];   # create array with: Int, Complex, Float64 and Rational type

julia> writecsv2("test.csv", a)                # test.csv(decimal dot): "1,1.5+2.7im\n1.0,1//3\n"

julia> readcsv2("test.csv")                    # read CSV data: All four types are parsed correctly!
2×2 Array{Any,2}:
 1    1.5+2.7im
 1.0    1//3

writedlm2() And readdlm2() With Special decimal=

julia> using ReadWriteDlm2

julia> a = Float64[1.1 1.2;2.1 2.2]
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

julia> writedlm2("test.csv", a; decimal='€')     # '€' is decimal Char in 'test.csv'

julia> readdlm2("test.csv", Float64; decimal='€')      # a) standard: use keyword argument
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

julia> readdlm2("test.csv", Float64; rs=(r"(\d)€(\d)", s"\1.\2"))    # b) more flexible: rs-Regex-Tupel
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

writedlm2() And readdlm2() With Union{Missing, Float64}

julia> using ReadWriteDlm2

julia> a = Union{Missing, Float64}[1.1 0/0;missing 2.2;1/0 -1/0]
3×2 Array{Union{Missing, Float64},2}:
   1.1        NaN
    missing     2.2
 Inf         -Inf

julia> writedlm2("test.csv", a; missingstring="???")     # use "???" for missing data

julia> read("test.csv", String)
"1,1;NaN\n???;2,2\nInf;-Inf\n"

julia> readdlm2("test.csv", Union{Missing, Float64}; missingstring="???")
3×2 Array{Union{Missing, Float64},2}:
   1.1        NaN
    missing     2.2
 Inf         -Inf

Date And DateTime With locale="french"

julia> using ReadWriteDlm2, Dates

julia> Dates.LOCALES["french"] = Dates.DateLocale(
           ["janvier", "février", "mars", "avril", "mai", "juin",
               "juillet", "août", "septembre", "octobre", "novembre", "décembre"],
           ["janv", "févr", "mars", "avril", "mai", "juin",
               "juil", "août", "sept", "oct", "nov", "déc"],
           ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"],
           ["lu", "ma", "me", "je", "ve", "sa", "di"],
           );

julia> a = hcat([Date(2017,1,1), DateTime(2017,1,1,5,59,1,898), 1, 1.0, "text"])
5x1 Array{Any,2}:
  2017-01-01
  2017-01-01T05:59:01.898
 1
 1.0
  "text"

julia> writedlm2("test.csv", a; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")

julia> read("test.csv", String)  # to see what have been written in "test.csv" file
"dimanche, 1.janvier 2017\ndi, 1.janv 2017 5:59:1,898\n1\n1,0\ntext\n"

julia> readdlm2("test.csv"; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
5×1 Array{Any,2}:
  2017-01-01
  2017-01-01T05:59:01.898
 1
 1.0
  "text"

Tables-Interface Examples With DataFrames

See -> DataFrames for installation and more information.

julia> using ReadWriteDlm2, Dates, DataFrames, Statistics
Write CSV: Using Tables interface and create from DataFrame
julia> df = DataFrame(                                # Create DataFrame `df`
    date = [Date(2017,1,1), Date(2017,1,2), nothing],
    value_1 = [1.4, 1.8, missing],
    value_2 = [2, 3, 4]
    )
3×3 DataFrame
│ Row │ date       │ value_1  │ value_2 │
│     │ Union…     │ Float64⍰ │ Int64   │
├─────┼────────────┼──────────┼─────────┤
│ 1   │ 2017-01-01 │ 1.4      │ 2       │
│ 2   │ 2017-01-02 │ 1.8      │ 3       │
│ 3   │            │ missing  │ 4       │

julia> writedlm2("testdf_com.csv", df)   # decimal comma: write DataFrame df

julia> read("testdf_com.csv", String)    # check csv data
"date;value_1;value_2\n2017-01-01;1,4;2\n2017-01-02;1,8;3\nnothing;na;4\n"
Read CSV: Using Tables interface and create a DataFrame
julia> df2 = DataFrame(readdlm2("testdf_com.csv", header=true, tables=true))
3×3 DataFrame
│ Row │ date       │ value_1  │ value_2 │
│     │ Union…     │ Float64⍰ │ Int64   │
├─────┼────────────┼──────────┼─────────┤
│ 1   │ 2017-01-01 │ 1.4      │ 2       │
│ 2   │ 2017-01-02 │ 1.8      │ 3       │
│ 3   │            │ missing  │ 4       │

julia> mean(skipmissing(df2[!, :value_1]))
1.6

julia> mean(df2[!, :value_2])
3.0

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.