ReadWriteDlm2

CSV IO Supports Decimal Comma, Date, DateTime, Time, Complex, Missing and Rational

The ReadWriteDlm2 functions readdlm2(), writedlm2(), readcsv2() and writecsv2() are similar to those of the stdlib DelimitedFiles, but add support for Dates formats, Complex, Rational and Missing types, and special decimal marks. ReadWriteDlm2 also supports the Tables.jl interface.

For "decimal dot" users, the functions readcsv2() and writecsv2() have the matching defaults: the delimiter is ',' (fixed) and decimal='.'.
The basic idea of readdlm2() and writedlm2() is to support decimal comma countries. These functions use ';' as the default delimiter and ',' as the default decimal mark. "Decimal dot" users of these functions need to set decimal='.'.
Alternative package: CSV (also supports special decimal marks)

Installation

This package is registered and can be installed within the Pkg REPL-mode: Type ] in the REPL and then:

pkg> add ReadWriteDlm2

Basic Example(-> more): How To Use `ReadWriteDlm2`

julia> using ReadWriteDlm2, Dates           # activate modules ReadWriteDlm2, Dates

julia> a = ["text" 1.2; Date(2017,1,1) 1];  # create array with: String, Date, Float64 and Int eltype

julia> writedlm2("test.csv", a)             # test.csv(decimal comma): "text;1,2\n2017-01-01;1\n"

julia> readdlm2("test.csv")                 # read `CSV` data: All four eltypes are parsed correctly!
2×2 Array{Any,2}:
 "text"      1.2
 2017-01-01  1

julia> using DataFrames                     # Tables interface: auto Types for DataFrame columns

julia> DataFrame(readdlm2("test.csv", tables=true))
2×2 DataFrame
│ Row │ Column1    │ Column2 │
│     │ Any        │ Real    │
├─────┼────────────┼─────────┤
│ 1   │ text       │ 1.2     │
│ 2   │ 2017-01-01 │ 1       │

Function `readdlm2()`

Read a matrix from source. The source can be a text file, stream or byte array. Each line, separated by eol (default is '\n'), gives one row. The columns are separated by ';', another delim can be defined.

readdlm2(source; options...)
readdlm2(source, T::Type; options...)
readdlm2(source, delim::Char; options...)
readdlm2(source, delim::Char, T::Type; options...)
readdlm2(source, delim::Char, eol::Char; options...)
readdlm2(source, delim::Char, T::Type, eol::Char; options...)

Pre-processing of the source with a regex substitution changes the decimal marks from d,d to d.d. With the default rs, the keyword argument decimal=',' sets the decimal Char in the r-string of rs. When a custom regex substitution tuple rs=(r.., s..) is defined, the argument decimal is not used (see the example "`writedlm2()` And `readdlm2()` With Special `decimal=`"). Pre-processing can be switched off with rs=().
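
As a rough illustration of the substitution step, the default rs pair can be applied to a single line with Base's replace. This is only a sketch of the regex substitution itself, not the package's internal code; the variable names line and converted are made up for the example.

```julia
# Default (regex, substitution) pair as listed for the rs keyword in this README
rs = (r"(\d),(\d)", s"\1.\2")

# One line of "decimal comma" data with ';' as delimiter (illustrative input)
line = "text;1,2;3,75"

# A comma between two digits becomes a dot; everything else is left untouched
converted = replace(line, rs[1] => rs[2])

println(converted)   # text;1.2;3.75
```

Because the pattern requires a digit on both sides of the comma, a comma inside plain text such as "a,b" is not changed by this substitution.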

In addition to stdlib readdlm(), data is also parsed for Dates formats (ISO), the Time format HH:MM[:SS[.s{1,9}]] and for complex and rational numbers. To deactivate parsing of dates/time, set dfs="", dtfs="". locale defines the language of day (E, e) and month (U, u) names.

The result is a (heterogeneous) array with default element type Any. If header=true, the result is a tuple containing the data array and a vector with the column names. Other (abstract) types can be defined for the data array elements. If the data is empty, a 0×0 Array{T,2} is returned.

With tables=true[, header=true] option[s] a Tables interface compatible MatrixTable with individual column types is returned, which for example can be used as argument for DataFrame().
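
The header=true behaviour described above could be tried out along the following lines. This is a minimal sketch: it assumes an IOBuffer is accepted as the "stream" source, and the names io, data and colnames are chosen only for this example.

```julia
using ReadWriteDlm2

# Small in-memory source: one header line, then two rows with decimal commas
io = IOBuffer("name;value\nfoo;1,5\nbar;2,25\n")

# With header=true the result is a tuple: (data array, column names)
data, colnames = readdlm2(io; header=true)

println(size(data))       # dimensions of the data array, header row excluded
println(collect(colnames))
println(data[1, 2])       # the decimal comma value, parsed as a number
```

Passing tables=true in addition would return a Tables-compatible object instead of the tuple, as described above.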

Additional Keyword Arguments `readdlm2()`

decimal=',': Decimal mark Char used by default rs, irrelevant if rs-tuple is not the default one
rs=(r"(\d),(\d)", s"\1.\2"): Regex (r,s)-tuple, the default changes d,d to d.d if decimal=','
dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string for DateTime parsing
dfs="yyyy-mm-dd": Format string for Date parsing
locale="english": Language for parsing date names, default is english
tables=false: Return Tables interface compatible MatrixTable if true
dfheader=false: dfheader=true is shortform for tables=true, header=true
missingstring="na": How missing values are represented, default is "na"

Function `readcsv2()`

readcsv2(source, T::Type=Any; opts...)

Equivalent to readdlm2() with delimiter ',' and decimal='.'.
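
The stated equivalence can be checked with a small sketch like the one below. It assumes an IOBuffer is accepted as a stream source; csv, a and b are names made up for the example, and the data is the same as in the writecsv2()/readcsv2() example further down.

```julia
using ReadWriteDlm2

# "Decimal dot" CSV data with ',' as delimiter
csv = "1,1.5+2.7im\n1.0,1//3\n"

a = readcsv2(IOBuffer(csv))                     # fixed delimiter ',' and decimal='.'
b = readdlm2(IOBuffer(csv), ','; decimal='.')   # the same settings spelled out

# Both calls should yield the same matrix if they are equivalent as stated
println(a == b)
```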

Documentation For `readdlm()` (stdlib DelimitedFiles)

More information about the underlying functionality and (keyword) arguments of readdlm(), which are also supported by readdlm2() and readcsv2(), is available in the DelimitedFiles documentation for readdlm().

Compare Default Functionality `readdlm()` - `readdlm2()` - `readcsv2()`

Module	Function	Delimiter	Dec. Mark	Element Type	Ext. Parsing
DelimitedFiles	`readdlm()`	`' '`	`'.'`	Float64/Any	No (String)
ReadWriteDlm2	`readdlm2()`	`';'`	`','`	Any	Yes
ReadWriteDlm2	`readcsv2()`	`','`	`'.'`	Any	Yes
ReadWriteDlm2	`readdlm2(opt:tables=true)`	`';'`	`','`	Column spec.	Yes, + col T
ReadWriteDlm2	`readcsv2(opt:tables=true)`	`','`	`'.'`	Column spec.	Yes, + col T

Function `writedlm2()`

Write A (a vector, a matrix, an iterable collection of iterable rows, or a Tables source) as text to f (either a filename or an IO stream). The columns are separated by ';'; another delim (Char or String) can be defined.

writedlm2(f, A; options...)
writedlm2(f, A, delim; options...)

By default, values are pre-processed: before they are written as strings, decimal marks are changed from '.' to ','. Another decimal mark can be defined with the decimal keyword argument. To switch off this pre-processing, set decimal='.'.

In writedlm2() the output format for Date and DateTime data can be defined with format strings. Defaults are the ISO formats. Day (E, e) and month (U, u) names are written in the locale language. For writing Complex numbers the imaginary component suffix can be selected with the imsuffix= keyword argument.
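
A short sketch of how these write options might be combined, using an in-memory IO stream as f so that the result can be inspected directly. The format string "dd.mm.yyyy" and the names io and out are chosen only for this example.

```julia
using ReadWriteDlm2, Dates

io = IOBuffer()

# One row holding a Date and a Complex value; the Date is written with a
# custom format string and the imaginary part with the suffix "i"
writedlm2(io, Any[Date(2017, 1, 1) complex(1.5, 2.7)]; dfs="dd.mm.yyyy", imsuffix="i")

out = String(take!(io))
print(out)   # the Date appears as 01.01.2017, numbers use the decimal comma
```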

Additional Keyword Arguments `writedlm2()`

decimal=',': Character for writing decimal marks, default is a comma
dtfs="yyyy-mm-ddTHH:MM:SS.s": Format string, DateTime write format
dfs="yyyy-mm-dd": Format string, Date write format
locale="english": Language for writing date names, default is english
imsuffix="im": Complex - imaginary component suffix "im"(=default), "i" or "j"
missingstring="na": How missing values are written, default is "na"

Function `writecsv2()`

writecsv2(f, A; opts...)

Equivalent to writedlm2() with fixed delimiter ',' and decimal='.'.

Compare Default Functionality `writedlm()` - `writedlm2()` - `writecsv2()`

Module	Function	Delimiter	Decimal Mark
DelimitedFiles	`writedlm()`	`'\t'`	`'.'`
ReadWriteDlm2	`writedlm2()`	`';'`	`','`
ReadWriteDlm2	`writecsv2()`	`','`	`'.'`

More Examples

`writecsv2()` And `readcsv2()`

julia> using ReadWriteDlm2

julia> a = Any[1 complex(1.5,2.7);1.0 1//3];   # create array with: Int, Complex, Float64 and Rational type

julia> writecsv2("test.csv", a)                # test.csv(decimal dot): "1,1.5+2.7im\n1.0,1//3\n"

julia> readcsv2("test.csv")                    # read CSV data: All four types are parsed correctly!
2×2 Array{Any,2}:
 1    1.5+2.7im
 1.0    1//3

`writedlm2()` And `readdlm2()` With Special `decimal=`

julia> using ReadWriteDlm2

julia> a = Float64[1.1 1.2;2.1 2.2]
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

julia> writedlm2("test.csv", a; decimal='€')     # '€' is decimal Char in 'test.csv'

julia> readdlm2("test.csv", Float64; decimal='€')      # a) standard: use keyword argument
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

julia> readdlm2("test.csv", Float64; rs=(r"(\d)€(\d)", s"\1.\2"))    # b) more flexible: rs-Regex-Tuple
2×2 Array{Float64,2}:
 1.1  1.2
 2.1  2.2

`writedlm2()` And `readdlm2()` With `Union{Missing, Float64}`

julia> using ReadWriteDlm2

julia> a = Union{Missing, Float64}[1.1 0/0;missing 2.2;1/0 -1/0]
3×2 Array{Union{Missing, Float64},2}:
   1.1        NaN
    missing     2.2
 Inf         -Inf

julia> writedlm2("test.csv", a; missingstring="???")     # use "???" for missing data

julia> read("test.csv", String)
"1,1;NaN\n???;2,2\nInf;-Inf\n"

julia> readdlm2("test.csv", Union{Missing, Float64}; missingstring="???")
3×2 Array{Union{Missing, Float64},2}:
   1.1        NaN
    missing     2.2
 Inf         -Inf

`Date` And `DateTime` With `locale="french"`

julia> using ReadWriteDlm2, Dates

julia> Dates.LOCALES["french"] = Dates.DateLocale(
           ["janvier", "février", "mars", "avril", "mai", "juin",
               "juillet", "août", "septembre", "octobre", "novembre", "décembre"],
           ["janv", "févr", "mars", "avril", "mai", "juin",
               "juil", "août", "sept", "oct", "nov", "déc"],
           ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"],
           ["lu", "ma", "me", "je", "ve", "sa", "di"],
           );

julia> a = hcat([Date(2017,1,1), DateTime(2017,1,1,5,59,1,898), 1, 1.0, "text"])
5×1 Array{Any,2}:
  2017-01-01
  2017-01-01T05:59:01.898
 1
 1.0
  "text"

julia> writedlm2("test.csv", a; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")

julia> read("test.csv", String)  # to see what has been written to the "test.csv" file
"dimanche, 1.janvier 2017\ndi, 1.janv 2017 5:59:1,898\n1\n1,0\ntext\n"

julia> readdlm2("test.csv"; dfs="E, d.U yyyy", dtfs="e, d.u yyyy H:M:S,s", locale="french")
5×1 Array{Any,2}:
  2017-01-01
  2017-01-01T05:59:01.898
 1
 1.0
  "text"

`Tables`-Interface Examples With `DataFrames`

See DataFrames for installation and more information.

julia> using ReadWriteDlm2, Dates, DataFrames, Statistics

Write CSV: Using `Tables` interface and create from `DataFrame`

julia> df = DataFrame(                                # Create DataFrame `df`
    date = [Date(2017,1,1), Date(2017,1,2), nothing],
    value_1 = [1.4, 1.8, missing],
    value_2 = [2, 3, 4]
    )
3×3 DataFrame
│ Row │ date       │ value_1  │ value_2 │
│     │ Union…     │ Float64⍰ │ Int64   │
├─────┼────────────┼──────────┼─────────┤
│ 1   │ 2017-01-01 │ 1.4      │ 2       │
│ 2   │ 2017-01-02 │ 1.8      │ 3       │
│ 3   │            │ missing  │ 4       │

julia> writedlm2("testdf_com.csv", df)   # decimal comma: write DataFrame df

julia> read("testdf_com.csv", String)    # check csv data
"date;value_1;value_2\n2017-01-01;1,4;2\n2017-01-02;1,8;3\nnothing;na;4\n"

Read CSV: Using `Tables` interface and create a `DataFrame`

julia> df2 = DataFrame(readdlm2("testdf_com.csv", header=true, tables=true))
3×3 DataFrame
│ Row │ date       │ value_1  │ value_2 │
│     │ Union…     │ Float64⍰ │ Int64   │
├─────┼────────────┼──────────┼─────────┤
│ 1   │ 2017-01-01 │ 1.4      │ 2       │
│ 2   │ 2017-01-02 │ 1.8      │ 3       │
│ 3   │            │ missing  │ 4       │

julia> mean(skipmissing(df2[!, :value_1]))
1.6

julia> mean(df2[!, :value_2])
3.0

Source repository: strickek / readwritedlm2.jl
