Git Product home page Git Product logo

vba-csv's Introduction

VBA-CSV

VBA-CSV provides CSV (Comma-Separated Values) parsers and writer as VBA functions. The CSV parsers read CSV text and return Collection or Array of the CSV table contents. The CSV writer converts 2-dimensional array to CSV text.

  • The parsers and writer are compliant with the CSV format defined in RFC4180, which allows commas, line breaks, and double-quotes included in the fields.
  • Function test procedure, performance test procedure and examples are included.
  • The parser takes about 2.2 sec. for 8MB CSV, 8000 rows x 100 columns. (on Core i5-3470 CPU @ 3.2GHz, 4GB RAM)
  • The writer takes about 1.2 sec. for 8MB CSV, 8000 rows x 100 columns. (on Core i5-3470 CPU @ 3.2GHz, 4GB RAM)
  • The parsers do not fully check the syntax error (they parse correctly if the CSV has no syntax error).

Also includes VBScript version in VBScript folder.

Usage and Examples

ParseCSVToCollection

Function ParseCSVToCollection( csvText As String, 
                               Optional allowVariableNumOfFields As Boolean = False ) As Collection
[example]
    Dim csv As Collection
    Dim rec As Collection, fld As Variant

    Set csv = ParseCSVToCollection("aaa,bbb,ccc" & vbCr & "xxx,yyy,zzz")
    If csv Is Nothing Then
        Debug.Print Err.Number & " (" & Err.Source & ") " & Err.Description
    End If
    
    Debug.Print csv(1)(3) '----> ccc
    Debug.Print csv(2)(1) '----> xxx
    For Each rec In csv
      For Each fld In rec
        Debug.Print fld
      Next
    Next

ParseCSVToCollection() returns a Collection of records, and the record is a collection of fields. If error occurs, it returns Nothing and the error information is set in Err object. Optional boolean argument allowVariableNumOfFields specifies whether variable number of fields in records is allowed or handled as error.

ParseCSVToArray

Function ParseCSVToArray( csvText As String, 
                          Optional allowVariableNumOfFields As Boolean = False ) As Variant
[example]
    Dim csv As Variant
    Dim i As Long, j As Variant

    csv = ParseCSVToArray("aaa,bbb,ccc" & vbCr & "xxx,yyy,zzz")
    If IsNull(csv) Then
        Debug.Print Err.Number & " (" & Err.Source & ") " & Err.Description
    End If
    
    Debug.Print csv(1, 3) '----> ccc
    Debug.Print csv(2, 1) '----> xxx
    For i = LBound(csv, 1) To UBound(csv, 1)
      For j = LBound(csv, 2) To UBound(csv, 2)
        Debug.Print csv(i, j)
      Next
    Next

ParseCSVToArray() returns a Variant that contains 2-dimensional array - String(1 To recordCount, 1 To fieldCount). If error occurs, it returns Null and the error information is set in Err object. If input text is zero-length (""), it returns empty array - String(0 To -1). Optional boolean argument allowVariableNumOfFields specifies whether variable number of fields in records is allowed or handled as error.

ConvertArrayToCSV

Function ConvertArrayToCSV( inArray As Variant,
                            Optional fmtDate As String = "yyyy/m/d",
                            Optional quoting As CSVUtilsQuote = CSVUtilsQuote.MINIMAL,
                            Optional recordSeparator As String = vbCrLf ) As String
[example]
    Dim csv As String
    Dim a(1 To 2, 1 To 2) As Variant
    a(1, 1) = DateSerial(1900, 4, 14)
    a(1, 2) = "Exposition Universelle de Paris 1900"
    a(2, 1) = DateSerial(1970, 3, 15)
    a(2, 2) = "Japan World Exposition, Osaka 1970"
    
    csv = ConvertArrayToCSV(a, "yyyy/mm/dd")
    If Err.Number <> 0 Then
        Debug.Print Err.Number & " (" & Err.Source & ") " & Err.Description
    End If
    
    Debug.Print csv

ConvertArrayToCSV() reads 2-dimensional array inArray and return CSV text. If error occurs, it return the string "", and the error information is set in Err object. fmtDate is used as the argument of text formatting function Format if an element of the array is Date type. The optional argument quoting specifies what type of fields to be quoted:

  • MINIMAL: Quoting only if it is necessary (the field includes double-quotes, comma, line breaks).
  • ALL: Quoting all the fields.
  • NONNUMERIC: Similar to MINIMAL, but quoting also all the String type fields.

The optional arugment recordSeparator specifies record separator (line terminator), default is CRLF.

SetCSVUtilsAnyErrorIsFatal

SetCSVUtilsAnyErrorIsFatal(value As Boolean)
[example]
    SetCSVUtilsAnyErrorIsFatal True
    SetCSVUtilsAnyErrorIsFatal False

This function changes error handling mode for CSV parsers and writer.

False (default) - When run-time error occurs, the parser function returns special value (Nothing, Null, etc.), and the error information is set to properties of Err object.
True - Any run-time error that occurs is fatal (an error message is displayed and execution stops).

ParseCSVToDictionary

Public Function ParseCSVToDictionary(ByRef csvText As String, Optional ByRef keyColumn As Long = 1, 
                                     Optional ByRef allowVariableNumOfFields As Boolean = False) As Object
[example]
    Dim csv As String
    Dim csvd As Object

    csv = "key,val1, val2" & vbCrLf & "name1,v11,v12" & vbCrLf & "name2,v21,v22"
    Set csvd = ParseCSVToDictionary(csv, 1)
    Debug.Print csvd("name1")(2)  ' --> val11
    Debug.Print csvd("name1")(3)  ' --> val12
    Debug.Print csvd("name2")(2)  ' --> val21

ParseCSVToDictionary() returns a Dictionary (Scripting.Dictionary) of records; the records are Collections of fields. In default, the first field of each record is the key of the dictionary. The column number of the key field can be specified by keyColumn, whose default value is 1. If there are multiple records whose key fields are the same, the value for the key is set to the last record among them. If error occurs, it returns Nothing and the error information is set in Err object. Optional boolean argument allowVariableNumOfFields specifies whether variable number of fields in records is allowed or handled as error.

GetFieldDictionary

Public Function GetFieldDictionary(ByRef csvText As String) As Object
[example]
    Dim csv As String
    Dim csva
    Dim field As Object

    csv = "key,val1, val2" & vbCrLf & "name1,v11,v12" & vbCrLf & "name2,v21,v22"
    Set field = GetFieldDictionary(csv)
    csva = ParseCSVToArray(csv)
    Debug.Print csva(2, field("key"))  ' --> name1
    Debug.Print csva(3, field("val1"))  ' --> v21

GetFieldDictionary() returns a Dictionary (Scripting.Dictionary) of field names, whose keys are the field values of the first records and whose values are the column numbers of the fields. If there are multiple fields of the same value in the first record, the value for the key is set to the largest column number among the fields. If error occurs, it returns Nothing and the error information is set in Err object.

Installation

  1. Download the latest release.
  2. Import CSVUtils.bas (and other *.bas) into your project (Open VBA Editor, Alt + F11; File > Import File)

Tested in

  • MS Excel 2000 on Windows 10
  • MS Excel 2013 on Windows 7

The CSV File format

There is no definitive standard for CSV (Comma-separated values) file format, however the most commonly accepted definition is RFC4180. VBA-CSV is compliant with RFC 4180, while still allowing some flexibility where CSV text deviate from the definition. The followings are the rules of CSV format such that VBA-CSV can handle correctly. (The rules indicated by italic characters don't exists in RFC4180)

  1. Each record is located on a separate line, delimited by a line break (CRLF, CR, or LF).

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF
    
  2. The last record in the file may or may not have an ending line break.    The CSV file containing nothing (= "") is recognized as empty (it has no record nor fields).

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx
    
  3. Within each record, there may be one or more fields, separated by commas.

    aaa,bbb,ccc
    
  4. Each record should contain the same number of fields throughout the file.

  5. Each field may or may not be enclosed in double quotes.

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx
    
  6. Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx
    
  7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

    "aaa","b""bb","ccc"
    
  8. Spaces (including tabs) are considered part of a field and should not be ignored. If fields are enclosed with double quotes, then leading and trailing spaces outside of double quotes are ignored.

    " aaa", "bbb", ccc
    
  9. The special quotation expression (="CONTENT") is allowed inside of the double-quotes. CONTENT (field content) must not include any double-quote ("). MS Excel can read this.

    aaa,"=""bbb""",ccc
    

Author

sdkn104

License

This software is released under the MIT License.

vba-csv's People

Contributors

sdkn104 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

vba-csv's Issues

ParseCSVToDictionary

Would it be possible to implement a function that returns a Dictionary where either the first column is used for the key or you can provide a column number that would be used as the key. Most of the time the data I am working with needs to be obtained using a key in my code so I end up creating a Collection/Array and using that to create the Dictionary so it would be nice to just get a Dictionary.

Bug

Currently, I am improving the CSVinterface class to make it RFC4180 standard fully compliant. As you know, your well developed project is used by me as a reference.

I test your code using the following CSV text and the returned value by your ParseCSVToArray function is null:

"Field with 
multiple lines" , "Another field 
with some 
line breaks inside" , "Include some  comma, for test" , Normal field here
1, 2, 3 ,4 
"Field 1", "Field 2" , Field 3 , "Field 4"

The above CSV complies the RFC4180 standard.

Performance issue

I currently work on a CSV-VBA interface that I suposses'll to allow users exchange data between VBA arrays and CSV files, like your project do, but at really high speed. You claim your code is able to read CSV data at about 5.33 MB/sec., but in my test, on a fairly outdated laptop, the performance of it is frustrating. Some suggestion?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.