
html5csv's Introduction

html5csv

Build Status

License: html5csv.js is dual-licensed and may be used under your choice of GPLv3 free software license or MIT open source license.

Quickly create apps that generate, "upload", "download", slice, analyze, plot, edit and store CSV tabular data...

... all without a server. Or with a server, too. Extendable.

Warning: Unmaintained code

This six-year-old code library is largely unmaintained.

I am somewhat surprised by the popularity of this library. I initially developed it as a learning experience and for data analysis in Javascript apps. I have since moved to Plotly for charting. For data analysis, working in Python and Jupyter is more mainstream than coding at a lower, more detailed level in Javascript.

From time to time I'll get some email about current applications. One company used this code to help manage commercial buildings. This code enabled them to see charts and then get the .csv data files from another web-based system. Another company had used the random table generator herein to run their computerized kiosk games in a small casino in Asia. Feel free to let me know if you have come up with a unique or unexpected application.

Help wanted

The following unpaid volunteer roles are available:

  • co-maintainers
  • contributors

Some prior Javascript experience required. Data science background a plus.

Tell me what you would like to improve or work on. One overriding goal was to keep the library very simple. For example, an undergraduate with a bit of website experience should be able to use this library and jQuery to make a demonstration website out of research data.

Here are some obvious needs:

  • standardize documentation and improve readability
  • portfolio of examples (this library can do multiple regression and principal components analysis)
  • pluggable additions
  • additional "made simple" machine learning techniques

Email: [email protected]

Dependencies

Required

  • jQuery

Optional

  • LZString (for compression)
  • numeric.js (for analysis)
  • qunit (for unit testing)

The optional libraries are authored by others and distributed under the MIT-License. Verbatim copies recent as of August 2013 are included in the external-free subdirectory.

Example 1: "Hello World"

JSFiddle for Example 1: Hello World in html5csv

Most introductory language classes start with a "Hello World" program. It shows the boilerplate code that is common to most applications and the procedure for getting the code to run, and it merely prints "Hello World".

Here, we want to work with tabular or matrix data, so we need to go a little further.

Example 1 will:

  1. create some data
    • Hello and World as the column headings
    • Various names and their planets as the values.
  2. show how to get this data into CSV.begin()
  3. Create and display an HTML table from the data.
  4. Save the data in browser local storage, so it will still be there tomorrow, or at least for Example 2.

For HTML Boilerplate you would need to load at least jQuery and html5csv.js. Optionally you could load LZString for automatic (de)compression of objects being stored into local storage.

example1.html

<!DOCTYPE html>
<html>
<head>
  <title>html5csv Example 1: Hello World</title>
</head>
<body>
  <div id='output'></div>
  <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.1/jquery.min.js"></script>
  <script src="./external-free/lz-string-1.3.0.js"></script>
  <script src="./html5csv.js"></script>
  <script src="./example1.js"></script>
</body>
</html>

example1.js

CSV.
  begin([
   ["Hello","World"],
   ["Paul","Earth"],
   ["Marvin","Mars"],
   ["Spock","Vulcan"]
         ]).
  table("output",{header:1,caption:"My First html5csv program"}).
  save("local/helloWorld").
  go();

How Example 1 works

  1. CSV. accesses the CSV object created by html5csv.js. CSV is the only global object that html5csv creates on window.
  2. begin selects data to fetch into the CSV engine. The chain of methods starting with begin and ending with go or finalize is queued internally as a CSV workflow.
  3. table generates a table in the div with id "output". If the div does not exist, it is appended to document.body.
  4. save saves the table data (and some meta data) in HTML 5 browser localStorage with key "local/helloWorld", using LZString compression if available.
  5. go() starts the chain of methods, or CSV workflow, executing. Execution of each step is asynchronous and involves a brief delay in setTimeout before going to the next step.
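The queue-then-run behavior described in these steps can be pictured with a toy model. This is only a sketch and not html5csv's internals: makeWorkflow, step, and the log array are hypothetical names invented for illustration.

```javascript
// Toy model of a queued workflow; NOT html5csv's actual implementation.
// makeWorkflow, step, and log are hypothetical names used only for illustration.
function makeWorkflow(log) {
  var queue = [];
  var api = {
    // step() only records work; nothing runs yet
    step: function (name) {
      queue.push(name);
      return api; // returning api is what allows chaining
    },
    // go() starts the queue; each step runs in its own setTimeout turn
    go: function (done) {
      var i = 0;
      function next() {
        if (i >= queue.length) return done ? done(null, log) : undefined;
        log.push(queue[i++]);
        setTimeout(next, 0);
      }
      setTimeout(next, 0);
      return queue.length; // number of queued steps
    }
  };
  return api;
}

var log = [];
var queued = makeWorkflow(log).step('begin').step('table').step('save').go();
// go() has already returned here, but no step has run yet
var ranBeforeReturn = log.length;
```

The point of the sketch is the last line: the chain returns before any queued step executes, which is why results have to be picked up in a callback.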

Example 2: Serverless browser-based "download" of HelloWorld.csv

JSFiddle Example 2 Downloading HelloWorld.csv

Prerequisite: Example 1 loads data that will be used by Example 2. Do Example 1 first.

Example #2 may not work on all browsers, but recent Chrome and Firefox seem ok.

Code for IE11 was also added Jan 2015.

Credit: The serverless download code was inspired by Stack Overflow answers, from adeneo (for Firefox/Chrome) and Manu Sharma (for IE11), to the question "Export javascript data to csv file without server interaction".

In HTML, load jQuery, LZString (if you used it in Example 1), and html5csv.js. Generating a download is then a Javascript one-liner, which can also be tried from the console:

CSV.begin("local/helloWorld").download("HelloWorld.csv").go();

Here, we access the data created in example 1.

It was stored away in the browser, and it will stay there until explicitly deleted.

We are going to take that data, and generate a CSV file called "HelloWorld.csv" and output it to the user. The data downloads directly from the browser. No server is involved.

The user does not have to click on a link; the file is pushed to them.

Being less pushy

For a better user experience, this code could be triggered by a click in the usual way, such as $('#someButton').on('click', function(){ CSV.begin.....go(); });

Clearing local storage

If you are working with the jsFiddle example, you could execute localStorage.clear() within the jsFiddle Javascript window to clear the local storage. Local storage is kept per site, so it is not accessible from another website, but it becomes available again when the browser is pointed at the site that stored it. Otherwise, to clear the data from a browser, use the general "delete cookies, storage, and other site and plugin data" option in the browser's control panel.

html5csv does Session storage too

A name like session/helloWorld (instead of local/helloWorld) stores the data in HTML 5 session storage instead of HTML 5 local storage. The data is then deleted when the session tab or window is closed, instead of being kept potentially forever.

and ajax

Names with slashes that do not begin with local/ or session/ must be mapped by a plugin, possibly to a server. No plugins are currently available, but one is under development and testing. Names beginning with a / are interpreted as local URLs, to run through an ajax get. The ajax get feature will be documented after some additional testing.
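As a rough illustration of the naming rules in the last two sections, here is a small classifier. It is a hypothetical sketch, not library code: classifyName and its return labels are invented, and it covers only the four rules stated here (it does not attempt selectors, generators, or full URLs).

```javascript
// Hypothetical classifier for a storage-style name given to CSV.begin(name).
// It mirrors only the rules described in this README; it is not html5csv code.
function classifyName(name) {
  if (name.indexOf('local/') === 0) return 'localStorage';     // kept until deleted
  if (name.indexOf('session/') === 0) return 'sessionStorage'; // gone when the tab closes
  if (name.charAt(0) === '/') return 'ajax';                   // local URL, fetched by ajax get
  if (name.indexOf('/') !== -1) return 'plugin';               // must be mapped by a plugin
  return 'other';                                              // not covered by these rules
}

var kinds = [
  classifyName('local/helloWorld'),
  classifyName('session/helloWorld'),
  classifyName('/data/hello.csv'),
  classifyName('myplugin/hello')
];
```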


CSV.begin(....) tricks

Input capabilities include:

  • prompting the user for an input file (in CSV format)
  • scraping an HTML table
  • reading local and session storage, which was demonstrated above
  • creating special arrays of uniform or normal random variables
  • arrays prefilled with zero, a number, or a diagonal,
  • arrays generated by a user supplied function
  • reading from URLs via ajax (same origin restriction applies unless the server sends a CORS header or supports JSONP)

To "Upload" to the CSV app a file of CSV data

<input type='file' id='choose' />

Note: this code does not immediately read the file chosen in #choose. Instead, it attaches an event listener to the #choose file input and waits for a change. See Issue #12 for more discussion.

     CSV.begin('#choose').....go();

Scraping an existing HTML table

Works the same way, but there is no HTML input element.

The name in CSV.begin(name) can be a valid jQuery selector like #divId, and if there are table rows as descendants it will gather the data.

Examples with Basic Math

  • A 5x10 matrix of uniform [0,1] Random Variables CSV.begin('%U', {dim: [5,10]}).do-stuff...go();

  • A 1000x2 matrix of Normal (mean=0,var=1) Random Variables naming the columns E1 and E2 in the first row CSV.begin('%N', {dim: [1000,2], header:['E1','E2']}).do-stuff...go();

  • A 5x5 matrix with 1 along the diagonal CSV.begin('%I', {dim: [5,5]}).do-stuff....go(); or CSV.begin('%D', {diag: [1,1,1,1,1]}).do-stuff....go();

  • A 100x100 matrix with values given by a function of row i, column j over [0,l-1] CSV.begin('%F', {dim:[100,100], func: function(i,j){ return (1+i)/(1+j); }})......go();
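To make the shape of these generated arrays concrete, here is a plain-Javascript sketch of what a function-generated matrix and a diagonal matrix look like as rows. The helpers matrixFromFunc and identity are hypothetical stand-ins written for this illustration; they are not html5csv functions and say nothing about how the library builds its arrays internally.

```javascript
// Plain-JavaScript stand-ins for the '%F' and '%I' style generators.
// matrixFromFunc and identity are hypothetical helpers, not html5csv functions.
function matrixFromFunc(dim, func) {
  var rows = [];
  for (var i = 0; i < dim[0]; ++i) {        // i is the zero-based row index
    var row = [];
    for (var j = 0; j < dim[1]; ++j) {      // j is the zero-based column index
      row.push(func(i, j));
    }
    rows.push(row);
  }
  return rows;
}

function identity(n) {
  // 1 on the diagonal, 0 elsewhere
  return matrixFromFunc([n, n], function (i, j) { return i === j ? 1 : 0; });
}

// Same function as in the '%F' example, on a small 2x3 matrix
var ratios = matrixFromFunc([2, 3], function (i, j) { return (1 + i) / (1 + j); });
```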


Getting to the final data with ...go(finalCallbackFunction);

go takes a Javascript function as a parameter.

We call this function the "final callback" function, because it is executed at the end of the workflow or when an error occurs.

If you don't supply a final callback function, a default is supplied for you:

default final callback: function(e,D){ if(e) console.log(e) }

The default will log all errors to the console log.

But you can tell go() to do something useful with the data at the end of the CSV work flow... for example, you can pass the data rows to another function, or modify them and start another CSV workflow from the modified rows.

Let's say there is a dashboard function in your app that wants rows of data supplied as dashboard('draw', rows).

If e is null, the data rows are in D.rows and the meta data (if any) is in D.meta.

We could write an appropriate finalCallback as an anonymous function inline in the CSV workflow.

CSV.......go(
  function(e,D){
     if (e) return console.log(e);  //something went wrong and we return
     // if OK, then call our custom "dashboard"  function as discussed
     dashboard('draw', D.rows);  
});

Warning: CSV.....go(callback) is asynchronous, don't fall prey to the asynchronous/global-var anti-pattern

Do not set a global variable in the final callback and expect to retrieve the data immediately after go returns.

These accesses will almost always fail, because ...go(callback) or ...go() returns almost immediately, before the workflow has had time to finish.

The correct approach is to put all activity that depends on the final returned data in the final Callback.
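The difference between the two approaches can be seen in a small self-contained sketch. Here fakeGo is a hypothetical stand-in for a CSV workflow ending in go(callback); it only imitates the timing (the callback fires on a later turn of the event loop) and is not part of html5csv.

```javascript
// fakeGo is a hypothetical stand-in for CSV....go(callback): it delivers
// rows to the callback on a later turn of the event loop.
function fakeGo(rows, callback) {
  setTimeout(function () { callback(null, { rows: rows, meta: {} }); }, 0);
}

// Anti-pattern: copy the result into an outer variable and read it right away.
var saved;
fakeGo([['Hello', 'World']], function (e, D) { saved = D.rows; });
var seenImmediately = saved; // still undefined: the callback has not run yet

// Correct: do the dependent work inside the final callback.
var handled = [];
fakeGo([['Hello', 'World']], function (e, D) {
  if (e) return console.log(e);
  handled.push(D.rows.length); // runs once the data is actually available
});
```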

If you are a beginner and used to the imperative style of coding (first do this, then do that), this may involve a shift in your thinking towards a more functional style of coding (when event X happens, let function handleX handle it). The browser environment generally favors such a shift, towards thinking in terms of functions that respond to user clicks or changes in forms.

Trying to use a callback as a way to return data into a global variable is a common AJAX anti-pattern, and questions appear on Stack Overflow every week about this regarding ajax calls: why the ajax doesn't work or why the return value disappears outside of the ajax success function.

Although CSV.....go() in many cases could have been implemented synchronously, html5csv.js makes all workflows asynchronous to give clear expectations and also to allow the browser a pause to process other events between potentially intensive tasks.

Asynchronous workflows can appear when involving user editing of input, supervised machine learning, as well as ajax interactions with a server, when necessary.


Next steps

Possibly, you want to do something a little more serious now, like generate or upload numerical data, run a regression, or plot data.

More documentation, still a bit rough, can be found in the wiki.

If unsure about the wiki, look in the html5csv source code.

For more examples, read through the unit tests in qtestcsv.js.

Since Feb.2016, every commit to html5csv is automatically tested in Firefox by travis-ci, generating the clickable "build" badge at the top of this page.

You can also run the unit tests in your own browser using the Qunit page. Note that the code and tests come from the gh-pages branch, which is not updated as frequently as the main branch.

html5csv's People

Contributors

bryant1410, drpaulbrewer, emav


html5csv's Issues

planner()'s sloppy candone for loop

This loop (about html5csv.js line 427):

  for(i=0,l=candone.length;i<l;++i) 
    methods[candone[2*i]] = candone[2*i+1];

is sloppy, and the loop condition should be changed to (2*i+1)<l.

The value of accessing an out-of-bounds array index is an "undefined", so the unintended effect of this loop currently is to insert

    { undefined: "undefined"}

into the available CSV methods that are valid after CSV.begin().

Since no one is probably calling CSV.begin().undefined() this is immediately harmless but is sloppy and should be fixed. Suggests a review of all for loops at some point in the future as well.
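A small self-contained reproduction of the problem and the suggested fix follows. The flat [name, function, name, function, ...] layout of candone is inferred from the loop above, and the entries used here are made up for the demonstration.

```javascript
// candone is modeled as a flat [name, fn, name, fn, ...] array; that layout is
// inferred from the loop in this issue, and these particular entries are made up.
function noop() {}
var candone = ['table', noop, 'save', noop];

// Original bound: i runs up to pairs.length, so 2*i walks off the end.
function buildBuggy(pairs) {
  var methods = {};
  for (var i = 0, l = pairs.length; i < l; ++i)
    methods[pairs[2 * i]] = pairs[2 * i + 1];
  return methods;
}

// Suggested bound: stop once the value index 2*i+1 would be out of range.
function buildFixed(pairs) {
  var methods = {};
  for (var i = 0, l = pairs.length; (2 * i + 1) < l; ++i)
    methods[pairs[2 * i]] = pairs[2 * i + 1];
  return methods;
}

var buggyKeys = Object.keys(buildBuggy(candone)); // also contains the key 'undefined'
var fixedKeys = Object.keys(buildFixed(candone)); // only 'table' and 'save'
```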

Custom separator & programmatically table

Hi,

I'm using your library to generate a CSV from two different tables. The first thing I have noticed is that this:

var table = document.createElement("table");
table.id = "mergedTable"

$("#mergedTable")
.append($("#csvInfoTable2 > tbody").html())
.append($("#csvInfoTable > tbody").html());

CSV.begin("#mergedTable").download('MyData.csv').go();

Doesn't work. I don't know if it's because the table isn't in the DOM or because the ID has to be a DIV. Anyway I have fixed it to work with my program (merging the tables inside a div and in the DOM).

My real problem now is that I'm using to generate a file that has, at first, the content of the first table (10 rows, 2 columns) separate with ':' and the second table (a lot of rows, 9 columns) separates with ','. Is there anyway to change the separator when creating the CSV?

Thanks and good job! =)

refactor common code that teases numbers from strings in table or CSV input

These are mostly notes I am making to myself, but if anyone wants to comment, go ahead.

affected functions:

  • shared.parseCSV
  • shared.fromTable
  • editor()'s private func onCellChange()
  • If meta data were created about header rows on CSV object creation (at begin or after reading a line in fetch) then the code would have better direction of when and in what columns to tease numbers from strings. However, not all data has a header row. Some data has rows whose column types, number of columns, and meanings vary from one row to the next and I think it is desirable to support arbitrary comma separated data, including non-square data.
  • There is duplication of need to tease numbers, or null/undefined in input from tables and input CSV files and similar code in shared.parseCSV and shared.fromTable. code in onCellChange() is a little simpler.
  • In the shared routines cells are set to null or undefined when encountering a string "null" or "undefined".
  • In light of how JSON-serialization works, setting anything to undefined is probably a bad idea. undefined does not JSON-serialize across bare values, objects, and arrays (at least on chrome, the results are undefined, an omitted key:value pair, and a null element and the last two are obviously not invertible). JSON is used in the interactions with Storage and future ajax.
  • The goals should be: tease numbers from strings consistently in user, table, or file input; handle empty or null elements in a way that makes sense; avoid use of undefined as it tends to create inconsistencies
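The three serialization outcomes mentioned in these notes can be checked directly. They follow from standard JSON.stringify behavior rather than anything specific to html5csv.

```javascript
// How JSON.stringify treats undefined in three positions.
var bare = JSON.stringify(undefined);            // the value undefined, not a string
var inObject = JSON.stringify({ a: undefined }); // '{}' : the key:value pair is omitted
var inArray = JSON.stringify([1, undefined]);    // '[1,null]' : becomes a null element

// Round-tripping therefore cannot restore the original array.
var roundTrip = JSON.parse(inArray);             // [1, null]
```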

Question whose answers should be determined before refactoring:

  • When, if ever, should null be converted consistently to a blank string or a blank string to null or anything else to/from null when reading or writing data? Should null be avoided completely and all CSVs be in terms of string/number?
  • If something is undefined or the string "undefined", what should happen? Perhaps string "undefined" or even string "null" is valid data in some use cases.

export selected in dropdown

Hi, can you add it to csv when the element is a dropdown and only select the selected option in the dropdown?

Is it possible?

Thanks!

download does no escaping

according to html5csv.js#L984 there is no escaping for double quotes.

A dirty hotfix was
for(i=0,l=rows.length; i<l; ++i) csvString += '"'+rows[i].map(function(e){return e.replace(/"/g, '""')}).join('","')+'"'+"\n";
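Note that the hotfix calls e.replace on every cell, which throws when a cell holds a number instead of a string. A slightly more defensive sketch is shown here. toCsvLine and toCsvString are hypothetical helpers, not part of html5csv, and writing null or undefined as an empty field is an assumption made for this example.

```javascript
// Hypothetical helpers, not part of html5csv: quote every field and double
// any embedded double quotes, converting non-strings with String() first.
function toCsvLine(row) {
  return row.map(function (cell) {
    // Assumption: null and undefined are written as empty fields
    var text = (cell === null || cell === undefined) ? '' : String(cell);
    return '"' + text.replace(/"/g, '""') + '"';
  }).join(',');
}

function toCsvString(rows) {
  // one line per row, each terminated by a newline
  return rows.map(toCsvLine).join('\n') + '\n';
}

var line = toCsvLine(['say "hi"', 42, null]);
```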

Handle carriage return and excess whitespace in table cell reader

When CSV.begin('#divID') scans an HTML table, if the table cells contain whitespace or carriage returns, these are placed in the resulting CSV file.

This is probably undesirable, and right and left stripping white space is probably reasonable.

The carriage returns are from carriage returns in the HTML and are not meaningful. They are not br -- HTML line break
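One possible cleanup is sketched here. cleanCell is a hypothetical helper, and collapsing interior runs of whitespace to a single space goes a little further than the left and right stripping proposed above, so treat that part as an assumption.

```javascript
// Hypothetical cell cleaner: collapse runs of whitespace (including the
// carriage returns and newlines from the HTML source) to one space,
// then strip both ends.
function cleanCell(text) {
  return String(text).replace(/\s+/g, ' ').trim();
}

var cleaned = cleanCell('\r\n    Marvin\r\n    Mars  ');
```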

read default R CSV format

R is capable of emitting by default a CSV that looks like the sample below.

The problem is that the html5csv parser expects consistent use of quotes
and separator in a line. The header line here looks ok but the data lines would fail
to parse because it will be splitting on quote-comma-quote and the subsequent lines have the first field quoted but subsequent fields with only commas.

"row","a","b","c"
"1",-2.27959885391693,-20.2280898124645,-13.508764177841
"2",1.05317972131473,13.448794323743,-6.00385250399099
"3",0.311643884487952,5.37364106815135,-7.73261554119904

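A line splitter that tolerates mixed quoting might look like the sketch below. splitCsvLine is a hypothetical function, not the html5csv parser; it walks the line character by character instead of splitting on quote-comma-quote, and it does not handle newlines inside quoted fields or convert strings to numbers.

```javascript
// Hypothetical line splitter that accepts quoted and unquoted fields on the
// same line. A sketch only: no support for newlines inside quoted fields.
function splitCsvLine(line) {
  var fields = [], field = '', inQuotes = false;
  for (var i = 0; i < line.length; ++i) {
    var ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') { field += '"'; ++i; } // doubled quote
      else if (ch === '"') inQuotes = false;                                // closing quote
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;                                                      // opening quote
    } else if (ch === ',') {
      fields.push(field);                                                   // end of field
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field); // last field has no trailing comma
  return fields;
}

var header = splitCsvLine('"row","a","b","c"');
var data = splitCsvLine('"1",-2.27959885391693,-20.2280898124645,-13.508764177841');
```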
Co-maintainer(s) wanted

Seeking co-maintainer(s) who would enjoy the challenge of creating an easy to use data-science and plotting library for beginning/intermediate website developers.

The existing library already does data input, editing, and sequential/random generation, charting, OLS regression, and principal components analysis.

Qualifications for co-maintainer are:

  • previous JavaScript experience
  • at least student-level experience with data science / statistics
  • a vision for future development that keeps things simple
    • end-users are website developers with minimal JavaScript/JQuery experience
    • this project doesn't require end-users to know about react, vue, promises, nodejs, babel, etc.
    • this doesn't preclude internal project CI/CD that uses nodejs/headless browsers for testing
  • time to do it (how many other things are you doing already? this is certainly one of my problems...)
  • desire to volunteer 4+ hours/week for 2-3 months or more

Ability to commit for more than a few months, bring in other people or resources is a plus.

Benefits include:

  • maintainer access to repository
  • you get to handle issues from our users :-)
  • credit on the front page
  • influence on how things work
  • limited bragging rights

Interested? The bad news up front:

  • no funding
  • old code -- from 2015 or so
  • my own time is limited

In essence, this is an opportunity to eventually take over the project.

I also expect that the project would be rebranded under a new organization name, instead of "DrPaulBrewer".

For obvious reasons, co-maintainers will need to provide reliable "real-life" contact information, and their real-life identity should be determinable from their GitHub profile.

Contributions made under a pseudonym are certainly welcome, but
at the co-maintainer level higher visibility and professionalism are required.

Uncaught ReferenceError: CSV is not defined

After the meteor compatibility commit, which includes the note "CSV is now declared as a global, but not with window.CSV", I get the following error when I try to load the library:

Uncaught ReferenceError: CSV is not defined

Changing it back to window.CSV fixes the problem. This happens on Chrome 45, Firefox 38, and IE 10.

UTF8 Problem

When I export something on my pc the file is not saved in UTF8 format. This results in problems when opening the files in notepad++.

I fixed this by adding "\ufeff" to the blob file and the a.href when doing the final export steps.

Maybe you can add some property to choose if the file is saved in utf-8 format.

// try IE solution first
if (window.navigator && window.navigator.msSaveOrOpenBlob) {
    try {
	var blob = new Blob(
	    ["\ufeff", decodeURIComponent(encodeURI(csvString))], {
		type: "text/csv;charset=utf-8;"
	    });
	navigator.msSaveBlob(blob, fname);
    } catch(e){ 
	errormsg = "error on CSV.download, IE blob branch:"+e;
	console.log(errormsg);
	if (strict) throw errormsg; 
    }
} else {
    // try Firefox/Chrome solution here
    try {
	var a = document.createElement('a');
	if (!('download' in a)) throw "a does not support download";
	a.href = 'data:attachment/csv,'+ "\ufeff" +encodeURIComponent(csvString);
	a.target = '_blank';
	// use class instead of id here -- PJB 2015.01.10
	a.className = 'dataURLdownloader';
	a.download = fname;
	document.body.appendChild(a);
	a.click();
    } catch(e){
	errormsg = "error on CSV.download, data url branch:"+e;
	console.log(errormsg);
	if (strict) throw errormsg; 
    }
}
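The requested option could be sketched as a flag that decides whether the byte order mark is prepended. buildDataUrl and its useBom parameter are hypothetical names, not an existing html5csv option. This variant also differs from the patch above in two ways that are assumptions of the sketch: it uses the text/csv MIME type and it percent-encodes the mark together with the data.

```javascript
// Hypothetical option handling: prepend a UTF-8 byte order mark only on request.
// buildDataUrl and useBom are invented names, not part of html5csv.
var BOM = '\ufeff';

function buildDataUrl(csvString, useBom) {
  var body = (useBom ? BOM : '') + csvString;
  // encodeURIComponent emits the BOM as its UTF-8 bytes, %EF%BB%BF
  return 'data:text/csv;charset=utf-8,' + encodeURIComponent(body);
}

var withBom = buildDataUrl('a,b', true);
var withoutBom = buildDataUrl('a,b', false);
```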

bad test exposed by newer testing software. refactor test

18 04 2017 05:57:17.732:ERROR [Firefox 31.0.0 (Linux 0.0.0)]: ReferenceError: equal is not defined
at http://localhost:9876/base/qtestcsv.js?09ce313cfd4d4b32b948d6ce6cfc5c20ceae2cab:279

  1. Line 279 is an "equals" test based on comparing JSON.stringify of two objects.

    equal(JSON.stringify(D.rows), JSON.stringify(DD.rows), "data matches");

  2. JSON.stringify instead of assert.deepEqual seems like an error-prone way to compare contents of two objects (in this case, array of arrays)

  3. there is no function equal() in qtestcsv.js which seems to be the reported error.

Suggest replacing with assert.deepEqual
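Two ways a JSON.stringify comparison can mislead are shown here, using made-up values rather than data from the test suite. Both results follow from standard JSON.stringify behavior.

```javascript
// 1. Distinct values can serialize identically: NaN is written as null.
var falseMatch =
  JSON.stringify([1, NaN]) === JSON.stringify([1, null]);        // true

// 2. Equal objects can serialize differently: key order is preserved.
var falseMismatch =
  JSON.stringify({ a: 1, b: 2 }) !== JSON.stringify({ b: 2, a: 1 }); // true
```

A structural comparison avoids the second problem because it does not depend on key order.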

CSV.begin('#someFileInput') doesn't work if the user has already chosen a file.

The code in html5csv.js, currently function fromFile defined in lines 262-297 is designed to act as an event handler for a file input. To capture the file this handler needs to be attached before the user chooses a file.

But another use case is that there is a file input where the user has already chosen the file and we would like to read from it. The existing code doesn't handle this use case because the reading of the file is triggered by a change event on the file chooser input.

The current inability to process an already-selected file makes it impossible to check the filename or file extension chosen by the user before processing it. It is possible, for instance, to upload a .PNG image file instead of a .CSV file and the code will try to tease rows of text out of the binary data.

The docs for CSV.begin on the main page also don't currently explain these limitations.

The fix/enhancement is simply breaking the existing file reading routines into two routines. Also, provide an option to restrict the asynchronous version to only allow .csv files or files matching a regex.
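The filename restriction could be as small as the sketch below. isAllowedFile and its default pattern are assumptions for illustration, not an existing html5csv option. Checking the name only guards against obvious mistakes, since a file's extension says nothing certain about its contents.

```javascript
// Hypothetical filename filter; the function name and default pattern are
// assumptions for illustration, not an existing html5csv option.
function isAllowedFile(filename, pattern) {
  var re = pattern || /\.csv$/i; // default: names ending in .csv, any letter case
  return re.test(filename);
}

var checks = [
  isAllowedFile('HelloWorld.csv'),
  isAllowedFile('DATA.CSV'),
  isAllowedFile('photo.png'),
  isAllowedFile('notes.txt', /\.(csv|txt)$/i)
];
```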

Examples TODO

Provide examples for each of the following:
editor
jqplot
ols
hslice
appendCol

"Advanced Topics"
Operational details, data organization, "shared"
call
go vs finalize
CSV.extend
