fsprojects / fsharp.data

F# Data: Library for Data Access

Home Page: https://fsprojects.github.io/FSharp.Data

License: Other

Shell 0.01% F# 99.17% C# 0.81% Batchfile 0.01%

fsharp html csv json xml worldbank http typeprovider data

fsharp.data's Introduction

FSharp.Data: Making Data Access Simple

The FSharp.Data package (FSharp.Data.dll) implements everything you need to access data in your F# applications and scripts. It implements F# type providers for working with structured file formats (CSV, HTML, JSON and XML) and for accessing the WorldBank data. It also includes helpers for parsing CSV, HTML and JSON files and for sending HTTP requests.

We're open to contributions from anyone. If you want to help out but don't know where to start, you can take one of the Up-For-Grabs issues, or help to improve the documentation.

You can see the version history here.

Building

Install the .NET SDK specified in the global.json file
Run build.sh -t Build (or build.cmd -t Build on Windows)

Formatting

dotnet fake build -t Format
dotnet fake build -t CheckFormat

Documentation

This library comes with comprehensive documentation. The documentation is automatically generated from *.fsx files in the content folder and from the comments in the code. If you find a typo, please submit a pull request!

FSharp.Data package home page with more information about the library, contributions, etc.
The samples from the documentation are included as part of FSharp.Data.Tests.sln. Build the solution before trying out the samples to ensure that all needed packages are installed.

Releasing

The NuGet package is released by GitHub Actions CI from the master branch when a new version is pushed.

The docs are released by GitHub Actions CI on each push to the master branch.

Support and community

If you have a question about FSharp.Data, ask on Stack Overflow and mark your question with the f#-data tag.
If you want to submit a bug or a feature request, or help with fixing bugs, look at the issues and read contributing to FSharp.Data.
To discuss more general issues about FSharp.Data, its goals and other open-source F# projects, join the fsharp-opensource mailing list.

Code of Conduct

This repository is governed by the Contributor Covenant Code of Conduct.

We pledge to be overt in our openness, welcoming all people to contribute, and pledging in return to value them as whole human beings and to foster an atmosphere of kindness, cooperation, and understanding.

Library license

The library is available under Apache 2.0. For more information see the License file in the GitHub repository.

Maintainers

Current maintainers are Don Syme and Phillip Carter.

Historical maintainers of this project are Gustavo Guerra, Tomas Petricek and Colin Bull.

fsharp.data's People

Contributors

forki rickasaurus ascjones follesoe ryansroberts tfmorris tpetricek jamesholwell dam0 bennylynch rojepp remkoboschker bentayloruk ovatsus sdghub curit evolvedmicrobe vasily-kirichenko buybackoff mydogisbox yukitos enricosada mavnn sadgit artur-s tihan colinbull dahlbyk kiwidev danielfabian okayx6 phatcher bryanhunter rncor pezipink sebfia mathias-brandewinder thinkbeforecoding duckmatt theimowski 7sharp9 runefs atwoodtm mexx deepakramas akrasheninnikov mrange kimsk tfrimor russvanbert eulerfx visemet acassells nkw moonmile aastevenson tomfreeman jruizaranguren danielmarti danielrbradley mjsottile devboy taylorwood christianlang weismat alfonsogarciacaro hariguru lasandell blumu spurnaye finnnk gabology piotrwolkowski haf alex-bogomaz seanhussey nwolverson b-e-n-j stuarthillary trevormcg cboudereau daredude tonyj444 pver hyperlobic jjcorrea ntr andrevdm jeff-cortese telefunkenvf14 modulexcite jhamm giacomociti lalberto8085 sabotageandi bsulima martinjoshua olgierdd youscribe dsyme

fsharp.data's Issues

Will FSharp.Data work with mono on a Mac?

Do the type providers work with Mono on a Mac? I am using Visual Studio on Windows right now but would like to switch to Mac if possible.

publish a new nuget (keep for code sprint)

Write API for JsonProvider

Allow to construct and/or modify typed JSON objects

Before reading the rest, see fsprojects/FSharpx.Extras#196 for the start of this thread

Date support in JSON

Use AddDefinitionLocation in the type providers when using a file in the sample

So when pressing F12 in VisualStudio on a generated property, it opens up the sample csv/xml/json in the right place.
See http://msdn.microsoft.com/en-gb/library/hh361034.aspx for an example

The sample data can be loaded from an url, the Load methods of the generated types should also support that

We can reuse the same code

Support XSD in XmlProvider

This is awesome!

I love the name FSharp.Data. In FSharpx we have the json parser in FSharpx.Core and the json typeprovider in FSharpx.TypeProviders.Documents, effectively grouping stuff by the implementation type. For newcomers having FSharp.Data on nuget is much easier to find, and it's a nice concept.
And the cherry on top is the documentation you've made. Very nice!

If @forki and the rest of the fsharpx team agrees, I would suggest the following:

Move the Freebase type provider from fsharpx into here as well, as it makes sense to have both WorldBank and Freebase together
Removing FSharpx.TypeProviders.Freebase and FSharpx.TypeProviders.Documents from FSharpx
Moving this into github/fsharp organization
Add support for portable profile

FSharpx is great as a holder of general purpose extensions to F#, but when there's a good group of functionality that makes sense to hold together, it makes sense to split it. We're also discussing there moving the DataStructures to a separate package.

I don't have much free time this weekend but I can try to do a pull request with the merging if you all agree

Testing for public provider API

We need to turn the script-based mechanism for testing provider APIs (as implemented in https://github.com/tpetricek/FSharp.Data/blob/master/src/Test.fsx) into a normal unit test that can be easily executed.

I'm not entirely sure what this could test, but checking that the generated types are from the right assembly would be great. Checking the public signature against some baseline would be great too.

Improve the naming generated by the Json provider

http://saxonmatt.co.uk/2013/01/introducing-the-fsharp-github-api.html

"had no control over the names of the fields in each of my record types. Having everything lower-cased with underscores all over didn’t float my boat (as a .NET developer)."

Add FsCharts to the nuget package

Hi,

since nearly all of the samples are using FsChart, it would be cool if this could become part of the nuget package. This would allow me to write easier tutorials for my Dynamics NAV friends.

Cheers,
Steffen

Allow CsvProvider to accept the sample csv directly in a string, in addition to through a filename

FSharpx Json provider accepts either a Schema or a FileName parameter, while the FSharp.Data Json provider reuses the same parameter and tries to figure out if it's a filename, url, or json.
It would be nice for the csv provider to also allow that.

Yahoo Finance Type Provider

I'm not sure if this fits within FSharp.Data or not, but it does seem like a fairly common use case. We already use yahoo data in the CSV samples, but like WorldBank, where we can use the Json TP, having a dedicated TP is much better.

Some links with info on the API:
http://code.google.com/p/yahoo-finance-managed/wiki/CSVAPI
http://developer.yahoo.com/yql/
http://www.jarloo.com/yahoo-stock-symbol-lookup/

Fix support for headers and post body in FSharp.Net.Http for portable version

Support mixed separators in CsvProvider?

I just saw a file that is classified as tab separated values, but in which the first column is a collection of 5 values comma separated :/

This was a random file from the EU open data, but I wonder if this is common:
http://open-data.europa.eu/open-data/data/dataset/00YYPa7FUadFAd4HH4quTw/resource/f6884ab7-edb7-46fa-a61b-ee0b0c0cb723

Hide implementation methods of Freebase provider

Use same interface pattern used in WorldBank

Improve the debugger display of CSV rows

When viewing rows in the debugger or in fsi, you only see the raw data. It would be nice to customize that.

file resolution

I use the example in a build.fsx file

type Stocks = CsvProvider<"data/MSFT.csv">
...

It compiles with no problem. I now run this through Fake

.\tools\FAKE\tools\Fake.exe "build.fsx"

where build.fsx is the previous file.
This leads to an error
build.fsx(12,15): error FS3033: The type provider 'ProviderImplementation.CsvProvider' reported an error: The input sequence was empty. Parameter name: source

There should be more information about which resolved file the TP was trying to open.
Even after copying data/MSFT.csv to the Fake.exe folder, I have the error so I am quite puzzled actually.

I switched to an absolute path of course, but that is far from ideal for scripting in a shared environment. It would be bad to lose strong type safety because of an environment variable :)

(btw, staged execution would not have this problem I guess, as in that case I'd generate the metastage before running Fake.exe, removing the exposure to environment changes influencing type generation)

add some column descriptors (for code sprint)

we could expose the inferred types of columns at the TP level (and even add additional information like mean, variance for numeric types, 10 first options cases etc... )

Add async loading

When reading CSV, XML or JSON from the web, it should be possible to read the data asynchronously.

For XML and JSON, we read the entire file before processing, so this should be just a simple AsyncLoad method.
For CSV, it would be nice to use some sort of async enumerator so that we can read the data asynchronously on demand.

Use a smarter default separator for CSVProvider

If the sample file/url ends with tsv, use \t as the default separator if nothing was specified.
We could also look at the header row and infer it in case there isn't any comma
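
The two rules above could be sketched roughly as follows. This is only an illustration: inferSeparator is a hypothetical helper, not a function in FSharp.Data, and the fallback to ';' is an extra assumption that the issue does not mention.

```fsharp
open System

/// Hypothetical helper (not part of FSharp.Data): guess a default separator
/// from the sample's file name and its header row.
let inferSeparator (path: string) (headerLine: string) : char =
    // Rule 1: a sample ending in "tsv" is treated as tab separated.
    if path.EndsWith("tsv", StringComparison.OrdinalIgnoreCase) then '\t'
    // Rule 2: otherwise keep the comma when the header row contains one.
    elif headerLine.Contains(",") then ','
    // No comma in the header row: try other common separators (assumption).
    elif headerLine.Contains("\t") then '\t'
    elif headerLine.Contains(";") then ';'
    else ','
```

An explicitly specified separator would still take precedence; a helper like this would only supply the default.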

CsvProvider incorrect type for DNB Currency Exchange file

The Norwegian bank DNB has online CSV files for currency exchange rates, which is an excellent dataset to play around with when learning F#. The CSV file for historical rates can be found at https://www.dnb.no/portalfront/datafiles/miscellaneous/csv/historiske_kurser.csv

The first few lines look like this:

Dato,USD,EUR,SEK,DKK,GBP,CHF,JPY,CAD,ISK,AUD
31.01.2013,5.4833,7.4312,86.20,99.61,8.6784,601.40,6.0309,5.4688,4.2524,5.7013
30.01.2013,5.4897,7.4180,86.34,99.44,8.6540,595.64,6.0293,5.4828,4.2642,5.7458
29.01.2013,5.5316,7.4336,86.06,99.64,8.6924,597.72,6.1015,5.5022,4.2981,5.7847
28.01.2013,5.5368,7.4379,85.53,99.67,8.7033,596.35,6.1038,5.4904,4.3028,5.7613

When using the provider like this:

open System.Net

type CurrencyCsv = FSharp.Data.CsvProvider<"historiske_kurser.csv", ",", "en-us", 10>

let wc = new WebClient()
let data = wc.DownloadString("https://www.dnb.no/portalfront/datafiles/miscellaneous/csv/historiske_kurser.csv")

let exchange = CurrencyCsv.Parse(data)

exchange.Data 
|> Seq.map(fun row -> (row.Dato, row.Usd))
|> Seq.sortBy(fun (date, usd) -> usd)
|> printfn "%A"

It infers the correct column names, but the Usd field is of type DateTime, and not float as expected.

I have tried with the nb-no culture, but with same result.

I'm happy to fix it with a pull request as soon as I get the project set up and building on my machine, but will keep this issue for reference, and something to link the pull request against.

Add assembly info with version number

Currently there's only version in the nuget package, the dll always has version 0.0.0.0
We could reuse some of the fake scripts from fsharpx to automatically increase both the assembly info version and the nuget package version based on the tag

Fix System.Xml.Linq reference in portable library for Silverlight

If not possible

Improve type providers error reporting

When there's an error because the sample can't be found on disk or is invalid, the error message in Visual Studio isn't very informative.
We can change it to provide a little bit of more information.
Will submit pull request later today

csv type provider does not accept spaces in file name (for code sprint)

NameUtils improvements

When we find a field like "Foo%", instead of generating "Foo", generate "FooPct" or "FooPercentage"
When we find something like "Foo&Bar", instead of generating "FooBar" generate "FooAndBar"
When we find something like "Foo@Bar", instead of generating "FooBar" generate "FooAtBar"
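
A minimal sketch of the idea, assuming the symbols are spelled out before the name is split into words. The function niceName is hypothetical and is not the actual NameUtils code; the choice of "Pct" over "Percentage" is arbitrary here.

```fsharp
open System

/// Hypothetical sketch (not the real NameUtils): spell out a few symbols,
/// then join the remaining words in PascalCase.
let niceName (raw: string) : string =
    let spelledOut =
        raw.Replace("%", " Pct ").Replace("&", " And ").Replace("@", " At ")
    spelledOut.Split([| ' '; '_'; '-' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun w -> string (Char.ToUpperInvariant(w.[0])) + w.Substring(1))
    |> String.concat ""
```

With this sketch, "Foo%" becomes "FooPct", "Foo&Bar" becomes "FooAndBar" and "Foo@Bar" becomes "FooAtBar".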

Documentation for Freebase provider

Consume the sample data directly by default

The FSharpx type providers for json and xml start with the data already loaded, which is handy for scripting scenarios and demos.
In FSharp.Data we always have to do a .Load or .Parse after defining the type. By default it could load the sample data, and we would only need to call .Load or .Parse if we wanted to override it. You could disable that behavior by passing a LoadSampleData=false

version of fsharpchart is not the current one from nuget

version the packages config + some specific build targets for downloading them ?

CSV Provider performance improvements for big files

The csv provider still hangs VS when we give it very large files. Possible improvements:

Do the inference only when accessing the first member of a row, so it doesn't start processing before we're able to change the InferRows parameters
Make the default InferRows something other than int.Max, for example a maximum of 1000 rows by default
Do the Inference asynchronously with a timeout so we don't hang VS when it takes more than 10 seconds

Problem with booleans in CsvProvider

With a csv file like this:

Column1,Column2,Column3
TRUE,NO,3

When compiling this code:

open FSharp.Data

type csvType = CsvProvider<"C:/temp.csv">
let csv = csvType.Load "C:/temp.csv"
for line in csv.Data do
    printfn "%b %b %i" line.Column1 line.Column2 line.Column3

We get the following errors:

Error   1   The type provider 'ProviderImplementation.CsvProvider' reported an error in the context of provided type 'FSharp.Data.CsvProvider,Sample="C:/temp.csv"+DomainTypes+Row', member 'get_Column1'. The error: Constructing call of the 'ConvertBoolean' operation failed.   i:\documents\visual studio 2012\Projects\ConsoleApplication12\ConsoleApplication12\Program.fs   6   24  ConsoleApplication12
Error   2   The type provider 'ProviderImplementation.CsvProvider' reported an error in the context of provided type 'FSharp.Data.CsvProvider,Sample="C:/temp.csv"+DomainTypes+Row', member 'get_Column2'. The error: Constructing call of the 'ConvertBoolean' operation failed.   i:\documents\visual studio 2012\Projects\ConsoleApplication12\ConsoleApplication12\Program.fs   6   37  ConsoleApplication12

This was working in previous versions and was broken recently

Document AssemblyReplacer.fs

The code in AssemblyReplacer.fs is implementing an essential functionality for the portable profile, but is not commented at all.

We need to add at least some overview of a big picture (what is it doing in general) and some explanatory comments to all top level functions (similarly to how this is done in the rest of the code-base).

Improve handling of missing values in the CSV provider

I have some code ready to push, but I'd also like to discuss alternatives

Currently, when there's a missing value, the inference will force that column to be of type string. The only exception is when there's an explicit #N/A in a column of type double: in that case inference will still recognize that column as a double and use double.NaN at runtime

I propose the following:

When there's a missing value in a double column, also treat it as a double.NaN
When there's a missing value in a decimal column, infer that column to be double instead, so we can use double.NaN
When there's a missing value in int32, int64, bool, or date column, make that column type an option
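
The proposed rules can be written down as a small function. The types and names below (ColumnType, InferredColumn, withMissingValues) are hypothetical and only illustrate the proposal; they are not the provider's real inference types.

```fsharp
/// Hypothetical column types, for illustration only.
type ColumnType =
    | Int32Col
    | Int64Col
    | BoolCol
    | DateCol
    | DecimalCol
    | DoubleCol
    | StringCol

type InferredColumn = { Type: ColumnType; Optional: bool }

/// Given the type inferred from the non-missing values of a column,
/// decide the column type once at least one missing value has been seen.
let withMissingValues (inferred: ColumnType) : InferredColumn =
    match inferred with
    // double and decimal columns both become double so that NaN can stand in
    | DoubleCol | DecimalCol -> { Type = DoubleCol; Optional = false }
    // these types have no natural "missing" value, so the column becomes an option
    | Int32Col | Int64Col | BoolCol | DateCol -> { Type = inferred; Optional = true }
    // string columns stay as they are
    | StringCol -> { Type = StringCol; Optional = false }
```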

Other alternatives:

Instead of option types, use nullables for int32, int64, bool, and date columns. Both the XML and JSON providers use options, but the freebase provider uses nullables. Maybe add a parameter named PreferNullableTypes to activate nullables but use options by default? Or make nullables the default and allow switching to options? Nullables are easier to handle for numbers because of the Linq.NullableOperators module
Never generate options/nullables for datetimes and instead return the default datetime at runtime

Hide implementation methods

Csv

Method CsvFile.Parse: data:TextReader * sep:string option -> CsvFile is accessible, but it really shouldn't be, because it returns CsvFile, and not the generated CsvType.
On other type providers, this problem doesn't exist because we usually replace the methods with others with the same signature in the derived generated class, but in the CsvProvider case, the generated Parse method doesn't have the sep parameter.
The ideal would be to make this method protected, but F# doesn't support that. Any other ideas to fix this?

WorldBank

There's a bunch of _Get methods that get the untyped data. We can hide them by putting them in an interface. Maybe that can also fix the problem with csv

Improve samples projects that use portable library and move them out of the tests solution

Add freebase provider

add an option for header or not in CsvProvider (for code sprint)

When there is no header, the provider currently takes the first data row as the header.
we can add names by hand to the csv or generate generic names like col0, ..

Add more tests

The current tests cover the structural inference (and some aspects of JSON), but we need more tests for the end-user type providers and for the JSON parser.

I think these can be largely adapted from fsharpx.

Consider renaming CsvProvider

I know this suggestion is a little bold, but thinking about it, CsvProvider currently works not just with csv files but also with tab separated files, or any other similar textual format, and in the future it might well support more formats of tabular data (like xls/xlsx, hdf5/netCDF4, .rdata, .mat, etc...), either directly or maybe as plugins (I have some ideas about how to make that work without changing the api or creating dependencies...). But the inference and generation of typed properties is the same between all the formats.
Both the R tools and the several Python libraries that work with all those kinds of files are usually called read.table or read_table (even though they have overloads called read.csv or read_csv whose only job is to set the default separator to ',')
Do you think renaming CsvProvider to TabularDataProvider would be a good idea? Or are people expecting that name and we can always make the same type provider available under other additional names (like we do with freebase and worldbank that have two versions each)?

Remove pluralization service

The NameUtils.fs file uses PluralizationService from System.Data.Entity.Design.PluralizationServices. This needs to be replaced with some other library or a custom implementation of English pluralization (in order to make the library compatible with the Client profile and more importantly also for portable profile and Mono).

Allow to specify the NA string in CSV provider

In some datasets it's ":"

CSV Provider - support delimiter within quotes

I'm afraid the provider may fail with delimiter within quotes, for instance with this data

12,"Usual Suspects, The (1995)",14-Aug-95

Thanks,
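
For reference, a quote-aware split of a single line might look like the sketch below. splitCsvLine is a hypothetical stand-alone function, not the provider's parser, and it assumes a field never spans multiple lines.

```fsharp
open System.Text

/// Hypothetical sketch: split one CSV line on the separator, ignoring
/// separators that appear inside double-quoted fields.
let splitCsvLine (separator: char) (line: string) : string list =
    let field = StringBuilder()
    let fields = ResizeArray<string>()
    let mutable inQuotes = false
    let mutable i = 0
    while i < line.Length do
        let c = line.[i]
        if c = '"' then
            if inQuotes && i + 1 < line.Length && line.[i + 1] = '"' then
                // A doubled quote inside a quoted field is a literal quote.
                field.Append('"') |> ignore
                i <- i + 1
            else
                // Entering or leaving a quoted field; the quote itself is dropped.
                inQuotes <- not inQuotes
        elif c = separator && not inQuotes then
            // An unquoted separator ends the current field.
            fields.Add(field.ToString())
            field.Clear() |> ignore
        else
            field.Append(c) |> ignore
        i <- i + 1
    fields.Add(field.ToString())
    List.ofSeq fields
```

Applied to the sample row above with ',' as the separator, this yields three fields, with the comma kept inside the movie title.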

Allow overriding the schema in the CSV provider

Two alternatives:

Type provider parameter like in TryFSharp.org:

type csvType = CsvFile<file, Schema = "date,float,float,float,float,int,float">

Allow to specify the type in the header title within braces, like we already allow for units
```
Column1 (m), Column1 (float), Column2 (float<m>)
```

CsvProvider fails to compile if csv file is open in Excel

This is very frequent and very annoying.
We should change File.OpenRead in Helpers.fs to use File.Open and pass the FileShare.ReadWrite parameter, as described in http://stackoverflow.com/questions/897796/how-do-i-open-an-already-opened-file-with-a-net-streamreader/898017#898017
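
A minimal sketch of the suggested change. The helper names openForSharedRead and readAllShared are hypothetical; only File.Open and FileShare.ReadWrite come from the issue.

```fsharp
open System.IO

/// Open a file for reading while allowing other processes (such as Excel)
/// to keep it open for reading and writing.
let openForSharedRead (path: string) : FileStream =
    File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)

/// Hypothetical convenience wrapper: read the whole file through the shared stream.
let readAllShared (path: string) : string =
    use stream = openForSharedRead path
    use reader = new StreamReader(stream)
    reader.ReadToEnd()
```

Whether this is enough depends on how the other application opened the file: if it requested exclusive access, the open will still fail.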

Fix loading of external xml and json in Silverlight sample

Needs some config xml files to disable the default security checks

Generate enums in csv type provider

If a column is inferred as string, and there are many repeated values, it's probably an enumeration, so we could generate an enum. If the inference gets it wrong, we could always override (#19). We could use something like (number of distinct values / number of rows) < 0.2 to trigger this
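
The trigger could be sketched like this, assuming the column values are already available as strings. looksLikeEnum is a hypothetical name, and the threshold is left as a parameter so the 0.2 from the text is only one possible setting.

```fsharp
/// Hypothetical heuristic: a string column looks like an enumeration when the
/// ratio of distinct values to rows is below the given threshold.
let looksLikeEnum (threshold: float) (values: string list) : bool =
    match values with
    | [] -> false // no rows, nothing to infer
    | _ ->
        let distinctCount = values |> List.distinct |> List.length
        float distinctCount / float (List.length values) < threshold
```

Note that with very few rows the ratio can never fall below 0.2 (a column needs more than five rows even with a single distinct value), so small samples would never produce an enum under this rule.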