Git Product home page Git Product logo

data_frame's Introduction

Data Frame

This is a general data frame. Load arrays and labels into it, and you will have a very powerful set of tools on your data set.

Usage

df = DataFrame.from_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv')
df.labels
# => [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]
df.dmc
# => [26.2, 35.4, 43.7, 33.3, 51.3, 85.3,...]
df.dmc.max
# => 291.3
df.dmc.min
# => 1.1
df.dmc.mean
# => 110.872340425532
df.dmc.std
# => 64.0464822492543
df = DataFrame.new(:list, :of, :things)    
# => #<DataFrame:0x24ec6e8 @items=[], @labels=[:list, :of, :things]>
df.labels                              
# => [:list, :of, :things]
df << [1,2,3]                          
# => [[1, 2, 3]]
df.import([[2,3,4],[5,6,7]])           
# => [[2, 3, 4], [5, 6, 7]]
df.items                               
# => [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
df.list                                
# => [1, 2, 5]
df.list.correlation(df.things)
# => 1.0
df.list
# => [1, 2, 5]
df.things
# => [3, 4, 7]

There are a few important features to know:

  • DataFrame.from_csv works for a string, a filename, or a URL.

  • FasterCSV parsing parameters can be passed to DataFrame.from_csv

  • DataFrame looks for operations first on the column labels, then on the row labels, then on the items table. So don’t name things :mean, :standard_deviation, :min, and that sort of thing.

  • CallbackArray allows you to set a callback anytime an array is tainted or untainted (taint, shift, pop, clear, map!, that sort of thing). This is generally useful and will probably be copied into the Repositories gem.

  • TransposableArray is a subclass of CallbackArray, demonstrating how to use it. It creates a very simple approach to memoization. It caches the transpose of the table and resets it whenever it is tainted.

To get your feet wet, you may want to play with data sets found here:

http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html

Transformations

A lot of the work in the data frame is to transform the actual table. You may need to drop columns, filter results, replace values in a column or create a new data frame based on the existing one. Here’s how to do that:

>  df = DataFrame.from_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv')
# => DataFrame rows: 517 labels: [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]
> df.drop!(:ffmc)
# => DataFrame rows: 517 labels: [:x, :y, :month, :day, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]
> df.drop!(:dmc, :dc, :isi, :rh)
# => DataFrame rows: 517 labels: [:x, :y, :month, :day, :temp, :wind, :rain, :area]
> df.x
# => [7, 7, 7, 8, 8, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6,...]
> df.replace!(:x) {|e| e * 3}
# => DataFrame rows: 517 labels: [:x, :y, :month, :day, :temp, :wind, :rain, :area]
> df.x                       
# => [21, 21, 21, 24, 24, 24, 24, 24, 24, 21, 21, 21, 18, 18, 18,...]
> df.filter!(:open_struct) {|row| row.x == 24}
# => DataFrame rows: 61 labels: [:x, :y, :month, :day, :temp, :wind, :rain, :area]
> df.x
# => [24, 24, 24, 24, 24, 24, 24, 24, 24,...]
> new_data_frame = df.subset_from_columns(:x, :y)
# => DataFrame rows: 61 labels: [:x, :y]
> new_data_frame.items
# => [[24, 6], [24, 6], [24, 6], [24, 6], ...]

Note: most of these transformations are not optimized. I’ll work with things for a while before I try to optimize this library. However, I should say that I’ve used some fairly large data sets (thousands of rows) and have been fine with things so far.

Models

Data Frame can now create sub-models:

>> df = DataFrame.from_csv(‘archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv’) => DataFrame rows: 517 labels: [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area] >> df.model(:weekend) do |m| ?> m.day %w(sat sun) >> end => DataFrame rows: 179 labels: [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area] >> df.models.weekend.day.uniq => [“sat”, “sun”] >> df.models => #<OpenStruct weekend=DataFrame rows: 179 labels: [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]>

Utilities

I use data frame for a lot of things, and I’ve added some utilities for this gem in case you would like to as well. For instance, here is how I take the data in a data frame and load it into a neural network:

# Show mlp.  Will probably need to add a row classifier for training and test data.  Also, will probably want to

CLI

There are some really interesting things that have good command-line shortcuts:

  • Make

  • A

  • List

# Now add some demos

Installation

sudo gem install davidrichards-data_frame

Dependencies

  • ActiveSupport: sudo gem install activesupport

  • JustEnumerableStats: sudo gem install davidrichards-just_enumerable_stats

  • FasterCSV: sudo gem install fastercsv

Copyright © 2009 David Richards. See LICENSE for details.

data_frame's People

Contributors

davidrichards avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.