Smallset Timelines with smallsets

Do you use R or Python to preprocess datasets for analyses? smallsets is an R package that transforms your R/Python preprocessing script into a Smallset Timeline, so that you can document and share your preprocessing decisions in a practical manner.

A full description of the Smallset Timeline can be found in the paper Smallset Timelines: A Visual Representation of Data Preprocessing Decisions, published in the proceedings of ACM FAccT ’22. A short (3 min) and a long (15 min) YouTube video provide an introduction to the project.

If you have questions about using smallsets or would like help building a Smallset Timeline, please email Lydia at [email protected].

Quick start example

After installing smallsets, run the following snippet to build your first Smallset Timeline!

library(smallsets)

set.seed(145)

Smallset_Timeline(data = s_data,
                  code = system.file("s_data_preprocess.R", package = "smallsets"))

Structured comments

The Smallset Timeline above is based on the R preprocessing script below (s_data_preprocess.R). The structured comments in the script (the lines beginning with # smallsets) tell smallsets where to take snapshots of the data and which caption to attach to each one.

# smallsets start s_data caption[Remove rows where C2
# is FALSE.]caption
s_data <- s_data[s_data$C2 == TRUE, ]

s_data$C6[is.na(s_data$C6)] <- mean(s_data$C6, na.rm = TRUE)
# smallsets snap s_data caption[Replace missing values in C6 and
# C8 with column means. Drop C7 because there are too many
# missing values.]caption
s_data$C8[is.na(s_data$C8)] <- mean(s_data$C8, na.rm = TRUE)
s_data$C7 <- NULL

s_data$C9 <- s_data$C3 + s_data$C4
# smallsets end s_data caption[Create a new column,
# C9, by summing C3 and C4.]caption
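
To make the comment syntax concrete, the sketch below scans a script for the three structured comment types shown above (start, snap, end) and pulls out the data object name and caption from each. It is an illustration only, written in base R. It is not how smallsets itself parses scripts, and the function name find_structured_comments is hypothetical.

```r
# Hypothetical helper, NOT part of the smallsets API: lists the structured
# comments found in a character vector of script lines.
find_structured_comments <- function(lines) {
  # Lines that open a structured comment: "# smallsets start|snap|end ..."
  opens <- grep("^#[[:space:]]*smallsets[[:space:]]+(start|snap|end)[[:space:]]", lines)

  rows <- lapply(opens, function(i) {
    # A caption may continue over several comment lines; extend the span
    # until the line containing the closing "]caption" marker.
    j <- i
    while (!grepl("]caption", lines[j], fixed = TRUE) && j < length(lines)) {
      j <- j + 1
    }

    # Strip the leading "#" from each comment line and join them with spaces.
    text <- paste(sub("^#[[:space:]]*", "", lines[i:j]), collapse = " ")

    # Capture the action, the data object name, and the caption text.
    pattern <- paste0(
      "^smallsets[[:space:]]+(start|snap|end)[[:space:]]+",
      "([^[:space:]]+)[[:space:]]+caption\\[(.*)\\]caption"
    )
    parts <- regmatches(text, regexec(pattern, text))[[1]]

    data.frame(line = i, action = parts[2], data = parts[3],
               caption = parts[4], stringsAsFactors = FALSE)
  })

  do.call(rbind, rows)
}

# A shortened copy of the script above, held as a character vector.
script <- c(
  "# smallsets start s_data caption[Remove rows where C2",
  "# is FALSE.]caption",
  "s_data <- s_data[s_data$C2 == TRUE, ]",
  "",
  "s_data$C9 <- s_data$C3 + s_data$C4",
  "# smallsets end s_data caption[Create a new column,",
  "# C9, by summing C3 and C4.]caption"
)

found <- find_structured_comments(script)
print(found)
```

To run the same check on the full example script, replace the script vector with readLines(system.file("s_data_preprocess.R", package = "smallsets")), which assumes smallsets is installed.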

Citing smallsets

Please cite the Smallset Timeline paper if you use the smallsets software.

Lydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022. Smallset Timelines: A Visual Representation of Data Preprocessing Decisions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 1136–1153. https://doi.org/10.1145/3531146.3533175

@inproceedings{smallsets2022, 
author = {Lucchesi, Lydia R. and Kuhnert, Petra M. and Davis, Jenny L. and Xie, Lexing}, 
title = {Smallset Timelines: A Visual Representation of Data Preprocessing Decisions}, 
year = {2022}, 
isbn = {9781450393522}, 
publisher = {Association for Computing Machinery}, 
address = {New York, NY, USA}, 
url = {https://doi.org/10.1145/3531146.3533175}, 
doi = {10.1145/3531146.3533175}, 
location = {Seoul, Republic of Korea}, 
series = {FAccT '22}
}
