DSCI_522_Group_34

  • Contributors: Kangbo Lu, Siqi Zhou, Mitchie Zhao, Mengyuan Zhu

A data analysis project by Group 34 for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

About

Here we conduct a two-tailed permutation test to answer a statistical research question: does the number of graffiti per location in Vancouver's Downtown area differ from the number of graffiti per location in Vancouver's Strathcona area? We proceeded in sequence: exploratory data analysis, selecting the columns needed to support the permutation test, and choosing the median as the test statistic to assess whether the median number of graffiti per location differs between the two areas. The results show no statistically significant difference between the median counts of graffiti per recorded location in these two areas, since the p-value is 1, which is larger than the significance level of 0.05.

The dataset for this project provides information on the location of sites with graffiti as identified by City of Vancouver staff. The graffiti location data is sourced from the Vancouver Open Data Portal and it can be found here, specifically this file. Three columns of the data schema are relevant to our research question: "COUNT", "GEO LOCAL AREA" and "GEOM". We used the "COUNT" and "GEO LOCAL AREA" columns to conduct a permutation test on the difference in medians, comparing graffiti counts in the Downtown area and the Strathcona area.
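As a rough illustration of the test described above, here is a minimal Python sketch of a two-tailed permutation test on the difference in medians. It is not the project's actual implementation (the dependency list suggests the analysis uses the R `infer` package). The function name `permutation_test_median`, the area labels "Downtown" and "Strathcona", and the sample rows are all made up for the example; only the column names "COUNT" and "GEO LOCAL AREA" come from the description above.

```python
import random
from statistics import median


def permutation_test_median(group_a, group_b, n_permutations=5000, seed=522):
    """Two-tailed permutation test on the difference in medians.

    Hypothetical helper for illustration only. Returns the observed
    difference (median of group_a minus median of group_b) and the
    p-value estimated from the permutations.
    """
    rng = random.Random(seed)
    observed = median(group_a) - median(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled counts, i.e. reassign area labels at random.
        rng.shuffle(pooled)
        diff = median(pooled[:n_a]) - median(pooled[n_a:])
        # Two-tailed: count differences at least as large in magnitude.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Made-up rows that mimic the two columns used in the analysis.
rows = [
    {"GEO LOCAL AREA": "Downtown", "COUNT": 1},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 1},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 1},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 2},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 1},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 3},
    {"GEO LOCAL AREA": "Downtown", "COUNT": 1},
    {"GEO LOCAL AREA": "Strathcona", "COUNT": 1},
    {"GEO LOCAL AREA": "Strathcona", "COUNT": 1},
    {"GEO LOCAL AREA": "Strathcona", "COUNT": 2},
    {"GEO LOCAL AREA": "Strathcona", "COUNT": 1},
    {"GEO LOCAL AREA": "Strathcona", "COUNT": 1},
]

downtown = [r["COUNT"] for r in rows if r["GEO LOCAL AREA"] == "Downtown"]
strathcona = [r["COUNT"] for r in rows if r["GEO LOCAL AREA"] == "Strathcona"]

observed_diff, p_value = permutation_test_median(downtown, strathcona)
print(f"observed difference in medians: {observed_diff}")
print(f"p-value: {p_value}")
```

In this toy sample both medians are 1, so the observed difference is 0 and every permuted difference is at least as extreme. That is one way a permutation test can return a p-value of 1; whether the same mechanism explains the result on the real dataset is not stated here.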

Report

The final report can be found here.

Project Collaboration

We created the following 4 files that are important for collaboration:

  1. Team work contract
  2. Code of Conduct file
  3. Contributing file
  4. License file

Usage

To replicate the analysis, clone this GitHub repository, install the dependencies listed below, and run the following command at the command line/terminal from the root directory of this project:

make all

To reset the repo to a clean state, with no intermediate or results files, run the following command at the command line/terminal from the root directory of this project:

make clean

Makefile Dependency Diagram

[Figure: Makefile dependency diagram]

Dependencies

  • Python 3.8.3 and Python packages:

    • docopt==0.6.2
    • requests==2.23.0
    • pandas==1.1.1
  • R version 4.0.2 and R packages:

    • knitr==1.29
    • docopt==0.7.1
    • tidyverse==1.3.0
    • ggplot2==3.3.2
    • RCurl==1.98.1.2
    • infer==0.5.3

License

The DSCI_522_Group_34 materials here are licensed under the MIT License. Copyright (c) 2020 DSCI_522_Group_34. If re-using/re-mixing, please provide attribution and link to this webpage.

References

Modern Dive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim. https://moderndive.com/index.html.

Quantile estimation by Thomas Bzik. https://www.astm.org/SNEWS/images/ja14_dp.pdf.

“Graffiti.” City of Vancouver Open Data Portal, 3 Feb. 2020, https://opendata.vancouver.ca/explore/dataset/graffiti/information/.
