
analyzing_brooklyn's Introduction

Accessing The Paper:

In this project, we studied the livability and safety of different neighborhoods in Brooklyn. This topic matters because, in New York, the neighborhood you live in has a large impact on your quality of life. We therefore set out to identify which neighborhoods were highly livable and which were not. Before starting our research, we already knew that many parts of Brooklyn are highly segregated, so we expected large differences between neighborhoods in 311 complaints, energy usage, and automobile accidents. This study aims to add to what data analysts know about urban development in New York and to support urban policy that makes the lives of New Yorkers better. We expected that certain zip codes, like those in East Flatbush, would be far more livable...

Read the rest of the Moving to Brooklyn paper

Folder structure:

data_ingest - contains the steps we took to ingest the files into HDFS after downloading the datasets from online sources.

etl_code - contains the MapReduce programs each of us wrote independently to remove unneeded columns from our large data files.

profiling_code - contains the code we used to create the Hive tables that store the data, and the HiveQL queries we used to group the data by zip code in preparation for joining.

test_code - contains the tests and trials we ran to work out how best to organize our data by zip code, as we experimented with different ways of extracting the most relevant information from each dataset.

screenshots - contains screenshots of the steps we took to create our cleaned and organized datasets.
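The etl_code step drops columns we did not need from each dataset. The original MapReduce programs are not reproduced here; the following is a minimal sketch of the same kind of map-only column projection, written in Python in the line-in/line-out style that Hadoop Streaming mappers use. The comma-delimited input, the column indices in KEEP, and the names project_columns and run_mapper are all hypothetical, chosen for illustration rather than taken from the project.

```python
import csv
import io

# Hypothetical: zero-based indices of the columns to keep. The real
# programs in etl_code choose their own columns for each dataset.
KEEP = [1, 5, 8]


def project_columns(line, keep=KEEP):
    """Parse one CSV record and return only the kept columns, re-encoded as CSV.

    Returns None for rows too short to contain every kept column.
    """
    # csv.reader handles quoted fields that contain commas.
    fields = next(csv.reader([line]), [])
    if len(fields) <= max(keep):
        return None
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow([fields[i] for i in keep])
    # Drop the single "\n" terminator the writer appended.
    return out.getvalue()[:-1]


def run_mapper(lines, out, keep=KEEP):
    """Map-only pass: each input line yields at most one output line; no reducer needed."""
    for line in lines:
        projected = project_columns(line.rstrip("\r\n"), keep)
        if projected is not None:
            out.write(projected + "\n")
```

In a streaming mapper script this would be called as run_mapper(sys.stdin, sys.stdout). Parsing with the csv module rather than splitting on commas keeps values such as "Brooklyn, NY" in one field.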

Running the code:

First, all the data files must be available either in a local directory or on HDFS. Next, create the Hive tables that will store the data, and use copyFromLocal to copy the local files into HDFS so they can be loaded into those tables. Then run the Hive commands in the profiling_code folder to organize each dataset by zip code. Finally, run the Hive command that joins all the new tables into one final table. This table is used for our analytic, and all analysis can be done on it through Hive.
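Organizing a dataset by zip code amounts to a GROUP BY on the zip column. The project does this in HiveQL; the following Python sketch only illustrates what such a grouping computes, assuming each record is a dict with a "zip" field. The field names, the helper names, and the rule that ZIP+4 values are truncated to five digits are assumptions for illustration.

```python
from collections import defaultdict


def normalize_zip(value):
    """Return a five-digit zip code string, or None if the value is unusable.

    Assumption: ZIP+4 values such as "11226-1234" are truncated to "11226".
    """
    z = str(value if value is not None else "").strip()[:5]
    return z if len(z) == 5 and z.isdigit() else None


def count_by_zip(records, zip_field="zip"):
    """Like SELECT zip, COUNT(*) FROM t GROUP BY zip, skipping unusable zip codes."""
    counts = defaultdict(int)
    for rec in records:
        z = normalize_zip(rec.get(zip_field))
        if z is not None:
            counts[z] += 1
    return dict(counts)


def sum_by_zip(records, value_field, zip_field="zip"):
    """Like SELECT zip, SUM(value_field) FROM t GROUP BY zip; missing values are ignored."""
    totals = defaultdict(float)
    for rec in records:
        z = normalize_zip(rec.get(zip_field))
        if z is not None and rec.get(value_field) is not None:
            totals[z] += float(rec[value_field])
    return dict(totals)
```

Counting fits event data such as crashes or 311 complaints, while summing fits a measured quantity such as energy usage; either way, each dataset ends up with one row per zip code, which is what makes the later joins one-to-one.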

To create the final analytic, first follow Ingestion-Abed Islam.txt to load all the data. Then open the queriesToCreateAnalytic.hql file and run its queries in the exact order they appear. The file first creates all the tables. Next, the energyCleaned table produces the profiled table that Abed needs. The energyCrisis table is a join between Abed's and Sheika's tables (crashes and energy). Finally, energyCrisis is joined with Afnan's table to create the analytic table.
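The join order above can be sketched in Python to show how the per-zip tables combine. This is not the HiveQL in queriesToCreateAnalytic.hql; it assumes each profiled table is a dict keyed by zip code, that the joins are inner joins on zip code (Hive's default JOIN), and that Afnan's table holds the 311 complaint data. The function names are hypothetical.

```python
def inner_join_on_zip(left, right):
    """Inner join of two per-zip tables, each a dict mapping zip code -> dict of columns.

    Zip codes missing from either side are dropped, as in a SQL inner join.
    If both sides share a column name, the right-hand value wins.
    """
    return {
        zip_code: {**left_row, **right[zip_code]}
        for zip_code, left_row in left.items()
        if zip_code in right
    }


def build_analytic(energy_cleaned, crashes, afnan_table):
    """Hypothetical mirror of the query order: energy + crashes -> energyCrisis -> analytic."""
    energy_crisis = inner_join_on_zip(energy_cleaned, crashes)
    return inner_join_on_zip(energy_crisis, afnan_table)
```

Because every join is an inner join, a zip code appears in the final analytic table only if it is present in all three profiled tables; a zip code missing from any one dataset silently drops out.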

Contributors:

ai1138