README
- Associated Files: 1_MTA_Exploration.ipynb and Clean_and_Process.ipynb
- Load the data into a pandas DataFrame
- Explore the data from different angles (duplicates, exits distribution, etc.)
- Clean data for New Jersey stations
- Format the data and only leave applicable columns
- Analyse and handle outliers
- Plot the data to understand its trends
- Return a list of the top stations, sorted in descending order by number of exits
- Generate MTA score associated with each identified subway station
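
The MTA steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the column names (`STATION`, `SCP`, `DATETIME`, `EXITS`), the assumption that `EXITS` is a cumulative counter per turnstile, the outlier threshold, and the max-normalized score are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample shaped like turnstile data (all values are made up).
raw = pd.DataFrame({
    "STATION": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "SCP": ["01"] * 8,
    "DATETIME": pd.to_datetime([
        "2019-01-01 00:00", "2019-01-01 04:00", "2019-01-01 04:00", "2019-01-01 08:00",
        "2019-01-01 00:00", "2019-01-01 04:00", "2019-01-01 08:00", "2019-01-01 12:00",
    ]),
    # Station A contains a duplicate reading; station B contains a counter reset.
    "EXITS": [100, 150, 150, 210, 1000, 1040, 5, 65],
})


def exits_per_station(df, max_per_interval=10_000):
    """Total exits per station, sorted in descending order."""
    # Drop duplicate readings for the same turnstile and timestamp.
    df = df.drop_duplicates(subset=["STATION", "SCP", "DATETIME"])
    df = df.sort_values(["STATION", "SCP", "DATETIME"])
    # Assumed cumulative counters: exits per interval are successive differences.
    delta = df.groupby(["STATION", "SCP"])["EXITS"].diff()
    # Assumed outlier rule: discard negative jumps (resets) and implausibly large ones.
    valid = delta.between(0, max_per_interval)
    totals = delta[valid].groupby(df.loc[valid, "STATION"]).sum()
    return totals.sort_values(ascending=False)


totals = exits_per_station(raw)
# Hypothetical MTA score: exits scaled so the busiest station scores 1.0.
mta_score = totals / totals.max()
```

Scaling by the maximum is only one possible choice; it keeps the score on a 0 to 1 range so it can be compared with other per-station scores.
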
- Associated Files:
- Utilized web scraping to generate a list of the 21 most valuable tech companies in New York
- Utilized Google Maps API to get geo-locations of the identified tech companies
- Generated tech company score associated with each identified subway station
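
One way a tech company score could be derived from geo-locations is to count the companies within a fixed distance of each station. The sketch below assumes that definition; the 1 km radius, the function names, and the sample coordinates are hypothetical and are not taken from the project.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def proximity_score(station, places, radius_km=1.0):
    """Count the places that lie within radius_km of the station."""
    lat, lon = station
    return sum(
        1 for p_lat, p_lon in places
        if haversine_km(lat, lon, p_lat, p_lon) <= radius_km
    )


# Illustrative coordinates only (not real project data).
station = (40.7553, -73.9870)
companies = [(40.7580, -73.9855), (40.7061, -74.0087)]
score = proximity_score(station, companies)
```

Here only the first company falls within the radius, so the station's score is 1. A distance-weighted sum would be a reasonable alternative if nearby companies should count for more.
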
- Associated Files: Starbucks_and_Census_Alan.ipynb
- Utilized Google Maps API to find the geo-locations of the Starbucks locations surrounding each identified subway station
- Generated Starbucks score associated with each identified subway station
- Associated Files: Starbucks_and_Census_Alan.ipynb, community_districts.geojson, totpop_singage_sex2010_cd.xlsx
- Incorporated US Census 2010 data and geospatial data for the NYC Community Districts (CDs) to assign a gender score to each CD
- Generated gender score based on which CD each identified subway station was located in
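
Assigning a station to a Community District amounts to a point-in-polygon lookup. A spatial join in geopandas is a common way to do this; the dependency-free sketch below uses plain ray casting instead. The district shapes, population figures, and the definition of the gender score as the female share of residents are all assumptions for illustration.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon given by ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical districts: simple squares with made-up population counts.
districts = {
    "CD_X": {"ring": [(0, 0), (2, 0), (2, 2), (0, 2)], "female": 5200, "total": 10000},
    "CD_Y": {"ring": [(2, 0), (4, 0), (4, 2), (2, 2)], "female": 4800, "total": 10000},
}


def gender_score(lon, lat, districts):
    """Return the assumed score (female share) of the district containing the point."""
    for info in districts.values():
        if point_in_polygon(lon, lat, info["ring"]):
            return info["female"] / info["total"]
    return None  # the point lies outside every district
```

With real data, the rings would come from `community_districts.geojson` and the counts from the census spreadsheet; multi-part district geometries would need extra handling that this sketch omits.
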
- Associated Files:
- Use geopandas
- Colour code the stations based on different criteria
- Associated Files: Final_Presentation
- Draw conclusions
- Write up recommendation(s)
- Build a presentation (6 min)
- Divide up the presentation topics between team members
Person / People | Area of Responsibility |
---|---|
Billy | MTA data |
Auste | MTA data |
Xu | MTA data |
Joyce | Web Scraping and Google Maps data (Tech Companies) |
Alan | Google Maps data (Starbucks), US 2010 Census Data (Gender) |
Chelan | Visualization |