weather-analysis

Sample weather data - early 2016

weather.py

  Script that reads in the weather data (194697 rows x 15 columns) and converts it to Parquet format. It performs the minor transformations described under ASSUMPTIONS below, then answers the following questions, each in its own Python function:

  a. Which date was the hottest day?
  b. What was the temperature on that day?
  c. In which region was the hottest day recorded?
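  The three questions can each be answered by one small pandas function. The sketch below is not the repository's code: the column names `ScreenTemperature` and `Region` are assumptions (only `ObservationDate` is named in this README), and it assumes "hottest day" means the date of the single highest temperature reading. If weather.py aggregates per day instead (for example a daily mean), its answer could differ.

```python
import pandas as pd

# Assumed column names: only ObservationDate is confirmed by the README.
DATE_COL = "ObservationDate"
TEMP_COL = "ScreenTemperature"  # hypothetical temperature column
REGION_COL = "Region"           # hypothetical region column


def hottest_row(df: pd.DataFrame) -> pd.Series:
    """Return the observation with the highest temperature (first one on ties)."""
    return df.nlargest(1, TEMP_COL).iloc[0]


def hottest_date(df: pd.DataFrame) -> pd.Timestamp:
    """a. Date of the hottest observation."""
    return hottest_row(df)[DATE_COL]


def hottest_temperature(df: pd.DataFrame) -> float:
    """b. Temperature recorded in that observation."""
    return float(hottest_row(df)[TEMP_COL])


def hottest_region(df: pd.DataFrame) -> str:
    """c. Region where that observation was made."""
    return hottest_row(df)[REGION_COL]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the real dataset.
    sample = pd.DataFrame({
        DATE_COL: pd.to_datetime(["2016-01-05", "2016-02-10", "2016-03-01"]),
        TEMP_COL: [3.0, 11.5, 7.2],
        REGION_COL: ["Region A", "Region B", "Region C"],
    })
    print(hottest_date(sample).date(), hottest_temperature(sample), hottest_region(sample))
```

  In practice these functions would be called on the converted data, e.g. a DataFrame loaded with `pd.read_parquet(...)`. `nlargest` is used rather than `idxmax` plus `loc` so the lookup does not depend on the DataFrame having a unique index.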

test_weather.py

  Test suite that independently tests small units of code in weather.py.
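  As an illustration of the kind of unit test test_weather.py might contain, here is a minimal unittest sketch built around a hypothetical `impute_numeric` helper that mirrors assumption 2 under ASSUMPTIONS. The repository's actual test cases and function names may differ.

```python
import unittest

import numpy as np
import pandas as pd


def impute_numeric(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: return a copy with NaN in `columns` replaced by 0."""
    out = df.copy()
    out[columns] = out[columns].fillna(0)
    return out


class TestImputeNumeric(unittest.TestCase):
    def test_nan_replaced_with_zero(self):
        df = pd.DataFrame({"WindGust": [np.nan, 12.0], "Pressure": [1010.0, np.nan]})
        result = impute_numeric(df, ["WindGust", "Pressure"])
        # No missing values should remain in the imputed columns.
        self.assertEqual(result.isna().sum().sum(), 0)
        self.assertEqual(result.loc[0, "WindGust"], 0)

    def test_input_not_mutated(self):
        df = pd.DataFrame({"WindGust": [np.nan]})
        impute_numeric(df, ["WindGust"])
        # The original DataFrame must be left untouched.
        self.assertTrue(df["WindGust"].isna().all())
```

  Saved as a test module, a file like this runs with the same `python3 -m unittest <file>.py` command used in the steps below.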

file_path_sample.py

  Small Python module that supplies (and hides) your local file path for the .csv files. Note: rename file_path_sample.py to file_path.py.
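  A renamed file_path.py might look something like the sketch below. Its contents are an assumption: the variable name `file_path` is inferred from the "Set the file_path" step below, the `csv_files` helper is hypothetical, and the folder shown is the one used in the walkthrough. weather.py would then import the path (e.g. `from file_path import file_path`) instead of hard-coding it.

```python
# file_path.py (renamed from file_path_sample.py)
# Hypothetical contents; the real module's names may differ.
from pathlib import Path

# Point this at the folder holding your weather .csv files.
file_path = Path.home() / "Desktop" / "weather" / "data"


def csv_files() -> list:
    """Return the .csv files found directly under file_path, sorted by name."""
    return sorted(file_path.glob("*.csv"))
```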

Steps to reproduce via the CLI:

  • Clone the repository from the terminal:
    $ mkdir weather
    $ cd weather
    $ git clone https://github.com/Chrisochok/weather_analysis.git
  • Rename file_path_sample.py to file_path.py and set file_path to the location of your .csv files (in this walkthrough, ~/Desktop/weather/data/).
  • Navigate to the folder containing the Python files, then check that you are in the right location:
    $ cd ~/Desktop/weather/
    $ pwd
  • Install the pandas and fastparquet modules and their dependencies, if you do not have them already.
    On Windows:
    $ pip install pandas
    $ pip install fastparquet
    On Mac:
    $ pip3 install pandas
    $ pip3 install fastparquet
  • Run the test_weather.py script (the tests should fail at this stage):
    $ python3 -m unittest test_weather.py
  • Run the weather.py script:
    $ python3 weather.py
  • Run test_weather.py once more:
    $ python3 -m unittest test_weather.py
  • Alternatively, you can run the scripts directly from your favourite IDE.
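  Before running the scripts, you can confirm that pandas and fastparquet are importable from the interpreter you plan to use. This is a small stdlib-only sketch, not part of the repository; `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# The two third-party packages installed in the steps above.
REQUIRED = ("pandas", "fastparquet")


def missing_packages(names=REQUIRED) -> list:
    """Return the names of packages this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing), "(install with pip or pip3)")
    else:
        print("pandas and fastparquet are available.")
```

  Running it with the same `python3` used for weather.py matters: `pip` and `pip3` may install into different interpreters.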

ASSUMPTIONS:

Upon previewing the source data:

  1. The 'ObservationDate' field should be a date type, so it was coerced from 'str' to 'datetime' for accurate querying.
  2. 'NaN' values were identified in the following numeric fields and were imputed with zeros: WindGust (151411), Visibility (26493), Pressure (14820).
  3. 'NaN' values (27760) were identified in the 'Country' field. These were replaced with empty strings.
  4. "The converted data should be queryable to answer the following question." It is assumed that this statement refers to converting the data file to the Parquet columnar format before further analysis.
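  Assumptions 1 to 4 could be implemented along the lines below. This is a sketch rather than the repository's code: the helper names `clean_weather` and `csv_to_parquet` are hypothetical, the dates are assumed to be parseable by `pd.to_datetime` without a custom format, and the Parquet step assumes fastparquet is installed.

```python
import pandas as pd

# Numeric fields listed under assumption 2; other columns are left untouched.
NUMERIC_FILL_ZERO = ["WindGust", "Visibility", "Pressure"]


def clean_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Apply assumptions 1-3 to a raw weather DataFrame and return a copy."""
    out = df.copy()
    # 1. Coerce ObservationDate from str to datetime for accurate querying.
    out["ObservationDate"] = pd.to_datetime(out["ObservationDate"])
    # 2. Impute missing numeric values with zeros.
    out[NUMERIC_FILL_ZERO] = out[NUMERIC_FILL_ZERO].fillna(0)
    # 3. Replace missing Country values with empty strings.
    out["Country"] = out["Country"].fillna("")
    return out


def csv_to_parquet(csv_paths, parquet_path) -> pd.DataFrame:
    """4. Read the CSV files, clean them and store the result as Parquet."""
    raw = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
    cleaned = clean_weather(raw)
    # Requires the fastparquet engine installed in the steps above.
    cleaned.to_parquet(parquet_path, engine="fastparquet", index=False)
    return cleaned
```

  Keeping the cleaning in a pure function separate from file I/O makes it easy to unit-test without touching the CSV or Parquet files; the stored data can be read back with `pd.read_parquet(parquet_path, engine="fastparquet")`.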
