
cs5433bigdatagroup6's Introduction

Hello 👋 Use the drop-down arrows to learn more about the open-source work I've been a part of.
Controller Area Network bus
Ground heat exchangers
  • pygfunction served as the backbone for a first-of-its-kind ground heat exchanger design tool, ghedt.
  • cpgfunctionEP has been integrated into EnergyPlus as a third-party application.
  • cpgfunction was used to compute g-functions on a high-performance computing cluster (HPCC).

(Refer to my master's thesis for a more detailed discussion.)

cs5433bigdatagroup6's People

Contributors

thommms


cs5433bigdatagroup6's Issues

Task 1 Data Correction

DATA CORRECTION

Missing data (cells with no values) and out-of-range values (for example, a temperature of 1000G in a weather dataset) are listed below on rows 9 and 10.

  1. Define "out of range" values only for the cells listed in section 9. For example, you may decide that any temperature t where t < -30 or t > 120 is out of range. Note: there are no strict rules on this, so decide on a range that looks reasonable.
  2. Only for cells listed in section 9, replace the missing data and out-of-range values with values from the row that is most similar to the row where data is missing or out of range. For example:
    image
    Row 9 column C (cell 9C) has a missing value. The closest value appears in row 11. Therefore, cell 9C is given a value of 77.

    Row 10 column D has an out-of-range value. The closest row appears to be row 6, although the city is different. Therefore, cell 10D is given a value of 50. Row 5 may also be the closest row because the city is the same, but its temperature difference is greater than for row 6. The similarity algorithm will identify the closest row.
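As a rough illustration of step 1, the sketch below flags cells that are either missing or outside a chosen range. The column names, the bounds in `VALID_RANGES`, and the sample rows are all hypothetical; the task leaves the exact ranges up to you.

```python
from typing import Optional

# Hypothetical per-column valid ranges; pick bounds that look reasonable.
VALID_RANGES = {"temperature": (-30.0, 120.0), "humidity": (0.0, 100.0)}


def needs_correction(column: str, value: Optional[float]) -> bool:
    """Return True if the cell is missing or falls outside its column's range."""
    if value is None:
        return True
    low, high = VALID_RANGES[column]
    return value < low or value > high


# Made-up sample rows, for illustration only.
rows = [
    {"temperature": 75.0, "humidity": 40.0},
    {"temperature": None, "humidity": 42.0},    # missing value
    {"temperature": 1000.0, "humidity": 38.0},  # out of range
]

# Collect (row index, column) pairs that need a replacement value.
flagged = [
    (i, col)
    for i, row in enumerate(rows)
    for col in VALID_RANGES
    if needs_correction(col, row[col])
]
print(flagged)
```

The flagged cells are the ones that step 2 would then fill from the most similar row.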

There are a number of similarity algorithms, such as cosine similarity and Jaccard similarity. You may choose any similarity algorithm. Missing or out-of-range values will only be numerical (integer or float) values. You may have to use one-hot encoding for text (such as 'Stillwater').
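One possible way to combine one-hot encoding with cosine similarity is sketched below. It is only an illustration: the column names (`city`, `temp`, `humidity`), the scaling range, and the sample rows are assumptions, and scaling the numeric column before comparing is a design choice rather than a requirement of the task.

```python
import math

TEMP_RANGE = (-30.0, 120.0)  # hypothetical valid range, used only for scaling


def encode(row, cities):
    """One-hot encode the city and min-max scale the temperature."""
    low, high = TEMP_RANGE
    one_hot = [1.0 if row["city"] == c else 0.0 for c in cities]
    return one_hot + [(row["temp"] - low) / (high - low)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def impute(target, donors, column, cities):
    """Copy `column` from the donor row most similar to `target`."""
    target_vec = encode(target, cities)
    best = max(donors, key=lambda d: cosine_similarity(target_vec, encode(d, cities)))
    return best[column]


# Made-up rows; only the donors have a valid humidity value.
donors = [
    {"city": "Stillwater", "temp": 74.0, "humidity": 77.0},
    {"city": "Tulsa", "temp": 75.0, "humidity": 60.0},
    {"city": "Stillwater", "temp": 40.0, "humidity": 30.0},
]
target = {"city": "Stillwater", "temp": 75.0, "humidity": None}

cities = sorted({r["city"] for r in donors + [target]})
filled = impute(target, donors, "humidity", cities)
print(filled)
```

The column being repaired is left out of the comparison vector, so the similarity depends only on cells that are known to be valid in both rows.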

Task 3 Prediction Algorithm

Using the housing data set, predict Price with a random forest regression algorithm. Create a model for each of the following:

  • all the features in the dataset
  • only the feature with the highest correlation as determined in task 2
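To make the two variants concrete, here is a toy random forest regressor written with only the standard library. It is a sketch for illustration, not the intended solution: in practice a library implementation (for example scikit-learn's `RandomForestRegressor` or the one in Spark MLlib) would normally be used. The feature names, the synthetic prices, and the choice of `area` as the top feature are all assumptions.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, features, depth, rng):
    """Grow a regression tree; each split tries a random subset of features."""
    if depth == 0 or len(rows) < 4 or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the average target
    best = None
    k = max(1, len(features) // 2)  # subset size is a tunable choice
    for f in rng.sample(features, k):
        # Skipping the smallest value keeps both sides of the split non-empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < threshold]
            right = [i for i, r in enumerate(rows) if r[f] >= threshold]
            score = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # every sampled feature was constant
        return mean(targets)
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left], features, depth - 1, rng),
        build_tree([rows[i] for i in right], [targets[i] for i in right], features, depth - 1, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node


def fit_forest(rows, targets, features, n_trees=20, depth=4, seed=0):
    """Train each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            build_tree([rows[i] for i in idx], [targets[i] for i in idx], features, depth, rng)
        )
    return forest


def predict(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic stand-in for the housing data (not the real data set).
data_rng = random.Random(42)
rows = [
    {"area": data_rng.uniform(50, 250), "rooms": data_rng.randint(1, 6), "age": data_rng.uniform(0, 50)}
    for _ in range(60)
]
prices = [1000 * r["area"] + 5000 * r["rooms"] - 300 * r["age"] for r in rows]

all_features = ["area", "rooms", "age"]
top_feature = ["area"]  # assumed result of Task 2 for this toy data

forest_all = fit_forest(rows, prices, all_features)
forest_top = fit_forest(rows, prices, top_feature)

query = {"area": 120.0, "rooms": 3, "age": 10.0}
print(predict(forest_all, query), predict(forest_top, query))
```

Comparing the two forests on held-out rows would show how much predictive power is lost by keeping only the single most correlated feature.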

Task 2 Data Correlation

Identify which column or feature has the most impact on the result. For example, if the goal is to predict whether a person may get a stroke, the age of the person is the most important variable. It is more important than gender and other variables. Some variables (features or columns), such as a person's address, are irrelevant when it comes to predicting stroke.

Note: you are required to do a pairwise comparison only. In other words, compare incidence of stroke with one feature at a time. You will therefore have to do a correlation between each feature (age, gender, etc.) and the output feature stroke. For example:

image

Do a correlation between Age and Stroke, Gender and Stroke, Systolic Pressure and Stroke, ... as Stroke is the output feature.
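The pairwise comparison can be sketched with the Pearson correlation coefficient, computed between each feature and the output column. This is one reasonable choice of correlation measure, not one the task mandates, and the numbers below are made up purely to show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Made-up toy data; gender is encoded as 0/1 for illustration.
features = {
    "age": [25, 40, 55, 70, 80, 35],
    "gender": [0, 1, 0, 1, 0, 1],
    "systolic_pressure": [118, 135, 128, 150, 142, 122],
}
stroke = [0, 0, 1, 1, 1, 0]  # output feature

# One correlation per (feature, output) pair.
scores = {name: pearson(column, stroke) for name, column in features.items()}

# The feature with the largest absolute correlation has the most impact.
best = max(scores, key=lambda name: abs(scores[name]))
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("highest correlation:", best)
```

Ranking by absolute value matters because a strongly negative correlation is as informative as a strongly positive one.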

Task 5 Bonus

Create a pipeline for tasks 1 - 4. This is a bonus and is optional.
