Git Product home page Git Product logo

finding-donors's Introduction

项目: 为CharityML寻找捐献者

简介:该项目属于优达学城机器学习纳米学位”监督学习“项目,主要通过监督学习方法对美国人口普查数据进行分析,以帮助CharityML(一个虚拟的慈善机构)识别最有可能向他们捐款的人,项目首先探索数据以了解人口普查数据是如何记录的。接着使用一系列的转换和预处理技术清洗数据。然后基于数据集特点评价所选择的几个算法,并考虑哪一个是最合适的。最后对所选模型进行优化并探索其预测能力。

注:由于是学位项目,过程中除了代码会穿插若干问题,通过建立模型并回答问题的模式,能够更深入的了解各种监督学习算法的实质及其应用场景。

安装

这个项目需要安装下面这些python包:

推荐安装Anaconda, 这是一个已经打包好的python发行版,它包含了我们这个项目需要的所有的库和软件。

代码

代码包含在finding_donors.ipynb这个notebook文件中。还会用到visuals.py和名为census.csv的数据文件来完成这个项目。 注:该项目中一小部分引导性的代码以及visuals.py文件由优达课程提供。其它部分自行完成。

数据

修改的人口普查数据集含有将近32,000个数据点,每一个数据点含有13个特征。这个数据集是Ron Kohavi的论文*"Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid",*中数据集的一个修改版本。你能够在这里找到论文,在UCI的网站找到原始数据集。

特征

  • age: 一个整数,表示被调查者的年龄。
  • workclass: 一个类别变量表示被调查者的通常劳动类型,允许的值有 {Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked}
  • education_level: 一个类别变量表示教育程度,允许的值有 {Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool}
  • education-num: 一个整数表示在学校学习了多少年
  • marital-status: 一个类别变量,允许的值有 {Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse}
  • occupation: 一个类别变量表示一般的职业领域,允许的值有 {Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces}
  • relationship: 一个类别变量表示家庭情况,允许的值有 {Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried}
  • race: 一个类别变量表示人种,允许的值有 {White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black}
  • sex: 一个类别变量表示性别,允许的值有 {Female, Male}
  • capital-gain: 连续值。
  • capital-loss: 连续值。
  • hours-per-week: 连续值。
  • native-country: 一个类别变量表示原始的国家,允许的值有 {United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands}

目标变量

  • income: 一个类别变量,表示收入属于那个类别,允许的值有 {<=50K, >50K}

finding-donors's People

Contributors

melissalj avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.