
easy_university_selection's Introduction

easy_university_selection

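A project that scrapes the admission score lines of universities nationwide, by region and by year (majors included), to help gaokao candidates shortlist schools.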

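Background

A relative's child took the gaokao on the liberal-arts track and, of all people, asked me, a poor student, how to choose schools. As a poor student from the science track, I know even less about applying to liberal-arts programs. I spent ages looking up each school's score lines online, which was tedious. So I scraped the score lines of the major universities from the website and filtered them.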

Script usage

./easy_university_selection.py 10024 10034 393 2017 10148

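The arguments, in order:

  • 10024: the region (Fujian)
  • 10034: the subject track (liberal arts)
  • 393: the candidate's score
  • 2017: the gaokao year
  • 10148: junior college (专科); this argument is a filter, and schools in this tier are excluded from the results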
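The argument order above can be sketched as a small parser. This is only an illustration, not the script's actual code: the names `Query` and `parse_args` are hypothetical, and the code tables are copied from the "Other definitions" section of this README.

```python
from typing import NamedTuple

# Code tables copied from the "Other definitions" section of this README.
SUBJECTS = {"10035": "science", "10034": "liberal arts"}
TIERS = {
    "10036": "first tier",
    "10037": "second tier",
    "10038": "third tier",
    "10148": "junior college",
}


class Query(NamedTuple):
    """Hypothetical container for the five positional arguments."""
    province: str      # region code, e.g. "10024" for Fujian
    subject: str       # subject-track code
    score: int         # candidate's gaokao score
    year: int          # gaokao year
    exclude_tier: str  # tier code to filter out of the results


def parse_args(argv):
    """Parse the five positional arguments in the order shown above."""
    if len(argv) != 5:
        raise ValueError("expected: province subject score year exclude_tier")
    province, subject, score, year, exclude_tier = argv
    if subject not in SUBJECTS:
        raise ValueError("unknown subject code: " + subject)
    if exclude_tier not in TIERS:
        raise ValueError("unknown tier code: " + exclude_tier)
    return Query(province, subject, int(score), int(year), exclude_tier)


query = parse_args("10024 10034 393 2017 10148".split())
print(query.score, SUBJECTS[query.subject], TIERS[query.exclude_tier])
```

Validating the subject and tier codes up front means a mistyped code fails immediately instead of after the scrape has started.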


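Basic algorithm

Estimating the candidate's equivalent score in previous years

  • A crude estimate (only the most recent three years are evaluated).
  • Compute the ratio of the student's score to each of this year's gaokao cutoff lines.
  • Example: a liberal-arts score of 393 against the 2017 cutoff lines 489, 380, 300 gives the ratios 393/489=0.803, 393/380=1.034, 393/300=1.31.
  • The 2016 cutoff lines were 501, 403, 319, so the estimate is (501×0.803 + 403×1.034 + 319×1.31)/3 = 412.3; the student's estimated 2016 score is 412.3.
  • Whether to account for the number of candidates, a difficulty adjustment factor and so on is something to look at when there is time.
  • Update 2017-06-30: the calculation is now weighted by the batch the score falls in.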
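The unweighted ratio estimate described above can be sketched as follows. The function name `estimate_score` is hypothetical, and the batch weighting from the 2017-06-30 update is not included because the text does not say how the weights are chosen. Since the ratios here are not rounded to three decimals first, the result comes out near 412.4 rather than exactly the 412.3 of the worked example.

```python
def estimate_score(score, current_lines, past_lines):
    """Map a score onto a past year's scale by averaging ratio-scaled cutoffs.

    current_lines and past_lines are the cutoff lines of the same batches,
    in the same order, for the current year and for the past year.
    """
    if not current_lines or len(current_lines) != len(past_lines):
        raise ValueError("cutoff lists must be non-empty and the same length")
    # Ratio of the candidate's score to each of this year's cutoff lines.
    ratios = [score / line for line in current_lines]
    # Scale each past cutoff line by the matching ratio, then average.
    scaled = [past * ratio for past, ratio in zip(past_lines, ratios)]
    return sum(scaled) / len(scaled)


# Worked example from the text: 393 (liberal arts), 2017 lines vs 2016 lines.
estimate_2016 = estimate_score(393, [489, 380, 300], [501, 403, 319])
print(round(estimate_2016, 1))
```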

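Screening method

  • The scraped data mainly consists of the highest, lowest and average admission score of a university (majors included) in a given region and batch.
  • From these, results are graded into 9 levels (1-9). Example: Peking University, Fujian, liberal arts:
  • 2016 admissions: highest 400, lowest 350, average 380
  • 2015 admissions: highest 400, lowest 350, average 380
  • 2014 admissions: highest 400, lowest 350, average 380
  • Levels decay with time (whether this gradient measures accurately is still an open question; it can be adjusted).
  • Candidate 1 scores 393 (above the average): the admission prediction value is 6 from the 2016 sample, 5 from 2015 and 4 from 2014.
  • Candidate 2 scores 420 (above the highest): the admission prediction value is 9 from the 2016 sample, 8 from 2015 and 7 from 2014.
  • Candidate 3 scores 354 (above the lowest): the admission prediction value is 3 from the 2016 sample, 1 from 2015 and 1 from 2014.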
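The three example candidates can be reproduced with a plain lookup, sketched below. The text gives no formula for the time decay, so the per-year levels are copied directly from the examples rather than derived. The names `band` and `example_levels`, the strict `>` comparisons, and returning `None` for a score at or below the lowest admitted score are all assumptions.

```python
# Levels for each band, ordered [2016, 2015, 2014], copied from the three
# example candidates above. This is a lookup table, not the real decay rule.
EXAMPLE_LEVELS = {
    "above_max": [9, 8, 7],
    "above_avg": [6, 5, 4],
    "above_min": [3, 1, 1],
}


def band(score, highest, lowest, average):
    """Classify a score against one year's admission statistics."""
    if score > highest:
        return "above_max"
    if score > average:
        return "above_avg"
    if score > lowest:
        return "above_min"
    return None  # assumption: not covered by the examples in the text


def example_levels(score, highest=400, lowest=350, average=380):
    """Return the [2016, 2015, 2014] levels for the example statistics."""
    return EXAMPLE_LEVELS.get(band(score, highest, lowest, average))


for candidate_score in (393, 420, 354):
    print(candidate_score, example_levels(candidate_score))
```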

Results

[figure] [figure]

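Other definitions

10035 science, 10034 liberal arts

10036 first-tier undergraduate (一本), 10037 second-tier undergraduate (二本), 10038 third-tier undergraduate (三本), 10148 junior college (专科)

Shanghai 10000, Yunnan 10001, Inner Mongolia 10002, Beijing 10003, Jilin 10004, Sichuan 10005, Tianjin 10006, Ningxia 10007, Anhui 10008, Shandong 10009, Shanxi 10010, Guangdong 10011, Guangxi 10012, ** 10013, Jiangsu 10014, Jiangxi 10015, Hebei 10016, Henan 10017, Zhejiang 10018, Hainan 10019, Hubei 10021, Hunan 10022, Gansu 10023, Fujian 10024, Tibet 10025, Guizhou 10026, Liaoning 10027, Chongqing 10028, Shaanxi 10029, Qinghai 10030, Heilongjiang 10031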

easy_university_selection's People

Contributors

siu91


easy_university_selection's Issues

syntaxerror

(base) PS C:\Users\TS3\Desktop\easy_university_selection-master> python ./easy_university_selection.py 10024 10034 393 2017 10148
File "./easy_university_selection.py", line 156
print '共加载' + str(len(d)) + '条专业分数线数据'
^
SyntaxError: invalid syntax

Could someone tell me what the problem is here? 0.0
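Line 156 in the report above is a Python 2 print statement, which Python 3 rejects with exactly this SyntaxError, so the script appears to be written for Python 2 (the other reports on this page run it with python2.7). A minimal sketch of the Python 3 form of that one line follows; `loaded_message` is a hypothetical helper, `d` stands in for whatever the script has loaded, and the rest of the script would likely need similar changes.

```python
def loaded_message(d):
    """Build the same message as line 156 of the script."""
    return '共加载' + str(len(d)) + '条专业分数线数据'


# In Python 3, print is a function and needs parentheses.
print(loaded_message(["placeholder row"]))
```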

404: has the website changed?

'http://gkcx.eol.cn/schoolhtm/scores/provinceScores31_10016_10035_10036.xml'

Traceback (most recent call last):
File "C:/Users/Administrator.DESKTOP-DHPJ48Q/Desktop/easy_university_selection-master/aaa.py", line 10, in <module>
response = urllib2.urlopen(req)
File "D:\Python27\Lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "D:\Python27\Lib\urllib2.py", line 435, in open
response = meth(req, response)
File "D:\Python27\Lib\urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "D:\Python27\Lib\urllib2.py", line 467, in error
result = self._call_chain(*args)
File "D:\Python27\Lib\urllib2.py", line 407, in _call_chain
result = func(*args)
File "D:\Python27\Lib\urllib2.py", line 654, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "D:\Python27\Lib\urllib2.py", line 435, in open
response = meth(req, response)
File "D:\Python27\Lib\urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "D:\Python27\Lib\urllib2.py", line 473, in error
return self._call_chain(*args)
File "D:\Python27\Lib\urllib2.py", line 407, in _call_chain
result = func(*args)
File "D:\Python27\Lib\urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

Help: the scraped URL seems to be broken

I tried running it and got an HTTP error.

python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148

It looks like the link being fetched simply does not exist.
I searched gkcx.eol.cn for a long time and could not work out how to change it.

http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml

python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
年份:2018
地区:山西
分数:512 理科
过滤:专科
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
加载高校库完成,共有2766所高校信息载入
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
抓取高校库中所有高校在[山西]地区[理科]招生分数线
http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml
Traceback (most recent call last):
File "./easy_university_selection.py", line 679, in <module>
spider_university_province_score_line('10036', '本一批次')
File "./easy_university_selection.py", line 384, in spider_university_province_score_line
'http://gkcx.eol.cn/schoolhtm/scores/provinceScores', 'provinceScores', tier, info)
File "./easy_university_selection.py", line 406, in spider_score_line
res_data = urllib2.urlopen(req)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 467, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
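In both reports the run stops at the first HTTPError raised by `urlopen`. The sketch below shows one way a fetch could report such an error and carry on, written for Python 3's `urllib.request` rather than the `urllib2` the script uses. It does not restore the missing endpoint. The helper names are hypothetical, and the URL layout is inferred only from the two URLs quoted above: the leading number (31, 643) is assumed to be a school id, followed by the province, subject and tier codes.

```python
import urllib.error
import urllib.request

# Prefix taken from the URLs quoted in the reports above.
BASE = "http://gkcx.eol.cn/schoolhtm/scores/provinceScores"


def build_url(school_id, province, subject, tier):
    """Assemble a URL shaped like the ones quoted in these reports."""
    return "{}{}_{}_{}_{}.xml".format(BASE, school_id, province, subject, tier)


def fetch_or_none(url, timeout=10):
    """Return the response body, or None if the server answers with an HTTP error.

    Only HTTP status errors (404, 500, ...) are handled here; connection
    failures still raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        print("skipping {}: HTTP {}".format(url, err.code))
        return None


print(build_url(643, 10010, 10035, 10036))
```

Returning `None` lets a loop over many schools skip the ones whose pages are gone instead of aborting the whole run.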
