amphibian-dev / toad
ESC Team's credit scorecard tools.
Home Page: https://toad.readthedocs.io
License: MIT License
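KS, lift, and ROC are the most commonly used charts in risk-control modeling; it would be good to add them.
As the title says, toad puts null values into bin 0. I would like missing values to be handled as a separate bin rather than merged with other bins.
Calculate IV for each bin in features
Check data order and put it in the correct order
Could you please provide 'train.csv' in the demo notebook?
Support passing a list of split points to the `bucket` parameter, and support returning the split points for later use
The dataset used is german_credit_data.csv. When plotting the binning visualization, the error "cannot convert float NaN to integer" occurs.
For cases where the WOETransformer needs to be modified, or during group collaboration.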
{'a':[1,2,3,4,5]}
{'a':[1,2,3,4,5]}
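combiner2 is a new instance that was given no input, yet because of inheritance from the parent class it shares some data with combiner.
When binning continuous variables, missing values can be put into a bin of their own, but how can the missing-value bin be merged with one of the other bins?
Hello, I would like to ask about the principle behind the chi-square binning in the ChiMerge function. The source code does not seem to involve the degrees of freedom or quantiles of the chi-square distribution, which differs from the paper "ChiMerge: Discretization of Numeric Attributes" (Randy Kerber, 1992)?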
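For comparison, the procedure in Kerber's paper scores each pair of adjacent intervals with a Pearson chi-square statistic and merges the lowest-scoring pair while that score stays below a threshold taken from the chi-square distribution with (number of classes - 1) degrees of freedom. The following is a minimal sketch of that test for two classes. It does not reproduce toad's `ChiMerge`; the function names and the 0.95 significance level are assumptions made for illustration.

```python
def chi2_adjacent(bin_a, bin_b):
    """Pearson chi-square statistic for two adjacent bins.

    Each bin is a list of per-class counts, e.g. [n_good, n_bad].
    """
    total = sum(bin_a) + sum(bin_b)
    chi2 = 0.0
    for row in (bin_a, bin_b):
        row_total = sum(row)
        for j, observed in enumerate(row):
            col_total = bin_a[j] + bin_b[j]
            expected = row_total * col_total / total
            # A zero expected count implies a zero observed count; skip the cell.
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# 0.95 quantile of the chi-square distribution with 1 degree of freedom
# (two classes). The significance level is an assumed choice.
CHI2_CRITICAL_DF1_95 = 3.841


def should_merge(bin_a, bin_b, threshold=CHI2_CRITICAL_DF1_95):
    """True when the two bins are not significantly different."""
    return chi2_adjacent(bin_a, bin_b) < threshold
```

Two bins with identical class proportions score 0 and would be merged; two bins with opposite class makeup score high and would be kept apart.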
from toad.plot import badrate_plot, proportion_plot
Traceback (most recent call last):
File "D:\program\Anaconda\lib\site-packages\numpy\core\function_base.py", line 117, in linspace
num = operator.index(num)
TypeError: 'float' object cannot be interpreted as an integer
Updating the numpy version did not solve it:
>>> numpy.__version__
'1.18.0'
The `card` parameter is not an attribute of `ScoreCard`, so an error is raised when the object is printed in the console.
combiner = Combiner()
bins = combiner.fit_transform(df, target, n_bins = 5)
transer = WOETransformer()
woe = transer.fit_transform(bins, target)
How can combiner and transer be saved so that they can be used directly to transform data next time?
card = ScoreCard(
combiner = combiner,
transer = transer,
)
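One possible approach to persistence is to store the binning rules as plain data. Another issue on this page calls `combiner.export()` and gets back a dictionary of split points per feature. The sketch below round-trips such a dictionary through JSON. The helper names and the example rules are hypothetical, and it is an assumption that the exported values are JSON-serializable. How the rules are loaded back into a `Combiner`, and whether `WOETransformer` offers an equivalent export, is not covered here.

```python
import json


def save_rules(rules, path):
    """Write a {feature: split_points} dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rules, fh, ensure_ascii=False, indent=2)


def load_rules(path):
    """Read a {feature: split_points} dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical rules, shaped like the list printed for bins['age'] on this page.
example_rules = {"age": [22, 23, 25, 26, 27, 28, 31, 33, 35, 38, 43]}
```

Storing rules as text also makes them easy to review and edit during group collaboration.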
I calculated the GINI of the variables and it differs greatly from what commercial software produces, while IV is fairly close.
Gini returns bad values; it returns 42-43 for all variables (checked on binned and non-binned values)
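Two different quantities are commonly called "Gini", which is one possible source of such a mismatch: Gini impurity of the label distribution, and the Gini coefficient derived from AUC (2 * AUC - 1). Which of these toad or any given commercial tool reports is not stated in these issues, so the sketch below only illustrates how far apart the two can be on the same data. All function names are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def auc(scores, labels):
    """Probability that a random bad (1) scores above a random good (0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def gini_coefficient(scores, labels):
    """Gini coefficient as used for model discrimination: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0
```

On a balanced sample that a feature separates perfectly, the impurity of the labels is 0.5 while the Gini coefficient of the feature is 1.0, so the two numbers should not be compared directly.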
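I would like to ask for a tutorial on using gbdt+lr in toad. I saw this feature in the documentation, but only part of it is written up, and I would like to see the full version. Thanks.
1. Could binning of continuous variables offer an option to force monotonic WOE?
2. Is there a WeChat discussion group? Thanks.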
col_lst = train_selected.columns.values[:-1].tolist()
for col in col_lst:
    bin_plot(c.transform(train_selected[[col,'is_bad']], labels=True), x=col, target='is_bad')
This message appears when plotting the bins: "No handles with labels found to put in legend." It does not affect the plot itself.
n_bins = 5 was set, but the result has 12 bins.
`combiner = toad.transform.Combiner()
combiner.fit(dev_slct2,dev_slct2['target'],method='chi',min_samples = 0.05,n_bins = 5,
exclude=ex_lis)
bins = combiner.export()
bins['age']`
[22, 23, 25, 26, 27, 28, 31, 33, 35, 38, 43]
`Combiner` should first combine the non-empty values of a feature into several bins, then merge the empty values into the most similar group.
c=(arr==value).sum
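1. When toad.quality() computes a feature's IV, which binning is it based on? If I want to bin according to specified requirements, compute WOE, and then compute the feature's IV, how do I do that?
2. Building on the previous step, after binning each feature, can the feature values be replaced with the bin number rather than only with the WOE?
3. When binning with toad, do categorical and continuous variables need to be binned separately? In particular, how should a categorical variable with very many categories, such as the province on an ID card, be distinguished from continuous variables?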
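As a reference for the first two questions, WOE and IV can be computed by hand once every row has a bin number. The sketch below uses the textbook definitions and is not toad's implementation. The function name, the sign convention `ln(bad share / good share)`, and the floor applied to empty counts are all assumptions.

```python
import math


def woe_iv(bin_ids, targets, floor=0.5):
    """Return ({bin: woe}, iv) for bin numbers and 0/1 targets (1 = bad).

    `floor` replaces a zero count so the logarithm stays defined; the
    value 0.5 is an arbitrary illustrative choice.
    """
    good, bad = {}, {}
    for b, y in zip(bin_ids, targets):
        if y == 1:
            bad[b] = bad.get(b, 0) + 1
        else:
            good[b] = good.get(b, 0) + 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe, iv = {}, 0.0
    for b in sorted(set(good) | set(bad)):
        p_bad = max(bad.get(b, 0), floor) / total_bad
        p_good = max(good.get(b, 0), floor) / total_good
        woe[b] = math.log(p_bad / p_good)
        iv += (p_bad - p_good) * woe[b]
    return woe, iv


# Bin numbers stand in for the raw feature values; WOE is looked up per bin.
bin_ids = [0, 0, 0, 0, 1, 1, 1, 1]
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe_by_bin, iv = woe_iv(bin_ids, targets)
woe_encoded = [woe_by_bin[b] for b in bin_ids]
```

Here `bin_ids` is the bin-number encoding asked about in question 2, and `woe_encoded` is the WOE encoding of the same rows.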
from toad.plot import bin_plot
Traceback (most recent call last):
File "C:\miniconda\lib\site-packages\numpy\core\function_base.py", line 117, in linspace
num = operator.index(num)
TypeError: 'float' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "C:\miniconda\lib\site-packages\toad\plot.py", line 6, in
from .tadpole import tadpole
File "C:\miniconda\lib\site-packages\toad\tadpole\__init__.py", line 5, in
from .base import Tadpole
File "C:\miniconda\lib\site-packages\toad\tadpole\base.py", line 2, in
from .utils import (
File "C:\miniconda\lib\site-packages\toad\tadpole\utils.py", line 16, in
HEATMAP_CMAP = sns.diverging_palette(240, 10, as_cmap = True)
File "C:\miniconda\lib\site-packages\seaborn\palettes.py", line 744, in diverging_palette
neg = palfunc((h_neg, s, l), 128 - (sep / 2), reverse=True, input="husl")
File "C:\miniconda\lib\site-packages\seaborn\palettes.py", line 641, in light_palette
return blend_palette(colors, n_colors, as_cmap)
File "C:\miniconda\lib\site-packages\seaborn\palettes.py", line 777, in blend_palette
pal = _ColorPalette(pal(np.linspace(0, 1, n_colors)))
File "<__array_function__ internals>", line 6, in linspace
File "C:\miniconda\lib\site-packages\numpy\core\function_base.py", line 121, in linspace
.format(type(num)))
TypeError: object of type <class 'float'> cannot be safely interpreted as an integer.
numpy and cpython are both the latest versions.
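The traceback can be reproduced without numpy or seaborn. The seaborn frame computes `128 - (sep / 2)`, and true division always yields a float in Python 3; the numpy frame then passes that value to `operator.index`, which rejects floats. In this minimal sketch the value `sep = 10` is an assumption, since the traceback does not show it.

```python
import operator

sep = 10                    # assumed value; not shown in the traceback
n_colors = 128 - (sep / 2)  # true division gives the float 123.0

try:
    operator.index(n_colors)  # the call shown in numpy's linspace frame
    message = ""
except TypeError as exc:
    message = str(exc)

# Floor division keeps the result an int, which operator.index accepts.
n_colors_int = 128 - (sep // 2)
accepted = operator.index(n_colors_int)
```

The failing value therefore originates in the calling library, so changing only the numpy version may not be enough if that call site still passes a float.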
toad and its dependencies have been installed correctly:
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: toad in /home/mcq/anaconda3/lib/python3.8/site-packages (0.0.63)
Requirement already satisfied: numpy<1.20,>=1.18.0 in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (1.19.5)
Requirement already satisfied: scikit-learn>=0.21 in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (0.23.1)
Requirement already satisfied: joblib>=0.12 in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (0.16.0)
Requirement already satisfied: pandas in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (1.0.5)
Requirement already satisfied: seaborn>=0.10.0 in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (0.10.1)
Requirement already satisfied: scipy in /home/mcq/anaconda3/lib/python3.8/site-packages (from toad) (1.5.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/mcq/anaconda3/lib/python3.8/site-packages (from scikit-learn>=0.21->toad) (2.1.0)
Requirement already satisfied: python-dateutil>=2.6.1 in /home/mcq/anaconda3/lib/python3.8/site-packages (from pandas->toad) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /home/mcq/anaconda3/lib/python3.8/site-packages (from pandas->toad) (2020.1)
Requirement already satisfied: matplotlib>=2.1.2 in /home/mcq/anaconda3/lib/python3.8/site-packages (from seaborn>=0.10.0->toad) (3.2.2)
Requirement already satisfied: six>=1.5 in /home/mcq/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.6.1->pandas->toad) (1.15.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/mcq/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn>=0.10.0->toad) (1.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/mcq/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn>=0.10.0->toad) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/mcq/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.1.2->seaborn>=0.10.0->toad) (0.10.0)
An error is raised on import:
`
ValueError Traceback (most recent call last)
in
----> 1 import toad
~/anaconda3/lib/python3.8/site-packages/toad/__init__.py in
----> 1 from .merge import merge, DTMerge, ChiMerge, StepMerge, QuantileMerge, KMeansMerge
2 from .detector import detect
3 from .metrics import KS, KS_bucket, F1
4 from .stats import quality, IV, VIF, WOE, entropy, entropy_cond, gini, gini_cond
5 from .selection import select
~/anaconda3/lib/python3.8/site-packages/toad/merge.pyx in init toad.merge()
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`
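Looking at the source code, it seems that binning does not use multithreading? Could the maintainers confirm whether multithreading is enabled by default? Without it, methods such as chi-square binning will be very slow.
A question: the logistic regression that produced proba and the logistic regression in toad.scorecard.ScoreCard use the same parameters. The scores from the predict, woe_to_score, and bin_to_score methods are identical, but the score from proba_to_score differs from those three.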
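When checking such a discrepancy by hand, it can help to convert a probability to a score independently. The sketch below uses the common "points to double the odds" scaling. It is an assumption that toad follows this form, and the function name and default values are hypothetical and chosen only for illustration.

```python
import math


def scale_probability(prob_bad, base_score=750, base_odds=35, pdo=60, rate=2):
    """Map a predicted bad probability to a score.

    A good:bad odds of `base_odds` maps to `base_score`, and every time
    the odds are multiplied by `rate` the score rises by `pdo` points.
    """
    factor = pdo / math.log(rate)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - prob_bad) / prob_bad  # good:bad odds
    return offset + factor * math.log(odds)


score_at_base = scale_probability(1 / 36)     # odds of 35:1
score_at_double = scale_probability(1 / 71)   # odds of 70:1
```

Comparing a hand-computed score like this against each method's output for a few rows shows which of the methods departs from the expected scaling.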
Y axis start with zero
`sklearn.tree.plot_tree` does not work correctly after `toad.plot` has been used.
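Hello, when binning a continuous variable that has missing values, I first put the missing values into a bin of their own. When I then adjusted the bins, I hit the error "name 'nan' is not defined", which prevents binning from completing. How can this be handled?
Suggestion: when binning, let the user designate missing values as a special bin that does not take part in the binning.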
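The requested behaviour can be sketched independently of toad: missing values get a reserved bin label and never enter the split-point search, while all other values are placed by binary search. The label `-1` and the function name are hypothetical. Python has no bare `nan` name, so the sketch spells the value as `float("nan")` and tests for it with `math.isnan`.

```python
import bisect
import math

MISSING_BIN = -1  # hypothetical label reserved for missing values


def assign_bin(value, splits):
    """Return the bin index for `value` given sorted split points.

    Bins are [-inf, s0), [s0, s1), ..., [s_last, +inf); missing values
    (None or NaN) always map to MISSING_BIN.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING_BIN
    return bisect.bisect_right(splits, value)


splits = [25, 35]
ages = [20, 25, float("nan"), 40, None]
binned = [assign_bin(a, splits) for a in ages]
```

Because the missing bin has its own label, it can later be left alone or mapped onto any other bin index explicitly.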
Thank you very much for developing this framework and for sharing it.