

featexp's People

Contributors

abhayspawar, mmeendez8, pauroger


featexp's Issues

Bug in xgboost DMatrix

dtrain = xgb.DMatrix(X_test, label=y_test, missing=np.nan)
dtest = xgb.DMatrix(X_train, label=y_train, missing=np.nan)

X_test is used to build dtrain and X_train to build dtest. Shouldn't it be the other way around?

Bug in get_trend_stats()

I am getting this error when using get_trend_stats():


ValueError Traceback (most recent call last)
in ()
----> 1 stats = get_trend_stats(data=train, target_col='CANTIDAD_DIR_REP_BIG_RT', data_test=test)
2 stats

~/anaconda3/lib/python3.6/site-packages/featexp/base.py in get_trend_stats(data, target_col, features_list, bins, data_test)
247 ignored.append(feature)
248 else:
--> 249 cuts, grouped = get_grouped_data(input_data=data, feature=feature, target_col=target_col, bins=bins)
250 trend_changes = get_trend_changes(grouped_data=grouped, feature=feature, target_col=target_col)
251 if has_test:

~/anaconda3/lib/python3.6/site-packages/featexp/base.py in get_grouped_data(input_data, feature, target_col, bins, cuts)
36 # if reduced_cuts>0:
37 # print('Reduced the number of bins due to less variation in feature')
---> 38 print(cuts)
39 cut_series = pd.cut(input_data[feature], cuts)
40 else:

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates)
226 bins = _convert_bin_to_numeric_type(bins, dtype)
227 if (np.diff(bins) < 0).any():
--> 228 raise ValueError('bins must increase monotonically.')
229
230 fac, bins = _bins_to_cuts(x, bins, right=right, labels=labels,

ValueError: bins must increase monotonically.

I checked cuts value and this is its content:
[475836.1, 897023.3999999999, 1256710.22, 1334681.24, 1838614.84, 1838614.8399999999, 3230684.84]
So it seems there is a bug: two of the cuts are the same value up to floating-point rounding!
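
A possible workaround, sketched here under assumptions: the two edges near 1838614.84 in the list above differ only in the last floating-point digits, so the edge list is not strictly increasing, which pd.cut rejects. Sorting the edges and merging those that agree up to a small relative tolerance yields a list that can be passed on. The helper name clean_cuts and the tolerance are hypothetical and not part of featexp.

```python
import math

def clean_cuts(cuts, rel_tol=1e-9):
    """Sort bin edges and drop edges that equal the previous one up to rounding."""
    cleaned = []
    for edge in sorted(cuts):
        if cleaned and math.isclose(edge, cleaned[-1], rel_tol=rel_tol):
            continue  # treat as a duplicate of the previous edge
        cleaned.append(edge)
    return cleaned

# The edge list reported in this issue.
cuts = [475836.1, 897023.3999999999, 1256710.22, 1334681.24,
        1838614.84, 1838614.8399999999, 3230684.84]
cleaned = clean_cuts(cuts)
```

The cleaned list has one edge fewer, so the feature ends up with one bin fewer than requested.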

Colab version compatibility

Great package. I have some issues when running !pip install featexp in Google Colab.
I get the following messages:

ERROR: google-colab 1.0.0 has requirement pandas~=0.24.0, but you'll have pandas 0.23.4 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
Successfully installed featexp-0.0.5 matplotlib-3.0.2 numpy-1.15.4 pandas-0.23.4
WARNING: The following packages were previously imported in this runtime:
[matplotlib,mpl_toolkits,numpy,pandas]
You must restart the runtime in order to use newly installed versions.

As a consequence (I believe) commands like dataframe.head() no longer work.

featexp_demo: file does not exist error


FileNotFoundError Traceback (most recent call last)
in ()
----> 1 X_train, X_test, y_train, y_test, train_users, test_users = import_and_create_train_test_data()
.....(blablabla)
FileNotFoundError: File b'demo/data/application_train.csv' does not exist

Why does this happen? Does anyone know how to solve it?

Pandas SettingWithCopyWarning

When I run get_trend_stats I get the following warning multiple times:

featexp/base.py:23: SettingWithCopyWarning
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The cause is that get_grouped_data modifies its input DataFrame. It might be better to copy the input data before doing anything with it.

Setup:

  • Python 3.8.13
  • featexp 0.0.7
  • Pandas 1.4.3
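
The copy-first idea suggested in this issue can be sketched as follows. This is a minimal illustration, not featexp's code: add_bin_column is a hypothetical helper, and the column naming is an assumption. The point is only that the function works on its own copy, so passing in a slice of another DataFrame neither triggers the warning nor changes the caller's data.

```python
import pandas as pd

def add_bin_column(input_data, feature, cuts):
    """Hypothetical helper: bin `feature` without touching the caller's frame."""
    data = input_data.copy()  # work on a private copy, not on a possible slice
    data[feature + "_bin"] = pd.cut(data[feature], cuts, include_lowest=True)
    return data

raw = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [0, 1, 0, 1]})
subset = raw[raw["x"] > 1.0]  # a slice, the usual trigger for the warning
binned = add_bin_column(subset, "x", cuts=[1.0, 2.5, 4.0])
```

The cost is one extra copy of the input per call, which is usually acceptable for exploratory work.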

Matplotlib outputs

Hi again Abhay,

I ended up modifying the library locally, since it did not allow manipulating the output figures (for example, changing the resolution or saving PDF output instead of PNG). I think it is a quick fix: the draw_plots function just needs a bit of restructuring.
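
One way such a restructuring could look, as a sketch only: the plotting function creates the figure with a caller-chosen resolution and returns it instead of only displaying it, so the caller decides how to save it. The function draw_trend_plot and its parameters are hypothetical and do not reflect the actual draw_plots signature.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

def draw_trend_plot(bin_labels, target_means, dpi=100):
    """Hypothetical plot function that returns the figure to the caller."""
    fig, ax = plt.subplots(figsize=(6, 4), dpi=dpi)
    ax.plot(bin_labels, target_means, marker="o")
    ax.set_xlabel("bin")
    ax.set_ylabel("target mean")
    return fig, ax

fig, ax = draw_trend_plot([1, 2, 3], [0.1, 0.3, 0.2], dpi=150)

# The caller can now pick any output format; here PDF, written to memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
plt.close(fig)
```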

bug in get_trend_stats

I ran into this problem: ValueError: missing values must be missing in the same location both left and right sides. Can you help me solve it? Thank you very much.

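Problem with test-related file names in the code

In get_dataloader.py, test_name='testb' is defined, but in the original test data folder (downloaded from Baidu Cloud) all data files have the prefix test_a. None start with testb, so running the code raises a file-not-found error.
Likewise, load_dataset in main_LR.py references '/test/testb_base.csv', but there is no testb data.

Does testb actually refer to the files that start with test_a? Thanks!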

@wj19971997

Implementing sample/data weight

Hi Abhay,
I appreciate your great effort in publishing your code. I wonder if there is a way to apply a weight to each data point, since weighted data is common in my domain. Thanks.
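
To make the request concrete, here is a sketch of what a weighted version of the per-bin target mean could compute: sum(weight * target) / sum(weight) within each bin. Nothing in this thread shows that featexp supports weights. The function weighted_bin_means, its arguments, and the right-closed bin convention are assumptions for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

def weighted_bin_means(values, targets, weights, cuts):
    """Weighted target mean per bin; bin i covers (cuts[i], cuts[i + 1]]."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for value, target, weight in zip(values, targets, weights):
        index = bisect_left(cuts, value) - 1
        # Clamp so the lowest edge and out-of-range values land in an end bin.
        index = min(max(index, 0), len(cuts) - 2)
        weighted_sum[index] += weight * target
        weight_total[index] += weight
    return {i: weighted_sum[i] / weight_total[i]
            for i in sorted(weight_total) if weight_total[i] > 0}

means = weighted_bin_means(
    values=[5, 10, 15, 20],
    targets=[1, 0, 1, 1],
    weights=[1, 3, 2, 2],
    cuts=[0, 10, 20],
)
```

With all weights equal, this reduces to the ordinary per-bin mean.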
Cheers,
Rui

KeyError: "Column 'target' does not exist!"

Hello, I am trying to use this wonderful tool, but I get the error KeyError: "Column 'target' does not exist!". I am pretty sure that 'target' is a column in the train data.
Could you help me with this? Many thanks.

cannot import name 'get_trend_stats_feature'

First, thanks for developing this package. I tried the "get_univariate_plots" function and it worked, but I cannot import the "get_trend_stats_feature" function. Do you have any ideas?

Update

Hello,

I forked the repo today to be able to use it with newer versions of the required libraries.
btw, it worked with:

numpy==1.17.4
pandas==0.25.3
matplotlib==3.0.2

Also, I did some cosmetic changes which I tried to add with a PR, but I guess you are not allowing them? I am getting the following error:

remote: Permission to abhayspawar/featexp.git denied to pauroger.
fatal: unable to access 'https://github.com/abhayspawar/featexp.git/': The requested URL returned error: 403

Why are the bins not equal frequency?

According to the get_univariate_plots function, the bins should be equal frequency, but many plots in featexp_demo.ipynb show the opposite, for example 'Plots for CNT_CHILDREN' and 'Plots for AMT_INCOME_TOTAL'.
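
One plausible explanation, not confirmed by the maintainer in this thread: when a feature has many tied values, several quantile edges coincide, and once the duplicate edges are merged the remaining bins can no longer hold equal numbers of rows. The sketch below uses made-up data (70 zeros, 20 ones, 10 twos) and hypothetical helpers to show the effect. It does not reproduce featexp's binning code.

```python
from bisect import bisect_left

def quantile_edges(values, bins):
    """Quantile edges by rank: bins + 1 values, duplicates possible."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(k * last) // bins] for k in range(bins + 1)]

def bin_counts(values, edges):
    """Rows per bin; bin i covers (edges[i], edges[i + 1]], lowest edge included."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        counts[max(bisect_left(edges, value) - 1, 0)] += 1
    return counts

# A count-like feature dominated by a single value (illustrative data).
values = [0] * 70 + [1] * 20 + [2] * 10
edges = quantile_edges(values, bins=10)  # most of the 11 edges are 0
unique_edges = sorted(set(edges))        # only 3 distinct edges remain
counts = bin_counts(values, unique_edges)
```

Ten requested bins collapse to two, and the two bins are far from equal in size.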

Add new function

I think a new function could be added, similar to "get_grouped_data", that uses a decision tree to choose the best bins for users.
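
A rough sketch of the idea, written from scratch rather than taken from featexp: a small regression tree on a single feature, where each split threshold becomes a bin edge. The function tree_cuts and its parameters are hypothetical. In practice a library tree such as scikit-learn's DecisionTreeRegressor could play the same role.

```python
def squared_error(targets):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def tree_cuts(pairs, max_depth=2, min_leaf=2):
    """Recursively split (value, target) pairs; return sorted split thresholds."""
    pairs = sorted(pairs)
    if max_depth == 0 or len(pairs) < 2 * min_leaf:
        return []
    base = squared_error([t for _, t in pairs])
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        cost = (squared_error([t for _, t in pairs[:i]])
                + squared_error([t for _, t in pairs[i:]]))
        if cost < base and (best is None or cost < best[0]):
            best = (cost, i)
    if best is None:
        return []  # no split reduces the error
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return (tree_cuts(pairs[:i], max_depth - 1, min_leaf)
            + [threshold]
            + tree_cuts(pairs[i:], max_depth - 1, min_leaf))

# Illustrative data: the target flips from 0 to 1 between x = 4 and x = 5.
pairs = list(zip(range(1, 9), [0, 0, 0, 0, 1, 1, 1, 1]))
thresholds = tree_cuts(pairs, max_depth=2)
```

Unlike equal-frequency bins, these edges depend on the target, so they should be learned on training data only and then reused on test data.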

Citing featexp

Thank you very much for putting together a nice library. Would there be a better way of citing featexp than this repository?
