predict-customer-churn's Issues

Question regarding "3. Feature Engineering.ipynb"

In the notebook under the "Interesting Values" section, you're using "is_cancel" as a feature. Doesn't that create a bias since you're providing future information to the model telling it that the customer has cancelled the subscription?

Issues about notebook "3. Feature Engineering.ipynb"

After running the Jupyter notebook "3. Feature Engineering.ipynb" in update-reqs, I get two errors:

(1) in chunk [39]:
ValueError: ('Unknown transform primitive weekend. ', 'Call ft.primitives.list_primitives() to get', ' a list of available primitives') ,
I think this is because the transform primitives are specified as trans_primitives = ['weekend', 'cum_sum', 'day', 'month', 'diff', 'time_since_previous'], but 'weekend' is not a primitive; 'is_weekend' is.

(2) in chunk [42]:
'AttributeError: Cutoff time DataFrame must contain a column with either the same name as the target entity time_index or a column named "time" ',
I have checked the cutoff_time and members dataframes, and they share the column "msno", so I do not understand what causes this error.
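
For the first error, one way to catch a misspelled primitive name before running feature synthesis is to compare the requested names against the known ones. The sketch below is only illustrative: AVAILABLE is a hand-written subset of names (in a real session it would presumably be read from the output of ft.primitives.list_primitives()), and check_primitives is a hypothetical helper, not part of Featuretools.

```python
import difflib

# Hypothetical, hand-written subset of transform primitive names.
# In a real session this set would come from ft.primitives.list_primitives().
AVAILABLE = {"is_weekend", "cum_sum", "day", "month", "diff", "time_since_previous"}


def check_primitives(requested, available=AVAILABLE):
    """Return {unknown_name: [closest known names]} for names not in `available`."""
    problems = {}
    for name in requested:
        if name not in available:
            # Suggest up to three similarly spelled known names.
            problems[name] = difflib.get_close_matches(name, sorted(available), n=3)
    return problems


trans_primitives = ["weekend", "cum_sum", "day", "month", "diff", "time_since_previous"]
problems = check_primitives(trans_primitives)
for name, suggestions in problems.items():
    print(f"Unknown primitive {name!r}; close matches: {suggestions}")
```

Running a check like this before the synthesis step turns the ValueError into a list of likely spelling corrections.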

What does DIFF() measure exactly?

I have a dataframe of person information that includes each person's quality of health (from 0 to 100)

e.g. if I extract just the Name and HealthQuality columns:

John: 70
Mary: 20
Paul: 40

etc etc.

After applying featuretools I noticed a new DIFF(HealthQuality) variable.
Difference between what exactly?

According to the docs, this is what DIFF does:

"Compute the difference between the value in a list and the previous value in that list."

Is featuretools calculating the difference between Mary and John's health quality in this instance?
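
Read literally, the quoted definition is a row-by-row difference in whatever order the rows appear. The plain-Python sketch below illustrates that reading on the three values above. previous_value_diff is a hypothetical helper, and whether Featuretools applies the same row ordering to this dataframe is an assumption rather than something the quoted docs confirm.

```python
def previous_value_diff(values):
    """Difference between each value and the one before it.

    The first element has no predecessor, so its result is None.
    """
    if not values:
        return []
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


# Row order as listed in the question.
health = {"John": 70, "Mary": 20, "Paul": 40}
diffs = previous_value_diff(list(health.values()))
for name, d in zip(health, diffs):
    print(name, d)
```

Under this reading, Mary's value would be 20 - 70 = -50, which is only meaningful if the row order itself carries meaning (for example, a time ordering).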

Issues on the running

I was running the notebook (3. Feature Engineering.ipynb) in Colab. It does not run to completion and fails with the error "tuple is not allowed for map key". Can you help me solve it? Thanks.

Issue with customers having a single transaction

Hi, I was trying out the code in "2. Prediction Engineering.ipynb", particularly "label_customer()", on my own dataset, which contains several customers with only a single transaction. For example, if a customer registered for the service on 10th Jan and the membership expired on 15th Feb, the code labels the customer as a false churn, since there is no next transaction date. Is this a bug?
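
One way to reason about the single-transaction case is to separate "no renewal observed" from "not enough data yet". The sketch below is not the notebook's label_customer(). The names label_period, churn_days, and data_end are hypothetical, the year 2017 is made up for the example, and the rule (churn means no new transaction within churn_days after expiry) is an assumption chosen for illustration.

```python
from datetime import date, timedelta


def label_period(expire_date, next_transaction_date, data_end, churn_days=30):
    """Label one membership period.

    Returns False if the customer renewed in time, True if the renewal
    window passed without a transaction, and None if the dataset ends
    before the renewal window closes (label cannot be determined).
    """
    deadline = expire_date + timedelta(days=churn_days)
    if next_transaction_date is not None and next_transaction_date <= deadline:
        return False  # renewed within the window
    if data_end < deadline:
        return None  # window not fully observed
    return True  # no renewal within the window


# Single transaction: membership expires 15 Feb, no later transaction.
expire = date(2017, 2, 15)
label_full_data = label_period(expire, None, data_end=date(2017, 3, 31))
label_short_data = label_period(expire, None, data_end=date(2017, 2, 28))
print(label_full_data, label_short_data)
```

With this rule, a customer with one transaction and no renewal is labelled as churned once the data covers the whole renewal window, and is left unlabelled otherwise.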

sample_submission_v2.csv

Hi,
I am looking for the file sample_submission_v2.csv. Should I build it from Train....csv with the pandas test_train_split?

Potential data leakage due to some features?

Firstly, thanks for the detailed notebooks! I learnt a lot from them.

In https://github.com/Featuretools/predict-customer-churn/blob/main/churn/3.%20Feature%20Engineering.ipynb,
I assume that features such as num_25, num_50, num_75, num_985, num_100, and num_unq represent the number of specific types of transactions (e.g., the number of songs played with 25% completion, 50% completion, etc.) for each customer.

Wouldn't using these features (and various transformations/aggregations of them) in the windows lead to data leakage?

That would mean using information from the future, or from outside the given time window, to create features or train a model.

I have a similar use case where we have a few customer columns, like ltv and nr_orders, which reflect the value "as of today". I am not sure how to handle these in the windows that are created.
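
The usual guard against this kind of leakage is to build each feature only from rows timestamped at or before the label's cutoff time. The sketch below illustrates that idea in plain Python. The record layout, the dates, and features_as_of are hypothetical, and it makes no claim about how the notebook or Featuretools computes its features.

```python
from datetime import date

# Hypothetical daily listening logs for one customer: (date, num_25, num_100)
logs = [
    (date(2017, 1, 5), 3, 10),
    (date(2017, 1, 20), 1, 7),
    (date(2017, 2, 2), 4, 2),  # falls after the cutoff used below
]


def features_as_of(logs, cutoff):
    """Aggregate only the rows observed on or before `cutoff`."""
    visible = [row for row in logs if row[0] <= cutoff]
    return {
        "sum_num_25": sum(r[1] for r in visible),
        "sum_num_100": sum(r[2] for r in visible),
        "n_rows": len(visible),
    }


features = features_as_of(logs, cutoff=date(2017, 1, 31))
print(features)
```

A column such as ltv that stores only its value "as of today" has no timestamp to filter on, so this approach would need a dated history of that column, which is an assumption about what data is available.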
