Description
Stripe is an online payment company that offers software and APIs for processing payments and business management.
This pipeline uses Stripe API and dlt to load data such as Customer, Subscription, Event and etc. to the database and to calculate the MRR and churn rate.
- name of the pipeline is "stripe_analytics"
- category and description of the pipeline: loads the relevant data from Stripe API and calculates some metrics.
Use Cases
- Pipeline provides automatic data collection using Stripe API and storing them in a database (DuckDB, Postgres, Redshift, BigQuery). You will be able to store information about customers, all their purchases, subscriptions, as well as information about your products and etc.
You can load data from any endpoint load_data(endpoints=("Customer", "Product"))
.
You can load data in Incremental mode load_incremental_endpoints(endpoints=("Event", "Invoice"))
.
- With the pipeline you can calculate the most important metrics (MRR and subscription churn rate) and store them in a database as a resource. Use
load_data_and_get_metrics()
.
Sources / Resources / Endpoints
We have two sources: stripe_source and incremental_stripe_source.
-
stripe_source:
Retrieves data from the Stripe API for the specified endpoints. For almost all endpoints, Stripe API responses do not provide key "updated", so in most cases we are forced to load the data in 'replace' mode. This source is suitable for all types of endpoints, including 'Events', 'Invoice' etc, but these endpoints can also be loaded in incremental mode (see source incremental_stripe_source).
-
incremental_stripe_source:
As Stripe API does not include the "updated" key in its responses, we are only able to perform incremental downloads from endpoints where all objects are uneditable. This source yields the resources with incremental loading based on "append" mode. You will load only the newest data without duplicating and without downloading a huge amount of data each time.
To calculate the metrics in load_data_and_get_metrics()
, we need endpoints Subscription and Event. All endpoints become resources automatically.
Resource Subscription:
- With Subscriptions, customers make recurring payments for access to a product. Subscriptions require you to retain more information about your customers than one-time purchases do because you need to charge customers in the future.
- Updated Endpoint. Object states can change over time, for example, customers can unsubscribe, so that the status of the subscription changes from active to canceled.
- To get a list of subscriptions from Stripe API we used method list. The
list
method returns a maximum of 100 elements. To get all the elements we iterate over the pages using the key "has_more".
- By default, method
list
returns a list of subscriptions that have not been canceled. Passing in a value of all
will return subscriptions of all statuses. In current pipeline we set status all
for subscriptions.
Resource Event:
- Events are the way of letting you know when something interesting happens in your account. When an interesting event occurs, stripe create a new Event object. For example, when a charge succeeds, stripe create a charge.succeeded event; and when an invoice payment attempt fails, stripe create an invoice.payment_failed event. Note that many API requests may cause multiple events to be created. For example, if you create a new subscription for a customer, you will receive both a customer.subscription.created event and a charge.succeeded event.
- Incremental Endpoint. Objects do not change over time. If a new event occurs, it is simply added to Event. For example, if the customer canceled the subscription, then a new event will appear with the type "customer.subscription.deleted".
- To get a list of events from Stripe API we used method list. List events, going back up to 30 days.
Resource Metrics:
- Stripe Billing Analytics Dashboard provides a summarized view of your account, which provides a lot of useful information like MRR, churn, and so on. Unfortunately, there is not an API for the Stripe Billing Analytics Dashboard. So, we can’t get the dashboard out to Python directly. But Stripe does provide some guidance on how they calculate those metrics.
- Metrics are saved to the database as a resource each time you run a function
from stripe_analytics import metrics_resource
- You can add your own metrics in this resource (see Customisation).
Metrics calculation
You can find some metrics implementation in the pipelines/stripe_analytics/metrics.py
script.
MRR
Monthly Recurring Revenue (MRR) can be thought of as the total amount of monthly revenue you can reliably expect to receive on a recurring basis. It is one of the most important metrics for a SAAS business to track, as it provides a forward looking measure of growth and predicted revenue. You can calculate the approximate MRR by summing the monthly-normalized amounts of all subscriptions from which payment is being collected at that time.
- First, we need to get all the subscription data. Here we set status="all" to get all subscription data including canceled subscription. We can also not include this argument to only get the “active” and “past_due” status since we don’t use the “canceled” status in this calculation.
- Then we get the information about the subscription plan, i.e., amount, and interval (yearly or monthly plan).
- If there is a discount on someone’s plan, we get the information about the discount.
- Next, we calculate the normalized monthly plan amount by normalizing the yearly amount and also applying the discount.
- Finally, MRR is calculated as the sum of the normalized monthly plan amount for those who have an “active” or “past_due” status.
This calculation is based on the Stripe article, and a blog post 1 and a blog post 2.
Churn rate
The churn rate is measured by the sum of churned subscribers in the past 30 days divided by the number of active subscribers as of 30 days ago, plus any new subscribers in those 30 days.
- First, we calculate the number of churned subscribers in the past 30 days. We can either use the event data or we can use the subscription data and see who canceled the subscription in the past 30 days.
- Second, we calculate the number of active or past-due subscriptions.
- Then we can calculate the churn rate based on these two numbers.
This calculation is based on the descriptions from this article.
Test account / test data
Stripe's test account doesn't provide any test data, therefore, you need to accumulate your own product data, or generate test data.
Customisations
There is a few additional features. With this pipeline you can also:
- Add other endpoints.
- Calculate other metrics based on data.
Custom endpoints
If you want to add another endpoint, just enumerate them in a tuple and pass to the function.
For example, you want to load data for the next endpoints: Price and Coupon. So run the function stripe_source()
with the argument endpoints=("Price", "Coupon")
:
pipeline = dlt.pipeline(
pipeline_name="stripe_analytics",
destination="duckdb",
dataset_name="stripe_data",
)
source = stripe_source(endpoints=("Price", "Coupon"))
load_info = pipeline.run(source)
print(load_info)
Full list of the Stripe API endpoints you can find here.
Most popular endpoints:
- Subscription
- Account
- Coupon
- Customer
- PaymentMethod
- Product
- Price
Possible Incremental endpoints:
- Event
- BalanceTransaction
- Invoice