EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling, created by Matt Clarke.
License: MIT License
EcommerceTools
EcommerceTools is a data science toolkit for those working in ecommerce, marketing science, and technical SEO, and includes a wide range of features to aid analysis and model building. The package is written in Python, is designed to be used with Pandas, and works within a Jupyter notebook environment or in standalone Python projects.
Installation
You can install EcommerceTools and its dependencies via PyPI by entering pip3 install ecommercetools in your terminal, or !pip3 install ecommercetools within a Jupyter notebook cell.
Transactions
Load sample transaction items data
If you want to get started with the transactions, products, and customers features, you can use the load_sample_data() function to load a set of real-world data. This imports the transaction items from the widely-used Online Retail dataset and reformats it ready for use by EcommerceTools.
The utilities module includes a range of tools that allow you to format data, so it can be used within other EcommerceTools functions. The load_data() function is used to create a Pandas dataframe of formatted transactional item data. When loading your transaction items data, all you need to do is define the column mappings, and the function will reformat the dataframe accordingly.
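As an illustration of the idea (the exact load_data() signature may differ, and the column names below are assumptions), mapping a raw export's columns to the standardised names used downstream is essentially a Pandas rename:

```python
import pandas as pd

# Hypothetical raw export with non-standard column names
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536366"],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:28"],
    "CustomerID": [17850, 17850],
    "Quantity": [6, 2],
    "UnitPrice": [2.55, 1.85],
    "StockCode": ["85123A", "71053"],
})

# Define the column mappings from the source names to standard names
column_mappings = {
    "InvoiceNo": "order_id",
    "InvoiceDate": "order_date",
    "CustomerID": "customer_id",
    "Quantity": "quantity",
    "UnitPrice": "unit_price",
    "StockCode": "sku",
}

transaction_items = raw.rename(columns=column_mappings)
transaction_items["order_date"] = pd.to_datetime(transaction_items["order_date"])
transaction_items["line_price"] = transaction_items["quantity"] * transaction_items["unit_price"]
```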
The get_transactions() function takes the formatted Pandas dataframe of transaction items and returns a Pandas dataframe of aggregated transaction data, which includes features identifying the order number.
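Conceptually, this aggregation groups the line items by order; a minimal sketch (not the package's internals, with assumed column names) looks like this:

```python
import pandas as pd

# Hypothetical formatted transaction items (one row per line item)
items = pd.DataFrame({
    "order_id": ["1001", "1001", "1002"],
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-01-09"]),
    "sku": ["A", "B", "A"],
    "quantity": [2, 1, 3],
    "line_price": [10.0, 5.0, 15.0],
})

# Aggregate to one row per order: distinct SKUs, total items, and revenue
transactions = (
    items.groupby(["order_id", "customer_id", "order_date"], as_index=False)
    .agg(skus=("sku", "nunique"),
         items=("quantity", "sum"),
         revenue=("line_price", "sum"))
)
```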
The RFMH model is an extension of the regular Recency, Frequency, Monetary value (RFM) model that includes an additional parameter, "H", for heterogeneity, which shows the number of unique SKUs purchased by each customer. While typically unassociated with targeting, this value can be very useful for identifying customers who should probably be buying a broader mix of products than they currently are, as well as for spotting those who may have stopped buying certain items.
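A rough sketch of how these four values can be computed with Pandas (illustrative only, not the package's implementation; the column names are assumptions):

```python
import pandas as pd

# Hypothetical transaction items (one row per line item)
items = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "order_id": ["1001", "1001", "1003", "1002"],
    "order_date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-02-01", "2021-01-09"]),
    "sku": ["A", "B", "A", "C"],
    "line_price": [10.0, 5.0, 20.0, 15.0],
})

observation_end = pd.Timestamp("2021-03-01")

# R = days since last order, F = number of orders,
# M = total spend, H = number of distinct SKUs purchased
rfmh = items.groupby("customer_id").agg(
    recency=("order_date", lambda d: (observation_end - d.max()).days),
    frequency=("order_id", "nunique"),
    monetary=("line_price", "sum"),
    heterogeneity=("sku", "nunique"),
)
```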
EcommerceTools allows you to predict the AOV, Customer Lifetime Value (CLV) and expected number of orders via the Gamma-Gamma and BG/NBD models from the excellent Lifetimes package. By passing the dataframe of transactions from get_transactions() to the get_customer_predictions() function, EcommerceTools will fit the BG/NBD and Gamma-Gamma models and predict the AOV, order quantity, and CLV for each customer in the defined number of future days after the end of the observation period.
The generate_spintax() function of the advertising module expands a spintax string into its unique combinations:

```python
from ecommercetools import advertising

text = "Fly Reels from {Orvis|Loop|Sage|Airflo|Nautilus} for {trout|salmon|grayling|pike}"
spin = advertising.generate_spintax(text, single=False)
spin
```

```
['Fly Reels from Orvis for trout',
 'Fly Reels from Orvis for salmon',
 'Fly Reels from Orvis for grayling',
 'Fly Reels from Orvis for pike',
 'Fly Reels from Loop for trout',
 'Fly Reels from Loop for salmon',
 'Fly Reels from Loop for grayling',
 'Fly Reels from Loop for pike',
 'Fly Reels from Sage for trout',
 'Fly Reels from Sage for salmon',
 'Fly Reels from Sage for grayling',
 'Fly Reels from Sage for pike',
 'Fly Reels from Airflo for trout',
 'Fly Reels from Airflo for salmon',
 'Fly Reels from Airflo for grayling',
 'Fly Reels from Airflo for pike',
 'Fly Reels from Nautilus for trout',
 'Fly Reels from Nautilus for salmon',
 'Fly Reels from Nautilus for grayling',
 'Fly Reels from Nautilus for pike']
```
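This kind of expansion can be reproduced with the standard library alone; a minimal sketch (not the package's implementation):

```python
import re
from itertools import product

def expand_spintax(text):
    """Expand {a|b|c} groups into every unique combination."""
    # Split the string into fixed text and {option|option} groups
    parts = re.split(r"(\{[^{}]*\})", text)
    choices = [
        part[1:-1].split("|") if part.startswith("{") else [part]
        for part in parts
    ]
    # Take the Cartesian product of all groups and rejoin each combination
    return ["".join(combo) for combo in product(*choices)]

variants = expand_spintax(
    "Fly Reels from {Orvis|Loop|Sage|Airflo|Nautilus} for {trout|salmon|grayling|pike}"
)
```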
The get_sitemaps() function takes the location of a robots.txt file (always stored at the root of a domain), and returns the URLs of any XML sitemaps listed within.
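The parsing step amounts to reading the Sitemap: directives from the robots.txt content; a minimal sketch (network fetch omitted, not the package's implementation):

```python
def parse_sitemaps(robots_txt):
    """Return sitemap URLs declared via Sitemap: directives in robots.txt."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so URLs (https://...) stay intact
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps

# Example robots.txt content (hypothetical)
robots = """User-agent: *
Disallow: /checkout/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""
sitemaps = parse_sitemaps(robots)
```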
The get_dataframe() function allows you to download the URLs in an XML sitemap to a Pandas dataframe. If the sitemap contains child sitemaps, each of these will be retrieved. You can save the Pandas dataframe to CSV in the usual way.
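Sitemap XML follows the standard sitemaps.org schema, so the underlying idea can be sketched with the standard library and Pandas (illustrative only; child-sitemap recursion and fetching omitted):

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Example sitemap content (hypothetical URLs)
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2021-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>2021-02-01</lastmod></url>
</urlset>"""

# The sitemaps.org namespace must be given explicitly to find elements
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
rows = [
    {"loc": url.findtext("sm:loc", namespaces=ns),
     "lastmod": url.findtext("sm:lastmod", namespaces=ns)}
    for url in root.findall("sm:url", ns)
]
df = pd.DataFrame(rows)
```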
The get_core_web_vitals() function retrieves the Core Web Vitals metrics for a list of sites from the Google PageSpeed Insights API and returns results in a Pandas dataframe. The function requires a Google PageSpeed Insights API key.
The get_knowledge_graph() function returns the Google Knowledge Graph data for a given search term. This requires the use of a Google Knowledge Graph API key. By default, the function returns output in a Pandas dataframe, but you can pass the output="json" argument if you wish to receive the JSON data back.
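Under the hood, a request to the Knowledge Graph Search API endpoint is built from the search term and key; a sketch of the request construction (the network call itself is omitted, and YOUR_API_KEY is a placeholder):

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own key
endpoint = "https://kgsearch.googleapis.com/v1/entities:search"

# Build the query string for the entity search request
params = {"query": "ecommerce", "key": API_KEY, "limit": 10}
url = f"{endpoint}?{urlencode(params)}"
```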
The query_google_search_console() function runs a search query on the Google Search Console API and returns data in a Pandas dataframe. This function requires a JSON client secrets key with access to the Google Search Console API.
The get_indexed_pages() function uses the "site:" prefix to search Google for the number of pages "indexed". This is very approximate and may not be a perfect representation, but it's usually a good guide of site "size" in the absence of other data.
7. Get keyword suggestions from Google Autocomplete
The google_autocomplete() function returns a set of keyword suggestions from Google Autocomplete. The include_expanded=True argument allows you to expand the number of suggestions shown by appending prefixes and suffixes to the search terms.
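Conceptually, expansion means generating extra seed queries around the original term, each of which can then be sent to Autocomplete. A purely illustrative sketch (the real include_expanded behaviour and the prefixes used may differ):

```python
from string import ascii_lowercase

def expand_terms(term):
    """Generate expanded seed queries to broaden autocomplete coverage.

    Illustrative only: the prefixes and suffixes here are assumptions.
    """
    expansions = [term]
    # Append a-z suffixes to surface suggestions for each next letter
    expansions += [f"{term} {letter}" for letter in ascii_lowercase]
    # Prepend common modifier prefixes
    expansions += [f"{prefix} {term}" for prefix in ("best", "how to", "why")]
    return expansions

seeds = expand_terms("fly reels")
```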
The get_serps() function returns a Pandas dataframe containing the Google search engine results for a given search term. Note that this function is not suitable for large-scale scraping and currently includes no features to prevent it from being blocked.
The get_summaries() function of the nlp module takes a Pandas dataframe containing text and returns a machine-generated summary of the content using a Hugging Face Transformers pipeline via PyTorch. To use this feature, first load your Pandas dataframe and import the nlp module from ecommercetools.
Specify the name of the Pandas dataframe, the column containing the text you wish to summarise (e.g. product_description), and a column name in which to store the machine-generated summary. The min_length and max_length arguments control the length of the summary generated, while the do_sample argument controls whether the generated text is sampled, and therefore more varied and unique (do_sample=True), or produced deterministically and kept closer to the source text (do_sample=False).
Since the model used for text summarisation is very large (over 1.2 GB), this function will take some time to download and load the model. Once loaded, summaries are generated within a second or two per piece of text, but it is still advisable to try smaller volumes of data initially.