andrewguenthner / top_revenue_film_genres_via_sakila Goto Github PK
View Code? Open in Web Editor NEWTests the sakila database in mysql, then builds a robust query to track which film genres generate the most revenue
License: MIT License
Tests the sakila database in mysql, then builds a robust query to track which film genres generate the most revenue
License: MIT License
Sakila Queries: Goal: A set of queries that will demonstrate the capabilities of mySQL for retrieving valuable information from a set of multiple, interrelated data sources in a scalable and reproducible way for Sakila. Rationale: Why This? To quote Matt Velloso, technical advisor to the CEO at Microsoft: "Half of the time when companies say they need "AI" what they really need is a SELECT clause with GROUP BY." Why Me? The data executive is best equipped to create a reproducible and scalable system that provides key information directly to executives, who are in the best position to take effective action. Why Now? The database has recently become available, further delay will cost significant opportunities to make use of the data. Because minimal time is required to set up the reporting system, delay will not enable effective pursuit of other priorities. General Approach: 1. Demonstrate general queries 2. Demonstrate constrained queries 3. Demonstrate ability to update table architecture 4. Demonstrate ability to aggregate 5. Demonstrate ability to recover and reproduce schemas 6. Demonstrate queries from multiple sources 7. Demonstrate sub-queries 8. Demonstrate production of a scalable tool to routinely and automatically generate a report of interest. Design notes: The attached .sql file contains comments describing each of the demonstrations. The first seven are straightforward. The final demonstration query combines data from several tables to provide the top-five film genres by revenue. This result is meant to be a production item. Because the current intent is to use it every day, it needs to be more robust than normal. In particular, as a ranked list, it needs to explictly include the rank (handling ties correctly) and contain "pretty" formatting of the dollar values. Although in most cases, these items could be implicit, in this case we want to make sure than even an occasional mistake such as mis-reading a dollar amount or mis-counting rows to generate a rank doesn't get "cut and pasted" into an executive-level public presentation and create an issue. The SQL query for the top-5 table (pseudocode) is as follows: SELECT rank AS 'Revenue Rank', film_cat AS 'Genre', CONCAT('$',FORMAT(revenue,2)) AS 'Revenue from Genre' FROM ranking_table WHERE rank <= 5; This gives nice column names and a proper format. Rank is a column that will be created using ranking_table. Ranking_table will handle the logic for ties and gaps. Using WHERE rather than LIMIT lets us display more than 5 values if there is a tie for rank #5. The ranking_table is as follows: SELECT film_cat, revenue, IF(revenue = @last_amt, @cur_rank:=@cur_rank, @cur_rank:=@seq) AS rank, @seq:=@seq + 1, @last_amt:=revenue FROM ranked_revenue In this table we create five columns, the film_cat and revenue to pass to the outer query, plus the final rank to pass to the outer query, along with a sequence, and a place-holder to store temporary variables we need. The variable @seq just increments by 1 for each row and is otherwise untouched. We use it to get "back on track" with a gap when there's a tie. The variable @last_amt is just a tracker that repeats whatever is in revenue. On the next row, if there's a tie, the new revenue will equal it. The column 'rank' just impliments the tie detector with an IF function. The IF function always sets the variable @cur_rank to the rank value. @Cur_rank either is not increased when the tie detector returns TRUE, else it defaults back to @seq when the TIE detector is false. The selecting is done from a table called ranked_revenue, where all the appropriate joining, grouping, and sorting is done. The variables also will be iniitalized within this table so that a single query does all the work. The table ranked_revenue is generated by: SELECT category.`name` AS film_cat, SUM(amount) AS revenue FROM payment JOIN rental ON (payment.rental_id = rental.rental_id) JOIN inventory ON (rental.inventory_id = inventory.inventory_id) JOIN film ON (inventory.film_id = film.film_id) JOIN film_category ON (film.film_id = film_category.film_id) JOIN category ON (film_category.category_id = category.category_id) JOIN (SELECT @cur_rank:=1, @seq:=1, @last_amt:=NULL) rank_init GROUP BY category.`name` ORDER BY SUM(amount) DESC In this table, the columns film_cat and revenue are created to pass along to the outer query, the joining of multiple tables is accomplished, as well as the appropriate grouping and sorting. A final cross JOIN is used with a set of variables to initialize their values. Given how many other JOINs we have going on between tables, plus the grouping, aggregation, and sorting, three more cross JOINs with simple fixed values are not going to hurt the query time.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.