Git Product home page Git Product logo

top_revenue_film_genres_via_sakila's Introduction

Sakila Queries:

Goal:
A set of queries that will demonstrate the capabilities of
mySQL for retrieving valuable information from a set of 
multiple, interrelated data sources in a scalable and 
reproducible way for Sakila.

Rationale:
Why This?
To quote Matt Velloso, technical advisor to the CEO
at Microsoft: "Half of the time when companies say 
they need "AI" what they really need is a SELECT clause with 
GROUP BY." 
Why Me?
The data executive is best equipped to create a reproducible
and scalable system that provides key information directly to
executives, who are in the best position to take effective
action.
Why Now?
The database has recently become available, further delay
will cost significant opportunities to make use of the data.
Because minimal time is required to set up the reporting
system, delay will not enable effective pursuit of other
priorities.

 General Approach:
 1.  Demonstrate general queries
 2.  Demonstrate constrained queries
 3.  Demonstrate ability to update table architecture
 4.  Demonstrate ability to aggregate
 5.  Demonstrate ability to recover and reproduce schemas
 6.  Demonstrate queries from multiple sources
 7.  Demonstrate sub-queries 
 8.  Demonstrate production of a scalable tool to routinely
     and automatically generate a report of interest.
	 
Design notes:
The attached .sql file contains comments describing each
of the demonstrations.  The first seven are straightforward.
The final demonstration query combines data from several
tables to provide the top-five film genres by revenue.
This result is meant to be a production item.  Because the
current intent is to use it every day, it needs to be more
robust than normal.  In particular, as a ranked list, it
needs to explictly include the rank (handling ties correctly)
and contain "pretty" formatting of the dollar values.  
Although in most cases, these items could be implicit,
in this case we want to make sure than even an occasional 
mistake such as mis-reading a dollar amount or mis-counting
rows to generate a rank doesn't get "cut and pasted" into
an executive-level public presentation and create an issue.

The SQL query for the top-5 table (pseudocode) is as follows:

SELECT 
    rank AS 'Revenue Rank',
    film_cat AS 'Genre',
    CONCAT('$',FORMAT(revenue,2)) AS 'Revenue from Genre'
FROM
    ranking_table 
WHERE
    rank <= 5;

This gives nice column names and a proper format.  Rank
is a column that will be created using ranking_table.  
Ranking_table will handle the logic for ties and gaps.  
Using WHERE rather than LIMIT lets us display more than 5
values if there is a tie for rank #5.  

The ranking_table is as follows:

SELECT 
    film_cat,
    revenue,
    IF(revenue = @last_amt, @cur_rank:=@cur_rank,
		@cur_rank:=@seq) AS rank,
    @seq:=@seq + 1,
    @last_amt:=revenue
FROM
    ranked_revenue 
	
In this table we create five columns, the film_cat and
revenue to pass to the outer query, plus the final rank to
pass to the outer query, along with a sequence, and a 
place-holder to store temporary variables we need.  The
variable @seq just increments by 1 for each row and is 
otherwise untouched.  We use it to get "back on track"
with a gap when there's a tie.  The variable @last_amt
is just a tracker that repeats whatever is in revenue.  On
the next row, if there's a tie, the new revenue will equal it.
The column 'rank' just impliments the tie detector with an
IF function.  The IF function always sets the variable 
@cur_rank to the rank value.  @Cur_rank either is not increased
when the tie detector returns TRUE, else it defaults back to
@seq when the TIE detector is false.   

The selecting is done from a table called ranked_revenue, 
where all the appropriate joining, grouping, and sorting is
done.  The variables also will be iniitalized within this
table so that a single query does all the work.

The table ranked_revenue is generated by:

SELECT 
    category.`name` AS film_cat, SUM(amount) AS revenue
FROM
    payment
    JOIN rental ON (payment.rental_id = rental.rental_id)
    JOIN inventory ON (rental.inventory_id = inventory.inventory_id)
    JOIN film ON (inventory.film_id = film.film_id)
    JOIN film_category ON (film.film_id = film_category.film_id)
    JOIN category ON (film_category.category_id = category.category_id)
    JOIN (SELECT @cur_rank:=1, @seq:=1, @last_amt:=NULL) rank_init
GROUP BY category.`name`
ORDER BY SUM(amount) DESC

In this table, the columns film_cat and revenue are created
to pass along to the outer query, the joining of multiple 
tables is accomplished, as well as the appropriate grouping
and sorting.  A final cross JOIN is used with a set of variables
to initialize their values.  Given how many other JOINs 
we have going on between tables, plus the grouping, 
aggregation, and sorting, three more cross JOINs with
simple fixed values are not going to hurt the query time.

top_revenue_film_genres_via_sakila's People

Contributors

andrewguenthner avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.