Git Product home page Git Product logo

dsc-1-05-12-one-to-many-and-many-to-many-joins's Introduction

One-to-Many and Many-to-Many Joins

Introduction

Previously, you've learned about the typical case where one joins on a primary or foreign key. In this section, you'll explore other types of joins using One-to-Many and Many-to-many relationships!

Objectives

You will be able to:

  • Explain why Join Tables are needed in Many-to-Many relationships

One-to-Many and Many-to-Many relationships

We've looked at a couple kinds of different join statements: left joins and inner joins. Both of these refer to the way in which we would like to define our joins based on the tables and their shared information. Another perspective on this is the number of matches between the tables based on our defined links with the keywords on or using.

We've investigated the typical case where one joins on a primary or foreign key. For example, when we join on customerID or employeeID, this value should be unique to that table. As such, our joins have been very similar to using a dictionary to find additional information associated with that record. In cases where there are multiple entries, in either table, for the field you are joining on, you will similarly be given multiple rows in your resulting view, one for each of these entries.

For example, let's say you have another table 'restaurants' that has many columns including name, city, and rating. If you were to join this table with the offices table using the shared city column, you might get some unexpected behavior. That is, in the office table, there is only one office per city. However, because there is apt to be more then one restaurant for each of these cities in our second table, we will get unique combinations of Offices and Restaurants from our join. If there are 513 restaurants for Boston in our restaurant table and 1 office for Boston, our joined table will have each of these 513 rows, one for each restaurant along with the one office.

If we had 2 offices for Boston, and 513 restaurants, our join would have 1026 rows for boston; 513 for each restuarant along with the first office and 513 for each restaurant with the second office. Three offices in Boston would similarly produce 1539 rows; one for each unique combination of restaurants and offices. This is where you should be particularly careful of many to many joins as the resulting set size can explode drastically potentially consuming vast amounts of memory and other resources.

Connecting to the Database

import sqlite3
import pandas as pd
conn = sqlite3.connect('data.sqlite', detect_types=sqlite3.PARSE_COLNAMES)
cur = conn.cursor()

Checking Sizes of Resulting Joins...

The original tables...

cur.execute('select * from offices;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 7
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
officeCode city phone addressLine1 addressLine2 state country postalCode territory
0 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA
1 2 Boston +1 215 837 0825 1550 Court Place Suite 102 MA USA 02107 NA
2 3 NYC +1 212 555 3000 523 East 53rd Street apt. 5A NY USA 10022 NA
3 4 Paris +33 14 723 4404 43 Rue Jouffroy D'abbans France 75017 EMEA
4 5 Tokyo +81 33 224 5000 4-1 Kioicho Chiyoda-Ku Japan 102-8578 Japan
cur.execute('select * from employees;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
employeeNumber lastName firstName extension email officeCode reportsTo jobTitle
0 1002 Murphy Diane x5800 [email protected] 1 President
1 1056 Patterson Mary x4611 [email protected] 1 1002 VP Sales
2 1076 Firrelli Jeff x9273 [email protected] 1 1002 VP Marketing
3 1088 Patterson William x4871 [email protected] 6 1056 Sales Manager (APAC)
4 1102 Bondur Gerard x5408 [email protected] 4 1056 Sale Manager (EMEA)

A One-to-One Join...

cur.execute('select * from offices join employees using(officeCode);')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
officeCode city phone addressLine1 addressLine2 state country postalCode territory employeeNumber lastName firstName extension email reportsTo jobTitle
0 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA 1002 Murphy Diane x5800 [email protected] President
1 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA 1056 Patterson Mary x4611 [email protected] 1002 VP Sales
2 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA 1076 Firrelli Jeff x9273 [email protected] 1002 VP Marketing
3 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA 1143 Bow Anthony x5428 [email protected] 1056 Sales Manager (NA)
4 1 San Francisco +1 650 219 4782 100 Market Street Suite 300 CA USA 94080 NA 1165 Jennings Leslie x3291 [email protected] 1143 Sales Rep

A One-to-Many Join

Here we join products with product lines. There are only a few product lines that will be matched to each product. As a result, the product line descriptions will be repeated in our resulting view.

cur.execute('select * from products;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 110
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
productCode productName productLine productScale productVendor productDescription quantityInStock buyPrice MSRP
0 S10_1678 1969 Harley Davidson Ultimate Chopper Motorcycles 1:10 Min Lin Diecast This replica features working kickstand, front... 7933 48.81 95.70
1 S10_1949 1952 Alpine Renault 1300 Classic Cars 1:10 Classic Metal Creations Turnable front wheels; steering function; deta... 7305 98.58 214.30
2 S10_2016 1996 Moto Guzzi 1100i Motorcycles 1:10 Highway 66 Mini Classics Official Moto Guzzi logos and insignias, saddl... 6625 68.99 118.94
3 S10_4698 2003 Harley-Davidson Eagle Drag Bike Motorcycles 1:10 Red Start Diecast Model features, official Harley Davidson logos... 5582 91.02 193.66
4 S10_4757 1972 Alfa Romeo GTA Classic Cars 1:10 Motor City Art Classics Features include: Turnable front wheels; steer... 3252 85.68 136.00
cur.execute('select * from productlines;')
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 7
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
productLine textDescription htmlDescription image
0 Classic Cars Attention car enthusiasts: Make your wildest c...
1 Motorcycles Our motorcycles are state of the art replicas ...
2 Planes Unique, diecast airplane and helicopter replic...
3 Ships The perfect holiday or anniversary gift for ex...
4 Trains Model trains are a rewarding hobby for enthusi...
cur.execute("""select * from products
                      join productlines
                      using(productLine);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 110
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
productCode productName productLine productScale productVendor productDescription quantityInStock buyPrice MSRP textDescription htmlDescription image
0 S10_1678 1969 Harley Davidson Ultimate Chopper Motorcycles 1:10 Min Lin Diecast This replica features working kickstand, front... 7933 48.81 95.70 Our motorcycles are state of the art replicas ...
1 S10_1949 1952 Alpine Renault 1300 Classic Cars 1:10 Classic Metal Creations Turnable front wheels; steering function; deta... 7305 98.58 214.30 Attention car enthusiasts: Make your wildest c...
2 S10_2016 1996 Moto Guzzi 1100i Motorcycles 1:10 Highway 66 Mini Classics Official Moto Guzzi logos and insignias, saddl... 6625 68.99 118.94 Our motorcycles are state of the art replicas ...
3 S10_4698 2003 Harley-Davidson Eagle Drag Bike Motorcycles 1:10 Red Start Diecast Model features, official Harley Davidson logos... 5582 91.02 193.66 Our motorcycles are state of the art replicas ...
4 S10_4757 1972 Alfa Romeo GTA Classic Cars 1:10 Motor City Art Classics Features include: Turnable front wheels; steer... 3252 85.68 136.00 Attention car enthusiasts: Make your wildest c...

A Many-to-Many Join

If we join the employees and offices table, we will have a view with repeat cities listed. (Recall this was 23 rows, one for each employee. Joining this with the customer table on the cities column could cause us to have a huge number of rows, one for each employee and customer combination for a given city.) In this particular example, our results are limited as this mock database is much smaller then is apt to happen in practice. However, it is important to conceptualize the potential impact of ill conceived joins, as severe load can be put on the database causing slow execution time, and potentially even tying up database resources for other analysts who may be using the system.

cur.execute("""select * from employees
                        join offices
                        using(officeCode)
                        join customers
                        using(city);""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 45
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
employeeNumber lastName firstName extension email officeCode reportsTo jobTitle city phone ... contactLastName contactFirstName phone addressLine1 addressLine2 state postalCode country salesRepEmployeeNumber creditLimit
0 1002 Murphy Diane x5800 [email protected] 1 President San Francisco +1 650 219 4782 ... Murphy Julie 6505555787 5557 North Pendale Street CA 94217 USA 1165 64600.00
1 1002 Murphy Diane x5800 [email protected] 1 President San Francisco +1 650 219 4782 ... Brown Julie 6505551386 7734 Strong St. CA 94217 USA 1165 105000.00
2 1056 Patterson Mary x4611 [email protected] 1 1002 VP Sales San Francisco +1 650 219 4782 ... Murphy Julie 6505555787 5557 North Pendale Street CA 94217 USA 1165 64600.00
3 1056 Patterson Mary x4611 [email protected] 1 1002 VP Sales San Francisco +1 650 219 4782 ... Brown Julie 6505551386 7734 Strong St. CA 94217 USA 1165 105000.00
4 1076 Firrelli Jeff x9273 [email protected] 1 1002 VP Marketing San Francisco +1 650 219 4782 ... Murphy Julie 6505555787 5557 North Pendale Street CA 94217 USA 1165 64600.00

5 rows ร— 28 columns

cur.execute("""select * from employees;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 23
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
employeeNumber lastName firstName extension email officeCode reportsTo jobTitle
0 1002 Murphy Diane x5800 [email protected] 1 President
1 1056 Patterson Mary x4611 [email protected] 1 1002 VP Sales
2 1076 Firrelli Jeff x9273 [email protected] 1 1002 VP Marketing
3 1088 Patterson William x4871 [email protected] 6 1056 Sales Manager (APAC)
4 1102 Bondur Gerard x5408 [email protected] 4 1056 Sale Manager (EMEA)
cur.execute("""select * from customers;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print('Number of results:', len(df))
df.head()
Number of results: 122
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
customerNumber customerName contactLastName contactFirstName phone addressLine1 addressLine2 city state postalCode country salesRepEmployeeNumber creditLimit
0 103 Atelier graphique Schmitt Carine 40.32.2555 54, rue Royale Nantes 44000 France 1370 21000.00
1 112 Signal Gift Stores King Jean 7025551838 8489 Strong St. Las Vegas NV 83030 USA 1166 71800.00
2 114 Australian Collectors, Co. Ferguson Peter 03 9520 4555 636 St Kilda Road Level 3 Melbourne Victoria 3004 Australia 1611 117300.00
3 119 La Rochelle Gifts Labrune Janine 40.67.8555 67, rue des Cinquante Otages Nantes 44000 France 1370 118200.00
4 121 Baane Mini Imports Bergulfsen Jonas 07-98 9555 Erling Skakkes gate 78 Stavern 4110 Norway 1504 81700.00

Summary

In this section, you expanded your Join knowledge to One-to-Many and Many-to-many Joins!

dsc-1-05-12-one-to-many-and-many-to-many-joins's People

Contributors

loredirick avatar mathymitchell avatar sik-flow avatar tkoar avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dsc-1-05-12-one-to-many-and-many-to-many-joins's Issues

One-to-one vs One-to-Many examples

I'm not sure if this is an issue or a lack of my understanding of SQL, but the example for one-to-one and one-to-many seem to be the same type of join.
In explaining the one-to-many joins it says,
"Here we join products with product lines. There are only a few product lines that will be matched to each product. As a result, the product line descriptions will be repeated in our resulting view."

But this applies to the previous example that is listed as a one-to-one join. I could write the exact same description of the that table.
"Here we are joining employees to offices. There are only a few offices that will be matched to each employee. As a result the office descriptions will be repeated in our result view."

Also the line "there are only a few product lines that will be matched to each product," Is slightly confusing. On my first read I thought it was saying that a product could be matched to multiple product lines. I think if you say something like, "Multiple products will map to the same product line." it might be clearer.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.