In actuality, we can’t store a full 4 KB of data per leaf node due to the

Why 4 pages can search 500GB data about db_tutorial HOT 9 CLOSED

cstack commented on July 17, 2024

Why 4 pages can search 500GB data

from db_tutorial.

Comments (9)

cstack commented on July 17, 2024

When I say search here, I mean finding the row with a given primary key. Each node in the btree tells you which node to look at next for primary keys in a given range.

If you wanted to search for a row based on something other than the primary key, you would have to make an index (which I haven't covered yet in the tutorial). An index is represented as another btree that's sorted by a given column or columns instead of primary key, and each entry points to a row in the main btree instead of storing another copy of the row.
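To make the index idea concrete, here is a minimal sketch in Python (the tutorial itself is written in C and does not cover indexes yet). Everything named here is a hypothetical stand-in: `rows` plays the role of the main btree keyed by primary key, `username_index` plays the role of a second btree sorted by a column, and the column names are assumed for illustration.

```python
import bisect

# Stand-in for the main btree: primary key -> full row.
rows = {
    1: {"id": 1, "username": "alice", "email": "alice@example.com"},
    2: {"id": 2, "username": "bob", "email": "bob@example.com"},
    3: {"id": 3, "username": "carol", "email": "carol@example.com"},
}

# Stand-in for an index btree: entries sorted by the indexed column.
# Each entry holds only the column value and the primary key,
# not another copy of the row.
username_index = sorted((row["username"], pk) for pk, row in rows.items())


def find_by_username(username):
    """Look up rows by a non-primary column via the index."""
    # Binary search for the first entry with this username.
    i = bisect.bisect_left(username_index, (username,))
    matches = []
    while i < len(username_index) and username_index[i][0] == username:
        primary_key = username_index[i][1]
        # Follow the entry back to the row in the main store.
        matches.append(rows[primary_key])
        i += 1
    return matches


print(find_by_username("bob"))
```

The lookup costs two searches: one in the index to get the primary key, and one in the main structure to fetch the row.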

SongZhao commented on July 17, 2024

Correct me if I'm wrong: when searching, you don't load all of a node's content, just the node's index.

gvzhang commented on July 17, 2024

I know what you mean about the primary key. But I still can't figure out how 4 pages work out to 500 GB of data.

Could you list a calculation formula? @cstack

cstack commented on July 17, 2024

@gvzhang Sure, I list a table of calculations in the article https://cstack.github.io/db_tutorial/parts/part10.html

I'm looking at the maximum number of leaf nodes that a tree could have with three layers of internal nodes. That's our branching factor raised to the third power, or 511^3, or 133,432,831 leaf nodes. If each leaf node has a size of 4 kilobytes, the size of all leaf nodes would be about 550 gigabytes.

If we weren't using a tree, we would instead have to scan through every leaf node to find a given primary key. So by using a tree, it's as if we searched through 550 GB of data, but in actuality we only searched through 4 pages (16 kilobytes).
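The arithmetic above can be checked with a short script. This is a sketch in Python rather than the tutorial's C, and the constant and function names are made up for illustration; the figures (4 KB pages, a branching factor of 511, three layers of internal nodes) are the ones quoted in this comment.

```python
PAGE_SIZE = 4096        # bytes per page; one node occupies one page
BRANCHING_FACTOR = 511  # child pointers per internal node
INTERNAL_LAYERS = 3     # layers of internal nodes above the leaves


def max_leaf_nodes(branching_factor, internal_layers):
    """Each internal layer multiplies the node count by the branching factor."""
    return branching_factor ** internal_layers


def pages_visited(internal_layers):
    """One internal node per layer, plus the leaf at the bottom."""
    return internal_layers + 1


leaves = max_leaf_nodes(BRANCHING_FACTOR, INTERNAL_LAYERS)
total_bytes = leaves * PAGE_SIZE
visited_bytes = pages_visited(INTERNAL_LAYERS) * PAGE_SIZE

print(leaves)              # 133432831 leaf nodes
print(total_bytes)         # 546540875776 bytes, about 550 GB
print(total_bytes / 2**30) # about 509 when measured in GiB
print(visited_bytes)       # 16384 bytes, i.e. 4 pages of 4 KB
```

Depending on whether a gigabyte is taken as 10^9 or 2^30 bytes, the total comes out near 547 GB or 509 GiB, which is why the figure is quoted loosely as 500 to 550 GB.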

gvzhang commented on July 17, 2024

Does one page mean one internal node in part 10?

cstack commented on July 17, 2024

@gvzhang One node takes up one page. That's true for both internal nodes and leaf nodes.

gvzhang commented on July 17, 2024

One page holds 511 child pointers to leaf nodes. So four pages can point to 511 * 4 = 2044 leaf nodes.
And each leaf node has a size of 4 KB, so 2044 * 4 KB = 8 MB. So four pages can search 8 MB of data.

Which part is wrong? Thanks a lot.

cch123 commented on July 17, 2024

@gvzhang

        [    ] 1
      /      \         \
  [    ] 2   [     ]  ... [   ]
    /            \    ....
 [    ] 3     [     ]
   /             /
[    ] 4  ...  [      ] .....     [   ]

not
[ ] 1[ ]2 [ ]3 [ ]4

cstack commented on July 17, 2024

That's right. Each layer of the tree has 511 times as many nodes as the previous layer, and we traverse four layers by visiting four nodes.
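A small sketch can show why the node count multiplies per layer while a lookup still touches only one node per layer. It is written in Python for brevity (the tutorial is in C), and it models a hypothetical, completely full tree in which leaves are simply numbered left to right; real btree nodes compare keys instead, so treat the function names and the numbering scheme as assumptions.

```python
BRANCHING_FACTOR = 511
INTERNAL_LAYERS = 3


def nodes_per_layer(branching_factor, internal_layers):
    """Node count in each layer, root first, leaves last."""
    return [branching_factor ** depth for depth in range(internal_layers + 1)]


def path_to_leaf(leaf_index, branching_factor, internal_layers):
    """Child slot followed at each internal node to reach a given leaf."""
    if not 0 <= leaf_index < branching_factor ** internal_layers:
        raise ValueError("leaf index out of range")
    slots = []
    for depth in range(internal_layers):
        # Number of leaves reachable through each child at this depth.
        span = branching_factor ** (internal_layers - depth - 1)
        slots.append(leaf_index // span)
        leaf_index %= span
    return slots


layers = nodes_per_layer(BRANCHING_FACTOR, INTERNAL_LAYERS)
print(layers)  # [1, 511, 261121, 133432831]

path = path_to_leaf(133432830, BRANCHING_FACTOR, INTERNAL_LAYERS)
print(path)           # [510, 510, 510]
print(len(path) + 1)  # 4 pages read: three internal nodes plus the leaf
```

Multiplying 511 by 4 counts four sibling pages that each point directly at leaves. The tree instead stacks the pages vertically, so the reachable leaves grow as 511 * 511 * 511 while the pages read grow by only one per layer.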
