
Comments (3)

alamb commented on August 26, 2024

@samuelcolvin can you share the query plan for this query? Specifically, what is the output of the following?

explain select span_name from records order by bit_length(attributes) desc limit 20

I also would not expect it to consume 20 GB of memory.

from arrow-datafusion.

samuelcolvin commented on August 26, 2024

Sorry for the delay, here we go:

logical_plan

 Projection: records_store.span_name
  Limit: skip=0, fetch=20
    Sort: bit_length(records_store.attributes) DESC NULLS FIRST, fetch=20
      TableScan: records_store projection=[span_name, attributes]

physical_plan

 ProjectionExec: expr=[span_name@0 as span_name]
  GlobalLimitExec: skip=0, fetch=20
    SortPreservingMergeExec: [bit_length(attributes@1) DESC], fetch=20
      SortExec: TopK(fetch=20), expr=[bit_length(attributes@1) DESC], preserve_partitioning=[true]
        ParquetExec: file_groups={14 groups: [[Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_2139d404-1e6c-4903-90f1-775727315226.parquet:0..4619991, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_400_0001_41744ee3-e741-43cf-ab17-6b986d785a58.parquet:0..4631041, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_91993b2d-9555-4d85-8dd9-71ad352a7066.parquet:0..16945773, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_401_0001_f611b930-13d2-4311-acf2-489585ca7e2a.parquet:0..16920185, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_402_0001_3f208d74-4b67-44ec-aa4a-1ed30d881b7f.parquet:0..18130069, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_403_0001_780473ad-7867-4bcf-b63d-a52460f466cf.parquet:16513148..17675405, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_404_0001_d11bf279-599e-4e2c-9a60-14d13045c8dd.parquet:0..16597007, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_405_0001_456e9ee4-35d7-4483-813e-73c29d8023d2.parquet:0..15207780, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_406_0001_5becfa29-66d8-44ff-b61c-b82987f3d060.parquet:0..16885514, 
Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_407_0001_1a81e83c-a226-4f29-b00b-54f849eb46cd.parquet:0..15583994, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_408_0001_ec8d95f7-1464-4adf-a975-6857937592e1.parquet:12323655..15398417, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_409_0001_f27ecf09-e64b-4fc3-9cb9-cdd53b2200a6.parquet:0..15601539, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_410_0001_bd93a8e8-dd6f-48ae-8622-fd439f9d27f9.parquet:0..15515518, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_411_0001_5ec01d68-8a3a-4d21-b0df-3d507bea2d1c.parquet:0..15347993, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_412_0001_3611c025-d788-4d26-b77d-424e421fcd98.parquet:0..16151170, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_413_0001_17decbd1-13d0-4a01-9213-8fee2ba46b7b.parquet:12069225..16928575, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_414_0001_d7924295-f735-4b34-af7f-3975ad3f91b6.parquet:0..16952022, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_415_0001_7a7e0d3e-de3e-462b-9fb5-264277a2c74b.parquet:0..16826232, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_416_0001_da93e867-a912-454a-a758-c9e27dcf74df.parquet:0..17230023, 
Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_417_0001_577d3fa8-ea7d-471f-a33a-0d65faa54902.parquet:0..17533201, ...], [Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_418_0001_b5f013f2-d3ed-4f98-bb9c-cb58b059b5fc.parquet:4359379..17560075, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_419_0001_22e4dd59-1639-4c86-980b-12745c86aef6.parquet:0..17509164, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_420_0001_24661f8e-bef4-4efd-84fb-55f531e8a495.parquet:0..18239445, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_421_0001_75401751-5740-4086-a2ec-0e9c23bdc9a7.parquet:0..18045516, Users/samuel/code/pydantic-data-platform/src/services/fusionfire/object_store/{org}/records/project_id={col}/day=2024-05-13/data_422_0001_2b55263e-f63b-402c-8024-9e85fc8c3d03.parquet:0..10765386], ...]}, projection=[span_name, attributes]


alamb commented on August 26, 2024

🤔 that certainly seems like it is doing a TopK across 14 partitions, so I would expect it to hold at most 20 * 14 batches

  • 20 is the k
  • 14 is the number of file groups (aka partitions)
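
To make the 20 * 14 bound concrete, here is a minimal Python sketch of the shape the physical plan suggests: a per-partition top-k (SortExec with TopK) followed by a merge that keeps k rows (SortPreservingMergeExec). This is not DataFusion's implementation; the helper names and the synthetic sort keys are hypothetical, and the batch bound assumes the worst case where every retained row pins a different input batch.

```python
import heapq

K = 20           # fetch=20 in the plan
PARTITIONS = 14  # 14 file groups in ParquetExec


def partition_top_k(keys, k):
    """Keep the k largest sort keys of one partition, in descending order."""
    return heapq.nlargest(k, keys)


def merge_top_k(per_partition, k):
    """Merge descending per-partition results and keep the first k keys."""
    merged = heapq.merge(*per_partition, reverse=True)
    return [key for _, key in zip(range(k), merged)]


# Hypothetical data: each row is represented only by its sort key,
# standing in for bit_length(attributes).
partitions = [
    [(p * 1000 + i * 7) % 977 for i in range(500)] for p in range(PARTITIONS)
]

tops = [partition_top_k(keys, K) for keys in partitions]
retained_rows = sum(len(t) for t in tops)  # rows alive before the merge
result = merge_top_k(tops, K)

# Assumed worst case: each retained row references a distinct batch.
max_batches = K * PARTITIONS
```

Under these assumptions only 280 rows (and at most 280 batches) need to stay alive at once, which is why memory use in the tens of gigabytes looks unexpected for this plan.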

🤔
