Comments (12)

dosubot commented on September 13, 2024

To address the issue of the query history tab taking a long time to retrieve logs, especially with over 30K rows, you can implement several optimizations:

  1. Pagination and Lazy Loading:

    • Ensure that the query history tab uses pagination and lazy loading to fetch data incrementally as the user scrolls. This is already implemented in Superset 4.0.2 using the useInView hook and pageIndex state to load more data when the user reaches the bottom of the page [1].
  2. Indexes on the Query Table:

    • Verify that the necessary indexes are present on the query table. It should have indexes on columns like results_key, user_id, changed_on, and sql_editor_id to optimize query performance [2][3][4]; a quick verification sketch is included at the end of this comment.
  3. Efficient Data Fetching:

    • The LogDAO class in superset/daos/log.py includes a method get_recent_activity that uses pagination and limits data to one year, which can help in optimizing data retrieval [5].
  4. Frontend Optimizations:

    • The QueryHistory component in the frontend uses the useEditorQueriesQuery hook to fetch query data and combines it with the Redux store data. It also uses the useEffect hook to load the next page of queries when the bottom is reached [1].

By ensuring these optimizations are in place, you can significantly reduce the time it takes to retrieve logs in the query history tab.
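
As a quick way to do the verification in item 2, here is a minimal sketch using SQLAlchemy's inspector; the connection URL is a placeholder, and the index names are the ones from the Superset migrations cited above:

from sqlalchemy import create_engine, inspect

# Placeholder URL: point this at your Superset metadata database.
engine = create_engine("postgresql://superset:superset@localhost:5432/superset")

inspector = inspect(engine)
existing = {idx["name"] for idx in inspector.get_indexes("query")}

# Index names used by the Superset migrations referenced above.
expected = {"ix_query_results_key", "ti_user_id_changed_on", "ix_query_sql_editor_id"}

for name in sorted(expected - existing):
    print(f"missing index on query table: {name}")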

dosubot commented on September 13, 2024

The query table needs the following indexes to optimize performance:

  1. Index on the results_key column:

    op.create_index(
        op.f("ix_query_results_key"), "query", ["results_key"], unique=False
    )
  2. Index on the combination of user_id and changed_on columns:

    op.create_index(
        op.f("ti_user_id_changed_on"), "query", ["user_id", "changed_on"], unique=False
    )
  3. Index on the sql_editor_id column:

    op.create_index(
        op.f("ix_query_sql_editor_id"), "query", ["sql_editor_id"], unique=False
    )

These indexes should help improve the performance of queries on the query table [1][2][3].
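
If any of these are missing from an older metadata database, they can be added with a standalone Alembic-style migration. A minimal sketch is below; the revision identifiers are placeholders, not one of Superset's shipped migrations:

from alembic import op

# Placeholder revision identifiers -- not a real Superset migration.
revision = "add_query_history_indexes"
down_revision = None


def upgrade():
    op.create_index(
        op.f("ix_query_results_key"), "query", ["results_key"], unique=False
    )
    op.create_index(
        op.f("ti_user_id_changed_on"), "query", ["user_id", "changed_on"], unique=False
    )
    op.create_index(
        op.f("ix_query_sql_editor_id"), "query", ["sql_editor_id"], unique=False
    )


def downgrade():
    op.drop_index(op.f("ix_query_sql_editor_id"), table_name="query")
    op.drop_index(op.f("ti_user_id_changed_on"), table_name="query")
    op.drop_index(op.f("ix_query_results_key"), table_name="query")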

dosubot commented on September 13, 2024

Yes, low memory could be affecting query log fetching. The process of fetching query results involves several steps, including executing the query, fetching data, and possibly storing results in a backend. If the system is low on memory, it could impact these operations, especially when handling large datasets or multiple concurrent queries. For example, the execute_sql_statement function fetches data and processes it, which could be memory-intensive. Additionally, the execute_sql_statements function handles multiple statements and stores results, which also requires sufficient memory.

Furthermore, the script docker/frontend-mem-nag.sh in the Superset repository checks for at least 3GB of free memory and issues a warning if the available memory is below this threshold. This indicates that the system's memory availability is a known concern and can affect the performance of various operations, including query log fetching [1][2].
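
If you want to reproduce that check from Python rather than the shell script, a minimal sketch using psutil (an assumption here; Superset does not ship this snippet) could look like:

import psutil

# Mirror the roughly 3GB free-memory threshold that docker/frontend-mem-nag.sh warns about.
MIN_AVAILABLE_BYTES = 3 * 1024 ** 3

available = psutil.virtual_memory().available
if available < MIN_AVAILABLE_BYTES:
    print(
        f"Warning: only {available / 1024 ** 3:.1f} GB of memory available; "
        "fetching large query histories may be slow or fail."
    )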

dosubot commented on September 13, 2024

To prevent Superset's Gunicorn from consuming all the server memory and causing it to crash, you can configure Gunicorn with specific settings to manage memory usage more effectively. Here is an example configuration that you can use, based on the docker/run-server.sh script from the Superset repository:

gunicorn \
    --bind "${SUPERSET_BIND_ADDRESS:-0.0.0.0}:${SUPERSET_PORT:-8088}" \
    --access-logfile "${ACCESS_LOG_FILE:-"-"}" \
    --error-logfile "${ERROR_LOG_FILE:-"-"}" \
    --workers ${SERVER_WORKER_AMOUNT:-10} \
    --worker-class ${SERVER_WORKER_CLASS:-gevent} \
    --worker-connections ${WORKER_CONNECTIONS:-1000} \
    --threads ${SERVER_THREADS_AMOUNT:-20} \
    --log-level "${GUNICORN_LOGLEVEL:-info}" \
    --timeout ${GUNICORN_TIMEOUT:-120} \
    --keep-alive ${GUNICORN_KEEPALIVE:-2} \
    --max-requests ${WORKER_MAX_REQUESTS:-1000} \
    --max-requests-jitter ${WORKER_MAX_REQUESTS_JITTER:-50} \
    --limit-request-line ${SERVER_LIMIT_REQUEST_LINE:-0} \
    --limit-request-field_size ${SERVER_LIMIT_REQUEST_FIELD_SIZE:-0} \
    "${FLASK_APP}"

In this configuration:

  • --workers 10 sets the number of worker processes.
  • --worker-class gevent uses the gevent worker class for asynchronous workers.
  • --worker-connections 1000 sets the maximum number of simultaneous clients.
  • --max-requests 1000 ensures that each worker will be restarted after handling 1000 requests, which helps in freeing up memory.
  • --max-requests-jitter 50 adds a random jitter to the max requests setting to prevent all workers from restarting at the same time.
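
Equivalently, if you prefer a config file over CLI flags, Gunicorn also reads a Python configuration module (started with gunicorn -c gunicorn_conf.py "superset.app:create_app()"). The sketch below simply mirrors the flags above; the file name is an arbitrary choice:

# gunicorn_conf.py -- mirrors the CLI flags shown above.
bind = "0.0.0.0:8088"
workers = 10
worker_class = "gevent"
worker_connections = 1000
threads = 20
timeout = 120
keepalive = 2
loglevel = "info"
# Recycle workers periodically so leaked memory is returned to the OS.
max_requests = 1000
max_requests_jitter = 50
limit_request_line = 0
limit_request_field_size = 0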

Additionally, you can configure the SUPERSET_WEBSERVER_TIMEOUT setting in the superset/config.py file to ensure that the web server timeout is properly managed. This setting should be lower than your load balancer or proxy timeout settings:

from datetime import timedelta

SUPERSET_WEBSERVER_TIMEOUT = int(timedelta(minutes=1).total_seconds())

These settings can help in managing memory usage and preventing crashes due to memory exhaustion [1][2][3].

dosubot commented on September 13, 2024

The Superset documentation or codebase does not provide specific configuration options for setting a global memory limit for Gunicorn. However, you can manage memory usage indirectly by configuring Gunicorn's worker settings, such as --max-requests and --max-requests-jitter, to help prevent memory leaks and excessive memory consumption.

For more direct memory management, you might need to use external tools or system-level configurations. For example, you can use cgroups (control groups) on Linux to limit the memory usage of the Gunicorn process. Here is an example of how you can use cgroups to limit memory:

  1. Create a cgroup for Gunicorn:

    sudo cgcreate -g memory:/gunicorn
  2. Set a memory limit for the cgroup (e.g., 1GB):

    echo 1G | sudo tee /sys/fs/cgroup/memory/gunicorn/memory.limit_in_bytes
  3. Start Gunicorn within the cgroup:

    sudo cgexec -g memory:gunicorn gunicorn \
        --bind "0.0.0.0:8088" \
        --access-logfile "-" \
        --error-logfile "-" \
        --workers 10 \
        --worker-class gevent \
        --worker-connections 1000 \
        --threads 20 \
        --log-level "info" \
        --timeout 120 \
        --keep-alive 2 \
        --max-requests 1000 \
        --max-requests-jitter 50 \
        --limit-request-line 0 \
        --limit-request-field_size 0 \
        "superset.app:create_app()"

This approach ensures that the Gunicorn process cannot exceed the specified memory limit, helping to prevent crashes due to memory exhaustion.

dosubot commented on September 13, 2024

You do not need an additional index specifically for user filtration on the query table. The existing migration script already creates an index on the user_id and changed_on columns:

op.create_index(
    op.f("ti_user_id_changed_on"), "query", ["user_id", "changed_on"], unique=False
)

This index should be sufficient for optimizing queries that filter by user [1].
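
For illustration, the lookup the query history tab performs (filter by user, newest first) is exactly the access pattern this composite index serves. A minimal sketch, assuming SQLAlchemy 1.4+ and Superset's Query model in superset.models.sql_lab; the helper function name is hypothetical:

from sqlalchemy import select

from superset.models.sql_lab import Query  # Superset's SQL Lab query model


def recent_queries_for_user(session, user_id, limit=100):
    # Filter on user_id and sort on changed_on, matching the
    # (user_id, changed_on) composite index.
    stmt = (
        select(Query)
        .where(Query.user_id == user_id)
        .order_by(Query.changed_on.desc())
        .limit(limit)
    )
    return session.execute(stmt).scalars().all()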

Habeeb556 commented on September 13, 2024

@dosu what indexes are needed on the query table?

Habeeb556 commented on September 13, 2024

@dosu one more question: Could low memory be affecting query log fetching, or is it unrelated?

Habeeb556 commented on September 13, 2024

@dosu how can I prevent Superset's Gunicorn from consuming all the server memory, which is causing it to crash?

Habeeb556 commented on September 13, 2024

@dosu is an index also needed for filtering by user?

Habeeb556 commented on September 13, 2024

@dosu can I assign an overall memory limit that cannot be exceeded, without setting limits for individual workers?

Habeeb556 commented on September 13, 2024

Awesome!
