Comments (12)
To address the issue of the query history tab taking a long time to retrieve logs, especially with over 30K rows, you can implement several optimizations:

- **Pagination and lazy loading:** Ensure that the query history tab uses pagination and lazy loading to fetch data incrementally as the user scrolls. This is already implemented in Superset 4.0.2 using the `useInView` hook and `pageIndex` state to load more data when the user reaches the bottom of the page [1].
- **Indexes on the `query` table:** Add indexes on the columns used for filtering and sorting; see the index recommendations later in this thread.
- **Efficient data fetching:** The `LogDAO` class in `superset/daos/log.py` includes a `get_recent_activity` method that uses pagination and limits data to one year, which can help optimize data retrieval [5].
- **Frontend optimizations:** The `QueryHistory` component in the frontend uses the `useEditorQueriesQuery` hook to fetch query data and combines it with the Redux store data. It also uses a `useEffect` hook to load the next page of queries when the bottom is reached [1].

By ensuring these optimizations are in place, you can significantly reduce the time it takes to retrieve logs in the query history tab.
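The incremental-fetch pattern described above can be sketched in backend-agnostic Python. Note that `fetch_page` below is a hypothetical callable standing in for whatever call actually retrieves one page (it is not a Superset API):

```python
from typing import Any, Callable, Iterator, List

def iter_query_history(
    fetch_page: Callable[[int, int], List[Any]],
    page_size: int = 25,
) -> Iterator[Any]:
    """Yield query-history rows one page at a time.

    fetch_page(page_index, page_size) is a stand-in for whatever call
    retrieves one page (e.g. an HTTP request to a paginated endpoint);
    it should return fewer than page_size rows when the data runs out.
    """
    page_index = 0
    while True:
        rows = fetch_page(page_index, page_size)
        if not rows:
            break
        yield from rows
        if len(rows) < page_size:
            break  # a short page means we reached the end
        page_index += 1

# Example with an in-memory "backend" of 60 fake rows:
data = list(range(60))

def fake_fetch(page: int, size: int) -> list:
    return data[page * size:(page + 1) * size]

rows = list(iter_query_history(fake_fetch, page_size=25))  # fetches 3 pages
```

The key point is that only one page is held in flight at a time, which is what keeps the UI responsive with 30K+ rows.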
To continue talking to Dosu, mention @dosu.
from superset.
The `query` table needs the following indexes to optimize performance:

- Index on the `results_key` column:

  op.create_index(op.f("ix_query_results_key"), "query", ["results_key"], unique=False)

- Index on the combination of the `user_id` and `changed_on` columns:

  op.create_index(op.f("ti_user_id_changed_on"), "query", ["user_id", "changed_on"], unique=False)

- Index on the `sql_editor_id` column:

  op.create_index(op.f("ix_query_sql_editor_id"), "query", ["sql_editor_id"], unique=False)

These indexes should help improve the performance of queries on the `query` table [1][2][3].
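As an illustration (not Superset code), the effect of the composite index can be observed with SQLite's `EXPLAIN QUERY PLAN` on a simplified stand-in for the `query` table. Superset's metadata database is typically PostgreSQL or MySQL, where `EXPLAIN` plays the same role:

```python
import sqlite3

# A minimal stand-in for the `query` table, purely for illustration.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "query" (id INTEGER, user_id INTEGER, changed_on TEXT, results_key TEXT)')
con.execute('CREATE INDEX ti_user_id_changed_on ON "query" (user_id, changed_on)')

# The typical query-history access pattern: one user's queries, newest first.
plan = con.execute(
    'EXPLAIN QUERY PLAN '
    'SELECT id FROM "query" WHERE user_id = ? ORDER BY changed_on DESC',
    (42,),
).fetchall()

# The detail column should report a SEARCH using ti_user_id_changed_on
# (satisfying both the filter and the sort) rather than a full table SCAN.
print(plan)
```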
Yes, low memory could be affecting query log fetching. Fetching query results involves several steps, including executing the query, fetching data, and possibly storing results in a backend. If the system is low on memory, it could impact these operations, especially when handling large datasets or multiple concurrent queries. For example, the `execute_sql_statement` function fetches data and processes it, which can be memory-intensive. Additionally, the `execute_sql_statements` function handles multiple statements and stores results, which also requires sufficient memory.
Furthermore, the script `docker/frontend-mem-nag.sh` in the Superset repository checks for at least 3GB of free memory and issues a warning if the available memory is below this threshold. This indicates that memory availability is a known concern and can affect the performance of various operations, including query log fetching [1][2].
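A rough Python equivalent of that check, assuming a Linux `/proc/meminfo` layout (the helper below is illustrative and not part of Superset):

```python
MIN_MEM_KB = 3 * 1024 * 1024  # the 3GB threshold used by frontend-mem-nag.sh

def available_memory_kb(meminfo_text: str) -> int:
    """Parse the MemAvailable line from /proc/meminfo-style text (value in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

def memory_warning(meminfo_text: str) -> bool:
    """True if available memory is below the 3GB threshold."""
    return available_memory_kb(meminfo_text) < MIN_MEM_KB

# On a real Linux host you would pass open("/proc/meminfo").read();
# here we use a sample string so the check is reproducible.
sample = "MemTotal:  8000000 kB\nMemAvailable:  2000000 kB\n"
low = memory_warning(sample)  # 2000000 kB is below 3 GB, so True
```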
To prevent Superset's Gunicorn from consuming all the server memory and causing it to crash, you can configure Gunicorn with specific settings to manage memory usage more effectively. Here is an example configuration, based on the `docker/run-server.sh` script from the Superset repository:
gunicorn \
--bind "${SUPERSET_BIND_ADDRESS:-0.0.0.0}:${SUPERSET_PORT:-8088}" \
--access-logfile "${ACCESS_LOG_FILE:-"-"}" \
--error-logfile "${ERROR_LOG_FILE:-"-"}" \
--workers ${SERVER_WORKER_AMOUNT:-10} \
--worker-class ${SERVER_WORKER_CLASS:-gevent} \
--worker-connections ${WORKER_CONNECTIONS:-1000} \
--threads ${SERVER_THREADS_AMOUNT:-20} \
--log-level "${GUNICORN_LOGLEVEL:-info}" \
--timeout ${GUNICORN_TIMEOUT:-120} \
--keep-alive ${GUNICORN_KEEPALIVE:-2} \
--max-requests ${WORKER_MAX_REQUESTS:-1000} \
--max-requests-jitter ${WORKER_MAX_REQUESTS_JITTER:-50} \
--limit-request-line ${SERVER_LIMIT_REQUEST_LINE:-0} \
--limit-request-field_size ${SERVER_LIMIT_REQUEST_FIELD_SIZE:-0} \
"${FLASK_APP}"
In this configuration:

- `--workers 10` sets the number of worker processes.
- `--worker-class gevent` uses the gevent worker class for asynchronous workers.
- `--worker-connections 1000` sets the maximum number of simultaneous clients.
- `--max-requests 1000` restarts each worker after it has handled 1000 requests, which helps free leaked memory.
- `--max-requests-jitter 50` adds random jitter to the max-requests setting to prevent all workers from restarting at the same time.
Additionally, you can configure the `SUPERSET_WEBSERVER_TIMEOUT` setting in `superset/config.py` to ensure that the web server timeout is properly managed. This setting should be lower than your load balancer or proxy timeout settings:

SUPERSET_WEBSERVER_TIMEOUT = int(timedelta(minutes=1).total_seconds())
These settings can help in managing memory usage and preventing crashes due to memory exhaustion [1][2][3].
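If you prefer, the same flags can live in a Gunicorn Python config file passed with `-c` (e.g. `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`). This sketch simply mirrors the defaults from the script above; it is not shipped with Superset:

```python
# gunicorn.conf.py -- the shell invocation above expressed as a Gunicorn
# Python config file. Each name below is a standard Gunicorn setting.
import os

bind = (
    os.environ.get("SUPERSET_BIND_ADDRESS", "0.0.0.0")
    + ":"
    + os.environ.get("SUPERSET_PORT", "8088")
)
workers = int(os.environ.get("SERVER_WORKER_AMOUNT", 10))
worker_class = os.environ.get("SERVER_WORKER_CLASS", "gevent")
worker_connections = int(os.environ.get("WORKER_CONNECTIONS", 1000))
threads = int(os.environ.get("SERVER_THREADS_AMOUNT", 20))
loglevel = os.environ.get("GUNICORN_LOGLEVEL", "info")
timeout = int(os.environ.get("GUNICORN_TIMEOUT", 120))
keepalive = int(os.environ.get("GUNICORN_KEEPALIVE", 2))
# Recycling workers is the main memory-management lever here:
max_requests = int(os.environ.get("WORKER_MAX_REQUESTS", 1000))
max_requests_jitter = int(os.environ.get("WORKER_MAX_REQUESTS_JITTER", 50))
```

Keeping the settings in a config file makes them easier to version-control than a long command line.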
The Superset documentation and codebase do not provide a specific configuration option for setting a global memory limit for Gunicorn. However, you can manage memory usage indirectly by configuring Gunicorn's worker settings, such as `--max-requests` and `--max-requests-jitter`, to help prevent memory leaks and excessive memory consumption.
For more direct memory management, you might need to use external tools or system-level configurations. For example, you can use cgroups (control groups) on Linux to limit the memory usage of the Gunicorn process. Here is an example of how you can use cgroups to limit memory:
1. Create a cgroup for Gunicorn:

   sudo cgcreate -g memory:/gunicorn

2. Set a memory limit for the cgroup (e.g., 1GB):

   echo 1G | sudo tee /sys/fs/cgroup/memory/gunicorn/memory.limit_in_bytes

   (This path is for cgroup v1; on a cgroup v2 system the equivalent file is memory.max.)

3. Start Gunicorn within the cgroup:

   sudo cgexec -g memory:gunicorn gunicorn \
     --bind "0.0.0.0:8088" \
     --access-logfile "-" \
     --error-logfile "-" \
     --workers 10 \
     --worker-class gevent \
     --worker-connections 1000 \
     --threads 20 \
     --log-level "info" \
     --timeout 120 \
     --keep-alive 2 \
     --max-requests 1000 \
     --max-requests-jitter 50 \
     --limit-request-line 0 \
     --limit-request-field_size 0 \
     "superset.app:create_app()"
This approach ensures that the Gunicorn process cannot exceed the specified memory limit, helping to prevent crashes due to memory exhaustion.
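The `1G` echoed into the limit file is a human-readable size string. If you ever need the raw byte value instead, a small illustrative helper (not part of Superset or cgroup tooling) can compute it:

```python
def parse_size(size: str) -> int:
    """Convert a cgroup-style size string like '512M' or '1G' to bytes."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    size = size.strip().upper()
    if size and size[-1] in units:
        return int(size[:-1]) * units[size[-1]]
    return int(size)  # plain byte count, no suffix

limit = parse_size("1G")  # the value written to memory.limit_in_bytes
```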
You do not need an additional index specifically for user filtration on the `query` table. The existing migration script already creates an index on the `user_id` and `changed_on` columns:
op.create_index(
op.f("ti_user_id_changed_on"), "query", ["user_id", "changed_on"], unique=False
)
This index should be sufficient for optimizing queries that filter by user [1].
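To see why a separate `user_id` index is unnecessary, here is an illustrative SQLite check (a stand-in, not Superset's actual database) showing the composite index serving a filter on `user_id` alone via its leftmost column:

```python
import sqlite3

# A composite index on (user_id, changed_on) also serves filters on
# user_id alone, because user_id is its leftmost column.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "query" (id INTEGER, user_id INTEGER, changed_on TEXT)')
con.execute('CREATE INDEX ti_user_id_changed_on ON "query" (user_id, changed_on)')

plan = con.execute(
    'EXPLAIN QUERY PLAN SELECT id FROM "query" WHERE user_id = ?', (7,)
).fetchall()

# The plan's detail column should mention ti_user_id_changed_on,
# confirming no single-column user_id index is needed.
print(plan)
```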
@dosu what indexes are needed on the `query` table?
@dosu one more question: Could low memory be affecting query log fetching, or is it unrelated?
@dosu how can I prevent Superset's Gunicorn from consuming all the server memory, which is causing it to crash?
@dosu an index is also needed for filtering by user.
@dosu can I assign an overall memory limit that cannot be exceeded, without setting limits for individual workers?
Awesome!