Git Product home page Git Product logo

Comments (4)

eugene-nikolaev avatar eugene-nikolaev commented on June 5, 2024

This is how I patched it to work, but it seems far from best practices and I haven't much context to provide a good fix.
Should there be some helper extracted?

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None


def query_hash(query_text, options):
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    for record in conn.execute(table.select()):
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))

def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))


    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str.lower)

from redash.

eugene-nikolaev avatar eugene-nikolaev commented on June 5, 2024

NOTE: my migration makes different hashes if upgrade->revert. Looking into.

from redash.

eugene-nikolaev avatar eugene-nikolaev commented on June 5, 2024

I have tight time so ended up with just making the migration irreversible 😬
Turns out there is a hassle with Snowflake queries and I don't want to mess with that at the moment.

def downgrade():
    # This is irreversible mainly because of Snowflake queries which have a grayed out "auto-limit" option in 10.1.0.
    # But in a "preview" version it is available and Snowflake queries suddenly become "auto-limited"
    # and their hashes becomes different.
    # Also it would take a hash correction because BaseSQLQueryRunner's apply_auto_limit in "upgrade branch" has
    # own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than mess with it.
    raise Exception("Irreversible migration")

from redash.

eugene-nikolaev avatar eugene-nikolaev commented on June 5, 2024

I've encountered issues while migrating real production data - it was failing on queries which had no statements (e.g. empty or everything commented out), so hacked it that way:

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json
import sqlparse

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None

def is_empty_statement(stmt):
    # copy statement object. `copy.deepcopy` fails to do this, so just re-parse it
    st = sqlparse.engine.FilterStack()
    st.stmtprocess.append(sqlparse.filters.StripCommentsFilter())
    stmt = next(st.run(str(stmt)), None)
    if stmt is None:
        return True
    return str(stmt).strip() == ""


def query_hash(query_text, options):
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    for record in conn.execute(table.select()):
        if is_empty_statement(record.query):
            continue
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))


def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    # This is irreversible mainly because of Snowflake queries which have a grayed out "auto-limit" option in 10.1.0.
    # But in a "preview" version it is available and Snowflake queries suddenly become "auto-limited"
    # and their hashes becomes different.
    # Also it would take a hash correction because BaseSQLQueryRunner's apply_auto_limit in "upgrade branch" has
    # own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than mess with it.
    raise Exception("Irreversible migration")

😬

from redash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.