sql-mock's People

Contributors

mcnick, ravi77o, smoothml, somtom

sql-mock's Issues

ARRAY JOIN operations are not recognised in Clickhouse queries

Consider the following example:

import os

import pytest
from sql_mock.clickhouse import column_mocks as col
from sql_mock.clickhouse.table_mocks import ClickHouseTableMock
from sql_mock.table_mocks import table_meta

QUERY = """SELECT
    sum(1) AS impressions,
    city,
    browser
FROM
(
    SELECT
        ['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
        ['Firefox', 'Chrome', 'Chrome'] AS browsers
)
ARRAY JOIN
    cities AS city,
    browsers AS browser
GROUP BY
    city,
    browser
"""


@table_meta(query=QUERY)
class QueryMock(ClickHouseTableMock):
    impressions = col.Int(0)
    city = col.String("Istanbul")
    browser = col.String("Firefox")


@pytest.fixture(autouse=True)
def clickhouse_credentials() -> None:
    os.environ["SQL_MOCK_CLICKHOUSE_HOST"] = "127.0.0.1"
    os.environ["SQL_MOCK_CLICKHOUSE_PORT"] = "8123"
    os.environ["SQL_MOCK_CLICKHOUSE_USER"] = ""
    os.environ["SQL_MOCK_CLICKHOUSE_PASSWORD"] = ""


def test_query() -> None:
    expected = [
        {
            "impressions": 1,
            "city": "Istanbul",
            "browser": "Firefox",
        },
        {
            "impressions": 1,
            "city": "Berlin",
            "browser": "Chrome",
        },
        {
            "impressions": 1,
            "city": "Bobruisk",
            "browser": "Chrome",
        },
    ]
    result = QueryMock.from_mocks(input_data=[])
    result.assert_equal(expected)

When testing this, sql-mock will fail the test with the following error:

FAILED tests/test_example.py::test_query - sql_mock.exceptions.ValidationError: You need to provide the following input mocks to run your query: ['browsers', 'cities']

The arguments to the ARRAY JOIN operation should not need to be mocked, as they are columns defined in the sub-query.

dbt project in subdirectory

Problem

The dbt query_paths in the manifest file are given relative to the dbt project root. In a repository where the dbt project lives in a subdirectory, using only the query_path to retrieve the query does not work in dbt.dbt_model_meta; the path to the dbt project root has to be prepended to the query_path.

Example

Consider the following structure for a repository

repository
├── other_folders
├── transform
│   ├── models
│   │   └── model_a.sql
│   ├── target
│   │   ├── compiled
│   │   │   └── <project_name>
│   │   │   │   └── models
│   │   │   │   │   └── model_a.sql
│   │   └── manifest.json
│   └── dbt_project.yml
└── other_files

The query_path for model_a as given in the manifest file is target/compiled/<project_name>/models/model_a.sql. To be valid from the repository root, however, it should be transform/target/compiled/<project_name>/models/model_a.sql.

Idea

One could, e.g., pass dbt_project.yml instead of, or in addition to, manifest.json to SQLMockConfig:

  • The manifest path could be extracted from the target-path setting in dbt_project.yml, or passed directly in case a different value was given to the --target-path parameter on the dbt run
  • Combining the directory of dbt_project.yml with the query_path yields the correct path from the repository root
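The second bullet can be sketched as follows; full_query_path is a hypothetical helper name (not part of sql-mock), and the paths come from the example layout above:

```python
from pathlib import Path


# Hypothetical helper: prepend the dbt project root (the directory
# containing dbt_project.yml) to a manifest-relative query_path so that
# the result is valid from the repository root.
def full_query_path(dbt_project_yml: str, query_path: str) -> str:
    return str(Path(dbt_project_yml).parent / query_path)


print(full_query_path(
    "transform/dbt_project.yml",
    "target/compiled/my_project/models/model_a.sql",
))
# transform/target/compiled/my_project/models/model_a.sql
```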

Idea: not having to define input mocks for downstream tables when using assert_cte_equal

When I run assert_cte_equal I have to define all downstream tables even if they are not needed. I think this is just an issue in the checks, not an actual runtime issue, so I propose changing validate_all_input_mocks_for_query_provided so that it also ignores downstream CTEs when running assert_cte_equal.

This would be a step forward in treating each CTE as a unit and being able to isolate every CTE and define the bare minimum to test it.

Error for BigQueryMockTable -> "E KeyError: 'relation_name'"

I'm using v1 of Pydantic, so I installed the package from this branch: https://github.com/DeepLcom/sql-mock/tree/pydantic-v1/docs

Added the config in my conftest.py: SQLMockConfig.set_dbt_manifest_path("../dbt/parkdepot/target/manifest.json")
Created two mocks:
1. A source:

@dbt_source_meta(source_name="postgres", table_name="car_owner")
class PostgresCarOwner(BigQueryMockTable):
    id = col.Int(default=1)
    person_id = col.Int(default=1)
    company_id = col.Int(default=1)
    request_id = col.Int(default=1)

    plate = col.String(default="test1")

2. A model:

@dbt_model_meta(model_name="stg_postgres__car_owner")
class StgPostgresCarOwner(BigQueryMockTable):
    owner_id = col.Int(default=1)
    person_id = col.Int(default=1)
    company_id = col.Int(default=1)
    request_id = col.Int(default=1)

    plate = col.String(default="test1")

Then, when testing the model like this:

class TestStgPostgresCarOwner:
    def test_stg_postgres__car_owner(self):
        source_data = [{"id": 1}, {"id": 2}, {"id": 3}]
        expected_output = [{"owner_id": 1}, {"owner_id": 2}, {"owner_id": 3}]

        source = PostgresCarOwner.from_dicts(source_data)
        final = StgPostgresCarOwner.from_mocks(input_data=[source])

        final.assert_equal(expected_output)

I get the following error while the tests are being imported:

parkdepot/.venv/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
parkdepot/repository_parkdepot_tests/dbt/staging/postgres/test_stg_postgres__car_owner.py:2: in <module>
    from repository_parkdepot_tests.dbt.staging.postgres.mocked_stg_postgres__car_owner import StgPostgresCarOwner
parkdepot/repository_parkdepot_tests/dbt/staging/postgres/mocked_stg_postgres__car_owner.py:7: in <module>
    class StgPostgresCarOwner(BigQueryMockTable):
parkdepot/.venv/lib/python3.9/site-packages/sql_mock/dbt.py:101: in decorator
    dbt_meta = _get_model_metadata_from_dbt_manifest(manifest_path=path, model_name=model_name)
parkdepot/.venv/lib/python3.9/site-packages/sql_mock/dbt.py:31: in _get_model_metadata_from_dbt_manifest
    "table_ref": node["relation_name"],
E   KeyError: 'relation_name'

Versions:

  • dbt-bigquery: 1.5.1
  • python: 3.9.13

Create a column type for StringArray

Context:
My team is using ClickHouse and has integrated sql-mock into it.

Problem:
While most column types are supported, I am currently writing a custom object to handle string arrays.

class StringArray(col.ClickhouseColumnMock):
    dtype = "Array(String)"

    def to_sql(self, column_name: str, value=NO_INPUT) -> str:
        # Note: Compare against NO_INPUT instead of checking for None since None could be a valid input for nullable columns
        val = value if not isinstance(value, NoInput) else self.default
        # In case the val is None, we convert it to NULL
        if val is None:
            return f"cast(NULL AS {self.dtype}) AS {column_name}"
        return f"cast({val} AS {self.dtype}) AS {column_name}" 
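As a standalone illustration of the SQL the to_sql above would emit, string_array_sql below is a hypothetical stand-in stripped of the sql-mock base-class machinery; it relies on Python's list repr happening to match ClickHouse array-literal syntax:

```python
# Hypothetical stand-in for the to_sql method above, runnable without sql-mock.
def string_array_sql(column_name, val, dtype="Array(String)"):
    if val is None:
        # NULL needs an explicit cast so ClickHouse knows the column type
        return f"cast(NULL AS {dtype}) AS {column_name}"
    # Python's list repr (single-quoted strings) matches ClickHouse
    # array-literal syntax, e.g. ['Istanbul', 'Berlin']
    return f"cast({val} AS {dtype}) AS {column_name}"


print(string_array_sql("cities", ["Istanbul", "Berlin"]))
# cast(['Istanbul', 'Berlin'] AS Array(String)) AS cities
```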

Proposal:
Would it be possible to build this directly into sql-mock, similar to how I can call col.String or col.Decimal? It would be great if I could call col.StringArray.

v0.6.1 Can't parse Clickhouse `argMaxIf` function

Since v0.6.1, sql-mock is unable to parse ClickHouse's argMaxIf function.

Consider the following example:

from datetime import datetime

from sql_mock.clickhouse import column_mocks as col
from sql_mock.clickhouse.table_mocks import ClickHouseTableMock
from sql_mock.table_mocks import table_meta

query = """SELECT
    user_id,
    count() AS num_sessions,
    countIf(valid = 1) AS num_valid_sessions,
    argMaxIf(in_trial, dt, isNotNull(in_trial)) AS in_trial
FROM sessions
GROUP BY user_id
"""


@table_meta(table_ref="sessions")
class SessionsMock(ClickHouseTableMock):
    dt = col.Datetime(default=datetime(2024, 1, 1, 10, 30))
    valid = col.Boolean(default=True)
    user_id = col.String(default="foo")
    in_trial = col.Int(default=0, nullable=True)


@table_meta(query=query)
class ResultMock(ClickHouseTableMock):
    user_id = col.String(default="foo")
    num_sessions = col.Int(default=0)
    num_valid_sessions = col.Int(default=0)
    in_trial = col.Int(default=0)


def test_example() -> None:
    sessions_mock = SessionsMock.from_dicts(
        [
            dict(
                dt=datetime(2024, 1, 1, 10),
                valid=1,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 2, 10),
                valid=0,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 3, 10),
                valid=1,
                user_id="foo",
                in_trial=1,
            ),
            dict(
                dt=datetime(2024, 1, 4, 10),
                valid=1,
                user_id="foo",
                in_trial=None,
            ),
            dict(
                dt=datetime(2024, 1, 5, 10),
                valid=1,
                user_id="foo",
                in_trial=0,
            ),
        ]
    )

    result = ResultMock.from_mocks(input_data=[sessions_mock])

    expected = [
        dict(
            user_id="foo",
            num_sessions=5,
            num_valid_sessions=4,
            in_trial=0,
        )
    ]

    result.assert_equal(expected)

Running test_example results in the following exception:

self = <sqlglot.generator.Generator object at 0x107920d60>
expression = ('argMax', 'If'), key = None, comment = True

    def sql(
        self,
        expression: t.Optional[str | exp.Expression],
        key: t.Optional[str] = None,
        comment: bool = True,
    ) -> str:
        if not expression:
            return ""
    
        if isinstance(expression, str):
            return expression
    
        if key:
            value = expression.args.get(key)
            if value:
                return self.sql(value)
            return ""
    
        transform = self.TRANSFORMS.get(expression.__class__)
    
        if callable(transform):
            sql = transform(self, expression)
        elif transform:
            sql = transform
        elif isinstance(expression, exp.Expression):
            exp_handler_name = f"{expression.key}_sql"
    
            if hasattr(self, exp_handler_name):
                sql = getattr(self, exp_handler_name)(expression)
            elif isinstance(expression, exp.Func):
                sql = self.function_fallback_sql(expression)
            elif isinstance(expression, exp.Property):
                sql = self.property_sql(expression)
            else:
                raise ValueError(f"Unsupported expression type {expression.__class__.__name__}")
        else:
>           raise ValueError(f"Expected an Expression. Received {type(expression)}: {expression}")
E           ValueError: Expected an Expression. Received <class 'tuple'>: ('argMax', 'If')

We do not see this error for other xIf functions (e.g. the countIf in the example query) or for functions without the If suffix (e.g. plain argMax). My initial thought was a change in sqlglot, but the version pinned in poetry.lock has not changed in this release.

Allowing more flexibility for developer assertions

When using the library the following I thought of the following change.

Problem:
Currently the assertion that checks whether the data is correct lives in table_mocks.py and runs when you call assert_equal on a mock. When the assertion fails, I don't receive any information about why the results don't match.

Proposal:
It would be better to do the following:

  1. Separate the assertion from the generation of the query and data.

Currently assert_equal is also responsible for running _generate_query and _get_results. I see benefits in separating these: we could have a method called generate_results that lets us access the query and result directly from the mock object.

  2. Separate the cleaning logic in _assert_result and move it into generate_results. Usage could then look like this:
# assuming we've successfully run TableMock.from_mocks(), we currently do

TableMock.assert_equal(expected_dict)

# I propose the following, which allows the developer to use other assertion tools
res_dict = TableMock.generate_result()

query = TableMock._sql_mock_data.last_query  # I can inspect the query even before it fails if I want to

unittest.TestCase.assertDictEqual(res_dict, expected_dict)  # I do my own assertion with whatever library I choose

In this case the more verbose unittest assertion solves my problem.

Conclusion
In a way, I think BaseTableMock does not need an assert method at all: you can expose the data and let developers write their own assertion statements.

Support assert_equal for CTEs

Problem

Currently, the library only allows checking equality on the final query result. When a query has many CTEs, the "unit tests" turn into more end-to-end-like tests.
We want to be able to test parts of the query logic independently and check whether certain CTEs behave as they should.

sorting in _assert_equal not working for mixed None values

Problem

When calling BaseMockTable.assert_equal with expected data that contains both None and non-None values for the same key, an exception can be raised. It is caused by the sorting of the list when ignore_order=True, which is not possible across different data types.

.../sql_mock/table_mocks.py:315: in assert_equal
    self._assert_equal(
.../sql_mock/table_mocks.py:259: in _assert_equal
>           expected = sorted(expected, key=lambda d: sorted(d.items()))
E           TypeError: '<' not supported between instances of 'str' and 'NoneType'
.../sql_mock/table_mocks.py:259: TypeError

The same thing will probably happen for the query result data in a similar case (I haven't tested that yet).

Example

expected_output = [{'foo': 'bar'}, {'foo': None}]
mock_table.assert_equal(expected_output)  # raises TypeError (see above)
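One possible fix, shown here as a sketch rather than the library's actual implementation, is a sort key that tolerates None by keying each item on an (is_none, value) tuple:

```python
# Sketch of a None-tolerant sort key for lists of dicts, as could be used
# in _assert_equal when ignore_order=True. None values sort after real
# values because (True, ...) > (False, ...) in tuple comparison; values
# are stringified so mixed types stay comparable.
def none_safe_key(row):
    return sorted((k, v is None, "" if v is None else str(v)) for k, v in row.items())


rows = [{'foo': None}, {'foo': 'bar'}]
print(sorted(rows, key=none_safe_key))
# [{'foo': 'bar'}, {'foo': None}]
```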

Improve documentation

Problem

Right now, some functionality is not properly documented for the end user, e.g.:

  • assert_equal method:
    • ignore_missing_keys functionality
    • ignore_order functionality

We should add proper documentation for this additional functionality.

When a table has no alias, references to it are incorrect after rewriting

Using a simple query:

QUERY_SIMPLE = """
SELECT col1
FROM bee as b 
JOIN a ON a.col1 = b.col1
"""

With models:

@table_meta(table_ref="a")
class aMock(ClickHouseTableMock):
    col1 = col.String(default="1")

@table_meta(table_ref="bee")
class bMock(ClickHouseTableMock):
    col1 = col.String(default="1")

@table_meta(query=QUERY_SIMPLE)
class BugTableMock(ClickHouseTableMock):
    col1 = col.String(default="1")
and the test:

def test_working():
    input_table_mock_1 = aMock()
    input_table_mock_2 = bMock()

    res = BugTableMock.from_mocks(input_data=[input_table_mock_1, input_table_mock_2])

    expected = [{'col1': '1'}]
    res.assert_equal(expected)
This yields the error `Cannot get JOIN keys from JOIN ON section: 'a.col1 = col1', found keys: [Left keys: [] Right keys [] Condition columns: 'equals(a.col1, col1)', '']. (INVALID_JOIN_ON_EXPRESSION) (version 23.8.4.69 (official build))`

The internal sql_mock CTE is:
   WITH sql_mock__a AS (
 SELECT
   CAST('1' AS String) AS col1
 FROM (
   SELECT
     1
 )
 WHERE
   FALSE
), sql_mock__bee AS (
 SELECT
   CAST('1' AS String) AS col1
 FROM (
   SELECT
     1
 )
 WHERE
   FALSE
), result AS (
 SELECT
   col1
 FROM sql_mock__bee AS b /* bee */
 JOIN sql_mock__a /* a */
   ON a.col1 = b.col1
)
SELECT
 CAST(col1 AS String) AS col1
FROM result

Because table a does not have an AS a alias, this fails: the table has been replaced with sql_mock__a, but it is still referenced as a in a.col1.
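Until this is fixed, one workaround (an assumption on my part, not a confirmed fix) is to alias every joined table explicitly, so the qualified column references survive the CTE rewrite:

```python
# Workaround sketch: alias table "a" explicitly. sql-mock preserves aliases
# when it rewrites table references to CTE names (as seen with
# "sql_mock__bee AS b" above), so a.col1 should still resolve.
QUERY_SIMPLE_FIXED = """
SELECT col1
FROM bee AS b
JOIN a AS a ON a.col1 = b.col1
"""
```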

Dbt integration

Problem

Currently, you need to pass a query string to the from_mocks method. In the future, we should be able to abstract some of that work for dbt users and automatically pass a rendered query to the class.

Implementation ideas

  • Use the table_meta decorator to pass a path to the dbt model or its name / ref.
