Git Product home page Git Product logo

sqlparse-python's Introduction

SQLblood-relationship

This project is based on sqlparse conduct experiments. SQL blood relationship analysis tool based on Python sqlparse The research on SQL parsing and blood tracking can almost come to an end now. Since August 22, when I wrote the article "Parsing the syntax, morphology, and compiler files of HiveSQL source code" to explain this article in detail, I have studied SQL syntax parsing intermittently. Today, I have finally made some achievements. Generally, such research projects are supported by data governance and data middle office services, which play a great role in data security. I have covered a lot of content in the previous article, so I won't mention it here:

We can see this article based on Python sqlparse's implementation of blood relationship tracking and parsing in SQL tables. Next, we will add the improvement of this function, that is, the implementation of blood relationship parsing in SQL fields. This is a function that must be completed for Hive blood relationship or MySQL. Of course, the implementation is also troublesome. Here we mainly talk about the ideas and implementation steps. 图片 图片

Implementation process

Format Output

def get_sqlstr(file_path):
    with open(file_path, encoding='utf-8') as file:
        content = file.read()
        str_sql = sqlparse.format(content, reindent=True, keyword_case='upper')
        str_sql = str_sql.strip(' \t\n;')
        indent_str = textwrap.indent(str_sql, "  ")
    return indent_str

Operation identification

This function must also be implemented. We need to understand what the SQL is mainly for. If INSERT or CREATE is inserted, blood relationship analysis is necessary. If SELECT is selected, simple SQL analysis is required. With the research results of sqlparse source code, we can call the corresponding functions:

# Get the main functions of the SQL
def get_main_functionsql(statment):
    return statment.get_type()

AST_tree

# 获得树形结构
def get_ASTTree(statment):
    return statment._pprint_tree()

Get Field Name

def get_Identifier_keywords_column(identifier):
    if len(identifier.tokens) == 1:
        if not isinstance(identifier.parent, Function):
            for i in range(1, len(table_names) + 1):
                if columns_rank == i:
                    a = identifier.tokens[0].value
                    column_names[columns_rank - 1].append(a)
                    break
        else:
            function_names.append(identifier.value)

    if len(identifier.tokens) == 5:
        if (identifier.tokens[0]._get_repr_name() == 'Name'):
            for i in range(1, len(table_names) + 1):
                if columns_rank == i:
                    a = identifier.tokens[0].value
                    column_names[columns_rank - 1].append(a)
                    break

    if len(identifier.tokens) == 7:
        alias_names.append(identifier.tokens[0].value)

Based on the above basic methods

基于Python-sqlparse的SQL字段与表血缘追踪解析实现

SQL解析和血缘追踪的研究现在差不多可以告一段落了,从8月22日写HiveSQL源码之语法词法编译文件解析一文详解 ,这篇文章以来便断断续续的对SQL语法解析研究,到了今天终于是有了一番成果。一般做此类研究的项目都是在数据治理和数据中台方面的服务作支撑,对于数据安全作用挺大的,多的内容我在上篇文章里面已经讲述了很多了,这里不再多提,基于下面测试的成果:基于Python-sqlparse的SQL表血缘追踪解析实现 ,大家可以看这篇文章,接下来是接着上篇内容补充一下该功能的完善,也就是实现SQL字段血缘的解析,这是做Hive血缘或者mysql必须完成的功能,当然实现起来也是比较麻烦的。这里主要讲一下思路和实现的步骤。 图片

实现过程

格式化输出

算是格式化清洗所有的SQL了:

def get_sqlstr(file_path):
    with open(file_path, encoding='utf-8') as file:
        content = file.read()
        str_sql = sqlparse.format(content, reindent=True, keyword_case='upper')
        str_sql = str_sql.strip(' \t\n;')
        indent_str = textwrap.indent(str_sql, "  ")
    return indent_str

操作识别

该功能也是必须要实现的功能,我们需要明白这个SQL主要是干什么事情的。如果是插入INSERT或者是CREATE就有血缘分析的必要,如果是SELECT的话那么做简单的SQL解析即可。有了研究sqlparse源码的成果我们调用相应的函数即可:

# 获得该SQL主要功能函数
def get_main_functionsql(statment):
    return statment.get_type()

树形解构

# 获得树形结构
def get_ASTTree(statment):
    return statment._pprint_tree()

获取字段

def get_Identifier_keywords_column(identifier):
    if len(identifier.tokens) == 1:
        if not isinstance(identifier.parent, Function):
            for i in range(1, len(table_names) + 1):
                if columns_rank == i:
                    a = identifier.tokens[0].value
                    column_names[columns_rank - 1].append(a)
                    break
        else:
            function_names.append(identifier.value)

    if len(identifier.tokens) == 5:
        if (identifier.tokens[0]._get_repr_name() == 'Name'):
            for i in range(1, len(table_names) + 1):
                if columns_rank == i:
                    a = identifier.tokens[0].value
                    column_names[columns_rank - 1].append(a)
                    break

    if len(identifier.tokens) == 7:
        alias_names.append(identifier.tokens[0].value)

sqlparse-python's People

Contributors

fanstuck avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.