Introduction

pykusto is an advanced Python SDK for Azure Data Explorer (a.k.a. Kusto).
It started as a project in the 2019 Microsoft Hackathon.

Getting Started

Installation

Default installation:

pip install pykusto

With dependencies required for running the tests:

pip install pykusto[test]

Without dependencies which are not needed in PySpark:

pip install pykusto --global-option pyspark

Basic usage

from datetime import timedelta
from pykusto import PyKustoClient, Query

# Connect to cluster with AAD device authentication
# Databases, tables, and columns are auto-retrieved
client = PyKustoClient('https://help.kusto.windows.net')

# Show databases
print(tuple(client.get_databases_names()))

# Show tables in 'Samples' database
print(tuple(client.Samples.get_table_names()))

# Connect to 'StormEvents' table
t = client.Samples.StormEvents

# Build query
(
    Query(t)
        # Access columns using table variable 
        .project(t.StartTime, t.EndTime, t.EventType, t.Source)
        # Specify new column name using Python keyword argument   
        .extend(Duration=t.EndTime - t.StartTime)
        # Python types are implicitly converted to Kusto types
        .where(t.Duration > timedelta(hours=1))
        .take(5)
        # Output to pandas dataframe
        .to_dataframe()
) 

Retrying failed queries

# Turn on retrying for all queries
from pykusto import PyKustoClient, RetryConfig, Query

client = PyKustoClient(
    "https://help.kusto.windows.net",
    retry_config=RetryConfig()  # Use default retry config 
)

# Override retry config for specific query 
Query(client.Samples.StormEvents).take(5).to_dataframe(
    retry_config=RetryConfig(attempts=3, sleep_time=1, max_sleep_time=600, sleep_scale=2, jitter=1)
)
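
For intuition about how the retry parameters above interact, here is a minimal sketch of the delay schedule they would produce. It assumes the conventional exponential-backoff meaning of each parameter: the first sleep is sleep_time, each later sleep is multiplied by sleep_scale, increased by jitter, and capped at max_sleep_time. backoff_delays is a hypothetical helper, not part of pykusto, and pykusto's actual behavior may differ.

```python
from typing import List


def backoff_delays(attempts: int, sleep_time: float, max_sleep_time: float,
                   sleep_scale: float, jitter: float) -> List[float]:
    """Return the sleep before each retry (hypothetical helper, not pykusto API)."""
    delays = []
    delay = sleep_time
    # With N attempts there are at most N - 1 sleeps between them
    for _ in range(attempts - 1):
        delays.append(delay)
        # Assumed update rule: scale, add jitter, then cap at the maximum
        delay = min(delay * sleep_scale + jitter, max_sleep_time)
    return delays


# Same values as the RetryConfig in the example above
print(backoff_delays(attempts=3, sleep_time=1, max_sleep_time=600, sleep_scale=2, jitter=1))
```

Under these assumptions, three attempts mean at most two sleeps, so the total wait stays small even though max_sleep_time is large.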

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

pykusto's Issues

Allow rendering a query with a table name

Currently, the "render" function of Query has two behaviors:

  1. If the Query contains a PyKusto table, it renders "table | query".
  2. If there is no PyKusto table, it omits the "table" part and renders only the "query".

However, if we use PyKusto only to render the query, we need a third option: to produce "table | query" from a table name alone, without providing a real Table.

Verify error messages in tests

Informative error messages are important, so when a test uses assertRaises, we should verify not only the type of the exception but also the error message it contains.
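
As a minimal sketch of what such a test could look like, the standard library's unittest offers assertRaisesRegex for matching the message, and the context manager returned by assertRaises exposes the exception for an exact comparison. parse_positive is a hypothetical function standing in for whatever pykusto call is under test.

```python
import unittest


def parse_positive(text: str) -> int:
    """Hypothetical function under test; stands in for any pykusto call."""
    value = int(text)
    if value <= 0:
        raise ValueError(f"Expected a positive number, got {value}")
    return value


class TestErrorMessages(unittest.TestCase):
    def test_type_and_message(self):
        # assertRaisesRegex checks the exception type and searches the message
        with self.assertRaisesRegex(ValueError, r"Expected a positive number, got -3"):
            parse_positive("-3")

    def test_exact_message(self):
        # For an exact comparison, inspect the captured exception
        with self.assertRaises(ValueError) as ctx:
            parse_positive("0")
        self.assertEqual("Expected a positive number, got 0", str(ctx.exception))
```

The exact comparison is stricter and catches accidental wording changes; the regex form is more tolerant when only part of the message matters.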

column_generator doesn't support columns with dot in the name


TypeError                                 Traceback (most recent call last)
in
----> 1 col['asd']

c:\python\python37-32\lib\site-packages\pykusto\expressions.py in __getitem__(self, name)
    576
    577     def __getitem__(self, name: str) -> Column:
--> 578         return Column[name]
    579
    580

TypeError: 'type' object is not subscriptable

Refactor array, mapping dynamic types

  • Introduce DynamicExpression and AnyExpression
  • ArrayExpression and MappingExpression should return AnyExpression
  • Delete Column.__getitem__
  • Fix inheritance tree

iff type check not working

.extend(timeDelta=f.iff(col.day - col.day1 == 0, timedelta(last_seen_period), col.day - col.day1))

Subtracting one AnyTypeColumn from another yields a NumberExpression instead of a TimespanExpression in this case.

multiplying numbers and columns doesn't always work

Examples:
Query().where(f.to_double(100 * t.numberField) > 0.2).render()
raises: TypeError: unsupported operand type(s) for *: 'int' and 'AnyTypeColumn'

Query().where(100 * f.to_double(t.numberField) > 0.2).render()
raises: TypeError: unsupported operand type(s) for *: 'int' and 'NumberExpression'

Query().where(col.of(100) * f.to_double(t.numberField) > 0.2).render()
renders as: "| where (['100'] * (todouble(numberField))) > 0.2", which doesn't compile.

I didn't have any more ideas...
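
The two TypeErrors above are what Python raises when the right-hand operand does not define a reflected operator. As a rough illustration of that mechanism (not pykusto's actual code), the sketch below uses a hypothetical NumberExpr class that defines both __mul__ and __rmul__, so a number can appear on either side.

```python
class NumberExpr:
    """Hypothetical stand-in for a numeric query expression (not pykusto's class)."""

    def __init__(self, kql: str):
        self.kql = kql

    @staticmethod
    def _render(value) -> str:
        # Render nested expressions as-is and Python numbers as literals
        return value.kql if isinstance(value, NumberExpr) else repr(value)

    def __mul__(self, other):
        # Handles expr * 100
        return NumberExpr(f"({self.kql}) * ({self._render(other)})")

    def __rmul__(self, other):
        # Handles 100 * expr: Python falls back to this method after
        # int.__mul__ returns NotImplemented
        return NumberExpr(f"({self._render(other)}) * ({self.kql})")


field = NumberExpr("todouble(numberField)")
print((100 * field).kql)
```

Without __rmul__, the last line would fail with the same "unsupported operand type(s) for *" error reported in this issue.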

Create a Kusto DataFrame class

Create a Kusto DataFrame class which is backed by a Kusto table instead of in-memory data (similar to the Spark DataFrame class).

Add support for ingestion_time()

In some tables, there is no "timestamp" field, but instead, we use the ingestion_time() function to filter the relevant period.

Working without it is unbearably slow and almost always fails as we can't filter first by timestamp.
That's why I think it's a relatively urgent feature.

iff function - type comparison does not compare correctly int and f.array_length({array})

The f.iff function checks that both result options have the same type.
Consider this code:

Query().extend(some_data_count=
                   f.iff(some_condition,
                         f.array_length(col.some_date),
                         0))

It throws:
pykusto/functions.py", line 271, in iff
    raise TypeError("The second and third arguments must be of the same type")
TypeError: The second and third arguments must be of the same type
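
One way such a check could accept this case is to map plain Python values to the same type category as expressions before comparing. The sketch below is only an illustration under that assumption: KustoKind, Expr, kind_of and check_iff_branches are hypothetical names, not pykusto's API.

```python
from enum import Enum


class KustoKind(Enum):
    NUMBER = "number"
    STRING = "string"
    BOOL = "bool"


class Expr:
    """Hypothetical typed expression; stands in for pykusto's expression classes."""

    def __init__(self, kql: str, kind: KustoKind):
        self.kql = kql
        self.kind = kind


def kind_of(value) -> KustoKind:
    """Return the type category of an expression or a plain Python value."""
    if isinstance(value, Expr):
        return value.kind
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return KustoKind.BOOL
    if isinstance(value, (int, float)):
        return KustoKind.NUMBER
    if isinstance(value, str):
        return KustoKind.STRING
    raise TypeError(f"Unsupported value: {value!r}")


def check_iff_branches(if_true, if_false) -> None:
    """Raise TypeError unless both branches fall in the same category."""
    if kind_of(if_true) != kind_of(if_false):
        raise TypeError("The second and third arguments must be of the same type")


# A numeric expression and the Python literal 0 are both NUMBER, so this passes
check_iff_branches(Expr("array_length(some_date)", KustoKind.NUMBER), 0)
```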

Update readme

Now that pykusto is on PyPI, the installation instructions in the main readme file can be updated to install from there rather than from GitHub.

Support "to typeof(<type_name>)" in mv-expand

By default the column generated by mv-expand is of dynamic type. Usually it has some underlying concrete type, and it is useful to convert it to that type.

Example query which is unsupported:
| mv-expand name to typeof(string)

Suggested pykusto syntax:
.mv_expand(col.name.to_type(TypeName.STRING))
