
python-humio's Introduction

Humiolib

Documentation Status PyPI Package latest release Apache 2.0 License

The humiolib library is a wrapper for Humio's web API, supporting easy interaction with Humio directly from Python. Full documentation for this repository can be found at https://python-humio.readthedocs.io/en/latest/readme.html.

Vision

The vision for humiolib is to create an opinionated wrapper around the Humio web API, supporting log ingestion and log queries. The project does not simply expose web endpoints as Python methods, but attempts to improve upon the usability experience of the API. In addition the project seeks to add non-intrusive quality of life features, so that users can focus on their primary goals during development.

Governance

This project is maintained by employees at Humio ApS. As a general rule, only employees at Humio can become maintainers and have commit privileges to this repository. Therefore, if you want to contribute to the project, which we very much encourage, you must first fork the repository. Maintainers will have the final say on accepting or rejecting pull requests. As a rule of thumb, pull requests will be accepted if:

  • The contribution fits with the project's vision
  • All automated tests have passed
  • The contribution is of a quality comparable to the rest of the project

The maintainers will attempt to react to issues and pull requests quickly, but their ability to do so can vary. If you haven't heard back from a maintainer within 7 days of creating an issue or making a pull request, please feel free to ping them on the relevant post.

The active maintainers involved with this project include:

Installation

The humiolib library has been published on PyPI, so you can use pip to install it:

pip install humiolib

Usage

The examples below seek to get you going with humiolib. For further documentation, have a look at the code itself.

HumioClient

The HumioClient class is used for general interaction with Humio. It is mainly used for performing queries, as well as managing different aspects of your Humio instance.

from humiolib.HumioClient import HumioClient

# Creating the client
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")

# Using a streaming query 
webStream = client.streaming_query("Login Attempt Failed", is_live=True)
for event in webStream:
    print(event)

# Using a queryjob 
queryjob = client.create_queryjob("Login Attempt Failed", is_live=True)
poll_result = queryjob.poll()
for event in poll_result.events:
    print(event)

# With a static queryjob you can poll it iteratively until it has been exhausted
queryjob = client.create_queryjob("Login Attempt Failed", is_live=False)
for poll_result in queryjob.poll_until_done():
    print(poll_result.metadata)
    for event in poll_result.events:
        print(event)

HumioIngestClient

The HumioIngestClient class is used for ingesting data into Humio. While the HumioClient can also be used for ingesting data, this is mainly meant for debugging.

from humiolib.HumioClient import HumioIngestClient

# Creating the client
client = HumioIngestClient(
    base_url="https://cloud.humio.com",
    ingest_token="*****")

# Ingesting Unstructured Data
messages = [
      "192.168.1.21 - user1 [02/Nov/2017:13:48:26 +0000] \"POST /humio/api/v1/ingest/elastic-bulk HTTP/1.1\" 200 0 \"-\" \"useragent\" 0.015 664 0.015",
      "192.168.1..21 - user2 [02/Nov/2017:13:49:09 +0000] \"POST /humio/api/v1/ingest/elastic-bulk HTTP/1.1\" 200 0 \"-\" \"useragent\" 0.013 565 0.013"
   ]

client.ingest_messages(messages) 

# Ingesting Structured Data
structured_data = [
      {
         "tags": {"host": "server1" },
         "events": [
               {
                  "timestamp": "2020-03-23T00:00:00+00:00",
                  "attributes": {"key1": "value1", "key2": "value2"}      
               }
         ]
      }
   ]

client.ingest_json_data(structured_data)

python-humio's People

Contributors

alexanderbrandborg, andejens, fogh, hmpf, molatif-dev, pmech, samgdf, swefraser, xorsnn

python-humio's Issues

Bug: QueryJob can't be used to fetch all filter query results.

The code checks hasMoreEvents, but simply queries the same queryJob, so the same events are returned.

This needs to be implemented so that users can loop over filter query results from a QueryJob, and perhaps even adjust the number of events the QueryJob returns at each poll (see the sketch below).
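
A rough sketch of the kind of usage this would enable; events_per_poll is a hypothetical parameter used purely for illustration, not part of the current API:

# Hypothetical usage once segmented polling is supported.
# "events_per_poll" is an illustrative parameter, not part of the current API.
queryjob = client.create_queryjob("Login Attempt Failed", is_live=False, events_per_poll=200)
for poll_result in queryjob.poll_until_done():
    for event in poll_result.events:
        print(event)  # each poll should yield the next segment of results, not repeats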

Examples use dataspace vs repo

The examples on the front page use the "dataspace" parameter instead of the "repo" parameter, which seems to be the one in use (compare the constructor call below).
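
For comparison, the README example earlier in this document constructs the client with the repository parameter:

from humiolib.HumioClient import HumioClient

# Matches the README example above; the parameter in use is "repository", not "dataspace"
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")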

Documentation Issues

Documentation Error & Missing Parameters
The create_queryJob documentation has some errors.

  • is_live is used twice. The second time should be timezone_offset
  • arguments is plural in the description but singular in the list.
  • start and end parameters do not specify units.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://python-humio.readthedocs.io/en/latest/reference/humioclient.html
  2. Scroll down to 'create_QueryJob'
  3. See errors

Humiolib Python Upload Bug

Basic Information
Customer: Christopher Balles , Crowdstrike
Salesforce Case #: https://crowdstrike.lightning.force.com/lightning/r/Case/5006T0000206vvoQAA/view
Cloud or On-Prem? both

What problem(s) has/have been observed?
Uploaded a file to an internal/cloud Humio server and ran into an error (see Outcome below).

Steps to reproduce
Wrote a small Python script to reproduce this bug.
Reference: https://python-humio.readthedocs.io/en/latest/reference/humioclient.html
Install humiolib with:
pip install humiolib
from humiolib.HumioClient import HumioClient
from humiolib.HumioExceptions import HumioException

client = HumioClient(
    base_url="https://cloud.us.humio.com/",
    repository="MJ_Json",
    user_token="API_Token")
client.upload_file('./test.csv')
Outcome: the file uploads successfully; however, JSON errors are returned.

Customer Impact
Is it a blocker for the customer? Not a blocker
Why is it important? The customer is developing a POC for one of their customers.
What are their expectations for resolution/getting an update? They are curious to know whether this can be fixed/supported.
Reference : https://python-humio.readthedocs.io/_/downloads/en/latest/pdf/

Unable to Query Between Dates

Describe the bug
When attempting to query between dates, the results are either empty lists or the default results of a query running from now.

To Reproduce
Sample Code:


from datetime import datetime, timedelta
from humiolib.HumioClient import HumioClient

# Client setup was omitted in the original report; assumed here with placeholder values
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")

# start and end are datetime objects, e.g. the last seven days
start = datetime.now() - timedelta(days=7)
end = datetime.now()

# convert the datetime objects to epoch time
start_epoch = int(start.timestamp())
end_epoch = int(end.timestamp())

query = """
        eventsize()
        | groupBy([field1,field2,field3], function=[count(), sum(_eventSize)])
        | eval(gigabytes=_sum/1073741824)
        | sort(_count,order=desc)
        | start({start})
        | end({end})
"""

# Replace the start and end placeholders in the query with the epoch times.
query = query.replace("{start}", str(start_epoch)).replace("{end}", str(end_epoch))

# Create and execute the query job, keeping the events from each poll.
queryjob = client.create_queryjob(query, is_live=False)
for poll_result in queryjob.poll_until_done():
    query_results = poll_result.events

print(query_results)

This returns the results of a default query, ignoring the start and end.

Removing the start and end from the query string and instead passing them into create_queryjob:
queryjob = client.create_queryjob(query,start=start_epoch,end=end_epoch,is_live=False)

And it returns no results.

Expected behavior
Results for a query between dates are returned.

  • Python 3.9.14
  • HumioLib 0.2.5

Poll in static query job doesn't propagate kwargs

Describe the bug
Currently, the poll() method in StaticQueryJob does not propagate **kwargs, meaning that if you set a timeout in the function call, it never gets set in the webcaller.

To Reproduce

query_job = client.create_queryjob(query)
query_job.poll(timeout=1) # can hang indefinitely 

Expected behavior
The above snippet should raise an exception once the timeout is exceeded, instead of hanging (see the sketch below).
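
A sketch of the expected behavior; the exact exception type depends on how humiolib wraps HTTP errors, so a broad except is used here purely for illustration:

# Sketch of the expected behavior once **kwargs are propagated:
# the timeout should reach the HTTP layer and raise instead of hanging.
query_job = client.create_queryjob(query)
try:
    query_job.poll(timeout=1)
except Exception as exc:  # exact exception type depends on humiolib's error wrapping
    print(f"Poll raised as expected: {exc}")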

HumioLib does not sanitize URLs in some cases

Describe the bug
HumioLib doesn't sanitize URLs by removing the trailing slash from them in some cases. This causes weird errors such as the one below:
HumioHTTPException: ('HTTP method not allowed, supported methods: GET', 405)

To Reproduce
Steps to reproduce the behavior:

  1. Initialize the client with a URL to your cluster that ends with a '/', for example https://myhumiocluster.com/
  2. Attempt to run a query with this client.

Expected behavior
I would expect a clearer error indicating that the URL is malformed. Currently it gives the above error, which does not accurately describe the problem. A user-side workaround is sketched below.
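
Until the client sanitizes URLs itself, a simple workaround (a sketch; the repository name is illustrative) is to strip the trailing slash before constructing the client:

from humiolib.HumioClient import HumioClient

base_url = "https://myhumiocluster.com/"  # e.g. read from configuration

# Workaround sketch: strip any trailing slash before handing the URL to the client
client = HumioClient(
    base_url=base_url.rstrip("/"),
    repository="sandbox",  # illustrative repository name
    user_token="*****")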

Desktop (please complete the following information):

  • MacOS

Polling Logic seems to look for all data

Describe the bug
According to the Humio docs, polling is a single request that returns partial results from a query job.

The current behavior of StaticQueryJob's poll() is to continually poll until it receives the 'done' message, which is only sent when the job has completed. This causes issues, since a request can take much longer than the timeout passed to the function.

That being said, the endpoint the client uses here is not what the API docs suggest (/dataspaces/{REPO} vs. /repositories/{REPO}), which makes me think we might be using a legacy API here?

Expected behavior
poll() should only make a single request, as sketched below; the current looping method should arguably be renamed to something like poll_until_completed.
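
A sketch of the expected split in semantics, using the queryjob methods already shown in the README above:

# Expected semantics (sketch):
# poll() performs a single request and returns one partial segment of results,
segment = queryjob.poll()
print(segment.metadata)

# while poll_until_done() keeps polling until the job reports that it is done.
for segment in queryjob.poll_until_done():
    for event in segment.events:
        print(event)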
