
python-humio's Introduction

Humiolib

Documentation Status PyPI Package latest release Apache 2.0 License

The humiolib library is a wrapper for Humio's web API, supporting easy interaction with Humio directly from Python. Full documentation for this repository can be found at https://python-humio.readthedocs.io/en/latest/readme.html.

Vision

The vision for humiolib is to create an opinionated wrapper around the Humio web API, supporting log ingestion and log queries. The project does not simply expose web endpoints as Python methods, but attempts to improve upon the usability experience of the API. In addition the project seeks to add non-intrusive quality of life features, so that users can focus on their primary goals during development.

Governance

This project is maintained by employees at Humio ApS. As a general rule, only employees at Humio can become maintainers and have commit privileges to this repository. Therefore, if you want to contribute to the project, which we very much encourage, you must first fork the repository. Maintainers will have the final say on accepting or rejecting pull requests. As a rule of thumb, pull requests will be accepted if:

  • The contribution fits with the project's vision
  • All automated tests have passed
  • The contribution is of a quality comparable to the rest of the project

The maintainers will attempt to react to issues and pull requests quickly, but their ability to do so can vary. If you haven't heard back from a maintainer within 7 days of creating an issue or making a pull request, please feel free to ping them on the relevant post.

The active maintainers involved with this project include:

Installation

The humiolib library has been published on PyPI, so you can use pip to install it:

pip install humiolib

Usage

The examples below seek to get you going with humiolib. For further documentation, have a look at the code itself.

HumioClient

The HumioClient class is used for general interaction with Humio. It is mainly used for performing queries, as well as managing different aspects of your Humio instance.

from humiolib.HumioClient import HumioClient

# Creating the client
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")

# Using a streaming query 
webStream = client.streaming_query("Login Attempt Failed", is_live=True)
for event in webStream:
    print(event)

# Using a queryjob 
queryjob = client.create_queryjob("Login Attempt Failed", is_live=True)
poll_result = queryjob.poll()
for event in poll_result.events:
    print(event)

# With a static queryjob you can poll it iteratively until it has been exhausted
queryjob = client.create_queryjob("Login Attempt Failed", is_live=False)
for poll_result in queryjob.poll_until_done():
    print(poll_result.metadata)
    for event in poll_result.events:
        print(event)

HumioIngestClient

The HumioIngestClient class is used for ingesting data into Humio. While the HumioClient can also be used for ingesting data, this is mainly meant for debugging.

from humiolib.HumioClient import HumioIngestClient

# Creating the client
client = HumioIngestClient(
    base_url="https://cloud.humio.com",
    ingest_token="*****")

# Ingesting Unstructured Data
messages = [
      "192.168.1.21 - user1 [02/Nov/2017:13:48:26 +0000] \"POST /humio/api/v1/ingest/elastic-bulk HTTP/1.1\" 200 0 \"-\" \"useragent\" 0.015 664 0.015",
      "192.168.1..21 - user2 [02/Nov/2017:13:49:09 +0000] \"POST /humio/api/v1/ingest/elastic-bulk HTTP/1.1\" 200 0 \"-\" \"useragent\" 0.013 565 0.013"
   ]

client.ingest_messages(messages) 

# Ingesting Structured Data
structured_data = [
      {
         "tags": {"host": "server1" },
         "events": [
               {
                  "timestamp": "2020-03-23T00:00:00+00:00",
                  "attributes": {"key1": "value1", "key2": "value2"}      
               }
         ]
      }
   ]

client.ingest_json_data(structured_data)

python-humio's People

Contributors

alexanderbrandborg, andejens, fogh, hmpf, molatif-dev, pmech, samgdf, swefraser, xorsnn

python-humio's Issues

Bug: QueryJob can't be used to fetch all filter query results.

The code checks hasMoreEvents, but simply queries the same queryJob, so the same events are returned.

This needs to be implemented so that users can loop over filter query results from a QueryJob, and perhaps even adjust the number of events the QueryJob returns at each poll (see the sketch below).
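
A rough sketch of the kind of usage this would enable; events_per_poll is a hypothetical parameter used purely for illustration, not part of the current API:

# Hypothetical usage once segmented polling is supported.
# "events_per_poll" is an illustrative parameter, not part of the current API.
queryjob = client.create_queryjob("Login Attempt Failed", is_live=False, events_per_poll=200)
for poll_result in queryjob.poll_until_done():
    for event in poll_result.events:
        print(event)  # each poll should yield the next segment of results, not repeats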

Examples use dataspace vs repo

The examples on the front page use the "dataspace" parameter instead of the "repo" parameter, which seems to be the one in use (compare the constructor call below).
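
For comparison, the README example earlier in this document constructs the client with the repository parameter:

from humiolib.HumioClient import HumioClient

# Matches the README example above; the parameter in use is "repository", not "dataspace"
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")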

Documentation Issues

Documentation Error & Missing Parameters
The create_queryJob documentation has some errors.

  • is_live is used twice. The second time should be timezone_offset
  • arguments is plural in the description but singular in the list.
  • start and end parameters do not specify units.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://python-humio.readthedocs.io/en/latest/reference/humioclient.html
  2. Scroll down to 'create_QueryJob'
  3. See errors

Humiolib Python Upload Bug

Basic Information
Customer: Christopher Balles , Crowdstrike
Salesforce Case #: https://crowdstrike.lightning.force.com/lightning/r/Case/5006T0000206vvoQAA/view
Cloud or On-Prem? both

What problem(s) has/have been observed?
Uploaded a file to an internal/cloud Humio server and ran into an error (see Outcome below).

Steps to reproduce
Wrote a small Python script to reproduce this bug.
Reference: https://python-humio.readthedocs.io/en/latest/reference/humioclient.html
Install humiolib with:
pip install humiolib
from humiolib.HumioClient import HumioClient
from humiolib.HumioExceptions import HumioException

client = HumioClient(
    base_url="https://cloud.us.humio.com/",
    repository="MJ_Json",
    user_token="API_Token")
client.upload_file('./test.csv')
Outcome: the file uploads successfully; however, JSON errors are returned.

Customer Impact
Is it a blocker for the customer? Not a blocker
Why is it important? The customer is developing a POC for one of their customers.
What are their expectations for resolution/getting an update? They are curious to know whether this can be fixed/supported.
Reference : https://python-humio.readthedocs.io/_/downloads/en/latest/pdf/

Unable to Query Between Dates

Describe the bug
When attempting to query between dates, the results are either empty lists or the default results of a query running from now.

To Reproduce
Sample Code:


from datetime import datetime, timedelta
from humiolib.HumioClient import HumioClient

# Client setup was omitted in the original report; assumed here with placeholder values
client = HumioClient(
    base_url="https://cloud.humio.com",
    repository="sandbox",
    user_token="*****")

# start and end are datetime objects, e.g. the last seven days
start = datetime.now() - timedelta(days=7)
end = datetime.now()

# convert the datetime objects to epoch time
start_epoch = int(start.timestamp())
end_epoch = int(end.timestamp())

query = """
        eventsize()
        | groupBy([field1,field2,field3], function=[count(), sum(_eventSize)])
        | eval(gigabytes=_sum/1073741824)
        | sort(_count,order=desc)
        | start({start})
        | end({end})
"""

# Replace the start and end placeholders in the query with the epoch times.
query = query.replace("{start}", str(start_epoch)).replace("{end}", str(end_epoch))

# Create and execute the query job, keeping the events from each poll.
queryjob = client.create_queryjob(query, is_live=False)
for poll_result in queryjob.poll_until_done():
    query_results = poll_result.events

print(query_results)

This returns the results of a default query, ignoring the start and end.

Removing the start and end from the query string and instead passing them into create_queryjob:
queryjob = client.create_queryjob(query,start=start_epoch,end=end_epoch,is_live=False)

And it returns no results.

Expected behavior
Results for a query between dates are returned.

  • Python 3.9.14
  • HumioLib 0.2.5

Poll in static query job doesn't propagate kwargs

Describe the bug
Currently, the poll() method in StaticQueryJob does not propagate **kwargs, meaning that if you set a timeout in the function call, it never gets set in the webcaller.

To Reproduce

query_job = client.create_queryjob(query)
query_job.poll(timeout=1) # can hang indefinitely 

Expected behavior
The above snippet should raise an exception once the timeout is exceeded, instead of hanging (see the sketch below).
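
A sketch of the expected behavior; the exact exception type depends on how humiolib wraps HTTP errors, so a broad except is used here purely for illustration:

# Sketch of the expected behavior once **kwargs are propagated:
# the timeout should reach the HTTP layer and raise instead of hanging.
query_job = client.create_queryjob(query)
try:
    query_job.poll(timeout=1)
except Exception as exc:  # exact exception type depends on humiolib's error wrapping
    print(f"Poll raised as expected: {exc}")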

HumioLib does not sanitize URLs in some cases

Describe the bug
HumioLib doesn't sanitize URLs by removing the trailing slash from them in some cases. This causes weird errors such as the one below:
HumioHTTPException: ('HTTP method not allowed, supported methods: GET', 405)

To Reproduce
Steps to reproduce the behavior:

  1. Initialize the client with a URL to your cluster that ends with a '/', for example https://myhumiocluster.com/
  2. Attempt to run a query with this client.

Expected behavior
I would expect a clearer error indicating that the URL is malformed. Currently it gives the above error, which does not accurately describe the problem. A user-side workaround is sketched below.
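
Until the client sanitizes URLs itself, a simple workaround (a sketch; the repository name is illustrative) is to strip the trailing slash before constructing the client:

from humiolib.HumioClient import HumioClient

base_url = "https://myhumiocluster.com/"  # e.g. read from configuration

# Workaround sketch: strip any trailing slash before handing the URL to the client
client = HumioClient(
    base_url=base_url.rstrip("/"),
    repository="sandbox",  # illustrative repository name
    user_token="*****")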

Desktop (please complete the following information):

  • MacOS

Polling Logic seems to look for all data

Describe the bug
According to the Humio docs, polling is a single request that returns partial results from a query job.

The current behavior of StaticQueryJob's poll() is to continually poll until it receives the 'done' message, which is only sent when the job has completed. This causes issues, since a request can take much longer than the timeout passed to the function.

That being said, the endpoint the client uses here is not what the API docs suggest (/dataspaces/{REPO} vs. /repositories/{REPO}), which makes me think we might be using a legacy API here?

Expected behavior
poll() should only make a single request, as sketched below; the current looping method should arguably be renamed to something like poll_until_completed.
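
A sketch of the expected split in semantics, using the queryjob methods already shown in the README above:

# Expected semantics (sketch):
# poll() performs a single request and returns one partial segment of results,
segment = queryjob.poll()
print(segment.metadata)

# while poll_until_done() keeps polling until the job reports that it is done.
for segment in queryjob.poll_until_done():
    for event in segment.events:
        print(event)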
