json-streaming-logs's Introduction

JSON Streaming Logs

This package makes Zeek write out logs in a way that makes life easier for external log shippers such as filebeats, logstash, and splunk_forwarder.

The data is structured as JSON with "extension" fields that indicate the time the log line was written (_write_ts) and the log type, such as http or conn, in a field named _path. Files are rotated in the current log directory without being compressed so that any log shipper has time to catch up before the log file is deleted.
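
For example, a line in json_streaming_conn.log looks roughly like the following (the connection fields and values here are illustrative, not real output):

{
  "_path": "conn",
  "_system_name": "sensor01",
  "_write_ts": "2022-10-23T14:40:00.006577Z",
  "ts": "2022-10-23T14:39:58.291690Z",
  "uid": "CHhAvVGS1DHFjwGM9",
  "id.orig_h": "10.0.0.5",
  "id.resp_h": "93.184.216.34",
  "proto": "tcp"
}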

Logs are named in such a way that a glob in your log shipper configuration should be able to easily match all of the logs. Each log will have a prefix of json_streaming_ so that the http log would have the full name of json_streaming_http.log.
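
For instance, a minimal Filebeat input matching every JSON streaming log might look like this (a sketch; the log directory path is an assumption about your deployment):

filebeat.inputs:
  - type: log
    paths:
      - /usr/local/zeek/logs/current/json_streaming_*.log
    json.keys_under_root: true
    json.add_error_key: true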

Loading this script also doesn't impact any other existing logs that Zeek outputs. If you would like to disable Zeek's other log output, you can set the JSONStreaming::disable_default_logs variable to T to disable all of the default logs. The only potential issue is that your logs become completely ephemeral with this change, because no logs will be rotated into local storage.
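
A minimal local.zeek sketch, assuming the package is loaded by name after installing it (e.g. with zkg):

@load json-streaming-logs
redef JSONStreaming::disable_default_logs = T;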

Contact

Please reach out to [email protected] if you have issues with this script or if you have thoughts on ways this could better fit into your environment.

json-streaming-logs's Issues

Unable to rename files when running with docker

Running the default Zeek image with the json-streaming-logs package results in the error below being logged. I do not know if this is a Docker issue, a Zeek issue, or something that can be fixed in json-streaming-logs. Please close this out if it's not something that should be posted here.

Error:

{
  "_path": "reporter",
  "_system_name": "ubuntu",
  "_write_ts": "2022-10-23T14:40:00.006577Z",
  "level": "Reporter::ERROR",
  "location": "/usr/local/zeek/share/zeek/site/json-streaming-logs/./main.zeek, line 73",
  "message": "cannot rename file '/opt/logs//json_streaming_files-22-10-23_14.39.20.log' to 'json_streaming_files.1.log': Invalid cross-device link (rename(JSONStreaming::info$fname, JSONStreaming::info$path + .1.log))",
  "ts": "2022-10-23T14:40:00.006577Z"
}

Image Name: zeekurity/zeek:latest
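
For context, here is a minimal Zeek sketch of the failure mode in the reporter message above (the paths are hypothetical). Zeek's rename() reports the underlying rename(2) error, and rename(2) fails with "Invalid cross-device link" (EXDEV) when the source and destination are on different filesystems, as can happen when one of them is a Docker volume mount.

event zeek_init()
	{
	# Moving a file between two different mounts cannot be done with a
	# plain rename(); it fails with EXDEV ("Invalid cross-device link").
	if ( ! rename("/opt/logs/json_streaming_files.log", "json_streaming_files.1.log") )
		print "rename failed: source and destination are likely on different filesystems";
	}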

Create ephemeral file with datestamp instead of count for rollover

I'm hitting a problem where the process watching the logs is not smart enough to recognize that the file was renamed, and instead re-reads the entire file when it changes from .log to .1.log.

It might be better to initially create the file as json_streaming_conn_{ts}.log and, on rollover, create a new file with the new timestamp. After N files, it would start deleting the oldest timestamps (in order).

This way the newest file always carries the most recent timestamp, and the files themselves are never actually renamed; they simply stop being written to.

I hope that makes sense. I'll try to hack it in and if it works properly will submit a pull request.
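
A minimal Zeek sketch of the bookkeeping such a scheme needs (the names note_rotated, rotated_files, and max_files are hypothetical, not part of the package): remember rotated file names in order and unlink the oldest once more than N have accumulated.

# Hypothetical helper: call this with each file name as it is rotated out.
global rotated_files: vector of string = vector();
const max_files = 4 &redef;

function note_rotated(fname: string)
	{
	# Append the newly rotated file name.
	rotated_files[|rotated_files|] = fname;

	if ( |rotated_files| <= max_files )
		return;

	# Delete the oldest file, then drop it from the front of the list.
	unlink(rotated_files[0]);

	local keep: vector of string = vector();
	for ( i in rotated_files )
		{
		if ( i > 0 )
			keep[|keep|] = rotated_files[i];
		}
	rotated_files = keep;
	}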

Log Warning: use of out-of-scope local JSONStreaming::filt deprecated

warning in /usr/local/zeek/spool/installed-scripts-do-not-touch/site/packages/./json-streaming-logs/./main.zeek, line 135: use of out-of-scope local JSONStreaming::filt deprecated; move declaration to outer scope

		{
		Log::add_filter(stream, filt);
		}

I'm not sure what this issue or error means, but when running Zeek with this package I get a single-line entry in stderr.log saying this is deprecated.

It looks like the issue is related to how filt is defined here, but I can't really see anything wrong with it.

for ( [stream, filt] in new_filters )
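
For reference, the usual fix for this class of warning is to declare the local in the enclosing scope before any nested block uses it. A simplified sketch of that pattern (not the package's actual code; the "default" filter lookup is only illustrative):

event zeek_init() &priority=-5
	{
	# Declared at the outer scope so the second loop below does not
	# reference locals that were introduced inside an inner block.
	local stream: Log::ID;
	local filt: Log::Filter;
	local new_filters: set[Log::ID, Log::Filter] = set();

	for ( stream in Log::active_streams )
		{
		filt = Log::get_filter(stream, "default");
		add new_filters[stream, filt];
		}

	for ( [stream, filt] in new_filters )
		{
		Log::add_filter(stream, filt);
		}
	}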

File match glob duplicates events on rotation

From the README:

Logs are named in such a way that a glob in your log shipper configuration should be able to easily match all of the logs. Each log will have a prefix of json_streaming_ so that the http log would have the full name of json_streaming_http.log

In some configurations, the typical wildcard match on the json_streaming_ logs results in duplicate event collection by the shipper. For a given event, the log line is read once in the current output file, and then once more for each rotated log file. It seems that by default rotate_logs will create at least one rotated file (.1.log), even if JSONStreaming::extra_files is set to 0.

This shows a unique event that only occurred once but is picked up repeatedly at each rotation interval by the shipper (in this case Fluentd/td-agent):

[screenshot: the same event appearing repeatedly in the shipper at each rotation interval]

The Fluentd configuration, with the path option wildcard matching the log files:

<source>
    @type tail
    @id zeek_json
    @label @zeek
    tag zeek.*
    path /nsm/zeek/logs/current/json_streaming_*.log
    pos_file /var/log/td-agent/tmp/zeek_json.pos
    <parse>
        @type json
    </parse>
</source>

What is the best course of action here? Is this a case where json-streaming-logs should support giving the output file a different file name that can be uniquely matched by the given glob (and not the extra files), or is this a situation that a log shipper should be able to handle better? Fluentd documentation does state:

You should not use * with log rotation because it may cause the log duplication. In this case, you should separate in_tail plugin configuration.

...but at the same time, Zeek outputs several logs and it's not clear up front which log files may appear at some point, so a glob approach is probably the most correct. If the plugin could be guaranteed to output a single file that wouldn't be picked up again at rotation, that seems the most predictable.
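
One hedged workaround on the shipper side, assuming Fluentd's in_tail exclude_path parameter and the same paths as the configuration above, is to exclude the rotated .N.log copies from the wildcard:

<source>
    @type tail
    @id zeek_json
    @label @zeek
    tag zeek.*
    path /nsm/zeek/logs/current/json_streaming_*.log
    exclude_path ["/nsm/zeek/logs/current/json_streaming_*.*.log"]
    pos_file /var/log/td-agent/tmp/zeek_json.pos
    <parse>
        @type json
    </parse>
</source>

The exclude pattern only matches names with an extra dot-separated component (json_streaming_http.1.log and so on), so the live json_streaming_*.log files are still tailed.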

network_time should be current_time

$write_ts = network_time());

When running this script on a PCAP, the value for _write_ts will be the timestamp of the packet in the PCAP rather than the current time at which the log line is being written. According to the docs, network_time returns the time of the last processed packet. Because the time a log line is written is unrelated to the time of the traffic, current_time should be used instead.
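
A minimal sketch of the proposed change, following the extension-field pattern from the Zeek logging framework documentation rather than the package's actual definitions (the record and function names are illustrative):

type StreamingExt: record {
	## Intended to surface as the _write_ts field described in the README.
	write_ts: time &log;
};

function streaming_ext(path: string): StreamingExt
	{
	# current_time() is the wall clock at the moment the line is written;
	# network_time() is the timestamp of the last processed packet, which
	# is pinned to the capture when reading a PCAP.
	return StreamingExt($write_ts = current_time());
	}

redef Log::default_ext_func = streaming_ext;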
