loglang's Introduction

Hi there, I'm Nic! 👋


loglang's Issues

Input: Lumberjack

Lumberjack (elastic/go-lumber) is a protocol developed by Elastic.

It is the preferred protocol for moving events between Logstash instances.

Apparently, Elastic Beats also use Lumberjack v1 or v2.

Reference:

Spec (v1; 2016) https://github.com/elastic/logstash-forwarder/blob/master/PROTOCOL.md

Protocol Characteristics

  • Binary protocol (ASCII frame-type codes with big-endian binary length fields)
  • Transport: TCP/5044
  • Framing: binary header for each field (must decode all fields to get a frame)
  • Codec: syslog
  • Confidentiality: TLS
  • Integrity: supports bulk acknowledgements (unlike RELP), negotiated window size
  • Compression: per-frame, zlib
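As a rough illustration of the framing, here is a minimal Go sketch of reading one v1 frame header per the PROTOCOL.md spec linked above (readWindowSize is an invented name; data, compressed, and ack frames are omitted):

package lumberjack

import (
	"encoding/binary"
	"fmt"
	"io"
)

// Every v1 frame starts with a version byte ('1') and a type byte:
// 'W' = window size, 'D' = data, 'C' = compressed, 'A' = ack.
func readWindowSize(r io.Reader) (uint32, error) {
	var hdr [2]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return 0, err
	}
	if hdr[0] != '1' {
		return 0, fmt.Errorf("unsupported protocol version %q", hdr[0])
	}
	if hdr[1] != 'W' {
		return 0, fmt.Errorf("expected window-size frame, got %q", hdr[1])
	}
	var size uint32
	if err := binary.Read(r, binary.BigEndian, &size); err != nil {
		return 0, err
	}
	return size, nil // sender expects an ack after this many data frames
}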

Get a performance baseline

What kind of I/O?

  • unix socket
  • TCP socket
  • UDP socket
  • stdin/stdout
  • HTTP (use k6)

Performance measurements:

  • time to first byte at destination
  • time to acknowledgement
  • time until all events have been processed
  • inter-frame latency (time between frames from server)
  • latency from send to acknowledgement

Capacity:

  • what load will start causing substantial delay?
  • what load will crash the system?

Test with E2E acknowledgement both enabled and disabled.

It may be possible to expose some of these performance numbers through the /metrics endpoint.
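As a starting point, latency from send to acknowledgement can be measured with a few lines of Go (the address and the newline-delimited ack format are assumptions):

package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	// Assumed: a server on localhost:5044 that acks each event with one line.
	conn, err := net.Dial("tcp", "localhost:5044")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	start := time.Now()
	if _, err := conn.Write([]byte("test event\n")); err != nil {
		log.Fatal(err)
	}
	if _, err := bufio.NewReader(conn).ReadString('\n'); err != nil {
		log.Fatal(err)
	}
	fmt.Println("send-to-ack latency:", time.Since(start))
}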

Output batches

How should loglang decide how many events to write in a single batch to a file, to an S3 object, to Elasticsearch, in a single HTTP POST request, and so on?
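One common approach, sketched below, is to flush on whichever limit is hit first: event count, byte size, or age of the oldest buffered event. The BatchPolicy type is hypothetical:

package output

import "time"

// BatchPolicy flushes when any one limit is reached, whichever comes first.
type BatchPolicy struct {
	MaxEvents int           // e.g. 500 events per Elasticsearch bulk request
	MaxBytes  int           // e.g. 5 MB per S3 object
	MaxDelay  time.Duration // e.g. flush at least every 5 seconds
}

func (p BatchPolicy) ShouldFlush(events, bytes int, oldest time.Time) bool {
	return events >= p.MaxEvents ||
		bytes >= p.MaxBytes ||
		time.Since(oldest) >= p.MaxDelay
}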

Output: http_request

  • sends an http request
  • should support batching strategy
  • customizable user agent
  • customizable http method (but default to POST)
  • options to enable sending compressed data
  • ideally should support HTTP_PROXY environment variable
  • Authorization Bearer tokens
  • other custom headers (possibly using a callback?)
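A minimal Go sketch covering most of the list above (function and parameter names are invented; the default http.Transport already honors HTTP_PROXY via http.ProxyFromEnvironment):

package output

import (
	"bytes"
	"compress/gzip"
	"net/http"
)

func sendBatch(url, method, userAgent, token string, body []byte, compress bool) error {
	buf := &bytes.Buffer{}
	if compress {
		zw := gzip.NewWriter(buf) // option to send compressed data
		if _, err := zw.Write(body); err != nil {
			return err
		}
		if err := zw.Close(); err != nil {
			return err
		}
	} else {
		buf.Write(body)
	}

	req, err := http.NewRequest(method, url, buf) // method defaults to POST upstream
	if err != nil {
		return err
	}
	req.Header.Set("User-Agent", userAgent) // customizable user agent
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	if compress {
		req.Header.Set("Content-Encoding", "gzip")
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}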

See Also

AWS inputs/outputs

It would be great to be interoperable with these AWS systems:

  • S3 (either paired with SQS, or not)
  • SQS
  • Kinesis
  • MSK (Kafka)
  • CloudWatch Alarms
  • CloudWatch Logs
  • DynamoDB
  • DynamoDB streams
  • AmazonMQ
  • ElastiCache (the Redis plugin should be good enough?)
  • SNS
  • SES
  • Lambda
  • EventBridge
  • Step Functions

But this probably deserves a separate repo entirely.

Filter: lookback

Compare the current event with a previous matching event

Identify matching event by match on a field. If matching on multiple fields is desired, use the fingerprint filter first.

Copy previous event into @previous

Could this be used to calculate a "high water mark" and send alerts when it is exceeded?

โš ๏ธ This may require substantial storage.

Input: http_request

  • should be able to run continuously (http long polling)
  • should be able to run on a schedule or on-demand
  • customize the HTTP method (e.g. GET, POST)
  • customizable user-agent?
  • full support for variable codecs and chained framing
  • send Accept header to enable compression
  • optionally use Content-Type hinting to automatically select framing and codec
  • make proper use of Etag to respect internet caches
  • ideally should support HTTP_PROXY environment variable
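For the Etag point, a sketch of conditional polling with If-None-Match (the Poller type is hypothetical):

package input

import "net/http"

// Poller remembers the ETag from the last response and sends it back as
// If-None-Match; a 304 means nothing changed and no events are emitted.
type Poller struct {
	URL  string
	etag string
}

func (p *Poller) Fetch() (*http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, p.URL, nil)
	if err != nil {
		return nil, err
	}
	if p.etag != "" {
		req.Header.Set("If-None-Match", p.etag)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode == http.StatusNotModified {
		resp.Body.Close()
		return nil, nil // cached: no new content
	}
	p.etag = resp.Header.Get("ETag")
	return resp, nil
}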

See Also

Use Cases

GitHub API: https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api?apiVersion=2022-11-28

Output: tcp_listener

It would be really cool to "tune in" and get an immediate live firehose of all events flowing out of the pipeline.

Doing this with a WebSocket would also be cool.

Outputs should also have a filter chain

There's the primary filter chain, and a filter chain for each input.

Each output should also have a filter chain so it's possible to customize for that output.

  • Output to Elasticsearch could use filter rules to ensure field types are consistent
  • Output to Slack could use filter rules to populate fields used by the Slack output plugin
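One possible shape for this, sketched with hypothetical types:

package loglang

type Event map[string]any

type Filter func(Event) Event

// Output runs its own filter chain after the primary chain has run.
type Output struct {
	Send    func(Event) error
	Filters []Filter // e.g. coerce field types for Elasticsearch
}

func (o Output) Write(e Event) error {
	for _, f := range o.Filters {
		e = f(e)
	}
	return o.Send(e)
}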

Output: websocket

It would be really cool to support WebSockets so that a browser can "tune in" to a realtime firehose of events. The browser should be able to provide a filter that is executed on the server.

Output: http_listen

Listen for HTTP requests, and reply with recent events.

Two modes:

  • buffer always replies with recent events. This is not reliable delivery, but it can be useful for peeking at recent events. The size of the buffer (number of events, number of bytes) is configured as part of the output (see the ring-buffer sketch below).
  • cursor allows the client to fetch new events sent after the cursor position. Server replies with an updated cursor position. Due to limited buffer space, events may be dropped. This imperfection is acceptable.

Should probably allow the client to provide a filter. If filtering, then more modes are useful:

  • buffer-per-client (so that rare events don't get overwhelmed by common ones)
  • shared-buffer (when we don't trust the client)
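For buffer mode, a bounded ring that always holds the most recent events could look like this (a sketch; the types are hypothetical):

package output

import "sync"

type Event map[string]any

// Ring holds the most recent events, overwriting the oldest when full.
type Ring struct {
	mu     sync.Mutex
	events []Event
	next   int
	full   bool
}

func NewRing(size int) *Ring {
	return &Ring{events: make([]Event, size)}
}

func (r *Ring) Add(e Event) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events[r.next] = e
	r.next = (r.next + 1) % len(r.events)
	if r.next == 0 {
		r.full = true
	}
}

// Recent returns buffered events, oldest first.
func (r *Ring) Recent() []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]Event(nil), r.events[:r.next]...)
	}
	return append(append([]Event(nil), r.events[r.next:]...), r.events[:r.next]...)
}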

Use Case

This can be useful for constructing a simple web interface that shows a not-quite-realtime view of logs.

If paired with a pipeline that does no filtering, this could also be useful as a way to peek at the recent past and see what the raw events looked like.

Scheduled inputs

Some inputs don't run continuously; they run on a schedule or on demand or once at startup.

For example, an input that periodically polls an HTTP API. Probably want to use a channel to trigger the input. Then the channel could be fed by a recurring cron schedule.

Note: it is impossible to run a task every 14 days using cron, so other types of recurrence schedules should be supported.

Or even more interesting, loglang could provide an API that allows on-demand triggering of scheduled inputs. For example, a pipeline that reads from a dead letter queue would only be triggered on-demand.

Heartbeat should use this approach too.
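A sketch of the trigger-channel idea: the input blocks on a channel that can be fed by a ticker, a cron library, or an on-demand API call. A time.Ticker covers recurrences cron cannot express, such as every 14 days (function names are invented):

package input

import "time"

// StartSchedule feeds the trigger channel on a fixed interval.
func StartSchedule(every time.Duration, trigger chan<- struct{}) *time.Ticker {
	t := time.NewTicker(every) // e.g. 14 * 24 * time.Hour
	go func() {
		for range t.C {
			trigger <- struct{}{}
		}
	}()
	return t
}

// RunInput polls once per trigger; an on-demand API handler can send on
// the same channel to trigger the input outside the schedule.
func RunInput(trigger <-chan struct{}, poll func()) {
	for range trigger {
		poll()
	}
}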

Input: rss

Why not turn this into an RSS reader?

Framing: dsv

DSV (Delimiter-Separated Values) is mostly known as CSV and TSV for commas and tabs respectively.

Rows of tabular data can be interpreted as events by combining the header row with each value row. But because the header is stored outside the values, this complicates the framing pattern.
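A sketch using Go's encoding/csv, which already supports arbitrary single-rune delimiters (ReadDSV is an invented name):

package framing

import (
	"encoding/csv"
	"io"
)

// ReadDSV reads the header row once, then zips it with each value row
// to produce one event per row. Use ',' for CSV or '\t' for TSV.
func ReadDSV(r io.Reader, delim rune) ([]map[string]string, error) {
	cr := csv.NewReader(r)
	cr.Comma = delim
	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	var events []map[string]string
	for {
		row, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		e := make(map[string]string, len(header))
		for i, col := range header {
			if i < len(row) {
				e[col] = row[i]
			}
		}
		events = append(events, e)
	}
	return events, nil
}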

decoder for Prometheus text-based metric format

The Prometheus metrics format looks like this:

http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

Reference: https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#exposition-formats

In combination with an HTTP fetch input type, this could be used to generate events from Prometheus-capable endpoints.
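A rough sketch of decoding one sample line; label sets, escaping, timestamps, and comments from the full exposition format are omitted:

package codec

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample decodes a line like "http_request_duration_seconds_sum 53423".
// The name may still carry a {label="..."} set, left unparsed here; label
// values containing spaces would break this naive split.
func parseSample(line string) (name string, value float64, err error) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", 0, fmt.Errorf("not a sample line")
	}
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", 0, fmt.Errorf("malformed sample: %q", line)
	}
	value, err = strconv.ParseFloat(fields[1], 64) // also accepts "+Inf"
	return fields[0], value, err
}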

Input: stdin

It should be possible to read events from the process's standard input.

loglang should exit with status 0 when standard input is closed.

No E2E acknowledgement is needed.

Remember to populate ECS schema fields like hostname.
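A minimal sketch of that behaviour (emitting events into the pipeline is stubbed out with a print):

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	hostname, _ := os.Hostname() // ECS host.name for each event
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		// In loglang this would go through the pipeline; printing stands in here.
		fmt.Printf("host=%s message=%q\n", hostname, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Exit(0) // stdin closed: exit cleanly; no E2E ack needed
}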

Output: tcp_stream

  • write events to a tcp stream
  • connect to arbitrary {ip, port}
  • should support looking up hostname
    • if hostname lookup fails (after a few tries) end the output
  • should use SetNoDelay, but also be smart about internal buffering strategy
  • option to bin-pack so that each TCP packet contains whole events
  • if remote peer closes the stream, reconnect (up to a limit/timeout)
    • make sure to look up the hostname again when reconnecting, in case DNS has changed
  • ignore anything sent back to us over TCP (maybe call CloseRead?)
  • how to handle keepalive?
  • should the stream be torn down when there are no events?

This should be easy to implement.
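A sketch of the dial step covering the SetNoDelay, CloseRead, and re-resolve points above (retry limits and buffering are omitted):

package output

import (
	"net"
	"time"
)

// dial re-resolves the hostname on every attempt (DNS may have changed),
// enables TCP_NODELAY, and half-closes the read side since replies are ignored.
func dial(host, port string) (*net.TCPConn, error) {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), 10*time.Second)
	if err != nil {
		return nil, err // caller retries a few times, then ends the output
	}
	tcp := conn.(*net.TCPConn)
	tcp.SetNoDelay(true) // but batch writes so each packet holds whole events
	tcp.CloseRead()      // ignore anything the peer sends back
	return tcp, nil
}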

Output: udp

Try to respect a batching strategy, while respecting that the maximum UDP datagram over IPv4 is 65,515 bytes, of which 65,507 bytes are payload after the 8-byte UDP header.

This should be easy to implement.
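A sketch of bin-packing whole events into datagrams under the payload limit (single events larger than the limit would need splitting or dropping, omitted here):

package output

import "net"

const maxPayload = 65507 // IPv4 payload limit noted above

// sendBatch packs whole events into each datagram, flushing before the
// next event would overflow.
func sendBatch(conn *net.UDPConn, events [][]byte) error {
	var buf []byte
	for _, e := range events {
		if len(buf) > 0 && len(buf)+len(e) > maxPayload {
			if _, err := conn.Write(buf); err != nil {
				return err
			}
			buf = buf[:0]
		}
		buf = append(buf, e...)
	}
	if len(buf) == 0 {
		return nil
	}
	_, err := conn.Write(buf)
	return err
}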

Input: exec

  • exec() a local process
  • read both stdout and stderr
  • populate ECS Schema process fields, especially process exit code
  • should provide default environment that is minimal but indicates invocation from loglang (keep PATH?)
  • should support arguments to the process
  • should support shell out too?
  • no need to send anything to stdin of launched process
  • should support scheduling to re-run periodically
  • if not scheduled, loglang should exit with status 0 but only if exec is the only input
  • what to use for default working directory? a tmp dir that gets cleaned up by loglang?
  • is there any reason to support parallelism?
  • provide NO_COLOR in the environment by default

Use cases:

  • scraping of various kinds
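A sketch of a single run, covering the minimal environment, separate stdout/stderr capture, and the exit code for the ECS process fields (LOGLANG=1 is a hypothetical invocation marker):

package input

import (
	"bytes"
	"os"
	"os/exec"
)

// runOnce executes the process with a minimal environment and captures
// stdout and stderr separately, plus the exit code.
func runOnce(name string, args ...string) (stdout, stderr []byte, exitCode int, err error) {
	cmd := exec.Command(name, args...)
	cmd.Env = []string{
		"PATH=" + os.Getenv("PATH"), // keep PATH
		"LOGLANG=1",                 // hypothetical marker: invoked from loglang
		"NO_COLOR=1",                // ask tools to skip ANSI color codes
	}
	var out, errOut bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errOut
	err = cmd.Run()
	exitCode = -1
	if cmd.ProcessState != nil {
		exitCode = cmd.ProcessState.ExitCode()
	}
	return out.Bytes(), errOut.Bytes(), exitCode, err
}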

@metadata

Maybe events should have a separate store of metadata that doesn't get sent by outputs.

Logstash uses the @metadata field, but a separate field in the Event struct would work just as well.
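A hypothetical layout:

package loglang

// Fields is what outputs serialize; Meta travels with the event through
// the pipeline but is never written by any output.
type Event struct {
	Fields map[string]any
	Meta   map[string]any
}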

Input: git

Git is super interesting! New branches, new commits, new tags can all be interpreted as events. The reflog (reference log) will probably be important here.

This could be very interesting:

  • Git -> filters -> Slack

Output: File

  • output to a file, pipe, or socket
  • should be usable for output to systemd log device
  • support output batches for writing to files with different names
  • how to handle naming pattern? for example, organizing by /year/month/day/hour? or log.1, log.2, log.3, ...
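For the time-based idea, a one-function sketch using Go's reference-time layout (pathFor and the file name are invented):

package output

import (
	"path/filepath"
	"time"
)

// pathFor organizes output by /year/month/day/hour.
func pathFor(base string, t time.Time) string {
	return filepath.Join(base, t.Format("2006/01/02/15"), "events.log")
}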

Input: relp

RELP (the Reliable Event Logging Protocol) was proposed by Rainer Gerhards, the lead developer of rsyslog, in 2008.

Compared to plain syslog, RELP allows the receiver to send acknowledgements confirming that each message was received.

Reference

Specification: https://github.com/rsyslog/librelp/blob/master/doc/relp.html
Mailing List: https://lists.adiscon.net/mailman/listinfo/relp (requires membership)
Implementation: https://github.com/rsyslog/librelp

Protocol Characteristics

  • Text-based protocol
  • Transport: always TCP
  • Framing: content-length header (plus a bit extra to support acks)
  • Codec: syslog

Pipelining is a key feature (the client can send multiple requests without waiting for the first response). Responses must be sent by the server in exactly the same order as the commands were received.

Version 1.1 adds support for TLS using STARTTLS.
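Per my reading of the spec above, each frame is TXNR SP COMMAND SP DATALEN, optionally followed by SP DATA, then an LF trailer. A parsing sketch over an already-captured frame (a streaming reader would parse the header first to learn how many data bytes to consume):

package input

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseFrame parses "TXNR SP COMMAND SP DATALEN [SP DATA]" with the LF
// trailer still attached when there is no data.
func parseFrame(frame []byte) (txnr int, command string, data []byte, err error) {
	parts := bytes.SplitN(frame, []byte(" "), 4)
	if len(parts) < 3 {
		return 0, "", nil, fmt.Errorf("short frame")
	}
	if txnr, err = strconv.Atoi(string(parts[0])); err != nil {
		return
	}
	command = string(parts[1]) // e.g. "open", "syslog", "rsp", "close"
	dlen, err := strconv.Atoi(string(bytes.TrimRight(parts[2], "\n")))
	if err != nil {
		return
	}
	if dlen > 0 && len(parts) == 4 && len(parts[3]) >= dlen {
		data = parts[3][:dlen]
	}
	return
}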

Output: redis

Redis is very cool and it would be great to support it.

But not as part of the core suite; we should use an existing Redis module for this.

Modes

Input: stomp

STOMP (Streaming Text Orientated Messaging Protocol) provides an interoperable wire format so that STOMP clients can communicate with any STOMP message broker to provide easy and widespread messaging interoperability among many languages, platforms and brokers.

Reference

https://stomp.github.io

Protocol Characteristics

  • Transport: TCP
  • Framing: sometimes null delimiter, sometimes content-length
  • Codec: very custom(?)
  • Integrity: acknowledgement with receipt frames, transaction commit and rollback
  • Authentication: username/password
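A sketch of the framing: a command line, header lines, a blank line, then a body terminated by a NUL byte (the content-length header, when present, overrides NUL-scanning and is omitted here):

package codec

import (
	"fmt"
	"strings"
)

// parseFrame splits "COMMAND\nheader:value\n...\n\nbody\x00".
func parseFrame(raw string) (command string, headers map[string]string, body string, err error) {
	raw = strings.TrimSuffix(raw, "\x00")
	head, body, found := strings.Cut(raw, "\n\n")
	if !found {
		return "", nil, "", fmt.Errorf("no blank line between headers and body")
	}
	lines := strings.Split(head, "\n")
	command = lines[0] // e.g. SEND, SUBSCRIBE, MESSAGE, RECEIPT
	headers = make(map[string]string)
	for _, h := range lines[1:] {
		if k, v, ok := strings.Cut(h, ":"); ok {
			headers[k] = v
		}
	}
	return command, headers, body, nil
}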

Providing guarantees about output field types

Elasticsearch is strict about field types within a given index. If you try to add two documents to the same index, like this:

{"status": 200}
{"status": "OK"}

Elasticsearch will refuse to index the second document, and if you're using Logstash that failure is silent. 😱

There are several things that Loglang could do to prepare output for Elasticsearch:

  • automatic coercion of field types to the first seen type
  • automatic coercion of field types to a fixed schema (JSON Schema)
  • send failed events to a dead letter queue
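A sketch of the first option; when a later value's type disagrees with the first seen type, this crude version falls back to stringifying it (a dead letter queue is the safer choice for anything it cannot coerce):

package output

import "fmt"

// Coercer remembers the Go type of the first value seen per field and
// stringifies later values whose type disagrees.
type Coercer struct {
	seen map[string]string // field name -> first seen type
}

func (c *Coercer) Coerce(doc map[string]any) {
	if c.seen == nil {
		c.seen = make(map[string]string)
	}
	for field, v := range doc {
		t := fmt.Sprintf("%T", v)
		first, ok := c.seen[field]
		if !ok {
			c.seen[field] = t
			continue
		}
		if t != first {
			doc[field] = fmt.Sprint(v) // crude fallback: send it as a string
		}
	}
}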

Output: exec

  • exec() a local process
  • send events to stdin of exec'ed process
  • what to do with stdout/stderr from that process? drop, but write warning. (use exec input instead)
  • populate ECS Schema process fields, especially process exit code
  • should provide default environment that is minimal but indicates invocation from loglang (keep PATH?)
  • should support arguments to the process
  • should support shell out too?
  • what to use for default working directory? a tmp dir that gets cleaned up by loglang?
  • probably nice to support multiple exec in parallel up to some limit

Input: redis

Redis is very cool and it would be great to support it.

But not as part of the core suite; we should use an existing Redis module for this.

Modes

  • queue/list using LPOP/RPOP
    • alternating between LPOP and RPOP has nice characteristics during overload scenarios. but this behaviour should be configurable (LPOP, RPOP, or alternating)
  • pub/sub using SUBSCRIBE
  • set using SPOP
  • stream using XREAD

Input: unix_socket

Unix domain sockets cannot be read like regular files, so a special input plugin is needed.

Requirements

  • support two socket modes (listen vs. connect)
  • populate ECS schema fields, especially host.name and file.path (even though it's not really a file) and network.transport = uds (unix domain socket)
  • end input if the socket is unavailable (no retry?)
  • unix domain sockets can be either byte stream (no framing) or datagram (framing)!

Description

Unix sockets are reliable. If the reader doesn't read, the writer blocks. If the socket is a datagram socket, each write is paired with a read. If the socket is a stream socket, the kernel may buffer some bytes between the writer and the reader, but when the buffer is full, the writer will block. Data is never discarded, except for buffered data if the reader closes the connection before reading the buffer.
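A sketch of listen mode for both socket types:

package input

import "net"

// "unix" is a byte stream (framing required); "unixgram" is datagrams
// (each write pairs with one read, so framing comes for free).
func listenStream(path string) (net.Listener, error) {
	return net.Listen("unix", path) // e.g. /run/loglang/loglang.sock
}

func listenDatagram(path string) (net.PacketConn, error) {
	return net.ListenPacket("unixgram", path)
}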

Motivation

Unix domain sockets are used by traditional syslog and systemd. Supporting socket input would enable direct replacement of rsyslogd.

The GNU C Library provides functions to submit messages to Syslog. It does this by writing to the /dev/log socket. See Submitting Syslog Messages.

Source: https://www.gnu.org/software/libc/manual/html_node/Overview-of-Syslog.html

~ $ ls -lac /dev/log /run/systemd/journal/dev-log
lrwxrwxrwx 1 root root 28 Dec 22  2022 /dev/log -> /run/systemd/journal/dev-log
srw-rw-rw- 1 root root  0 Dec 22  2022 /run/systemd/journal/dev-log

Apparently Docker also uses unix sockets.

Tips

Run netstat -a -p --unix to see all unix sockets on the local system.

Use socat or hookah for development testing.

Input: File

  • tail or read whole file
    • tail mode should keep track of the file position between process restarts. Use the local filesystem for saving a bookmark (see the sketch after this list).
    • whole file mode should support scheduling
  • single file, or glob path
  • support ECS Schema attributes for files (path, etc.)
  • save bookmark as offset in file, or based on the content of a field that expresses a total order?
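The bookmark sketch mentioned above: persist the byte offset in a sidecar file and seek there on restart (rotation detection and fsync are omitted):

package input

import (
	"io"
	"os"
	"strconv"
)

// openAtBookmark opens the file and seeks to the saved offset, if any.
func openAtBookmark(path, bookmarkPath string) (*os.File, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	if b, err := os.ReadFile(bookmarkPath); err == nil {
		if off, err := strconv.ParseInt(string(b), 10, 64); err == nil {
			f.Seek(off, io.SeekStart)
		}
	}
	return f, nil
}

// saveBookmark persists the current offset after each read.
func saveBookmark(bookmarkPath string, offset int64) error {
	return os.WriteFile(bookmarkPath, []byte(strconv.FormatInt(offset, 10)), 0o644)
}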

preserve original event

To support the ECS schema field event.original, it might be worth storing the original bytes (after framing, before the codec) and providing an option to automatically include that on each output.
