fxnn / gowatch
Configurable logfile analysis for your server.
License: MIT License
In #22, we implement wildcards to support rotated logs. Most importantly, this implies that we must be able to handle gzipped log files transparently, i.e. just as if the file were uncompressed.
Each log entry has a timestamp. One must be able to compare timestamps, at least relative to the current time, or even against some given timestamp.
Therefore, we could have predicates as:
timestamp: {before: "-2d"} // not newer than 2 days in past
timestamp: {after: "-1h30m"} // not older than 1 hour and 30 minutes
timestamp: {after: "2015-01-01T00:00:00Z"} // in 2015
Note that the last variant would need some fixed timestamp format. RFC 3339 (a profile of ISO 8601) feels right here.
Turns out that Syslog doesn't log the year.
Apr 5 05:39:01 lvps176-28-9-153 CRON[5773]: pam_unix(cron:session): session opened for user root by (uid=0)
Looks like we need to calculate the year ourselves. We could do this by assuming the current year and falling back to the previous year whenever that would place the entry in the future.
Guess it's bad style to have
log.Fatalf("Error message")
return logentry.AcceptNothingPredicate{} // actually never executed
all over the code. The decision to quit immediately should be made in only one place in the code; everywhere else, normal error handling (by returning errors) should be used.
Currently, to parse files with grok, one has to say parser: grok. Equally, to summarize log entries with grok, one has to say summarizer: grokcounter.
Since those currently seem to be the most useful ones to me, we could just make them the defaults.
See build #44.2: it fails because of what looks like a bug inside golang.org/x/text, causing the call
collate.New(language.AmericanEnglish, collate.Numeric)
to panic as shown. Currently, we work around it by hard-coding language.Und.
To be able to read rotated log files, we need to implement wildcards in the log file path.
Now, it would be a problem if we piped log files from several months (or even years) through gowatch just to find out that every single line is filtered out because of a timestamp predicate. Therefore, the parser should apply timestamp predicates to file modification timestamps also.
While currently, one can only add tags to each line of a logfile, it should be possible to add tags only to lines matching a given predicate.
Though this is currently possible by parsing the same file multiple times with different predicates, that's not the way it's supposed to work.
Possible syntax would be
logfiles:
  - filename: /path/to/file.log
    tags:
      - tag_for_each_line
      - tag_for_few_lines: {
          Message: {contains: some text}
        }
Alternatively, we could provide this feature only through an extended mapping section, in addition to the existing logfiles and summary sections.
(Note that this already uses the new predicate syntax from #4.)
Setting up a configuration for many logfiles, including expressive summaries, is a hard piece of work; therefore, users will want to share pieces of their configuration.
To facilitate sharing, we should make it as easy as possible. Therefore, one should be able to import configuration from a URL. Apart from the remote location, everything should work as described in #6.
Supported protocols should be HTTP and HTTPS to begin with. For faster execution, gowatch should cache the imported files between invocations. We should keep in mind that some users will want to use gowatch in an offline mode; we could support this by downloading once and never again -- maybe just by using the cache.
Currently, a configuration file looks as follows:
logfiles:
  - filename: /var/log/auth.log
    tags: ['auth.log']
    timelayout: Stamp
    config: {pattern: '%{SYSLOGBASE} %{GREEDYDATA:Message}'}
summary:
  - summarizer: count
    title: auth.log
    where: {tags: {contains: 'auth.log'}}
    config: {
      'sudo [%{user}->%{effective_user}] %{command}': '\s*%{USER:user}\s*: TTY=%{DATA:tty} ; PWD=%{PATH:pwd} ; USER=%{USER:effective_user} ; COMMAND=%{PATH:command}(: %{GREEDYDATA:arguments})?'
    }
Parts of it are made to be easy to read, like where: {tags: {contains: 'auth.log'}}. Everyone should know what's meant, and I also feel that it's quite intuitive and thus easy to write and remember.
This should be done with all keywords in the file (as far as possible). Ideas:

do: count (instead of summarizer)
with: {pattern: 'abc'} (instead of config)

The config file should provide a means of adding grok patterns, either by naming a file in the usual format
PATTERN_NAME (my)?p[at]tern
or by just providing patterns inside the YAML file. The syntax could be

patterns: {
  PATTERN_NAME: '(my)?p[at]tern'
}

for inline patterns, and

patternsource: /path/to/patternfile

for a pattern file.
The main summarizer we currently have is the GrokCounter, which takes a set of named patterns and counts the occurrences of each pattern.
Dovecot: Failed Login Attempts
==============================
5.196.31.23: 1
49.248.147.211: 1
52.6.24.186: 4
52.6.71.222: 3
52.6.130.221: 2
54.208.194.166: 1
Now, what I'd like to see is that we not only have the number of occurrences per pattern, but can also see what actually happened. In the above example, we could list the user names per IP.
Dovecot: Failed Login Attempts
==============================
5.196.31.23: webmaster
49.248.147.211: admin
52.6.24.186: joe, webmaster, admin, adm
52.6.71.222: adm, admin, joe
52.6.130.221: frank, joe
54.208.194.166: user
It's not yet clear to me how to specify the match to be displayed. The configuration for the GrokCounter is
- summarizer: count
  config: {
    '%{login_host}': 'auth\(%{PROG}\): %{PROG}\(%{USER},%{IPORHOST:login_host}\): unknown user'
  }
Guess we need a tuple or something, so that we can specify the pattern and the match to be displayed:
- summarizer: count
  config: {
    '%{login_host}': ['%{user}', 'auth\(%{PROG}\): %{PROG}\(%{USER:user},%{IPORHOST:login_host}\): unknown user']
  }
Unfortunately, tuples are hard to read. So, another map?
- summarizer: count
  config: {
    '%{login_host}': {
      list: '%{user}',
      for: 'auth\(%{PROG}\): %{PROG}\(%{USER:user},%{IPORHOST:login_host}\): unknown user'
    }
  }
Currently, predicates work like
where: {
  allof: [{
    field: Message,
    contains: some text
  }, {
    field: Message,
    matches: '%{SOME_PATTERN}'
  }]
}
This could be made a lot more concise by removing the "field" key and mapping fields to conditions instead:
where: {
  Message: {contains: some text, matches: '%{SOME_PATTERN}'},
  not: {Message: {matches: '%{ANOTHER_PATTERN}'}}
}
This, of course, comes at the expense of prohibiting the custom field names not, allof and anyof. Guess we can live with that.
Within a configuration file, it should be possible to import other configuration files by using an import statement on the top level.
Syntax should be as follows. First, we should be able to import a single file.
import: /path/to/import.yml
Then, we should be able to import multiple files at once.
import:
  - /path/to/import1.yml
  - /path/to/import2.yml
The semantics shall be as follows. On importing a configuration file, every logfile defined therein must be added to the logfiles defined in the importing configuration file. Equally, every summary defined in the imported configuration file must be added to those defined in the importing configuration file.
The following summarizer is malfunctioning:
- summarizer: count
  title: Hosts of Discarded and Junk Mails
  where: {
    allof: {
      tags: {contains: 'mail.log'},
      SYSLOGPROG: {contains: 'dovecot'}
    }
  }
  config: {
    '%{msgid_host}': 'deliver\(%{USER:user}\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: marked message to be discarded if not explicitly delivered',
    '%{msgid_host}': "deliver\\(%{USER:user}\\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: stored mail into mailbox 'Junk'"
  }
The reason for this is the duplicate key in the config section: currently, only one of the two entries gets processed. To resolve the problem, we must use YAML lists instead, since map keys have to be unique.
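One way out, sketched as a sequence of single-entry maps (an assumption about the eventual syntax, not a settled design), which keeps both patterns under the same display key:

```yaml
config:
  - '%{msgid_host}': 'deliver\(%{USER:user}\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: marked message to be discarded if not explicitly delivered'
  - '%{msgid_host}': "deliver\\(%{USER:user}\\): sieve: msgid=<%{DATA:msgid_nonhost}@%{IPORHOST:msgid_host}>: stored mail into mailbox 'Junk'"
```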