bucardo / tail_n_mail Goto Github PK
View Code? Open in Web Editor NEWLogfile watcher focused on Postgres logs
Home Page: http://bucardo.org/wiki/Tail_n_mail
License: Other
Logfile watcher focused on Postgres logs
Home Page: http://bucardo.org/wiki/Tail_n_mail
License: Other
Bucardo - a table-based replication system DESCRIPTION: ------------ This is version 5.6.0 of Bucardo. COPYRIGHT: ---------- Copyright (c) 2005-2023 Greg Sabino Mullane <[email protected]> REQUIREMENTS: ------------- build, test, and install Perl 5 (at least 5.8.3) build, test, and install PostgreSQL (at least 8.2) build, test, and install the DBI module (at least 1.51) build, test, and install the DBD::Pg module (at least 2.0.0) build, test, and install the DBIx::Safe module (at least 1.2.4) You must have at least one database that has PL/pgSQL and PL/Perl installed. Target databases may need PL/pgSQL. INSTALLATION: ------------- To install this module type the following: perl Makefile.PL make make test (but see below first) make install EXAMPLES: --------- See the test suite in the t/ subdirectory for some examples. WEBSITE: -------- Please visit https://bucardo.org for complete documentation. DEVELOPMENT: ------------ To follow or participate in the development of Bucardo, use: git clone [email protected]:bucardo/bucardo.git GETTING HELP: ------------- For general questions and troubleshooting, please use the [email protected] mailing list. GitHub issues which are support-oriented will be closed and referred to the mailing list anyway, so help save time for everyone by posting there directly. Post, subscribe, and see previous archives here: https://bucardo.org/mailman/listinfo/bucardo-general
More specifically, these 6 lines came to us in email from cron:
Use of uninitialized value in hash element at tail_n_mail line 3811.
Use of uninitialized value in hash element at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
The logs roll over at midnight. We scan the logs during business hours only so this morning's first run looked at yesterday's log and today's log.
The alert we received by email was missing the filenames in the "From A,B" line, and was missing process and transaction numbers in the first "DETAIL" line from yesterday's log; the second "DETAIL" line in the alert is using transaction and process IDs from the first error instead of the second.
The alert (without the [transaction] to save space) received by email:
[1] From A, B(between lines 81,635 of and 28 of , occurs 2 times) (1, 1)
First: [A] 2021-06-17 19:20:45 CDT userid@database 21064 domainname(46264)
Last: [B] 2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282)
ERROR: deadlock detected
DETAIL: Process ? waits for ShareLock on transaction ?; blocked by process ?. Process ? waits for ShareLock on transaction ?; blocked by process ?. Process ?: [transaction]
HINT: See server log for query details.
CONTEXT: while locking tuple(?) in relation "node"
STATEMENT: [transaction]
...
ERROR: deadlock detected
DETAIL: Process 21064 waits for ShareLock on transaction 2512098105; blocked by process 21065. Process 21065 waits for ShareLock on transaction 2512098104; blocked by process 21064. Process 21064: [transaction] Process 21065: [transaction]
HINT: See server log for query details.
CONTEXT: while locking tuple (2259,58) in relation "node"
STATEMENT: [transaction]
The first error from yesterday's log which is presented as the second error:
2021-06-17 19:20:45 CDT userid@database 21064 domanname(46264) ERROR: deadlock detected
2021-06-17 19:20:45 CDT userid@database 21064 domainname(46264) DETAIL: Process 21064 waits
for ShareLock on transaction 2512098105; blocked by process 21065.
Process 21065 waits for ShareLock on transaction 2512098104; blocked by process 21064.
Process 21064: [transaction]
Process 21065: [transaction]
The second error from today's log:
2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282) ERROR: deadlock detected
2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282) DETAIL: Process 13533 waits
for ShareLock on transaction 2512503071; blocked by process 13532.
Process 13532 waits for ShareLock on transaction 2512503070; blocked by process 13533.
Process 13533: [transaction]
Process 13532: [transaction]
We monitor our pgsql logfiles with tailnmail every hour on the hour. So far so good.
At midnight, pgsql rotates his logfile (log_rotation_age = 1d in postgresql.conf).
What I see this morning in the tailnmail config file is the correct logfile name BUT with a wrong offset.
OFFSET should be 0 as the pgsql logfile is still empty.
It seems like it's the previous one computed in yesterday logfile that is retained after the midnight rotation.
Best regards
In my tnm.config.txt:
LOG_LINE_PREFIX: '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h '
However it's catching and reporting lines like these:
[1] (from line 27,630)
?
2017-10-24 20:21:34 CDT [29585]: [1207-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com LOG: execute S_259: UPDATE quartz_TRIGGERS SET TRIGGER_STATE = $1 WHERE TRIGGER_NAME = $2 AND TRIGGER_GROUP = $3 AND TRIGGER_STATE = $4
[2] (from line 27,631)
?
2017-10-24 20:21:34 CDT [29585]: [1208-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com DETAIL: parameters: $1 = 'ACQUIRED', $2 = 'Smoke Test Email ReportTrigger', $3 = 'DEFAULT', $4 = 'WAITING'
[3] (from line 27,632)
?
2017-10-24 20:21:34 CDT [29585]: [1209-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com LOG: execute S_54: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
[4] (from line 27,633)
?
2017-10-24 20:21:34 CDT [29585]: [1210-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com DETAIL: parameters: $1 = 'NON_CLUSTERED1508347730715', $2 = 'Smoke Test Email ReportTrigger', $3 = 'DEFAULT', $4 = 'f', $5 = 'NON_CLUSTERED', $6 = '1508894520000', $7 = 'ACQUIRED', $8 = NULL, $9 = NULL, $10 = 'f', $11 = 'f', $12 = '5'
Discussed with @turnstep , he believes it to be due to whitespace in the app %a value "PostgreSQL JDBC Driver" and difficulty with regex. Creating issue to track it.
I 've installed tail_n_mail on the production servers 1 week ago and it was working fine. I 've put it in a cron that runs at every 30 minute. But today at MIdnight, it kept running and the second instance also started at 00:30. Both were consuming 100% CPU. It was a very critical thing. Can u plz tell me why it happened.
The log file was last updated at 11:30 --
EMAIL: [email protected],[email protected]
PGFORMAT: 3
FILE: /edbdata/data/8.3/dbserver_logs/LATEST
LASTFILE: /edbdata/data/8.3/dbserver_logs/enterprisedb-2010-01-14_000000.log
OFFSET: 10636714
MAXSIZE: 80000000
MAILSUBJECT: EDB1 DBServer errors (FILE)
INCLUDE: PANIC:
INCLUDE: FATAL:
EXCLUDE: database ".+" does not exist
and here is the process running...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5429 enterpri 25 0 35224 8244 1572 R 100 0.1 51:43.26 perl /opt/app/edb/bin/tail_n_mail.pl /opt/app/edb/bin/tnm.config.txt
12273 enterpri 25 0 34828 7948 1572 R 100 0.0 21:41.33 perl /opt/app/edb/bin/tail_n_mail.pl /opt/app/edb/bin/tnm.config.txt
The log file size was not too big that time..
plz help.
Bradley Hzuka 2016-05-12 16:06:31 EDT
Given a tail_n_mail configuration like the following:
LOG_LINE_PREFIX: %t [%p]: [%l-1] user=%u, app=%a, host=%r, xid=%x
EMAIL: [email protected]
MAILSUBJECT: HOST Postgres errors for database
INCLUDE: WARNING:
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
EXCLUDE: LOG:
FILE1: /pg_data/9.4/data/pg_log/LATEST
LASTFILE1: /pg_data/9.4/data/pg_log/postgresql-2016-05-12_000000.log
OFFSET1: 339348180
And a postgresql.log file like the following that has the appended application
name from pgBouncer.
2016-05-12 14:49:01 CDT [21813]: [5645-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.047 ms
2016-05-12 14:49:01 CDT [21813]: [5646-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.074 ms
2016-05-12 14:49:01 CDT [21813]: [5647-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.057 ms
2016-05-12 14:49:01 CDT [21813]: [5648-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.017 ms
Results in all lines being found instead of filtering them out. Note that if
the feature is switched off in pgbouncer, the script works as expected since
the lines occur like the following without an appended IP and port to the
application name:
2016-05-12 14:49:01 CDT [21813]: [5645-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.047 ms
2016-05-12 14:49:01 CDT [21813]: [5646-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.074 ms
2016-05-12 14:49:01 CDT [21813]: [5647-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.057 ms
2016-05-12 14:49:01 CDT [21813]: [5648-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.017 ms
Since it is nice to know what IP the actual request was originating from
through the PgBouncer it is preferable to have the option turned on which means
the application name could be something like "adhoc_psql" or "adhoc_psql -
127.0.0.1:1234". Is there a way to configure tail_n_mail to handle the
application name with a optional " - 0.0.0.0:000001" (ip and port)? Note that I
also utilize PgBadger and the additional characters added to the application
name do not appear to affect the parsing of the utility.
The output of the resulting email is something similar to the following:
[1] (from line 1,898,186)
?
2016-05-12 14:27:05 CDT [20034]: [1358-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.173 ms
[2] (from line 1,898,187)
?
2016-05-12 14:27:05 CDT [20034]: [1359-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.162 ms
[3] (from line 1,898,188)
?
2016-05-12 14:27:05 CDT [20034]: [1360-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.157 ms
Upgrading to 3.1.0 from an old 1.31.x. Found note in 3.0 changes doc that "Config files are now static, and will not get rewritten." Ran 3.1.0 for a few days and fresh config.state files are present for each cluster. Compared config with config.state files to determine what's redundant. In my case, I saw
config: MAILSUBJECT, INHERIT, FILE1, LASTFILE1, OFFSET1
config.state: FILE1, LASTFILE1, ROTATEFILEMODTIME1, OFFSET1
Edited config to only have MAILSUBJECT, INHERIT which should make it good for git inclusion. Ran "select 2/0;" in one cluster to generate error. Reran TNM: no errors, mail received about division by zero error, but the config file was rewritten to as it was before.
Hello,
I am using tail_n_mail-1.20.0 on EDB 9.2 version . There is one cron job that executes every 10 minutes.
Getting the following error:
Error in tempfile() using tnmXXXXXXXX.tnm: Could not create temp file tnmbf18kVDR.tnm: Permission denied at /tmp/tail_n_mail.pl line 1692
Also getting mails which has an empty subject and body also no recipients can this mail be related to the above error.
Can you please tell me what I am doing wrong. Really appreciate your help.
Thanks
Hi,
Any of the existing PGFORMAT is not supporting when the timstamp contains miliseconds.
e.g., My postgres log file look like -
2010-07-01 17:27:49.128 UTC backup us1 216.211.116.219(48668) 11185 FATAL: no pg_hba.conf entry for host "216.209.126.218", user "backup", database "us1", SSL off
But this FATAL could not be catched by tail_n_mail, as timestamp contains miliseconds.
Solution -
Adding one more Regex for date formats worked for me.
--- a/tail_n_mail.pl
+++ b/tail_n_mail.pl
@@ -128,6 +128,7 @@ my %pgpidres = (
2 => qr{.+?\d\d:\d\d:\d\d \w\w\w (\d+)},
3 => qr{.+?\d\d:\d\d:\d\d (\w\w\w)}, ## Fake a PID
4 => qr{.+?[(\d+)]},
my $pgpidre = $pgpidres{$pgformat};
@@ -138,6 +139,7 @@ my %pgpiddateres = (
2 => qr{(.+?\d\d:\d\d:\d\d \w\w\w)},
3 => qr{(.+?\d\d:\d\d:\d\d \w\w\w)},
4 => qr{(.+? \w\w\w)},
in tnm_config.txt changed the PGFORMAT = 5 .
Now it works for timestamp with miliseconds also.
Can u please add this to next version.
Hello,
for maintenance reasons, our PG linux server was down from Friday October 27 to Monday October 30.
In our postgresql.conf we use : log_rotation_age = 1d
In tail-nmail config we have : FILE1: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces
So after one year activity, we have 31 PG log files in our LOG directory (from <CLUSTER_NAME>-01.log to <CLUSTER_NAME>-31.log) on which PG rotates each night.
When the server went up, this morning (October 30th), the first run of Tailnmail analyzed files <CLUSTER_NAME>-27 to <CLUSTER_NAME>-30 considering they were all from October but due to the weekend shutdown files 28 and 29 were in fact files from last September. Only files -27 and -30 files should have been analyzed.
Here is the debug output of this tnm run :
Opened config file "/u02/pgsql/admin/9.4/<CLUSTER_NAME>/scripts/tnm_pgsql.cfg"
$opt1 = {
'configfile' => {
'log_line_prefix' => 1,
'include' => 1,
'file.HASH(0x1270388)' => 1,
'file' => 1,
'email' => 1,
'[email protected]' => 1,
'exclude.checkpoint complete:' => 1,
'include.FATAL:' => 1,
'exclude.automatic analyze' => 1,
'include.LOG:' => 1,
'exclude.checkpoint starting:' => 1,
'offset' => 1,
'include.ERROR:' => 1,
'lastfile' => 1,
'customsubject' => 1,
'include.PANIC:' => 1,
'exclude.(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg_|seq)|SET |$))' => 1,
'mailsubject' => 1,
'exclude.automatic vacuum' => 1,
'exclude' => 1
},
'lastfile' => {
'1' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log'
},
'log_line_prefix' => ''user=%u,db=%d,'',
'include' => [
'PANIC:',
'FATAL:',
'LOG:',
'ERROR:'
],
'file' => [
{
'lastfile' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log',
'original' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-%d.log',
'suffix' => '1',
'name' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log',
'offset' => '5870628'
}
],
'customsubject' => 1,
'email' => [
'[email protected]'
],
'mailsubject' => 'HOST (Serveur de PRODUCTION <CLUSTER_NAME>) Postgres errors UNIQUE : NUMBER',
'offset' => {
'1' => '5870628'
},
'exclude' => [
'checkpoint starting:',
'checkpoint complete:',
'automatic analyze',
'automatic vacuum',
'(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg|seq)|SET |$))'
]
};
$arg1 = {
'verbose' => 1,
'log_line_prefix' => ''user=%u,db=%d,'',
'mailserver' => 'example.com',
'find_line_number' => 1,
'file' => [],
'sortby' => 'count',
'maxsize' => 80000000,
'mailuser' => 'example',
'tsepnosub' => 0,
'debug' => 1,
'duration_limit' => 0,
'sqlstate' => 0,
'dryrun' => 0,
'timewarp' => 0,
'pgmode' => 1,
'nomail' => 0,
'duration' => -1,
'statement_size' => 3000,
'mailport' => 465,
'tsep' => undef,
'flatten' => 1,
'type' => 'normal',
'skip_non_parsed' => 1,
'smtp' => 0,
'tempfile_limit' => 0,
'tempfile' => -1,
'hideflatten' => 1,
'mailcom' => '/usr/sbin/sendmail',
'maxemailsize' => 10000000,
'reset' => 0,
'pretty_query' => 1,
'help' => 0,
'mailmode' => 'sendmail',
'mailpass' => 'example',
'offset' => -1,
'quiet' => 0,
'showonly' => 0,
'version' => 0,
'mailsig' => [],
'mailzero' => 0,
'rewind' => 0,
'pglog' => 'pg'
};
Log line prefix regex: (?-xism:^(()()user=[[\w]],db=[[\w]],)(.*))
Log line prefix regex2: (?-xism:^user=,db=,)
Log line prefix regex3: (?-xism:^user=)
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log Offset: 5,870,628 Size: 5,871,159 Maxsize: 80,000,000
Adding exclusion: checkpoint starting:
Adding exclusion: checkpoint complete:
Adding exclusion: automatic analyze
Adding exclusion: automatic vacuum
Adding exclusion: (execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg|seq)|SET |$))
Exclusion: (?-xism:checkpoint starting:)|(?-xism:checkpoint complete:)|(?-xism:automatic analyze)|(?-xism:automatic vacuum)|(?-xism:(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg|_seq)|SET |$)))
Adding inclusion: PANIC:
Adding inclusion: FATAL:
Adding inclusion: LOG:
Adding inclusion: ERROR:
Inclusion: (?-xism:PANIC:)|(?-xism:FATAL:)|(?-xism:LOG:)|(?-xism:ERROR:)
MATCH at line 44725 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44726 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44727 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44728 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44729 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log: 5
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log Offset: 0 Size: 11,310,774 Maxsize: 80,000,000
MATCH at line 6768 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6769 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6770 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6771 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 7363 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 8170 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 8344 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 21717 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log: 67
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log Offset: 0 Size: 6,128,680 Maxsize: 80,000,000
No new lines found in file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log Offset: 0 Size: 554,364 Maxsize: 80,000,000
MATCH at line 955 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 956 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 957 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 958 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 960 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 961 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 995 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1039 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1123 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1175 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1941 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2043 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2098 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2166 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2300 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2443 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2701 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2936 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log: 18
Sending mail to: [email protected]
Performing final cleanup
Saving new config file
Regards
In my config file, I use the following option :
STATEMENT_SIZE : 3000
However, it is lost during config file rewriting.
Regards
Parentheses are still not getting escaped around log line options and it's causing lines not to be matched. I'd reported this back around version 1.19.3, but it's still happening with the latest version (1.25.1)
For example, this does not match
log_line_prefix = '%t (%r) [%p]: [%l-1] user=%u,db=%d '
But this does
log_line_prefix = '%t [%r] [%p]: [%l-1] user=%u,db=%d '
I have a TSEP: 0 in my config file, but tnm doesn't respect it. --tsep=0 on the command line works just fine, though.
Yesterday I received a high load alert from one of my prod DB servers. I logged in to find over 10 tail_n_mail processes running.
One of my devs sent a query with a super long IN list, the log line was 6,893,029 chars long. His connection was lost (FATAL) so I assume tail_n_mail was trying to grab the statement for the email notification and choked itself into a spin. tail_n_mail was then kicking off every 5 minutes via crontab and each subsequent process was choking in the same place.
I don't think tail_n_mail needs to print the entire statement in an email, and really I wouldn't want to print this one and eat up my inbox space. Not sure if you can grab just a subset of 1,000 or 10,000 chars without having to read the whole thing.
For some reason : (\S+) cannot match session ids like : 5b6c0838.1de3c
therefore no ERROR or other configured messages are identified.
By removing %c from postgresql log_line_prefix (and adjusting of course LOG_LINE_PREFIX: in tnm conf file) everything works again.
Using version 1.31.3
$ ./tail_n_mail --version
./tail_n_mail version 1.31.3
Configured to just use the latest log file:
FILE1: /var/lib/pgsql/9.6/data/pg_log/LATEST
LASTFILE1: /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-26_000000.log
OFFSET1: 2551862
Scheduled to run every 5 minutes:
*/5 * * * * /var/lib/pgsql/dba/bin/tail_n_mail /var/lib/pgsql/dba/etc/tnm.conf
However, at 00:05 (12:05 AM) every day, I get one report that lists errors in log files from many days ago. For example, the past 4 days, I've gotten alerts from maintenance work that was done on Feb 22:
Total matches: 25
Matches from [A] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_000000.log: 20
Matches from [B] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_101921.log: 3
Matches from [C] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_102154.log: 2
I'm curious as to the following:
Regardless it seems like a bug. I don't have tail_n_mail scheduled to run from anywhere else.
ID: 113
Version: unspecified
Date: 2012-09-24 08:30 EDT
Author: Jan Urbański ([email protected])
https://gist.github.com/uncleshelby/5852212
patch to fix the issue
The regexp used to capture %u and %d sequences does not allow hyphens.
The attached patch fixes this and includes a test case to demonstrate the issue
(the test case also shows a different bug, that will be filed separately).
Hello,
I am using tail_n_mail-1.27.0.-1.sles12.noarch on linux SLES12 SP2.
I configured LOG_LINE_PREFIX equal to log_prefix_line for postgresql logs as '%t %e'.
So tail_n_mail was able to parse all necessary entries starting by timestamp an SQLSTATE type of:
2018-11-21 12:41:32 FET 00000 LOG: worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1
But after I have updated my SLES12 to SP3 tail_n_mail stopped noticing these entries.
2018-11-21 12:41:32 +03 00000 LOG: worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1
As you can see the reason is in different output of timezone. FET was before and +03 is our current format.
As far as I understand postgreSQL uses system default configuration and it is not possible to configure some particular timestamp format within PostgreSQL. So, I reckon, it can be corrected in tail_n_mail.
I see in tail_n_mail template that if log contains timestamp, timezone is supposed to be corresponding to
\w\w\w\w? pattern.
I would suggest changing this part of template by [+-]?\w+
Thank You in advance,
Andrei
.tailnmailrc
log_line_prefix='%m appname=%a,user=%u,db=%d,pid=%p,txid=%x,vtxid=%v'
Error when running:
Can't use string ("'%m appname=%a,user=%u,db=%d,pid"...) as a HASH ref while "strict refs" in use at tail_n_mail line 3734.
Hi,
Trying out different ways to handle alerting in Production environment. This is to handle
grouping alerts and mail to different mailing groups ie basically grouping Includes and Excludes for certain Email ids. This way the config1.txt would handle multiple lines of EMAIL and its corresponding Includes/Excludes, even in case a null value is passed for EMAIL.
Following where done to achieve the same:
@@ -404,7 +404,8 @@ my $fileinfo = $opt{file}[$filenumber];
}
-process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};
+#process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};
+invoke_process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};
@@ -614,6 +615,8 @@ sub parse_config_file {
## Store locally so we can easily populate %opt at the end
my %localopt;
@@ -629,6 +632,8 @@ sub parse_config_file {
$in_standard_comments = 0;
}
$mail = $1 if (/^EMAIL: (.+)/); # parsing mailid whenever we read a new section
@@ -637,7 +642,8 @@ sub parse_config_file {
## If the exact same line shows up more than once ignore it.
## Failing to do so will confuse the comment hash
if (/^[\w]/ and $seenit{$_}++) {
$seen_mailid{$mail}++ if $mail;
if (/^[\w]/ and $seenit{$_}++ and !$seen_mailid{$mail}++) {
@@ -718,7 +724,11 @@ sub parse_config_file {
}
## Who to send emails to for this file
elsif (/^EMAIL:\s_(.+?)\s_$/) {
push @{$localopt{email}}, $1;
# push @{$localopt{email}}, $1;
my $m = $1;
$localopt{email_hash}{$m} = {};
@{$localopt{include}} = ();
@{$localopt{exclude}} = ();
@@ -774,7 +784,8 @@ sub parse_config_file {
}
## Which lines to exclude from the report
elsif (/^EXCLUDE:\s_(.+?)\s_$/) {
push @{$localopt{exclude}}, $1;
#push @{$localopt{exclude}}, $1;
push @{$localopt{email_hash}{$mail}->{exclude}}, $1 if $mail; # global exclude --> mailid based exclude
## Which lines to include in the report
elsif (/^INCLUDE:\s*(.+)/) {
push @{$localopt{include}}, $1;
#push @{$localopt{include}}, $1;
push @{$localopt{email_hash}{$mail}->{include}}, $1 if $mail; # global include --> mailid based include
@@ -1981,6 +1993,22 @@ sub process_line {
} ## end of process_line
+# Invoke_process_report
+# invoking process_report subroutine for each sections separately
+# ie: looping through email_hash and get each emailids and corresponding includes and excludes,
+# make a call to the original process_report after overriding global include/exclude with mailid specific include/exclude
+sub invoke_process_report {
foreach my $e(keys %{$opt{email_hash}}) {
@{$opt{include}} = @{$opt{email_hash}{$e}->{include}};
@{$opt{exclude}} = @{$opt{email_hash}{$e}->{exclude}};
$opt{email} = [];
push @{$opt{email}}, split(/,/,$e);
process_report();
}
return;
@@ -2414,10 +2447,10 @@ sub lines_of_interest {
|| ($a->{line} <=> $b->{line});
}
elsif ($sorttype eq 'count') {
return ($b->{count} <=> $a->{count})
|| ($fileorder{$a->{filename}} <=> $fileorder{$b->{filename}})
|| ($a->{line} <=> $b->{line});
}
return ($b->{count} cmp $a->{count})
|| ($fileorder{$a->{filename}} cmp $fileorder{$b->{filename}})
|| ($a->{line} cmp $b->{line});
}
@@ -2780,16 +2813,29 @@ sub final_cleanup {
add_comments("INHERIT: $inherit");
print "INHERIT: $inherit\n";
}
next if ! exists $opt{configfile}{"include.$include"};
add_comments("INCLUDE: $include");
print "INCLUDE: $include\n";
next if ! exists $opt{configfile}{"exclude.$exclude"};
add_comments("EXCLUDE: $exclude");
print "EXCLUDE: $exclude\n";
add_comments("EMAIL: $m");
print "\nEMAIL: $m\n";
for my $include (@{$opt{email_hash}->{$m}->{include}}) {
add_comments("INCLUDE: $include");
print "INCLUDE: $include\n";
}
for my $exclude (@{$opt{email_hash}->{$m}->{exclude}}) {
add_comments("EXCLUDE: $exclude");
print "EXCLUDE: $exclude\n";
}
This works as expected but downside is it parses through the pg_logs every time line EMAIL is mentioned in the cfg file. Please let me know if this can be avoided.
Regards
Email from cron contained this one-liner.
Using tail_n_mail.3.2.0
TNM email was still sent at same timestamp as cron email; the "Matches" line corresponds to error:
Date: Sat Jan 30 08:15:02 2021 CST
Host: XXXX
Unique items: 1
Matches from 20210130.log: 1,329
Thanks.
Hello,
I am using tail_n_mail-1.26.0-1 for my postgres systems. I have two conf files one for emailing and one for paging. I have two cron jobs that run in 10 minute interval one using email.conf and one using oncall.conf.
I have a lot of INCLUDE: and EXCLUDE: in the conf files. But it doesnt seem to be excluding the lines/strings. Can you please tell me what I am doing wrong. Really appreciate your help.
Thanks
Smitha
Here is the sample of the conf file.
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
INCLUDE: SIGHUP
INCLUDE: temporary file: path
INCLUDE: archive command failed
INCLUDE: could not be archived
INCLUDE: transaction ID wrap limit
INCLUDE: must be vacuumed within
INCLUDE: terminated
INCLUDE: Segmentation fault
INCLUDE: is missing
EXCLUDE: database ".+" does not exist
EXCLUDE: INFO:
EXCLUDE: CONTEXT:
EXCLUDE: PL/pgSQL function
EXCLUDE: NOTICE:
EXCLUDE: function inline_code_block
EXCLUDE: skipping
EXCLUDE: CONTEXT: SQL statement
EXCLUDE: trust_for_each_subscriber
EXCLUDE_PREFIX: CONTEXT:
EXCLUDE_PREFIX: NOTICE:
EXCLUDE_PREFIX: PL/pgSQL function
Hello,
In tnm 1.31.3, I used to have these exclude_non_parsed directives in my config file
EXCLUDE_NON_PARSED: (records in|records out)$
EXCLUDE_NON_PARSED: bytes .* copied,
Now, the same directives doesn't work in version 2.3.1
Any thought ?
Regards
following a switch to authenticated flow, tail_n_mail generated errors :
Failed to send mail (from): 5.7.3 STARTTLS is required to send mail
I modified the code to make it work again
1/ No SASL mechanism found
at /usr/share/perl5/vendor_perl/Authen/SASL.pm line 77.
at /usr/share/perl5/Net/SMTP.pm line 165.
=> SOLUTION / comment out the following lines
## Attempt to authenticate
2/ Failed to send mail (from): 5.7.3 STARTTLS is required to send mail
=> SOLUTION / add the following 2 lines
$smtp->starttls(); # make sure to install IO::Socket::SSL
$smtp->auth($mailuser, $mailpass); # make sure to install Authen::SASL
$smtp->starttls(); # make sure to install IO::Socket::SSL
$smtp->auth($mailuser, $mailpass); # make sure to install Authen::SASL
$smtp->mail($from_addr) or die 'Failed to send mail (from): ' . $smtp->message;
$smtp->to($emails) or die 'Failed to send mail (to): ' . $smtp->message;
$smtp->data() or die 'Failed to send mail (data): ' . $smtp->message;
I haven't changed anything for this :
3/ Use of uninitialized value in printf at /usr/bin/tail_n_mail line 3789.
==> of %s is not valued
[1] From D, F, C, I, G, E, L, B, A, J, K, H(between lines 1 of and 2,284 of , occurs 24,677 times) (21,709, 582, 546, 474, 407, 403, 400, 119, 23, 6, 6, 2)
printf '(between lines %s of %s and %s of %s, occurs %s times)',
$earliest->{line}, $fab{ $earliest->{file} },
$latest->{line}, $fab{ $latest->{file} },
$pcount;
Name: john deydier
Email: [email protected]
Hello
I've found a bug in tail_n_mail
send_smtp_email($from_addr, $emails, $newsubject, $emailfile); => 4 parameters
sub send_smtp_email {
## Send email via an authenticated SMTP connection
## For Windows, you will need:
# perl 5.10
# http://cpan.uwinnipeg.ca/PPMPackages/10xx/
# ppm install Net_SSLeay.ppd
# ppm install IO-Socket-SSL.ppd
# ppm install Authen-SASL.ppd
# ppm install Net-SMTP-SSL.ppd
## For non-Windows:
# perl-Net-SMTP-SSL package
# perl-Authen-SASL
my ($from_addr,$emails,$tempfile) = @_; => 3 parameters
ID: 114
Version: unspecified
Date: 2012-09-24 08:35 EDT
Author: Jan Urbański ([email protected])
The code responsible for capturing log lines from system messages does not
escape metacharacters before trying to interpret the constructed string as a
regexp.
The attached patch fixes this problem. A test case for this issue and #12 has
been attached to #12.
There are cases where I want to exclude certain errors depending on the host or app of the client. One common example is that I want to ignore "syntax error" errors if they come from our office or VPN subnet, since those will be developers having fat fingers or guessing at syntax. But if those errors were to come from the data center subnet, then it's probably in app code and we'll want to know.
I confirmed with @turnstep that right now this is not possible, but basically I'd be wanting to AND the EXCLUDE_PREFIX and EXCLUDE options somehow.
Similar to [1], TnM is generating emails on INFO events logged in the PostgreSQL logs by pgBackRest. These events do not have any of the INCLUDE keywords, so I'm wondering if the format of the log entries is triggering emails, because the example below reporting LINE_TRUNCATED.
Attempts with EXCLUDE have not prevented the alerts. Changing the log_line_prefix in postgresql.conf has also been unsuccessful.
For example, note the event does not have a time zone (GMT) with the timestamp as standard Postgres log entries do (2020-07-16 15:00:04 GMT
...), this is an email from tail_n_mail, where there are no error conditions in the log:
[2] (from lines 5 - 14636)
2020-07-16 23:05:33.171 P00 INFO: pushed WAL file '000000010000430A000000E7' to the archive asynchronously
2020-07-16 23:05:33.171 P00 INFO: archive-push command end: completed successfully (102ms)
2020-07-16 23:05:33.924 P00 INFO: archive-push command begin 2.27: [pg_wal/000000010000430A000000E8] --archive-async --archive-timeout=90 --compress-type=lz4 --log-level-console=info --log-level-f
[LINE TRUNCATED, original was 651,208 characters and 3,007 lines long]
tnm.config:
LOG_LINE_PREFIX: %t P%p
MAILSUBJECT: TnM - HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
EXCLUDE: canceling autovacuum
EXCLUDE: INFO: archive-push
EXCLUDE: INFO: pushed
FILE1: /localdata/pgdata11/log/LATEST
LASTFILE1: /localdata/pgdata11/log/postgresql-20200716_145958.log
[1] https://bucardo.org/pipermail/tnm/2018-August/000046.html
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.