
tail_n_mail's Introduction

Bucardo - a table-based replication system

DESCRIPTION:
------------

This is version 5.6.0 of Bucardo.

COPYRIGHT:
----------

    Copyright (c) 2005-2023 Greg Sabino Mullane <[email protected]>

REQUIREMENTS:
-------------

    build, test, and install Perl 5                (at least 5.8.3)
    build, test, and install PostgreSQL            (at least 8.2)
    build, test, and install the DBI module        (at least 1.51)
    build, test, and install the DBD::Pg module    (at least 2.0.0)
    build, test, and install the DBIx::Safe module (at least 1.2.4)
    
    You must have at least one database that has PL/pgSQL and PL/Perl installed.
    Target databases may need PL/pgSQL.


INSTALLATION:
-------------

To install this module type the following:

   perl Makefile.PL
   make
   make test (but see below first)
   make install

EXAMPLES:
---------

See the test suite in the t/ subdirectory for some examples.

WEBSITE:
--------

Please visit https://bucardo.org for complete documentation.

DEVELOPMENT:
------------

To follow or participate in the development of Bucardo, use:

git clone [email protected]:bucardo/bucardo.git

GETTING HELP:
-------------

For general questions and troubleshooting, please use the [email protected]
mailing list.  Support-oriented GitHub issues will be closed and referred to the
mailing list anyway, so save everyone time by posting there directly.

Post, subscribe, and see previous archives here:

https://bucardo.org/mailman/listinfo/bucardo-general

tail_n_mail's People

Contributors

eggyknap, fanyeren, jonjensen, montgomeryb, sfrost, stasic, torsten-deriv, turnstep, wulczer, zimmski


tail_n_mail's Issues

Use of uninitialized value in hash element at tail_n_mail line 3811

More specifically, these 6 lines came to us in email from cron:

Use of uninitialized value in hash element at tail_n_mail line 3811.
Use of uninitialized value in hash element at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.
Use of uninitialized value in printf at tail_n_mail line 3811.

The logs roll over at midnight. We scan the logs during business hours only, so this morning's first run looked at both yesterday's log and today's log.

The alert we received by email was missing the filenames in the "From A,B" line, and was missing process and transaction numbers in the first "DETAIL" line from yesterday's log; the second "DETAIL" line in the alert is using transaction and process IDs from the first error instead of the second.

The alert (without the [transaction] to save space) received by email:

[1] From A, B(between lines 81,635 of  and 28 of , occurs 2 times) (1, 1)
First: [A] 2021-06-17 19:20:45 CDT userid@database 21064 domainname(46264)
Last:  [B] 2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282)
ERROR: deadlock detected
DETAIL: Process ? waits for ShareLock on transaction ?; blocked by process ?. Process ? waits for ShareLock on transaction ?; blocked by process ?. Process ?: [transaction]
HINT: See server log for query details.
CONTEXT: while locking tuple(?) in relation "node"
STATEMENT: [transaction]
...
ERROR: deadlock detected
DETAIL: Process 21064 waits for ShareLock on transaction 2512098105; blocked by process 21065. Process 21065 waits for ShareLock on transaction 2512098104; blocked by process 21064. Process 21064: [transaction] Process 21065: [transaction]
HINT: See server log for query details.
CONTEXT: while locking tuple (2259,58) in relation "node"
STATEMENT: [transaction]

The first error, from yesterday's log, which is presented as the second error:

2021-06-17 19:20:45 CDT userid@database 21064 domainname(46264) ERROR:  deadlock detected
2021-06-17 19:20:45 CDT userid@database 21064 domainname(46264) DETAIL:  Process 21064 waits
 for ShareLock on transaction 2512098105; blocked by process 21065.
	Process 21065 waits for ShareLock on transaction 2512098104; blocked by process 21064.
	Process 21064: [transaction]
	Process 21065: [transaction]

The second error from today's log:

2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282) ERROR:  deadlock detected
2021-06-18 00:20:49 CDT userid@database 13533 domainname(59282) DETAIL:  Process 13533 waits
 for ShareLock on transaction 2512503071; blocked by process 13532.
	Process 13532 waits for ShareLock on transaction 2512503070; blocked by process 13533.
	Process 13533: [transaction]
	Process 13532: [transaction]

Offset not reset when postgresql rotates its logfile

We monitor our pgsql logfiles with tailnmail every hour, on the hour. So far so good.
At midnight, pgsql rotates its logfile (log_rotation_age = 1d in postgresql.conf).
What I see this morning in the tailnmail config file is the correct logfile name BUT with a wrong offset.
The OFFSET should be 0, as the pgsql logfile is still empty.
It seems the offset computed for yesterday's logfile is retained after the midnight rotation.
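
A minimal sketch (not tail_n_mail's actual code) of the reset behavior one would expect, using hypothetical values: if the file on disk is now smaller than the stored offset, assume rotation or truncation and start over from zero.

    my ($file, $offset) = ('/var/log/postgresql/postgresql.log', 10_636_714);  ## hypothetical path and stored offset
    my $size = (-s $file) // 0;
    $offset = 0 if $size < $offset;   ## file shrank => it was rotated or truncated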

Best regards

Failure to parse log line due to app name having spaces

In my tnm.config.txt:
LOG_LINE_PREFIX: '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h '

However it's catching and reporting lines like these:

[1] (from line 27,630)
?
2017-10-24 20:21:34 CDT [29585]: [1207-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com LOG:  execute S_259: UPDATE quartz_TRIGGERS SET TRIGGER_STATE = $1 WHERE TRIGGER_NAME = $2 AND TRIGGER_GROUP = $3 AND TRIGGER_STATE = $4
[2] (from line 27,631)
?
2017-10-24 20:21:34 CDT [29585]: [1208-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com DETAIL:  parameters: $1 = 'ACQUIRED', $2 = 'Smoke Test Email ReportTrigger', $3 = 'DEFAULT', $4 = 'WAITING'
[3] (from line 27,632)
?
2017-10-24 20:21:34 CDT [29585]: [1209-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com LOG:  execute S_54: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
[4] (from line 27,633)
?
2017-10-24 20:21:34 CDT [29585]: [1210-1] db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=xyzabcpapp37.disruptfood.com DETAIL:  parameters: $1 = 'NON_CLUSTERED1508347730715', $2 = 'Smoke Test Email ReportTrigger', $3 = 'DEFAULT', $4 = 'f', $5 = 'NON_CLUSTERED', $6 = '1508894520000', $7 = 'ACQUIRED', $8 = NULL, $9 = NULL, $10 = 'f', $11 = 'f', $12 = '5'

Discussed with @turnstep; he believes it is due to the whitespace in the app %a value "PostgreSQL JDBC Driver" and the difficulty of handling that in the regex. Creating this issue to track it.
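
A hedged illustration of the regex problem (my sketch, not tail_n_mail's code): a non-space token stops at the first space inside the %a value, while anchoring on the next literal delimiter from the prefix captures the full name.

    my $line = 'db=xyzfrance,user=svc_usfood,app=PostgreSQL JDBC Driver,client=host1 LOG: ok';
    my ($bad)  = $line =~ /app=(\S+)/;          ## captures only "PostgreSQL"
    my ($good) = $line =~ /app=(.+?),client=/;  ## captures "PostgreSQL JDBC Driver"
    print "$bad\n$good\n";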

The script kept running and utilized 100% CPU

I installed tail_n_mail on the production servers a week ago and it was working fine. I put it in a cron job that runs every 30 minutes. But today at midnight it kept running, and the second instance also started at 00:30. Both were consuming 100% CPU. This is very critical for us. Can you please tell me why this happened?

The log file was last updated at 11:30 --

Config file for the tail_n_mail.pl program

This file is automatically updated

Last update: Thu Jan 14 23:30:04 2010

EMAIL: [email protected],[email protected]
PGFORMAT: 3

FILE: /edbdata/data/8.3/dbserver_logs/LATEST
LASTFILE: /edbdata/data/8.3/dbserver_logs/enterprisedb-2010-01-14_000000.log
OFFSET: 10636714
MAXSIZE: 80000000
MAILSUBJECT: EDB1 DBServer errors (FILE)
INCLUDE: PANIC:
INCLUDE: FATAL:
EXCLUDE: database ".+" does not exist

and here is the process running...

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5429 enterpri 25 0 35224 8244 1572 R 100 0.1 51:43.26 perl /opt/app/edb/bin/tail_n_mail.pl /opt/app/edb/bin/tnm.config.txt
12273 enterpri 25 0 34828 7948 1572 R 100 0.0 21:41.33 perl /opt/app/edb/bin/tail_n_mail.pl /opt/app/edb/bin/tnm.config.txt

The log file size was not too big at the time.
Please help.

BZ #132: Feature in pgbouncer appends host and port to app name causing parsing failure

Bradley Hzuka 2016-05-12 16:06:31 EDT
Given a tail_n_mail configuration like the following:

LOG_LINE_PREFIX: %t [%p]: [%l-1] user=%u, app=%a, host=%r, xid=%x
EMAIL: [email protected]
MAILSUBJECT: HOST Postgres errors for database

INCLUDE: WARNING:
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
EXCLUDE: LOG:

FILE1: /pg_data/9.4/data/pg_log/LATEST
LASTFILE1: /pg_data/9.4/data/pg_log/postgresql-2016-05-12_000000.log
OFFSET1: 339348180

And a postgresql.log file like the following that has the appended application
name from pgBouncer.

2016-05-12 14:49:01 CDT [21813]: [5645-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.047 ms
2016-05-12 14:49:01 CDT [21813]: [5646-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.074 ms
2016-05-12 14:49:01 CDT [21813]: [5647-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.057 ms
2016-05-12 14:49:01 CDT [21813]: [5648-1] user=user, app=adhoc_psql -
127.0.0.1:1234, host=localhost(44176), xid=438911 LOG: duration: 0.017 ms

This results in all lines being found instead of filtered out. Note that if
the feature is switched off in pgbouncer, the script works as expected, since
the lines then occur as follows, without an IP and port appended to the
application name:

2016-05-12 14:49:01 CDT [21813]: [5645-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.047 ms
2016-05-12 14:49:01 CDT [21813]: [5646-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.074 ms
2016-05-12 14:49:01 CDT [21813]: [5647-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.057 ms
2016-05-12 14:49:01 CDT [21813]: [5648-1] user=user, app=adhoc_psql,
host=localhost(44176), xid=438911 LOG: duration: 0.017 ms

Since it is nice to know which IP the actual request originated from through
PgBouncer, it is preferable to keep the option turned on, which means the
application name could be either "adhoc_psql" or "adhoc_psql -
127.0.0.1:1234". Is there a way to configure tail_n_mail to handle an
application name with an optional " - 0.0.0.0:000001" (IP and port) suffix?
Note that I also use PgBadger, and the additional characters added to the
application name do not appear to affect that utility's parsing.

The output of the resulting email is something similar to the following:
[1] (from line 1,898,186)
?
2016-05-12 14:27:05 CDT [20034]: [1358-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.173 ms

[2] (from line 1,898,187)
?
2016-05-12 14:27:05 CDT [20034]: [1359-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.162 ms

[3] (from line 1,898,188)
?
2016-05-12 14:27:05 CDT [20034]: [1360-1] user=pgbouncer, app=adhoc_psql -
127.0.0.1:52545, host=localhost(43916), xid=0 LOG: duration: 0.157 ms
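
A hedged sketch of one possible approach (not an existing tail_n_mail feature): make the part of the prefix regex that matches %a tolerate an optional " - ip:port" suffix.

    my $app_re = qr/(\S+)(?: - \d{1,3}(?:\.\d{1,3}){3}:\d+)?/;
    for my $line ('app=adhoc_psql, host=x', 'app=adhoc_psql - 127.0.0.1:1234, host=x') {
        print "$1\n" if $line =~ /app=$app_re,/;   ## prints "adhoc_psql" both times
    }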

Config files don't remain static?

Upgrading to 3.1.0 from an old 1.31.x. I found a note in the 3.0 changes doc that "Config files are now static, and will not get rewritten." I ran 3.1.0 for a few days, and fresh config.state files are present for each cluster. I compared the config with the config.state files to determine what's redundant. In my case, I saw

config: MAILSUBJECT, INHERIT, FILE1, LASTFILE1, OFFSET1
config.state: FILE1, LASTFILE1, ROTATEFILEMODTIME1, OFFSET1

I edited the config to have only MAILSUBJECT and INHERIT, which should make it suitable for git inclusion. I ran "select 2/0;" in one cluster to generate an error, then reran TNM: no errors, and mail was received about the division-by-zero error, but the config file was rewritten back to what it was before.

Could not create temp file

Hello,

I am using tail_n_mail-1.20.0 on EDB version 9.2. There is a cron job that executes every 10 minutes.

I am getting the following error:
Error in tempfile() using tnmXXXXXXXX.tnm: Could not create temp file tnmbf18kVDR.tnm: Permission denied at /tmp/tail_n_mail.pl line 1692

I am also getting mails with an empty subject, an empty body, and no recipients; could these be related to the above error?
Can you please tell me what I am doing wrong? I really appreciate your help.
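
A hedged illustration of the failing call: File::Temp must be able to write to the target directory, so "Permission denied" here usually means the user the cron job runs as cannot write there. Pointing DIR at a directory that user owns (the path below is hypothetical) avoids the error.

    use File::Temp qw(tempfile);
    my ($fh, $fname) = tempfile('tnmXXXXXXXX', SUFFIX => '.tnm', DIR => '/home/enterprisedb/tmp');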

Thanks

Log parsing fails when the timestamp in the postgres log contains milliseconds

Hi,
None of the existing PGFORMAT values works when the timestamp contains milliseconds.
e.g., My postgres log file look like -
2010-07-01 17:27:49.128 UTC backup us1 216.211.116.219(48668) 11185 FATAL: no pg_hba.conf entry for host "216.209.126.218", user "backup", database "us1", SSL off

But this FATAL could not be caught by tail_n_mail, as the timestamp contains milliseconds.

Solution:
Adding one more regex for the date formats worked for me.
--- a/tail_n_mail.pl
+++ b/tail_n_mail.pl
@@ -128,6 +128,7 @@ my %pgpidres = (
 2 => qr{.+?\d\d:\d\d:\d\d \w\w\w (\d+)},
 3 => qr{.+?\d\d:\d\d:\d\d (\w\w\w)}, ## Fake a PID
 4 => qr{.+?[(\d+)]},
+5 => qr{(.+?\d\d:\d\d:\d\d.\d\d\d \w\w\w)},
 );

my $pgpidre = $pgpidres{$pgformat};
@@ -138,6 +139,7 @@ my %pgpiddateres = (
 2 => qr{(.+?\d\d:\d\d:\d\d \w\w\w)},
 3 => qr{(.+?\d\d:\d\d:\d\d \w\w\w)},
 4 => qr{(.+? \w\w\w)},
+5 => qr{(.+?\d\d:\d\d:\d\d.\d\d\d \w\w\w)},
 );

In tnm_config.txt I changed PGFORMAT to 5.

Now it works for timestamps with milliseconds as well.
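
A quick standalone check (my sketch, not part of the patch) that the proposed format-5 regex matches a millisecond timestamp:

    my $re   = qr{(.+?\d\d:\d\d:\d\d.\d\d\d \w\w\w)};
    my $line = '2010-07-01 17:27:49.128 UTC backup us1 11185 FATAL: no pg_hba.conf entry';
    print "matched: $1\n" if $line =~ $re;   ## matched: 2010-07-01 17:27:49.128 UTC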

Can you please add this to the next version?

Tailnmail analyzed old logs after server was down for some days

Hello,
for maintenance reasons, our PG linux server was down from Friday, October 27 to Monday, October 30.

In our postgresql.conf we use : log_rotation_age = 1d
In tail-nmail config we have : FILE1: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces

So after one year of activity, we have 31 PG log files in our LOG directory (from <CLUSTER_NAME>-01.log to <CLUSTER_NAME>-31.log), which PG rotates through each night.

When the server came back up this morning (October 30th), the first run of Tailnmail analyzed files <CLUSTER_NAME>-27 to <CLUSTER_NAME>-30, considering them all to be from October, but due to the weekend shutdown, files 28 and 29 were in fact from last September. Only files -27 and -30 should have been analyzed.

Here is the debug output of this tnm run :
Opened config file "/u02/pgsql/admin/9.4/<CLUSTER_NAME>/scripts/tnm_pgsql.cfg"
$opt1 = {
'configfile' => {
'log_line_prefix' => 1,
'include' => 1,
'file.HASH(0x1270388)' => 1,
'file' => 1,
'email' => 1,
'[email protected]' => 1,
'exclude.checkpoint complete:' => 1,
'include.FATAL:' => 1,
'exclude.automatic analyze' => 1,
'include.LOG:' => 1,
'exclude.checkpoint starting:' => 1,
'offset' => 1,
'include.ERROR:' => 1,
'lastfile' => 1,
'customsubject' => 1,
'include.PANIC:' => 1,
'exclude.(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg_|seq)|SET |$))' => 1,
'mailsubject' => 1,
'exclude.automatic vacuum' => 1,
'exclude' => 1
},
'lastfile' => {
'1' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log'
},
'log_line_prefix' => ''user=%u,db=%d,'',
'include' => [
'PANIC:',
'FATAL:',
'LOG:',
'ERROR:'
],
'file' => [
{
'lastfile' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log',
'original' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-%d.log',
'suffix' => '1',
'name' => '/u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log',
'offset' => '5870628'
}
],
'customsubject' => 1,
'email' => [
'[email protected]'
],
'mailsubject' => 'HOST (Serveur de PRODUCTION <CLUSTER_NAME>) Postgres errors UNIQUE : NUMBER',
'offset' => {
'1' => '5870628'
},
'exclude' => [
'checkpoint starting:',
'checkpoint complete:',
'automatic analyze',
'automatic vacuum',
'(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg_|seq)|SET |$))'
]
};
$arg1 = {
'verbose' => 1,
'log_line_prefix' => ''user=%u,db=%d,'',
'mailserver' => 'example.com',
'find_line_number' => 1,
'file' => [],
'sortby' => 'count',
'maxsize' => 80000000,
'mailuser' => 'example',
'tsepnosub' => 0,
'debug' => 1,
'duration_limit' => 0,
'sqlstate' => 0,
'dryrun' => 0,
'timewarp' => 0,
'pgmode' => 1,
'nomail' => 0,
'duration' => -1,
'statement_size' => 3000,
'mailport' => 465,
'tsep' => undef,
'flatten' => 1,
'type' => 'normal',
'skip_non_parsed' => 1,
'smtp' => 0,
'tempfile_limit' => 0,
'tempfile' => -1,
'hideflatten' => 1,
'mailcom' => '/usr/sbin/sendmail',
'maxemailsize' => 10000000,
'reset' => 0,
'pretty_query' => 1,
'help' => 0,
'mailmode' => 'sendmail',
'mailpass' => 'example',
'offset' => -1,
'quiet' => 0,
'showonly' => 0,
'version' => 0,
'mailsig' => [],
'mailzero' => 0,
'rewind' => 0,
'pglog' => 'pg'
};
Log line prefix regex: (?-xism:^(()()user=[[\w]],db=[[\w]],)(.*))
Log line prefix regex2: (?-xism:^user=,db=,)
Log line prefix regex3: (?-xism:^user=)
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log Offset: 5,870,628 Size: 5,871,159 Maxsize: 80,000,000
Adding exclusion: checkpoint starting:
Adding exclusion: checkpoint complete:
Adding exclusion: automatic analyze
Adding exclusion: automatic vacuum
Adding exclusion: (execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg_|seq)|SET |$))
Exclusion: (?-xism:checkpoint starting:)|(?-xism:checkpoint complete:)|(?-xism:automatic analyze)|(?-xism:automatic vacuum)|(?-xism:(execute pdo_stmt|statement: (BEGIN|COMMIT|COPY|DEALLOCATE|LOCK TABLE .ACCESS SHARE MODE|SELECT set_limit|(WITH|SELECT) .(cre_struc|del_pvc|pg_|seq)|SET |$)))
Adding inclusion: PANIC:
Adding inclusion: FATAL:
Adding inclusion: LOG:
Adding inclusion: ERROR:
Inclusion: (?-xism:PANIC:)|(?-xism:FATAL:)|(?-xism:LOG:)|(?-xism:ERROR:)
MATCH at line 44725 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44726 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44727 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44728 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
MATCH at line 44729 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log: 5
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Checking for file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log (last was /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-27.log)
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log Offset: 0 Size: 11,310,774 Maxsize: 80,000,000
MATCH at line 6768 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6769 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6770 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 6771 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 7363 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 8170 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 8344 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
MATCH at line 21717 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-28.log: 67
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log Offset: 0 Size: 6,128,680 Maxsize: 80,000,000
No new lines found in file /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-29.log
Parsing file: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
File: /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log Offset: 0 Size: 554,364 Maxsize: 80,000,000
MATCH at line 955 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 956 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 957 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 958 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 960 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 961 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 995 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1039 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1123 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1175 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 1941 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2043 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2098 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2166 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2300 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2443 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2701 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
MATCH at line 2936 of /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log
Lines found in /u02/pgsql/admin/9.4/<CLUSTER_NAME>/traces/<CLUSTER_NAME>-30.log: 18
Sending mail to: [email protected]
Performing final cleanup
Saving new config file

Regards

Not escaping parentheses

Parentheses are still not getting escaped around log line options, and this causes lines not to be matched. I reported this back around version 1.19.3, but it's still happening with the latest version (1.25.1).

For example, this does not match
log_line_prefix = '%t (%r) [%p]: [%l-1] user=%u,db=%d '

But this does
log_line_prefix = '%t [%r] [%p]: [%l-1] user=%u,db=%d '
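
A minimal demonstration (not the actual tail_n_mail code) of why an unescaped "(" breaks matching, and how quoting metacharacters (e.g. with \Q...\E / quotemeta) fixes it:

    my $literal = '(127.0.0.1) ';
    my $line    = '(127.0.0.1) [29585]: LOG: ready';
    print 'naive:   ', ($line =~ /^$literal/     ? 'match' : 'no match'), "\n";  ## no match: "(" opens a group
    print 'escaped: ', ($line =~ /^\Q$literal\E/ ? 'match' : 'no match'), "\n";  ## match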

tail_n_mail hangs on super-long log line

Yesterday I received a high load alert from one of my prod DB servers. I logged in to find over 10 tail_n_mail processes running.

One of my devs sent a query with a super-long IN list; the log line was 6,893,029 characters long. His connection was lost (FATAL), so I assume tail_n_mail was trying to grab the statement for the email notification and choked itself into a spin. tail_n_mail was then being kicked off every 5 minutes via crontab, and each subsequent process choked in the same place.

I don't think tail_n_mail needs to print the entire statement in an email, and really I wouldn't want it to print this one and eat up my inbox space. I'm not sure if you can grab just a subset of 1,000 or 10,000 characters without having to read the whole thing, as sketched below.
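
A sketch of the suggested mitigation (the cap below is an arbitrary example, not an actual tail_n_mail setting, though the debug output in another issue on this page does show a statement_size option):

    use constant MAX_STATEMENT_CHARS => 10_000;   ## arbitrary example cap
    my $statement = 'SELECT 1 WHERE x IN (' . join(',', 1 .. 1_000_000) . ')';
    my $kept = length($statement) > MAX_STATEMENT_CHARS
        ? substr($statement, 0, MAX_STATEMENT_CHARS) . ' [TRUNCATED]'
        : $statement;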

Do not get any mail using tail_n_mail

I have put the tail_n_mail file in the /var/temp folder for testing on Ubuntu.

When I run the command below, I get the message shown:
perl tail_n_mail tnm.config.txt

[screenshot of the command output]

However, I do not get any mail or any output on the command prompt for any errors.
Did I miss anything?

cannot handle session id %c in log_line_prefix

For some reason (\S+) cannot match session ids like 5b6c0838.1de3c,
and therefore no ERROR or other configured messages are identified.

By removing %c from the postgresql log_line_prefix (and, of course, adjusting LOG_LINE_PREFIX: in the tnm conf file accordingly) everything works again.

tail_n_mail searches old logs, but only once per day?

Using version 1.31.3

$ ./tail_n_mail --version
./tail_n_mail version 1.31.3

Configured to just use the latest log file:

FILE1: /var/lib/pgsql/9.6/data/pg_log/LATEST
LASTFILE1: /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-26_000000.log
OFFSET1: 2551862

Scheduled to run every 5 minutes:
*/5 * * * * /var/lib/pgsql/dba/bin/tail_n_mail /var/lib/pgsql/dba/etc/tnm.conf

However, at 00:05 (12:05 AM) every day, I get one report that lists errors in log files from many days ago. For example, the past 4 days, I've gotten alerts from maintenance work that was done on Feb 22:

Total matches: 25
Matches from [A] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_000000.log: 20
Matches from [B] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_101921.log:  3
Matches from [C] /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-02-22_102154.log:  2

I'm curious as to the following:

  • Why am I getting these after that event has long passed and it isn't in the LASTFILE1? Why is tail_n_mail even looking in those 2018-02-22 log files?
  • Why do I get these alerts only at 12:05 AM? tail_n_mail runs every 5 minutes throughout the day, and tnm.conf gets updated every time.

Regardless, it seems like a bug. I don't have tail_n_mail scheduled to run from anywhere else.

tail_n_mail does not accept different timezone formats

Hello,

I am using tail_n_mail-1.27.0-1.sles12.noarch on Linux SLES12 SP2.
I configured LOG_LINE_PREFIX to match the log_line_prefix of my postgresql logs, '%t %e'.
So tail_n_mail was able to parse all necessary entries starting with a timestamp and SQLSTATE, of the form:

But after I updated my SLES12 to SP3, tail_n_mail stopped noticing these entries.
2018-11-21 12:41:32 +03 00000 LOG: worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1

As you can see, the reason is the different output of the timezone: it was FET before, and +03 is our current format.
As far as I understand, PostgreSQL uses the system default configuration, and it is not possible to configure a particular timestamp format within PostgreSQL. So, I reckon, it can be corrected in tail_n_mail.

I see in the tail_n_mail template that if the log contains a timestamp, the timezone is expected to match the
\w\w\w\w? pattern.
I would suggest changing this part of the template to [+-]?\w+, as checked below.
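
A standalone check (my sketch) that the suggested replacement pattern accepts both named zones and numeric offsets:

    for my $tz ('FET', 'CEST', '+03', '-05') {
        print "$tz: ", ($tz =~ /^[+-]?\w+$/ ? 'ok' : 'no match'), "\n";   ## all four print "ok"
    }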

Thank You in advance,
Andrei

Error running tail_n_mail with LOG_LINE_PREFIX set

.tailnmailrc

log_line_prefix='%m appname=%a,user=%u,db=%d,pid=%p,txid=%x,vtxid=%v'

Error when running:
Can't use string ("'%m appname=%a,user=%u,db=%d,pid"...) as a HASH ref while "strict refs" in use at tail_n_mail line 3734.

Handling group alerts and mail to different mailing groups

Hi,

Trying out different ways to handle alerting in a production environment. The goal is to group alerts and mail them to different mailing groups, i.e. basically to group INCLUDEs and EXCLUDEs per email id. This way config1.txt can hold multiple EMAIL lines, each with its corresponding INCLUDEs/EXCLUDEs, even when a null value is passed for EMAIL.

The following was done to achieve this:

@@ -404,7 +404,8 @@ my $fileinfo = $opt{file}[$filenumber];
 }

 ## We're done parsing the message, send an email if needed

-process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};
+#process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};
+invoke_process_report() if $opt{grand_total} or $arg{mailzero} or $opt{mailzero};

@@ -614,6 +615,8 @@ sub parse_config_file {
 ## Store locally so we can easily populate %opt at the end
 my %localopt;

+my $mail;        ## keeps the email for each section, parsed from "EMAIL: (mailid)"
+my %seen_mailid; ## to avoid duplicate (mailid + entry) pairs

@@ -629,6 +632,8 @@ sub parse_config_file {
 $in_standard_comments = 0;
 }

+$mail = $1 if (/^EMAIL: (.+)/); ## parse the mailid whenever we read a new section

@@ -637,7 +642,8 @@ sub parse_config_file {

 ## If the exact same line shows up more than once ignore it.
 ## Failing to do so will confuse the comment hash
-if (/^[\w]/ and $seenit{$_}++) {
+$seen_mailid{$mail}++ if $mail;
+if (/^[\w]/ and $seenit{$_}++ and !$seen_mailid{$mail}++) {

@@ -718,7 +724,11 @@ sub parse_config_file {
 }
 ## Who to send emails to for this file
 elsif (/^EMAIL:\s*(.+?)\s*$/) {
-    push @{$localopt{email}}, $1;
+    #push @{$localopt{email}}, $1;
+    my $m = $1;
+    $localopt{email_hash}{$m} = {};
+    @{$localopt{include}} = ();
+    @{$localopt{exclude}} = ();

@@ -774,7 +784,8 @@ sub parse_config_file {
 }
 ## Which lines to exclude from the report
 elsif (/^EXCLUDE:\s*(.+?)\s*$/) {
-    push @{$localopt{exclude}}, $1;
+    #push @{$localopt{exclude}}, $1;
+    push @{$localopt{email_hash}{$mail}->{exclude}}, $1 if $mail; ## global exclude --> mailid-based exclude

 ## Which lines to include in the report
 elsif (/^INCLUDE:\s*(.+)/) {
-    push @{$localopt{include}}, $1;
+    #push @{$localopt{include}}, $1;
+    push @{$localopt{email_hash}{$mail}->{include}}, $1 if $mail; ## global include --> mailid-based include

@@ -1981,6 +1993,22 @@ sub process_line {

 } ## end of process_line

+## invoke_process_report
+## Invoke the process_report subroutine for each section separately,
+## i.e. loop through email_hash to get each emailid and its corresponding
+## includes and excludes, then call the original process_report after
+## overriding the global include/exclude with the mailid-specific ones.
+sub invoke_process_report {
+    foreach my $e (keys %{$opt{email_hash}}) {
+        @{$opt{include}} = @{$opt{email_hash}{$e}->{include}};
+        @{$opt{exclude}} = @{$opt{email_hash}{$e}->{exclude}};
+        $opt{email} = [];
+        push @{$opt{email}}, split(/,/, $e);
+        process_report();
+    }
+    return;
+}

@@ -2414,10 +2447,10 @@ sub lines_of_interest {
 || ($a->{line} <=> $b->{line});
 }
 elsif ($sorttype eq 'count') {
-    return ($b->{count} <=> $a->{count})
-            || ($fileorder{$a->{filename}} <=> $fileorder{$b->{filename}})
-            || ($a->{line} <=> $b->{line});
-}
+    return ($b->{count} cmp $a->{count})
+            || ($fileorder{$a->{filename}} cmp $fileorder{$b->{filename}})
+            || ($a->{line} cmp $b->{line});
+}

@@ -2780,16 +2813,29 @@ sub final_cleanup {
 add_comments("INHERIT: $inherit");
 print "INHERIT: $inherit\n";
 }
-for my $include (@{$opt{include}}) {
-    next if ! exists $opt{configfile}{"include.$include"};
-    add_comments("INCLUDE: $include");
-    print "INCLUDE: $include\n";
-}
-for my $exclude (@{$opt{exclude}}) {
-    next if ! exists $opt{configfile}{"exclude.$exclude"};
-    add_comments("EXCLUDE: $exclude");
-    print "EXCLUDE: $exclude\n";
-}
+for my $m (keys %{$opt{email_hash}}) {
+    add_comments("EMAIL: $m");
+    print "\nEMAIL: $m\n";
+    for my $include (@{$opt{email_hash}->{$m}->{include}}) {
+        add_comments("INCLUDE: $include");
+        print "INCLUDE: $include\n";
+    }
+    for my $exclude (@{$opt{email_hash}->{$m}->{exclude}}) {
+        add_comments("EXCLUDE: $exclude");
+        print "EXCLUDE: $exclude\n";
+    }
+}
+#for my $include (@{$opt{include}}) {
+#    next if ! exists $opt{configfile}{"include.$include"};
+#    add_comments("INCLUDE: $include");
+#    print "INCLUDE: $include\n";
+#}
+#for my $exclude (@{$opt{exclude}}) {
+#    next if ! exists $opt{configfile}{"exclude.$exclude"};
+#    add_comments("EXCLUDE: $exclude");
+#    print "EXCLUDE: $exclude\n";
+#}

This works as expected, but the downside is that it parses through the pg_logs once for every EMAIL line in the cfg file. Please let me know if this can be avoided.

Regards

Argument "1,329" isn't numeric in printf at tail_n_mail line 3803.

The email from cron contained this one-liner.
Using tail_n_mail 3.2.0.
The TNM email was still sent, at the same timestamp as the cron email; its "Matches" line corresponds to the error:

Date: Sat Jan 30 08:15:02 2021 CST
Host: XXXX
Unique items: 1
Matches from 20210130.log: 1,329
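
A hypothetical illustration of the warning (my sketch): a comma-formatted count fed to a numeric printf is not numeric, so stripping the separators (or adding them only at output time) avoids it.

    my $n = '1,329';
    (my $clean = $n) =~ tr/,//d;
    printf "Matches: %d\n", $clean;   ## prints 1329 with no warning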

Thanks.

Tail_n_mail not excluding the string(s)

Hello,

I am using tail_n_mail-1.26.0-1 for my postgres systems. I have two conf files, one for emailing and one for paging, and two cron jobs that run at 10-minute intervals, one using email.conf and one using oncall.conf.

I have a lot of INCLUDE: and EXCLUDE: lines in the conf files, but it doesn't seem to be excluding the lines/strings. Can you please tell me what I am doing wrong? I really appreciate your help.

Thanks
Smitha

Here is the sample of the conf file.
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
INCLUDE: SIGHUP
INCLUDE: temporary file: path
INCLUDE: archive command failed
INCLUDE: could not be archived
INCLUDE: transaction ID wrap limit
INCLUDE: must be vacuumed within
INCLUDE: terminated
INCLUDE: Segmentation fault
INCLUDE: is missing
EXCLUDE: database ".+" does not exist
EXCLUDE: INFO:
EXCLUDE: CONTEXT:
EXCLUDE: PL/pgSQL function
EXCLUDE: NOTICE:
EXCLUDE: function inline_code_block
EXCLUDE: skipping
EXCLUDE: CONTEXT: SQL statement
EXCLUDE: trust_for_each_subscriber
EXCLUDE_PREFIX: CONTEXT:
EXCLUDE_PREFIX: NOTICE:
EXCLUDE_PREFIX: PL/pgSQL function

EXCLUDE_NON_PARSED doesn't work anymore

Hello,

In tnm 1.31.3, I used to have these exclude_non_parsed directives in my config file:
EXCLUDE_NON_PARSED: (records in|records out)$
EXCLUDE_NON_PARSED: bytes .* copied,

Now, the same directives don't work in version 2.3.1.

Any thoughts?

Regards

When using an authenticated flow, tail_n_mail generates errors

Following a switch to an authenticated flow, tail_n_mail generated this error:
Failed to send mail (from): 5.7.3 STARTTLS is required to send mail

I modified the code to make it work again.

1/ No SASL mechanism found
at /usr/share/perl5/vendor_perl/Authen/SASL.pm line 77.
at /usr/share/perl5/Net/SMTP.pm line 165.
=> SOLUTION: comment out the following lines:

    ## Attempt to authenticate
    if (not $smtp->auth($mailuser, $mailpass)) {
        die 'Failed to authenticate to mail server: ' . $smtp->message;
    }

2/ Failed to send mail (from): 5.7.3 STARTTLS is required to send mail
=> SOLUTION: add the following 2 lines just before "Prepare to send the message":

    $smtp->starttls();                  ## make sure to install IO::Socket::SSL
    $smtp->auth($mailuser, $mailpass);  ## make sure to install Authen::SASL

    ## Prepare to send the message
    $smtp->mail($from_addr) or die 'Failed to send mail (from): ' . $smtp->message;
    $smtp->to($emails)      or die 'Failed to send mail (to): '   . $smtp->message;
    $smtp->data()           or die 'Failed to send mail (data): ' . $smtp->message;
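
For reference, the complete working sequence might look like this self-contained sketch (host, port, and credentials are placeholders; Net::SMTP 3.x provides starttls(), which needs IO::Socket::SSL, while auth() needs Authen::SASL):

    use strict;
    use warnings;
    use Net::SMTP;

    my $smtp = Net::SMTP->new('mail.example.com', Port => 587, Timeout => 30)
        or die "Connection failed: $!";
    $smtp->starttls()                           or die 'STARTTLS failed: ' . $smtp->message;
    $smtp->auth('example_user', 'example_pass') or die 'Auth failed: '     . $smtp->message;
    $smtp->mail('tnm@example.com')              or die 'MAIL failed: '     . $smtp->message;
    $smtp->to('dba@example.com')                or die 'RCPT failed: '     . $smtp->message;
    $smtp->data()                               or die 'DATA failed: '     . $smtp->message;
    $smtp->datasend("Subject: tail_n_mail test\n\ntest body\n");
    $smtp->dataend()                            or die 'Send failed: '     . $smtp->message;
    $smtp->quit;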

I haven't changed anything for this one:
3/ Use of uninitialized value in printf at /usr/bin/tail_n_mail line 3789.
==> one of the %s values is not populated:
[1] From D, F, C, I, G, E, L, B, A, J, K, H (between lines 1 of  and 2,284 of , occurs 24,677 times) (21,709, 582, 546, 474, 407, 403, 400, 119, 23, 6, 6, 2)

printf '(between lines %s of %s and %s of %s, occurs %s times)',
$earliest->{line}, $fab{ $earliest->{file} },
$latest->{line}, $fab{ $latest->{file} },
$pcount;

send_smtp_email() parameters bug

Name: john deydier
Email: [email protected]

Hello,
I've found a bug in tail_n_mail.

send_smtp_email($from_addr, $emails, $newsubject, $emailfile);  => the caller passes 4 parameters


sub send_smtp_email {

    ## Send email via an authenticated SMTP connection

    ## For Windows, you will need:
    # perl 5.10
    # http://cpan.uwinnipeg.ca/PPMPackages/10xx/
    # ppm install Net_SSLeay.ppd
    # ppm install IO-Socket-SSL.ppd
    # ppm install Authen-SASL.ppd
    # ppm install Net-SMTP-SSL.ppd

    ## For non-Windows:
    # perl-Net-SMTP-SSL package
    # perl-Authen-SASL

    my ($from_addr,$emails,$tempfile) = @_;  => 3 parameters
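
A minimal sketch of the fix: accept all four arguments, in the order the caller passes them, so the subject is no longer dropped and the temp file no longer misread:

    my ($from_addr, $emails, $newsubject, $tempfile) = @_;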

log_line_prefix starting with a bracket causes an error

ID: 114
Version: unspecified
Date: 2012-09-24 08:35 EDT
Author: Jan Urbański ([email protected])


The code responsible for capturing log lines from system messages does not
escape metacharacters before trying to interpret the constructed string as a
regexp.

The attached patch fixes this problem. A test case for this issue and #12 has
been attached to #12.

Option to combine EXCLUDE_PREFIX and EXCLUDE

There are cases where I want to exclude certain errors depending on the host or app of the client. One common example: I want to ignore "syntax error" errors if they come from our office or VPN subnet, since those will be developers having fat fingers or guessing at syntax. But if those errors come from the data center subnet, then it's probably in app code and we'll want to know.

I confirmed with @turnstep that right now this is not possible, but basically I'd want to AND the EXCLUDE_PREFIX and EXCLUDE options somehow; a possible interim workaround is sketched below.
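
Until such an option exists, one hypothetical workaround (assuming the client host appears before the error text on each log line, and using a made-up office subnet) is a single EXCLUDE whose regex requires both conditions:

EXCLUDE: host=10\.1\.\d+\.\d+.*syntax error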

False alarms from pgBackRest events

Similar to [1], TnM is generating emails for INFO events logged to the PostgreSQL logs by pgBackRest. These events do not contain any of the INCLUDE keywords, so I'm wondering if the format of the log entries is triggering the emails, since the example below reports LINE_TRUNCATED.

Attempts with EXCLUDE have not prevented the alerts. Changing the log_line_prefix in postgresql.conf has also been unsuccessful.

For example, note that the event does not have a time zone (GMT) with the timestamp, as standard Postgres log entries do (2020-07-16 15:00:04 GMT ...). This is an email from tail_n_mail for a period in which there are no error conditions in the log:

[2] (from lines 5 - 14636)
2020-07-16 23:05:33.171 P00   INFO: pushed WAL file '000000010000430A000000E7' to the archive asynchronously
2020-07-16 23:05:33.171 P00   INFO: archive-push command end: completed successfully (102ms)
2020-07-16 23:05:33.924 P00   INFO: archive-push command begin 2.27: [pg_wal/000000010000430A000000E8] --archive-async --archive-timeout=90 --compress-type=lz4 --log-level-console=info --log-level-f
[LINE TRUNCATED, original was 651,208 characters and 3,007 lines long]

tnm.config:

LOG_LINE_PREFIX: %t P%p
MAILSUBJECT: TnM - HOST Postgres errors UNIQUE : NUMBER

INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
EXCLUDE: canceling autovacuum
EXCLUDE: INFO: archive-push
EXCLUDE: INFO: pushed

FILE1: /localdata/pgdata11/log/LATEST
LASTFILE1: /localdata/pgdata11/log/postgresql-20200716_145958.log
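
Since the pgBackRest lines do not match the LOG_LINE_PREFIX, they may never reach the normal EXCLUDE stage; one avenue worth testing (syntax assumed from the EXCLUDE_NON_PARSED directive mentioned in another issue on this page) is:

EXCLUDE_NON_PARSED: INFO: (archive-push|pushed WAL)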

[1] https://bucardo.org/pipermail/tnm/2018-August/000046.html
