telegram-stats-bot's Introduction

telegram-stats-bot

LGPLv3 License

Telegram-stats-bot is a simple bot that lives in your Telegram group, logging messages to a Postgresql database and serving statistical tables and plots to users as Telegram messages.

Note: Version 0.8.0 adds a number of behind-the-scenes improvements and dependency bumps in preparation for a version 1.0 release. Stay tuned for a new interface using the inline keyboard bot functionality!

Bot conversation example

Introduction

This software is intended to be run on a server, handling updates for a bot user with a single bot per channel (multi-channel support could be added at some point if there is interest), using the excellent python-telegram-bot library.

The bot is still in active development, but at the moment it features:

  • Message logging to Postgresql database with optional JSON file backup
  • Statistics output for users in the group as Telegram messages, with optional filtering by date or limiting to the querying user. Some statistics are more useful than others, but they are mainly intended to be fun for users to play with.
    • Tables:
      • Most active users
      • A user's message time correlation with other users
      • A user's median message time difference with other users
    • Plots:
      • Message activity by hour of day
      • Message activity by day of week
      • Message activity over the week by hour and day
      • Message activity history

Basic Requirements

  • Python 3.8+
  • A Telegram bot token with privacy mode disabled (needed to log messages)
    • See here for details
  • Postgresql (Tested with 12.3, but there shouldn't be anything that won't work with 9.4 or up)
    • This can be on a different system than telegram-stats-bot. It requires either permission to create tables in the database, or a database that has been pre-initialized following the setup in db.py

Installation

The easiest way to install or upgrade is with pip:

$ pip install telegram-stats-bot --upgrade

This works directly from the git repository as well:

$ pip install --upgrade git+https://github.com/mkdryden/telegram-stats-bot

Or you can install an entire venv for development using poetry:

$ git clone https://github.com/mkdryden/telegram-stats-bot.git
$ cd telegram-stats-bot
$ poetry install

If you want to be able to run the unit tests, you must install the test dependencies as well, and postgresql must be available in your PATH:

$ poetry install --with test

Docker

A Docker image is available under mkdryden/telegram-stats-bot, and a sample docker-compose.yml, including database setup, is in the root of the repository. Be sure to set the TZ, BOT_TOKEN, and CHAT_ID environment variables appropriately in your docker run command or the docker-compose.yml file.

Setup

Once installed, you can run the bot by calling the main module with a few required arguments:

$ python -m telegram_stats_bot.main BOT_TOKEN CHAT_ID POSTGRESQL_URL
  • BOT_TOKEN: Your bot's token e.g., 110201543:AAHdqTcvCH1vGWJxfSeofSAs0K5PALDsaw
  • CHAT_ID: The chat id to monitor (a large integer, possibly negative; if it is unknown, set it to 0 and see below)
  • POSTGRESQL_URL: Connection information in the form: postgresql://USERNAME:PASSWORD@ADDRESS/DB_NAME
    • If DB_NAME exists, it must not contain tables called messages_utc, user_events, or user_names with incorrect columns
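Connection URLs generally expect reserved characters such as @, : and / in the user name or password to be percent-encoded, otherwise the URL is split in the wrong place. As a minimal sketch using only the Python standard library (postgres_url is a hypothetical helper, not part of telegram-stats-bot), a safe POSTGRESQL_URL could be assembled like this:

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, db_name: str) -> str:
    """Build a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() encode "/" as well, which it leaves alone by default
    return "postgresql://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, db_name
    )


print(postgres_url("telegram", "Cool@Pass/word", "localhost", "telegram_bot"))
# postgresql://telegram:Cool%40Pass%2Fword@localhost/telegram_bot
```

A password without reserved characters, such as CoolPassword, passes through unchanged.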

Two optional arguments exist as well:

  • --json-path: Specifying a path here will log messages to JSON files in addition to the database. If only a prefix is specified, they will be saved under that prefix in your platform's preferred app data directory. This was mostly for development purposes and is not necessary in normal use.
  • --tz: Specify a tz database time zone string here (e.g., America/New_York) to return statistics queries in this time zone. (Defaults to Etc/UTC)

A complete command might look like:

$ python -m telegram_stats_bot.main --tz="America/Toronto" "110201543:AAHdqTcvCH1vGWJxfSeofSAs0K5PALDsaw" "-1234567890" "postgresql://telegram:CoolPassword@localhost/telegram_bot"

On startup, the bot will attempt to create the database and tables, if they do not already exist. If you do not know the chat's id and have set it to 0 as mentioned above, you can send the /chatid command inside the group, and the bot will reply with it, then restart the bot with the id. If you have forgotten to disable privacy mode, an error will be logged in the terminal.

The bot will now log all messages in the group, but will only respond to users who have sent a message that has been logged previously (and this list is only updated once an hour, so if you're impatient, you can restart the bot after you've sent a message to trigger the update). You can check whether messages are being logged correctly by reviewing the terminal output. You should see a line like 2020-06-04 02:08:39,212 - __main__ - INFO - 8 whenever a message is logged.

Importing Data

Data can be imported from JSON dumps created by the desktop client. Click the three-dot menu button inside the desired group and select "Export chat history". Make sure you select JSON as the output format. You can also limit the date range as desired. The database will be updated and existing messages will remain, so you can use this feature to fill in gaps when the bot was not running.

To import data, simply call:

$ python -m telegram_stats_bot.json_dump_parser "/some/path/to/dump.json" "postgresql://telegram:CoolPassword@localhost/telegram_bot" --tz="America/Toronto"

Here, the first argument is the path to the JSON dump, the second is the database connection string (as above), and the optional --tz argument should be the time zone of the system used to create the dump.

This can be run without stopping a running bot. It also attempts to set the user id to user name mapping, so it will add an extra entry for every user in the dump (this currently only affects the user stats related to user name changes). Before you run this, make sure your database string is correct, or you might accidentally mess up other databases on the same server.
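Before importing, it can be worth checking what a dump actually contains. The sketch below assumes the Telegram Desktop JSON export layout (a top-level "messages" list whose entries carry "type" and "from" fields, with "from_id" as a fallback); senders_in_dump is a hypothetical helper that only counts regular messages per sender and never touches the database.

```python
from collections import Counter


def senders_in_dump(dump: dict) -> Counter:
    """Count regular messages per sender in a Telegram Desktop JSON export.

    Assumes the export layout described above. Entries whose type is not
    "message" (e.g. "service" entries for joins or title changes) are skipped.
    """
    return Counter(
        m.get("from") or m.get("from_id")  # fall back to from_id (assumption)
        for m in dump.get("messages", [])
        if m.get("type") == "message"
    )
```

For example, senders_in_dump(json.load(open("/some/path/to/dump.json", encoding="utf-8"))).most_common(10) lists the ten most active senders in the dump, which makes a wrong or truncated export easy to spot.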

Fetching Stats

To fetch stats, simply message the bot, either inside the group being logged or in a direct message, using the /stats command. /stats with no arguments prints the table of most active users; other statistics are available through various subcommands. All commands are documented, and the built-in help can be displayed with /stats -h or /stats <subcommand> -h.

Most commands have optional arguments that change the behaviour of the output. Nearly all have:

  • -start and -end followed by a timestamp (e.g., 2019, 2019-01, 2019-01-01, "2019-01-01 14:21") specify the range of data to fetch, otherwise all available data will be used. Either or both options can be given.
  • -lquery followed by a lexical query (using Postgres' tsquery syntax) limits results to matching messages.
  • -me calculates statistics for the user sending the command, rather than all chat users.

Sample outputs of each available subcommand follow.

counts

/stats counts returns a list of the most active users in the group.

User                 Total Messages  Percent
@ACoolUser                    42150      7.0
@NumberOne                    37370      6.2
@WinstonChurchill             32668      5.4
@AAAAAAA                      32134      5.4
@WhereAreMyManners            30481      5.1
@TheWorstOfTheBest            28705      4.8
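The Percent column appears to be each user's share of all logged messages (the figures above are consistent with that reading, but the exact query is an assumption here). A minimal Python sketch of the same table, computed from a plain list of senders rather than from the database (count_table is a hypothetical helper):

```python
from collections import Counter


def count_table(senders):
    """Return (user, total, percent) rows, most active first.

    senders: one entry per logged message, e.g. ["@a", "@b", "@a"].
    Percent is the share of all messages, rounded to one decimal place.
    """
    counts = Counter(senders)
    total = sum(counts.values())
    return [(user, n, round(100 * n / total, 1)) for user, n in counts.most_common()]


for user, n, pct in count_table(["@a", "@b", "@a", "@a"]):
    print(f"{user:<6}{n:>6}{pct:>8}")
```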

count-dist

/stats count-dist returns an ECDF (empirical cumulative distribution function) plot of the users in the group by message count.

Example of count-dist plot
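An ECDF maps each message count x to the fraction of users who have sent at most x messages. A minimal sketch of that computation in plain Python (ecdf is a hypothetical helper; the bot itself draws the plot from database queries):

```python
def ecdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)
    points = {}
    for i, x in enumerate(xs):
        # For tied values the last (largest) index wins, so F(x) counts all ties
        points[x] = (i + 1) / n
    return sorted(points.items())


# Per-user message totals, e.g. taken from the counts table
print(ecdf([42150, 37370, 32668, 32134, 30481, 28705]))
```

Plotted as a step function, the curve rises steeply where many users share similar counts and flattens where a few very active users stand apart.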

hours

/stats hours returns a plot of message frequency for the hours of the day.

Example of hours plot

days

/stats days returns a plot of message frequency for the days of the week.

Example of days plot

week

/stats week returns a plot of total messages over the data period by day of week and hour of day.

Example of week plot
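The hours, days, and week plots all come down to bucketing message timestamps by local hour and weekday in the configured time zone. A rough sketch, assuming timezone-aware UTC datetimes and, for simplicity, a fixed UTC offset (unlike a tz database zone such as America/Toronto, a fixed offset ignores daylight saving time); the function names are hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def week_grid(timestamps_utc, offset_hours):
    """Count messages per (weekday, hour) in local time; weekday 0 is Monday."""
    local = timezone(timedelta(hours=offset_hours))
    grid = Counter()
    for ts in timestamps_utc:
        t = ts.astimezone(local)
        grid[(t.weekday(), t.hour)] += 1
    return grid


def by_hour(grid):
    """Collapse the grid to the 'hours' view by summing over weekdays."""
    hours = Counter()
    for (_, hour), n in grid.items():
        hours[hour] += n
    return hours


def by_day(grid):
    """Collapse the grid to the 'days' view by summing over hours."""
    days = Counter()
    for (day, _), n in grid.items():
        days[day] += n
    return days


# 2020-06-04 02:08 UTC (a Thursday) is Wednesday 22:08 at UTC-4
grid = week_grid([datetime(2020, 6, 4, 2, 8, tzinfo=timezone.utc)], -4)
print(grid)  # Counter({(2, 22): 1})
```

The example shows why the --tz setting matters: the same message lands on a different day and hour depending on the zone used for bucketing.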

history

/stats history returns a plot of messages versus date.

Example of history plot

titles

/stats titles returns a plot of group titles over time.

Example of title history plot

user

/stats user returns basic statistics for the user.

Messages sent: 16711
Average messages per day: 12.31
First message was 1357.22 days ago.
Usernames on record: 3
Average username lifetime: 452.41 days

joined on 2017-10-01 16:11:08-04:00

corr

/stats corr returns a list of users with the highest and lowest message time correlations with the requesting user.

User Correlations for @TheManWhoWasThursday
HIGHEST CORRELATION:
@MyGoodFriend         0.335
@Rawr                 0.302
@MangesUnePoutine     0.284
@GreenBlood           0.251
@TooMuchVacuum        0.235

LOWEST CORRELATION:
@Shiny                0.146
@BlueDog              0.142
@CoolCat              0.122
@EatMe                0.116
@JustPassingBy        0.106
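The README does not spell out how the correlation is computed. One plausible approach, shown purely as an illustrative sketch, is to bin each user's messages by hour of day and take the Pearson correlation between the requesting user's histogram and everyone else's. The helper names are hypothetical, and the bot's real query may bin or normalise differently.

```python
import math


def hourly_counts(hours):
    """Histogram of message hours (0-23) into 24 bins."""
    counts = [0] * 24
    for h in hours:
        counts[h] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; treated as 0 here (assumption)
    return cov / (sx * sy)


def rank_correlations(user_hours, me):
    """Sort other users by how closely their hourly activity tracks me's."""
    mine = hourly_counts(user_hours[me])
    scores = {
        other: pearson(mine, hourly_counts(hours))
        for other, hours in user_hours.items()
        if other != me
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank_correlations({"me": [9, 9, 10, 21], "a": [9, 10, 10, 21], "b": [3, 4, 5, 15]}, "me"))
```

The head of the sorted list corresponds to the HIGHEST CORRELATION block and the tail to the LOWEST CORRELATION block.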

delta

/stats delta returns a list of users with the shortest differences in message times with the requesting user.

Median message delays for @KingLeer and:
@PolyamorousPasta     00:03:23
@AggressiveArgon      00:04:43
@AdjectiveNoun        00:08:27
@SuperSalad           00:09:05
@ABoredProgrammer     00:09:06
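The exact definition of the delay is not given here. One plausible reading, sketched below as an assumption rather than the bot's actual query, is: walk through the chat's messages in time order, record the gap whenever the requesting user and someone else post back to back (in either order), and report the median gap per other user. median_deltas is a hypothetical helper working on (seconds, sender) pairs.

```python
from collections import defaultdict
from statistics import median


def median_deltas(messages, user):
    """messages: (timestamp_seconds, sender) pairs. Returns [(other, median_gap)]."""
    gaps = defaultdict(list)
    ordered = sorted(messages)
    for (t_prev, u_prev), (t_next, u_next) in zip(ordered, ordered[1:]):
        if u_prev == u_next:
            continue  # same sender twice in a row: not an exchange between two users
        if u_prev == user:
            gaps[u_next].append(t_next - t_prev)
        elif u_next == user:
            gaps[u_prev].append(t_next - t_prev)
    # Shortest median delay first, as in the sample output
    return sorted(((other, median(g)) for other, g in gaps.items()), key=lambda p: p[1])


msgs = [(0, "A"), (60, "B"), (180, "A"), (200, "C"), (260, "A")]
print(median_deltas(msgs, "A"))
```

To match the HH:MM:SS display, a median of 203 seconds can be formatted with str(datetime.timedelta(seconds=203)), which gives 0:03:23.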

types

/stats types returns a table of messages by type, comparing the requesting user with the full group.

Messages by type, @AUser vs group:
      type  Group Count  Group Percent  User Count  User Percent
      text     528813.0           88.3     13929.0          83.4
   sticker      34621.0            5.8      1226.0           7.3
     photo      25995.0            4.3      1208.0           7.2
 animation       6983.0            1.2       274.0           1.6
     video       1325.0            0.2        48.0           0.3
     voice        475.0            0.1         2.0           0.0
  location        252.0            0.0         2.0           0.0
video_note         84.0            0.0         1.0           0.0
     audio         62.0            0.0         1.0           0.0
      poll         29.0            0.0         1.0           0.0
  document          1.0            0.0         1.0           0.0
     Total     598640.0          100.0     16693.0         100.0

words

/stats words returns a table of the most commonly used lexemes (normalized word stems, which is why entries such as "realli" and "peopl" appear).

Most frequently used lexemes:
    Lexeme  Messages  Uses
      like      1265  1334
      well       753   765
    actual       628   645
      make       600   619
      yeah       609   609
      mean       544   553
     thing       473   490
    realli       472   482
    though       467   470
     peopl       415   445
     think       425   433
      know       403   409
      need       396   408
      time       371   389
      want       354   371
     would       345   366
      much       345   357
   probabl       348   356
      even       331   338
     stuff       318   332
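Messages and Uses look like the two counters that PostgreSQL's ts_stat function reports for each lexeme: the number of documents (here, messages) containing it and its total number of occurrences, which is why Uses is never smaller than Messages. A simplified sketch of those two counts in Python, assuming naive lowercase word splitting; unlike PostgreSQL's english text search configuration, it does no stemming and does not drop stop words (word_stats is a hypothetical helper):

```python
import re
from collections import Counter


def word_stats(messages):
    """Return (word, messages containing it, total uses), most used first.

    Naive tokenisation: lowercase, then split on non-word characters.
    """
    ndoc, nentry = Counter(), Counter()
    for text in messages:
        words = re.findall(r"\w+", text.lower())
        nentry.update(words)  # every occurrence counts towards Uses
        ndoc.update(set(words))  # each message counts once towards Messages
    return sorted(((w, ndoc[w], nentry[w]) for w in nentry), key=lambda r: (-r[2], r[0]))


print(word_stats(["Like like", "like it", "it"]))
```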

random

/stats random prints a random message from the database.

The Future

Telegram-stats-bot is a work in progress. New stats will be added, but there is no guarantee that the database structure will stay constant if Telegram's message structure changes or I need to change something to make a new statistic work.

License

Telegram-stats-bot is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License v3.0 or later. Derivative works must also be redistributed under the GPL v3 or later.

telegram-stats-bot's People

Contributors

dependabot[bot], dinosaurtirex, mkdryden

telegram-stats-bot's Issues

Titles plot fails for some image dimensions

Apparently /stats titles fails for chats with a small number of titles, producing an oddly shaped image:

telegram-stats-bot_1  | Traceback (most recent call last):
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/ext/utils/promise.py", line 96, in run
telegram-stats-bot_1  |     self._result = self.pooled_function(*self.args, **self.kwargs)
telegram-stats-bot_1  |   File "/usr/src/app/telegram_stats_bot/main.py", line 173, in print_stats
telegram-stats-bot_1  |     context.bot.send_photo(chat_id=update.effective_chat.id, photo=image)
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 134, in decorator
telegram-stats-bot_1  |     result = func(*args, **kwargs)
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 753, in send_photo
telegram-stats-bot_1  |     return self._message(  # type: ignore[return-value]
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/ext/extbot.py", line 203, in _message
telegram-stats-bot_1  |     result = super()._message(
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 344, in _message
telegram-stats-bot_1  |     result = self._post(endpoint, data, timeout=timeout, api_kwargs=api_kwargs)
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 299, in _post
telegram-stats-bot_1  |     return self.request.post(
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/utils/request.py", line 359, in post
telegram-stats-bot_1  |     result = self._request_wrapper('POST', url, fields=data, **urlopen_kwargs)
telegram-stats-bot_1  |   File "/usr/local/lib/python3.9/site-packages/telegram/utils/request.py", line 279, in _request_wrapper
telegram-stats-bot_1  |     raise BadRequest(message)
telegram-stats-bot_1  | telegram.error.BadRequest: Photo_invalid_dimensions
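The Telegram Bot API documentation for sendPhoto limits photos to a width plus height of at most 10000 pixels and a width-to-height ratio of at most 20, so a very wide or very flat plot is rejected with this error. A hypothetical pre-flight check (not part of the bot) could flag such images before sending:

```python
def photo_dims_ok(width: int, height: int) -> bool:
    """Check sendPhoto's documented size rules for a width x height image."""
    if width <= 0 or height <= 0:
        return False
    ratio = max(width, height) / min(width, height)
    return width + height <= 10000 and ratio <= 20


print(photo_dims_ok(800, 600))   # True
print(photo_dims_ok(100, 3000))  # False: ratio of 30 exceeds 20
```

An image that fails the check could then be padded to a less extreme shape, or sent as a document instead.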

json import: ignore bot messages

Currently, the import script includes messages sent by the bot, which are ignored during normal operation. It should accept an argument to exclude these.
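Until such an argument exists, one workaround is to strip the bot's messages from the dump before importing it. A minimal sketch, assuming the Telegram Desktop export stores each sender as a from_id string such as "user123456789" (drop_sender is hypothetical and the id is a placeholder):

```python
def drop_sender(dump: dict, from_id: str) -> dict:
    """Return a copy of an export dump without messages whose from_id matches."""
    kept = [m for m in dump.get("messages", []) if m.get("from_id") != from_id]
    # Shallow copy: other top-level fields (chat name, id, ...) are kept as-is
    return {**dump, "messages": kept}
```

The filtered dict can be written back with json.dump(..., ensure_ascii=False) and the resulting file passed to json_dump_parser as usual; the original dump is left unmodified.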

ImportError: cannot import name 'Animation' from 'telegram'

After running your bot for over 7 months without problems, I can't start it anymore. I literally changed nothing.

I reinstalled as stated in the README, and I even tried the poetry way after I couldn't get it fixed, but nothing.

On Python 3.8:

ImportError: cannot import name 'TelegramError' from 'telegram' (....

on Python 3.9:

ImportError: cannot import name 'Animation' from 'telegram' (...

Feature Request: Allow for multiple chats

Most chat owners who would find this useful have multiple chats. I personally have a docker-compose stack of 38 of these containers running to gather statistics on a whole slew of chats I'm interested in, admin, etc. It'd be useful for the bot to support multiple chats in some manner.

Somehow some words are changed in /stats words

Hey, I'm using the latest version of your awesome bot on Python 3.8.4 on a Raspberry Pi 3B+ with an external Postgres 12. So far the bot has been working well since yesterday and stats are being collected. What I've noticed is that the output of /stats words has some weird results. I'll give two examples:

Text written in Chat: beautiful
Stats output: beauti

or
Text written in Chat: widgy
Stats output: widgi

shorten usernames in /stats view (or display the displayname instead)?

Hey, sorry to bother you, but I love your bot so much lol.

Usually the /stats output isn't very readable on mobile phones, basically because long @usernames make it look odd.

photo_2023-04-18_10-33-17

In most cases users' display names are a bit shorter (just a nickname), so would it be possible to show those instead? Maybe also add a maximum length of, idk, 15 letters, after which the name is shortened with [...] or something. I don't mind changing the code myself; as you might have noticed, I also edited the column headers a bit to shorten them, but this one long username messes up the look. And I am not that much of a pro at coding.

Have a nice day! Thanks for the work on this one.
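The truncation part of this request is straightforward to sketch. A hypothetical helper using the 15-character limit and [...] marker suggested above (nothing here is part of the bot's current code):

```python
def shorten(name: str, max_len: int = 15, marker: str = "[...]") -> str:
    """Truncate name to at most max_len characters, ending in marker if cut.

    Assumes max_len is larger than len(marker).
    """
    if len(name) <= max_len:
        return name
    return name[: max_len - len(marker)] + marker


print(shorten("@WhereAreMyManners"))  # @WhereAreM[...]
print(shorten("@Rawr"))  # @Rawr
```

Because the marker counts towards the limit, every shortened name is exactly max_len characters long, which keeps table columns aligned.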

Not working when connecting to my cloud db

python -m telegram_stats_bot.main BOT_TOKEN CHAT_ID POSTGRESQL_URL

When I run this command against my local machine, it connects and works, but when connecting to my cloud DB it connects and then doesn't respond to commands or read messages.

Usernames issue (not updating on import causing duplicated userID entries)

Hello @mkdryden

After importing the chat history, the usernames are not updated by user ID accordingly, so the stats are displayed incorrectly (i.e., with empty fields).
Telegram 2023-10-24 at 20 40 43@2x

When the user appears in chat, and writes some messages the username gets updated (by UserID), but this action is leading to user record duplication
TablePlus 2023-10-24 at 21 55 37@2x

Here is stdout in container when running /stats corr
Safari 2023-10-24 at 21 57 26@2x

Why is this program that enjoyable?

Hello, I found some issue here. When I added this bot to my chat, I feel like this program is too good.

To replicate

  1. Add this bot to your chat
  2. Enjoy

I just wanted to send a letter of appreciation. I like this bot and I'm using it for ethical collection of data. Thanks for your open-source program!

/stats user doesn't work anymore

Hey, I'm running the latest Docker image. Unfortunately, the /stats user command is broken right now.

This is what the Error log looks like:

2023-04-17 17:11:45,690 - telegram.ext.dispatcher - ERROR - No error handlers are registered, logging exception.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/telegram/ext/utils/promise.py", line 96, in run
    self._result = self.pooled_function(*self.args, **self.kwargs)
  File "/usr/src/app/telegram_stats_bot/main.py", line 169, in print_stats
    context.bot.send_message(chat_id=update.effective_chat.id,
  File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 134, in decorator
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 534, in send_message
    return self._message(  # type: ignore[return-value]
  File "/usr/local/lib/python3.9/site-packages/telegram/ext/extbot.py", line 203, in _message
    result = super()._message(
  File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 344, in _message
    result = self._post(endpoint, data, timeout=timeout, api_kwargs=api_kwargs)
  File "/usr/local/lib/python3.9/site-packages/telegram/bot.py", line 299, in _post
    return self.request.post(
  File "/usr/local/lib/python3.9/site-packages/telegram/utils/request.py", line 361, in post
    result = self._request_wrapper(
  File "/usr/local/lib/python3.9/site-packages/telegram/utils/request.py", line 279, in _request_wrapper
    raise BadRequest(message)
telegram.error.BadRequest: Can't parse entities: can't find end of italic entity at byte offset 9

Support to non-ascii-safe names

Hi, I have started using this bot in a personal group chat with friends, and we've noticed it completely mangles the name of one of them. His name happens to contain a ç, and that character was completely removed from his logged entry.

I have investigated the code and I've stumbled upon the following line:

df['User'] = df['User'].str.replace(r'[^\x00-\x7F]|[@]', "", regex=True)  # Drop emoji and @

The comment says it drops emoji and the @ symbol. I'm not sure why that's even necessary, but it's doing way more than just dropping emoji: it drops everything outside the ASCII range. So there are no Latin-alphabet extensions like é ü ø æ, and no support at all for non-Latin scripts like Cyrillic, Greek, Arabic, Chinese, etc.

Is there a particular reason for this line to exist?
Python and Postgres should support UTF8 just fine...
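A narrower filter could drop only the @ and emoji-like symbols while keeping letters from every script. The sketch below is a hypothetical alternative, not the project's code: it removes characters in the Unicode categories So (Symbol, other), which covers most emoji, and Sk (Symbol, modifier), which includes the emoji skin-tone modifiers, plus the zero-width joiner and variation selector-16 that glue multi-part emoji together. Everything else is kept.

```python
import unicodedata

# Zero-width joiner and variation selector-16, used inside multi-part emoji
_EMOJI_GLUE = {"\u200d", "\ufe0f"}
# "So" covers most emoji; "Sk" includes skin-tone modifiers (and a few accents like ^)
_DROP_CATEGORIES = {"So", "Sk"}


def clean_name(name: str) -> str:
    """Drop '@' and emoji-like symbols but keep letters from any script."""
    kept = (
        ch
        for ch in name
        if ch != "@"
        and ch not in _EMOJI_GLUE
        and unicodedata.category(ch) not in _DROP_CATEGORIES
    )
    return "".join(kept).strip()


print(clean_name("@Fran\u00e7ois \U0001F389"))  # François
```

Applied to a pandas column, this would replace the regex line with df['User'] = df['User'].map(clean_name).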

I GET THIS ERROR WHILE CREATING A TELEGRAM BOT. SOMEONE PLEASE HELP ME 😭

Traceback (most recent call last):
  File "c:\Users\ayush\OneDrive\Desktop\tele\tele.py", line 2, in <module>
    from telegram.ext import *
  File "C:\Users\ayush\AppData\Local\Programs\Python\Python311\Lib\site-packages\telegram\ext\__init__.py", line 21, in <module>
    from .extbot import ExtBot
  File "C:\Users\ayush\AppData\Local\Programs\Python\Python311\Lib\site-packages\telegram\ext\extbot.py", line 24, in <module>
    import telegram.bot
  File "C:\Users\ayush\AppData\Local\Programs\Python\Python311\Lib\site-packages\telegram\bot.py", line 56, in <module>
    from telegram import (
ImportError: cannot import name 'Animation' from 'telegram' (C:\Users\ayush\AppData\Local\Programs\Python\Python311\Lib\site-packages\telegram\__init__.py)

Question: how can i use this for multiple chats?

I'm using the Docker version and it's great for my community's general chat, but we have ~30+ different topic chats and I need statistics on a few of them at the same time, so I can better plan community events by seeing when people are active, etc. How can I do this? Should I list the CHAT_IDs, like -100XXXXXXX, -100XXXXXXX, or should I copy-paste the docker-compose file to run two bots on the same database?

Importing historical stats

Hi,

Nice application, seems really cool.
For me to use it though, I'd really need some ability to import past stats. I'm not seeing support for this being documented, so I'm assuming it's not built in. Could I manually do it somehow though?

2 bugs

Both errors happen on startup of the bot.

First:

2023-11-16 21:36:28,668 - telegram.ext.Application - ERROR - No error handlers are registered, logging exception.
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\raveon-tg-bot\venv\lib\site-packages\telegram\ext\_application.py", line 1195, in process_update
    await coroutine
  File "C:\Users\Administrator\Desktop\raveon-tg-bot\venv\lib\site-packages\telegram\ext\_basehandler.py", line 153, in handle_update
    return await self.callback(update, context)
  File "C:\Users\Administrator\Desktop\raveon-tg-bot\venv\lib\site-packages\telegram_stats_bot\main.py", line 80, in log_message
    bak_store.append_data('user_events', i)
AttributeError: 'NoneType' object has no attribute 'append_data'

Second:

sqlalchemy.exc.ProgrammingError: (psycopg.errors.SyntaxError) cannot insert multiple commands into a prepared statement
[SQL:
                UPDATE user_names
                SET username = %(username)s
                WHERE user_id = %(uid)s AND username IS DISTINCT FROM %(username)s;


                         INSERT INTO user_names(user_id, date, username, display_name)
                             VALUES (%(uid)s, current_timestamp, %(username)s, %(display_name)s);
                         ]
[parameters: {'username': '@fokioff', 'uid': 829706162, 'display_name': 'pavlik'}]
(Background on this error at: https://sqlalche.me/e/20/f405)

import script causes db issues

The import script introduced in b8b8eff (see #10) seems to be changing the types of columns in the db.
I think this has to do with using pandas to do the update.
Specifically it is breaking the user command because it is changing the user_id column to text from integer.
