Splunk Telegram

This app lets you run Splunk against messages from Telegram groups and generate graphs based on the activity in them.

Splunk Telegram includes a Natural Language Processing (NLP) module which lets you extract features such as sentiment and named entities.

This app is based on my other app, Splunk Lab, which is a general Splunk platform built for ingesting data on an ad-hoc basis. You should check it out!

Screenshots

Requirements

  • Docker
  • HTML exports from a Telegram conversation, channel, or group.
    • Exporting is explained further below

Usage

  • The first step is to convert Telegram's HTML into JSON that Splunk can understand:
    • bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-telegram/master/1-telegram-html-to-json.sh) path/to/telegram-export/messages\*.html > logs/Group-Name.json
  • Then, run Splunk:
    • SPLUNK_START_ARGS=--accept-license bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-telegram/master/2-start-splunk.sh)
    • You'll be presented with a list of options to confirm. If you like, change your environment variables and re-run; otherwise, press ENTER to launch Splunk.

By default, Splunk will be listening at https://localhost:8000/.
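
Before running the two commands from the steps above, it can help to confirm that the requirements are actually met. The following is a minimal sketch and not part of splunk-telegram: the function names `has_export_files` and `preflight` are hypothetical, and it assumes the export files follow the `messages*.html` naming used in the conversion command.

```shell
# Hypothetical helpers (not shipped with splunk-telegram).

# Succeeds if at least one messages*.html file exists in directory $1.
has_export_files() {
    for f in "$1"/messages*.html; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Prints one line per problem found; returns non-zero if any check fails.
preflight() {
    export_dir="$1"
    rc=0
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found in PATH"
        rc=1
    fi
    if ! has_export_files "$export_dir"; then
        echo "no messages*.html files in $export_dir"
        rc=1
    fi
    return "$rc"
}
```

For example, `preflight path/to/telegram-export && echo "ready"` prints `ready` only when Docker is on the PATH and the export directory contains at least one matching file.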

Exporting Data From Telegram

Telegram has a blog post which explains how to export data. However, if you follow those instructions, everything will be exported, a process which can take hours. Instead, we recommend that you export a single channel, group, or conversation at a time. This can be done in the Telegram Desktop app by going into the conversation or group and manually exporting it.

This will save the conversation in Telegram's own HTML format, which we can then parse to extract messages.

Licensing

Splunk has its own license. Please abide by it.

The Docker image ships with the NLP Text Analytics app, which is licensed under the MIT License.

TODO/Bugs

  • Only regular messages are supported at this time. If a photo or sticker is found, a note will be made that it was a photo of a specified size. No other media types (including stickers) are supported at this time.
  • Forwarded messages are not counted/supported at this time.
  • Messages that are imported must be in the current directory because of how Docker mounts directories.
    • I may revisit this in the future and instead take a directory as a value to Docker's -v argument.
  • I need to add Development instructions and possibly revisit that
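
Until the current-directory limitation noted above is lifted, one workaround is to copy an export into the directory you run the commands from. This is a minimal sketch under assumptions: `stage_export` and the default folder name `telegram-export` are hypothetical, and it assumes that subdirectories of the current directory are visible inside the container's mount.

```shell
# Hypothetical helper (not shipped with splunk-telegram).
# Usage: stage_export SRC_DIR [DEST_NAME]
# Copies SRC_DIR/messages*.html into ./DEST_NAME and prints that path.
stage_export() {
    src="$1"
    dest="./${2:-telegram-export}"
    mkdir -p "$dest" || return 1
    # Fails (non-zero) if SRC_DIR contains no matching files.
    cp "$src"/messages*.html "$dest"/ || return 1
    echo "$dest"
}
```

The printed path can then be used in place of `path/to/telegram-export` in the conversion command.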

Contact

My email is [email protected]. I am also @dmuth on Twitter and Facebook!
