
slack-archive-bot's Introduction

slack-archive-bot

A bot that can search your Slack message history. It makes it possible to search further back than 10,000 messages.

Requirements

  1. Permission to install new apps to your Slack workspace.
  2. python3
  3. A publicly accessible URL to serve the bot from. (Slack recommends using ngrok to get around this.)

Installation

  1. Clone this repo.

  2. Install the requirements:

     pip install -r requirements.txt
    
  3. If you want to include your existing Slack messages, export your team's Slack history. Download the archive and extract it to a directory. Then run import.py on the directory. For example:

     python import.py export
    

    This will create a file slack.sqlite.

  4. Create a new Slack app.

  • Add the following bot token oauth scopes and install it to your workspace:

    • channels:history
    • channels:read
    • chat:write
    • groups:history (if you want to archive/search private channels)
    • groups:read (if you want to archive/search private channels)
    • im:history
    • users:read
  5. Start slack-archive-bot with:

     SLACK_BOT_TOKEN=<BOT_TOKEN> SLACK_SIGNING_SECRET=<SIGNING_SECRET> python archivebot.py
    

Where SIGNING_SECRET is the "Signing Secret" from your app's "Basic Information" page and BOT_TOKEN is the "Bot User OAuth Access Token" from the app's "OAuth & Permissions" page.

Use python archivebot.py -h for a list of all command line options.

  6. Go to the app's "Event Subscriptions" page and add the URL where slack-archive-bot is being served. The default port is 3333. (For example: http://<ip>:3333/slack/events)
  • Then add the following bot events:

    • channel_rename
    • group_rename (if you want to archive/search private channels)
    • member_joined_channel
    • member_left_channel
    • message.channels
    • message.groups (if you want to archive/search private channels)
    • message.im
    • user_change
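
To confirm that the import in step 3 actually produced data, one option is to list the tables in slack.sqlite along with their row counts. The following is a minimal sketch, not part of the project: the function name summarize_tables is hypothetical, and it reads table names from sqlite_master instead of assuming any particular schema.

```python
import sqlite3


def summarize_tables(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so double-quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Example use against the file created by import.py:
#   conn = sqlite3.connect("slack.sqlite")
#   print(summarize_tables(conn))
```

An empty result, or tables with zero rows, suggests the import did not read any files.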

Deploying Production Server Using WSGI

By default, running python archivebot.py launches a development server, which is not recommended for production use. The following is an example of using Flask and Gunicorn to deploy slack-archive-bot, but it should work equally well with any other WSGI server.

  1. pip install flask gunicorn
  2. SLACK_BOT_TOKEN=<BOT_TOKEN> SLACK_SIGNING_SECRET=<SIGNING_SECRET> gunicorn flask_app:flask_app -c gunicorn_conf.py <other gunicorn args>
  3. flask_app.py provides a thin wrapper around archivebot.app using slack_bolt.adapter.flask.SlackRequestHandler. There are many other adapters provided by bolt. To use them, simply from archivebot import app and wrap app.
  4. gunicorn_conf.py ensures that the local database is updated when the server is started, but that it's not run for each worker.
  5. You can use ARCHIVE_BOT_LOG_LEVEL and ARCHIVE_BOT_DATABASE_PATH to configure slack-archive-bot while running it via gunicorn.

Archiving New Messages

When running, ArchiveBot will continue to archive new messages for any channel it is invited to. To add the bot to your channels:

    /invite @ArchiveBot

This assumes @ArchiveBot is the name you gave your bot user.

Searching

To search the archive, direct message (DM) @ArchiveBot with the search query. For example, sending the word "pizza" will return the first 10 messages that contain the word "pizza". There are a number of parameters that can be provided to the query. The full usage is:

    <query> from:<user> in:<channel> sort:asc|desc limit:<number>

    query: The text to search for.
    user: If you want to limit the search to one user, the username.
    channel: If you want to limit the search to one channel, the channel name.
    sort: Either asc if you want to search starting with the oldest messages,
        or desc if you want to start from the newest. Default asc.
    limit: The number of responses to return. Default 10.
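
To make the query syntax concrete, here is a small sketch of how such a message could be split into free text and parameters. It is an illustration only and is not taken from archivebot.py: the function name parse_search and the returned dictionary keys are hypothetical, while the defaults (sort asc, limit 10) follow the usage text above.

```python
import re

# Matches key:value tokens such as "from:alice" or "limit:20".
_PARAM = re.compile(r"^(from|in|sort|limit):(\S+)$")


def parse_search(text):
    """Split a search message into its free-text query and optional parameters."""
    params = {"query": "", "user": None, "channel": None, "sort": "asc", "limit": 10}
    words = []
    for token in text.split():
        match = _PARAM.match(token)
        if not match:
            # Anything that is not a recognized key:value pair is search text.
            words.append(token)
            continue
        key, value = match.groups()
        if key == "from":
            params["user"] = value
        elif key == "in":
            params["channel"] = value
        elif key == "sort":
            if value.lower() not in ("asc", "desc"):
                raise ValueError("sort must be asc or desc")
            params["sort"] = value.lower()
        else:  # key == "limit"
            params["limit"] = int(value)
    params["query"] = " ".join(words)
    return params
```

With this sketch, parse_search("pizza from:alice limit:5") yields the query "pizza", the user "alice", and a limit of 5, with the remaining fields left at their defaults.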

Migrating from slack-archive-bot v0.1

slack-archive-bot v0.1 used the legacy Slack API which Slack ended support for in February 2021. To migrate to the new version:

  • Follow the installation steps above to create a new slack app with all of the required permissions and event subscriptions.
  • The biggest change in requirements with the new version is the move from the Real Time Messaging API to the Events API which necessitates having a publicly-accessible url that Slack can send events to. If you are unable to serve a public endpoint, you can use ngrok.

Contributing

Contributions are more than welcome, from bug reports to new features. I threw this together to meet my team's needs, but there's plenty I've overlooked.

License

Code released under the MIT license.

slack-archive-bot's People

Contributors

brainrecursion, deluks, docmarionum1, gpkyte, l3oxy, rjbergerud, seanbeaton


slack-archive-bot's Issues

Query results are limited and I can't figure out why

I've been trying to figure this out for a few months now in my spare time, and whatever's going on isn't something I'm seeing my way through.

The command

in:a11y sort:desc limit:750

should search the #a11y channel in reverse chronological order and return the most recent 750 messages. As you can see from the attached, it hasn’t even returned 100.

I’ve checked the database and run the exact same query, and I know there are quite a few more results, dating back for years.

I ran the same query but in ascending order and with a ridiculously high number:

in:a11y limit:50000

and received back slightly more results—about 124.

I have to admit, I’m puzzled!

I’ve attached PDFs of what happened with the two queries above.

mr-atoz-descending-dates.pdf
mr-atoz-ascending-dates.pdf

I originally assumed I had a setting wrong with the DigitalOcean app itself, but their support staff didn't seem to think so:


From: [email protected]

From what we see, in general there are no hard limits that we impose on your App. The data you shared doesn't seem to be related to any limitation in-app platform. There are no limitations from our side on the requests. The only limitations are based on tier which includes the outbound data transfer.

After reviewing your app, we did notice that you are using a basic tier. There are few features that are provided in basic tier. Refer to below link for more insights:

https://docs.digitalocean.com/products/app-platform/#feature-comparison-by-tier

We did check your app's hardware metrics and all seems to be well under the control without any issues!


The database is updating just fine. Everything goes in, but not everything comes back out.

I thought maybe it was cutting off at a number of characters, a number of lines, a number of words, or a number of bytes, but none of that seems to be the case.

I am truly, genuinely stumped. Has anyone else encountered this kind of odd behavior, or (better yet) do you see what painfully obvious thing I'm either missing or doing wrong?

Perplexedly,

Al

Looking for New Maintainers

Is there anyone who would be interested in becoming a maintainer of slack archive bot?

I'm not using Slack anymore in my personal life, so I'm looking to roll back the amount of time I devote to this. I can keep fielding pull requests, but don't have as much time for new feature requests.

Update for Python 3 and slackclient 2

See slackapi/python-slack-sdk#183 (comment).

Oh, it seems you're using the old version of the package now. That is, for some reason you have v1 installed. In v1, the module was also named slackclient and featured a class named SlackClient. In v2, the module name changed to slack (but the package is still named slackclient) and the client was split into WebClient and RTMClient.

Slack API Tester is set off when a message happens in a DM

This has been happening from time to time and I've only now traced it to the slack-archive-bot. When I type a message in a DM, the Slack API Tester rattles off "No results found." See the screenshots below. What might be going on here?

screen shot 2018-05-22 at 3 30 39 pm
screen shot 2018-05-22 at 3 30 20 pm

Clean up codebase

I'm not particularly happy with how messy the codebase is. I'd like to refactor things and add some testing.

Import fails silently upon permission denied

Hello,

When trying to import a slack export on which slack-archive-bot has no permission, the import fails silently: the glob.glob finds no file matching *.json for each channel, and the archive bot thus imports the content of zero files.
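
One way to surface this failure is to check that each channel directory is readable before globbing it. The sketch below is an assumption about how such a check could look, not the project's code: the function name find_channel_files is hypothetical, and only the *.json pattern comes from the report above.

```python
import glob
import os


def find_channel_files(channel_dir):
    """Return the *.json files in channel_dir, failing loudly if it is unreadable."""
    if not os.path.isdir(channel_dir):
        raise FileNotFoundError(f"Not a directory: {channel_dir}")
    # Listing a directory needs both read and execute (search) permission.
    if not os.access(channel_dir, os.R_OK | os.X_OK):
        raise PermissionError(f"Cannot read directory: {channel_dir}")
    pattern = os.path.join(glob.escape(channel_dir), "*.json")
    files = sorted(glob.glob(pattern))
    if not files:
        # An empty channel is legal, but worth a warning instead of silence.
        print(f"Warning: no *.json files found in {channel_dir}")
    return files
```

Raising on an unreadable directory turns an import of zero files into an explicit error.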

Deprecation error when installing requirements

I get this error when installing the requirements in slack-archive-bot and I'm not sure if things are working correctly or not?

Collecting six>=1.10.0 (from -r requirements.txt (line 1))
  Using cached six-1.11.0-py2.py3-none-any.whl
Collecting slackclient>=1.0.4 (from -r requirements.txt (line 2))
  Using cached slackclient-1.0.9.tar.gz
Collecting websocket-client<1.0a0,>=0.35 (from slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached websocket_client-0.44.0-py2.py3-none-any.whl
Collecting requests<3.0a0,>=2.11 (from slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached requests-2.18.4-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests<3.0a0,>=2.11->slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests<3.0a0,>=2.11->slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests<3.0a0,>=2.11->slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests<3.0a0,>=2.11->slackclient>=1.0.4->-r requirements.txt (line 2))
  Using cached certifi-2017.7.27.1-py2.py3-none-any.whl
Installing collected packages: six, websocket-client, urllib3, idna, chardet, certifi, requests, slackclient
  Found existing installation: six 1.4.1
    DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 778, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/var/folders/gq/m3xyqtqs76x264vknt0cg8jr0000gn/T/pip-EmDpZ8-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'


Will slack-archive-bot be updated for Slack's Conversations API?

I've been getting messages from Slack, saying on Feb. 24, 2021 they'll end support for the legacy methods that slack-archive-bot uses. Are there plans to update it? I've come to depend on it and if the plug is about to be pulled I'd best start looking for another solution--if there even is one.

Support using MySQL as storage backend

sqlite works well for smaller workspaces but doesn't scale very well for large workspaces. The sqlite file does grow quite large and queries take a considerable amount of time to complete. Offering the option to use MySQL instead of sqlite would add scalability.

UTF-8 on import

I hit an error on import caused by the byte 0x9d, at line 35 of import.py.
I fixed it by changing the open call to use UTF-8 encoding:

    with open(file_name, encoding='utf8') as f:
        messages = json.load(f)

Sort argument broken

Attempting to sort by ASC or DESC fails to produce the expected result, and sometimes fails the query altogether.

Thank you

Hey,

I just wanted to thank you for your initial work. I've used your initial work to build a Slack Archive Bot, slightly more complex than yours. You can view the source here:
https://bitbucket.org/SimonErkelens/stripeslackbot-2/src/338a2b507c4d0899efc4f38eb20764945a094df7/slackbot/?at=master

It's combined with a SilverStripe website frontend that allows for logging in with Slack, a Solr powered search and some features like when using @botName, the bot will reply with the first search result, and a link to more search results on the website.

Thanks! 🙇‍♂️🙇‍♂️

Server Randomly Stops Responding

I've had issues recently where the server stopped responding (not receiving new messages and not responding to DMs). The only way to fix was to restart. The only clue was that I got the following error:

213.108.134.156 - - [06/Mar/2021 13:27:58] "^C^@^@/*à^@^@^@^@^@Cookie: mstshash=Administr" 400 -
----------------------------------------
Exception happened during processing of request from ('104.206.128.50', 34247)
Traceback (most recent call last):
  File "/usr/lib/python3.8/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.8/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.8/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.8/http/server.py", line 647, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3.8/socketserver.py", line 720, in __init__
    self.handle()
  File "/usr/lib/python3.8/http/server.py", line 427, in handle
    self.handle_one_request()
  File "/usr/lib/python3.8/http/server.py", line 395, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

I'm hoping that #33 fixes the issue. I'll update after running it for a few days.

Not logging threaded replies which are also sent to channel

Since moving to the Conversations API, slack-archive-bot is not logging replies to threads which are also sent to the channel.
Plain channel messages are logged, and threaded replies are logged too, but if someone replies in a thread and also sends the reply to the channel, those messages are not logged and therefore cannot be searched or exported from the SQLite database.
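
One possible explanation, which this report does not confirm, is that Slack delivers these replies as message events with the subtype thread_broadcast, so a handler that skips every event carrying a subtype would drop them. The sketch below shows a filter that lets them through. The function name should_archive and the rule itself are assumptions for illustration, not the bot's actual logic.

```python
def should_archive(event):
    """Decide whether a Slack message event should be stored (hypothetical rule)."""
    subtype = event.get("subtype")
    # Plain channel messages and in-thread replies carry no subtype.
    if subtype is None:
        return True
    # Replies that are also sent to the channel arrive as "thread_broadcast".
    return subtype == "thread_broadcast"
```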

Fix security vulnerabilities

  1. SQL injection - CVE-2018-17232
    When searching the archive user input is used directly to create an SQL query. This can be exploited to view all messages, including messages from private channels the user is not a member of. Potentially exploitable to gain Remote Code Execution (RCE) depending on the configuration of the server.

  2. Information disclosure.
    If a user searches for a phrase that has been posted in a private channel the user is not a member of, the bot returns nothing rather than "No results found". This leaks to the user that the phrase has been said.
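
The usual remedy for the first issue is to pass user input as bound parameters and never splice it into the SQL string. The following is a minimal sketch with Python's sqlite3 module. The messages(message, channel) table and the function name search_messages are hypothetical and do not reflect the project's real schema.

```python
import sqlite3


def search_messages(conn, text, channel=None, limit=10):
    """Search a hypothetical messages(message, channel) table using bound parameters."""
    sql = "SELECT message FROM messages WHERE message LIKE ?"
    args = [f"%{text}%"]
    if channel is not None:
        sql += " AND channel = ?"
        args.append(channel)
    sql += " LIMIT ?"
    args.append(int(limit))
    # sqlite3 binds each ? as data, so quotes in text cannot alter the query.
    return [row[0] for row in conn.execute(sql, args)]
```

Because the search text is bound as a value, input such as ' OR 1=1 -- is treated as a literal string to match and returns no extra rows. Note that % and _ in the text still act as LIKE wildcards in this sketch.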

Installation instructions

I can't find any installation instructions, am I looking in the wrong place?
I'm looking to set this up on a Raspberry Pi.
