cdzombak / groupme-tools

[NO LONGER MAINTAINED] Tools for fetching and analyzing a GroupMe group chat transcript in its entirety.

License: MIT License

Python 68.79% HTML 26.71% CSS 2.66% JavaScript 1.84%

groupme-tools's Introduction

groupme-tools

Tools to fetch the complete history of a GroupMe group chat and analyze it.

groupme-fetch.py allows you to grab the entire transcript for one of your groups and save it as JSON for backup and analysis. It is documented; run it with --help for help. It also allows you to fetch recent updates in the group to keep your JSON file up to date.

simple-transcript.py processes a JSON file into a human-readable text transcript.
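
As a rough illustration of what such a conversion involves, the sketch below renders a list of message dictionaries as text lines. It is not the code in simple-transcript.py. The field names name, text, and created_at (taken here to be a Unix timestamp in seconds) are assumptions about the transcript format, the helpers format_message and format_transcript are hypothetical, and the code targets Python 3 rather than the Python 2.7 these tools were written for.

```python
import datetime


def format_message(message):
    """Render one message dict as a single transcript line."""
    # 'created_at' is assumed to be a Unix timestamp in seconds.
    timestamp = datetime.datetime.fromtimestamp(
        message.get("created_at", 0), tz=datetime.timezone.utc
    )
    name = message.get("name") or "(unknown)"
    text = message.get("text") or ""
    return "{} {}: {}".format(timestamp.strftime("%Y-%m-%d %H:%M:%S"), name, text)


def format_transcript(messages):
    """Render messages oldest-first, one line per message."""
    ordered = sorted(messages, key=lambda m: m.get("created_at", 0))
    return "\n".join(format_message(m) for m in ordered)
```

Sorting by timestamp before printing means the output reads chronologically regardless of the order in which the messages were fetched.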

The files in the stat folder are self-explanatory; they allow for learning interesting things about the transcript's content and the group's history.

Finding your access token

nb. there are better ways to do this now; see GroupMe API docs.

Log into GroupMe's web interface and use Chrome or Safari's inspector to monitor the network requests when you load one of your groups.

You'll notice a GET request to an endpoint https://v2.groupme.com/groups/GROUP_ID/messages.

One of the headers sent with that request, X-Access-Token, is your access token.
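
To make the role of the token concrete, here is a minimal sketch that assembles the same URL and header a browser would send. The helper build_messages_request is hypothetical, and the v2 endpoint is simply the one described above; it may no longer be served, so treat this as an illustration rather than a working client.

```python
def build_messages_request(group_id, access_token):
    """Return the (url, headers) pair for a messages request."""
    url = "https://v2.groupme.com/groups/{}/messages".format(group_id)
    # The token travels in a request header, not in the URL.
    headers = {"X-Access-Token": access_token}
    return url, headers


url, headers = build_messages_request("GROUP_ID", "ACCESS_TOKEN")
# With Requests installed, the call would look like:
#     requests.get(url, headers=headers)
```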

Finding your group ID

nb. there are better ways to do this now; see GroupMe API docs.

Again, in GroupMe's web interface, the group ID is the numeric ID included in the group's URL (https://web.groupme.com/groups/GROUP_ID).
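
If you have the URL on hand, the ID can also be pulled out programmatically. The helper below is a hypothetical sketch that assumes the URL shape shown above, with a purely numeric ID after /groups/.

```python
import re


def group_id_from_url(url):
    """Return the numeric group ID from a GroupMe web URL, or None."""
    match = re.search(r"/groups/(\d+)", url)
    return match.group(1) if match else None
```

The ID is returned as a string, since it is only ever used to build URLs and file names.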

Requirements/Dependencies/Python

This was written and tested on Python 2.7, because I didn't want to waste time getting my Homebrew installation to install things for Python 3. I suspect this script will break if you run it with Python 3, because Unicode.

The only other dependency is Requests; install it with pip install requests. At the time of writing, the current version was 1.1.0.

Emoji

groupme-fetch.py will store emoji and other non-ASCII characters in the transcript JSON fine, as expected.
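
The reason this works is that JSON can represent any Unicode text, either escaped or literally. The sketch below shows both forms using Python 3's standard json module; it does not claim to show which form groupme-fetch.py writes, and the sample message is made up.

```python
import json

messages = [{"name": "Alice", "text": "caf\u00e9 \U0001F600"}]

# By default json.dumps escapes non-ASCII characters as \uXXXX sequences.
escaped = json.dumps(messages)
# ensure_ascii=False keeps the characters as-is (write the file as UTF-8).
readable = json.dumps(messages, ensure_ascii=False)

# Either form decodes back to the same text.
assert json.loads(escaped) == json.loads(readable) == messages
```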

Stress testing/performance

These tools have been tested with a transcript containing ~16,000 messages on a 1.7GHz/4GB MacBook Air. They work fine.

Keep your transcript up to date

After your initial fetch with groupme-fetch.py (optionally using the oldest option to fetch older history), you should have a complete transcript up to the last time you fetched. To pull in messages posted since then, pass the newest message ID in your transcript to the newest option, as in the command below.

Note that the oldest and newest parameters are message IDs from your transcript JSON file.

python groupme-fetch.py GROUPID ACCESSTOKEN newest $(python newest-id.py transcript-GROUPID.json)
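
The command relies on newest-id.py printing the ID of the most recent message in the transcript. A minimal sketch of that lookup follows. It is not the script's actual code: it assumes the transcript is a JSON list of message objects with id and created_at fields, the helper names are hypothetical, and it is written for Python 3.

```python
import json


def newest_message_id(messages):
    """Return the ID of the most recent message, or None if empty."""
    if not messages:
        return None
    # Pick by timestamp so the result does not depend on file order.
    newest = max(messages, key=lambda m: m["created_at"])
    return newest["id"]


def newest_id_from_file(path):
    """Load a transcript JSON file and return its newest message ID."""
    with open(path, encoding="utf-8") as handle:
        return newest_message_id(json.load(handle))
```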

groupme-tools's Issues

Add LICENSE

This is an awesome project!

Is it possible for you to release it under the MIT license so others can build off of it? (See choosealicense.com for more details.)

Thanks!

Just want to say thanks for a great collection of tools! You're the man!

(closing this because there isn't a real issue)

graph not working?

I've downloaded my transcript and it's pretty large: over 164,000 messages in a 74 MB file. When I open index.html in graph/ what I see is:

Additionally, there is a temp-transcript-groupID.json file that is left over. Is this normal?

Also, I've used the --resumePrevious and --resumeNext arguments and they both report they are done; however, the first message seems to be continuing a conversation. How can I verify that it is indeed the oldest message?

Error when running simple-transcript.py

I get this error; everything else has worked perfectly...

KeyError: [u'picture_url']

When I run the simple-transcript.py script, I get the following error. I am very new to all this, so I'm pretty much completely stumped. What do I do? Any help would be appreciated.
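
For context, Python raises KeyError when a dictionary is indexed with a key it does not contain, which would happen if some messages in the transcript have no picture_url field. Whether that is the cause here is an assumption, since the full traceback is not shown. The hypothetical helper below sketches a lookup that tolerates the missing key (Python 3 syntax).

```python
def picture_suffix(message):
    """Return ' [picture: URL]' when the message has a picture, else ''."""
    # dict.get returns None instead of raising KeyError for a missing key.
    picture = message.get("picture_url")
    return " [picture: {}]".format(picture) if picture else ""
```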

Where to input Access Token/Group ID?

I'm totally lost with this kind of stuff; I have no clue where I'm supposed to enter my group's access token and ID. Any help is appreciated, thanks!

--resumePrevious not working all the way

➜  groupme-tools git:(master) python groupme-fetch.py --resumePrevious MYID MYTOKEN 
starting on page 1
starting on page 2
starting on page 3
starting on page 4
starting on page 5
starting on page 6
starting on page 7
starting on page 8
starting on page 9
Reached the end/beginning!
Transcript contains 166127 messages from 2013-08-13 16:53:40 to 2015-09-16 17:17:59

➜  groupme-tools git:(master) python groupme-fetch.py --resumePrevious MYID MYTOKEN
starting on page 1
starting on page 2
starting on page 3
starting on page 4
starting on page 5
starting on page 6
starting on page 7
starting on page 8
starting on page 9
starting on page 10
starting on page 11
Reached the end/beginning!
Transcript contains 166344 messages from 2013-08-12 20:28:14 to 2015-09-16 17:17:59
➜  groupme-tools git:(master)

Notice the numbers are different! If I keep doing it, I keep getting more and more messages, but only a few pages at a time!

diff --git a/stat/posts-by-user.py b/stat/posts-by-user.py
index b0bf462..054153e 100644
--- a/stat/posts-by-user.py
+++ b/stat/posts-by-user.py
@@ -6,6 +6,11 @@ sys.setdefaultencoding("utf-8")
 import json
 import datetime

+def divideWhereDivZeroIsZero(dividend,divisor):
+    try:
+        return dividend/divisor
+    except ZeroDivisionError:
+        return 0

 def main():
     """Usage: posts-by-user.py filename.json
@@ -45,18 +50,22 @@ Assumes filename.json is a JSON GroupMe transcript.

     }
     for id, stats in counts.items():
-        name = names[id]
+        try:
+            name = names[id]
+        except KeyError:
+            names[id] = 'UID ' + str(id)
+            name = names[id]
         count = stats['messages']
         like_given_count = stats['likes_given']
         like_received_count = stats['likes_received']
         output['messages'].append(u'{name}: messages: {count} ({msg_pct:.1f}%)'.format(
-            name=name, count=count, msg_pct=count/float(totalMessages) * 100,
+            name=name, count=count, msg_pct=divideWhereDivZeroIsZero(count,float(totalMessages)) * 100,
         ))
         output['likes_received'].append(u'{name}: likes received: {like_count} ({like_pct:.1f} per message)'.format(
-            name=name, like_count=like_received_count, like_pct=like_received_count/float(count),
+            name=name, like_count=like_received_count, like_pct=divideWhereDivZeroIsZero(like_received_count,float(count)),
         ))
         output['likes_given'].append(u'{name}: likes given: {like_count} ({like_pct:.1f}%)'.format(
-            name=name, like_count=like_given_count, like_pct=like_given_count/float(totalLikes) * 100
+            name=name, like_count=like_given_count, like_pct=divideWhereDivZeroIsZero(like_given_count,float(totalLikes)) * 100
         ))
     for category, values in output.items():
         print '\n'

UnicodeEncodeError

So I don't know much about Python, but when running simple-transcript.py I get the following error

I'm using Python 2.7.10 on Windows 8.1 64-bit

Any advice?