groupme-tools's Introduction

groupme-tools

No Maintenance Intended

Tools to fetch the complete history of a GroupMe group chat and analyze it.

groupme-fetch.py allows you to grab the entire transcript for one of your groups and save it as JSON for backup and analysis. It is documented; run it with --help for help. It also allows you to fetch recent updates in the group to keep your JSON file up to date.

simple-transcript.py processes a JSON file into a human-readable text transcript.
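As a rough illustration of that kind of conversion (not the actual contents of simple-transcript.py), the sketch below renders a list of message dictionaries as one line per message. The field names `created_at`, `name`, and `text`, and the assumption that `created_at` is a Unix timestamp in seconds, are guesses about the transcript format; `render_transcript` is a hypothetical helper.

```python
from datetime import datetime, timezone


def render_transcript(messages):
    """Render message dicts as 'timestamp name: text' lines, oldest first."""
    lines = []
    # Sort by timestamp so the transcript reads chronologically.
    for msg in sorted(messages, key=lambda m: m.get("created_at", 0)):
        # Assumption: created_at is a Unix timestamp in seconds.
        when = datetime.fromtimestamp(msg.get("created_at", 0), tz=timezone.utc)
        # Use .get() so messages lacking a field do not raise KeyError.
        name = msg.get("name") or "(unknown)"
        text = msg.get("text") or "(no text)"
        lines.append("{} {}: {}".format(when.strftime("%Y-%m-%d %H:%M:%S"), name, text))
    return "\n".join(lines)
```

Reading fields with `.get()` keeps the sketch tolerant of messages that lack optional keys, such as attachment-only posts with no text.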

The files in the stat folder are self-explanatory; they allow for learning interesting things about the transcript's content and the group's history.
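To give a flavor of what such a statistic might look like, here is a minimal sketch that tallies messages and likes per user. It is not taken from the stat folder: the `user_id` and `favorited_by` field names are assumptions about the transcript format, and `per_user_stats` and `likes_per_message` are hypothetical helpers.

```python
from collections import defaultdict


def per_user_stats(messages):
    """Count messages, likes given, and likes received for each user ID."""
    stats = defaultdict(lambda: {"messages": 0, "likes_given": 0, "likes_received": 0})
    for msg in messages:
        # Assumption: favorited_by lists the IDs of users who liked the message.
        likers = msg.get("favorited_by") or []
        author = stats[msg["user_id"]]
        author["messages"] += 1
        author["likes_received"] += len(likers)
        for liker_id in likers:
            # A user who only likes and never posts still gets an entry.
            stats[liker_id]["likes_given"] += 1
    return dict(stats)


def likes_per_message(user_stats):
    """Average likes received per message; 0.0 for users with no messages."""
    if user_stats["messages"] == 0:
        return 0.0
    return user_stats["likes_received"] / float(user_stats["messages"])
```

Treating the zero-message case explicitly avoids a division by zero for users who have liked messages but never posted.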

Finding your access token

nb. there are better ways to do this now; see GroupMe API docs.

Log into GroupMe's web interface and use Chrome or Safari's inspector to monitor the network requests when you load one of your groups.

You'll notice a GET request to an endpoint https://v2.groupme.com/groups/GROUP_ID/messages.

One of the headers sent with that request, X-Access-Token, is your access token.
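For reference, the request described above can be sketched in Python as a URL plus a headers dictionary. The helper name `build_messages_request` is hypothetical, and the endpoint is simply the one quoted above, which may no longer be the recommended way to reach the API.

```python
def build_messages_request(group_id, access_token):
    """Return the (url, headers) pair for the messages request described above."""
    # Endpoint and header name are taken from the text above, not from current API docs.
    url = "https://v2.groupme.com/groups/{}/messages".format(group_id)
    headers = {"X-Access-Token": access_token}
    return url, headers
```

With Requests installed, the pair could be passed along as `requests.get(url, headers=headers)`.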

Finding your group ID

nb. there are better ways to do this now; see GroupMe API docs.

Again, in GroupMe's web interface, the group ID is the numeric ID included in the group's URL (https://web.groupme.com/groups/GROUP_ID).
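If you would rather not pick the ID out by eye, a small sketch like the following extracts it, assuming the URL has exactly the shape shown above. `group_id_from_url` is a hypothetical helper, not part of these tools.

```python
import re

# Matches https://web.groupme.com/groups/<digits>, optionally followed by a path, query, or fragment.
_GROUP_URL = re.compile(r"^https://web\.groupme\.com/groups/(\d+)(?:[/?#].*)?$")


def group_id_from_url(url):
    """Return the numeric group ID as a string, or None if the URL does not match."""
    match = _GROUP_URL.match(url.strip())
    return match.group(1) if match else None
```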

Requirements/Dependencies/Python

This was written and tested on Python 2.7, because I didn't want to waste time getting my Homebrew installation to install things for Python 3. I suspect this script will break if you run it with Python 3, because Unicode.

The only other dependency is Requests. pip install requests. At the time of writing, the current version was 1.1.0.

Emoji

groupme-fetch.py will store emoji and other non-ASCII characters in the transcript JSON fine, as expected.
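The sketch below shows one way non-ASCII text can survive a JSON round trip in Python. It illustrates the general technique and is not a description of how groupme-fetch.py is implemented; `encode_transcript` and `decode_transcript` are hypothetical helpers.

```python
import json


def encode_transcript(messages):
    """Serialize messages to UTF-8 JSON bytes, keeping emoji as literal characters."""
    # ensure_ascii=False writes non-ASCII characters directly instead of \uXXXX escapes.
    return json.dumps(messages, ensure_ascii=False).encode("utf-8")


def decode_transcript(data):
    """Parse UTF-8 JSON bytes back into Python objects."""
    return json.loads(data.decode("utf-8"))
```

Leaving `ensure_ascii` at its default of `True` also round-trips correctly; the file then contains `\uXXXX` escapes rather than the literal characters.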

Stress testing/performance

These tools have been tested with a transcript containing ~16,000 messages on a 1.7GHz/4GB MacBook Air. They work fine.

Keep your transcript up to date

After your initial fetch with groupme-fetch.py (optionally using the oldest option to fetch older history), you should have a complete transcript up to the last time you fetched. To pick up messages posted since then, run the command below.

Note that the oldest and newest parameters take message IDs from your transcript JSON file.

python groupme-fetch.py GROUPID ACCESSTOKEN newest $(python newest-id.py transcript-GROUPID.json)
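The source of newest-id.py is not shown here, but the idea of looking up the newest or oldest message ID can be sketched as follows. This assumes the transcript is a JSON list of message objects with `id` and `created_at` fields; both field names and both helper functions are assumptions for illustration.

```python
def newest_message_id(messages):
    """Return the ID of the most recent message, or None for an empty transcript."""
    if not messages:
        return None
    # Assumption: created_at is a sortable timestamp on every message.
    return max(messages, key=lambda m: m["created_at"])["id"]


def oldest_message_id(messages):
    """Return the ID of the earliest message, or None for an empty transcript."""
    if not messages:
        return None
    return min(messages, key=lambda m: m["created_at"])["id"]
```

The list itself could be obtained with `json.load` on the transcript file before calling either helper.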

groupme-tools's People

Contributors

bshaibu, cdzombak, czue, xzys

groupme-tools's Issues

Thanks!

Just want to say thanks for a great collection of tools! You're the man!

(closing this because there isn't a real issue)

Where to input Access Token/Group ID?

I'm totally lost with this kind of stuff, I have no clue where I'm supposed to enter in my group's access token and ID. Any help is appreciated, thanks!

Add LICENSE

This is an awesome project!

Would it be possible to release it under the MIT license so others can build off of it? (See choosealicense.com for more details.)

UnicodeEncodeError

So I don't know much about Python, but when running simple-transcript.py I get the following error:
[image: screenshot of the error, not preserved]
I'm using Python 2.7.10 on Windows 8.1 64-bit.

Any advice?

Bugs with "like" stats in posts-by-user.py

Here's a patch to fix some issues where users who have ONLY done likes and have not posted messages create problems for the stat script:

With the current script we get KeyError: u'<id>', and once that was fixed some division-by-zero errors surfaced. This uses a small function that treats anything divided by zero as zero, which is probably close enough for this purpose.


diff --git a/stat/posts-by-user.py b/stat/posts-by-user.py
index b0bf462..054153e 100644
--- a/stat/posts-by-user.py
+++ b/stat/posts-by-user.py
@@ -6,6 +6,11 @@ sys.setdefaultencoding("utf-8")
 import json
 import datetime

+def divideWhereDivZeroIsZero(dividend,divisor):
+    try:
+        return dividend/divisor
+    except ZeroDivisionError:
+        return 0

 def main():
     """Usage: posts-by-user.py filename.json
@@ -45,18 +50,22 @@ Assumes filename.json is a JSON GroupMe transcript.

     }
     for id, stats in counts.items():
-        name = names[id]
+        try:
+            name = names[id]
+        except KeyError:
+            names[id] = 'UID ' + str(id)
+            name = names[id]
         count = stats['messages']
         like_given_count = stats['likes_given']
         like_received_count = stats['likes_received']
         output['messages'].append(u'{name}: messages: {count} ({msg_pct:.1f}%)'.format(
-            name=name, count=count, msg_pct=count/float(totalMessages) * 100,
+            name=name, count=count, msg_pct=divideWhereDivZeroIsZero(count,float(totalMessages)) * 100,
         ))
         output['likes_received'].append(u'{name}: likes received: {like_count} ({like_pct:.1f} per message)'.format(
-            name=name, like_count=like_received_count, like_pct=like_received_count/float(count),
+            name=name, like_count=like_received_count, like_pct=divideWhereDivZeroIsZero(like_received_count,float(count)),
         ))
         output['likes_given'].append(u'{name}: likes given: {like_count} ({like_pct:.1f}%)'.format(
-            name=name, like_count=like_given_count, like_pct=like_given_count/float(totalLikes) * 100
+            name=name, like_count=like_given_count, like_pct=divideWhereDivZeroIsZero(like_given_count,float(totalLikes)) * 100
         ))
     for category, values in output.items():
         print '\n'

KeyError: [u'picture_url']

When I run the simple-transcript.py script, I get the following error. I am very new to all this, so I'm pretty much completely stumped. What do I do? Any help would be appreciated.

[image: screenshot of the Python error, not preserved]

graph not working?

I've downloaded my transcript and it's pretty large: over 164,000 messages in a 74 MB file. When I open index.html in graph/, what I see is:
[image: screenshot of the graph page, not preserved]

Additionally, there is a temp-transcript-groupID.json file that is left over. Is this normal?

Also, I've used the --resumePrevious and --resumeNext arguments and they both report they are done; however, the first message seems to be continuing a conversation. How can I verify that it is indeed the oldest message?

--resumePrevious not working all the way

➜  groupme-tools git:(master) python groupme-fetch.py --resumePrevious MYID MYTOKEN 
starting on page 1
starting on page 2
starting on page 3
starting on page 4
starting on page 5
starting on page 6
starting on page 7
starting on page 8
starting on page 9
Reached the end/beginning!
Transcript contains 166127 messages from 2013-08-13 16:53:40 to 2015-09-16 17:17:59

➜  groupme-tools git:(master) python groupme-fetch.py --resumePrevious MYID MYTOKEN
starting on page 1
starting on page 2
starting on page 3
starting on page 4
starting on page 5
starting on page 6
starting on page 7
starting on page 8
starting on page 9
starting on page 10
starting on page 11
Reached the end/beginning!
Transcript contains 166344 messages from 2013-08-12 20:28:14 to 2015-09-16 17:17:59
➜  groupme-tools git:(master)

Notice the numbers are different! If I keep doing it, I keep getting more and more messages, but only a few pages at a time!

Not writing json file

groupme-fetch.py runs, and the command prompt shows "Reached the end/beginning!" along with the total message count, but the JSON file is never created for me to look at.
