instagram-screen-scrape's Introduction

Instagram Screen Scrape

A tool for scraping public data from Instagram without needing permission from Instagram. In theory, it can scrape anything that a non-logged-in user can see, but right now it only supports getting posts for a given username or comments for a given post.

Example

CLI

The CLI writes entirely to STDOUT and outputs posts as it scrapes them. The following example is truncated because the real command's output is very long. The full output ends with a closing bracket, which makes it valid JSON.

$ instagram-screen-scrape posts --username carrotcreative
[{"id":"0toxcII4Eo","username":"carrotcreative","time":1427420497,"type":"image","likes":82,"comments":3,"text":"Our CTO, @kylemac, speaking on the #LetsTalkCulture panel tonight @paperlesspost.","media":"https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/11055816_398297847022038_803876945_n.jpg"},
{"id":"0qPcnuI4Pr","username":"carrotcreative","time":1427306556,"type":"image","likes":80,"comments":4,"text":"#bitchesbebakin took it to another level today for @nporteschaikin and @slang800's #Carrotversaries today.","media":"https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/10959049_1546104325652055_1320782099_n.jpg"},
{"id":"0WLnjlo4Ft","username":"carrotcreative","time":1426633460,"type":"image","likes":61,"comments":1,"text":"T-shirts speak louder than words. Come find us @sxsw.","media":"https://scontent.cdninstagram.com/hphotos-xfa1/t51.2885-15/e15/11032904_789885121108568_378908081_n.jpg"},

We can also scrape comments:

$ instagram-screen-scrape comments --post 0qPcnuI4Pr
[{"id":"948651188581269518","username":"johnlustina","time":1427308055,"text":"@margeauxlustina"},
{"id":"948682633420963943","username":"rita_xo","time":1427311804,"text":"👌@emilykalen"},
{"id":"948734454231433861","username":"david_berkhin","time":1427317981,"text":"looks so good!"},
{"id":"948824521079751272","username":"k.kate","time":1427328718,"text":"Macarons or a Petri dish full of cells? ¯\\_(ツ)_/¯"}]

By default, there is 1 line per post, making it easy to pipe into other tools. The following example uses wc -l to count how many posts are returned. As you can see, I don't post much.

$ instagram-screen-scrape posts -u slang800 | wc -l
2
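
Because each line holds one post, the output can also be consumed line by line from a script. The following is a minimal sketch in JavaScript and is not part of this package: parseScrapeLine and parseScrapeOutput are hypothetical helper names, and the sample data is made up and shortened. It assumes the output layout shown above (an opening bracket on the first line, a comma after each post, and a closing bracket on the last line).

```javascript
// Parse one line of CLI output. Each line is a JSON object with array
// punctuation attached: a leading "[" on the first line, a trailing ","
// between posts, and a trailing "]" on the last line.
function parseScrapeLine(line) {
  let text = line.trim();
  if (text.startsWith('[')) text = text.slice(1);
  if (text.endsWith(',') || text.endsWith(']')) text = text.slice(0, -1);
  // An empty result means the line held only punctuation (e.g. "[]").
  return text === '' ? null : JSON.parse(text);
}

// Parse a whole chunk of output, even a truncated one, into objects.
function parseScrapeOutput(output) {
  return output
    .split('\n')
    .map(parseScrapeLine)
    .filter((item) => item !== null);
}

// Made-up sample in the same shape as the CLI output.
const sample = [
  '[{"id":"a1","likes":82,"comments":3},',
  '{"id":"b2","likes":80,"comments":4},',
  '{"id":"c3","likes":61,"comments":1}]'
].join('\n');

const posts = parseScrapeOutput(sample);
console.log(posts.length); // 3
```

Parsing per line means a partial or interrupted run can still be processed, which a single JSON.parse over the whole output would reject.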

JavaScript Module

The following example is in CoffeeScript.

{InstagramPosts} = require 'instagram-screen-scrape'

# create the stream
streamOfPosts = new InstagramPosts(username: 'slang800')

# do something interesting with the stream
streamOfPosts.on('data', (post) ->
  # since it's an object-mode stream, we get objects from it and don't need to
  # parse JSON or anything

  # the time field is represented in UNIX time
  time = new Date(post.time * 1000)

  # output something like "slang800's post from 4/5/2015 got 1 like(s), and 0
  # comment(s)"
  console.log "slang800's post from #{time.toLocaleDateString()} got
  #{post.likes} like(s), and #{post.comments} comment(s)"
)

The following example is the same as the last one, but in JavaScript.

var InstagramPosts, streamOfPosts;
InstagramPosts = require('instagram-screen-scrape').InstagramPosts;

streamOfPosts = new InstagramPosts({
  username: 'slang800'
});

streamOfPosts.on('data', function(post) {
  var time = new Date(post.time * 1000);
  console.log([
    "slang800's post from ",
    time.toLocaleDateString(),
    " got ",
    post.likes,
    " like(s), and ",
    post.comments,
    " comment(s)"
  ].join(''));
});
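
Since the stream emits plain objects, aggregating over all posts only takes a small accumulator. The sketch below is illustrative: createTally and tallyStream are hypothetical helpers, the numbers are made up, and Readable.from from Node's stream module stands in for new InstagramPosts(...) so the example runs without network access. It assumes the real stream emits 'end' like a standard Readable stream.

```javascript
const { Readable } = require('stream');

// Accumulates totals across posts; relies on the `likes` and `comments`
// fields shown in the CLI output above.
function createTally() {
  let posts = 0;
  let likes = 0;
  let comments = 0;
  return {
    add(post) {
      posts += 1;
      likes += post.likes;
      comments += post.comments;
    },
    summary() {
      return { posts, likes, comments };
    }
  };
}

// Feeds every object from an object-mode stream into a tally and reports
// the totals (or the first error) through a Node-style callback.
function tallyStream(stream, done) {
  const tally = createTally();
  stream.on('data', (post) => tally.add(post));
  stream.on('error', (err) => done(err));
  stream.on('end', () => done(null, tally.summary()));
}

// Stand-in stream with made-up posts (assumption: not the real scraper).
const fakePosts = Readable.from([
  { likes: 82, comments: 3 },
  { likes: 80, comments: 4 }
]);

tallyStream(fakePosts, (err, totals) => {
  if (err) throw err;
  console.log(totals); // { posts: 2, likes: 162, comments: 7 }
});
```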

And we can scrape comments in a similar manner (shown in CoffeeScript):

{InstagramComments} = require 'instagram-screen-scrape'

streamOfComments = new InstagramComments(post: '0qPcnuI4Pr')

# do something interesting with the stream
streamOfComments.on('data', (comment) ->
  # the time field is represented in UNIX time
  time = new Date(comment.time * 1000)

  console.log "#{comment.username} commented on #{time.toLocaleDateString()}:
  #{comment.text}"
)

Why?

Requiring an app to be registered just to access data that is publicly available on their site is excessively controlling. Scripts should be able to consume the same data as people, with the same level of authentication. Sadly, Instagram doesn't provide an open, structured, and machine-readable API.

So, we're forced to use a method that Instagram cannot effectively shut down without harming themselves: scraping their user-facing site.

Caveats

  • This is probably against the Instagram TOS, so don't use it if that sort of thing worries you.
  • Whenever Instagram updates certain parts of their front-end, this scraper will need to be updated to support the new markup.
  • You can't scrape protected accounts or get engagement rates / impression counts (because that data isn't public).

instagram-screen-scrape's People

Contributors

notslang

instagram-screen-scrape's Issues

No more max_id param

It looks like Instagram removed the max_id param, breaking this scraper. It seems to return just the first 20 posts no matter what max_id is set to, causing an infinite loop.
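
One way to guard against this kind of loop, sketched here in JavaScript under stated assumptions: fetchPage is a hypothetical function that takes a cursor (such as a max_id value) and returns an array of posts with an id field. This is not the scraper's actual pagination code. The loop stops as soon as a page contains no post it has not already seen, which covers both a genuine last page and a server that ignores the cursor.

```javascript
// Collect posts page by page, stopping when a page brings nothing new.
// `fetchPage(cursor)` is hypothetical: it returns an array of {id, ...}.
function collectPages(fetchPage, maxPages = 1000) {
  const seen = new Set();
  const posts = [];
  let cursor = null;

  for (let page = 0; page < maxPages; page++) {
    const batch = fetchPage(cursor);
    const fresh = batch.filter((post) => !seen.has(post.id));

    // No unseen posts: either the end was reached or the cursor was ignored.
    if (fresh.length === 0) break;

    for (const post of fresh) {
      seen.add(post.id);
      posts.push(post);
    }
    cursor = fresh[fresh.length - 1].id;
  }
  return posts;
}

// A fake source that ignores the cursor, like the behavior described above.
const ignoresCursor = () => [{ id: 'a' }, { id: 'b' }];
console.log(collectPages(ignoresCursor).length); // 2
```

The maxPages cap is a second safety net in case the server keeps returning new but unrelated items.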

Setup guide request/Cannot find module 'coffee-script/register'

Hi,

I have a list of media IDs (I can put it in .csv or .json format) whose comments I'd like to retrieve.

As there is no setup guide, I may have some of this wrong.

npm g -i

Then when I tried running:
instagram-screen-scrape comments --post 0qPcnuI4Pr

This renders the following error:

MacBook-Pro-3:CommentNPM mac1$ ls
LICENSE     Makefile    README.md   bin     lib     package.json    test
MacBook-Pro-3:CommentNPM mac1$ instagram-screen-scrape 
module.js:327
    throw err;
    ^

Error: Cannot find module 'coffee-script/register'
    at Function.Module._resolveFilename (module.js:325:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/usr/local/lib/node_modules/instagram-screen-scrape/bin/index.js:2:1)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)

How can I fix this error?

ECONNRESET problem

I get this whenever I try to get all the posts for an account with a lot of posts (more than 1K).

Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TLSWrap.onread (net.js:572:26)

To reproduce, just run this command in the CLI:

instagram-screen-scrape posts -u instagram

This tries to grab all posts for the official Instagram account. This is a common issue, to be honest: I tried to write my own scraper today to count the likes on all posts, and ECONNRESET came up all the time. There seems to be no reliable way to get the likes for all posts unless you go through the official API.

Could not get any comments

I use this: instagram-screen-scrape comments --post post_id to retrieve comments of post_id but always get an empty list.

The provided example also returned nothing: instagram-screen-scrape comments --post 0qPcnuI4Pr

Not pulling all comments

This pretty basic example isn't pulling all of the comments. It grabs ~89 of the 2200+.

const InstagramComments = require("instagram-screen-scrape").InstagramComments;

const streamOfComments = new InstagramComments({
  post: "BRB_ooCjlL7"
});
let count = 0;

streamOfComments.on("data", function(comment) {
  console.log(comment);
  // increment first so the running total includes the current comment
  console.log("Total comments: ", ++count);
});

Add support for retrieving content from hashtags.

Hi!

Recently (this month), Instagram stopped allowing us to get images from multiple users per hashtag.

For example: https://www.instagram.com/explore/tags/nofilter/

As you can see, that is a public listing of all images with the hashtag "nofilter", but it's now impossible to get it through the API.

Would it be possible to modify this script to support hashtags?

P.S. Many people are searching for a solution to this problem right now, and I think this could be a good one.

Thanks!

Any idea of the error? Tks!

events.js:74
W20160705-21:23:11.568(-6)? (STDERR)         throw TypeError('Uncaught, unspecified "error" event.');
W20160705-21:23:11.569(-6)? (STDERR)               ^
W20160705-21:23:11.570(-6)? (STDERR) TypeError: Uncaught, unspecified "error" event.
W20160705-21:23:11.571(-6)? (STDERR)     at TypeError (<anonymous>)
W20160705-21:23:11.571(-6)? (STDERR)     at InstagramComments.emit (events.js:74:15)
W20160705-21:23:11.571(-6)? (STDERR)     at Stream._getCommentsPage.on.on.hasMoreComments (/.../node_modules/instagram-screen-scrape/lib/comments.js:112:22)
W20160705-21:23:11.571(-6)? (STDERR)     at Stream.emit (events.js:95:17)
W20160705-21:23:11.571(-6)? (STDERR)     at Request.<anonymous> (/.../node_modules/instagram-screen-scrape/lib/util.js:25:24)
W20160705-21:23:11.571(-6)? (STDERR)     at Request.emit (events.js:95:17)
W20160705-21:23:11.572(-6)? (STDERR)     at Request.onRequestResponse (/.../node_modules/request/request.js:977:10)
W20160705-21:23:11.572(-6)? (STDERR)     at ClientRequest.emit (events.js:95:17)
W20160705-21:23:11.572(-6)? (STDERR)     at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1744:21)
W20160705-21:23:11.572(-6)? (STDERR)     at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:152:23)
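
For context, Node's EventEmitter throws when an 'error' event is emitted and no listener is registered for it, which is what produces this kind of "Uncaught, unspecified" trace. The sketch below shows the general pattern of attaching an 'error' listener before consuming data. watchStream is a hypothetical helper, and a plain EventEmitter with a simulated failure stands in for the comments stream, so this says nothing about what caused the underlying request to fail.

```javascript
const { EventEmitter } = require('events');

// Attach the error listener first so a failure is reported to the caller
// instead of being thrown as an uncaught exception.
function watchStream(stream, onItem, onFailure) {
  stream.on('error', onFailure);
  stream.on('data', onItem);
  return stream;
}

// Stand-in for the scraper stream (assumption: not the real class).
const fakeStream = new EventEmitter();
const failures = [];

watchStream(
  fakeStream,
  (comment) => console.log(comment.text),
  (err) => failures.push(err.message)
);

fakeStream.emit('data', { text: 'looks so good!' }); // logs: looks so good!
fakeStream.emit('error', new Error('simulated failure'));
console.log(failures); // [ 'simulated failure' ]
```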

UserID from comments

I'm trying to get the user ID from the comment list. The HTML doesn't have it, but the React component data does:
<r className="_iqaka" user={..., id: "XXXX"} ... }>
Is it possible to read the React data as well, so I don't have to make another call for each user ID?

Retrieve user information (num followers, num posts)

Since this tool uses public information from the Instagram website, would it be possible to add a new endpoint that returns the public information of a given account?

  • profile picture
  • num. posts
  • num. followers
  • num following
  • profile bio

Thanks

Installation

Sorry I am quite new!

How do I use this? Do I need to install it?

Newbie could not get comments

Hi,

First of all, this is my first time using Node :) and I really like the job you did. My problem is that I can get posts but not comments. Here is my code (please don't judge the spaghetti :) ):

var util = require ('util'),
url = require('url'),
http = require('http'),
qs = require('querystring');

var InstagramPosts, streamOfPosts;
InstagramPosts = require('instagram-screen-scrape');

var InstagramComments, streamOfComments;
InstagramComments = require('instagram-screen-scrape');

http.createServer(function (request, response) {

  // Send the HTTP header 
  // HTTP Status: 200 : OK
  // Content Type: text/plain

  response.writeHead(200, {'Content-Type': 'text/plain'});

  // Send the response body as "Hello World"


  var url_parts = url.parse(request.url,true);
  var user = url_parts.pathname.replace('/', '');

  sendEm(user, response)

}).listen(8081);

var itemToWait = 0;
var res;

function sendEm(user, response) {

  streamOfPosts = new InstagramPosts({
    username: user
  });

  streamOfPosts.on('data', function(post) {

    // each 2 second
    setInterval(function(){

      if(itemToWait == 0) {
        response.end('\n Done');
      }


    }, 2 * 1000);

    // var post;

    // post = streamOfPosts.read();

    itemToWait++;
    if(post != null) {
      if(post.comment > 0) {

        response.write(util.inspect(post) + '\n');

        streamOfComments = new InstagramComments({
          post: post.id
        });

        streamOfComments.on('data', function(comment) {
          // var comment;

          // comment = streamOfComments.read();

          itemToWait++;
          if(comment != null) {
            response.write('\n _1_ \n');
            response.write(util.inspect(comment));
            response.write('\n _2_ \n');

            itemToWait--;
          } else {

            itemToWait--;
            return;
          }

        });

      }
      response.write('\n\n');

      itemToWait--;
    } else {
      itemToWait--;
    }


  });

}

Sample URL call: http://127.0.0.1:8081/username

Regards

Automatic bulk scrape from list of values (eg MediaID's)

Hi,

Great work on this. I've got a .txt file with a list of media IDs I'd like to scrape the comments from.

However, I can only do them one at a time with your script.

I don't know CoffeeScript. Is this possible with your repo?

I tried a Bash script, but the comment scrapes often return 404 errors. It often works if I re-attempt the scrape. Is there a "proper" way to do this?

Here's a copy of my Bash file:

while read filename;
do instagram-screen-scrape comments --post "$filename" > "$filename.json";
done < list.txt
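
The re-attempt idea could be automated along these lines. This is a sketch in JavaScript rather than Bash, with several assumptions: runScrape, scrapeWithRetry, and scrapeList are hypothetical helpers, the CLI is assumed to exit with a non-zero status when a scrape fails, and three attempts is an arbitrary choice. It uses only the comments --post invocation shown above, and there is no delay between attempts.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Run the CLI once for one post ID. Assumes a failed scrape exits non-zero.
function runScrape(postId) {
  const result = spawnSync(
    'instagram-screen-scrape',
    ['comments', '--post', postId],
    { encoding: 'utf8' }
  );
  return { ok: result.status === 0, output: result.stdout || '' };
}

// Try a scrape up to `maxAttempts` times; `run` is injectable for testing.
function scrapeWithRetry(postId, run = runScrape, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = run(postId);
    if (result.ok) return { ...result, attempts: attempt };
  }
  return { ok: false, output: '', attempts: maxAttempts };
}

// Read one ID per line, write "<id>.json" for each success, and return
// the IDs that still failed after all attempts.
function scrapeList(listPath, run = runScrape) {
  const ids = fs.readFileSync(listPath, 'utf8')
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean);

  const failed = [];
  for (const id of ids) {
    const result = scrapeWithRetry(id, run);
    if (result.ok) fs.writeFileSync(`${id}.json`, result.output);
    else failed.push(id);
  }
  return failed;
}
```

Calling scrapeList('list.txt') would mirror the Bash loop above and return the IDs worth retrying later.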

Add the date?

This is a wonderful tool that is great for doing some Instagram analytics. I especially appreciate it because their insights platform is currently mobile-only, which is terrible.

I would really love to have the post date and time, if possible, to do some additional analyses. Would that be easy to implement?

Instagram returns 404

Hi,

trying to scrape but getting this error:

events.js:165
      throw err;
      ^

Error: Uncaught, unspecified "error" event. (Instagram returned status code: 404)
    at InstagramPosts.emit (events.js:163:17)
    at Stream.<anonymous> (/usr/local/lib/node_modules/instagram-screen-scrape/lib/posts.js:69:22)
    at emitOne (events.js:96:13)
    at Stream.emit (events.js:188:7)
    at Request.<anonymous> (/usr/local/lib/node_modules/instagram-screen-scrape/lib/util.js:25:24)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at Request.onRequestResponse (/usr/local/lib/node_modules/instagram-screen-scrape/node_modules/request/request.js:1074:10)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
