

rbeer commented on May 30, 2024

tl;dr
The endpoint sends HTML instead of the expected JSON and asks the client to set a cookie, app_shell_visited. Setting the cookie alone only yields yet another page with a link for the user to click. The fix is to also set the request's Referer header to the requested resource (i.e. the URL), fully simulating the behavior requested by the first response.


The endpoint most likely performs a sanity check on incoming requests, since they are expected to originate from within the Twitter frontend (i.e. a Progressive Web App).

AppShell is a PWA concept. More on that in the always great MDN:
https://developer.mozilla.org/en-US/docs/Web/Apps/Progressive/App_structure#App_shell

Here's the first response:

<!DOCTYPE html>
<html lang="en">
<head>
... Just some styles, nothing important ...
</head>
<body>
    <noscript>
      <center>If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&amp;include_entities=1">use this link</a>.</center>
    </noscript>
    <script nonce="CrxHUbZqtQTnttFBuO8J6A==">
      document.cookie = "app_shell_visited=1;path=/;max-age=5";
      location.replace(location.href.split("#")[0]);
    </script>
</body>
</html>

The <script> sets the app_shell_visited=1 cookie and immediately reloads the page (location.replace with the same URL).
Setting the cookie alone doesn't cut it, though. All we get then is yet another HTML page:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Twitter</title>
... again some styles ...
  </style>
</head>
<body>

    <center>
      <svg viewBox="0 0 24 24"><g><path d="...actual coordinates that draw the Twitter birdy, I guess..."></path></g></svg>
      If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&amp;include_entities=1">use this link</a>.
    </center>
</body>
</html>

This time, it's not asking for the cookie, so that part seems to work. But we're still asked to click a link with the same URL as the one requested. Since cookies that might have been set by the response are tracked automatically, there wasn't much left to try other than the Referer header, which a browser sets to the current URL when you click a link.
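
The two observations so far (the cookie from the first response, the Referer a clicked link would carry) can be sketched as a small header builder. This is a minimal sketch, not the actual patch: buildAppShellHeaders is a hypothetical name, and setting the cookie by hand is an assumption for clients that do not track cookies automatically.

```javascript
// Hypothetical helper (not part of scrape-twitter): builds the headers a
// browser would send after running the interstitial <script> and then
// following the "use this link" anchor on the second page.
function buildAppShellHeaders(resourceUrl) {
  return {
    // The cookie the first response's <script> sets before reloading.
    // Assumption: only needed when no cookie jar tracks it for us.
    Cookie: 'app_shell_visited=1',
    // The link on the second page points at the same URL as the request,
    // so the Referer is set to the requested resource itself.
    Referer: resourceUrl,
  };
}

const url =
  'https://twitter.com/nouswaves/likes/timeline?include_available_features=1&include_entities=1';
const headers = buildAppShellHeaders(url);
console.log(headers);
```

With a fetch-style client, these would be passed along as `fetch(url, { headers })`.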

Well, let's try that:

09:03:54 scrape-twitter:query query on resource: https://twitter.com/nouswaves/likes/timeline?include_available_features=1&include_entities=1
09:03:54 scrape-twitter:query response was ok
09:03:54 scrape-twitter:query received html of length: 217039
[
{"screenName":"anfiyj","id":"1058154950004523008","time":"2018-11-02T00:32:54.000Z","isRetweet":false,"isPinned":false,"isReplyTo":false,"text":"A SQUID is love","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":4,"favoriteCount":4},
{"screenName":"recborg","id":"1054791420475772928","time":"2018-10-23T17:47:26.000Z","isRetweet":false,"isPinned":false,"isReplyTo":true,"text":"I’m actually in NY. Was hoping you and Nabeel would meet but I don’t think his shedule permits.","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":0,"favoriteCount":1},
... and so on ...
]
...

🎉 😁

PR will be out in a minute.


jchook commented on May 30, 2024

I am also getting this issue:

invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
FetchError: invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
    at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
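
The "Unexpected token <" in this error suggests the body being parsed is an HTML page rather than JSON. A minimal sketch of a guard that reports this more clearly is shown here; parseTimelineBody is a hypothetical helper, not something scrape-twitter or node-fetch provides, and checking for the string app_shell_visited is an assumption based on the interstitial page quoted earlier in this thread.

```javascript
// Hypothetical helper: parse a response body as JSON, but fail with a
// descriptive error when the body turns out to be an HTML page.
function parseTimelineBody(bodyText) {
  const trimmed = bodyText.trimStart();
  if (trimmed.startsWith('<')) {
    // Assumption: the interstitial page mentions the cookie it wants set.
    const isAppShell = trimmed.includes('app_shell_visited');
    throw new Error(
      isAppShell
        ? 'Got the app_shell_visited interstitial instead of JSON'
        : 'Got an HTML page instead of JSON'
    );
  }
  return JSON.parse(trimmed);
}

// Example with a JSON body and with an interstitial-like HTML body.
console.log(parseTimelineBody('{"items_html": "<li></li>"}'));
try {
  parseTimelineBody('<!DOCTYPE html><script>document.cookie = "app_shell_visited=1"</script>');
} catch (err) {
  console.log(err.message);
}
```

With a fetch-style client, the body would be read via `res.text()` and handed to this function instead of calling `res.json()` directly.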


Looking at the JSON and the code, it seems Twitter's internal JSON structure may have changed from { html, _minPosition } to { has_more_items, items_html, new_latent_count }.
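
If the structure did change that way, a tolerant reader could accept both shapes. The following is only a sketch under assumptions: normalizeTimelineResponse is a hypothetical name, and treating items_html as the counterpart of html is a guess from the field names, not something confirmed here. No counterpart for _minPosition is visible in the newer shape, so it is left as null.

```javascript
// Hypothetical helper: map either observed response shape onto one object.
function normalizeTimelineResponse(json) {
  if (typeof json.items_html === 'string') {
    // Newer shape: { has_more_items, items_html, new_latent_count }
    return {
      html: json.items_html, // assumption: same role as the old `html` field
      minPosition: null, // no equivalent field seen in this shape
      hasMoreItems: Boolean(json.has_more_items),
    };
  }
  if (typeof json.html === 'string') {
    // Older shape: { html, _minPosition }
    return {
      html: json.html,
      minPosition: json._minPosition !== undefined ? json._minPosition : null,
      hasMoreItems: null, // not reported by this shape
    };
  }
  throw new Error(
    'Unrecognized timeline response shape: ' + Object.keys(json).join(', ')
  );
}

console.log(normalizeTimelineResponse({ html: '<li>a</li>', _minPosition: '42' }));
console.log(
  normalizeTimelineResponse({ has_more_items: true, items_html: '<li>b</li>', new_latent_count: 1 })
);
```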
