Comments (2)
tl;dr
The endpoint sends HTML instead of the expected JSON, asking the client to set a cookie, app_shell_visited. Doing so only returns yet another page with a link for the user to click. This is fixed by setting the Referer header of the request to the requested resource (i.e. the URL), fully simulating the behavior requested by the first response.
The endpoint most likely performs a sanity check on incoming requests, since they are expected to originate from within the Twitter frontend (i.e. a Progressive Web App).
AppShell is a PWA concept. More on that in the always great MDN:
https://developer.mozilla.org/en-US/docs/Web/Apps/Progressive/App_structure#App_shell
Here's the first response:
<!DOCTYPE html>
<html lang="en">
<head>
... Just some styles, nothing important ...
</head>
<body>
<noscript>
<center>If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&include_entities=1">use this link</a>.</center>
</noscript>
<script nonce="CrxHUbZqtQTnttFBuO8J6A==">
document.cookie = "app_shell_visited=1;path=/;max-age=5";
location.replace(location.href.split("#")[0]);
</script>
</body>
</html>
The <script> sets the app_shell_visited=1 cookie and immediately reloads the page (location.replace with the same URL).
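A scraper cannot run that inline script, so one option is to recognise the interstitial and read the cookie assignment out of the markup. The following is only a sketch under that assumption: `extractInterstitialCookie` is a hypothetical helper, not part of scrape-twitter, and the regular expression is tied to the exact script shown above.

```javascript
// Hypothetical helper: pull the cookie assignment out of the interstitial's inline script.
function extractInterstitialCookie(html) {
  // Matches: document.cookie = "name=value;attr=...;..."
  const match = /document\.cookie\s*=\s*"([^"=;]+)=([^";]*)/.exec(html);
  if (!match) return null; // not the interstitial, or the markup has changed
  return { name: match[1], value: match[2] };
}

// Trimmed copy of the script from the first response (nonce elided).
const sample = `
<script nonce="...">
document.cookie = "app_shell_visited=1;path=/;max-age=5";
location.replace(location.href.split("#")[0]);
</script>`;

console.log(extractInterstitialCookie(sample));
// → { name: 'app_shell_visited', value: '1' }
```

A `null` result means the body is something other than this interstitial, which is a reasonable signal to try parsing it as JSON.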
Setting it doesn't cut it, though. All we're getting then is yet another HTML page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Twitter</title>
... again some styles ...
</style>
</head>
<body>
<center>
<svg viewBox="0 0 24 24"><g><path d="...actual coordinates that draw the Twitter birdy, I guess..."></path></g></svg>
If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&include_entities=1">use this link</a>.
</center>
</body>
</html>
This time it's not asking for the cookie, so that part seems to work. But we're still asked to click a link with the same URL as the one requested. Since cookies that might have been set by the response are tracked automatically, there wasn't much left to try other than the Referer header, which is set to the current URL when you click a link.
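As a rough illustration of that idea, the request can carry the requested URL as its own Referer, which is what a browser would send after a click on the "use this link" anchor. This is a sketch, not the actual patch: `buildClickLikeOptions` and `fetchTimeline` are hypothetical names, and it assumes a fetch-compatible client (such as node-fetch) that passes a manually set Referer through unchanged.

```javascript
// Hypothetical helper: build request options that imitate a click on the
// "use this link" anchor, where the browser sends the current URL as Referer.
function buildClickLikeOptions(url, cookie) {
  const headers = { Referer: url };
  if (cookie) headers.Cookie = cookie; // e.g. "app_shell_visited=1"
  return { headers };
}

// Assumes a fetch-compatible function: global fetch in recent Node versions,
// or node-fetch passed in explicitly as fetchImpl.
async function fetchTimeline(url, fetchImpl = fetch) {
  const options = buildClickLikeOptions(url, 'app_shell_visited=1');
  const response = await fetchImpl(url, options);
  return response.text();
}
```

If a cookie jar already tracks cookies, the `cookie` argument can be omitted and only the Referer is added.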
Well, let's try that:
09:03:54 scrape-twitter:query query on resource: https://twitter.com/nouswaves/likes/timeline?include_available_features=1&include_entities=1
09:03:54 scrape-twitter:query response was ok
09:03:54 scrape-twitter:query received html of length: 217039
[
{"screenName":"anfiyj","id":"1058154950004523008","time":"2018-11-02T00:32:54.000Z","isRetweet":false,"isPinned":false,"isReplyTo":false,"text":"A SQUID is love","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":4,"favoriteCount":4},
{"screenName":"recborg","id":"1054791420475772928","time":"2018-10-23T17:47:26.000Z","isRetweet":false,"isPinned":false,"isReplyTo":true,"text":"I’m actually in NY. Was hoping you and Nabeel would meet but I don’t think his shedule permits.","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":0,"favoriteCount":1},
... and so on ...
]
...
🎉 😁
PR will be out in a minute.
from scrape-twitter.
I am also getting this issue:
invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
FetchError: invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
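The "Unexpected token <" in that message suggests the body was markup rather than JSON. A small guard in front of `JSON.parse` can make that failure easier to read. This is a sketch only: `parseJsonBody` is a hypothetical helper, not something scrape-twitter or node-fetch provides.

```javascript
// Hypothetical helper: parse a response body as JSON, but fail with a clearer
// message when the server answered with an HTML page instead.
function parseJsonBody(text, url) {
  if (text.trimStart().startsWith('<')) {
    throw new Error(`Expected JSON from ${url} but received markup (possibly an interstitial page)`);
  }
  return JSON.parse(text);
}
```

It would be called with the result of `response.text()` instead of relying on `response.json()` directly.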
Looking at the JSON and the code, it seems Twitter's internal JSON structure may have changed from { html, _minPosition } to { has_more_items, items_html, new_latent_count }.
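If both shapes can show up, one way to cope is to normalise them into a single internal shape before parsing the markup. The sketch below rests on assumptions that are not confirmed here: that `items_html` carries the same kind of markup `html` used to, and that a present `_minPosition` implies more pages. `normalizeTimelinePayload` and its output field names are hypothetical.

```javascript
// Hypothetical normaliser: accept either payload shape and return one internal shape.
// Assumption: items_html carries the same kind of markup that html used to.
function normalizeTimelinePayload(payload) {
  if (typeof payload.items_html === 'string') {
    return {
      html: payload.items_html,
      hasMore: Boolean(payload.has_more_items),
      minPosition: null // no counterpart is named in the newer shape
    };
  }
  return {
    html: payload.html,
    hasMore: payload._minPosition != null, // assumption: a cursor implies more pages
    minPosition: payload._minPosition != null ? payload._minPosition : null
  };
}
```

For example, `normalizeTimelinePayload({ has_more_items: true, items_html: '<li>', new_latent_count: 0 })` yields an object whose `html` is `'<li>'` and whose `hasMore` is `true`.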