Comments (17)
Hi @pablohoffman, has anyone started this? We're happy to write it but need to know if we're duplicating efforts.
from scrapy.
@pedrofaustino no one has started that I know, and it's still highly needed. I'm happy to review it.
Hi @pablohoffman, I'm working with @pedrofaustino.
I have a question: this command will be similar to the `shell` command. Where should I create the files, in the contrib folder or alongside the `shell` command files?
I programmed a script a few days ago (https://github.com/csalazar/minreq) with functionality similar to what is asked here, but with a different approach. It takes a valid request from Chrome DevTools (via the "Copy as cURL" feature) and minimizes the headers down to only the required ones. For the internal comparisons it uses MD5 sums.
I think this is the best way, since fuzzing the User-Agent and Accept headers doesn't really solve the problem. It may work for some requests, but the wanted output sometimes depends on a specific cookie value or on POST data (POST support isn't implemented in my script yet, but it's an easy task). Working from a known-valid request is the only solution for those situations.
The script currently compares MD5 sums, but it could be changed to search for a string instead, since a header might change the page only slightly while the output is still valid.
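The greedy drop-one-header loop described above can be sketched in a few lines. This is a minimal illustration of the idea, not minreq's actual code; `fetch` and `is_valid` are hypothetical caller-supplied callbacks standing in for the real HTTP request and comparison logic:

```python
import hashlib

def fingerprint(body: bytes) -> str:
    # MD5 of the response body; a substring search could replace this
    # when a header changes the page only slightly.
    return hashlib.md5(body).hexdigest()

def minimize_headers(headers, fetch, is_valid):
    """Greedily drop headers one at a time, keeping only those whose
    removal breaks the response. `fetch(headers) -> bytes` and
    `is_valid(body) -> bool` are caller-supplied (hypothetical here)."""
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if is_valid(fetch(trial)):
            required = trial  # this header was not needed
    return required

# Toy demo: a "server" that only answers correctly when the Cookie is sent.
def fake_fetch(headers):
    return b"secret page" if "Cookie" in headers else b"please log in"

full = {"User-Agent": "x", "Accept": "*/*", "Cookie": "sid=1"}
baseline = fingerprint(fake_fetch(full))
required = minimize_headers(full, fake_fetch,
                            lambda body: fingerprint(body) == baseline)
print(required)  # {'Cookie': 'sid=1'}
```

The comparison callback is the swappable part: an MD5 match against the baseline response, as above, or a search-string check for pages with minor per-request variation.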
Just sent pull request #413
Added unit tests to pull request #413.
Please review and comment.
I like the approach of using a successful request (@csalazar).
What do you think about it, @pablohoffman?
Any news on this, @nramirezuy and @pablohoffman?
I like @csalazar's approach, it's basically what I did for manual probing. I'd like it a lot if it probed url query parameters as well.
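The same greedy shrinking extends naturally to query parameters. A stdlib-only sketch of that idea (again with a hypothetical `fetch` callback, not part of minreq):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def minimize_query(url, fetch, is_valid):
    """Drop query parameters one at a time, keeping only the ones a
    valid response depends on. `fetch(url) -> bytes` and
    `is_valid(body) -> bool` are caller-supplied (hypothetical)."""
    parts = urlsplit(url)
    required = parse_qsl(parts.query, keep_blank_values=True)
    i = 0
    while i < len(required):
        trial = required[:i] + required[i + 1:]
        candidate = urlunsplit(parts._replace(query=urlencode(trial)))
        if is_valid(fetch(candidate)):
            required = trial  # parameter was not needed
        else:
            i += 1  # parameter is required; keep it and move on
    return urlunsplit(parts._replace(query=urlencode(required)))

# Toy demo: only the `id` parameter actually matters.
def fake_fetch(url):
    return b"item page" if "id=7" in url else b"not found"

minimal = minimize_query(
    "https://example.com/p?utm_source=a&id=7&ref=x",
    fake_fetch,
    lambda body: body == b"item page",
)
print(minimal)  # https://example.com/p?id=7
```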
Any interest in resurrecting this one for Scrapy 1.0, @nramirezuy @kmike @dangra @curita @redapple?
No interest from me :) `scrapy probe` looks a bit too specific; I'm not sure there is a single best solution, and it is a feature which can live fine outside Scrapy.
I like @csalazar's approach; it looks a bit similar to what parts of the https://github.com/DRMacIver/hypothesis library are doing: given an example, simplify it until it fails.
I agree with @kmike.
Close?
I quite like the `minreq` idea; I think it shares the same goal as the original `scrapy probe` idea.
What do you think about adding `minreq` as a Scrapy command? I think it's a useful thing to have in the "scraping toolkit" and thus makes a good fit for a Scrapy command (in addition to getting more exposure, usage and contributions).
/cc @csalazar
@pablohoffman sure, I'll add POST support and tests. Should the command be called `probe` or `minreq`?
@csalazar I think `minreq` is fine, since we are just making your app more visible.
`minreq` sounds good to me, thanks @csalazar!
Hey @csalazar - Did you ever write the code for this, and are you still interested in contributing? I think this would be a great feature. But if not, I will close this issue as stale, particularly as another solution exists (minreq).