Comments (5)
Makes total sense and at first glance I think your solution will work. Can
you send me a pull request?
On Tue, Nov 5, 2013 at 6:34 PM, Eric Sagara wrote:
https://github.com/propublica/upton/blob/master/lib/upton.rb#L314-L326
Will cause a stack overflow with large paginations (>2300 or so). Possible
solution:

def get_instance(url, pagination_index=0, options={})
  resp = self.get_page(url, @debug, options)
  i = pagination_index.to_i
  while !resp.empty?
    next_url = self.next_instance_page_url(url, i += 1)
    next_resp = self.get_page(next_url, @debug, options)
    break if next_url == url
    resp += next_resp
  end
  resp
end
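
The idea behind the proposal above can be sketched independently of Upton: replace the recursive call with a loop, so the stack depth stays constant however many pages there are. This is only a sketch under assumptions. `collect_pages`, `fetch_page`, `next_page_url`, and `max_pages` are hypothetical names, not part of Upton's API, and the stopping rules (an empty page, or a next URL equal to the starting URL) are one reasonable choice rather than the library's confirmed behavior.

```ruby
# Hypothetical stand-in for a paginated fetch; none of these names come from Upton.
# fetch_page:    callable taking a URL and returning the page body ("" when nothing is left)
# next_page_url: callable taking (url, index) and returning the URL for that page index
# max_pages:     safety cap so a misbehaving site cannot keep the loop running forever
def collect_pages(url, fetch_page, next_page_url, start_index = 0, max_pages = 10_000)
  first = fetch_page.call(url)
  return "" if first.empty?

  pages = [first]
  index = start_index.to_i
  max_pages.times do
    index += 1
    next_url = next_page_url.call(url, index)
    break if next_url == url       # no further page: stop before fetching again
    body = fetch_page.call(next_url)
    break if body.empty?           # an empty page also ends the pagination
    pages << body
  end
  pages.join
end
```

Checking `next_url == url` before fetching avoids one wasted request, and breaking on an empty body gives the loop a second exit, which may help with the "caught in a loop" behavior mentioned later in this thread.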
from upton.
Yo Eric, did your PR #24 close this issue?
from upton.
I think so, there was an issue somewhere in that where it would get caught
in a loop. I am having another problem though. The
sleep_time_between_requests does not seem to be working. Have you played
around with it at all? Perhaps I am missing something in the syntax.
Eric
from upton.
Is it possible that the below line is not evaluating to true? I can see
that both @verbose and @sleep_time_between_requests are being passed to the
scraper, but the sleep time is not being implemented from what I can tell.
https://github.com/propublica/upton/blob/master/lib/upton.rb#L223
Eric
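
One way to narrow this down is to test the sleep guard in isolation. The sketch below is not Upton's code and does not reproduce the line linked above. `PoliteFetcher`, its `sleeper` argument, and its in-memory cache are all hypothetical, made up to show a guard that sleeps before every uncached request regardless of the verbose setting. Injecting the sleeper lets a test record the pauses instead of actually waiting.

```ruby
# Hypothetical sketch, not Upton's implementation: pause before each network
# request, and skip the pause when the page is served from a local cache.
class PoliteFetcher
  def initialize(sleep_time_between_requests: 0, verbose: false,
                 sleeper: Kernel.method(:sleep))
    @sleep_time_between_requests = sleep_time_between_requests
    @verbose = verbose
    @sleeper = sleeper   # injectable so tests can observe the pauses
    @cache = {}
  end

  # The block performs the real download and receives the URL.
  def get(url)
    return @cache[url] if @cache.key?(url)

    # The guard depends only on the configured sleep time, not on @verbose.
    if @sleep_time_between_requests.to_f > 0
      puts "sleeping #{@sleep_time_between_requests}s before #{url}" if @verbose
      @sleeper.call(@sleep_time_between_requests)
    end
    @cache[url] = yield(url)
  end
end
```

If a guard like this is tied to the wrong flag, or sits in a code path that pagination does not go through, the option would be accepted but never take effect, which would match the symptom described here.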
from upton.
I'm not sure exactly what's happening, but I noted it in #28.
Will look into it in greater depth shortly. I'll write a test too :)
from upton.