Comments (15)
Hi Brian,
Yeah, the tests do take a long time. I'm not a testing expert, and I was mostly interested in getting tests written, rather than making them perfect. As such, I'm not married to test-unit; if you wanna submit an rspec pull request, I'd totally be open to it. Thanks for contributing!
from upton.
Awesome. Yeah - definitely no criticism intended. Tests > no tests.
from upton.
None taken. :) Looking forward to your PR.
from upton.
My colleague asks: Can we do minitest instead of rspec?
from upton.
Yeah, absolutely. However, I've never used minitest, so I may not be up for
taking the lead on it. Will investigate after work today or tomorrow.
On Tue, Jul 23, 2013 at 3:13 PM, Jeremy B. Merrill wrote:
> My colleague asks: Can we do minitest instead of rspec?
from upton.
Okay. Rumor has it that they're similar, but I don't really know from experience. How would you, under RSpec, avoid starting the server so often?
Perhaps I ought to test only the fetching-related methods with the server on, and the rest should just load the page from disk. (Because then I'm not actually testing stuff that needs the server.)
I'd love to hear your thoughts...
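One way to read the idea above: only the fetching methods need a live server, and everything else can parse a saved copy of the page. A minimal sketch, assuming a hypothetical `extract_links` helper (not Upton's actual API) and a regex standing in for a real HTML parser such as Nokogiri:

```ruby
require "tempfile"

# Hypothetical helper, not part of Upton: pulls href values out of anchor tags.
# A regex stands in for a real HTML parser to keep the sketch stdlib-only.
def extract_links(html)
  html.scan(/<a\s[^>]*href="([^"]*)"/i).flatten
end

# Save a fixture page to disk, the way a checked-in test fixture would be.
fixture = Tempfile.new(["index", ".html"])
fixture.write('<ul><li><a href="/story/1">One</a></li><li><a href="/story/2">Two</a></li></ul>')
fixture.close

# The parsing test reads the file directly; no server is started.
links = extract_links(File.read(fixture.path))
```

Tests written this way never touch the network, so only the handful of fetching tests would still pay the server start-up cost.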
from upton.
@brianflanagan, I converted the tests to RSpec in bcaa857; they now run way faster (31 seconds on my machine). If you have any other ideas on how to improve the tests, please don't hesitate to let me know (or submit a PR). :)
from upton.
Fantastic - I'll take a look. Apologies for flaking on this; still intend to submit a pull request or two.
from upton.
Awesome, thanks! If you have any RSpec tips wrt what I wrote, I'm happy to hear them. I'm not a testing expert...
from upton.
It's not the test framework that is slow, it's the use of a webserver. At some point, the dependence on Thin should be removed and replaced with Fakeweb to simulate HTTP responses: https://github.com/chrisk/fakeweb
The HTML parsed should also be vastly simplified...but it's totally understandable that it's more comfortable to work with HTML you've looked at many times before.
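To illustrate what "simulate HTTP responses" means, here is a hand-rolled sketch that swaps `Net::HTTP.get` for a canned lookup and restores it afterwards. The `fetch_page` and `with_canned_responses` names are hypothetical, and this is not how Fakeweb itself is implemented; a library like Fakeweb covers far more of `Net::HTTP` than this does.

```ruby
require "net/http"
require "uri"

# Hypothetical fetch method standing in for the scraper's download step.
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

# Temporarily replace Net::HTTP.get with a canned response lookup,
# then restore the real method even if the block raises.
def with_canned_responses(responses)
  singleton = Net::HTTP.singleton_class
  singleton.send(:alias_method, :__real_get, :get)
  singleton.send(:define_method, :get) do |uri, *_rest|
    responses.fetch(uri.to_s) { raise "unexpected request: #{uri}" }
  end
  yield
ensure
  singleton.send(:alias_method, :get, :__real_get)
  singleton.send(:remove_method, :__real_get)
end

# No web server is running; the body comes straight from the hash.
html = with_canned_responses("http://example.com/index.html" => "<h1>Stubbed</h1>") do
  fetch_page("http://example.com/index.html")
end
```

Because unregistered URLs raise, a test that accidentally reaches for the real network fails loudly instead of slowing the suite down.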
from upton.
+1
There's no need to test thin in the tests for upton. The server responses can probably be stubbed.
from upton.
I have some webmock/rspec stuff written for the downloading and caching part. I suppose after that, the server tests can be removed.
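A rough sketch of what a download-and-cache test could check once the server is gone: the second request for the same URL should come from disk. The `cached_fetch` helper is hypothetical (not Upton's implementation), and a counting lambda stands in for the webmock-stubbed HTTP layer.

```ruby
require "digest/md5"
require "tmpdir"

# Hypothetical cache-aware fetch: returns the cached copy if one exists,
# otherwise calls the fetcher and stores the result on disk.
def cached_fetch(url, cache_dir, fetcher)
  path = File.join(cache_dir, Digest::MD5.hexdigest(url))
  return File.read(path) if File.exist?(path)

  body = fetcher.call(url)
  File.write(path, body)
  body
end

# A counting lambda stands in for the stubbed HTTP layer.
request_count = 0
fake_fetcher = lambda do |_url|
  request_count += 1
  "<p>cached page</p>"
end

first = second = nil
Dir.mktmpdir do |dir|
  first = cached_fetch("http://example.com/a", dir, fake_fetcher)
  second = cached_fetch("http://example.com/a", dir, fake_fetcher)
end
```

Asserting on `request_count` is the stand-in for what webmock would express as "this URL was requested exactly once".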
from upton.
Fakeweb looks awesome. Makes sense to replace my silly Thin stuff with it.
As far as simplifying the HTML goes, do y'all suggest bare HTML pages with nothing but the tested element, or just removing the Wikipedia/ProPublica headers, etc. on the test cases and leaving the scraped content mostly the same?
from upton.
Yeah...including the full pages makes things slower on two fronts: one, the opening and parsing of the pages, and two, the time it takes other contributors to read them. For example, some of the headlines scraped on the sample pages are repeated (in sidebars and widgets). If you write each test with a single purpose, such as "Does Upton use the given selector to find the expected hrefs?", then it shouldn't matter what else is on the page.
A possible strategy at this time is to move the current full-page tests into their own directory...as you write actual unit tests, those full page tests should still pass (until they don't, whenever you've decided to change the API)
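The single-purpose test described above might look something like this sketch. It assumes a hypothetical `hrefs_for` helper and uses REXML with an XPath expression as a stand-in for the CSS selector and HTML parser the real tests would use; the fixture is deliberately tiny, with one repeated headline in a sidebar.

```ruby
require "rexml/document"

# Minimal fixture: the main list plus a sidebar that repeats a headline.
minimal_page = <<~HTML
  <html>
    <body>
      <div id="main">
        <a href="/story/1">First headline</a>
        <a href="/story/2">Second headline</a>
      </div>
      <div id="sidebar">
        <a href="/story/1">First headline</a>
      </div>
    </body>
  </html>
HTML

# Hypothetical helper: returns the hrefs of the elements matched by an XPath.
def hrefs_for(html, xpath)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, xpath).map { |a| a.attributes["href"] }
end

# Only the links inside the main container should be picked up.
main_links = hrefs_for(minimal_page, "//div[@id='main']/a")
```

With a fixture this small, a contributor can see at a glance which hrefs the selector should and should not match.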
from upton.
You all will be pleased to learn that I got rid of that Thin server and used webmocks. :)
Removing the extraneous page structure is still tk.
from upton.