Comments (7)
That would require a breaking API change, which is doable but I want to make sure we have all the aspects of it thought through:
- Is the error type specified by the user? Something scalpel provided?
- How much effort and what is required to migrate existing code?
from scalpel.
I started working on it locally, but it seems complicated. It would need some architectural changes that I don't have the overview for.
The way Parsec reports errors seems pretty appropriate, but I'd have to look up how exactly it works under the hood.
from scalpel.
Chiming in here, @KaneTW.
Context: Scraping a product from a product page.
Two outcomes:
a. We can scrape a `Product` from the page.
b. Or we cannot (because the page has changed and our scraper no longer works).
We want to know once the page has changed so we can adapt our scraper accordingly!
Approach: We collect everything for a `Product` but don’t construct it within the scraper.
That way the scraper can always return something. Let’s call it `ProductDto`.
This `ProductDto` carries only `Maybe` values.
Now, if our scraper fails to return even a `Just ProductDto`, it will return `Nothing`.
In that case, a wrapper around our scraper could translate the `Nothing` to a null `ProductDto` holding only `Nothing` values.
So, now that we always get a `ProductDto`, we always have a construction kit for constructing a `Product`.
If the `ProductDto` is valid, meaning we can construct a `Product` from it, we do just that.
If values are invalid or we don’t have everything needed to construct a `Product`, we translate the specific condition to something like `ScrapeError`, with variants explaining what the condition is. If several conditions apply, we might return all of them to be comprehensive.
Now, the function’s signature could look like:
createProduct :: ProductDto -> Either Product [ScrapeError]
@fimad what’s your take on this?
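To make the proposal concrete, here is a minimal sketch of the validation step using only `base`. The record fields and the `ScrapeError` variants are hypothetical, since the comment above does not define them, and the sketch flips the result to the conventional `Either [ScrapeError] Product` ordering (errors on the `Left`) rather than the ordering in the signature above.

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical types: the thread names Product, ProductDto and ScrapeError
-- but does not define their fields or variants.
data Product = Product
  { productName  :: String
  , productPrice :: Int -- price in cents
  } deriving (Show, Eq)

data ProductDto = ProductDto
  { dtoName  :: Maybe String
  , dtoPrice :: Maybe Int
  } deriving (Show, Eq)

data ScrapeError
  = MissingName
  | MissingPrice
  | NegativePrice Int
  deriving (Show, Eq)

-- The "null" DTO used when the scraper itself returns Nothing.
emptyDto :: ProductDto
emptyDto = ProductDto Nothing Nothing

-- Build a Product, or report every problem found instead of only the first.
createProduct :: ProductDto -> Either [ScrapeError] Product
createProduct dto =
  case (dtoName dto, dtoPrice dto) of
    (Just name, Just price) | price >= 0 -> Right (Product name price)
    _ -> Left (catMaybes [nameError, priceError])
  where
    nameError = case dtoName dto of
      Nothing -> Just MissingName
      Just _  -> Nothing
    priceError = case dtoPrice dto of
      Nothing -> Just MissingPrice
      Just p
        | p < 0     -> Just (NegativePrice p)
        | otherwise -> Nothing

main :: IO ()
main = do
  print (createProduct (ProductDto (Just "Widget") (Just 999)))
  print (createProduct emptyDto)
```

With these definitions, `createProduct emptyDto` evaluates to `Left [MissingName,MissingPrice]`, so a page change that breaks the whole scraper shows up as a full list of missing fields.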
from scalpel.
related: #8
since then we've got the Monad transformer: #87
from scalpel.
I've defined `ScraperT str (Except e) a` and try to implement `textComment'''`, which should return exceptions for unavailable fields.
Therefore `textComment'''` should do a `throwError "some sensible message"` when the selector cannot find the specified target.
How can I check whether `author <- text authorSel` was not successful?
https://github.com/benjaminweb/scalpel-exceptt/blob/main/src/Lib.hs#L136-L141
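One possible way to get a per-field error, sketched under assumptions: a failed `text` selector is an ordinary scraper failure, so it can be caught with `<|>` and replaced by a `throwError`. The helper name `required` is hypothetical, and the sketch uses `Either String` as the base monad instead of the `Except e` from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Control.Monad.Error.Class (throwError)
import Text.HTML.Scalpel

type Error = String
type ScraperWithError a = ScraperT String (Either Error) a

-- Hypothetical helper: run a scraper and turn a silent failure into an
-- explicit error carrying the given message.
required :: Error -> ScraperWithError a -> ScraperWithError a
required message scraper = scraper <|> throwError message

data Comment = TextComment String String
  deriving (Show, Eq)

textComment :: ScraperWithError Comment
textComment = do
  author <- required "author not found" (text ("span" @: [hasClass "author"]))
  body   <- required "comment text not found" (text ("div" @: [hasClass "text"]))
  return (TextComment author body)

main :: IO ()
main = do
  -- Both fields present.
  print (scrapeStringLikeT "<span class='author'>Sally</span><div class='text'>Hi</div>" textComment)
  -- The comment text is missing, so the second `required` throws.
  print (scrapeStringLikeT "<span class='author'>Sally</span>" textComment)
```

Note that the thrown error stops the whole scrape, so this only fits fields that are truly mandatory.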
from scalpel.
That's a good point about this already being supported via monad transformers.
I took a look at this and you can do this by using `Either` as the base monad of the `ScraperT` transformer. When you unwrap the result, you'll need to check for 3 cases: (1) an explicit error, (2) a failed scrape without an error, and (3) a valid result:
    type Error = String
    type ScraperWithError a = ScraperT String (Either Error) a

    scrapeStringOrError :: String -> ScraperWithError a -> Either Error a
    scrapeStringOrError html scraper
        | Left error <- result = Left error
        | Right Nothing <- result = Left "Unknown error"
        | Right (Just a) <- result = Right a
        where
            result = scrapeStringLikeT html scraper
To add explicit errors, you can use the `<|>` operator from `Alternative` to throw an error when something fails:

    comment :: ScraperWithError Comment
    comment = textComment <|> imageComment <|> throwError "Unknown comment type"
With this approach, throwing an error stops all parsing. So if you have a nested `throwError` in `a` in the expression `a <|> b`, the parsing will fail even if `b` would have been successful.
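A small sketch of that difference, assuming the same `ScraperWithError` alias as above: `empty` is a soft failure that lets the right-hand side of `<|>` run, while `throwError` aborts before the alternative is tried. The names `softFailure` and `hardFailure` are made up for illustration, and the HTML input is irrelevant here because neither scraper looks at it.

```haskell
import Control.Applicative (empty, (<|>))
import Control.Monad.Error.Class (throwError)
import Text.HTML.Scalpel (ScraperT, scrapeStringLikeT)

type Error = String
type ScraperWithError a = ScraperT String (Either Error) a

-- A soft failure (empty) lets the right-hand alternative run.
softFailure :: ScraperWithError Int
softFailure = empty <|> pure 1

-- A hard failure (throwError) short-circuits; `pure 1` is never tried.
hardFailure :: ScraperWithError Int
hardFailure = throwError "boom" <|> pure 1

main :: IO ()
main = do
  print (scrapeStringLikeT "<p>ignored</p>" softFailure)
  print (scrapeStringLikeT "<p>ignored</p>" hardFailure)
```

The first call should give `Right (Just 1)` and the second `Left "boom"`, which is why a `throwError` is best placed as the last alternative in a chain.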
Another approach that would let you accumulate errors would be to use a `MonadWriter` and accumulate debugging information in a `Monoid` like a list, rather than short-circuiting on hard errors.
Below is a modified example from the docs that uses the `throwError` approach:
    {-# LANGUAGE OverloadedStrings #-}

    import Text.HTML.Scalpel
    import Control.Applicative
    import Control.Monad.Error.Class (throwError)

    exampleHtml :: String
    exampleHtml = "<html>\
    \ <body>\
    \ <div class='comments'>\
    \ <div class='comment container'>\
    \ <span class='comment author'>Sally</span>\
    \ <div class='comment text'>Woo hoo!</div>\
    \ </div>\
    \ <div class='comment container'>\
    \ <span class='comment author'>Bill</span>\
    \ <img class='comment image' src='http://example.com/cat.gif' />\
    \ </div>\
    \ <div class='comment container'>\
    \ <span class='comment author'>Susan</span>\
    \ <div class='comment text'>WTF!?!</div>\
    \ </div>\
    \ <div class='comment container'>\
    \ <span class='comment author'>Susan</span>\
    \ <div class='comment video'>A video? That's new!</div>\
    \ </div>\
    \ </div>\
    \ </body>\
    \</html>"

    type Error = String

    type Author = String

    data Comment
        = TextComment Author String
        | ImageComment Author URL
        deriving (Show, Eq)

    type ScraperWithError a = ScraperT String (Either Error) a

    scrapeStringOrError :: String -> ScraperWithError a -> Either Error a
    scrapeStringOrError html scraper
        | Left error <- result = Left error
        | Right Nothing <- result = Left "Unknown error"
        | Right (Just a) <- result = Right a
        where
            result = scrapeStringLikeT html scraper

    main :: IO ()
    main = print $ scrapeStringOrError exampleHtml comments
        where
            comments :: ScraperWithError [Comment]
            comments = chroots ("div" @: [hasClass "container"]) comment

            comment :: ScraperWithError Comment
            comment = textComment <|> imageComment <|> throwError "Unknown comment type"

            textComment :: ScraperWithError Comment
            textComment = do
                author <- text $ "span" @: [hasClass "author"]
                commentText <- text $ "div" @: [hasClass "text"]
                return $ TextComment author commentText

            imageComment :: ScraperWithError Comment
            imageComment = do
                author <- text $ "span" @: [hasClass "author"]
                imageURL <- attr "src" $ "img" @: [hasClass "image"]
                return $ ImageComment author imageURL
This prints out:
Left "Unknown comment type"
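For comparison, the accumulating alternative mentioned earlier might look roughly like the sketch below. It is not taken from the scalpel docs: the alias `ScraperWithLog` and the helpers `logAndSkip` and `scrapeWithLog` are hypothetical names, `Writer` comes from the `transformers` package, and the sketch relies on `lift` from `ScraperT`'s `MonadTrans` instance rather than assuming a `MonadWriter` instance.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (empty, (<|>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer.Strict (Writer, runWriter, tell)
import Text.HTML.Scalpel

type Author = String

data Comment
  = TextComment Author String
  | ImageComment Author URL
  deriving (Show, Eq)

-- Writer [String] as the base monad: messages accumulate instead of aborting.
type ScraperWithLog a = ScraperT String (Writer [String]) a

-- Hypothetical helper: record a message, then fail softly so the
-- surrounding chroots can carry on with the next node.
logAndSkip :: String -> ScraperWithLog a
logAndSkip message = lift (tell [message]) >> empty

comments :: ScraperWithLog [Comment]
comments = chroots ("div" @: [hasClass "container"]) comment

comment :: ScraperWithLog Comment
comment = textComment <|> imageComment <|> logAndSkip "Unknown comment type"

textComment :: ScraperWithLog Comment
textComment = do
  author      <- text $ "span" @: [hasClass "author"]
  commentText <- text $ "div" @: [hasClass "text"]
  return $ TextComment author commentText

imageComment :: ScraperWithLog Comment
imageComment = do
  author   <- text $ "span" @: [hasClass "author"]
  imageURL <- attr "src" $ "img" @: [hasClass "image"]
  return $ ImageComment author imageURL

-- Returns whatever could be scraped together with the accumulated messages.
scrapeWithLog :: String -> ScraperWithLog a -> (Maybe a, [String])
scrapeWithLog source scraper = runWriter (scrapeStringLikeT source scraper)

sampleHtml :: String
sampleHtml =
  "<div class='container'><span class='author'>Sally</span><div class='text'>Woo hoo!</div></div>\
  \<div class='container'><span class='author'>Susan</span><div class='video'>A video?</div></div>"

main :: IO ()
main = print (scrapeWithLog sampleHtml comments)
```

Here the unknown video comment should be skipped and logged rather than failing the whole scrape, so the caller gets both the comments that parsed and a record of what did not.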
from scalpel.
Given that this doesn't require a ton of additional code on the user's behalf, I plan to update the documentation with some of these examples but not change the API to add types like `ScraperWithError`.
I think it makes sense to keep the API simple and unopinionated about error handling; the user can then choose the types and approach that make sense for them and work well with their code base.
from scalpel.