Comments (7)

fimad commented on September 17, 2024

That would require a breaking API change, which is doable, but I want to make sure we have thought through all aspects of it:

  • Is the error type specified by the user, or is it something scalpel provides?
  • How much effort is required to migrate existing code, and what does the migration involve?

from scalpel.

KaneTW commented on September 17, 2024

I started working on it locally, but it seems complicated. It would need some architectural changes that I don't have the overview for.

The way Parsec reports errors seems pretty appropriate, but I'd have to look up exactly how it works under the hood.

benjaminweb commented on September 17, 2024

Chiming in here, @KaneTW.
Context: scraping a product from a product page.
Two outcomes:
a. We can scrape a Product from the page.
b. Or we cannot (because the page has changed and our scraper no longer works).

We want to know as soon as the page has changed so we can adapt our scraper accordingly!

Approach: we collect everything for a Product but don’t construct it within the scraper.
That way the scraper can always return something. Let’s call it ProductDto.
This ProductDto carries only Maybe values.

Now, if our scraper fails to return even a Just ProductDto, it will return Nothing.
In that case, we could translate the Nothing to a null ProductDto that holds only Nothing values.
That would be a wrapper around our scraper.

So, now that we always get a ProductDto, we always have a construction kit for building a Product.
If the ProductDto is valid, meaning we can construct a Product from it, we do just that.
If values are invalid or we don’t have everything needed to construct a Product, we translate the specific condition to something like ScrapeError, with variants explaining what the condition is. If several conditions apply, we can return all of them to be comprehensive.

Now, the function’s signature could look like:

createProduct :: ProductDto -> Either Product [ScrapeError]
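
A minimal sketch of this idea, using only base. The fields of ProductDto and Product and the ScrapeError variants are hypothetical and chosen purely for illustration. Note that the sketch puts the errors in Left, the usual Either convention, so the argument order is swapped relative to the signature above:

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical types for illustration; none of them come from scalpel.
data ProductDto = ProductDto
    { dtoName       :: Maybe String
    , dtoPriceCents :: Maybe Int
    } deriving (Show, Eq)

data Product = Product
    { productName       :: String
    , productPriceCents :: Int
    } deriving (Show, Eq)

data ScrapeError
    = MissingName
    | MissingPrice
    | NegativePrice Int
    deriving (Show, Eq)

-- The "null" DTO to fall back on when the scraper itself returns Nothing.
emptyDto :: ProductDto
emptyDto = ProductDto Nothing Nothing

-- Report every problem found instead of stopping at the first one.
createProduct :: ProductDto -> Either [ScrapeError] Product
createProduct dto =
    case (dtoName dto, dtoPriceCents dto) of
        (Just name, Just price) | price >= 0 -> Right (Product name price)
        _ -> Left (catMaybes [nameError, priceError])
  where
    nameError = maybe (Just MissingName) (const Nothing) (dtoName dto)
    priceError = case dtoPriceCents dto of
        Nothing -> Just MissingPrice
        Just p
            | p < 0     -> Just (NegativePrice p)
            | otherwise -> Nothing

main :: IO ()
main = do
    print (createProduct (ProductDto (Just "Lamp") (Just 1999)))
    print (createProduct emptyDto)
```

The wrapper around the scraper would then amount to something like createProduct (fromMaybe emptyDto scraped), with fromMaybe taken from Data.Maybe.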

@fimad what’s your take on this?

benjaminweb commented on September 17, 2024

Related: #8.
Since then we've gained the monad transformer: #87.

benjaminweb commented on September 17, 2024

I've defined ScraperT str (Except e) a and am trying to implement textComment''', which should throw exceptions for unavailable fields.
To that end, textComment''' should call throwError "some sensible message" when the selector cannot find the specified target.

How can I check whether author <- text authorSel was not successful?

https://github.com/benjaminweb/scalpel-exceptt/blob/main/src/Lib.hs#L136-L141

fimad commented on September 17, 2024

That's a good point about this already being supported via monad transformers.

I took a look at this and you can do this by using Either as the base monad of the ScraperT transformer. When you unwrap the result, you'll need to check for three cases: (1) an explicit error, (2) a failed scrape without an error, and (3) a valid result:

type Error = String
type ScraperWithError a = ScraperT String (Either Error) a

-- Collapse the three possible outcomes into a single Either.
scrapeStringOrError :: String -> ScraperWithError a -> Either Error a
scrapeStringOrError html scraper
        | Left err       <- result = Left err              -- (1) explicit error
        | Right Nothing  <- result = Left "Unknown error"  -- (2) failed without an error
        | Right (Just a) <- result = Right a               -- (3) valid result
    where
    result = scrapeStringLikeT html scraper

To add explicit errors, you can use the <|> operator from Alternative to throw an error when something fails:

comment :: ScraperWithError Comment
comment = textComment <|> imageComment <|> throwError "Unknown comment type"

With this approach, throwing an error stops all parsing. So if a contains a nested throwError in the expression a <|> b, the parsing will fail even if b would have been successful.
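
A small sketch of that short-circuiting, assuming a scalpel version whose ScraperT has a MonadError instance (as the snippet above relies on). The names title and titleOrParagraph and the sample HTML are made up for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad.Error.Class (throwError)
import Text.HTML.Scalpel

type Error = String
type ScraperWithError a = ScraperT String (Either Error) a

-- Hypothetical scraper: a missing <h1> becomes a hard error.
title :: ScraperWithError String
title = text "h1" <|> throwError "no <h1> found"

-- The fallback on the right is never tried once `title` has thrown.
titleOrParagraph :: ScraperWithError String
titleOrParagraph = title <|> text "p"

main :: IO ()
main = do
    -- No <h1> present: the error from `title` wins, although text "p" would match.
    print (scrapeStringLikeT "<p>hi</p>" titleOrParagraph)
    -- <h1> present: `title` succeeds and no error is thrown.
    print (scrapeStringLikeT "<h1>Title</h1><p>hi</p>" titleOrParagraph)
```

If the fallback should get a chance to run, the throwError has to move to the end of the whole chain, for example text "h1" <|> text "p" <|> throwError "...".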

Another approach that would let you accumulate errors would be to use a MonadWriter and collect debugging information in a Monoid such as a list, rather than short-circuiting on hard errors.
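
A rough sketch of the Writer-based variant, assuming ScraperT can be stacked on Writer from mtl and that lift is available for it through MonadTrans. The helper orWarn, the alias ScraperWithLog, and the sample HTML are hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Writer (Writer, runWriter, tell)
import Text.HTML.Scalpel

type Warning = String
type ScraperWithLog a = ScraperT String (Writer [Warning]) a

-- Hypothetical helper: if the scraper fails, record a warning and
-- continue with a fallback value instead of aborting.
orWarn :: ScraperWithLog a -> (a, Warning) -> ScraperWithLog a
orWarn scraper (fallback, warning) =
    scraper <|> (lift (tell [warning]) >> return fallback)

comment :: ScraperWithLog (String, String)
comment = do
    author <- text ("span" @: [hasClass "author"]) `orWarn` ("", "missing author")
    body   <- text ("div"  @: [hasClass "text"])   `orWarn` ("", "missing text")
    return (author, body)

comments :: ScraperWithLog [(String, String)]
comments = chroots ("div" @: [hasClass "container"]) comment

sampleHtml :: String
sampleHtml =
    "<div class='container'><span class='author'>Sally</span><div class='text'>Woo hoo!</div></div>\
    \<div class='container'><span class='author'>Bill</span></div>"

main :: IO ()
main = do
    -- runWriter yields the scrape result paired with all collected warnings.
    let (result, warnings) = runWriter (scrapeStringLikeT sampleHtml comments)
    print result
    mapM_ putStrLn warnings
```

Here nothing aborts the scrape. Every comment is returned, and the warnings list tells you which fields had to be filled with a fallback.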

Below is a modified example from the docs that uses this kind of error checking:

{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel
import Control.Applicative
import Control.Monad.Error.Class (throwError)


exampleHtml :: String
exampleHtml = "<html>\
\    <body>\
\        <div class='comments'>\
\            <div class='comment container'>\
\                <span class='comment author'>Sally</span>\
\                <div class='comment text'>Woo hoo!</div>\
\            </div>\
\            <div class='comment container'>\
\                <span class='comment author'>Bill</span>\
\                <img class='comment image' src='http://example.com/cat.gif' />\
\            </div>\
\            <div class='comment container'>\
\                <span class='comment author'>Susan</span>\
\                <div class='comment text'>WTF!?!</div>\
\            </div>\
\            <div class='comment container'>\
\                <span class='comment author'>Susan</span>\
\                <div class='comment video'>A video? That's new!</div>\
\            </div>\
\        </div>\
\    </body>\
\</html>"

type Error = String

type Author = String

data Comment
    = TextComment Author String
    | ImageComment Author URL
    deriving (Show, Eq)

type ScraperWithError a = ScraperT String (Either Error) a

-- Collapse the three possible outcomes into a single Either.
scrapeStringOrError :: String -> ScraperWithError a -> Either Error a
scrapeStringOrError html scraper
        | Left err       <- result = Left err              -- (1) explicit error
        | Right Nothing  <- result = Left "Unknown error"  -- (2) failed without an error
        | Right (Just a) <- result = Right a               -- (3) valid result
    where
    result = scrapeStringLikeT html scraper

main :: IO ()
main = print $ scrapeStringOrError exampleHtml comments
    where
    comments :: ScraperWithError [Comment]
    comments = chroots ("div" @: [hasClass "container"]) comment

    comment :: ScraperWithError Comment
    comment = textComment <|> imageComment <|> throwError "Unknown comment type"

    textComment :: ScraperWithError Comment
    textComment = do
        author      <- text $ "span" @: [hasClass "author"]
        commentText <- text $ "div"  @: [hasClass "text"]
        return $ TextComment author commentText

    imageComment :: ScraperWithError Comment
    imageComment = do
        author   <- text       $ "span" @: [hasClass "author"]
        imageURL <- attr "src" $ "img"  @: [hasClass "image"]
        return $ ImageComment author imageURL

This prints out:

Left "Unknown comment type"

fimad commented on September 17, 2024

Given that this doesn't require a ton of additional code on the user's part, I plan to update the documentation with some of these examples but not change the API to add types like ScraperWithError.

I think it makes sense to keep the API simple and unopinionated about error handling. Users can then choose the types and approach that make sense for them and work well with their code base.
