Comments (6)
Having used Network.HTTP.Client a bit, and replaced all of my curl code (not just for scalpel) with it, I'm very happy.
To accommodate my earlier suggestion, I would suggest having something like Maybe Manager in your Opts, which will make scalpel create its own manager if it's set to Nothing, but reuse yours otherwise.
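A minimal sketch of that suggestion, assuming a hypothetical `Opts` record with an `optManager` field and a `resolveManager` helper (none of these names come from scalpel's actual API; only `Manager`, `newManager` and `tlsManagerSettings` are real http-client / http-client-tls names):

```haskell
import Network.HTTP.Client (Manager, newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)

-- Hypothetical options record; the type and field names are illustrative only.
newtype Opts = Opts
  { optManager :: Maybe Manager -- Nothing: the library creates its own manager
  }

-- Reuse the caller's manager when one is supplied; otherwise create a fresh
-- TLS-capable manager.
resolveManager :: Opts -> IO Manager
resolveManager opts =
  maybe (newManager tlsManagerSettings) pure (optManager opts)

main :: IO ()
main = do
  shared <- newManager tlsManagerSettings
  _ <- resolveManager (Opts Nothing)       -- library-owned manager
  _ <- resolveManager (Opts (Just shared)) -- caller's manager is reused
  putStrLn "resolved both managers"
```

With this shape, existing callers could keep passing `Nothing`, while callers that already hold a manager would not pay for a second one.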
from scalpel.
If you could open a PR, or even fork, that would be really good for us Windows users struggling with cURL as well 😄
from scalpel.
@loopedice Good point.
Maybe in a few days I'll have time to submit a PR, but in the meantime here's my code, ported from scalpel but without the curl dependency. I believe the dependencies of this code are:
- http-client >= 0.5.5 && < 0.6
- http-client-tls >= 0.3.1 && < 0.4
- http-conduit == 2.2.*
- http-types == 0.9.*
- say
- socks >= 0.5.5 && < 0.6 (I needed support for SOCKS5 proxies but you may not)
Disclaimer: I just copied and pasted this code. I have a custom prelude so expect this not to build without some tweaks.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (Handler (..), catches)
import qualified Data.ByteString.Lazy as Bz
import Data.Default (def)
import Data.Monoid ((<>))
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Say
import Network.Connection (ProxySettings(SockSettingsSimple))
import Network.HTTP.Client
import Network.HTTP.Client.TLS (mkManagerSettings)
import Network.HTTP.Types.Header (hContentType, hUserAgent)
import Network.Socks5.Types (SocksError)

newtype Url = Url Text deriving (Eq, Show)

data FetchError = DontRetry | DoRetry deriving (Eq, Show)

-- | Create a TLS-capable manager that connects through a SOCKS proxy.
mkHttpManager :: Text -> Int -> IO Manager
mkHttpManager proxyHost_ proxyPort_ =
  newManager $
    mkManagerSettings def (Just $ SockSettingsSimple (T.unpack proxyHost_) (fromIntegral proxyPort_))

-- | Fetch a URL with the given manager and User-Agent. HTTP failures are
-- reported as 'DontRetry' and SOCKS proxy failures as 'DoRetry'.
getWith :: Manager -> Text -> Url -> IO (Either FetchError Text)
getWith manager userAgent (Url url) = do
  reqRaw <- parseUrlThrow (T.unpack url)
  let req = reqRaw{ requestHeaders = [(hUserAgent, T.encodeUtf8 userAgent)] }
  (Right . defaultDecoder <$> httpLbs req manager) `catches`
    [ Handler $ \(e :: HttpException) -> do
        say $ "Download failed for URL '" <> url <> "': " <> T.pack (show e)
        pure $ Left DontRetry
    , Handler $ \(e :: SocksError) -> do
        say $ "PROXY ERROR -> " <> T.pack (show e)
        pure $ Left DoRetry
    ]

-- | The default response decoder. This decoder attempts to infer the character
-- set of the HTTP response body from the `Content-Type` header. If this header
-- is not present, then the character set is assumed to be `ISO-8859-1`.
--defaultDecoder :: CurlResponse -> Text
defaultDecoder :: Response Bz.ByteString -> Text
defaultDecoder response = chosenDecoder (Bz.toStrict $ responseBody response)
  where
    contentType = [ T.decodeUtf8 x
                  | (header, x) <- responseHeaders response
                  , header == hContentType
                  ]

    isType :: Text -> Bool
    isType t
      | [ct] <- contentType = (T.toLower $ "charset=" <> t) `T.isInfixOf` (T.toLower ct)
      | otherwise = False

    chosenDecoder | isType "utf-8" = T.decodeUtf8
                  | otherwise = T.decodeLatin1
from scalpel.
I've come to terms with breaking the API and dropping curl :)
One thing that I'd like in a curl replacement, though, is the flexibility to configure how the connection is made. Currently we can pass in a list of CurlOptions. Does http-client-tls provide such flexibility?
Alternatively, now that we have scalpel-core, which doesn't have any HTTP support, would it make sense to have something a bit heavier like wreq in scalpel? It seems to use http-client-tls under the hood, so it should be equally friendly to Windows users while providing an easy way to configure the request.
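For reference, http-client exposes configuration at two levels: `ManagerSettings` for connection-wide behaviour and record fields on `Request` for per-request behaviour. The following is only a sketch of what that looks like. The proxy host, port, timeout, user agent string and the names `customSettings` / `customRequest` are placeholders chosen for illustration, not anything scalpel defines:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.HTTP.Types.Header (hUserAgent)

-- Manager-level settings: response timeout, per-host pool size, HTTP proxy.
-- The proxy address is a placeholder.
customSettings :: ManagerSettings
customSettings =
  managerSetProxy (useProxy (Proxy "proxy.example.com" 8080)) $
    tlsManagerSettings
      { managerResponseTimeout = responseTimeoutMicro 10000000 -- 10 seconds
      , managerConnCount = 4 -- connections to a single host kept alive
      }

-- Request-level settings: headers and the redirect limit.
customRequest :: String -> IO Request
customRequest url = do
  req <- parseRequest url
  pure req
    { requestHeaders = [(hUserAgent, "my-scraper/0.1")]
    , redirectCount = 3
    }

main :: IO ()
main = do
  _manager <- newManager customSettings
  req <- customRequest "https://example.com/"
  print (redirectCount req)
```

This does not cover everything a list of CurlOptions can express, but timeouts, proxies, headers and redirects are all reachable this way.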
from scalpel.
Something that's somewhat related:
I find myself wishing for the ability to reuse connections to the same server. At the moment, it seems like scalpel basically forces me to create a new Curl context for every single request, even if I'm sending out millions of them. The only way to avoid this would be to ditch the built-in curl support and write my own.
I would normally have suggested making it possible to pass in my own Curl handle, but if you're moving away from curl anyway, I would just like to request the ability to reuse connections in whatever new abstraction you come up with.
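With http-client, this kind of reuse comes from sharing a single `Manager`, which keeps a pool of keep-alive connections per host. A rough sketch, assuming a hypothetical `fetchWith` helper and placeholder URLs (neither is part of scalpel):

```haskell
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Client (Manager, httpLbs, newManager, parseRequest, responseBody)
import Network.HTTP.Client.TLS (tlsManagerSettings)

-- Fetch one URL through an existing manager (hypothetical helper).
fetchWith :: Manager -> String -> IO BL.ByteString
fetchWith manager url = do
  req <- parseRequest url
  responseBody <$> httpLbs req manager

main :: IO ()
main = do
  -- Create the manager once and share it across all requests, so that
  -- connections to the same host can be taken from its pool.
  manager <- newManager tlsManagerSettings
  bodies <- mapM (fetchWith manager) urls
  mapM_ (print . BL.length) bodies
  where
    -- Placeholder URLs on a single host.
    urls =
      [ "https://example.com/"
      , "https://example.com/index.html"
      ]
```

Creating a new manager per request would defeat the pool, which is why accepting a caller-supplied manager matters for this use case.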
from scalpel.
Curl has been dropped in version 0.6.0 in favor of http-client and http-client-tls.
from scalpel.