bitemyapp / bloodhound

Haskell Elasticsearch client and query DSL

Home Page: bitemyapp.com

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.65% Haskell 98.75% Shell 0.19% Nix 0.41%

bloodhound's Issues

Server connection establishment - throws multiple errors

I wrote a sample code (to connect to elasticsearch server) that runs in the main module, here - https://github.com/madhavan020985/haskell-examples/blob/master/esearch/es_hello.hs#L17.

It throws at least three errors as,
es_hello.hs:16:1: Couldn't match expected type ‘IO t1’ with actual type ‘a0 -> m0 a0’
es_hello.hs:17:18: Couldn't match expected type ‘a -> t0’ with actual type ‘Server’
es_hello.hs:17:25: Couldn't match expected type ‘Text’ with actual type ‘[Char]’

Does the Server constructor expect Text or [Char]?

I took this code straight from the README page and changed it slightly to run inside main instead of GHCi....
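The third error message suggests that Server wraps Text here, so a plain String literal needs either the OverloadedStrings extension or an explicit conversion. A minimal sketch of both options follows; the Server newtype is a stand-in defined locally (an assumption based on the error message), not an import from bloodhound, and the URL is only an example value.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in for the library's type; assumed to wrap Text,
-- as the third error message suggests.
newtype Server = Server Text deriving (Eq, Show)

-- With OverloadedStrings, the literal is typed as Text directly.
viaLiteral :: Server
viaLiteral = Server "http://localhost:9200"

-- Without the extension, convert an existing String explicitly.
viaPack :: String -> Server
viaPack = Server . T.pack
```

Either form should clear the Text versus [Char] mismatch; the other two errors look like they come from how main itself is written.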

"fields" isn't fully supported

I was just going to solve this but I ended up not needing this, so I figured I'd write it up here as a known issue and discuss the solution.

We support specifying "fields" in the search. When you do that, the response will have a fields attribute with a map from field name to value. Combining the docs and my experimentation, the fields you specify must be "leaves". If we want to be really precise with the types, the type would be something like HashMap Text ValueWithNoObjectConstructor

Update Index Settings

I've hit a case where it would be useful to have this API so this is an issue to discuss the possible implementation. The documentation is refreshingly detailed about updating index settings:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html

My first thoughts on a plan:

We can't use IndexSettings. The docs show a much richer set of options that can be tweaked than IndexSettings provides.
You will get an error from the server if you provide an empty set of updates. Somewhere in the update type it should have a NonEmpty IndexSettingUpdate.
IndexSettingUpdate would be a sum type over all the settings shown.
The type would then be something like updateIndexSettings :: MonadBH m => NonEmpty IndexSettingUpdate -> IndexName -> m Reply
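The plan above can be sketched with types from base alone. Everything named here is hypothetical: the constructors cover only two of the documented settings, and settingPairs merely renders name/value pairs rather than talking to a server.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical sum type; only two of the documented settings are shown.
data IndexSettingUpdate
  = NumberOfReplicas Int
  | RefreshInterval String
  deriving (Eq, Show)

-- Render one update as a (setting name, raw value) pair.
settingPair :: IndexSettingUpdate -> (String, String)
settingPair (NumberOfReplicas n) = ("index.number_of_replicas", show n)
settingPair (RefreshInterval i)  = ("index.refresh_interval", show i)

-- NonEmpty guarantees at least one pair, so an empty update
-- cannot be constructed, let alone sent.
settingPairs :: NonEmpty IndexSettingUpdate -> NonEmpty (String, String)
settingPairs = NE.map settingPair
```

A caller would write something like settingPairs (NumberOfReplicas 2 :| []); there is no way to pass an empty collection.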

Let me know what you think of the plan.

Incorrect mapping API is used

According to the docs, the proper URL to call when installing a mapping is /INDEX/_mapping/MAPPING_NAME. This is true for protocol versions 1.6, 1.5, 1.4, 1.3 and 1.2, which are tested for in .travis.yml. However, the URL being called by putMapping (et al.) is /INDEX/MAPPING_NAME/_mapping, which was last seen in protocol version 0.90.

This occurs for putMapping and deleteMapping.
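To make the difference concrete, here is a small sketch of the two path layouts. The function names and the String aliases are illustrative only and are not taken from the library.

```haskell
import Data.List (intercalate)

-- Plain String aliases keep the sketch free of library types.
type IndexName = String
type MappingName = String

-- Layout the issue reports putMapping currently uses (0.90 style).
legacyMappingPath :: IndexName -> MappingName -> String
legacyMappingPath ix mp = "/" ++ intercalate "/" [ix, mp, "_mapping"]

-- Layout the 1.x docs describe.
mappingPath :: IndexName -> MappingName -> String
mappingPath ix mp = "/" ++ intercalate "/" [ix, "_mapping", mp]
```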

Use Text instead of String in DocId, Server, IndexName, MappingName, etc.

Use of the String type is discouraged in favor of Text (unicode) in libraries and apps meant for production. It is particularly relevant in database libraries, as many use cases will manipulate millions of items per day, hitting problems with String in essentially a "tight loop".

Proper support for ElasticSearch 2.x

The newer versions of ElasticSearch have improved a lot over older versions.

Is there a plan to provide support for newer versions of ElasticSearch while focusing on aggregations?

Validate bulk id empty string behavior

Either don't provide the column at all (Rely on ES auto-generation) or let the user provide the document ID.

Add support for highlighting

A quick overview of highlighting: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html

It would be nice if bloodhound supported this. It looks like it would be necessary to alter ESResult, and Search, as well as having some new function to create the appropriate Search.

Implement QueryRangeQuery

post-Aeson disaster release

$ stack test bloodhound:doctests
bloodhound-0.10.0.0: unregistering (local file changes: src/Database/Bloodhound/Types.hs)
bloodhound-0.10.0.0: build (lib + test)
Preprocessing library bloodhound-0.10.0.0...
[3 of 5] Compiling Database.Bloodhound.Types ( src/Database/Bloodhound/Types.hs, .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/Database/Bloodhound/Types.o )
In-place registering bloodhound-0.10.0.0...
Preprocessing test suite 'doctests' for bloodhound-0.10.0.0...
bloodhound-0.10.0.0: copy/register
Installing library in
/home/callen/work/bloodhound/.stack-work/install/x86_64-linux/lts-5.1/7.10.3/lib/x86_64-linux-ghc-7.10.3/bloodhound-0.10.0.0-5u94YzqKDWx4wX2dRohqIF
Registering bloodhound-0.10.0.0...
bloodhound-0.10.0.0: test (suite: doctests)


src/Database/Bloodhound/Types.hs:2099:34:
    No instance for (ToJSON (NonEmpty Text)) arising from a use of ‘.=’
    In the expression: fieldName .= terms
    In the expression: [fieldName .= terms]
    In an equation for ‘conjoined’: conjoined = [fieldName .= terms]

src/Database/Bloodhound/Types.hs:2266:32:
    No instance for (ToJSON (NonEmpty SimpleQueryFlag))
      arising from a use of ‘.=’
    In the expression: "flags" .= simpleQueryStringFlags
    In the expression:
      ["fields" .= simpleQueryStringField,
       "default_operator" .= simpleQueryStringOperator,
       "analyzer" .= simpleQueryStringAnalyzer,
       "flags" .= simpleQueryStringFlags, ....]
    In an equation for ‘maybeAdd’:
        maybeAdd
          = ["fields" .= simpleQueryStringField,
             "default_operator" .= simpleQueryStringOperator,
             "analyzer" .= simpleQueryStringAnalyzer, ....]

src/Database/Bloodhound/Types.hs:2286:5:
    No instance for (ToJSON (NonEmpty FieldName))
      arising from a use of ‘toJSON’
    In the expression: toJSON fieldNames
    In an equation for ‘toJSON’:
        toJSON (FofFields fieldNames) = toJSON fieldNames
    In the instance declaration for ‘ToJSON FieldOrFields’

src/Database/Bloodhound/Types.hs:2470:33:
    No instance for (ToJSON (NonEmpty StopWord))
      arising from a use of ‘.=’
    In the expression: "stop_words" .= stopwords
    In the expression:
      ["like_text" .= text, "percent_terms_to_match" .= percent,
       "min_term_freq" .= mtf, "max_query_terms" .= mqt, ....]
    In an equation for ‘base’:
        base
          = ["like_text" .= text, "percent_terms_to_match" .= percent,
             "min_term_freq" .= mtf, ....]

src/Database/Bloodhound/Types.hs:2489:27:
    No instance for (FromJSON (NonEmpty StopWord))
      arising from a use of ‘.:?’
    In the second argument of ‘(<*>)’, namely ‘o .:? "stop_words"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisFieldQuery <$> o .: "like_text" <*> pure fn
       <*> o .:? "percent_terms_to_match"
       <*> o .:? "min_term_freq"
       <*> o .:? "max_query_terms"
       <*> o .:? "stop_words"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisFieldQuery <$> o .: "like_text" <*> pure fn
       <*> o .:? "percent_terms_to_match"
       <*> o .:? "min_term_freq"
       <*> o .:? "max_query_terms"
       <*> o .:? "stop_words"
       <*> o .:? "min_doc_freq"’

src/Database/Bloodhound/Types.hs:2505:29:
    No instance for (ToJSON (NonEmpty FieldName))
      arising from a use of ‘.=’
    In the expression: "fields" .= fields
    In the expression:
      ["like_text" .= text, "fields" .= fields,
       "percent_terms_to_match" .= percent, "min_term_freq" .= mtf, ....]
    In an equation for ‘base’:
        base
          = ["like_text" .= text, "fields" .= fields,
             "percent_terms_to_match" .= percent, ....]

src/Database/Bloodhound/Types.hs:2509:33:
    No instance for (ToJSON (NonEmpty StopWord))
      arising from a use of ‘.=’
    In the expression: "stop_words" .= stopwords
    In the expression:
      ["like_text" .= text, "fields" .= fields,
       "percent_terms_to_match" .= percent, "min_term_freq" .= mtf, ....]
    In an equation for ‘base’:
        base
          = ["like_text" .= text, "fields" .= fields,
             "percent_terms_to_match" .= percent, ....]

src/Database/Bloodhound/Types.hs:2523:27:
    No instance for (FromJSON (NonEmpty FieldName))
      arising from a use of ‘.:?’
    In the second argument of ‘(<*>)’, namely ‘o .:? "fields"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisQuery <$> o .: "like_text" <*> o .:? "fields"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisQuery <$> o .: "like_text" <*> o .:? "fields"
       <*> o .:? "percent_terms_to_match"’

src/Database/Bloodhound/Types.hs:2528:27:
    No instance for (FromJSON (NonEmpty StopWord))
      arising from a use of ‘.:?’
    In the second argument of ‘(<*>)’, namely ‘o .:? "stop_words"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisQuery <$> o .: "like_text" <*> o .:? "fields"
       <*> o .:? "percent_terms_to_match"
       <*> o .:? "min_term_freq"
       <*> o .:? "max_query_terms"
       <*> o .:? "stop_words"’
    In the first argument of ‘(<*>)’, namely
      ‘MoreLikeThisQuery <$> o .: "like_text" <*> o .:? "fields"
       <*> o .:? "percent_terms_to_match"
       <*> o .:? "min_term_freq"
       <*> o .:? "max_query_terms"
       <*> o .:? "stop_words"
       <*> o .:? "min_doc_freq"’

/cc @MichaelXavier @MHova @crough this is where I'm at on things. Not sure why doctests is fucking me over, build and unit tests pass fine rn.

Finished in 21.2762 seconds
171 examples, 0 failures

If anyone has ideas, please pipe up. I don't know why doctests is making this unpleasant.

Build failures due to type mismatch

Probably something misconfigured on my end, but I'm getting this when running make:

» make
cabal test
Building bloodhound-0.1.0.1...
Preprocessing library bloodhound-0.1.0.1...
[2 of 5] Compiling Database.Bloodhound.Types ( Database/Bloodhound/Types.hs, dist/build/Database/Bloodhound/Types.o )

Database/Bloodhound/Types.hs:780:33:
    Couldn't match expected type `text-1.1.0.0:Data.Text.Internal.Text'
                with actual type `Text'
    In the first argument of `(.=)', namely `field'
    In the expression: field .= toJSON value
    In the expression: [field .= toJSON value]

Database/Bloodhound/Types.hs:780:33:
    Couldn't match type `text-1.1.0.0:Data.Text.Internal.Text'
                  with `Text'
    Expected type: (Text, Value)
      Actual type: aeson-0.7.0.1:Data.Aeson.Types.Internal.Pair
    In the expression: field .= toJSON value
    In the expression: [field .= toJSON value]
    In an equation for `maybeJson':
        maybeJson field (Just value) = [field .= toJSON value]

Database/Bloodhound/Types.hs:785:34:
    Couldn't match expected type `text-1.1.0.0:Data.Text.Internal.Text'
                with actual type `Text'
    In the first argument of `(.=)', namely `field'
    In the expression: field .= fmap toJSON value
    In the expression: [field .= fmap toJSON value]

Database/Bloodhound/Types.hs:785:34:
    Couldn't match type `text-1.1.0.0:Data.Text.Internal.Text'
                  with `Text'
    Expected type: (Text, Value)
      Actual type: aeson-0.7.0.1:Data.Aeson.Types.Internal.Pair
    In the expression: field .= fmap toJSON value
    In the expression: [field .= fmap toJSON value]
    In an equation for `maybeJsonF':
        maybeJsonF field (Just value) = [field .= fmap toJSON value]
Makefile:3: recipe for target 'test' failed
make: *** [test] Error 1

All requests should take Manager as an argument instead of creating a fresh one each time

Creating a fresh manager each time is not appropriate in production. Sharing a single Manager, supplied by the API user, for all operations:

Gives the user control over the various arguments in creating Manager (TLS settings, pooling, etc.)
Reuses connections to same server, avoiding lots of overhead
Limits number of concurrent queries inherently to the total count of connections allowed
Preserves/limits FDs used - too many concurrent requests can deplete them otherwise

put function sends POST instead of PUT

bloodhound/src/Database/Bloodhound/Client.hs

Line 203 in 1db8ad7

put = dispatch NHTM.methodPost

Alternative library now available

Could I interest you in http://github.com/bitemyapp/bloodhound/ ?

Let me know if there's any functionality missing or design you think could be improved :)

Expand querying test coverage

Score is nullable

It turns out that at least in maxScore and hitScore in a result set, score can be null. Right now you represent that as a type alias for Double, but when the library parses it, it parses to NaN. I'm guessing the docs aren't that clear on the nullability so it may be appropriate to just universally update the type of Score.

Add ability to install index templates

Index templates are described here, but I don't think there is a way to install them yet.

MonadMask / bracket

Should BH have a MonadMask instance? What's the right thing to do here?

/cc @MichaelXavier

Documentation on advanceScroll is wrong

According to the latest ElasticSearch docs, the terminating condition for scrolling is an empty hits array, not the lack of a _scroll_id

Each call to the scroll API returns the next batch of results until there are no more results left to return, ie the hits array is empty.

In fact, the previous _scroll_id is included in the response so if one relies on the scrollId field being Nothing as the terminating condition, one will get into an infinite loop like I did!

The documentation for advanceScroll should be corrected.
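A loop that terminates on an empty page rather than on a missing scroll id might look like the sketch below. The next argument is a hypothetical stand-in for a scroll request, and fakeNext is a fake backend used only to exercise the loop; neither is part of bloodhound.

```haskell
import Data.Functor.Identity (Identity (..))

-- Keep requesting pages until one comes back with no hits.
-- 'next' takes the current scroll id and returns the next scroll id
-- together with a page of hits.
scrollAll :: Monad m => (sid -> m (sid, [hit])) -> sid -> m [hit]
scrollAll next = go
  where
    go sid = do
      (sid', hits) <- next sid
      if null hits
        then pure []                 -- empty page: scrolling is done
        else (hits ++) <$> go sid'   -- a scroll id is still returned here

-- Fake backend: two pages of hits, then empty pages forever. The scroll
-- id never disappears, mirroring the behaviour described in the issue.
fakeNext :: Int -> Identity (Int, [String])
fakeNext n = Identity (n + 1, page)
  where
    page = case n of
      0 -> ["a", "b"]
      1 -> ["c"]
      _ -> []

allHits :: [String]
allHits = runIdentity (scrollAll fakeNext 0)
```

Checking the scroll id instead of the hits would never stop against fakeNext, which is the infinite loop reported here.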

Find out what breaks on what Elasticsearch versions

I'm mostly testing with 1.0, have tested with 0.9X briefly, think one thing didn't work. Would like to see how far back I can go before things start breaking.

TermsQuery is buggy and needs changed

Change to: (NonEmpty Terms) MinimumShouldMatch and don't set tags field.

doco appears to be wrong for advanceScrollId

afaict, a scrollId will always be returned (at least in elasticsearch 2.1.1) - signalling appears to be done by having an empty list of hits.

Support for other search types

Firstly - awesome work, thanks. We're starting to use your package in code moving towards production, and it's made a lot of my SQL -> Elastic migration much, much easier.

I was wondering if we'd be able to open up other potential search-types; I specifically am looking for scan. I'm happy to do a PR; if that's ok with you, do you have an approach you'd prefer?

My first (very rough) thought is to add a new ADT as a field in the Search type, which would be used to trigger some new logic in dispatchSearch to add some query parameters. This way it wouldn't be a breaking change for any of the search functions, or any smart constructors (once I updated them, obviously). Since you have a manual toJSON, it can be safely ignored for the body.

Thoughts?

Thanks!
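One possible shape for the ADT idea, as a rough sketch: the type and function names are hypothetical, only four search types are listed, and the rendered strings are the search_type values documented for Elasticsearch 1.x.

```haskell
-- Hypothetical ADT that could become a field of the Search type.
data SearchType
  = QueryThenFetch
  | DfsQueryThenFetch
  | Count
  | Scan
  deriving (Eq, Show)

-- Value sent in the search_type query parameter.
renderSearchType :: SearchType -> String
renderSearchType QueryThenFetch    = "query_then_fetch"
renderSearchType DfsQueryThenFetch = "dfs_query_then_fetch"
renderSearchType Count             = "count"
renderSearchType Scan              = "scan"

-- Query parameters to add to the request URL. The server default is
-- query_then_fetch, so that case adds nothing and existing callers
-- keep producing the same URL.
searchTypeParams :: SearchType -> [(String, String)]
searchTypeParams QueryThenFetch = []
searchTypeParams st             = [("search_type", renderSearchType st)]
```

Because the parameter lives in the URL rather than the body, a hand-written toJSON for Search can ignore the new field.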

dangling link in docs

http://hackage.haskell.org/package/bloodhound-0.7.0.1/docs/Network-HTTP-Types-Method.html#t:Method in http://hackage.haskell.org/package/bloodhound-0.7.0.1/docs/Database-Bloodhound-Types.html#t:Reply is a dead link - might just be a hackage hiccup?

Aeson 0.11 migration

I think this has the same behavior as 0.9, but this is where they're headed long-term.

Options:

Roll back to 0.9 since the semantics are the same and get in LTS 5. Upgrade to 0.11 with the next LTS.
Update to 0.11 now and then bug the Stackage people into doing LTS 6 with Aeson 0.11.

Opinions? Thoughts?

/cc @MichaelXavier @MHova

Discussion: Dealing with ES 2.0

Just thought I'd start an issue for this since the topic has come up a
few times. I personally don't have an immediate need for 2.0 since I
don't run it but I think it will start being a default choice for
self-hosted solutions since ES seem to be burying 1.x documentation.

One way to do it would be to split the module tree and share types and
APIs where possible by moving them to a common internal module. Some
of the APIs seem to be the same according to @MHova's experiences. To
avoid breaking everything the default Database.Bloodhound module
could alias to Database.Bloodhound.V1.

Another alternative is to keep one module and have the BHEnv know
what version of ES it's targeting. Maybe it would be a GADT or phantom
type or something. This could also probably apply to MonadBH, such
that shared operations would run in MonadBH v m and v2-only
operations in MonadBH V2 m.
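The phantom-type variant could be sketched roughly as follows. BHEnv here is a simplified stand-in holding only a server URL, and the "/_v2_only" path is a placeholder rather than a real endpoint.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- Promoted to the kind level by DataKinds.
data ESVersion = V1 | V2

-- Hypothetical environment tagged with the server version it targets.
-- The tag exists only at the type level.
newtype BHEnv (v :: ESVersion) = BHEnv { envServer :: String }

-- Shared operation: accepts an environment for any version.
sharedOp :: BHEnv v -> String
sharedOp env = envServer env ++ "/_search"

-- Version-specific operation: only type-checks with a V2 environment.
v2OnlyOp :: BHEnv 'V2 -> String
v2OnlyOp env = envServer env ++ "/_v2_only"
```

Passing a BHEnv 'V1 to v2OnlyOp is rejected at compile time, which is the property the paragraph above is after.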

I think it will end up being a massive amount of work though so it
will probably get done mostly when some enterprise user needs it and
can invest the time. I think serialization will be especially bad
since JSON instances will likely vary between
versions. ToJSON/FromJSON being globally scoped and all makes it
hard to adapt to one version or another.

Lastly, I wonder if a good first step would be to create a branch and
run the test suite with V2 installed. That will give us a good
progress report. If we go down the split module tree route, we'll
probably need to break up the test suite accordingly (which is fine by
me since one large monadic operation puts SHM in a chokehold).

Thoughts?

Add support for reverse_nested aggregations

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html

I'm new to ElasticSearch so I'm just starting to learn things by using the Bloodhound types for guidance. I got a use case where apparently I need to use reverse nested aggregation but I notice it doesn't seem to be implemented in Bloodhound so I'm a bit lost about how it fits together with other search terms.

MaxScore Monoid

https://hackage.haskell.org/package/reactive-0.11.5/docs/Data-Max.html

/cc @MichaelXavier

Sounded kosher, prima facie, to me. I'm willing to wait on a release for this if you'd like this.
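For comparison, base ships a similar wrapper in Data.Semigroup, which is a different module from the linked package. A minimal sketch using it, assuming scores are represented as Maybe Double:

```haskell
import Data.Semigroup (Max (..))

-- Assumed representation: a score that may be absent.
type Score = Maybe Double

-- 'Maybe (Max Double)' is a Monoid: Nothing is the identity and two
-- present scores combine by taking the larger one.
maxScore :: [Score] -> Score
maxScore = fmap getMax . foldMap (fmap Max)
```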

Bump versions on semigroups and http-types in 0.8

Any chance you could bump the versions on semigroups and http-types for bloodhound-0.8.0.0 in hackage? Because of the aeson-0.10 mess we've decided not to upgrade aeson yet. This can be done with a simple package metadata edit on hackage, so I decided not to bring this up with a pull request. I can verify that bloodhound-0.8.0.0 does indeed build successfully with semigroups-0.18.0.1 and http-types-0.9.

sniff support plan

Does bloodhound support cluster sniffing? If not, is there any plan for cluster sniffing? This is quite an important feature since my company's ES cluster may add nodes in the future.

Optimistic Concurrency Control on Writes

https://www.elastic.co/guide/en/elasticsearch/guide/current/optimistic-concurrency-control.html

It turns out ES has support for sending the version number of a retrieved record when you're doing a write as a concurrency control mechanism. It looks like this feature came around in 0.15. There's a few issues to hammer out. I'll post them here. I'll probably take a first stab on our fork since we need this feature sooner rather than later, but I'll happily update the solution if there's a more appropriate one for bloodhound proper.

Version Types

There seem to be 4 types of concurrency directives when indexing a document:

internal: You're using the versioning scheme ES establishes.
external/external_gt: Docs imply these are the same; you provide
your own version as a non-negative long number and the write
succeeds if the version has incremented or there's no doc. This
means that every single write must increment this number, which is
different than internal which expects the version value to stay the
same and will auto-increment if it's not provided.
external_gte: This is pretty much internal with your own version
numbers.
force: No conflict checking is done. Take the value supplied and
use that as the new current version.

So we'd probably have a sum type VersionType that covers these cases

I definitely want to create a newtype Version with a smart
constructor and Bounded instances to bake in this positive number
invariant.

I am also considering adding a newtype over Version named ExternalVersion
and using that type in the external constructors of VersionType,
that way in user code you need to explicitly construct an external
version. When you get a result back from the server, it will be a
plain Version, since ES won't really tell us the disposition of a version.
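The types described in this section might be sketched as below. All names are hypothetical, and the lower bound of 1 is an assumption: the text above mentions both "non-negative" and "positive", so the exact bound would need checking against the ES docs.

```haskell
-- Hypothetical names throughout; not the library's API.
newtype Version = Version Int deriving (Eq, Ord, Show)

-- Assumption: valid versions start at 1.
instance Bounded Version where
  minBound = Version 1
  maxBound = Version maxBound

-- Smart constructor enforcing the invariant.
mkVersion :: Int -> Maybe Version
mkVersion n
  | n >= 1    = Just (Version n)
  | otherwise = Nothing

-- Wrapper that user code must build explicitly for external versioning.
newtype ExternalVersion = ExternalVersion Version deriving (Eq, Show)

-- external and external_gt share one constructor, since the docs
-- imply they behave the same.
data VersionType
  = Internal (Maybe Version)
  | ExternalGT ExternalVersion
  | ExternalGTE ExternalVersion
  | Force ExternalVersion
  deriving (Eq, Show)

-- Value for the version_type query parameter.
versionTypeParam :: VersionType -> String
versionTypeParam (Internal _)    = "internal"
versionTypeParam (ExternalGT _)  = "external_gt"
versionTypeParam (ExternalGTE _) = "external_gte"
versionTypeParam (Force _)       = "force"
```

In a real module the Version constructor would stay unexported so that mkVersion is the only way in.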

New function or modify existing

I started leaning towards adding an indexDocumentVersioned but I
think this is a mistake because there are other knobs on indexing
(e.g. operation type)
that we'll probably need to support some day.

Instead I advocate for adding an IndexDocumentSettings type and a
defaultIndexDocumentSettings value. Version type would be mandatory
and default to internal. Supplying a version will be optional. This
will let us only have to rip the band-aid of changing function arity
once.

Handling conflicts

This part is still murky to me. I'm not actually sure what the driver
does right now when you get an error. It looks like you get a
VersionConflictEngineException error if there's a versioning
issue. The user is at the very least going to need documentation of
how to catch this case, as they'll likely want to re-read and retry.

Don't use ESProtocolException for user-caused JSON parsing errors

The docs around ESProtocolException direct the user to write a bug in the issue tracker if this error is ever encountered because it is indicative of an incompatibility between Bloodhound and ElasticSearch.

Well, that's not always the case. Speaking purely hypothetically of course, if somebody were to screw up their own JSON domain object mapping causing SearchResult.searchHits.hits to not parse, an ESProtocolException will be thrown despite the fact that the Bloodhound library is fine.

Would it be a good idea (or even possible) to throw a different kind of Exception in this case and reserve ESProtocolException for things that are truly wrong with the library?
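A rough sketch of what the split could look like, using only Control.Exception from base. UserDecodeException, DecodeStage and both helper functions are hypothetical names, and ESProtocolException is redefined locally as a stand-in for the library's type.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Stand-in for the existing exception: reserved for genuine
-- library/server incompatibilities.
newtype ESProtocolException = ESProtocolException String deriving Show
instance Exception ESProtocolException

-- Hypothetical: raised when the caller's own document decoder fails.
newtype UserDecodeException = UserDecodeException String deriving Show
instance Exception UserDecodeException

-- Which part of the response failed to decode.
data DecodeStage = Envelope | UserDocument

-- Pick the exception type according to whose decoder failed.
failDecode :: DecodeStage -> String -> IO a
failDecode Envelope     msg = throwIO (ESProtocolException msg)
failDecode UserDocument msg = throwIO (UserDecodeException msg)

-- True when the action failed because of the caller's decoder.
-- Any other exception propagates unchanged.
failedInUserCode :: IO a -> IO Bool
failedInUserCode act = do
  r <- try act
  pure $ case r of
    Left (UserDecodeException _) -> True
    Right _                      -> False
```

Whether the library can tell the two stages apart depends on how the response parser is structured, which this sketch does not address.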

FromJSON

Hello,

Under what circumstances would one want to use FromJSON instance for Queries?

Impossible to get child documents

As it stands, it is impossible for a user to retrieve/test for the existence of individual child documents, since the parent ID must be specified (?parent=...). This is needed so that ES can route the request to the proper shard. How should the API be altered to accommodate this?

Lift Elasticsearch errors into something more explicit

@MichaelXavier I was hesitant to lift errors into a concrete datatype because they didn't document anything and I'd have to figure out how to reproduce everything for testing. Means having an "UnknownError" case is virtually unavoidable.

Seems a good idea to me.

Support Missing Aggregation

Add support for the "Missing Aggregation", which has been available in ElasticSearch since 1.3.

Some errors in README

Some lines in the README are messed up like right around this heading. I think it's due to this commit? 5ccffd3

Change range values to be pairings of Double or date ISO format string

Currently just Double.

Eliminate partiality in ESResult

The ESResult type is described as the structure for when a query is successful. However, in the case where found would be False (i.e. an ID lookup that doesn't exist), only the _index, _type, _id, and found fields are present. As a practical matter, to parse the found=false case without partiality, you'd have to parse it into a raw Value, assert it's an Object and then look up the found field and do something different if it's false, because your JSON parse will fail if it is.

I'm not sure if this is the best solution but I propose ESResult to have a field foundResult :: Maybe (ESFoundResult a), which would take up the fields _version and _source. You could even provide a backwards compatibility function found :: ESResult -> Bool; found = isJust . foundResult. This idea only holds together if there aren't further wacky permutations of ESResult that ES may send back, but I haven't encountered such cases yet.
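The proposal can be sketched as plain data types. The field names other than foundResult are made up for illustration, and found takes the document type as a parameter here.

```haskell
import Data.Maybe (isJust)

-- Fields that exist only when the document was found.
data ESFoundResult a = ESFoundResult
  { resultVersion :: Int
  , resultSource  :: a
  } deriving (Eq, Show)

-- Fields that are always present, plus the optional found part.
data ESResult a = ESResult
  { resultIndex :: String
  , resultType  :: String
  , resultId    :: String
  , foundResult :: Maybe (ESFoundResult a)
  } deriving (Eq, Show)

-- Backwards-compatible accessor.
found :: ESResult a -> Bool
found = isJust . foundResult
```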

Support more update options

Currently updateDocument only uses "doc" option to partially replace document.
But ElasticSearch also supports these:

"script";
"upsert";
"script_upsert";
"detect_noop";
"doc_as_upsert".

TermsResult cannot parse from numeric or boolean JSON

Specifically because termKey is declared as type Text, non-String bucket terms like numbers and booleans will fail parsing.

We could either keep termKey as type Text and put some more smarts in parseJSON or introduce more data constructors (TextTermsResult, NumericTermsResult, etc) and also put some more smarts in parseJSON.

Expand querying functionality coverage

Requirements for next release

Just wanted to check in and see if we were waiting on any more issues in preparation of the next release. If there are any, I'd like to get a definitive list so we can get those knocked out and ship. Thanks!

Sub-aggregation parsing doesn't work

Sub-aggregations aren't tagged by an "aggregations" key unlike top-level aggregations, which the FromJSON instances for TermsResult and DateHistogramResult rely on. Instead, the name of the sub-aggregation is nested directly in the record for the parent aggregation.

For an example, see the documentation of the significant terms aggregation.

Add support for index aliases

I'm knee deep in this one right now so this is a placeholder. This will add support for creating/updating index aliases as well as reading them back.

One important thing to note about adding these PUT/POST APIs, and then the corresponding GET for testing and general use, is that bloodhound may no longer get to punt on parsing the response. Most ops now are updates so they just return m Reply. For an API like say getIndexAliases whose sole purpose is to return a well-typed response, it seems unsatisfying to return an m Reply and hint in the docs that you should try parsing it as the right type.

The temporary approach I'm taking while working on this is to return m (Either EsError TheThingYouExpect). It will try to parse the thing first and failing that, will parse an EsError. AFAIK all ElasticSearch errors conform to the EsError format, so misses on this should be rare. If it can parse neither, it throws a StatusCodeException from network client.
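The parse order described above can be sketched independently of any JSON library. Here the two parsers are passed in as plain functions, EsError is a simplified stand-in, and UnparseableReply is a hypothetical exception used in place of the HTTP client's StatusCodeException.

```haskell
import Control.Exception (Exception, throwIO)
import Data.List (stripPrefix)
import Text.Read (readMaybe)

-- Simplified stand-in for the library's error type.
newtype EsError = EsError String deriving (Eq, Show)

-- Hypothetical exception for bodies that match neither format.
newtype UnparseableReply = UnparseableReply String deriving Show
instance Exception UnparseableReply

-- Try the expected type first, then the error format, else throw.
parseEsResponse
  :: (String -> Maybe a)
  -> (String -> Maybe EsError)
  -> String
  -> IO (Either EsError a)
parseEsResponse parseOk parseErr body =
  case parseOk body of
    Just a  -> pure (Right a)
    Nothing -> case parseErr body of
      Just e  -> pure (Left e)
      Nothing -> throwIO (UnparseableReply body)

-- Toy parsers, only to exercise the function.
demoOk :: String -> Maybe Int
demoOk = readMaybe

demoErr :: String -> Maybe EsError
demoErr = fmap EsError . stripPrefix "error:"
```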

So perhaps mull over this approach and see what you think about it. It will also be relevant to that index settings ticket in #76. We'll probably use some subset of the index aliases feature in a Real App before I submit the PR.

Support generating HTTP requests without sending

Functions such as bulk both serialize the arguments into raw HTTP requests and send them. It would be nice to be able to generate requests without sending (that is, to hook into the function and intercept requests). My immediate use case is related to performance: I'm able to fully evaluate some HTTP requests while waiting for others to complete. Another use case is for logging or debugging.

Restore bloodhound to stackage

Please release a version of bloodhound that simultaneously supports aeson-0.9.0.1, http-types-0.9, and semigroups-0.18.0.1 in order to be restored to Stackage. Even if lts-5.0 is released without bloodhound, we can get it into lts-5.1 or later, as soon as a compatible version is released.

See: commercialhaskell/stackage#1155

Index new documents using ElasticSearch POST API

indexDocument currently requires that the user provide a valid DocId. However, ElasticSearch provides a facility to auto-generate document IDs using the POST request described on this page.

It would be nice if indexDocument perhaps took a Maybe DocId. However, being pretty new to Haskell, I haven't quite figured out how to get that to work. In the meantime I just added indexDocument' to my code (really hacky, but works for the time being).

indexDocument' :: (ToJSON doc, MonadBH m) => IndexName -> MappingName
                 -> IndexDocumentSettings -> doc -> m Reply
indexDocument' (IndexName indexName)
  (MappingName mappingName) cfg document =
  bindM2 post url (return body)
  where url = addQuery params <$> joinPath [indexName, mappingName]
        versionCtlParams = case idsVersionControl cfg of
          NoVersionControl -> []
          InternalVersion v -> versionParams v "internal"
          ExternalGT (ExternalDocVersion v) -> versionParams v "external_gt"
          ExternalGTE (ExternalDocVersion v) -> versionParams v "external_gte"
          ForceVersion (ExternalDocVersion v) -> versionParams v "force"
        vt = T.pack . show . docVersionNumber
        versionParams v t = [ ("version", Just $ vt v)
                            , ("version_type", Just t)
                            ]
        parentParams = [] -- case idsParent cfg of
          --Nothing -> []
          --Just (DocumentParent (DocId p)) -> [ ("parent", Just p) ]
        params = versionCtlParams ++ parentParams
        body = Just (encode document)

Obviously there are much better ways to do this. I apologize for posting a request without a reasonable possible solution.
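The Maybe DocId idea boils down to choosing the HTTP method and path from whether an ID was supplied. A minimal sketch of just that choice, with a local Method type and plain Strings standing in for the library's types:

```haskell
import Data.List (intercalate)

-- Local stand-in for the HTTP method type.
data Method = POST | PUT deriving (Eq, Show)

-- With an ID: PUT to /index/mapping/id.
-- Without one: POST to /index/mapping and let ES generate the ID.
indexRequest :: String -> String -> Maybe String -> (Method, String)
indexRequest ix mp mDocId = case mDocId of
  Nothing -> (POST, path [ix, mp])
  Just d  -> (PUT, path [ix, mp, d])
  where
    path segs = "/" ++ intercalate "/" segs
```

Version-control and parent parameters would be appended to the path as query parameters, as in the code above.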

Thanks for your work on this library!

BulkIndex claims to upsert but doesn't

src/Database/Bloodhound/Types.hs:

{-| 'BulkOperation' is a sum type for expressing the four kinds of bulk
    operation index, create, delete, and update. 'BulkIndex' behaves like an
    "upsert", 'BulkCreate' will fail if a document already exists at the DocId.

Nope, BulkIndex behaves like an "index".

Provide safety by using URL-encoding

Thank you for publishing this nice library.

Wrapped in its own type, I expected DocId to safely handle any input. However, indexDocument throws an InvalidUrlException at me because I've got slashes and spaces in there. Would you please add automatic URL encoding?

If doing this by default really hampers performance, could we at least expose encoding functionality as a courtesy to users?
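As an illustration of what such encoding involves, here is a self-contained percent-encoder for a single path segment, written against base only. It is a sketch, not the library's code: it treats the input as a String, encodes it as UTF-8, and leaves only the RFC 3986 unreserved characters untouched.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- UTF-8 bytes of one character.
utf8 :: Char -> [Int]
utf8 c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F)   -- continuation byte

-- Percent-encode everything except unreserved characters,
-- so slashes and spaces inside an ID cannot break the URL.
encodeSegment :: String -> String
encodeSegment = concatMap enc
  where
    enc c
      | isUnreserved c = [c]
      | otherwise      = concatMap pct (utf8 c)
    isUnreserved c = isAscii c && (isAlphaNum c || c `elem` "-._~")
    pct b = '%' : pad (map toUpper (showHex b ""))
    pad [x] = ['0', x]
    pad xs  = xs
```

In practice a library function such as urlEncode from http-types would likely be the better choice over a hand-rolled encoder.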