As mentioned in #73, execution of `SearchService.search`
currently incurs a runtime error (formatted for readability):
```json
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[itemStatus] query malformed, no start_object after query name",
        "line": 1,
        "col": 64
      }
    ],
    "type": "parsing_exception",
    "reason": "[itemStatus] query malformed, no start_object after query name",
    "line": 1,
    "col": 64
  },
  "status": 400
}
```
The underlying issue is that the query we send to Elasticsearch is malformed.
## The broken Elasticsearch query
Here's a full example of a query that might be generated with the current implementation:
```json
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must_not": {
        "itemStatus": "Created"
      },
      "must": [
        {
          "multi_match": {
            "query": "",
            "fields": [
              "title",
              "description"
            ]
          }
        },
        {
          "price": {
            "lte": 0
          }
        },
        {
          "match": {
            "currencyId": "USD"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "auctionEnd": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    },
    {
      "price": "asc"
    }
  ]
}
```
There are two problem areas with this query as it's currently formed: one within the `must` clause, and the other within the `must_not` clause.
## The broken `must` clause

Here's the `must` clause in isolation:
```json
{
  "must": [
    {
      "price": {
        "lte": 0
      }
    }
  ]
}
```
To fix this up, we need to inject a `range` clause to make the price filter valid:
```json
{
  "must": [
    {
      "range": {
        "price": {
          "gte": 0
        }
      }
    }
  ]
}
```
## The broken `must_not` clause

Here's the `must_not` clause in isolation:
```json
{
  "must_not": {
    "itemStatus": "Created"
  }
}
```
To fix this, we need to inject a `match` clause, and also change the field name from `itemStatus` to `status`:
```json
{
  "must_not": {
    "match": {
      "status": "Created"
    }
  }
}
```
## Solution Proposal
While the changes we need to see in the JSON sent to Elasticsearch are relatively straightforward, implementing those changes surfaces a few questions. Most of these questions center on our implementation of the AST models that build out the query payload.
In the current implementation, we build out an internal representation of the query using a set of models that roughly mirror the AST of the Elasticsearch DSL. Having spent a small amount of time reading the Elasticsearch documentation, it appears that a full implementation of this AST / DSL would be quite an involved effort, and one that we would likely consider beyond the scope of this project's core intentions. Entire projects have been devoted to this effort (see https://github.com/sksamuel/elastic4s for an example).
## Using A Domain-Specific Query Model
As opposed to creating a data object that mirrors the Elasticsearch AST and using the JSON serialization of those objects to build our JSON payload, suppose our model for the query more closely matched the particular domain we're dealing with here, and we used Elasticsearch-specific JSON serialization to define how the actual query payload is created. Looking at the `SearchService.search` method, we find that the following case class is the argument to that method:
```scala
case class SearchRequest(keywords: Option[String], maxPrice: Option[Int], currency: Option[String])
```
If we take the approach of modeling a query object after our domain (as opposed to after the Elasticsearch query DSL’s AST), we might end up with a query model that looks something like:
```scala
case class ItemQuery(titleDescriptionQueryString: Option[String],
                     minPrice: Option[Int],
                     currencyId: Option[String],
                     pageNumber: Int,
                     pageSize: Int)
```
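For context, translating the service-level request into this model could be a small pure function. The sketch below is an assumption about how that wiring might look: the `fromSearchRequest` helper and its paging defaults are hypothetical, and the two case classes are repeated so the snippet stands on its own.

```scala
// Hypothetical glue code: fromSearchRequest and its paging defaults do not
// exist in the current codebase; they illustrate one way the service request
// could be translated into the domain query model.
case class SearchRequest(keywords: Option[String], maxPrice: Option[Int], currency: Option[String])

case class ItemQuery(titleDescriptionQueryString: Option[String],
                     minPrice: Option[Int],
                     currencyId: Option[String],
                     pageNumber: Int,
                     pageSize: Int)

def fromSearchRequest(request: SearchRequest, pageNumber: Int = 0, pageSize: Int = 100): ItemQuery =
  ItemQuery(
    // An empty keyword string would reproduce the empty multi_match query
    // shown earlier, so empty strings are filtered out here.
    titleDescriptionQueryString = request.keywords.filter(_.nonEmpty),
    // Carries the price bound across the min/max naming mismatch as it
    // currently stands between the two models.
    minPrice = request.maxPrice,
    currencyId = request.currency,
    pageNumber = pageNumber,
    pageSize = pageSize)
```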
JSON serialization for this class could be implemented as follows:
```scala
import play.api.libs.json._

object ItemQuery {
  implicit val format: Format[ItemQuery] = new Format[ItemQuery] {
    // We only ever serialize queries in this use case, so reads is
    // intentionally unimplemented.
    override def reads(json: JsValue): JsResult[ItemQuery] = ???

    override def writes(itemQuery: ItemQuery): JsObject = {
      val fromOffset = itemQuery.pageNumber * itemQuery.pageSize
      Json.obj(
        "query" -> Json.obj(
          "bool" -> Json.obj(
            // Assumes an implicit Writes for ItemStatus is in scope.
            "must_not" -> Json.obj(
              "match" -> Json.obj(
                "status" -> ItemStatus.Created)),
            // Json.toJson keeps the optional clauses as a single flat array;
            // Json.arr applied to one Seq would nest it a level too deep.
            "must" -> Json.toJson(
              Seq(
                itemQuery.titleDescriptionQueryString.map(queryString =>
                  Json.obj("multi_match" -> Json.obj(
                    "query" -> queryString,
                    "fields" -> Json.arr("title", "description")
                  ))),
                itemQuery.minPrice.map(minPrice =>
                  Json.obj("range" -> Json.obj(
                    "price" -> Json.obj("gte" -> minPrice)
                  ))),
                itemQuery.currencyId.map(currencyId =>
                  Json.obj("match" -> Json.obj("currencyId" -> currencyId)))
              ).flatten[JsObject]
            )
          )
        ),
        "from" -> fromOffset,
        "size" -> itemQuery.pageSize,
        "sort" -> Json.arr(
          Json.obj(
            "auctionEnd" -> Json.obj(
              "order" -> "desc",
              "unmapped_type" -> "boolean"
            )
          ),
          Json.obj("price" -> "asc")
        )
      )
    }
  }
}
```
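The key trick in the `writes` implementation is that each optional request field contributes at most one clause, and `flatten` drops the absent ones. The dependency-free sketch below isolates that pattern using plain strings in place of Play JSON; the `Query` class and `mustClauses` helper are hypothetical stand-ins, not part of the proposal itself.

```scala
// Dependency-free illustration of the optional-clause pattern: each Option
// yields at most one clause, and flatten discards the Nones.
case class Query(keywords: Option[String], minPrice: Option[Int], currencyId: Option[String])

// Hypothetical helper; the real code builds JsObjects rather than strings.
def mustClauses(q: Query): Seq[String] =
  Seq(
    q.keywords.map(k => s"""{"multi_match":{"query":"$k","fields":["title","description"]}}"""),
    q.minPrice.map(p => s"""{"range":{"price":{"gte":$p}}}"""),
    q.currencyId.map(c => s"""{"match":{"currencyId":"$c"}}""")
  ).flatten
```

A request with no optional fields set produces an empty `must` array rather than an invalid fragment, which is exactly the property the current implementation lacks.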
I realize this approach is quite a departure from the current implementation. With this departure, we might argue that the `ItemQuery` object isn't as flexible in the Elasticsearch queries it's able to produce, and that would be true.
As a counterpoint to that argument, we might point out that this application currently strives to satisfy no use cases beyond the one at hand, and if additional search use cases are added in the future, this approach could be generalized at that time.
If it's important that we proceed with a clear definition of how that more generalized solution might look, it would be helpful to identify the specific use case(s) that warrant the generalization. Given a concrete set of use cases, we can create an implementation that satisfies them, giving the reader the ability to see why the generalized solution is necessary and how it supports the particular use cases it was designed for.
@ignasi35, as the author of most of the Elasticsearch integration, what are your thoughts on this? Would you be open to a departure from the current implementation's approach?
For an example of how this might look in total, please see #111.
Let me know if you have any questions or suggestions! Thanks!