gisaia / arlas-tagger
License: Apache License 2.0
Provide a REST endpoint which can list the tagging jobs (with various parameters: running or not, ...)
As a developer
I want to have the API documentation in the docs directory of the GitHub project
So that I can browse it without the need to import the swagger.json in the Swagger UI
Currently, a tag request modifies one field of the selected documents.
If I want to add multiple tags to multiple fields of the same documents, I have to make multiple requests.
It seems that the documents are reindexed each time, which is very costly.
It would be nice to be able to tag several fields of the same documents in a single request (same filter).
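Such a multi-field request could, for instance, carry a list of tags sharing one filter. A hypothetical payload sketch (the field names are illustrative and not the actual ARLAS-tagger request schema):

```json
{
  "filter": { "f": [["id:eq:42"]] },
  "tags": [
    { "path": "tagging.status", "value": "reviewed" },
    { "path": "tagging.category", "value": "vessel" }
  ]
}
```

With one filter and several tag paths, the matching documents would only need to be reindexed once.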
Update Tagger with column filtering constraints.
See gisaia/ARLAS-server#558
As the maintainer of the ARLAS-tagger service,
I want to provide an environment variable to the docker container or to the docker compose
So that the ARLAS-tagger uses these properties to configure the Kafka client (e.g. for securing the connection: ssl.endpoint.identification.algorithm=https, security.protocol=SASL_SSL, etc.)
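One possible shape for this (the variable name is an assumption, not the current ARLAS-tagger configuration): a single environment variable carrying extra Kafka client properties, passed verbatim to the client, e.g. in docker-compose:

```yaml
services:
  arlas-tagger:
    image: gisaia/arlas-tagger:latest
    environment:
      # Hypothetical variable: extra properties appended to the Kafka client config
      KAFKA_EXTRA_PROPERTIES: |
        security.protocol=SASL_SSL
        ssl.endpoint.identification.algorithm=https
        sasl.mechanism=PLAIN
```

This keeps the image generic while letting each deployment secure its Kafka connection differently.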
Remove support for pwithin/gwithin/...
See the CMD and ENTRYPOINT in https://github.com/gisaia/ARLAS-tagger/blob/develop/Dockerfile-tagger#L39
Only the CMD should be kept.
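Keeping only the CMD could look like the sketch below (the jar name and arguments are illustrative; see the actual Dockerfile for the real command). With ENTRYPOINT removed, `docker run gisaia/arlas-tagger <command>` overrides the whole command cleanly instead of being appended to the entrypoint:

```dockerfile
# A single CMD, fully overridable at `docker run` time.
CMD ["java", "-jar", "arlas-tagger.jar", "server", "configuration.yaml"]
```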
The endpoint /status/{collection}/_tag should use a QueryParam for the {id} instead of a PathParam
Add the HTTPS protocol in the swagger definition arlas-tagger-rest/src/main/java/io/arlas/tagger/rest/tag/TagRESTService.java
Add some retry mechanism in case of poll timeout and commit failure.
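Such a retry could be as simple as a bounded loop with a growing pause around the poll/commit call. A generic sketch (not the actual Tagger code; the helper name and backoff policy are assumptions):

```java
import java.time.Duration;
import java.util.function.Supplier;

public class Retry {

    // Run the task; on failure, retry up to maxAttempts times with a
    // linearly growing pause between attempts. Rethrows the last error
    // once the attempts are exhausted.
    public static <T> T withRetry(Supplier<T> task, int maxAttempts, Duration backoff) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff.toMillis() * attempt);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }
}
```

A poll timeout or commit failure would then be wrapped in `withRetry(...)` instead of failing the job on the first transient error.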
Support x-pack-security in order to connect to ES cloud
As the operator of the tagger with docker, docker compose or k8s
I want to be able to specify environment variables in the container launch instruction
So that the server starts with my preferences
These should include at least the variables found in ARLAS Server that are common to the Tagger, plus the ones specific to the Tagger.
Add the possibility to start multiple Kafka consumers on the executeTag topic, partitioning on the id of the object.
The id of the requested tag job status should be reverted to a query param instead of a path param
The Kafka consumer used by the endpoint "/status/{collection}/_taglist" is used in a multithreaded context, which is not supported by the consumer API.
WARN [2019-11-18 16:18:31,890] org.eclipse.jetty.server.HttpChannel: /arlas/tagger/status/collection/_taglist
! java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
! at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2201)
! at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2185)
! at org.apache.kafka.clients.consumer.KafkaConsumer.assignment(KafkaConsumer.java:853)
! at io.arlas.tagger.service.TagExploreService.getTagRefList(TagExploreService.java:51)
! at io.arlas.tagger.rest.tag.TagStatusRESTService.taggingGetList(TagStatusRESTService.java:114)
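KafkaConsumer must be confined to a single thread; one common fix is to give each request thread its own instance instead of sharing one. A minimal stdlib sketch of the pattern (the `Client` class stands in for KafkaConsumer; it is not the actual fix applied in the Tagger):

```java
import java.util.Collections;
import java.util.List;

public class PerThreadClient {

    // Stand-in for a client that, like KafkaConsumer, must only be used
    // from the thread that owns it.
    static class Client {
        private final long ownerThread = Thread.currentThread().getId();

        List<String> poll() {
            // A real KafkaConsumer throws ConcurrentModificationException
            // when acquired from a foreign thread; we emulate the check.
            if (Thread.currentThread().getId() != ownerThread) {
                throw new IllegalStateException("client is not safe for multi-threaded access");
            }
            return Collections.emptyList();
        }
    }

    // One instance per thread: no sharing, hence no concurrent access.
    static final ThreadLocal<Client> CLIENTS = ThreadLocal.withInitial(Client::new);
}
```

Alternatives with the same effect are a lock serializing all access to one consumer, or a dedicated consumer thread fed through a queue.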
Update dependency to ARLAS Server 13.7.0 and add new configuration certificate_url
The following line prints the result of the tagging in TRACE mode:
But when tagUpdateResponse.failed
is greater than zero, a message should be printed in the ERROR logs with the content of tagUpdateResponse.failures
in order to spot the failures.
Also, the returned io.arlas.tagger.model.response.UpdateResponse
does not contain the failures.
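The requested check could be as small as the sketch below. The `UpdateResponse` fields here are assumptions reconstructed from the issue text, not the real io.arlas.tagger class:

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TagResultLogging {
    private static final Logger LOGGER = Logger.getLogger("TagExecService");

    // Minimal stand-in for io.arlas.tagger.model.response.UpdateResponse;
    // field names are guesses based on the issue description.
    static class UpdateResponse {
        long updated;
        long failed;
        List<String> failures;
    }

    // Build the ERROR message when there are failures, null otherwise.
    static String errorMessage(UpdateResponse r) {
        if (r.failed > 0) {
            return "tagging failed for " + r.failed + " document(s): " + r.failures;
        }
        return null;
    }

    static void logResult(UpdateResponse r) {
        LOGGER.log(Level.FINEST, "tagging result: updated=" + r.updated);
        String error = errorMessage(r);
        if (error != null) {
            // Surface failures at ERROR level so they can be spotted in the logs.
            LOGGER.log(Level.SEVERE, error);
        }
    }
}
```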
Upgrade to Java 17+
Dependencies update:
Use auto slicing in _update_by_query operation for unTag.
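In Elasticsearch, auto slicing is requested through the slices query parameter of _update_by_query, letting the cluster parallelise the update with one slice per shard. An illustrative request against a hypothetical index and tag field (not the Tagger's actual query):

```shell
curl -X POST "localhost:9200/my_index/_update_by_query?slices=auto&conflicts=proceed" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"tagging.status": "obsolete"}}}'
```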
DEBUG [2021-03-16 10:52:49,824] io.arlas.tagger.kafka.TagKafkaConsumer: [Consumer clientId=fd6d5fdc-27fe-4831-a9c6-fc13a213b972, groupId=execute_tags_consumer_group] Kafka consumer has been closed
Exception in thread "Thread-18" ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.updateByQuery(RestHighLevelClient.java:599)
at io.arlas.tagger.core.FilteredUpdater.doAction(FilteredUpdater.java:67)
at io.arlas.tagger.service.UpdateServices.unTag(UpdateServices.java:48)
at io.arlas.tagger.service.TagExecService.processRecords(TagExecService.java:83)
at io.arlas.tagger.service.KafkaConsumerRunner.run(KafkaConsumerRunner.java:90)
at java.lang.Thread.run(Thread.java:748)
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://690a605d3db34f749f1c7bb57e08e45f.europe-west1.gcp.cloud.es.io:9243], URI [/ml_ais_flow/_update_by_query?slices=auto&requests_per_second=-1&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&max_docs=2147483647&timeout=1m], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}],"suppressed":[{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}]},{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ml_ais_flow","node":"Y5lyy8YHRRucKAE2vYcEYw","reason":{"type":"query_shard_exception","reason":"Can only use regexp queries on keyword and text fields - not on [tagging.num1] which is of type [long]","index_uuid":"JdeEIwULT1q6qji2PZghNQ","index":"ml_ais_flow"}}]}]},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 8 more
Migrate to ES High Level REST client, using ARLAS core classes.
As a user who wants to tag objects
I want to be able to replay a previous tag request
So that I don't have to specify again the same request parameters.