zentity-io / zentity
Entity resolution for Elasticsearch.
Home Page: https://zentity.io
License: Apache License 2.0
I commented on the Elasticsearch Discourse forum about an issue I am having where a simple model with a resolver for name and phone leads to an exception due to a large clause count, and someone recommended that I post it over here.
The exception looks like this:
org.elasticsearch.ElasticsearchException$1: maxClauseCount is set to 1024
at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:639) ~[elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:137) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:264) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:105) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.InitialSearchPhase.access$200(InitialSearchPhase.java:50) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.InitialSearchPhase$2.onFailure(InitialSearchPhase.java:273) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.3.2.jar:7.3.2]
This is on an index with about 13 million entries, and a model with a single resolver that looks at name and phone number. Iterating through a test set of 1,000 records, where each record has a name and phone number, I get the above exceptions thrown periodically.
What's worse, any time these errors are thrown, it takes around 10-30 seconds for the error to resolve itself, which makes it too slow for processing the full data set (around 70k entries).
Just before the exception, the console dumps part of the query to stderr and it looks like a giant query with all of the different phone numbers in the index.
Is there something I can do to prevent this from happening? Is this a result of something I have configured incorrectly?
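For reference, in Elasticsearch 7.x this limit comes from the static setting indices.query.bool.max_clause_count (default 1024), which can only be changed in elasticsearch.yml and requires restarting each node. The sketch below raises the limit; the chosen value is just an example, and note this treats the symptom rather than the cause, since the clause count grows with the number of attribute values discovered per hop. Capping the number of hops (e.g. with the max_hops URL parameter, if your zentity version supports it) may also keep the query from snowballing.

```yaml
# elasticsearch.yml (static setting; restart each node after changing it).
# Example value only; size it to your largest expected expansion.
indices.query.bool.max_clause_count: 4096
```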
How can I include the _version in the resolution result?
During a resolution job, zentity fails to access attributes whose values appear in an array of objects in the "_source" field of the matching documents. This is likely due to the use of JsonPointer to access attributes from documents (see also here), because the JSON Pointer syntax requires the index value for array elements. A potential solution is to replace the use of JsonPointer with JsonPath, which supports a syntax that can return all values within an array.
zentity should assume (like Elasticsearch) that each object in an array of objects has the same schema, and then during a resolution job, zentity should obtain attribute values from arrays of objects just like it obtains attribute values from object values or arrays of values.
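The desired traversal behavior can be sketched as follows. This is illustrative Python, not zentity's actual Java implementation: the helper descends into arrays of objects and returns every value reachable at a dotted path, which is what JSON Pointer cannot express without explicit array indices.

```python
def collect_values(node, path):
    """Collect every value at a dotted path, descending into arrays of objects."""
    if not path:
        return node if isinstance(node, list) else [node]
    key, _, rest = path.partition(".")
    if isinstance(node, list):
        # Assume each object in the array has the same schema (like Elasticsearch).
        values = []
        for item in node:
            values.extend(collect_values(item, path))
        return values
    if isinstance(node, dict) and key in node:
        return collect_values(node[key], rest)
    return []

doc = {
    "first_name": "alice",
    "phone": [
        {"number": "555-123-4567", "type": "home"},
        {"number": "555-987-6543", "type": "mobile"},
    ],
}
collect_values(doc, "phone.number")  # ["555-123-4567", "555-987-6543"]
```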
Step 1. Create an index with a nested object.
PUT my_index
{
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"last_name": {
"type": "text"
},
"phone": {
"type": "nested",
"properties": {
"number": {
"type": "keyword"
},
"type": {
"type": "keyword"
}
}
}
}
}
}
Step 2. Index two documents.
POST my_index/_bulk?refresh
{"index":{"_id":1}}
{"first_name":"alice","last_name":"jones","phone":[{"number":"555-123-4567","type":"home"},{"number":"555-987-6543","type":"mobile"}]}
{"index":{"_id":2}}
{"first_name":"allison","last_name":"jones","phone":[{"number":"555-987-6543","type":"mobile"}]}
Step 3. Create an entity model.
PUT _zentity/models/my_entity_model
{
"attributes": {
"first_name": {},
"last_name": {},
"phone": {}
},
"resolvers": {
"name_phone": {
"attributes": [
"last_name",
"phone"
]
}
},
"matchers": {
"exact": {
"clause": {
"term": {
"{{ field }}": "{{ value }}"
}
}
},
"exact_phone": {
"clause": {
"nested": {
"path": "phone",
"query": {
"term": {
"{{ field }}": "{{ value }}"
}
}
}
}
}
},
"indices": {
"my_index": {
"fields": {
"first_name": {
"attribute": "first_name",
"matcher": "exact"
},
"last_name": {
"attribute": "last_name",
"matcher": "exact"
},
"phone.number": {
"attribute": "phone",
"matcher": "exact_phone"
}
}
}
}
}
Step 4. Run a resolution job. Expect the first hop to match the given name and phone number (555-123-4567), and expect the second hop to match the new phone number (555-987-6543) from the document in the first hop.
POST _zentity/resolution/my_entity_model?queries
{
"attributes": {
"first_name": [ "alice" ],
"last_name": [ "jones" ],
"phone": [ "555-123-4567" ]
}
}
Step 5. The resolution job fails with the following error message:
io.zentity.model.ValidationException: Expected 'string' attribute data type.
at io.zentity.resolution.input.value.StringValue.validate(StringValue.java:52)
at io.zentity.resolution.input.value.Value.<init>(Value.java:35)
at io.zentity.resolution.input.value.StringValue.<init>(StringValue.java:28)
at io.zentity.resolution.input.value.Value.create(Value.java:57)
at io.zentity.resolution.Job.onSearchComplete(Job.java:755)
at io.zentity.resolution.Job.access$000(Job.java:50)
at io.zentity.resolution.Job$1.onResponse(Job.java:1052)
at io.zentity.resolution.Job$1.onResponse(Job.java:1045)
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:77)
at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:253)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:595)
at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:109)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:372)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:366)
at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:219)
at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:101)
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:107)
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:36)
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:84)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
The following request shows the query that zentity submits to Elasticsearch in the first hop, and the response that zentity receives from Elasticsearch to process. The error occurs when zentity tries to parse the values of the phone numbers, which are inside an array of objects.
Request:
GET my_index/_search
{
"_source": true,
"query": {
"bool": {
"filter": [
{
"term": {
"last_name": "jones"
}
},
{
"nested": {
"path": "phone",
"query": {
"term": {
"phone.number": "555-123-4567"
}
}
}
}
]
}
},
"size": 1000
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"first_name" : "alice",
"last_name" : "jones",
"phone" : [
{
"number" : "555-123-4567",
"type" : "home"
},
{
"number" : "555-987-6543",
"type" : "mobile"
}
]
}
}
]
}
}
While zentity should run seamlessly with native Elasticsearch security features and has proven to do so in practice, it would be a good idea to write automated tests for zentity operating within the constraints of those security features. The tests will provide assurance that zentity functions as designed in a secured cluster, that zentity does not somehow circumvent those security features, and that zentity properly handles security exceptions.
Implement an API endpoint to submit multiple entity model management operations in bulk, borrowing the functionality for bulk operations introduced in #50. This will enable more efficient handling of multiple entity model management operations. One envisioned use case is an entity model management user interface that provides checkboxes to delete multiple models in one request.
Proposed syntax
POST /_zentity/models/_bulk[?PARAMS]
{ PARAMS }
{ PAYLOAD }
...
POST /_zentity/models/ENTITY_TYPE/_bulk[?PARAMS]
{ PARAMS }
{ PAYLOAD }
...
Accepted operations
The bulk endpoint would support operations that create, update, or delete entity models. Unlike the implementation in #50, this will require a field in PARAMS that indicates which action is to be executed. See the Elasticsearch Bulk API implementation for reference, which requires this convention, too.
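Under the proposed syntax, a request might look like the sketch below. The "action" and "entity_type" parameter names are hypothetical placeholders, since the proposal does not yet fix the PARAMS schema; the NDJSON convention of alternating parameter and payload lines mirrors the Elasticsearch Bulk API.

```
POST _zentity/models/_bulk
{ "action": "create", "entity_type": "person" }
{ "attributes": { "name": {} }, "resolvers": { "name": { "attributes": [ "name" ] } }, "matchers": { "exact": { "clause": { "term": { "{{ field }}": "{{ value }}" } } } }, "indices": { "my_index": { "fields": { "name": { "attribute": "name", "matcher": "exact" } } } } }
{ "action": "delete", "entity_type": "organization" }
{}
```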
Hi,
In zentity, do we have matchers for optional matching? That is, if a value is available then match on it, and if it is not available then match without that value.
This plugin was built with an older plugin structure. Contact the plugin author to remove the intermediate "elasticsearch" directory within the plugin zip.
Hey there, I'm currently experiencing an issue running zentity 1.6.1 with Elasticsearch 7.10.1 inside a multi-node cluster, but not on a single-node cluster. When sending alternating setup/delete requests (as well as with other requests), it sometimes hangs and it looks like the Elasticsearch CoordinatorPublication gets gummed up. I can replicate this both in a local docker-compose setup (attached below) and in Kubernetes with an elastic-on-k8s cluster with 3 master and 3 data nodes.
Here are the logs from the docker-compose setup, where I've deleted then created the index, and the coordination hangs for 30+ seconds:
elasticsearch | {"type": "server", "timestamp": "2021-01-22T15:56:16,893Z", "level": "INFO", "component": "o.e.c.m.MetadataDeleteIndexService", "cluster.name": "docker-cluster", "node.name": "primary", "message": "[.zentity-models/kCCUX_6bS3CZeDQzImGi2A] deleting index", "cluster.uuid": "Zi3JrTDvRkmyjizI6z-6QQ", "node.id": "eZpuNPEsRqKPl6bhvojRJQ" }
elasticsearch | {"type": "deprecation", "timestamp": "2021-01-22T15:56:31,234Z", "level": "DEPRECATION", "component": "o.e.d.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "primary", "message": "index name [.zentity-models] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices", "cluster.uuid": "Zi3JrTDvRkmyjizI6z-6QQ", "node.id": "eZpuNPEsRqKPl6bhvojRJQ" }
elasticsearch | {"type": "server", "timestamp": "2021-01-22T15:56:31,309Z", "level": "INFO", "component": "o.e.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "primary", "message": "[.zentity-models] creating index, cause [api], templates [], shards [1]/[1]", "cluster.uuid": "Zi3JrTDvRkmyjizI6z-6QQ", "node.id": "eZpuNPEsRqKPl6bhvojRJQ" }
elasticsearch | {"type": "server", "timestamp": "2021-01-22T15:56:41,313Z", "level": "INFO", "component": "o.e.c.c.C.CoordinatorPublication", "cluster.name": "docker-cluster", "node.name": "primary", "message": "after [10s] publication of cluster state version [928] is still waiting for {es-data-2}{Xjwq8qUrReyh5VUi21l3aQ}{btWNi8GkTJaAjVjbcQxe2g}{172.19.0.2}{172.19.0.2:9300}{dir} [SENT_PUBLISH_REQUEST]", "cluster.uuid": "Zi3JrTDvRkmyjizI6z-6QQ", "node.id": "eZpuNPEsRqKPl6bhvojRJQ" }
elasticsearch | {"type": "server", "timestamp": "2021-01-22T15:57:01,314Z", "level": "WARN", "component": "o.e.c.c.C.CoordinatorPublication", "cluster.name": "docker-cluster", "node.name": "primary", "message": "after [30s] publication of cluster state version [928] is still waiting for {es-data-2}{Xjwq8qUrReyh5VUi21l3aQ}{btWNi8GkTJaAjVjbcQxe2g}{172.19.0.2}{172.19.0.2:9300}{dir} [SENT_PUBLISH_REQUEST]", "cluster.uuid": "Zi3JrTDvRkmyjizI6z-6QQ", "node.id": "eZpuNPEsRqKPl6bhvojRJQ" }
Do you think this originates in the plugin or in a misconfiguration of the clusters?
version: '3.7'

x-plugin-volume: &plugin-volume "./target/releases/:/plugins"

x-base-es: &base-es
  image: docker.elastic.co/elasticsearch/elasticsearch-oss:${ES_VERSION:-7.10.2}
  user: "elasticsearch"
  # install all plugins in the mounted /plugins directory and start the elasticsearch server
  command:
    - /bin/bash
    - -c
    - elasticsearch-plugin install --batch https://zentity.io/releases/zentity-1.6.1-elasticsearch-7.10.2.zip && elasticsearch
  ulimits:
    nofile:
      soft: 65536
      hard: 65536
    memlock:
      soft: -1
      hard: -1
  environment: &base-env
    cluster.name: docker-cluster
    network.host: 0.0.0.0
    # minimum_master_nodes need to be explicitly set when bound on a public IP
    # set to 1 to allow single node clusters
    # Details: elastic/elasticsearch#17288
    discovery.zen.minimum_master_nodes: "1"
    # Reduce virtual memory requirements, see docker/for-win#5202 (comment)
    bootstrap.memory_lock: "false"
    ES_JAVA_OPTS: "-Xms512m -Xmx512m -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0.0.0.0:5050"
    http.cors.enabled: "true"
    http.cors.allow-origin: "*"
    cluster.initial_master_nodes: primary
  networks:
    - elastic

x-base-primary-node: &base-primary-node
  <<: *base-es
  environment:
    <<: *base-env
    node.name: primary
    node.master: "true"
    node.data: "false"
    node.ingest: "false"

x-base-data-node: &base-data-node
  <<: *base-es
  environment:
    <<: *base-env
    discovery.zen.ping.unicast.hosts: elasticsearch
    node.master: "false"
    node.data: "true"
    node.ingest: "true"

services:
  elasticsearch:
    <<: *base-primary-node
    hostname: elasticsearch
    container_name: elasticsearch
    volumes:
      - *plugin-volume
      - es-primary:/usr/share/elasticsearch/data
    ports:
      - "${ES_PORT:-9200}:9200" # http
      - "${DEBUGGER_PORT:-5050}:5050" # debugger

  es-data-1:
    <<: *base-data-node
    hostname: es-data-1
    container_name: es-data-1
    volumes:
      - *plugin-volume
      - es-data-1:/usr/share/elasticsearch/data
    ports:
      - "${DEBUGGER_PORT_DATA_1:-5051}:5050" # debugger

  es-data-2:
    <<: *base-data-node
    hostname: es-data-2
    container_name: es-data-2
    volumes:
      - *plugin-volume
      - es-data-2:/usr/share/elasticsearch/data
    ports:
      - "${DEBUGGER_PORT_DATA_2:-5052}:5050" # debugger

  kibana:
    image: docker.elastic.co/kibana/kibana-oss:${KIBANA_VERSION:-7.10.1}
    hostname: kibana
    container_name: kibana
    logging:
      driver: none
    environment:
      - server.host=0.0.0.0
      - server.name=kibana.local
      - elasticsearch.url=http://elasticsearch:9200
    ports:
      - '${KIBANA_PORT:-5601}:5601'
    networks:
      - elastic

volumes:
  es-primary:
    driver: local
  es-data-1:
    driver: local
  es-data-2:
    driver: local

networks:
  elastic:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
    common.k8s.elastic.co/controller-version: 1.3.1
    elasticsearch.k8s.elastic.co/cluster-uuid: 8xDpRuE4T8ufu_KSJV4hFw
  creationTimestamp: "2021-01-20T17:28:29Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: eck-entity-resolution
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: eck-entity-resolution
    app.kubernetes.io/part-of: eck
    app.kubernetes.io/version: 1.1.2
    helm.sh/chart: eck-entity-resolution-0.3.0
  name: eck-entity-resolution
  namespace: entity-resolution
  resourceVersion: "273469952"
  selfLink: /apis/elasticsearch.k8s.elastic.co/v1/namespaces/entity-resolution/elasticsearches/eck-entity-resolution
  uid: cff37de2-c6c3-4ebd-a230-e45f00bdc7e7
spec:
  auth:
    fileRealm:
      - secretName: eck-entity-resolution-users
    roles:
      - secretName: eck-entity-resolution-roles
  http:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
    tls:
      certificate: {}
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - config:
        node.data: false
        node.ingest: false
        node.master: true
      count: 3
      name: primary-node
      podTemplate:
        spec:
          containers:
            - env:
                - name: ES_JAVA_OPTS
                  value: -Xms500m -Xmx500m
              name: elasticsearch
              resources:
                limits:
                  cpu: 1
                  memory: 1Gi
                requests:
                  cpu: 0.5
                  memory: 1Gi
          initContainers:
            - command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install --batch https://github.com/zentity-io/zentity/releases/download/zentity-1.6.1/zentity-1.6.1-elasticsearch-7.10.1.zip
              name: install-plugins
            - command:
                - sh
                - -c
                - sysctl -w vm.max_map_count=262144
              name: sysctl
              securityContext:
                privileged: true
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 2Gi
            storageClassName: standard-expandable
    - config:
        node.data: true
        node.ingest: true
        node.master: false
      count: 3
      name: data-node
      podTemplate:
        spec:
          containers:
            - env:
                - name: ES_JAVA_OPTS
                  value: -Xms4g -Xmx4g -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0.0.0.0:5005
              name: elasticsearch
              resources:
                limits:
                  cpu: 2
                  memory: 8Gi
                requests:
                  cpu: 0.5
                  memory: 8Gi
          initContainers:
            - command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install --batch https://github.com/zentity-io/zentity/releases/download/zentity-1.6.1/zentity-1.6.1-elasticsearch-7.10.1.zip
              name: install-plugins
            - command:
                - sh
                - -c
                - sysctl -w vm.max_map_count=262144
              name: sysctl
              securityContext:
                privileged: true
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 25Gi
            storageClassName: sdd-fast-expandable
  transport:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
  updateStrategy:
    changeBudget: {}
  version: 7.10.1
status:
  availableNodes: 6
  health: green
  phase: Ready
  version: 7.10.1
Words matter and I intend to rename the default branch from master to main in the near future.
We discovered while planning a cluster upgrade that we'd have to remove Zentity functionality completely because the plugin does not register itself as being compatible. Are there any plans to address this?
Background
Many users have requested a way to indicate the confidence of a match for documents returned in the output of a resolution job. Often the request is for a score, where a higher score indicates greater confidence in the match. Some users envisioned a score for each document. Others envisioned a score for specific fields such as the value of a name.
Currently, and with no change in mind for the future, zentity submits boolean queries to find matching documents in Elasticsearch. Therefore, by the standards of Elasticsearch, every matching document has a constant score of 1.
zentity also offers an "_explanation"
field for each document, which describes the resolvers, matchers, and values that caused the document to match.
I believe it would be possible to let users assign scores for various concepts in zentity, to combine those scores to produce an overall confidence score for a matching document, and to implement this in a way that intuitively fits the design of zentity and does not incur a significant performance penalty.
@cmwaters89 deserves recognition for demonstrating the feasibility of this concept. Thank you for your contribution!
Concept
The following zentity components could be extended to support a user-defined score that contributes to an overall confidence score for any matching document:
Examples: attributes like ssn or email are likely to identify a single entity more accurately than other attributes like name or dob.

The concept of the feature is to let users define a base score for any of the components listed above in whatever way they see fit for their use case. These base scores could be defined in the entity model and overridden at query time.
Upon query execution, zentity would then, for each document, combine the component base scores to produce an overall match confidence score for the document. This may involve calculating the product or average of the base scores. The way in which zentity combines the scores could also be configured by the user.
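The combination step described above might look like the following sketch. This is illustrative Python, not zentity's implementation; the function name, the "product"/"average" method names, and the use of None for undefined base scores (so they don't influence the result, per the default behavior below) are all assumptions for the sake of the example.

```python
def combine_scores(base_scores, method="product"):
    """Combine user-defined base scores into one document confidence score.

    Undefined base scores (None) are skipped so they do not influence the
    overall score; if no scores are defined, the result is None (agnostic).
    """
    defined = [s for s in base_scores if s is not None]
    if not defined:
        return None
    if method == "product":
        result = 1.0
        for s in defined:
            result *= s
        return result
    if method == "average":
        return sum(defined) / len(defined)
    raise ValueError(f"unknown method: {method}")

combine_scores([0.9, None, 0.8])              # product, None skipped: ~0.72
combine_scores([0.9, 0.8], method="average")  # ~0.85
```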
Default behavior
The default score for any document should be 1, null, or simply not present in the document at all, both to remain agnostic and to be consistent with past versions of the plugin.
The default base score for any component listed above should not influence the overall confidence score of the document. It should only contribute to that score when the user defines the base score. Potentially one solution for this is to let the default base score be null (i.e. undefined) and to skip any such scores when calculating the overall score.
Deconflicting base scores
It's possible for documents to match for multiple reasons. A document could match multiple resolvers and matchers. A case could be made for doing either of the following:
This behavior could be configured by the user, too.
Score thresholds
It should be easy enough to allow the user to define a threshold that a document score must pass in order to be kept in the results. This has been another common and related feature request.
Optional and opt-in
The implementation of this feature would depend on information from the "_explanation" field. This information is only gathered when the client sets _explanation=true in the URL, because the queries become slightly more complex and incur a slight performance penalty. Likewise, I believe this scoring feature should be made optional and "opt-in" for the users who care about it.
Generic
One of the tenets of zentity is to be generic and not domain-specific. zentity should remain agnostic to the actual scoring process, and instead provide a framework in which the user can define a scoring process that fits their use case.
Scoring based on past scores
If a document matches with a relatively low score, should its values penalize the scores of documents that match it in subsequent queries? I have yet to think this through in detail.
Field level scores
This feature should work at the document level. I'm not confident (no pun intended) that this could be done at the field level without adding a lot of complexity to the plugin. By field level, I mean a score that indicates the confidence that the value of a specific field matches an input value. zentity is able to explain the reason for a match by using named filters in Elasticsearch. Named filters can inform zentity of the matchers that led to a hit, but they don't provide details on why they led to a hit.
Users who desire field level scores have a couple of alternatives, such as using the "input_value" and "target_value" from the "_explanation" field to derive a confidence score for the value outside of zentity. Perhaps the client application derives a score by comparing the length and edit distance of the two values, for example.

Issues like #56 have shown that a multi-node cluster in production mode would be better to run integration tests on than a single-node cluster in development mode. The plugin should be tested in an environment that more closely represents the desired state in which it will operate.
Currently entity type names can be any arbitrary string. As noted in another discussion, this is problematic when implementing API endpoints that could conflict with the names of entity types. There may be other unforeseen issues by using arbitrary strings. Entity type names are meant to be identifiers, not necessarily human readable descriptions, and therefore should be expected to follow some constraints.
Proposal
Enforce the same requirements as the Elasticsearch index name requirements:
Index names must meet the following criteria:
- Lowercase only
- Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), `,`, #
- Indices prior to 7.0 could contain a colon (:), but that's been deprecated and won't be supported in 7.0+
- Cannot start with -, _, +
- Cannot be . or ..
- Cannot be longer than 255 bytes (note it is bytes, so multi-byte characters will count towards the 255 limit faster)
- Names starting with . are deprecated, except for hidden indices and internal indices managed by plugins
Entity type names should follow the same rules (though allowing names to start with .). This will prevent entity type names from conflicting with reserved API terms such as _bulk and may help avoid other unforeseen issues related to syntax.
This would introduce a breaking change for existing entity models whose names do not meet these criteria.
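The proposed validation could be sketched as follows. This is an illustrative Python check, not zentity's actual code; the function name and the exact rule set are assumptions derived from the criteria quoted above (with names starting with . permitted, per the proposal).

```python
# Characters forbidden by the Elasticsearch-style index name rules,
# including the deprecated colon.
INVALID_CHARS = set('\\/*?"<>| ,#:')

def valid_entity_type(name: str) -> bool:
    """Check a proposed entity type name against the rules listed above."""
    if not name or name in (".", ".."):
        return False
    if len(name.encode("utf-8")) > 255:  # limit is in bytes, not characters
        return False
    if name[0] in "-_+":
        return False
    if name != name.lower():  # lowercase only
        return False
    return not any(c in INVALID_CHARS for c in name)
```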
- Branch 1.7.0 from the main branch
- Update zentity.version in pom.xml and version numbers in README.md
- Tag zentity-1.7.0 and push to build and deploy its release artifacts

Attributes, resolvers, and matchers should have the same naming requirements as entity type names (see #58) for the same reasons listed in that issue.
This change should be included in the same release as #58 so that users can fix deprecated names in their entity models in one release.
Index names and index field names can be excluded from this validation. Elasticsearch validates those, but only zentity can validate attributes, resolvers, and matchers.
Hi!
First of all thanks for this very useful project.
I have a question about performance.
One of our use cases is to create entity groups from a pretty large index (8M documents, ~4GB). The model we use has ~10 attributes and 8 resolvers, and uses matchers with fuzziness. We have fixed the number of hops to 2 to prevent "snowballing".
We call the resolution API ~3M times, and we would like to reduce the total computing time.
Do you think that adding more nodes to our Elasticsearch cluster can improve the response time of the resolution API? For now it's just a single node.
Because of unrelated constraints, we run a pretty old version of Elasticsearch and therefore zentity (6.2.3 and 1.0.0). Do you think upgrading could improve performance?
Do you have any other suggestions?
- Branch 1.8.1 from the 1.8.0 branch
- Update zentity.version in pom.xml and version numbers in README.md
- Tag zentity-1.8.1 and push to build and deploy its release artifacts

Hi,
We want zentity to support the latest Elasticsearch version, 8.x. Is there any roadmap for this?
We are trying to use zentity 1.5.1 with Elasticsearch 7.3.2 for a pilot project. We're not able to get a chain of resolvers working, most probably due to the problem reported in the title.
I report here the index mapping of the data:
{
"obj-person": {
"aliases": {},
"mappings": {
"dynamic": "strict",
"properties": {
"completeName": {
"type": "text",
"fields": {
"phonetic": {
"type": "text",
"analyzer": "phonetic_analyzer"
}
}
},
"entry": {
"properties": {
"createdBy": {
"type": "keyword"
},
"createdDate": {
"type": "date"
},
"infoObject": {
"properties": {
"dateOfBirth": {
"properties": {
"originalValue": {
"type": "text",
"copy_to": [
"search"
]
},
"value": {
"type": "text",
"copy_to": [
"search"
]
}
}
},
"firstName": {
"properties": {
"originalValue": {
"type": "text",
"copy_to": [
"search"
]
},
"value": {
"type": "text",
"fields": {
"phonetic": {
"type": "text",
"analyzer": "phonetic_analyzer"
}
},
"copy_to": [
"search",
"completeName"
]
}
}
},
"lastName": {
"properties": {
"originalValue": {
"type": "text",
"copy_to": [
"search"
]
},
"value": {
"type": "text",
"fields": {
"phonetic": {
"type": "text",
"analyzer": "phonetic_analyzer"
}
},
"copy_to": [
"search",
"completeName"
]
}
}
}
}
}
}
},
"search": {
"type": "text",
"fields": {
"graphically_similar": {
"type": "text",
"analyzer": "normalize_graphically_similar_analyzer"
},
"normalized": {
"type": "text",
"analyzer": "normalize_alphanum_analyzer"
},
"phonetic": {
"type": "text",
"analyzer": "phonetic_analyzer"
}
}
},
"search_sensitive": {
"type": "text"
},
"type": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"auto_expand_replicas": "1-5",
"provided_name": "obj-person",
"creation_date": "1582637094035",
"analysis": {
"filter": {
"phonetic_filter": {
"replace": "true",
"type": "phonetic",
"encoder": "double_metaphone"
}
},
"analyzer": {
"phonetic_analyzer": {
"filter": [
"phonetic_filter"
],
"tokenizer": "standard"
},
"normalize_graphically_similar_analyzer": {
"filter": [
"uppercase"
],
"char_filter": [
"strip_special_chars",
"replace_graphically_similar"
],
"type": "custom",
"tokenizer": "keyword"
},
"normalize_alphanum_analyzer": {
"filter": [
"uppercase",
"reverse"
],
"char_filter": "strip_special_chars",
"type": "custom",
"tokenizer": "keyword"
}
},
"char_filter": {
"replace_graphically_similar": {
"type": "mapping",
"mappings": [
"O => 0",
"D => 0",
"I => 1",
"B => 8",
"S => 5",
"Z => 2",
"G => 6",
"E => 3",
"o => 0",
"d => 0",
"i => 1",
"b => 8",
"s => 5",
"z => 2",
"g => 6",
"e => 3"
]
},
"strip_special_chars": {
"pattern": "[^\\w]",
"type": "pattern_replace",
"replacement": ""
}
}
},
"number_of_replicas": "1",
"uuid": "JFGNOU6xR4i_BHM8e0nB5Q",
"version": {
"created": "7030199"
}
}
}
}
}
Creating a zentity model like this:
PUT _zentity/models/zentity_test_resolution_person
{
"attributes" : {
"first_name" : {
"type" : "string"
},
"last_name" : {
"type" : "string"
},
"dob" : {
"type" : "string"
}
},
"resolvers" : {
"name_only" : {
"attributes" : [
"first_name",
"last_name"
]
},
"dob" : {
"attributes" : [
"dob"
]
}
},
"matchers" : {
"simple" : {
"clause" : {
"match" : {
"{{ field }}" : "{{ value }}"
}
}
},
"fuzzy" : {
"clause" : {
"match" : {
"{{ field }}" : {
"query" : "{{ value }}",
"fuzziness" : "{{ params.fuzziness }}"
}
}
},
"params" : {
"fuzziness" : "auto"
}
}
},
"indices" : {
"obj-person" : {
"fields" : {
"entry.infoObject.firstName.value" : {
"attribute" : "first_name",
"matcher" : "fuzzy"
},
"entry.infoObject.lastName.value" : {
"attribute" : "last_name",
"matcher" : "fuzzy"
},
"entry.infoObject.dateOfBirth.value" : {
"attribute" : "dob",
"matcher" : "simple"
}
}
}
}
}
Using three objects whose core data is:
Person1:
"firstName" : {
"value" : "Nolan"
},
"lastName" : {
"value" : "Hendricks"
},
"dateOfBirth" : {
"value" : "633-9242"
}
Person2:
"firstName" : {
"value" : "Nolan"
},
"lastName" : {
"value" : "Hendricks"
},
"dateOfBirth" : {
"value" : "677-9999"
}
Person3:
"firstName" : {
"value" : "Noln"
},
"lastName" : {
"value" : "Hendricks"
},
"dateOfBirth" : {
"value" : "677-9999"
}
If we execute this resolution:
POST _zentity/resolution/zentity_test_resolution_person?_source=false&_explanation=false
{
"attributes": {
"first_name": {
"values": ["Nolan"],
"params": {
"fuzziness": "0"
}
},
"last_name": ["Hendricks"]
}
}
the result is this:
{
"took" : 2,
"hits" : {
"total" : 2,
"hits" : [ {
"_index" : "obj-person",
"_type" : "_doc",
"_id" : "2D6F8CCF-227B-4FBF-A749-16C098BB0C0A",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"dob" : [ ],
"first_name" : [ ],
"last_name" : [ ]
}
}, {
"_index" : "obj-person",
"_type" : "_doc",
"_id" : "1CE639AD-B3FD-4FAB-9D9B-A469DE75C943",
"_hop" : 0,
"_query" : 0,
"_attributes" : {
"dob" : [ ],
"first_name" : [ ],
"last_name" : [ ]
}
} ]
}
}
As you can see, the _attributes are not populated at all, and not all the hops have been performed: I'd expect to see a third result based on the "dob" field.
Can you point me in the right direction?
I am using elastic 6.2.4 and the zentity plugin 6.2.4 (Version 1.0.0).
URL: /_zentity/models/test
My model is:
{
  "attributes": {
    "name": { "type": "string" },
    "ssn": { "type": "string" }
  },
  "resolvers": {
    "name_ssn": { "attributes": [ "name", "ssn" ] },
    "name_only": { "attributes": [ "name" ] }
  },
  "matchers": {
    "exact": {
      "clause": { "term": { "{{ field }}": "{{ value }}" } }
    },
    "fuzzy": {
      "clause": {
        "match": {
          "{{ field }}": { "query": "{{ value }}", "fuzziness": 2 }
        }
      }
    }
  },
  "indices": {
    "test": {
      "fields": {
        "FIRST_NAME": { "attribute": "name", "matcher": "fuzzy" },
        "LAST_NAME": { "attribute": "name", "matcher": "fuzzy" },
        "SSN": { "attribute": "ssn", "matcher": "exact" }
      }
    }
  }
}
URL: /_zentity/resolution/test
My resolution is:
{ "attributes": { "name": [ "Muruga Mani" ], "ssn": [ "111-22-3333" ] }, "include": { "indices": ["test"], "resolvers": ["name_only"] } }
or
{ "attributes": { "name": [ "Muruga Mani" ], "ssn": [ "111-22-3333" ] } }
My data in ES is (in index test and type person):
{"LAST_NAME":"Mani","FIRST_NAME":"Muruga","SSN":"111-22-3333"}
But it is not resolving the match.
My response is:
{"took":8,"hits":{"total":0,"hits":[]}}
What am I missing here?
Currently, each document returned by a resolution job has an "_attributes"
object in which each field is the name of an attribute mapped to a single value (source).
Example:
{
"_attributes": {
"name": "Alice Jones",
"phone": "555-123-4567"
},
"_source": {
"indexed_name": "Alice Jones",
"indexed_phone": "555-123-4567"
}
}
However, two common situations could lead to information being lost in the output:
- If the "_source" object listed "indexed_phone": [ "555-123-4567", "555-987-6543" ], then only one of those values would be mapped to the "phone" attribute in the "_attributes" object.
- If the "_source" object listed "indexed_phone_1" and "indexed_phone_2", it's valid to have an entity model that maps both index fields to the "phone" attribute.
In both examples above, the desired behavior would be to ensure that every value is returned as an array in the "_attributes" object. For example:
{
"_attributes": {
"name": [ "Alice Jones" ],
"phone": [ "555-123-4567", "555-987-6543", "555-000-1234" ]
},
"_source": {
"indexed_name": "Alice Jones",
"indexed_phone_1": [ "555-123-4567", "555-987-6543" ],
"indexed_phone_2": [ "555-000-1234" ]
}
}
Without this enhancement, the completeness or accuracy of resolution outputs can't be guaranteed whenever a matching document has multiple values mapped to the same attribute.
This enhancement would be a breaking change that affects most users, but it should be easy for most users to adapt to.
Hi,
I am using elastic 6.2.4 and the zentity plugin 6.2.4 (Version 1.0.0).
URL: /_zentity/models/test
{ "attributes": { "name": { "type": "string" }, "ssn": { "type": "string" } }, "resolvers": { "name_ssn": { "attributes": [ "name", "ssn" ] }, "name_only": { "attributes": [ "name" ] } }, "matchers": { "exact": { "clause": { "term": { "{{ field }}": "{{ value }}" } } }, "fuzzy": { "clause": { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": 100 } } } } }, "indices": { "test": { "fields": {"firstName": { "attribute": "name", "matcher": "fuzzy" }, "middleName": { "attribute": "name", "matcher": "fuzzy" }, "lastName": { "attribute": "name", "matcher": "fuzzy" }, "otherFirstName": { "attribute": "name", "matcher": "fuzzy" }, "otherLastName": { "attribute": "name", "matcher": "fuzzy" }, "ssn.keyword": { "attribute": "ssn", "matcher": "exact" } } } } }
I want to use firstname, lastname, middleName, otherFirstName, otherLastName considered as name attribute.
I have 5 indices in ELK
[{"_index":"test","_type":"identity","_id":"5","_version":1,"_score":1,"_source":{"firstName":"test","middleName":null,"lastName":"Beena","otherFirstName":null,"otherLastName":"William","ssn":"109520107"}},{"_index":"test","_type":"identity","_id":"2","_version":3,"_score":1,"_source":{"firstName":"test","middleName":null,"lastName":"test","otherFirstName":null,"otherLastName":null,"ssn":"109520107"}},{"_index":"test","_type":"identity","_id":"4","_version":3,"_score":1,"_source":{"firstName":"Williamz","middleName":null,"lastName":"Beena","otherFirstName":null,"otherLastName":null,"ssn":"109520107"}},{"_index":"test","_type":"identity","_id":"1","_version":1,"_score":1,"_source":{"firstName":"Bina","middleName":null,"lastName":"William","otherFirstName":null,"otherLastName":null,"ssn":"109520107"}},{"_index":"test","_type":"identity","_id":"3","_version":2,"_score":1,"_source":{"firstName":"Beena","middleName":null,"lastName":"Williamz","otherFirstName":null,"otherLastName":null,"ssn":"109520107"}}]
When hit the URL: _zentity/resolution/test with the below request
{ "attributes": { "name": [ "BEENA", "", "WILLIAM" ], "ssn": [ "109520107" ] }, "include": { "indices": [ "test" ], "resolvers": [ "name_ssn" ] } }
I got a response containing all of the documents:
{ "took": 53, "hits": { "total": 5, "hits": [ { "_index": "test", "_type": "identity", "_id": "5", "_hop": 0, "_attributes": { "name": "William", "ssn": "109520107" }, "_source": { "firstName": "test", "middleName": null, "lastName": "Beena", "otherFirstName": null, "otherLastName": "William", "ssn": "109520107" } }, { "_index": "test", "_type": "identity", "_id": "4", "_hop": 0, "_attributes": { "name": null, "ssn": "109520107" }, "_source": { "firstName": "Williamz", "middleName": null, "lastName": "Beena", "otherFirstName": null, "otherLastName": null, "ssn": "109520107" } }, { "_index": "test", "_type": "identity", "_id": "1", "_hop": 0, "_attributes": { "name": null, "ssn": "109520107" }, "_source": { "firstName": "Bina", "middleName": null, "lastName": "William", "otherFirstName": null, "otherLastName": null, "ssn": "109520107" } }, { "_index": "test", "_type": "identity", "_id": "3", "_hop": 0, "_attributes": { "name": null, "ssn": "109520107" }, "_source": { "firstName": "Beena", "middleName": null, "lastName": "Williamz", "otherFirstName": null, "otherLastName": null, "ssn": "109520107" } }, { "_index": "test", "_type": "identity", "_id": "2", "_hop": 1, "_attributes": { "name": null, "ssn": "109520107" }, "_source": { "firstName": "test", "middleName": null, "lastName": "test", "otherFirstName": null, "otherLastName": null, "ssn": "109520107" } } ] } }
I was not expecting the documents with "_id": "2" and "_id": "5", since their names are totally off.
Can anyone please check on this?
There has already been some discussion about talking with a Zentity-enabled cluster in Python, but I'm wondering what other languages people would like to see clients built for? We're currently using Node.js and can contribute our extension to the official client if others would find it useful.
I am looking to define a matcher on a date field that has a format of 'yyyy-MM-dd'. I want this matcher to pick those records where the year part matches within a window of 2, meaning year ±2 is allowed.
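One way to express this (an untested sketch; the matcher name "year_window" is my own) is a matcher clause built on an Elasticsearch range query with date math: appending "||-2y/y" to the anchor date subtracts two years and rounds down to the start of that year, while "||+2y/y" adds two years and rounds up to the end of that year:

```json
{
  "matchers": {
    "year_window": {
      "clause": {
        "range": {
          "{{ field }}": {
            "gte": "{{ value }}||-2y/y",
            "lte": "{{ value }}||+2y/y",
            "format": "yyyy-MM-dd"
          }
        }
      }
    }
  }
}
```

With this clause, an input value of "1990-05-21" would match any document whose date falls between the start of 1988 and the end of 1992.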
Release 1.8.0 from the main branch:
- Update zentity.version in pom.xml and the version numbers in README.md
- Tag zentity-1.8.0 and push to build and deploy its release artifacts
Hi,
I had created my zentity model which include some nested object of array attributes to be resolved.
However when I run the resolution, the _attributes list that was return from the result only include those attribute that was declared at the root level of the document.
Those nested object attributes were not being detected thus those nested object array attributes could not be used for subsequent recursive resolving traversal.
Is there anyway to do this? Thanks
E.g
{ "attributes": { "firstName": { "type": "string" }, "lastName": { "type": "string" }, "licenseNumber": { "type": "string" } }, "resolvers": { "name": { "attributes": ["lastName", "firstName"] }, "license": { "attributes": ["licenseNumber","firstName"] } }, "matchers": { "exact": { "clause": { "term": { "{{ field }}": "{{ value }}" } } }, "exact_license_nested": { "clause": { "nested": { "path": "license", "query": { "term": { "{{ field }}": "{{ value }}" } } } } }, "fuzzy": { "clause": { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": "auto", "operator": "AND" } } } } }, "indices": { "my_index": { "fields": { "firstName": { "attribute": "firstName", "matcher": "fuzzy" }, "lastName": { "attribute": "lastName", "matcher": "fuzzy" }, "license.number.keyword": { "attribute": "licenseNumber", "matcher": "exact_license_nested" } } } } }
When I run the resolution, the result _attributes portion will only consist of
firstName, lastName but not licenseNumber although license.number is inside the document in the form of
license:[{number:1},{number:2}].
Having this will only result in traversing the "name" resolver but not the "license" resolver for subsequent hops.
This is the query I am looking for,
POST _zentity/resolution/zentity_tutorial_1_person?pretty&_source=false
{
"attributes": {
"first_name": [ "Allie" ],
"last_name": [ "Jones" ]
}
}
but it returns error:
{
"took": 2,
"error": {
"by": "elasticsearch",
"type": "org.elasticsearch.common.ParsingException",
"reason": "[multi_match] query does not support [first_name]",
"stack_trace": "ParsingException[[multi_match] query does not support [first_name]]
The entity model is defined here.
PUT _zentity/models/zentity_tutorial_1_person
{
"attributes": {
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
}
},
"resolvers": {
"name_only": {
"attributes": [ "first_name", "last_name" ]
}
},
"matchers": {
"simple": {
"clause": {
"match": {
"{{ field }}": "{{ value }}"
}
}
},
"multi_match": {
"clause": {
"multi_match": {
"{{ field }}": "{{ value }}",
"type": "cross_fields",
"fields": [ "firs_tname", "last_name" ],
"operator": "and"
}
}
}
},
"indices": {
"zentity_tutorial_1_exact_name_matching": {
"fields": {
"first_name": {
"attribute": "first_name",
"matcher": "multi_match"
},
"last_name": {
"attribute": "last_name",
"matcher": "multi_match"
}
}
}
}
}
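The "[multi_match] query does not support [first_name]" error occurs because the matcher above renders "{{ field }}" as a key inside the multi_match clause, but Elasticsearch's multi_match query expects "query" and "fields" parameters rather than a field-to-value mapping. A sketch of a matcher following that shape (note that it ignores "{{ field }}" and always searches the listed fields; also note the "firs_tname" typo in the original fields list):

```json
"multi_match": {
  "clause": {
    "multi_match": {
      "query": "{{ value }}",
      "type": "cross_fields",
      "fields": [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}
```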
Currently the "_attributes"
section of the resolution response is a flat object, where each key is the name of an attribute. Allowing the attributes to be nested will allow users to save results in an index that follows the guidelines and best practices for the Elastic Common Schema (ECS), which encourages nesting by way of prefixes.
If this feature is released at the same time as #73, then it would create one breaking change instead of two.
Allow periods (".") to be used in the attribute names of entity models, and use them to nest fields in the "_attributes" section of the resolution response.
Entity model - Attribute names are flat and may contain periods. This example shows attributes which are grouped by prefixes.
{
"attributes": {
"name.first": {},
"name.middle": {},
"name.last": {},
"location.address.street": {},
"location.address.city": {},
"location.address.state": {},
"location.address.zip": {}
}
}
Resolution request - Attribute names are flat and retain their periods. Nesting would not be allowed at this point. Rationale: Attributes may be arrays of values or objects with values and params (source), and allowing nested attributes here would make it difficult to determine whether the nested object was an attribute value or a nested attribute name.
{
"attributes": {
"name.first": [ "Alice" ],
"name.middle": [ "Q" ],
"name.last": [ "Jones" ]
}
}
Resolution response - Attribute names are split and nested by their periods.
{
"_attributes": {
"name": {
"first": [ "Alice" ],
"middle": [ "Quincy" ],
"last": [ "Jones" ]
},
"location": {
"address": {
"street": [ "123 Main St" ],
"city": [ "Washington" ],
"state": [ "DC" ],
"zip": [ "20001" ]
}
}
}
}
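The splitting-and-nesting step described above can be illustrated with a short standalone sketch (illustrative only; not from the zentity codebase, which is Java):

```python
def nest_attributes(flat):
    """Nest dot-delimited attribute names into objects, e.g.
    {"name.first": ["Alice"]} -> {"name": {"first": ["Alice"]}}."""
    nested = {}
    for key, value in flat.items():
        node = nested
        parts = key.split(".")
        # Walk (or create) one level of nesting per period-delimited part.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested
```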
Zentity does not currently have a release compatible with the newer versions of Elasticsearch. Will there be a release for zentity that will include support for those versions?
I have a bunch of resolvers:
{
"resolvers": {
"name_full_address": {
"attributes": [
"name", "address_line_1", "address_line_2", "city", "state", "postal_code"
],
"weight": 100
},
"name_full_address_one_line": {
"attributes": [
"name", "address_line_1", "city", "state", "postal_code"
],
"weight": 100
},
"loose_name_exact_address": {
"attributes": [
"loose_name", "address_line_1_exact", "city", "state", "postal_code"
],
"weight": 100
},
"name_phone": {
"attributes": [
"name", "phone_number"
],
"weight": 100
},
"name_address_postal": {
"attributes": [
"name", "address_line_1", "postal_code"
],
"weight": 100
},
"name_address_city_state": {
"attributes": [
"name", "address_line_1", "city", "state"
],
"weight": 100
},
"address_phone": {
"attributes": [
"phone_number", "address_line_1", "city", "state", "postal_code"
],
"weight": 100
},
"name_city_state_postal": {
"attributes": [
"name", "city", "state", "postal_code"
],
"weight": 150
}
}
}
Now I have an entity with name, city, state, and postal_code set.
name = "Piotr's Restaurant"
state = "NY"
zip = "11217"
city = "New York"
// phone_number and other attributes are null by default
Resolving with just those attributes yields a match via the name_city_state_postal resolver:
// Works just fine
{
"attributes": {
"name": [name],
"city": [city],
"state": [state],
"postal_code": [zip]
}
}
Now, if I add a phone number to the resolution request, it does not match the entity:
// Does not work as expected
{
"attributes": {
"name": [name],
"city": [city],
"state": [state],
"postal_code": [zip],
"phone_number": ["2063108455"]
}
}
My question is whether this is the expected behavior. From the docs (https://zentity.io/docs/entity-models/specification/):
"The weight level of the resolver. Resolvers with higher weight levels take precedence over resolvers with lower weight levels. If a resolution job uses resolvers with different weight levels, then the higher weight resolvers either must match or must not exist. This behavior can help prevent false matches."
Meaning the highest-weight resolver, name_city_state_postal (weight 150), should match, and it looks like it does. No other resolver matches (there is no address_line_1 or phone_number linked to the entity). IIUC, this query should also match the defined entity. Am I doing something wrong here?
Currently, requesting to create entity models with empty top-level objects (example below) will result in a validation error:
{
"attributes": {},
"resolvers": {},
"matchers": {},
"indices": {}
}
zentity should allow models like these to be created, and instead validate that they are complete before running a resolution job. This will make it possible to build an application that guides users through the process of creating an entity model from scratch and lets them save their progress on incomplete models.
With Travis CI's new pricing model, Zentity is going to be transitioned to a new plan with limited free use. From their announcement:
[we'll be moved to the] trial (free) plan with a 10K credit allotment (which allows around 1000 minutes in a Linux environment).
When your credit allotment runs out - we'd love for you to consider which of our plans will meet your needs.
We will be offering an allotment of OSS minutes that will be reviewed and allocated on a case by case basis.
Given that, GitHub Actions makes a nice alternative for a few reasons:
GH Actions do have some areas that aren't so nice:
- no equivalent of Travis's deploy configuration
I think we can both test and release in one workflow, but I'll start with tests.
In one job:
- trigger on on.push
- use a matrix to test multiple different versions of Elasticsearch (almost the same as Travis's matrix)
- run the tests with the mvn CLI (same as currently in Travis)
- read the Elasticsearch version from the pom.xml file
This is where the workflow would be quite different than Travis.
In a second job:
- trigger on tags matching *.*.*-rc* via on.push.tags
Optionally, we could also automatically create a changelog for the release body (or to add to a separate file) via something like mikepenz/release-changelog-builder-action.
How does this compare with your current process for releasing Zentity?
Please let me know what you think! I'm happy to adjust, find more resources on, or talk about anything in here!
Hi,
I am trying to run the integration tests but get:
Integration tests are skipped: got: "Connection refused", expected: not a string containing "Connection refused"
from the io.zentity.resolution.AbstractITCase.startRestClient() method, because it can't connect to Elasticsearch. Is there another step I am missing?
I am using Intellij but I am not a Java developer.
Observed in zentity-1.0.0-elasticsearch-6.2.4 from Issue #7.
Example:
POST _zentity/resolution/test
{
"attributes": {
"name": ["Alice Jones"]
},
"foo": {
"bar": "baz"
}
}
This request should fail because "foo"
is an unrecognized field. Currently the request is processed.
Whenever there's an unrecognized field in a request, an error should be raised to prevent any confusion for the client.
Elasticsearch has assertion tests that can be enabled by setting -ea
in the JVM options. This should be included in the integration tests to catch issues such as #56.
Similar to the ES bulk API, do you think it is feasible to add bulk resolution support to limit the number of network requests? It seems rare to want to resolve just a single entity if you're doing NER on a decently large piece of text.
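For reference, a hypothetical request shape that mirrors the _bulk API's newline-delimited format (the endpoint and body layout below are assumptions for discussion, not an existing API):

```
POST _zentity/resolution/_bulk
{ "entity_type": "person" }
{ "attributes": { "name": [ "Alice Jones" ] } }
{ "entity_type": "person" }
{ "attributes": { "name": [ "Bob Smith" ] } }
```

Each pair of lines would carry the per-item parameters and the resolution body, letting many resolution jobs share one network request.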
Hello Dave,
Thanks for developing such a wonderful project. I am trying to match a field of 'array of objects' type but am getting a ValidationException. I might be missing something here; can you please look into it?
index mapping:
"test" : { "mappings" : { "properties" : { "education" : { "properties" : { "major" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "school" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } }, "name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } } }
sample doc:
{ "_index" : "test", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "John Wick", "education" : [ { "major" : "Master Of Science In Information Management", "school" : "Syracuse University" }, { "major" : "Certification Of Advanced Study In Data Science" }, { "major" : "Bachelor Of Technology", "school" : "Charotar University Of Science And Technology" } ] } }
zentity model:
PUT _zentity/models/name_education { "attributes" : { "name" : { "type": "string" }, "school" : { "type": "string" } }, "resolvers" : { "name_education" : { "attributes" : ["name", "school"] } }, "matchers" : { "simple" : { "clause" : { "match" : { "{{ field }}" : "{{ value }}" } } }, "fuzzy" : { "clause" : { "match" : { "{{ field }}" : { "query" : "{{ value }}", "fuzziness" : "1" } } } }, "exact" : { "clause" : { "term" : { "{{ field }}" : "{{ value }}" } } } }, "indices" : { "test" : { "fields" : { "name" : { "attribute" : "name", "matcher" : "simple" }, "education.school" : { "attribute" : "school", "matcher" : "simple" } } } } }
resolution request:
POST _zentity/resolution/name_education?pretty&_source=true&_explanation=true&_score=true { "attributes": { "school": [ "Syracuse University", "", "Charotar University Of Science And Technology" ], "name": [ "John Wick" ] } }
Here I am trying to do a simple match for both the attributes name and education.school. Above request results into following error (same error if I use 'fuzzy' matcher):
"error": { "by": "zentity", "type": "io.zentity.model.ValidationException", "reason": "Expected 'string' attribute data type.", "stack_trace": "io.zentity.model.ValidationException: Expected 'string' attribute data type.\n\tat io.zentity.resolution.input.value.StringValue.validate(StringValue.java:35)\n\tat io.zentity.resolution.input.value.Value.<init>(Value.java:18)\n\tat io.zentity.resolution.input.value.StringValue.<init>(StringValue.java:11)\n\tat io.zentity.resolution.input.value.Value.create(Value.java:40)\n\tat io.zentity.resolution.Job.traverse(Job.java:1346)\n\tat io.zentity.resolution.Job.run(Job.java:1539)\n\tat org.elasticsearch.plugin.zentity.ResolutionAction.lambda$prepareRequest$0(ResolutionAction.java:118)\n\tat org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:108)\n\tat org.elasticsearch.xpack.security.rest.SecurityRestFilter.lambda$handleRequest$0(SecurityRestFilter.java:58)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$writeAuthToContext$24(AuthenticationService.java:570)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.writeAuthToContext(AuthenticationService.java:579)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.finishAuthentication(AuthenticationService.java:560)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeUser(AuthenticationService.java:510)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$consumeToken$16(AuthenticationService.java:404)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43)\n\tat 
org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:120)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$consumeToken$13(AuthenticationService.java:374)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.lambda$authenticateWithCache$1(CachingUsernamePasswordRealm.java:145)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.handleCachedAuthentication(CachingUsernamePasswordRealm.java:196)\n\tat org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.lambda$authenticateWithCache$2(CachingUsernamePasswordRealm.java:137)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:112)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:225)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:106)\n\tat org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:68)\n\tat org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticateWithCache(CachingUsernamePasswordRealm.java:132)\n\tat org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticate(CachingUsernamePasswordRealm.java:103)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$consumeToken$15(AuthenticationService.java:365)\n\tat org.elasticsearch.xpack.core.common.IteratingActionListener.run(IteratingActionListener.java:102)\n\tat 
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeToken(AuthenticationService.java:408)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$extractToken$11(AuthenticationService.java:335)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.extractToken(AuthenticationService.java:345)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$checkForApiKey$3(AuthenticationService.java:288)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.xpack.security.authc.ApiKeyService.authenticateWithApiKeyIfPresent(ApiKeyService.java:325)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.checkForApiKey(AuthenticationService.java:269)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:252)\n\tat org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)\n\tat org.elasticsearch.xpack.security.authc.TokenService.getAndValidateToken(TokenService.java:379)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:248)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:306)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:317)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:244)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:196)\n\tat org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:122)\n\tat 
org.elasticsearch.xpack.security.rest.SecurityRestFilter.handleRequest(SecurityRestFilter.java:55)\n\tat org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:222)\n\tat org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:295)\n\tat org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)\n\tat org.elasticsearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:322)\n\tat org.elasticsearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:372)\n\tat org.elasticsearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:301)\n\tat org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:69)\n\tat org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:31)\n\tat io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:58)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)\n\tat io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478)\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1274)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:600)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:554)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)\n\tat 
io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:830)\n" }
However if I use 'education.school.keyword' and do exact match, there's no error and it match the record:
PUT _zentity/models/name_education { "attributes" : { "name" : { "type": "string" }, "school" : { "type": "string" } }, "resolvers" : { "name_education" : { "attributes" : ["name", "school"] } }, "matchers" : { "simple" : { "clause" : { "match" : { "{{ field }}" : "{{ value }}" } } }, "fuzzy" : { "clause" : { "match" : { "{{ field }}" : { "query" : "{{ value }}", "fuzziness" : "1" } } } }, "exact" : { "clause" : { "term" : { "{{ field }}" : "{{ value }}" } } } }, "indices" : { "test" : { "fields" : { "name" : { "attribute" : "name", "matcher" : "simple" }, "education.school.keyword" : { "attribute" : "school", "matcher" : "exact" } } } } }
response for same resolution request (used earlier):
"hits" : { "total" : 1, "hits" : [ { "_index" : "test", "_type" : "_doc", "_id" : "1", "_hop" : 0, "_query" : 0, "_score" : null, "_attributes" : { "name" : [ "John Wick" ] }, "_explanation" : { "resolvers" : { "name_education" : { "attributes" : [ "name", "school" ] } }, "matches" : [ { "attribute" : "name", "target_field" : "name", "target_value" : "John Wick", "input_value" : "John Wick", "input_matcher" : "simple", "input_matcher_params" : { }, "score" : null }, { "attribute" : "school", "target_field" : "education.school.keyword", "target_value" : null, "input_value" : "Charotar University Of Science And Technology", "input_matcher" : "exact", "input_matcher_params" : { }, "score" : null }, { "attribute" : "school", "target_field" : "education.school.keyword", "target_value" : null, "input_value" : "Syracuse University", "input_matcher" : "exact", "input_matcher_params" : { }, "score" : null } ] }, "_source" : { "name" : "John Wick", "education" : [ { "major" : "Master Of Science In Information Management", "school" : "Syracuse University" }, { "major" : "Certification Of Advanced Study In Data Science" }, { "major" : "Bachelor Of Technology", "school" : "Charotar University Of Science And Technology" } ] } } ] }
Thanks
Abhishek
Whenever I try to perform a resolution request with an embedded entity model, I get a validation_exception: "You must specify either an entity type or an entity model". The structure of my request follows the pattern described in the docs:
POST _zentity/resolution
{
"attributes": {...},
"model": {...}
}
Looks like the latest version has to match the ES patch version exactly. Any plan to support the most recent ES patch version?
#7 10.81 Exception in thread "main" java.lang.IllegalArgumentException: Plugin [zentity] was built for Elasticsearch version 7.17.0 but version 7.17.5 is running
We're moving from Elasticsearch 7.10.2 to OpenSearch 1.2.4. The OpenSearch version is a fork of the OSS build of that Elasticsearch version plus a bunch of plugins.
After skimming through the plugin migration documentation (https://github.com/opensearch-project/opensearch-plugins/blob/main/UPGRADING.md), I managed to build zentity for OpenSearch.
Here is a proof of concept:
https://github.com/netom/zentity/tree/opensearch-1.2.4
I wonder if it would be possible for the project to be built for both Elasticsearch and OpenSearch.
I imagine this would require a complete re-design of the interface between the core functionality and the back end.
zentity should take advantage of the logging architecture of Elasticsearch to aid troubleshooting. This can be implemented as needed instead of creating a dedicated feature branch for logging.
To implement this, add the following to any class that needs logging (replacing CLASS_NAME with the name of the class):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

class CLASS_NAME {
    private static final Logger logger = LogManager.getLogger(CLASS_NAME.class);
}
Then log messages at the appropriate level:

logger.catching(e);
logger.fatal(message);
logger.error(message);
logger.warn(message);
logger.info(message);
logger.debug(message);
logger.trace(message);
To see the log messages, add these lines to the elasticsearch.yml file of each node to write them to the Elasticsearch log files:

logger.org.elasticsearch.plugin.zentity: DEBUG
logger.io.zentity: DEBUG
In some use cases "matching" data is spread between several attributes.
For example, consider a use case with two indices: people and cars.
The people index contains person_name, DOB, and address fields.
The cars index contains person_name, DOB, and the car's license_plate fields.
We would like to be able to find all people living at a particular address and all cars connected to those people. At first glance, two resolvers should suffice.
The problem appears when a person (person-A) who lives at address-A shares a name with another person (person-B1) and shares a DOB with yet another person (person-B2), where person-B1 and person-B2 happen to live at the same address-B. If we search for data starting from address-B, we correctly find person-B1 and person-B2, but then, by combining the name of person-B1 with the DOB of person-B2, we incorrectly find person-A.
To avoid this issue, there needs to be a way to mark the person_name and DOB attributes as "compound" or "grouped", so that the name and DOB used to look for a person always come from the same record.
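The over-merge described above can be illustrated with a small simulation. This is a deliberately simplified stand-in for zentity's iterative "snowball" resolution, with hypothetical data; the point is only that attribute values pooled across records can combine to match an unrelated person:

```python
# Three hypothetical records. person-A shares a name with person-B1 and a
# DOB with person-B2; B1 and B2 live at the same address.
records = [
    {"name": "A", "dob": "1970-01-01", "address": "address-A"},  # person-A
    {"name": "A", "dob": "1980-02-02", "address": "address-B"},  # person-B1
    {"name": "C", "dob": "1970-01-01", "address": "address-B"},  # person-B2
]

def resolve(seed, resolvers):
    """Toy resolver: a record matches if, for every attribute in some
    resolver, ANY previously collected value matches. Matched records feed
    their values back into the pool, as in iterative entity resolution."""
    known = {k: {v} for k, v in seed.items()}
    matched, changed = [], True
    while changed:
        changed = False
        for r in records:
            if r in matched:
                continue
            for resolver in resolvers:
                if all(r.get(a) in known.get(a, set()) for a in resolver):
                    matched.append(r)
                    for k, v in r.items():
                        known.setdefault(k, set()).add(v)  # values pool together
                    changed = True
                    break
    return matched

# Resolving from address-B with independent resolvers pulls in person-A,
# because B1's name and B2's DOB combine across records.
hits = resolve({"address": "address-B"}, [["address"], ["name", "dob"]])
```

With a "compound" constraint (name and DOB taken from the same record), person-A would not match, which is exactly the behavior this feature request asks for.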
Are there any plans from your side to support OpenSearch in addition to Elasticsearch in the future?
OpenSearch was started as a fork of Elasticsearch 7.10 due to the license change made by Elastic. OpenSearch 1.0.0 aims to be fully compatible with Elasticsearch 7.10.2. However, plugins of course need to be adapted to compile against OpenSearch. Based on what I've read and heard (I'm not an ES plugin developer), it's quite simple to update a plugin so that it also compiles against OpenSearch (basically run a search and replace).
OpenSearch also provides documentation on what needs to be done to upgrade a plugin from Elasticsearch to OpenSearch: https://github.com/opensearch-project/opensearch-plugins/blob/main/UPGRADING.md
We plan to move from Elasticsearch to OpenSearch due to the license change and are currently looking into using your plugin for a use case.
The score and quality fields in entity models are floats. Currently, zentity strictly requires the inputs of those fields to be floats. If an integer is submitted to one of these fields, zentity throws a validation exception.
This behavior is too restrictive for some clients. JavaScript's JSON.stringify() serializer will force any number such as 0.0 or 1.0 to be serialized as 0 or 1, and there is no easy way around this (cases: here, here).
zentity should allow integers as inputs to float fields, and then convert those fields to floats for its own purposes.
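The proposed leniency can be sketched as a small validation helper. This is plain Python illustrating the rule, not zentity's actual Java implementation:

```python
def parse_float_field(name, value):
    """Accept any JSON number (int or float) and coerce it to float.

    Booleans are rejected explicitly because Python treats bool as a
    subclass of int; deserializers in other languages have analogous traps.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("'%s' must be a number" % name)
    return float(value)

# JSON.stringify in JavaScript serializes 1.0 as 1, so a client-side 1.0
# arrives as an integer; coercing on input makes both spellings equivalent.
quality = parse_float_field("quality", 1)
```

In the plugin itself, the equivalent change would be to accept any numeric JSON node for these fields and call its float accessor, rather than rejecting integral nodes outright.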
We are using Zentity 6.2.4. We have processed at least 30 million records for entity resolution and we are happy with zentity's performance. Until now our records contained flat data, but we have a new requirement to store arrays of objects.
For example: license information stored as an array.
{
  "firstName": "John",
  "lastName": "Doe",
  "license": [
    { "number": "123" },
    { "number": "456" }
  ]
}
We would like to use Zentity to resolve this record by license number as well, but I am not able to do it. I have provided the index and other details below for your assistance.
Index Mapping
http://ELKHOST/my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "firstName": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "lastName": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "license": {
          "type": "nested",
          "properties": {
            "number": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            }
          }
        }
      }
    }
  }
}
Insert the first document
POST http://ELKHOST/my_index/_doc/1
{
  "firstName": "John",
  "lastName": "Doe",
  "license": [
    { "number": "123" },
    { "number": "456" }
  ]
}
Zentity Model
http://ELKHOST/_zentity/models/my_index
{
  "attributes": {
    "firstName": { "type": "string" },
    "lastName": { "type": "string" },
    "licenseNumber": { "type": "string" }
  },
  "resolvers": {
    "name": { "attributes": [ "lastName", "firstName" ] },
    "license": { "attributes": [ "licenseNumber" ] }
  },
  "matchers": {
    "exact": {
      "clause": { "term": { "{{ field }}": "{{ value }}" } }
    },
    "fuzzy": {
      "clause": {
        "match": {
          "{{ field }}": { "query": "{{ value }}", "fuzziness": "auto", "operator": "AND" }
        }
      }
    }
  },
  "indices": {
    "my_index": {
      "fields": {
        "firstName": { "attribute": "firstName", "matcher": "fuzzy" },
        "lastName": { "attribute": "lastName", "matcher": "fuzzy" },
        "license.number": { "attribute": "licenseNumber", "matcher": "exact" }
      }
    }
  }
}
Resolve the document using the model above
http://ELKHOST/_zentity/resolution/my_index
{
  "attributes": {
    "lastName": [ "Doe" ],
    "firstName": [ "John" ]
  },
  "scope": {
    "include": {
      "indices": [ "my_index" ],
      "resolvers": [ "name" ]
    }
  }
}
I get the following error:
{
  "error": {
    "root_cause": [
      { "type": "validation_exception", "reason": "Expected 'string' attribute data type." }
    ],
    "type": "validation_exception",
    "reason": "Expected 'string' attribute data type."
  },
  "status": 400
}
However, when I bypass zentity and query Elasticsearch directly, it works fine.
http://ELKHOST/my_index/_search
{
  "query": {
    "nested": {
      "path": "license",
      "query": {
        "bool": {
          "must": [
            { "match": { "license.number": "456" } }
          ]
        }
      }
    }
  }
}
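Since zentity matchers are query templates, one direction sometimes suggested for nested fields is a matcher whose clause wraps the term query in a nested query. This is a sketch, not a confirmed solution; the hardcoded "license" path is the obvious limitation, since the path cannot be templated per field, and whether your zentity version accepts a nested clause here should be verified against the zentity docs:

```json
"matchers": {
  "nested_exact": {
    "clause": {
      "nested": {
        "path": "license",
        "query": {
          "term": { "{{ field }}": "{{ value }}" }
        }
      }
    }
  }
}
```

The license.number index field would then be mapped to nested_exact instead of exact, and each distinct nested path would need its own matcher.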
First of all, thank you for the awesome work done with the library!
We have a use case where the user profile properties are dynamic and we want to allow resolving the identity using one of these fields:
{
"name": "Marcos",
"last_name": "Passos",
"extra": {
"loyalty_number": "123"
}
}
In this example, loyalty_number is a dynamic field, unknown at mapping time. However, we would like to match this field exactly in some cases.
Is it supported? What is the recommended approach?
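If the dynamic fields that matter are few and known per deployment, one possible approach (a sketch; the index name "profiles" and the attribute name are hypothetical) is to declare the concrete dotted path as an index field in the entity model and pair it with an exact matcher:

```json
"indices": {
  "profiles": {
    "fields": {
      "extra.loyalty_number": {
        "attribute": "loyalty_number",
        "matcher": "exact"
      }
    }
  }
}
```

Truly arbitrary fields would still require updating the entity model whenever a new field appears, since zentity resolves attributes only against fields declared in the model.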
I installed zentity successfully on my Windows machine.
When running http://localhost:9200/_zentity it throws the error below:
{
  "error": {
    "root_cause": [
      {
        "type": "invalid_index_name_exception",
        "reason": "Invalid index name [_zentity], must not start with '_'.",
        "index_uuid": "_na_",
        "index": "_zentity"
      }
    ],
    "type": "invalid_index_name_exception",
    "reason": "Invalid index name [_zentity], must not start with '_'.",
    "index_uuid": "_na_",
    "index": "_zentity"
  },
  "status": 400
}
Zentity should run as expected after installation.
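The error indicates Elasticsearch parsed _zentity as an index name rather than a plugin route, which usually means the plugin was not actually loaded (for example, the node was not restarted after installation). Listing the installed plugins is a quick way to check:

```
GET _cat/plugins?v
```

If zentity does not appear in the output, reinstall the plugin and restart the node.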
Hi Everyone,
Thank you for the fantastic project.
I read the documentation and tried the project with the provided data which was in English.
I was wondering if this can be transferred to other non-Latin languages out of the box, or whether there are any modifications that need to happen.
Regards,
In the Releases section there is no zentity-1.8.1-elasticsearch-7.12.1.zip. Instead, a .jar file is provided. Not sure if that is intended.