epam / badgerdoc
License: Apache License 2.0
After migrating from GitLab we did not update the READMEs across the project. Some links are no longer relevant or point to internal EPAM portals (e.g. here). We should update or remove them.
DOD:
Add an additional converter to export/import between PDF and plain text formats.
BadgerDoc currently works with PDF and bboxes. We also need to support plain text with offsets as a source format (converter) and to export PDF + bbox to plain text with offsets.
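A minimal sketch of the export direction (PDF tokens with bboxes to plain text with offsets) might look like the following. The `Token` and `TextSpan` structures, the `tokens_to_text` function and the single-space separator are all assumptions made for illustration; they are not BadgerDoc's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # x0, y0, x1, y1


@dataclass
class Token:
    """A word extracted from a PDF page (hypothetical structure)."""
    text: str
    bbox: BBox


@dataclass
class TextSpan:
    """Character offsets of a token inside the exported plain text."""
    start: int  # inclusive
    end: int    # exclusive
    bbox: BBox


def tokens_to_text(tokens: List[Token], sep: str = " ") -> Tuple[str, List[TextSpan]]:
    """Join tokens into one string and record where each token landed."""
    parts: List[str] = []
    spans: List[TextSpan] = []
    cursor = 0
    for index, token in enumerate(tokens):
        if index > 0:
            # The separator shifts every following offset.
            parts.append(sep)
            cursor += len(sep)
        spans.append(TextSpan(cursor, cursor + len(token.text), token.bbox))
        parts.append(token.text)
        cursor += len(token.text)
    return "".join(parts), spans


text, spans = tokens_to_text(
    [Token("Hello", (0, 0, 10, 5)), Token("world", (12, 0, 22, 5))]
)
```

Keeping the bbox next to each offset pair means the reverse mapping (offsets back to a page region) needs no extra lookup.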
Part 2:
Logs:
14-Dec-22 19:23:18 - [ERROR] - src.utils.minio_utils - (minio_utils.py).remake_thumbnail(111) - File is not an image
INFO: 100.120.37.66:53972 - "GET /download/thumbnail?file_id=8 HTTP/1.1" 404 Not Found
INFO: 100.120.37.66:50964 - "POST /files/search HTTP/1.1" 200 OK
INFO: 100.120.37.66:50964 - "POST /datasets/search HTTP/1.1" 200 OK
INFO: 100.120.37.66:50970 - "POST /files/search HTTP/1.1" 200 OK
14-Dec-22 19:23:26 - [ERROR] - src.utils.minio_utils - (minio_utils.py).remake_thumbnail(111) - File is not an image
INFO: 100.120.37.66:50986 - "GET /download/thumbnail?file_id=8 HTTP/1.1" 404 Not Found
INFO: 100.120.37.66:51004 - "GET /download/thumbnail?file_id=7 HTTP/1.1" 200 OK
Example: two annotators annotate one document. If their annotations do not match, we need an algorithm to compare them. Agreement metrics help to check whether the annotations match.
AGREEMENT_SCORE_MIN_MATCH
float
This task is very close to: #64
Table agreement_scores:
Tasks: 1, 2, 3 <-- one part of document
task_from, task_to, agreement_score
1, 2, 0.95
2, 1, 0.95 <--- doesn't make sense
2, 3, 0.97
1, 3, 0.98
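Since the score for (2, 1) repeats the score for (1, 2), one option is to store each unordered pair only once. The sketch below assumes the agreement metric is symmetric and that `AGREEMENT_SCORE_MIN_MATCH` acts as a lower threshold for a match; both are assumptions, and `pairwise_agreement` and `mismatched_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[int, int]


def pairwise_agreement(
    task_ids: Iterable[int], score: Callable[[int, int], float]
) -> Dict[Pair, float]:
    """Score every unordered pair of tasks exactly once (task_from < task_to)."""
    return {(a, b): score(a, b) for a, b in combinations(sorted(task_ids), 2)}


def mismatched_pairs(scores: Dict[Pair, float], min_match: float) -> List[Pair]:
    """Pairs whose score is below the threshold (assumed meaning of the setting)."""
    return [pair for pair, value in scores.items() if value < min_match]


# Scores taken from the table above; the lookup stands in for a real metric.
KNOWN = {(1, 2): 0.95, (2, 3): 0.97, (1, 3): 0.98}
scores = pairwise_agreement([3, 1, 2], lambda a, b: KNOWN[(a, b)])
below = mismatched_pairs(scores, min_match=0.96)
```

For three tasks this produces three rows instead of six.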
After the user selects a taxon, it is shown as the annotation label, but after several saves (Save as Draft) or a page refresh the label switches back to the category.
Issue with DataAttributes field?
When running pytest on the annotation app, pytest returns a warning:
/Users/kirill_sosnovskii/Library/Caches/pypoetry/virtualenvs/annotation-MSG9bELe-py3.10/lib/python3.10/site-packages/botocore/httpsession.py:62: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
context = SSLContext(ssl_version or ssl.PROTOCOL_SSLv23)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
Discussion: boto/botocore#2550
User will be able to get report by period. Report must contain:
By annotator:
Example:
annotator_id, task_id, task_status, time_start (earliest event), time_finish (draft or task_finish), agreement score (if task finished and job with extensive coverage and AGREEMENT_SCORE_SERVICE_URL was set)
Front-end:
Back-end:
annotation_statistics with fields: job_id, task_id, event_type, event_date (current timestamp), additional_data (json). event_type must be: document_opened
AGREEMENT_SCORE_SERVICE_URL. In the case of an extensive coverage task, when the validator finishes the validation task, send an additional POST request to AGREEMENT_SCORE_SERVICE_URL (if set) to get the agreement score. BadgerDoc must send: annotators_id, job_id, task_id, document URL (S3 resource), and manifest URLs (S3 resource) for each annotator
Back-end:
/taxonomy/taxonomy/: there is no tenant in the header (add tenant dependency)
Description:
Every resource in this module requires Tenant to be accepted as a header.
Example is here
Every time you query the Taxonomy table in the database, the tenant should be validated: compare the tenant received in the request with the tenant of the existing entity.
Examples: 1, 2
Examples from the annotation service table Categories - which is also using tenants: 1, 2, 3
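The tenant comparison described above could be sketched roughly as follows. `TaxonomyRecord`, `TenantMismatchError` and `ensure_same_tenant` are hypothetical names used only for illustration; the real service would run this check against the database entity and the tenant header value.

```python
from dataclasses import dataclass


class TenantMismatchError(Exception):
    """Raised when an entity belongs to a different tenant (hypothetical)."""


@dataclass
class TaxonomyRecord:
    """Stand-in for a row of the Taxonomy table (hypothetical)."""
    id: str
    tenant: str


def ensure_same_tenant(entity: TaxonomyRecord, x_current_tenant: str) -> TaxonomyRecord:
    """Return the entity only if it belongs to the tenant from the request header."""
    if entity.tenant != x_current_tenant:
        raise TenantMismatchError(
            f"Taxonomy {entity.id!r} is not available for tenant {x_current_tenant!r}"
        )
    return entity


record = TaxonomyRecord(id="tax-1", tenant="tenant-a")
checked = ensure_same_tenant(record, "tenant-a")

try:
    ensure_same_tenant(record, "tenant-b")
    rejected = False
except TenantMismatchError:
    rejected = True
```

Whether shared entities without a tenant should pass the check is left open here; this sketch treats every mismatch as an error.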
Logs:
INFO: 100.120.37.66:32944 - "POST /api/v1/annotation/categories/search HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
.................................
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/usr/local/lib/python3.8/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
return await loop.run_in_executor(None, func, *args)
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/annotation/./app/categories/resources.py", line 134, in search_categories
task_response = filter_category_db(db, request, x_current_tenant)
File "/opt/annotation/./app/categories/services.py", line 350, in filter_category_db
_get_parents(db, child_categories, tenant, job_id),
File "/opt/annotation/./app/categories/services.py", line 281, in _get_parents
uniq_pathes.add(cat.tree.path)
AttributeError: 'NoneType' object has no attribute 'path'
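The traceback shows `cat.tree` being `None` when its `path` is read. One defensive option is to skip categories that have no tree row, as in this sketch. `collect_unique_paths` is a hypothetical stand-in for the loop inside `_get_parents`, and skipping (rather than failing with a clear error) is an assumption about the desired behaviour.

```python
from types import SimpleNamespace
from typing import Iterable, Set


def collect_unique_paths(categories: Iterable) -> Set[str]:
    """Collect tree paths, ignoring categories whose tree is missing."""
    unique_paths: Set[str] = set()
    for cat in categories:
        tree = getattr(cat, "tree", None)
        if tree is None:
            # No tree row for this category; reading tree.path would raise
            # the AttributeError seen in the log.
            continue
        unique_paths.add(tree.path)
    return unique_paths


# SimpleNamespace objects imitate ORM rows for the purpose of the example.
rows = [
    SimpleNamespace(tree=SimpleNamespace(path="root.child1")),
    SimpleNamespace(tree=None),
    SimpleNamespace(tree=SimpleNamespace(path="root.child1")),
]
paths = collect_unique_paths(rows)
```

This only hides the symptom; the categories without a tree row would still need to be found and repaired.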
The validator should be able to assign their own validation task to a subject matter expert from the validation screen.
FE:
BE:
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/processing/./src/main.py", line 89, in get_preprocessing_result
content=send_preprocess_result(bucket_name, file_id, pages),
File "/processing/./src/send_preprocess_results.py", line 65, in send_preprocess_result
pages = get_pages(bucket, path, pages)
File "/processing/./src/send_preprocess_results.py", line 43, in get_pages
return set(
File "/processing/./src/send_preprocess_results.py", line 44, in
(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 2728, in _list_objects
response = self._execute("GET", bucket_name, query_params=query)
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 394, in _execute
region = self._get_region(bucket_name, None)
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 461, in _get_region
response = self._url_open(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 266, in _url_open
response = self._http.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
[Previous line repeated 2 more times]
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='minio', port=80): Max retries exceeded with url: /some-very-unique-prefix-123456-test?location= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f14d27816d0>: Failed to establish a new connection: [Errno -2] Name or service not known
If you run pytest in the annotation service, it fails in different tests: minio, botocore and others. It seems that docker-compose.yml doesn't list all the required services as dependencies.
During local bootstrap of annotation (with docker-compose) we have this:
annotation | 09-Dec-22 10:33:03 - [INFO] - uvicorn.error - (server.py).serve(84) - Started server process [9]
annotation | INFO: Waiting for application startup.
annotation | 09-Dec-22 10:33:03 - [INFO] - uvicorn.error - (on.py).startup(45) - Waiting for application startup.
annotation | 09-Dec-22 10:33:13 - [WARNING] - kafka.conn - (conn.py).dns_lookup(1527) - DNS lookup failed for kafka:9092, exception was [Errno -3] Temporary failure in name resolution. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
annotation | 09-Dec-22 10:33:13 - [ERROR] - kafka.conn - (conn.py)._dns_lookup(315) - DNS lookup failed for kafka:9092 (AddressFamily.AF_UNSPEC)
annotation | 09-Dec-22 10:33:13 - [INFO] - kafka.conn - (conn.py).check_version(1205) - Probing node bootstrap-0 broker version
annotation | 09-Dec-22 10:33:23 - [WARNING] - kafka.conn - (conn.py).dns_lookup(1527) - DNS lookup failed for kafka:9092, exception was [Errno -3] Temporary failure in name resolution. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
annotation | 09-Dec-22 10:33:23 - [ERROR] - kafka.conn - (conn.py)._dns_lookup(315) - DNS lookup failed for kafka:9092 (AddressFamily.AF_UNSPEC)
annotation | 09-Dec-22 10:33:23 - [WARNING] - app.logger - (main.py)._init_search_annotation_producer(957) - Error occurred during kafka producer creating: NoBrokersAvailable
annotation | INFO: Application startup complete.
The app tries to reach Kafka but cannot resolve its host.
Logs from dev1:
Traceback (most recent call last):
File "/usr/local/bin/alembic", line 5, in
from alembic.config import main
File "/usr/local/lib/python3.8/site-packages/alembic/init.py", line 3, in
from . import context
File "/usr/local/lib/python3.8/site-packages/alembic/context.py", line 1, in
from .runtime.environment import EnvironmentContext
File "/usr/local/lib/python3.8/site-packages/alembic/runtime/environment.py", line 12, in
from .migration import MigrationContext
File "/usr/local/lib/python3.8/site-packages/alembic/runtime/migration.py", line 17, in
from sqlalchemy import Column
ModuleNotFoundError: No module named 'sqlalchemy'
Psycopg2 is used as the backbone for SQLAlchemy.
Errors from this library are not handled by the FastAPI handler, even with 'Exception' provided as a base class.
I ran into this when a table was not updated properly by a migration and an insert caused errors.
INFO: 10.233.90.227:38012 - "POST /api/v1/jobs/jobs/search HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 216, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.8/site-packages/fastapi/dependencies/utils.py", line 550, in solve_dependencies
solved = await call(**sub_values)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 52, in call
decoded = self.decode_rs256(token)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 99, in decode_rs256
signing_key = self.jwk_client.get_signing_key_from_jwt(token)
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 59, in get_signing_key_from_jwt
return self.get_signing_key(header.get("kid"))
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 41, in get_signing_key
signing_keys = self.get_signing_keys()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 28, in get_signing_keys
jwk_set = self.get_jwk_set()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 24, in get_jwk_set
data = self.fetch_data()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 20, in fetch_data
with urllib.request.urlopen(self.uri) as response:
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
19-Dec-22 12:23:55 - [ERROR] - uvicorn.error - (h11_impl.py).run_asgi(376) - Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 216, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.8/site-packages/fastapi/dependencies/utils.py", line 550, in solve_dependencies
solved = await call(**sub_values)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 52, in call
decoded = self.decode_rs256(token)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 99, in decode_rs256
signing_key = self.jwk_client.get_signing_key_from_jwt(token)
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 59, in get_signing_key_from_jwt
return self.get_signing_key(header.get("kid"))
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 41, in get_signing_key
signing_keys = self.get_signing_keys()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 28, in get_signing_keys
jwk_set = self.get_jwk_set()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 24, in get_jwk_set
data = self.fetch_data()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 20, in fetch_data
with urllib.request.urlopen(self.uri) as response:
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
A task is for annotation of particular pages of a document.
One annotator (and one validator) is assigned to those pages.
This logic must be improved:
Tasks should be created and distributed among multiple annotators, so that each page is annotated by extensive_coverage annotators (extensive_coverage is an integer field in annotation_job).
Annotator: user
Job: extraction job. Contains annotators, documents, categories + taxonomies, annotation type.
Annotation type: cross validation, hierarchic validation
Cross validation: Automatically assigns some annotators as validators
Hierarchic validation: Assign some users as annotators, some as validators
We need to add a new annotation type: extensive coverage
Extensive coverage: one task can be assigned to many annotators
When the user creates a new job, the user can select how many users will be assigned to one task (extensive coverage), N
Task: exact pages in an exact document assigned to 1 annotator. This logic must be preserved.
We need to create tasks for N annotators.
We have an algorithm that distributes tasks between annotators. The page count must be multiplied by N. For example, if we originally have to distribute 100 tasks between 4 annotators and extensive coverage equals 3, we need to create 3 * 100 = 300 tasks and distribute them between the 4 annotators EXCLUDING duplicates. That means no annotator may be assigned the same page more than once.
Compatibility with the current implementation: this is a new validation type; all previous validation types must work the same as now
When BD shows progress, it must be recalculated taking the extensive coverage multiplier into account
Add a new validation type to the drop-down menu: Extensive coverage
Open a form with 1 additional input: count of annotators. All other inputs must be the same as for hierarchical validation.
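The distribution rule above (every page goes to N distinct annotators, no annotator gets the same page twice) can be sketched with a simple round-robin. This is an illustrative sketch, not the existing BadgerDoc distribution algorithm; `distribute_with_coverage` is a hypothetical name and pages are treated as single-page tasks for simplicity.

```python
from typing import Dict, List, Sequence


def distribute_with_coverage(
    pages: Sequence[int], annotators: Sequence[str], coverage: int
) -> Dict[str, List[int]]:
    """Assign every page to `coverage` distinct annotators, round-robin."""
    if coverage < 1 or coverage > len(annotators):
        raise ValueError("coverage must be between 1 and the number of annotators")
    assignments: Dict[str, List[int]] = {annotator: [] for annotator in annotators}
    cursor = 0
    for page in pages:
        # `coverage` consecutive annotators are always distinct because
        # coverage <= len(annotators), so no one sees the same page twice.
        for offset in range(coverage):
            annotator = annotators[(cursor + offset) % len(annotators)]
            assignments[annotator].append(page)
        cursor += coverage
    return assignments


# The example from the text: 100 pages, 4 annotators, extensive coverage 3.
result = distribute_with_coverage(list(range(100)), ["a1", "a2", "a3", "a4"], 3)
total_tasks = sum(len(assigned) for assigned in result.values())
```

Because the cursor keeps advancing across pages, the 300 assignments are spread evenly over the annotators instead of always loading the first three.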
Add GitHub Actions to run CI pipelines for each microservice
We need to change the current algorithm for attaching taxonomies to a category.
Front-end
taxonomy: add a selector of taxonomy per category
Back-end
We should check if current project environment can work with S3 which will replace Minio.
Check if:
Highest priority: the extraction job creation screen
When a new job is being created, the user can select categories for it. If the user selects a category that has parents, all of its parents must also be assigned to the job (annotation service). In other words, selecting a sub-category also selects all of its parents.
Add /annotation/categories/search endpoint with hierarchy support scheme
Add /annotation/jobs/{job_id}/categories/search endpoint with hierarchy support scheme
Hierarchy support scheme contains:
is_leaf: boolean, true when the current node has no children
parents: list of parent categories, ordered from the root to the current node, excluding the current node
Example:
"data": [
  {
    "name": "Child2",
    "parent": "Child1",
    "type": "box",
    "id": "Child2",
    "is_leaf": true,
    "parents": [
      {
        "name": "Root",
        "type": "box",
        "id": "Root",
        "is_leaf": false
      },
      {
        "name": "Child1",
        "parent": "root",
        "type": "box",
        "id": "Child1",
        "is_leaf": false
      }
    ]
  }
]
Supported operations:
children - returns children of value; to get only root nodes, set value as root
children_recursive - returns all children down the tree
parent - returns parent
parents_recursive - returns all parents
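The hierarchy fields and operations listed above can be illustrated over a flat in-memory list. This is only a sketch: the real endpoints would query the category tree in the database, and the assumptions here are that root nodes have `parent` set to `None`, that the tree has no cycles, and that the helper names (`children`, `children_recursive`, `parents_recursive`, `with_hierarchy`) are hypothetical.

```python
from typing import Dict, List, Optional

Category = Dict[str, Optional[str]]

CATEGORIES: List[Category] = [
    {"id": "Root", "name": "Root", "parent": None},
    {"id": "Child1", "name": "Child1", "parent": "Root"},
    {"id": "Child2", "name": "Child2", "parent": "Child1"},
]


def children(categories: List[Category], value: str) -> List[str]:
    """Direct children of `value`; the special value "root" selects root nodes."""
    target = None if value == "root" else value
    return [c["id"] for c in categories if c["parent"] == target]


def children_recursive(categories: List[Category], value: str) -> List[str]:
    """All descendants of `value`, breadth-first."""
    found: List[str] = []
    queue = children(categories, value)
    while queue:
        current = queue.pop(0)
        found.append(current)
        queue.extend(children(categories, current))
    return found


def parents_recursive(categories: List[Category], value: str) -> List[str]:
    """Ancestors of `value`, ordered from the root down, excluding `value`."""
    by_id = {c["id"]: c for c in categories}
    chain: List[str] = []
    parent_id = by_id[value]["parent"]
    while parent_id is not None:
        chain.append(parent_id)
        parent_id = by_id[parent_id]["parent"]
    return list(reversed(chain))


def with_hierarchy(categories: List[Category], value: str) -> dict:
    """Add `is_leaf` and `parents` to a category, as in the example response."""
    by_id = {c["id"]: c for c in categories}

    def is_leaf(cat_id: str) -> bool:
        return not children(categories, cat_id)

    node = dict(by_id[value], is_leaf=is_leaf(value))
    node["parents"] = [
        dict(by_id[p], is_leaf=is_leaf(p))
        for p in parents_recursive(categories, value)
    ]
    return node


node = with_hierarchy(CATEGORIES, "Child2")
```

The `parent` operation is simply the last element of `parents_recursive`, so it is not spelled out separately.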
The current implementation has different config variables for the same things, e.g. S3_ENDPOINT and MINIO_HOST. We should generalize them for easier setup.
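One possible migration path is a single lookup that prefers one name and falls back to the other. The sketch below assumes `S3_ENDPOINT` becomes the preferred name and `MINIO_HOST` the legacy one; that choice and the `first_set` helper are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def first_set(
    env: Mapping[str, str], *names: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


# Preferred name first, legacy name second (the ordering is an assumption).
storage_endpoint = first_set(os.environ, "S3_ENDPOINT", "MINIO_HOST")

# The same helper against plain dicts, to show the fallback order.
legacy_only = first_set({"MINIO_HOST": "minio:9000"}, "S3_ENDPOINT", "MINIO_HOST")
both_set = first_set(
    {"S3_ENDPOINT": "s3.example:443", "MINIO_HOST": "minio:9000"},
    "S3_ENDPOINT",
    "MINIO_HOST",
)
```

Reading every service's storage settings through one helper like this keeps old deployments working while the variable names are unified.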
Fix log warning
Stack trace:
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade f44cabeef963 -> 66cd6054c2d0, Add Categories tree
/usr/local/lib/python3.8/site-packages/SQLAlchemy-1.3.23-py3.8-linux-x86_64.egg/sqlalchemy/orm/relationships.py:2273: SAWarning: On Category.parent_id, 'passive_deletes' is normally configured on one-to-many, one-to-one, many-to-many relationships only.
util.warn(
INFO: Started server process [9]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
Back-end:
Design:
Front-end:
Currently, unit tests do not check the migration state, which could cause many issues.
When I try to upload a file I get the following error in the assets service:
INFO: 100.120.37.66:33672 - "POST /files HTTP/1.1" 503 Service Unavailable
How to reproduce.
Error in the assets service: `INFO: 100.120.37.66:33672 - "POST /files HTTP/1.1" 503 Service Unavailable`
Assets service deployment configuration:
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    meta.helm.sh/release-name: assets
    meta.helm.sh/release-namespace: app
  creationTimestamp: "2022-11-15T15:50:56Z"
  generation: 5
  labels:
    app: assets
    app.kubernetes.io/managed-by: Helm
  name: assets
  namespace: app
  resourceVersion: "133922033"
  uid: 55df24fc-2fc9-4918-93c9-e6b0a26e9a61
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: assets
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      creationTimestamp: null
      labels:
        app: assets
    spec:
      containers:
      - args:
        - -c
        - alembic upgrade afa33cc83d57 && alembic upgrade fe5926249504 && alembic
          upgrade 0f6c859c1d1c && alembic upgrade head && uvicorn src.main:app --host
          0.0.0.0 --port 8080 --root-path /api/v1/assets
        command:
        - /bin/sh
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              key: DATABASE_URL
              name: assets
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: POSTGRES_USER
              name: assets
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: POSTGRES_PASSWORD
              name: assets
        - name: ENDPOINT
          value: expl-trm-tmt-badgerdoc-us-east-2
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: MINIO_ACCESS_KEY
              name: assets
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: MINIO_SECRET_KEY
              name: assets
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              key: JWT_SECRET
              name: assets
        image: ghcr.io/vertexinc/trm-badgerdoc/assets:0.1.7-sn
        imagePullPolicy: IfNotPresent
        name: assets
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: trm-build-bot-secret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: badgerdoc
      serviceAccountName: badgerdoc
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-11-15T16:04:05Z"
    lastUpdateTime: "2022-11-15T16:04:05Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-11-15T15:50:56Z"
    lastUpdateTime: "2022-11-22T10:33:53Z"
    message: ReplicaSet "assets-7c998c59fc" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 5
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
There is an existing request-based link between the annotation microservice (Category model) and the taxonomy microservice (Taxonomy model). Change this relation from OneToOne to ManyToMany.
TBD
Back-end must support tasks to check pairs.
Fixtures should be reworked. It looks like the issue is in the way we delete data from the DB in tests.
All our database interactions are not atomic. Should we rework it?
Settings.attributes in processing were changed by mistake (see merged PR https://github.com/epam/badgerdoc/pull/30/files). We need to fix this so that processing deploys correctly.
data for /taxon/search:
[
  {
    "id": "123",
    "is_leaf": true,
    "parents": [
      // list of taxons from root to current excluding current
    ]
  }
]
When a validator processes a validation task with extensive coverage, the validator needs to see multiple revisions on the same screen in different frames.
The validator's UI requests tasks by document and job_id. Then the UI requests the latest revisions for the selected pages from the task. In the case of extensive coverage validation, the UI must get the latest versions by each annotator (we already have this endpoint).
original_annotation_id to annotation POST endpoint (create new annotation)
We need to show multiple frames/windows for every document to show the different annotations
By default the validator sees N + 1 windows. We need to clarify whether the validator's window must be prefilled. Currently, the validator's window is empty.
Add functionality to select the correct annotation. When the validator clicks on any bbox or label in any frame, this bbox is copied to the validator's window. The validator can edit annotations only in their own frame. Ctrl + Z must work if the validator misclicked.
Scrolling must be synchronized across all annotations (frames)
The validator stores their own revision, compiled from the other annotations
Logs:
15-Dec-22 08:00:00 - [ERROR] - apscheduler.executors.default - (base.py).run_job(131) - Job "delete_file_after_7_days (trigger: cron[hour='*/1'], next run at: 2022-12-15 09:00:00 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/opt/users_filter/./src/utils.py", line 32, in delete_file_after_7_days
buckets = client.list_buckets()
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 647, in list_buckets
response = self._execute("GET")
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 406, in _execute
return self._url_open(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 389, in _url_open
raise response_error
minio.error.S3Error: S3 operation failed; code: AccessDenied, message: Access Denied, resource: None, request_id: QXH4MVJY89W8GZ7H, host_id: ycEdNmVAVIugBtDcgetRJKeavrMebpPdS2ykj2C5HoNDVLWTCOve7ozq0SbK2SvScie+Zw4KI5U=
As a developer I want to check that contributors follow the commit naming convention in the repository. We should add a Conventional Commits check to the CI. Here is a good linter for this purpose, written in Python, which also supports pre-commit integration: commitizen
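To illustrate what such a check verifies, here is a simplified header validator. It is not commitizen's implementation; the allowed type list is an assumption (only feat and fix are defined by the Conventional Commits specification itself, the rest follow a common convention), and `is_conventional` is a hypothetical name.

```python
import re

# Assumed set of allowed types; adjust to the repository's own convention.
TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# <type>[(scope)][!]: <subject>
HEADER_RE = re.compile(
    r"^(?P<type>%s)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
    % "|".join(TYPES)
)


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message against the pattern."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_RE.match(header) is not None


ok_scoped = is_conventional("feat(annotation): add extensive coverage")
ok_breaking = is_conventional("fix!: drop legacy column")
bad_plain = is_conventional("Update readme")
```

In CI, a dedicated linter is still the better fit, since it also covers bodies, footers and configuration.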
DOD: