
badgerdoc's People

Contributors

aazaliyaa, abdulazizi, aleksei-egorenko-epam, ammb95, anastasiiaplyako, andrei-shulaev, andrka, aslvuwe, borisevich-a-v, cakeinsauce, gi6rgi, iogsotot, isokrat, iurii-topychkanov, khyurri, magictearsasunder, mikemalkhasyanvertex, minev-dev, mysterionrise, nanbratan, nosafi, sbilevich, serereg, sokolmask, spiridonovfed, sunnycapt, theoriginmm, thinklab, tokhirov-abzal, ziprion

badgerdoc's Issues

Progress report per annotator (doc name, annotator id, date of finishing)

General algorithm

The user can request a report for a given period. The report must contain:

By annotator:

  • Finished tasks with the agreement score (AS) and time_start/time_finish, used to calculate how long the annotator worked on a single document
  • Tasks in progress with the job start time

Example:

annotator_id, task_id, task_status, time_start (earliest event), time_finish (draft or task_finish), agreement score (only if the task is finished, the job uses extensive coverage, and AGREEMENT_SCORE_SERVICE_URL is set)

Front-end:

  • send an event (POST) when a document is opened by an annotator or validator during a task
  • add a page to download the report by user and date range (date_from, date_to)

Back-end:

  • add an endpoint to save the document-open timestamp. This requires a new table, annotation_statistics, with fields: job_id, task_id, event_type, event_date (current timestamp), additional_data (JSON). event_type must be: document_opened
  • add a new config value to the annotation service: AGREEMENT_SCORE_SERVICE_URL. For an extensive coverage task, when the validator finishes the validation task, send an additional POST request to AGREEMENT_SCORE_SERVICE_URL (if set) to get the agreement score. BadgerDoc must send: annotators_id, job_id, task_id, document URL (S3 resource), and manifest URLs (S3 resources) for each annotator
  • Store all metrics between each annotator and the validator. Each annotator has their own AS against the validator for each task.
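
A minimal sketch of how one report row could be derived from stored events. It assumes events are already loaded from annotation_statistics; the `Event` and `ReportRow` classes, the `build_report_row` helper, the `task_finished` event type and the status strings are hypothetical and only illustrate the time_start/time_finish rule described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    # Subset of the proposed annotation_statistics fields.
    task_id: int
    event_type: str  # "document_opened" comes from the issue; other values are assumed
    event_date: datetime


@dataclass
class ReportRow:
    annotator_id: str
    task_id: int
    task_status: str
    time_start: Optional[datetime]
    time_finish: Optional[datetime]


def build_report_row(
    annotator_id: str, task_id: int, task_status: str, events: List[Event]
) -> ReportRow:
    """Use the earliest event as time_start and, for finished tasks, the latest as time_finish."""
    dates = sorted(e.event_date for e in events if e.task_id == task_id)
    time_start = dates[0] if dates else None
    # Assumption: tasks still in progress report only their start time.
    time_finish = dates[-1] if dates and task_status == "finished" else None
    return ReportRow(annotator_id, task_id, task_status, time_start, time_finish)
```

The time an annotator spent on a document would then be `time_finish - time_start` for finished tasks.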

Add tenant to all `taxonomy` endpoints

Back-end:

  • /taxonomy/taxonomy/ has no tenant in the header (add the tenant dependency)

Description:

  1. Every resource in this module must accept Tenant as a header.
    Example is here

  2. Every time you query the Taxonomy table in the database, the tenant should be validated: compare the tenant received in the request with the tenant of the existing entity.
    Examples: 1, 2
    Examples from the annotation service table Categories - which is also using tenants: 1, 2, 3
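
The tenant comparison in step 2 could look roughly like the sketch below. It is framework-agnostic on purpose; the `Taxonomy` dataclass, `TenantMismatchError` and `ensure_same_tenant` are hypothetical names, not the service's actual models or helpers.

```python
from dataclasses import dataclass


class TenantMismatchError(Exception):
    """Raised when the request tenant does not match the entity's tenant."""


@dataclass
class Taxonomy:
    # Simplified stand-in for the database entity.
    id: str
    name: str
    tenant: str


def ensure_same_tenant(entity: Taxonomy, request_tenant: str) -> Taxonomy:
    """Return the entity only if it belongs to the tenant from the request header."""
    if entity.tenant != request_tenant:
        raise TenantMismatchError(
            f"Taxonomy {entity.id} is not available for tenant {request_tenant}"
        )
    return entity
```

In an endpoint, the exception would typically be translated into an HTTP error response.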

botocore lib warning on pytest annotation

When running pytest on the annotation app, pytest emits this warning:

  /Users/kirill_sosnovskii/Library/Caches/pypoetry/virtualenvs/annotation-MSG9bELe-py3.10/lib/python3.10/site-packages/botocore/httpsession.py:62: DeprecationWarning: ssl.PROTOCOL_TLS is deprecated
    context = SSLContext(ssl_version or ssl.PROTOCOL_SSLv23)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

Discussion: boto/botocore#2550
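
Until the dependency is updated, one possible workaround is to ignore only this specific message in the test setup. The sketch below uses the standard `warnings` module; the helper name `silence_botocore_ssl_warning` and the idea of calling it from a test bootstrap file are assumptions, not part of the project.

```python
import warnings


def silence_botocore_ssl_warning() -> None:
    """Ignore only the ssl.PROTOCOL_TLS deprecation message quoted in the report."""
    warnings.filterwarnings(
        "ignore",
        message=r"ssl\.PROTOCOL_TLS is deprecated",  # regex matched against the start of the message
        category=DeprecationWarning,
    )
```

Other deprecation warnings remain visible, so new problems are not hidden by the filter.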

Bug in release 1.1.0 on Vertex env - minio.error.S3Error on 'users' in regular background task "delete_file_after_7_days"

Logs:
15-Dec-22 08:00:00 - [ERROR] - apscheduler.executors.default - (base.py).run_job(131) - Job "delete_file_after_7_days (trigger: cron[hour='*/1'], next run at: 2022-12-15 09:00:00 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/opt/users_filter/./src/utils.py", line 32, in delete_file_after_7_days
buckets = client.list_buckets()
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 647, in list_buckets
response = self._execute("GET")
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 406, in _execute
return self._url_open(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 389, in _url_open
raise response_error
minio.error.S3Error: S3 operation failed; code: AccessDenied, message: Access Denied, resource: None, request_id: QXH4MVJY89W8GZ7H, host_id: ycEdNmVAVIugBtDcgetRJKeavrMebpPdS2ykj2C5HoNDVLWTCOve7ozq0SbK2SvScie+Zw4KI5U=

Add converter from bbox to offsets in export and import

Add an additional converter for export/import between PDF and plain-text formats.

BadgerDoc currently works with PDF and bboxes. We also need to support plain text with offsets as a source format (converter), and to export PDF + bbox to plain text with offsets.
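
As an illustration of the export direction (bbox to offsets), the sketch below joins tokens into plain text and records character offsets for each one. It assumes tokens arrive in reading order and are separated by single spaces; the `Token` and `Span` classes and `tokens_to_text` are hypothetical, not BadgerDoc's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str
    bbox: Tuple[float, float, float, float]  # assumed layout: (left, top, right, bottom)


@dataclass
class Span:
    text: str
    begin: int  # offset of the first character in the plain text
    end: int    # offset one past the last character


def tokens_to_text(tokens: List[Token]) -> Tuple[str, List[Span]]:
    """Join tokens with single spaces and record the offsets of each token."""
    parts: List[str] = []
    spans: List[Span] = []
    cursor = 0
    for token in tokens:
        if parts:
            cursor += 1  # account for the separating space
        spans.append(Span(token.text, cursor, cursor + len(token.text)))
        parts.append(token.text)
        cursor += len(token.text)
    return " ".join(parts), spans
```

For every span, `text[span.begin:span.end]` equals the token text, which is the property an offset-based annotation format relies on.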

Part 2:

  • From the Label Studio format, the converter must create the bdocs manifest and fill the database (AnnotationDocs?)

Remove invalid external links in readme-s

After migrating from GitLab we did not update the READMEs across the project. Some links are no longer relevant or point to internal EPAM portals (e.g. here). We should update or remove them.

DOD:

  • Removed/updated links in README-s across the project

Error during 'annotation' bootstrap locally

During local bootstrap of annotation (with docker-compose) we have this:

annotation | 09-Dec-22 10:33:03 - [INFO] - uvicorn.error - (server.py).serve(84) - Started server process [9]
annotation | INFO: Waiting for application startup.
annotation | 09-Dec-22 10:33:03 - [INFO] - uvicorn.error - (on.py).startup(45) - Waiting for application startup.
annotation | 09-Dec-22 10:33:13 - [WARNING] - kafka.conn - (conn.py).dns_lookup(1527) - DNS lookup failed for kafka:9092, exception was [Errno -3] Temporary failure in name resolution. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
annotation | 09-Dec-22 10:33:13 - [ERROR] - kafka.conn - (conn.py)._dns_lookup(315) - DNS lookup failed for kafka:9092 (AddressFamily.AF_UNSPEC)
annotation | 09-Dec-22 10:33:13 - [INFO] - kafka.conn - (conn.py).check_version(1205) - Probing node bootstrap-0 broker version
annotation | 09-Dec-22 10:33:23 - [WARNING] - kafka.conn - (conn.py).dns_lookup(1527) - DNS lookup failed for kafka:9092, exception was [Errno -3] Temporary failure in name resolution. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
annotation | 09-Dec-22 10:33:23 - [ERROR] - kafka.conn - (conn.py)._dns_lookup(315) - DNS lookup failed for kafka:9092 (AddressFamily.AF_UNSPEC)
annotation | 09-Dec-22 10:33:23 - [WARNING] - app.logger - (main.py)._init_search_annotation_producer(957) - Error occurred during kafka producer creating: NoBrokersAvailable
annotation | INFO: Application startup complete.

The app tries to resolve the kafka host (kafka:9092) but cannot find it.

Annotation review and side-by-side comparison (to manually review the annotations from multiple annotators who labeled the same doc for extensive coverage)

General algorithm

When a validator processes a validation task for an extensive coverage job, the validator needs to see multiple revisions on the same screen in different frames.

The validator's UI requests tasks by document and job_id. Then the UI requests the latest revisions for the selected pages of the task. For extensive coverage validation, the UI must get the latest revision from each annotator (we already have this endpoint).

Back-end:

Design and Front-end

We need to show multiple frames/windows for every document to display the different annotations.
By default the validator sees N + 1 windows. We need to clarify whether the validator's window must be prefilled. Currently, the validator's window is empty.
Add functionality to select the correct annotation. When the validator clicks on any bbox or label in any frame, that bbox is copied to the validator's window. The validator can edit annotations only in their own frame. Ctrl + Z must work if the validator misclicks.
Scrolling must be synchronized across all annotation frames.

The validator stores their own revision, compiled from the other annotations.

Minio replacement with S3

We should check if current project environment can work with S3 which will replace Minio.
Check if:

  • Services can work with S3
    • Minio Python SDK is compatible with S3
    • Bucket URL replacement is enough for bootstrap
  • Services can connect to S3 k8s pod without credentials (internally)

Filters applied from session storage haven't been shown

After applying a filter and reloading the page, the filter restored from session storage works correctly but is not shown in the table.
We need to check and fix all filters restored from session storage:

  • Documents
  • Tasks
  • Basements
  • Models
  • Jobs


Document labelling: document pair classification (for example, determining if the texts are similar or not)

Front-end

  • Design is required

Front-end

  • Add view to switch between document labels and document links
  • Add component which shows document links(categories) in accordion
  • Linked documents should be shown under selected category
  • Show documents side-by-side when selecting linked document
  • Add an ability to select relation(category) between shown documents
  • Add api calls to get and send selected links

Back-end

  • Must support relations between documents: similar, identical, etc. One document may have multiple relations, and the back-end must support different relationship types
  • Add linking between documents by link type. The link and the documents must be part of the revision
  • Add (or verify) support for the initial load, when we upload a revision for all annotators (check with Mikhail)

General algorithm

The back-end must support tasks for checking document pairs.

Alembic Categories relation log warning

Fix log warning

Stack trace:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade f44cabeef963 -> 66cd6054c2d0, Add Categories tree
/usr/local/lib/python3.8/site-packages/SQLAlchemy-1.3.23-py3.8-linux-x86_64.egg/sqlalchemy/orm/relationships.py:2273: SAWarning: On Category.parent_id, 'passive_deletes' is normally configured on one-to-many, one-to-one, many-to-many relationships only.
  util.warn(
INFO:     Started server process [9]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080/ (Press CTRL+C to quit)

Taxonomy attach in job creation

We need to change the current algorithm for attaching taxonomies to a category.

Front-end

  • In job creation, if the selected category has an attribute of type taxonomy, add a taxonomy selector per category
  • Add an additional call to get the taxonomy id by job_id and category_id

Back-end

  • The Category -> Taxonomy link must exist ONLY once a job is created, and only for that exact job
  • Check the endpoint for the category -> taxonomy -> job_id link. Extend the job creation functionality and add the linking on the back-end.
  • Create an endpoint to get a taxonomy by job_id + category_id
  • Create a search over all taxonomies (using filter_lib)

Annotation: ability to assign multiple (separate) tasks for multiple annotators to annotate the same page (for extensive coverage)

A task covers the annotation of particular pages of a document.
Currently, one annotator (and one validator) is assigned to the particular pages of the document.

This logic must be improved:
Tasks should be created and distributed among multiple annotators, so that each page is annotated by extensive_coverage annotators (extensive_coverage is an integer field in annotation_job).


Annotator: user
Job: extraction job. Contains annotators, documents, categories + taxonomies, annotation type.
Annotation type: cross validation, hierarchic validation
Cross validation: Automatically assigns some annotators as validators
Hierarchic validation: Assign some users as annotators, some as validators

We need to add new annotation type: extensive coverage
Extensive coverage: One task can be assigned to many annotators

General algorithm

When a user creates a new job, the user can select how many users will be assigned to one task (extensive coverage), N.

Task: exact pages in an exact document assigned to 1 annotator. This logic must be preserved.
We need to create tasks for N annotators.

We already have an algorithm to distribute tasks between annotators. The count of pages must be multiplied by N. For example, if we originally had to distribute 100 tasks between 4 annotators and extensive coverage equals 3, we need to create 3 * 100 = 300 tasks and distribute them between the 4 annotators EXCLUDING duplicates. That means no annotator may be assigned the same page more than once.
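
One way to satisfy the "no duplicates" rule is a round-robin over the annotator list, sketched below. This is not the existing distribution algorithm; `distribute_pages` and `load_per_annotator` are hypothetical helpers, and the sketch assumes the coverage N does not exceed the number of annotators.

```python
from collections import Counter
from typing import Dict, List


def distribute_pages(
    pages: List[int], annotators: List[str], coverage: int
) -> Dict[int, List[str]]:
    """Assign every page to `coverage` distinct annotators, round-robin."""
    if not 1 <= coverage <= len(annotators):
        raise ValueError("coverage must be between 1 and the number of annotators")
    assignment: Dict[int, List[str]] = {}
    pointer = 0
    for page in pages:
        # Taking `coverage` consecutive annotators guarantees they are distinct.
        assignment[page] = [
            annotators[(pointer + k) % len(annotators)] for k in range(coverage)
        ]
        pointer += coverage
    return assignment


def load_per_annotator(assignment: Dict[int, List[str]]) -> Counter:
    """Count how many page assignments each annotator received."""
    return Counter(name for names in assignment.values() for name in names)
```

With 100 pages, 4 annotators and coverage 3, this produces 300 assignments, 75 per annotator, and no annotator sees the same page twice.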

Compatibility with the current implementation: this is a new validation type; all previous validation types must behave as they do now.

When BD shows progress, it must be recalculated taking the extensive coverage multiplier into account.

Front-end

Add a new validation type to the drop-down menu: Extensive coverage
Open a form with 1 additional input: number of annotators. All other inputs must be the same as for hierarchical validation.

Bug in release 1.1.0 on Vertex env - Internal Server Error in 'annotation' on 'categories/search'

Logs:
INFO: 100.120.37.66:32944 - "POST /api/v1/annotation/categories/search HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
.................................
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/usr/local/lib/python3.8/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
return await loop.run_in_executor(None, func, *args)
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/annotation/./app/categories/resources.py", line 134, in search_categories
task_response = filter_category_db(db, request, x_current_tenant)
File "/opt/annotation/./app/categories/services.py", line 350, in filter_category_db
_get_parents(db, child_categories, tenant, job_id),
File "/opt/annotation/./app/categories/services.py", line 281, in _get_parents
uniq_pathes.add(cat.tree.path)
AttributeError: 'NoneType' object has no attribute 'path'

Bug in release 1.1.0 on Vertex env - 404 on 'GET /download/thumbnail' in 'assets'

Logs:
14-Dec-22 19:23:18 - [ERROR] - src.utils.minio_utils - (minio_utils.py).remake_thumbnail(111) - File is not an image
INFO: 100.120.37.66:53972 - "GET /download/thumbnail?file_id=8 HTTP/1.1" 404 Not Found
INFO: 100.120.37.66:50964 - "POST /files/search HTTP/1.1" 200 OK
INFO: 100.120.37.66:50964 - "POST /datasets/search HTTP/1.1" 200 OK
INFO: 100.120.37.66:50970 - "POST /files/search HTTP/1.1" 200 OK
14-Dec-22 19:23:26 - [ERROR] - src.utils.minio_utils - (minio_utils.py).remake_thumbnail(111) - File is not an image
INFO: 100.120.37.66:50986 - "GET /download/thumbnail?file_id=8 HTTP/1.1" 404 Not Found
INFO: 100.120.37.66:51004 - "GET /download/thumbnail?file_id=7 HTTP/1.1" 200 OK

All services have identical config variables to set up

The current implementation uses different config variables for the same things, e.g. S3_ENDPOINT and MINIO_HOST. We should unify them for easier setup.

  • Config variables for identical things are the same across all services
  • Adjusted k8s deployments
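
During such a migration, a service could read the unified name first and fall back to the old one. The sketch below is only an illustration: `resolve_setting` is a hypothetical helper, and treating S3_ENDPOINT as the canonical name with MINIO_HOST as the legacy one is an assumption, not a decision recorded in this issue.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_setting(
    canonical: str,
    legacy: Sequence[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value among the canonical and legacy variable names."""
    env = os.environ if env is None else env
    for name in (canonical, *legacy):
        value = env.get(name)
        if value:
            return value
    return None


# Example: prefer the unified name, accept the old one while deployments are updated.
s3_endpoint = resolve_setting("S3_ENDPOINT", legacy=["MINIO_HOST"])
```

Once all k8s deployments use the unified names, the legacy list can be dropped.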

Annotation service: add hierarchy support to categories

Back-end

  • When a new job is created, the user can select categories for the job. If the user selects a category with parents, all of its parents must also be assigned to the job (annotation service). In other words, selecting a sub-category also selects all of its parents

  • Add /annotation/categories/search endpoint with hierarchy support scheme

  • Add /annotation/jobs/{job_id}/categories/search endpoint with hierarchy support scheme

Hierarchy support scheme contains:

  • is_leaf: boolean, true when the current node has no children
  • parents: list of parent categories, ordered from the root down to the current node, excluding the current node

Example:

  "data": [
    {
      "name": "Child2",
      "parent": "Child1",
      "type": "box",
      "id": "Child2",
      "is_leaf": true,
      "parents": [
        {
          "name": "Root",
          "type": "box",
          "id": "Root",
          "is_leaf": false
        },
        {
          "name": "Child1",
          "parent": "Root",
          "type": "box",
          "id": "Child1",
          "is_leaf": false
        }
      ]
    }
  ]
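
The is_leaf and parents fields from the scheme could be computed from a flat list of categories roughly as follows. This is a sketch over plain dictionaries shaped like the example; `add_hierarchy` is a hypothetical helper, not the service's implementation, which works against the database.

```python
from typing import Dict, List


def add_hierarchy(categories: List[dict]) -> List[dict]:
    """Attach `is_leaf` and `parents` (root first, current node excluded) to each category."""
    by_id: Dict[str, dict] = {c["id"]: c for c in categories}
    # Any id referenced as a parent has at least one child, so it is not a leaf.
    parent_ids = {c["parent"] for c in categories if c.get("parent")}
    result = []
    for cat in categories:
        chain: List[dict] = []
        seen = set()
        parent_id = cat.get("parent")
        # Walk up the tree; `seen` protects against accidental cycles.
        while parent_id and parent_id in by_id and parent_id not in seen:
            seen.add(parent_id)
            parent = by_id[parent_id]
            chain.append({**parent, "is_leaf": False})
            parent_id = parent.get("parent")
        chain.reverse()  # collected bottom-up, reported root-first
        result.append(
            {**cat, "is_leaf": cat["id"] not in parent_ids, "parents": chain}
        )
    return result
```

Applied to Root, Child1 and Child2 from the example, Child2 becomes a leaf whose parents are Root followed by Child1.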

Front-end

  • [ ]

Filterlib LTREE support

  • Add LTREE support to filter_lib for filtering data

Supported operations:
children - returns the direct children of value; to get only root nodes, set value to root
children_recursive - returns all descendants down the tree
parent - returns the parent
parents_recursive - returns all ancestors
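
The intended semantics of the four operations can be illustrated in plain Python over dotted ltree-style paths. This sketch does not use filter_lib or PostgreSQL; the function signatures and the assumption that value is a full path (rather than an id) are illustrative only.

```python
from typing import List, Optional


def children(paths: List[str], value: str) -> List[str]:
    """Direct children of `value`; the special value "root" returns top-level nodes."""
    if value == "root":
        return [p for p in paths if "." not in p]
    prefix = value + "."
    return [p for p in paths if p.startswith(prefix) and "." not in p[len(prefix):]]


def children_recursive(paths: List[str], value: str) -> List[str]:
    """All descendants of `value`, at any depth."""
    prefix = value + "."
    return [p for p in paths if p.startswith(prefix)]


def parent(paths: List[str], value: str) -> Optional[str]:
    """The direct parent of `value`, or None for a top-level node."""
    if "." not in value:
        return None
    candidate = value.rsplit(".", 1)[0]
    return candidate if candidate in paths else None


def parents_recursive(paths: List[str], value: str) -> List[str]:
    """All ancestors of `value`, ordered from the root down."""
    labels = value.split(".")
    ancestors = [".".join(labels[:i]) for i in range(1, len(labels))]
    return [a for a in ancestors if a in paths]
```

In PostgreSQL the same queries would be expressed with ltree operators rather than string prefixes.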

Others

  • Add annotation service PUT | POST, migrate Category table to LTREE structure
  • Add endpoint /annotation/jobs/{job_id}/categories/search (search within a job's categories)
  • Add Front-end support: get hierarchy by job_id and search in categories
  • Add Front-end support for adding, updating and deleting categories?
  • Setup dev internal environment

Bug in release 1.1.0 on Vertex env - 'processing' does not work properly

return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/processing/./src/main.py", line 89, in get_preprocessing_result
content=send_preprocess_result(bucket_name, file_id, pages),
File "/processing/./src/send_preprocess_results.py", line 65, in send_preprocess_result
pages = get_pages(bucket, path, pages)
File "/processing/./src/send_preprocess_results.py", line 43, in get_pages
return set(
File "/processing/./src/send_preprocess_results.py", line 44, in
(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 2728, in _list_objects
response = self._execute("GET", bucket_name, query_params=query)
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 394, in _execute
region = self._get_region(bucket_name, None)
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 461, in _get_region
response = self._url_open(
File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 266, in _url_open
response = self._http.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 813, in urlopen
return self.urlopen(
[Previous line repeated 2 more times]
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='minio', port=80): Max retries exceeded with url: /some-very-unique-prefix-123456-test?location= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f14d27816d0>: Failed to establish a new connection: [Errno -2] Name or service not known

Integration of custom agreement metrics on the batch level (to assess the overall level of agreement between annotators)

General algorithm

Example: 2 annotators annotate 1 document. If their annotations do not match, we need an algorithm to compare them. Agreement metrics help to check whether the annotations match.

  • Add config value AGREEMENT_SCORE_MIN_MATCH float
  • When a job task transitions to the validation state, BD sends a request to an external service to find out whether validation is required
  • Metrics must be stored at the task level in the database

This task is very close to: #64

Table agreement_scores:

Tasks: 1, 2, 3 <-- one part of document
task_from, task_to, agreement_score
1, 2, 0.95
2, 1, 0.95 <--- doesn't make sense
2, 3, 0.97
1, 3, 0.98
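
Since the score for (1, 2) equals the score for (2, 1), each unordered pair only needs one row. The sketch below stores pairs with task_from < task_to and compares them with a threshold. `pairwise_scores` and `validation_required` are hypothetical helpers, and the rule "validation is required if any pair scores below AGREEMENT_SCORE_MIN_MATCH" is an assumption; in the issue that decision comes from the external service.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[int, int]


def pairwise_scores(
    task_ids: Iterable[int], score_fn: Callable[[int, int], float]
) -> Dict[Pair, float]:
    """Score every unordered pair once, keyed as (task_from, task_to) with task_from < task_to."""
    return {(a, b): score_fn(a, b) for a, b in combinations(sorted(task_ids), 2)}


def validation_required(scores: Dict[Pair, float], min_match: float) -> bool:
    """Assumed rule: any pair below the minimum match triggers validation."""
    return any(score < min_match for score in scores.values())
```

For tasks 1, 2 and 3 this yields exactly three rows: (1, 2), (1, 3) and (2, 3).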

Taxonomies support and versioning

  • [FE] New attribute type "taxonomy" for the category entity (not the type of category)
  • [FE] Fix bug: if the category has attributes, the "data" tab should open right after a new bbox (text) is created (currently this does not always happen)
  • [FE] On the Data tab, for the taxonomy attribute there should be a tree (exactly the same visual design as for hierarchical categories, including the ability to search). The tree should be requested from the "taxonomy" microservice by job_id, category_id. When a taxon is selected, it is set as the value of the "data" field inside the bbox (as is done for text, latex etc.)
  • [FE] When a taxon is selected for a category that has an attribute of taxonomy type, the "UI label" (the one shown on the UI above the bbox) should change to the name of the taxon. The label for such bboxes should be shown instantly.
  • [BE] The "taxonomy" microservice should be created. It should contain the entities: Taxonomy (id, name, version, category_id (e.g. jurisdiction)), Taxon (id, name, taxonomy_id, parent_id, ltree). Taxon has parent_id.
  • [BE] Response data for /taxon/search
[
    {
        "id": "123",
        "is_leaf": true,
        "parents": [
            // list of taxons from root to current excluding current
        ] 
    }
]
  • [BE] Many-to-many mapping of jobs to taxonomies, plus an API to get the taxonomies for a job_id (there may be more than one) and a POST to add them.

Release 1.1.0: Annotation POST jobs/search fails - leads to empty 'documents' page on UI

INFO: 10.233.90.227:38012 - "POST /api/v1/jobs/jobs/search HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 216, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.8/site-packages/fastapi/dependencies/utils.py", line 550, in solve_dependencies
solved = await call(**sub_values)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 52, in call
decoded = self.decode_rs256(token)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 99, in decode_rs256
signing_key = self.jwk_client.get_signing_key_from_jwt(token)
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 59, in get_signing_key_from_jwt
return self.get_signing_key(header.get("kid"))
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 41, in get_signing_key
signing_keys = self.get_signing_keys()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 28, in get_signing_keys
jwk_set = self.get_jwk_set()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 24, in get_jwk_set
data = self.fetch_data()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 20, in fetch_data
with urllib.request.urlopen(self.uri) as response:
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
19-Dec-22 12:23:55 - [ERROR] - uvicorn.error - (h11_impl.py).run_asgi(376) - Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in call
raise exc from None
File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 216, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.8/site-packages/fastapi/dependencies/utils.py", line 550, in solve_dependencies
solved = await call(**sub_values)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 52, in call
decoded = self.decode_rs256(token)
File "/usr/local/lib/python3.8/site-packages/tenant_dependency-0.1.3-py3.8.egg/tenant_dependency/dependency.py", line 99, in decode_rs256
signing_key = self.jwk_client.get_signing_key_from_jwt(token)
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 59, in get_signing_key_from_jwt
return self.get_signing_key(header.get("kid"))
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 41, in get_signing_key
signing_keys = self.get_signing_keys()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 28, in get_signing_keys
jwk_set = self.get_jwk_set()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 24, in get_jwk_set
data = self.fetch_data()
File "/usr/local/lib/python3.8/site-packages/PyJWT-2.3.0-py3.8.egg/jwt/jwks_client.py", line 20, in fetch_data
with urllib.request.urlopen(self.uri) as response:
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

Create CI on Github Actions

Add GitHub Actions to run CI pipelines for each microservice

Annotation service

  • run pre-commits on annotation service
  • run pytest on annotation service

Filter-lib

  • run pre-commits on Filter-lib
  • run pytest on Filter-lib

Tenants

  • run pre-commits on Tenants
  • run pytest on Tenants

Assets

  • run pre-commits on Assets
  • run pytest on Assets

'Processing' service does not work

Logs from dev1:

Traceback (most recent call last):
File "/usr/local/bin/alembic", line 5, in
from alembic.config import main
File "/usr/local/lib/python3.8/site-packages/alembic/init.py", line 3, in
from . import context
File "/usr/local/lib/python3.8/site-packages/alembic/context.py", line 1, in
from .runtime.environment import EnvironmentContext
File "/usr/local/lib/python3.8/site-packages/alembic/runtime/environment.py", line 12, in
from .migration import MigrationContext
File "/usr/local/lib/python3.8/site-packages/alembic/runtime/migration.py", line 17, in
from sqlalchemy import Column
ModuleNotFoundError: No module named 'sqlalchemy'

Upload to S3 bucket does not work.

When I try to upload a file, I get the following error in the assets service:
INFO: 100.120.37.66:33672 - "POST /files HTTP/1.1" 503 Service Unavailable

How to reproduce.

  1. Log in to BadgerDoc.
  2. Upload a file.
  3. The upload fails with an error.

Error in the assets service log: `INFO: 100.120.37.66:33672 - "POST /files HTTP/1.1" 503 Service Unavailable`

Assets service deployment configuration:

kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    meta.helm.sh/release-name: assets
    meta.helm.sh/release-namespace: app
  creationTimestamp: "2022-11-15T15:50:56Z"
  generation: 5
  labels:
    app: assets
    app.kubernetes.io/managed-by: Helm
  name: assets
  namespace: app
  resourceVersion: "133922033"
  uid: 55df24fc-2fc9-4918-93c9-e6b0a26e9a61
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: assets
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      creationTimestamp: null
      labels:
        app: assets
    spec:
      containers:
      - args:
        - -c
        - alembic upgrade afa33cc83d57 && alembic upgrade fe5926249504 && alembic
          upgrade 0f6c859c1d1c && alembic upgrade head && uvicorn src.main:app --host
          0.0.0.0 --port 8080 --root-path /api/v1/assets
        command:
        - /bin/sh
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              key: DATABASE_URL
              name: assets
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: POSTGRES_USER
              name: assets
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: POSTGRES_PASSWORD
              name: assets
        - name: ENDPOINT
          value: expl-trm-tmt-badgerdoc-us-east-2
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: MINIO_ACCESS_KEY
              name: assets
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: MINIO_SECRET_KEY
              name: assets
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              key: JWT_SECRET
              name: assets
        image: ghcr.io/vertexinc/trm-badgerdoc/assets:0.1.7-sn
        imagePullPolicy: IfNotPresent
        name: assets
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: trm-build-bot-secret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: badgerdoc
      serviceAccountName: badgerdoc
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-11-15T16:04:05Z"
    lastUpdateTime: "2022-11-15T16:04:05Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-11-15T15:50:56Z"
    lastUpdateTime: "2022-11-22T10:33:53Z"
    message: ReplicaSet "assets-7c998c59fc" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 5
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
