kustvakt's People

Contributors

abcpro1, akron, bodmo, dependabot[bot], kupietz, margaretha, michaelhanl

kustvakt's Issues

Handling "no resource found"

Responses for "no result found" and "no resource found" cases should be differentiated. A NO_RESOURCE_FOUND or RESOURCE_NOT_FOUND status code should be added, and the response should be sent with HTTP 404 (Not Found).
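A minimal sketch of how the two cases could be told apart. The enum, the class name, the NO_RESULT_FOUND constant and the choice of HTTP 200 for an empty result are assumptions for illustration; only RESOURCE_NOT_FOUND and the 404 status come from the issue text.

```java
// Hypothetical status codes; not Kustvakt's actual StatusCodes class.
enum LookupStatus {
    OK(200),
    NO_RESULT_FOUND(200),      // assumption: the resource exists, the query matched nothing
    RESOURCE_NOT_FOUND(404);   // the requested resource itself does not exist

    final int httpStatus;

    LookupStatus(int httpStatus) {
        this.httpStatus = httpStatus;
    }
}

final class LookupStatusResolver {
    // Decides the status from two facts the service already knows.
    static LookupStatus resolve(boolean resourceExists, int matchCount) {
        if (!resourceExists) {
            return LookupStatus.RESOURCE_NOT_FOUND;
        }
        return matchCount == 0 ? LookupStatus.NO_RESULT_FOUND : LookupStatus.OK;
    }
}
```

With this split, a client can distinguish a missing resource (404) from a valid request that simply matched nothing.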

User settings

User settings are supposed to be flexible and extensible. Thus, they are designed to be stored as JSON.

  • Migrate the user_settings table to the new database schema
  • Update associated implementations using Hibernate
  • Set often-used attributes as table columns
  • Add metadata fields to the user settings (see KorAP/Kalamar#41)

User-group member invitation

When a user creates a user group (UG), he/she automatically becomes a UG admin and a virtual corpus access (VCA) admin. To add members to the group, a UG admin may send invitations to users by listing their usernames. These users may accept or reject the invitations.

If a user has rejected an invitation or has been deleted from a group, a new invitation to the group can be sent to him/her.

Support OAuth2 authorization via PIN

Kustvakt should support OAuth2 authorization via PIN (or other kind of code) activation, for instance to facilitate login in "IDS Wortraumstation" using virtual reality (VR) tools. The authorization process involves two KorAP front-end instances, e.g. VR app and Kalamar for the browser. The general flow is:

  1. The VR app shows a PIN and sends a poll request with username & PIN to Kustvakt.
  2. The user logs in to Kalamar and activates the PIN in Kalamar.
  3. Kustvakt matches username & PIN from the VR app & Kalamar, and sends a response with an access token to the VR app.

To achieve this, two additional web services are needed for:

  1. polling for an OAuth2 token using PIN and username
  2. PIN activation, requiring user authentication in the authorization header

This issue is related to KorAP/Kalamar#101.
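The two services described above could be sketched in memory as follows. Everything here is hypothetical (class and method names, the in-memory map, handing the token in as a parameter); a real implementation would also need PIN expiry, rate limiting and persistent storage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the PIN flow; not part of Kustvakt.
final class PinAuthorizationSketch {
    // (username, PIN) pair -> access token issued at activation time
    private final Map<List<String>, String> activated = new ConcurrentHashMap<>();

    // Service 2: called via Kalamar for a user who is already authenticated.
    void activate(String authenticatedUsername, String pin, String accessToken) {
        activated.put(Arrays.asList(authenticatedUsername, pin), accessToken);
    }

    // Service 1: called repeatedly by the VR app until a token is available.
    // The token is handed out once and then forgotten.
    Optional<String> poll(String username, String pin) {
        return Optional.ofNullable(activated.remove(Arrays.asList(username, pin)));
    }
}
```

Until the user activates the PIN, poll returns an empty result; after activation, the first poll with the matching username and PIN receives the token.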

Support for reloading Krill-index reader

Krill-indexer may be run separately from a Kustvakt server to add documents to, or update documents in, an existing index. To recognize these changes in the running KorAP system, the Krill index reader within Kustvakt must be reloaded.

Kustvakt can accommodate this function through an API request to close the index reader. Krill automatically opens the index reader the next time the index is to be read.

Added VC re-caching to the index reload api

When the Krill index is updated, the LeafReaderContext objects used as key references in the VC cache change. Thus, the cache should be updated, i.e. the predefined virtual corpora should be re-cached.

Replace virtual corpus id in URI

As an alternative to an auto-generated virtual corpus id, a unique combination of username and vcname can better represent a virtual corpus. This representation should be adopted in URIs in place of the VC id.

Wrong user credentials lead to server failure in OAuth2 password grant flow

When sending a wrong username or password to /oauth2/token, the server responds with:

HTTP/1.1 500 Server Error
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=iso-8859-1
Content-Length: 4654
Connection: close
Server: Jetty(9.4.z-SNAPSHOT)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /api/v1.0/oauth2/token. Reason:
<pre>    Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NullPointerException
    at de.ids_mannheim.korap.web.OAuth2ResponseHandler.throwit(OAuth2ResponseHandler.java:90)
    at de.ids_mannheim.korap.web.OAuth2ResponseHandler.throwit(OAuth2ResponseHandler.java:83)
    at de.ids_mannheim.korap.web.controller.OAuth2Controller.requestAccessToken(OAuth2Controller.java:208)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:503)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:748)
</pre>
<hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>

</body>
</html>

Expected behaviour: Return a JSON object with error and error_description fields, sent with an HTTP 401 status.
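A sketch of a null-safe error body builder, since the stack trace points to a NullPointerException while building the error response. The class name, the fallback value "unknown_error" and the minimal string escaping are assumptions for illustration; this is not the actual OAuth2ResponseHandler code.

```java
// Hypothetical helper; field names follow the expected behaviour stated above.
final class OAuth2ErrorSketch {
    static final int UNAUTHORIZED = 401;

    // Builds the JSON body without failing when a field is missing.
    static String toJson(String error, String description) {
        String e = (error == null) ? "unknown_error" : error;   // assumed fallback
        String d = (description == null) ? "" : description;
        return "{\"error\":\"" + escape(e)
                + "\",\"error_description\":\"" + escape(d) + "\"}";
    }

    // Minimal escaping: handles backslashes and double quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For example, OAuth2ErrorSketch.toJson("invalid_grant", "Bad credentials") yields a body that would be sent together with the status in UNAUTHORIZED.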

Support multiple cq parameters throughout all APIs

Currently a VC is defined by a single cq parameter throughout multiple endpoints. It would be great to support multiple cq parameters everywhere to define a single VC with an "and"-conjunction (e.g. ?cq=pubDate%3D2009-12-06&cq=corpusSigle%3DWPD becomes the VC pubDate in 2009-12-06 & corpusSigle="WPD").

The use case is to rewrite URLs to redirect from a separate instance (like /instance/example?...) to another instance that incorporates the corpus as a subcorpus, simply by appending the subcorpus definition as a (potentially additional) cq parameter (e.g. /instance/main?...&cq=exampleCorpus). That way, URLs of separate instances can be published and will keep working even when the instance is removed, as long as the corpus is transferred to another instance and can be defined as a VC.
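A sketch of how repeated cq parameters might be merged into one conjunction. The class name and the output syntax (parenthesized parts joined by " & ") are assumptions; the real corpus query syntax may need different quoting. It uses URLDecoder.decode(String, Charset), which requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Kustvakt.
final class CorpusQueryMerger {
    // Collects every cq parameter of a raw query string and joins them with "&".
    static String merge(String rawQuery) {
        List<String> parts = new ArrayList<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0 || !pair.substring(0, eq).equals("cq")) {
                continue; // not a cq parameter
            }
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            if (!value.isEmpty()) {
                parts.add(value);
            }
        }
        if (parts.size() == 1) {
            return parts.get(0); // a single cq stays unchanged
        }
        List<String> wrapped = new ArrayList<>();
        for (String part : parts) {
            wrapped.add("(" + part + ")");
        }
        return String.join(" & ", wrapped);
    }
}
```

A request with a single cq parameter keeps its current meaning, so existing URLs would not change behaviour.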

VC sharing by group member

Is it necessary/desirable for group members to be able to share their VC with the group?
Note: group members cannot see other members.

This function requires approval from a VC access admin.

Redundant availabilities

If a registered user requests a search, then switches the connection to a VPN and resends the search, the availability rewrite will result in:

[image]

This bug would be resolved with a rewrite revert mechanism in Kalamar (see KorAP/Kalamar#65).

OAuth2 token management for users

Users should be able to list all their apps that hold KorAP access tokens, and to revoke these accesses from KorAP's super client (Kalamar).

Handling Shibboleth data & authentication token

Data/attributes from the Shibboleth session are to be parsed by Kalamar and sent to Kustvakt. In Kustvakt, these data, along with the user's agreement to KorAP/DeReKo usage, should be stored in the database. In response, Kustvakt should create an authentication token similar to that of LDAP authentication and send it to Kalamar.

User Email

For email notification of group-member invitations, we need to fetch user emails via LDAP.

Handling using VC in caching process

When a virtual corpus is in the middle of a caching process, the statistics API referring to that VC may take a long time to respond, until the VC is fully cached. Kustvakt could instead send a response with an error or a warning regarding the caching process.

Virtual corpus sharing

Constraints for virtual corpus sharing:

  • A user may only share his/her own virtual corpus (VC).
  • A VC can only be shared with a user-group, not individuals.
  • When a user creates a user-group, he/she automatically becomes a VC access admin of the group.
  • Only VC access (VCA) admins of a user-group are allowed to share their VC with the group.

Generating OAuth2 tokens

The Apache Oltu token issuer uses UUIDs in an inefficient encoding (redundant conversion from string to bytes and back) and does not seem to be quite secure.

Another token generation method is needed.
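One possible alternative, sketched with the JDK only: random bytes from SecureRandom, encoded as URL-safe Base64 without padding. The class name and the token length of 32 bytes are assumptions, not a decision from this issue.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical token generator; not the method adopted by Kustvakt.
final class TokenGeneratorSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns numBytes of cryptographically strong randomness as a URL-safe string.
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

TokenGeneratorSketch.newToken(32) gives a 43-character token that can be used in headers and URLs without further escaping.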

Reintroduce versioning

At the beginning of Kustvakt development, the Web-API had a version prefix /v0.1/. This was removed without an alternative.
To make breaking API changes possible, versioning should be reintroduced.

There is some discussion (1, 2) whether different resources are the right way to handle versioning, but I guess it's the easiest for maintenance and for the developer.

This is especially important now that OAuth and plugins should invite external developers to use our API.

Email notifications of group member invitations

When a group admin invites users to a group, notifications should be sent to their email addresses (obtain user data via OAuth2), informing them to log in to KorAP in order to accept or reject the invitations.

Add authentication to metadata api

Requests for document metadata should be restricted by user authentication, in a similar way to match-info requests. This includes whether a document may be viewed and which metadata fields can be presented with respect to the user authentication.

Username and groupname prefix

As part of URI, username and groupname path parameters can be differentiated from other paths, e.g. "vc", by adding prefixes to them, e.g. ~username and @group.

This issue is suggested by @Akron.
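A sketch of how such prefixed path segments could be told apart. The class, enum and method names are hypothetical; only the ~ and @ prefixes come from the issue.

```java
// Hypothetical classifier for URI path segments; not part of Kustvakt.
final class PathSegmentSketch {
    enum Kind { USER, GROUP, OTHER }

    // "~name" is a username, "@name" is a group name, anything else is a plain path.
    static Kind classify(String segment) {
        if (segment.length() > 1 && segment.charAt(0) == '~') {
            return Kind.USER;
        }
        if (segment.length() > 1 && segment.charAt(0) == '@') {
            return Kind.GROUP;
        }
        return Kind.OTHER;
    }

    // Strips the prefix from user and group segments.
    static String name(String segment) {
        return classify(segment) == Kind.OTHER ? segment : segment.substring(1);
    }
}
```

With this scheme, a fixed path such as "vc" can never be mistaken for a user or group of the same name.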

Search in unauthorized VC

How should access to an unauthorized VC be handled?
For instance, user A would like to search in a private VC of user B.
Currently, Kustvakt will throw an unauthorized error.
Should it be more tolerant, e.g. by using rewrite to remove the VC instead?

Retrieving custom metadata fields

Currently Kustvakt only has support for retrieving all metadata fields. It should also support retrieving a custom list of metadata fields. Moreover, the resulting fields should be sorted according to the order in the list.

This issue is suggested by @Akron.
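The requested behaviour (a custom field list, returned in the order of that list) could look roughly like this. The class name and the use of plain string maps are assumptions; the real metadata representation is not described in this issue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field selector; not part of Kustvakt.
final class MetadataFieldSelector {
    // Keeps only the requested fields, in the order they were requested.
    // Requested fields that the document lacks are skipped.
    static Map<String, String> select(Map<String, String> allFields, List<String> requested) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : requested) {
            if (allFields.containsKey(name)) {
                result.put(name, allFields.get(name));
            }
        }
        return result;
    }
}
```

A LinkedHashMap preserves insertion order, so iterating over the requested list is enough to get the fields sorted as asked.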

Managing user group member roles

Add services to allow user-group admins to assign and edit roles of user-group members. Roles can only be assigned to active members.

OAuth2 Client Deregistration

  1. For confidential clients, deregistration requires client authentication.
  2. For public clients, only client owners (the users who registered the clients) may request deregistration.
  3. System admins should be allowed to deregister any clients.
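The three rules above can be condensed into a single check. The class, enum and parameter names are hypothetical, and how authentication and ownership are established is left out.

```java
// Hypothetical policy check; not part of Kustvakt.
final class ClientDeregistrationPolicy {
    enum ClientType { CONFIDENTIAL, PUBLIC }

    static boolean mayDeregister(ClientType type,
                                 boolean clientAuthenticated,
                                 boolean requesterIsOwner,
                                 boolean requesterIsSystemAdmin) {
        if (requesterIsSystemAdmin) {
            return true;                 // rule 3: admins may deregister any client
        }
        if (type == ClientType.CONFIDENTIAL) {
            return clientAuthenticated;  // rule 1: client authentication required
        }
        return requesterIsOwner;         // rule 2: public clients, owner only
    }
}
```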

Shibboleth Apache Configuration

The Apache location for /Shibboleth.sso is ignored. All requests to /Shibboleth.sso are routed to WordPress because it is set as DocumentRoot.

Support POST for statistics endpoint

When a corpus query is rewritten during search, it is often required to afterwards find out the size of the corpus actually searched. As the rewritten KoralQuery can't easily be transformed back into a VC collection query (that is what Kalamar does), clients have to rely on a POST method for statistics, accepting the VC as a KQ in the payload.

Virtual corpus reference

This description is obsolete. See the comment below.

A virtual corpus (VC) query definition can be complex and very lengthy. In the case of a static VC, it can be stored as static data in Krill. To access the data, Kustvakt should refer to the static-id instead of sending the corpus query.

Setting up system admin

System admins have not been set up for the new database. Some changes in the database cannot be tested via web services without system admin accounts, such as whether auto-hidden groups are created after publishing a virtual corpus.

Import annotation descriptions to database

The annotation tags and values in the SQL file at /src/main/resources/db/insert are dummy data used for the implementation of description web services.

The annotation description data from Kalamar should be imported to the database.

Search virtual corpus

Logged-in users can search for a virtual corpus by id, as well as per URL (pid).

Virtual corpora (VC) that can be searched are of type PREDEFINED or PUBLISHED. PROJECT VC can also be searched if the user is a member of the project group(s). VC creators/owners can always search for their VC.

When a user searches for a PUBLISHED VC for the first time, he/she is added to an auto-generated group associated with the VC. The PUBLISHED VC are only shown to the members of this group.
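The access rules above can be sketched as one predicate. The names are hypothetical, and the PRIVATE type is an assumption added so that the default case has something to refuse.

```java
// Hypothetical access check; not part of Kustvakt.
final class VcSearchAccess {
    enum VcType { PRIVATE, PROJECT, PUBLISHED, PREDEFINED }

    static boolean maySearch(VcType type, boolean isOwner, boolean isProjectGroupMember) {
        if (isOwner) {
            return true;                     // owners can always search for their VC
        }
        switch (type) {
            case PREDEFINED:
            case PUBLISHED:
                return true;                 // searchable by any logged-in user
            case PROJECT:
                return isProjectGroupMember; // only members of the project group(s)
            default:
                return false;                // assumed: private VC of another user
        }
    }
}
```

Adding the user to the auto-generated group on the first search for a PUBLISHED VC would be a separate side effect and is not covered here.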

Automatic abortion of queries

Different mechanisms should help to avoid resource-hungry processes in KorAP. Some are already in place in separate components, but they should be evaluated further in the context of our architecture.

Mechanisms that are useful:

  • using a user-specific timeout per search (already implemented in Krill)
  • using a memory-limit
  • catching http disconnections

(This was the result of a meeting with our colleagues from the BBAW.)

Collection Rewrite Approach

A collection rewrite attaches an AND operation with availability values, which restrict resource access, to an existing collection query. The current approach inspects the nodes of a given collection query to determine whether a collection rewrite is needed. This approach is tricky and prone to leaks.

Another approach would be to always perform the collection rewrite and normalize/optimize the collection query afterwards, similar to the Boolean optimiser from "Managing Gigabytes".

See https://github.com/KorAP/Krawfish-prototype/blob/master/lib/Krawfish/Koral/Util/Boolean.pm

Publish virtual corpora

Publishing a virtual corpus means making it available to all users. Unlike PREDEFINED VC, however, PUBLISHED VC are not listed for all users.

When a VC is published, a user-group of type HIDDEN is automatically generated by the system for the VC, and an access of type HIDDEN is added for the user group to the VC. The PUBLISHED VC are only shown to the members of this group and its creator.

HIDDEN group means that members of this group cannot see the group itself.

Static Corpus Rewrite

Virtual corpus (VC) content can be categorized into static and dynamic. The number of docs of a dynamic VC may change over time, while that of a static VC must remain the same. To establish a static VC, the static corpus rewrite adds a "publishDate" restriction, whose value is the date of VC creation, to the virtual corpus query.

Referring published VC in search

When a user refers to a published VC in a search, e.g. "referTo marlin/published-vc", the user should be added to the hidden group of the VC. Thus, the VC will be included in the list of available VC for that user.

Corpus licence with limited number of users

Kustvakt should handle licenses that limit access to a corpus based on the number of users currently accessing it, e.g. Süddeutsche Zeitung, Januar 1995 - Dezember 1999 has the QAO-NC-LOC:ids-NU:1 license, which limits access to the corpus to only one user at a time.

Solution: default VC rewrite for "all corpora access" should exclude corpora with "NU" keywords in the license.
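A sketch of detecting the "NU" keyword in a license string. The parsing rule (dash-separated parts, with the limit written as NU:<number>) is guessed from the single example above and may not hold for all licenses; the class and method names are hypothetical.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical license helper; the format is inferred from "QAO-NC-LOC:ids-NU:1".
final class LicenceUserLimit {
    // Matches a dash-delimited "NU:<digits>" part anywhere in the license string.
    private static final Pattern NU = Pattern.compile("(?:^|-)NU:(\\d{1,9})(?:-|$)");

    // Returns the maximum number of concurrent users, if the license states one.
    static OptionalInt userLimit(String licence) {
        Matcher m = NU.matcher(licence);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    // Corpora with a user limit would be left out of the default "all corpora" rewrite.
    static boolean excludedFromDefaultAccess(String licence) {
        return userLimit(licence).isPresent();
    }
}
```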

Warn or error on page=0 in the search API

Currently, an invalid value for the page parameter is silently treated as not set and therefore replaced by the default value page=1.

In Kalamar, parameter p=0 returns an error instead.

We may want to add an error and return an empty result set for invalid page values (which makes erroneous API requests obvious to the user), or explicitly rewrite the page value and warn the user about that (which is not so obvious to the user, though).
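The first option (reject invalid values instead of silently replacing them) could be sketched like this. The class name, the exception type and the message text are assumptions; only the default of page=1 for an unset parameter is taken from the issue.

```java
// Hypothetical parameter check; not part of Kustvakt.
final class PageParameter {
    static final int DEFAULT_PAGE = 1;

    // Returns the default for an unset parameter and rejects anything
    // that is not a positive integer.
    static int parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            return DEFAULT_PAGE;
        }
        int page;
        try {
            page = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("page must be a positive integer: " + raw);
        }
        if (page < 1) {
            throw new IllegalArgumentException("page must be a positive integer: " + raw);
        }
        return page;
    }
}
```

The caller would translate the exception into an error response; the second option would instead return DEFAULT_PAGE together with a warning.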

Search api returning metadata only

As described in KorAP/Krill#58:

Krill should support search queries returning only the metadata without match snippets, thus allowing search on all data without license restrictions.

Metadata should be returned for every match regardless of redundancy.

This requires a change to the rewrite mechanism in Kustvakt as well. Once KorAP/Krill#58 is done, the implementation should follow the descriptions in the then closed Krill issue.
