korap / kustvakt
:speedboat: User and policy management component for KorAP, capable of rewriting queries for policy based document restrictions.
License: BSD 2-Clause "Simplified" License
Responses for "no result found" and "no resource found" cases should be differentiated. NO_RESOURCE_FOUND or RESOURCE_NOT_FOUND status code should be added and the response should be sent with HTTP 404 (not found).
User settings are supposed to be flexible and extensible. Thus, they are designed to be stored as JSON.
Kustvakt server startup may take a long time because of VC caching, depending on the VC size.
Kustvakt should provide a service to reset a client secret.
While resetting a client secret, the users' access tokens should not be invalidated and should be kept active. To invalidate all users' access tokens, the client should be deleted instead.
See https://www.oauth.com/oauth2-servers/client-registration/deleting-applications-revoking-secrets/
When a user creates a user group (UG), he/she automatically becomes a UG admin and a virtual corpus access (VCA) admin. To add members to the group, a UG admin may send invitations to users by listing their usernames. These users may accept or reject the invitations.
If a user has rejected an invitation or has been deleted from a group, a new invitation to the group can be sent to him/her.
Kustvakt should support OAuth2 authorization via PIN (or other kind of code) activation, for instance to facilitate login in "IDS Wortraumstation" using virtual reality (VR) tools. The authorization process involves two KorAP front-end instances, e.g. VR app and Kalamar for the browser. The general flow is:
To achieve this, two additional web services are needed for:
This issue is related to KorAP/Kalamar#101.
Usergroup name should be unique and be part of URI instead of groupId.
Krill-indexer may be run separately from a Kustvakt server to add or update documents to an existing index. To recognize these changes in the running KorAP system, the Krill index reader within Kustvakt must be reloaded.
Kustvakt can accommodate this function through an API request to close the index reader. Krill automatically opens the index reader the next time the index is to be read.
When the Krill index is updated, the LeafReaderContext objects used as key references in the VC cache change. Thus, the cache should be updated, i.e. the predefined virtual corpora should be re-cached.
As an alternative to an auto-generated virtual corpus id, a unique combination of username and vcname can better represent a virtual corpus. This representation should be adopted in URI in place of VC id.
When sending wrong username or password info to /oauth2/token, the server responds with:
HTTP/1.1 500 Server Error
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=iso-8859-1
Content-Length: 4654
Connection: close
Server: Jetty(9.4.z-SNAPSHOT)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /api/v1.0/oauth2/token. Reason:
<pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NullPointerException
    at de.ids_mannheim.korap.web.OAuth2ResponseHandler.throwit(OAuth2ResponseHandler.java:90)
    at de.ids_mannheim.korap.web.OAuth2ResponseHandler.throwit(OAuth2ResponseHandler.java:83)
    at de.ids_mannheim.korap.web.controller.OAuth2Controller.requestAccessToken(OAuth2Controller.java:208)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:503)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:748)
</pre>
<hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>
</body>
</html>
Expected behaviour: Return a JSON object with error and error_description with a status 401 response.
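The expected response could look roughly like the following. This is a minimal Python sketch, not Kustvakt code: the helper name oauth2_error_response, the header values, and the error code "invalid_grant" (the code RFC 6749 lists for invalid resource owner credentials) are assumptions made for illustration.

```python
import json


def oauth2_error_response(error, description, status=401):
    """Build (status, headers, body) for an OAuth2-style JSON error.

    Hypothetical helper: the issue only asks for a JSON object with
    `error` and `error_description` and a 401 status.
    """
    body = json.dumps({"error": error, "error_description": description})
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "Cache-Control": "no-store",
    }
    return status, headers, body


# Example: wrong username or password sent to the token endpoint.
status, headers, body = oauth2_error_response(
    "invalid_grant", "Invalid username or password"
)
print(status, body)
```

The point of the sketch is that the error path returns a small machine-readable body rather than an HTML page with a stack trace.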
Currently a VC is defined by a single cq parameter throughout multiple endpoints. It would be great to support multiple cq parameters everywhere to define a single VC with an "and"-conjunction (e.g. ?cq=pubDate%3D2009-12-06&cq=corpusSigle%3DWPD becomes the VC pubDate in 2009-12-06 & corpusSigle="WPD").
The use case is to rewrite URLs to redirect from a separated instance (like /instance/example?...) to another instance that incorporates the corpus as a subcorpus, by just appending the subcorpus definition as a (potentially additional) cq parameter (e.g. /instance/main?...&cq=exampleCorpus). By doing that, URLs of separated instances can be published and will keep working, even when the instance is removed, as long as the corpus is transferred to another instance and can be defined as a VC.
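Combining repeated cq parameters into one conjunction might be sketched as follows. The function name combine_cq is hypothetical, and wrapping each part in parentheses before joining with " & " is an assumption about how precedence could be preserved, not the serialization Kustvakt uses.

```python
from urllib.parse import parse_qs


def combine_cq(query_string):
    """Join all `cq` values of a query string into one AND-conjunction.

    `query_string` is the part after the "?" of a URL. Returns None when
    no `cq` parameter is present (i.e. no VC restriction).
    """
    cqs = parse_qs(query_string).get("cq", [])
    if not cqs:
        return None
    if len(cqs) == 1:
        return cqs[0]
    # Parenthesize each part so that an "|" inside one cq value
    # cannot change the meaning of the overall conjunction.
    return " & ".join(f"({cq})" for cq in cqs)


print(combine_cq("cq=pubDate%3D2009-12-06&cq=corpusSigle%3DWPD"))
```

With this shape, a redirect only has to append one more cq parameter to an existing URL; it never needs to parse or rewrite the corpus query that is already there.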
Adds refresh token and refresh request to the token endpoint.
Is it necessary/desirable for group members to be able to share their VC with the group?
Note: group member cannot see other members.
This function requires an approval from VC access admin.
If a registered user requests a search, then switches to a VPN connection and resends the search, the availability rewrite will result in
This bug would be resolved with a rewrite revert mechanism in Kalamar (see KorAP/Kalamar#65).
Add ID token to OAuth2 access token.
Users should be able to list all their apps having KorAP access tokens, and remove these accesses from KorAP's super client (Kalamar).
Data/attributes from Shibboleth session are to be parsed by Kalamar and sent to Kustvakt. In Kustvakt, these data along with user agreement of KorAP/DeReKo usage should be stored in the database. As a response, Kustvakt should create an authentication token similar to that of LDAP authentication and send it to Kalamar.
Member invitations to join a deleted user group should be disallowed.
For email notification of group-member invitations, we need to fetch user emails via LDAP.
When a virtual corpus is in the middle of a caching process, the statistics API referring to that VC may take a long time to respond, until the VC is fully cached. Kustvakt may instead send a response with an error or a warning regarding the caching process.
Constraints for virtual corpus sharing:
The Apache Oltu token issuer uses UUIDs in an inefficient encoding (redundant parsing from string to bytes and vice versa) and does not seem to be quite secure.
Another token generation method is needed.
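One common alternative is to draw the token directly from a cryptographically secure random source. The following Python sketch illustrates that idea with the standard secrets module; it is not the method Kustvakt adopted, and the function name and the 32-byte default are assumptions.

```python
import secrets


def generate_token(num_bytes=32):
    """Return a URL-safe random token built from `num_bytes` random bytes.

    secrets.token_urlsafe reads from the OS CSPRNG and Base64-encodes the
    bytes once, avoiding UUID string/byte round trips.
    """
    return secrets.token_urlsafe(num_bytes)


token = generate_token()
print(len(token))
```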
At the beginning of Kustvakt development, the Web-API had a version prefix /v0.1/. This was removed without an alternative.
To make breaking API changes possible, versioning should be reintroduced.
There is some discussion (1, 2) whether different resources are the right way to handle versioning, but I guess it's the easiest for maintenance and for the developer.
This is especially important now that OAuth and plugins should invite external developers to use our API.
When a group admin invites users to a group, notifications should be sent to their email addresses (obtain user data via OAuth2) informing them to log in to KorAP in order to accept or reject the invitations.
Requests for document metadata should be restricted by user authentication in a similar way to match-info requests. This includes whether a document may be viewed and which metadata fields can be presented with respect to the user authentication.
There has been a request filter implementation for this, but it has never been tested.
See https://developer.matomo.org/guides/tracking-introduction for implementation guide.
How should access to an authorized VC be handled?
For instance, user A would like to search in a private VC of user B.
Currently, Kustvakt will throw an unauthorized error.
Should it be more tolerant, e.g. by using rewrite to remove the VC instead?
Implement services to revoke a specific or all OAuth2 access token(s) belonging to a particular user.
Currently Kustvakt only has support for retrieving all metadata fields. It should also support retrieving a custom list of metadata fields. Moreover, the resulting fields should be sorted according to the order in the list.
This issue is suggested by @Akron.
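The requested behaviour (return only the listed fields, in the order of the list) can be illustrated with a short Python sketch. The function name select_fields and the sample document are hypothetical; silently skipping unknown field names is an assumption, since the issue does not say how they should be handled.

```python
def select_fields(metadata, fields):
    """Return (name, value) pairs for the requested fields only.

    The result follows the order of `fields`, not the order in which the
    document stores its metadata. Unknown field names are skipped.
    """
    return [(name, metadata[name]) for name in fields if name in metadata]


# Hypothetical document metadata.
doc = {"title": "Wasser", "author": "N.N.", "pubDate": "2009-12-06"}
print(select_fields(doc, ["pubDate", "title", "unknownField"]))
```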
Add services to allow user-group admins to assign and edit roles of user-group members. Roles can only be assigned to active members.
Spring config is loaded twice in the test suite because both JUnit and Jersey need it.
Apache location for /Shibboleth.sso is ignored. All requests to /Shibboleth.sso are taken to Wordpress because it is set as DocumentRoot.
When a corpus query is rewritten during search, it is often required to afterwards find out the size of the corpus actually searched. As the rewritten KoralQuery can't easily be transformed back into a VC collection query (that is what Kalamar does), clients have to rely on a POST method for statistics, accepting the VC as a KQ in the payload.
Context "c" is recognized as token instead of char, e.g.
http://localhost:8089/api/v1.0/search?q=Wasser&ql=poliqarp&context=5-c,5-c
results in
"context": { "left": [ "token", 5 ], "right": [ "token", 5 ] },
This description is obsolete. See the comment below.
Virtual corpus (VC) query definitions can be complex and very lengthy. In the case of a static VC, the definition can be stored as static data in Krill. To access the data, Kustvakt should refer to the static-id instead of sending the corpus query.
System admins have not been set up for the new database. Some changes in the database cannot be tested via web-services without system admin accounts, such as if auto-hidden groups are created after publishing a virtual corpus.
The annotation tags and values in the SQL file at /src/main/resources/db/insert are dummy data used for the implementation of description web services.
The annotation description data from Kalamar should be imported to the database.
Logged in users can search for a virtual corpus by id, as well as per URL (pid).
Virtual corpora (VC) that can be searched are of type PREDEFINED or PUBLISHED. PROJECT VC can also be searched if the user is a member of the project group(s). VC creators/owners can always search for their VC.
When a user searches for a PUBLISHED VC for the first time, he/she is added to an auto-generated group associated with the VC. The PUBLISHED VC are only shown to the members of this group.
Different mechanisms should help to avoid resource-hungry processes in KorAP. Some are already in place in separate components, but they should further be evaluated in the context of our architecture.
Mechanisms that are useful:
(This was the result of a meeting with our colleagues from the BBAW.)
Describe user group related web-services in the wiki.
Collection rewrites attach an AND operation with availability values, which restrict resource access, to an existing collection query. The current approach inspects the nodes of a given collection query to determine whether a collection rewrite is needed. This approach is tricky and vulnerable to leaks.
Another approach would be to always perform the collection rewrite and normalize/optimize the collection query afterwards, similar to the boolean optimiser from Managing Gigabytes.
See https://github.com/KorAP/Krawfish-prototype/blob/master/lib/Krawfish/Koral/Util/Boolean.pm
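The "always rewrite, then normalize" idea could be sketched as follows in Python. This is an illustration only: the JSON shapes follow the general KoralQuery style (koral:doc, koral:docGroup, operation:and), but the exact keys used here, the function names, and the very small normalizer (flattening nested groups with the same operation and dropping duplicate operands) are assumptions, not Kustvakt or Krawfish code.

```python
def with_availability(collection, availability_regex):
    """Always AND an availability restriction onto the collection query."""
    restriction = {
        "@type": "koral:doc",
        "key": "availability",
        "match": "match:eq",
        "type": "type:regex",
        "value": availability_regex,
    }
    if collection is None:
        return restriction
    return {
        "@type": "koral:docGroup",
        "operation": "operation:and",
        "operands": [collection, restriction],
    }


def normalize(node):
    """Flatten nested groups of the same operation and drop duplicates.

    Relies only on associativity and idempotence of AND/OR, so the
    meaning of the query is unchanged.
    """
    if node.get("@type") != "koral:docGroup":
        return node
    operands = []
    for child in map(normalize, node["operands"]):
        same_op = (
            child.get("@type") == "koral:docGroup"
            and child["operation"] == node["operation"]
        )
        for item in child["operands"] if same_op else [child]:
            if item not in operands:
                operands.append(item)
    if len(operands) == 1:
        return operands[0]
    return {**node, "operands": operands}


# A user query that already carries the same restriction collapses
# back to a single document constraint after normalization.
user_query = with_availability(None, "CC-BY.*")
print(normalize(with_availability(user_query, "CC-BY.*")))
```

Because the restriction is attached unconditionally, there is no inspection step that could miss a case; redundancy introduced by the rewrite is removed afterwards instead.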
Publishing a virtual corpus means making it available to all users. Unlike PREDEFINED VC, however, PUBLISHED VC are not listed for all users.
When a VC is published, a user group of type HIDDEN is automatically generated by the system for the VC, and an access of type HIDDEN is added for the user group to the VC. The PUBLISHED VC are only shown to the members of this group and its creator.
HIDDEN group means that members of this group cannot see the group itself.
Virtual corpus (VC) content can be categorized into static and dynamic. The number of documents in a dynamic VC may change over time, while that of a static VC must remain the same. To establish a static VC, the static corpus rewrite adds a "publishDate" restriction, whose value is the date of VC creation, to the virtual corpus query.
When a user refers to a published VC in a search, e.g. "referTo marlin/published-vc", the user should be added to the hidden group of the VC. Thus, the VC will be included in the list of available VCs for that user.
Kustvakt should handle licenses that limit access to a corpus based on the number of users currently accessing it, e.g. Süddeutsche Zeitung, Januar 1995 - Dezember 1999 has the QAO-NC-LOC:ids-NU:1 license, which limits access to the corpus to only one user at a time.
Solution: default VC rewrite for "all corpora access" should exclude corpora with "NU" keywords in the license.
Currently, an invalid parameter value for page will silently be treated as not set and therefore replaced by the default value page=1.
In Kalamar, parameter p=0 returns an error instead.
We may want to add an error and return an empty result set for invalid page values (which makes erroneous API requests obvious to the user) or explicitly rewrite the page value and warn the user about that (which is not so obvious to the user though).
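The first option (reject invalid values instead of silently falling back to the default) might look like this. It is a minimal Python sketch under the assumption that pages are 1-based integers; the function name parse_page is hypothetical.

```python
def parse_page(raw, default=1):
    """Parse the `page` parameter strictly.

    Only a missing parameter falls back to the default; anything that is
    not an integer >= 1 raises an error instead of being replaced.
    """
    if raw is None:
        return default
    try:
        page = int(raw)
    except ValueError:
        raise ValueError(f"invalid page value: {raw!r}") from None
    if page < 1:
        raise ValueError(f"page must be >= 1, got {page}")
    return page


print(parse_page(None), parse_page("3"))
```

A caller would translate the raised error into an error response with an empty result set, which makes an erroneous request visible to the API user.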
As described in KorAP/Krill#58:
Krill should support search queries returning only all metadata without match snippets, thus allowing search on all data without license restrictions.
Metadata should be returned for every match regardless of redundancy.
This requires a change to the rewrite mechanism in Kustvakt as well. Once KorAP/Krill#58 is done, the implementation should follow the descriptions in the then closed Krill issue.