Invenio Resources module to create REST APIs
TODO: Please provide feature overview of module
Further documentation is available on https://invenio-records-resources.readthedocs.io/
Records REST APIs for Invenio.
Home Page: https://invenio-records-resources.readthedocs.io
License: MIT License
Invenio Resources module to create REST APIs
TODO: Please provide feature overview of module
Further documentation is available on https://invenio-records-resources.readthedocs.io/
(maybe this doesn't need to be configurable if ES caps at 10000)
This could be configured in invenio-records-rest, so maybe we want to be able to configure this still.
(honestly I am not a particularly big proponent of this)
The endpoint /api/rdm-records?from=2
should be possible with the new API like it was with invenio-records-rest in legacy API.
Look at what from
was doing starting from (hehe): https://github.com/inveniosoftware/invenio-records-rest/blob/32d220507c8af870a495a0540d1c7ec3e4a6dbbf/invenio_records_rest/views.py#L460
What is the difference between the
a) api.py Record,
b) the controller.py's RecordController and
c) the resources/record.py's RecordResource ?
I mean Record abstracts DB+ES record and adds a PID to it and RecordResource plays the role of the Flask-RESTful equivalent resource... but then what does the controller do? Record already knows about DB, ES... Like in the frontend, duplication of exact names is an indicator for me that we haven't settled on the responsibilities and their location.
Aggregation support from invenio-records-rest should be ported here.
We should normalize the way we generate links across the codebase. The main rule is: The business logic layer should not be tied to the presentation layer aka views.
Here is a list of places that break the above rule:
create_pid_redirected_error_handler
is trying to build a redirect link based on a view URL i.e using url_for
.As a developer, I would like to easily extend the search engine with new features, so that I don't have to write a lot of code.
For instance, I want to add spatial querying.
The developer should not have to overwrite the service nor the resource methods. Just provide options in the configs.
URL parsing is already done in Flask-Resources. The search engine should have a query evaluator/interpreter that is passed all arguments (q, sort, page, pin, size, communities, ...). It will then "interpret/evaluate" the parameters by creating the Elasticsearch DSL object.
problem
It is not clear how the reponse building should treat links. The problem arises from the fact the needed attributes are not available in the places where it is needed (controller?)
context
I think these methods' responsibility is to wrap things up in Flask responses... but they also seem to be serializing with that links and prettyprint is that normal?
@ppanero prettyprint is part of the flask reponse. It specifies the tabbing.
The links, I assumed it would be part of the response no matter the type of serialization. Do we think that sort of answer (no links) would be used/needed?
Once this question is answered, we get into the separation of concerns. There we need to ask ourselves: Are links part of the record that goes in the response, or part of the response per se. IMHO, the second one, since is part of the REST paradigm.
See api.py#81
metadata = record_hit["_source"]
revision_id = record_hit["_version"]
# TODO: Py <3.7 cannot handle : in the offset.
# Cannot use %Y-%m-%dT%H:%M:%S.%f%z
updated = parser.parse(metadata["_updated"])
created = parser.parse(metadata["_created"])
old code:
return json.dumps(
dict(
hits=dict(hits=serialized_records_list, total=total),
# TODO FIX global links
links="",
aggregations=aggregations,
),
**self._format_args(response_ctx.get("prettyprint", False))
)
Current
Tests are not being cleaned correctly: indexes are not created/deleted properly. A fake index must be setup and base_app
must be used rather than app
in the fixtures.
Previous
The get (search) endpoint is failing because the fetcher can't retrieve 'recid'
key from returned data
:
invenio_resources/service/service.py:158: in search
pid = cls.fetcher()(data=record, record_uuid=None)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
record_uuid = None
data = {'_access': {'metadata_restricted': False, 'files_restricted': False}, '_owners': [1], '_created_by': 1, 'access_right...d title', 'type': 'Other', 'lang': 'eng'}], 'publication_date': '2020-07-08', '_publication_date_search': '2020-07-08'}
def recid_fetcher(record_uuid, data):
"""Legacy way to fetch a record's identifiers.
:param record_uuid: The record UUID.
:param data: The record metadata.
:returns: A :data:`invenio_pidstore.fetchers.FetchedPID` instance.
"""
pid_field = current_app.config['PIDSTORE_RECID_FIELD']
return FetchedPID(
provider=RecordIdProvider,
pid_type=RecordIdProvider.pid_type,
> pid_value=str(data[pid_field]),
)
E KeyError: 'recid'
.venv/lib/python3.6/site-packages/invenio_pidstore/fetchers.py:67: KeyError
Needs investigation.
As a developer installing InvenioRDM, I should be able to configure certain aspects of the Resource and Service without jumping through hoops. For instance, it should be possible and documented how to change the sorting or faceting of bibliographic records (this module should provide the capacity to do so).
In an instance's invenio.cfg, it should be possible to do:
class MyBibliographicRecordResourceConfig(BibliographicRecordResource):
# my overrides
RDM_RECORDS_BIBLIOGRAPHIC_RECORD_CONFIG = MyBibliographicRecordResourceConfig
This is related to #92 .
Should be able to configure permission back as well.
Each record of the search result hits, should have their individual links serialized as well.
how to inject into serialized (stringified) records?
Package version (if known): 0.4.X
When performing a search on /rdm-records
the returned hits have null values for updated
and created
fields.
/rdm-records
"created": null,
"updated": null,
These should have the UTC string of the respective updated nad created fields.
This breaks the search because there is string manipulation of the dates but they are javascript null
at this point. The use of lodash's _.get()
was not enough to prevent this problem because lodash tests on undefined
(not a present field of value null
).
Package version (if known): master (0.6.x)
New links generation
test_pagination
/test_resource_pagination
or test_faceting
Generated links always have: page, size, sort. Optionally have aggregations if passed in the first place
In https://github.com/inveniosoftware/invenio-records-resources/blob/b9ba4bd9241c20fa039466825f136b44e15f2715/invenio_records_resources/errors.py only the handler is used (?) the other two exceptions are never used.
Add black to the run-tests.sh
so it is checked or remove black linting.
Our approach so far makes it very easy for a module like invenio-rdm-records to define its own BibliographicResourceConfig or BibliographicServiceConfig that (eventually) inherit from the config defined here in records-resources. However this approach doesn't take into account the use case where an instance might want to further customize these configuration objects.
For instance, changing the API urls is not possible per instance.
This is tricky to accomplish because .as_blueprint()
is called at the top-level in views.py
outside of an application context...
Potential solutions:
EDIT
ResourceConfig
can be defined in the config.py and fetched with an application context. (I think).The service layer needs to know about the UI routes in order to provide links to HTML page of records.
Right now it assumes the UI route is the API route without the api/
part which is obviously not always the case.
Related to #45
It would be nice to have the same interface for a Record coming from the DB and one from ES because then
Currently invenio_records_resources.resources.record:RecordResource
has two functions:
def _make_item_body(self, item_result):
"""Make the body content."""
# Resolve links
item_result.links.resolve(config=self.config.links_config)
# Create the result
res = item_result.record
# Add errors if present
if item_result.errors:
res['errors'] = item_result.errors
return res
def _make_list_body(self, list_result):
"""Make the body content for a list item."""
return {
"hits": {
"hits": [
self._make_item_body(record)
for record in list_result.records
],
"total": list_result.total
},
"links": list_result.links,
"aggregations": list_result.aggregations
}
Which are used for instance like this:
def read(self):
"""Read an item."""
item = self.service.read(
id_=resource_requestctx.route["pid_value"],
identity=g.identity
)
return self._make_item_body(item), 200
Key questions arising from this:
read()
resource method not returning a response?_make_item/list_body
need to transform the data again?Currently we don't return a response in read()
because we want to avoid repeating two lines that do the actual serialization in flask_resources.responses.Response
.
Perhaps it makes sense that in-fact a resource has more control over the serialization?
If the service results (RecordItem
, RecordList
) looked differently, they could perhaps be returned directly instead? Only the link resolving needs to be run by the resource (because the service doesn't know about the URLs).
RecordItem
A record item could likely be reduced to simply:
RecordList
Would have to mimick the structure that _make_list_body
creates.
How to filter fields according to permissions.
A user might have access to restricted files or extension metadata, but other might not. however, the second might still have access to metadata. How do we filter that without leaking info (e.g. checksums, etc.).
old code:
def process_metadata(self, record, response_ctx, *args, **kwargs):
return (
copy.deepcopy(record.replace_refs())
if self.replace_refs
else record.dumps()
)
The current ConfigLoaderMixin does not allow for running outside the Flask application context because it is accessing the Flask application configuration.
This means you have to do this:
r = Resource(config=ResourceConfig, service=Service(config=ServiceConfig))
app.register_blueprint(r.as_blueprint('records'))
but the config is already defined as default config:
class Resource:
default_config = ResourceConfing
class Service:
default_config = ServiceConfing
Instead, you should just do like this:
r = Resource(service=Service())
app.register_blueprint(r.as_blueprint('records'))
Zenodo supports aggregations (done), filters, post_filters (done). Filters are left to support.
Question:
What is the use case for filtering? It seems to be used for communities, but it is unclear to me what the general problem it is trying to solve.
Following user-stories are basically about filtering record metadata according to what a given identity can access.
User stories
As an uploader/admin, I want to see the full metadata including files of a closed access record
As a visitor, I want to see the full metadata excluding files of a closed access record
Example
When getting a record:
GET /record/12345-12345 HTTP/1.1
The service layer will do the following actions:
The presentation layer will then afterwards:
Considerations
Examples of filtering metadata
_files
from the metadata (closed access record - meaning a record that's publicly visible, but where the files are restricted).See many comments and problems identified in inveniosoftware/flask-resources#15. They were not addressed there because it should not be invenio-tight.
From base.py
to flask-resources/context.py
.
Syntactic sugar: isn't it nicer to have minter, data_validator as properties in the RecordService class so you can write e.g self.data_validator.validate(data)?
Serialization / Deserialization / Marshmallow
It dawns on me that marshmallow is both a record model (the fields + validation), a record serializer (json.py!), a loader (expects this shape)... There is a murkiness here, we might want to clear up by decree. We can say: here is what marshmallow will do for us and the things it won't.
When we get rid of the dependency to invenio-rest
we should apply an application errorhandler for the HTTPException. E.g
from flask_resources.errors import handle_http_exception
from werkzeug.exceptions import HTTPException
def init_app():
...
app.register_error_handler(HTTPException, handle_http_exception)
...
The current sep
branch is having test failures on Elasticsearch 6.
See Travis
The search URL should accept sort
as a URL args and sort results appropriately.
Example: https://localhost/api/records/?sort=mostrecent&size=10&page=2
The search URLs with sorting are now:
Browser: https://127.0.0.1:5000/search?q=Richard%20smith&sort=bestmatch&order=asc&p=1&s=10&l=list
API: https://127.0.0.1:5000/api/rdm-records?q=Richard%20smith&sort=bestmatch&page=1&size=10
Note how this is different from what it used to be:
Browser: https://127.0.0.1:5000/search?q=Richard%20smith&sort=bestmatch&order=desc&p=1&s=10&l=list
API: https://127.0.0.1:5000/api/rdm-records?q=Richard%20smith&sort=-bestmatch&page=1&size=10
In other words: the meaning of sort=bestmatch
on the API changed from meaning "lowest best match to highest best match" to mean "highest best match to lowest best match" just like sort=mostrecent
means "highest updated date to lowest updated date".
Do we keep this change, or try to reproduce what Invenio-records-rest had?
I made those changes (target me ๐ฏ) because ascending / descending logic didn't seem to make sense for both mostrecent and bestmatch. My take is that bestmatch and mostrecent imply an ordering, so "order" should be used to reverse that order (or keep it the same)... The goal was also to get something working for the release quickly. But I don't mind trying to keep same logic as invenio-records-rest (some more work in records-resources needed to achieve that though).
Package version (if known): v0.6.2
Search preference is not enabled and it could potentially lead to bouncing effect.
services/records/service.py::search()
It was disabled due to the lack of user-agent
and ip
related information. This info should be extracted and set in the resource_requestctx
, then the resource should pass it to the service.
The old search object uses the request
object https://github.com/inveniosoftware/invenio-search/blob/81f4d23a1f24fe0d1c2fb7b1a9c64fd7870daed3/invenio_search/api.py#L150 it should be migrated in a way that is not accessed and just passed https://github.com/inveniosoftware/invenio-search/blob/81f4d23a1f24fe0d1c2fb7b1a9c64fd7870daed3/invenio_search/api.py#L270
Values should be accessed at the resource level and passed to the service search? (maybe is the service who calculates the preference (hash) based on what the resource passes.
Why:
When a PidDeletedError is raised, a record
is associated with the error. We should be able to serialize somehow this record in the error response so users can return any information they want. We need to investigate how to serialize this response.
In Invenio-Records-Resources and Invenio-Drafts-Resources.
/records
: GET, POST/records/:id
: GET, DELETE/records/:id/versions
GET, POST/records/:id/files
: GET/records/:id/files/:key
GET/records/:id/files/:key/download
GET/user/records
GET/records/:id/draft
: GET, POST, PUT, DELETE/records/:id/draft/actions/:action
: POST/records/:id/draft/files
GET, POST, PUT, DELETE/records/:id/draft/files/:key
: GET, PUT, DELETE/records/:id/draft/files/:key/download
GETCurrently, with an index receiver, you're able to enrich/filter a record prior to having it indexed in Elasticsearch.
Currently, this is being used to e.g.:
Generating the data that will be serialized in the case of pagination or hypermedia API links (link generation) is business logic. How this data is serialized is serializer logic. This was apparent by XML serializer wanting to reuse business logic used by JSON serializer.
Where to place this business logic is not clear cut. Either in the resource unit or in the service...
Migrate the existing Elasticsearch completion suggester from Invenio-Records-REST:
class SuggestResource(MethodView):
"""Resource for records suggests."""
view_name = '{0}_suggest'
def __init__(self, suggesters, search_class=None, **kwargs):
"""Constructor."""
self.suggesters = suggesters
self.search_class = search_class
def get(self, **kwargs):
"""Get suggestions."""
completions = []
size = request.values.get('size', type=int)
for k in self.suggesters.keys():
val = request.values.get(k)
if val:
# Get completion suggestions
opts = copy.deepcopy(self.suggesters[k])
# Context suggester compatibility adjustment
if 'context' in opts.get('completion', {}):
context_key = 'context'
elif 'contexts' in opts.get('completion', {}):
context_key = 'contexts'
else:
context_key = None
if context_key:
ctx_field = opts['completion'][context_key]
ctx_val = request.values.get(ctx_field)
if not ctx_val:
raise SuggestMissingContextRESTError
opts['completion'][context_key] = {
ctx_field: ctx_val
}
if size:
opts['completion']['size'] = size
completions.append((k, val, opts))
if not completions:
raise SuggestNoCompletionsRESTError(
', '.join(sorted(self.suggesters.keys())))
# Add completions
s = self.search_class()
for field, val, opts in completions:
source = opts.pop('_source', None)
if source is not None and ES_VERSION[0] >= 5:
s = s.source(source).suggest(field, val, **opts)
else:
s = s.suggest(field, val, **opts)
if ES_VERSION[0] == 2:
# Execute search
response = s.execute_suggest().to_dict()
for field, _, _ in completions:
for resp in response[field]:
for op in resp['options']:
if 'payload' in op:
op['_source'] = copy.deepcopy(op['payload'])
elif ES_VERSION[0] >= 5:
response = s.execute().to_dict()['suggest']
result = dict()
for field, val, opts in completions:
result[field] = response[field]
return make_response(jsonify(result))
Suggestion for service API could look like this:
grant_service.suggest('<value>', 'suggest-conf-name', context={'funder': '...'})
Package version (if known): latest master - v0.6.0
When creating a record, the record.commit
operation sets the record.model.version_id
to 2, consequently the record.revision_id
's value is 1. The increment happens when merging the session (https://github.com/inveniosoftware/invenio-records/blob/master/invenio_records/api.py#L408). There is another operation performing potential operations over the record.
tests/services/test_service.py::test_simple_flow
. Adding assert item.revision_id == 0
after the creation line 38. Can be seen in the travis build of #112revision_id
should be 0.
Tried deactivating components. For example the AccessComponent
performas a record.update
operation. My thinking was: This might flag as modified but not commit, then when merging there are two operations that are flushed. It was not the case.
It is part of json.dumps
but the serializer should not care about the response formatting
old code:
@staticmethod
def _format_args(prettyprint=False):
"""Get JSON dump indentation and separates."""
if prettyprint:
return dict(indent=2, separators=(", ", ": "),)
else:
return dict(indent=None, separators=(",", ":"),)
Example of implementation in flask-resources
, it was removed from there because no default class needed to implement it.
class SerializableMixin:
"""Serializable Interface.
Objects that will be serialized must implement it.
"""
@property
def object():
"""Returns the object itself."""
raise NotImplementedError()
@property
def id():
"""Returns the object id."""
raise NotImplementedError()
@property
def version():
"""Returns the object version or revision."""
raise NotImplementedError()
@property
def last_modified():
"""Returns the date of the last modification."""
raise NotImplementedError()
invenio-records-rest supported a "from" query string parameter, but it seemed to be for legacy purposes. As such we didn't implement it in the new invenio-records-resources search functionality. Is that correct or should "from" be supported and if so what should its meaning be?
In the new implementation of resources
,services
, etc. there are a lot of config classes passed that they require specific attributes to be set. It would be useful to validate these classes and throw appropriate errors to users and avoid having implicit errors that can distract us from finding the source of the problem.
A solution could be a basic metaclass that these config classes are inheriting from. That metaclass has access to all attributes of the subclasses, thus it could run validation on them
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.