Comments (12)
please consider also this https://blog.r-hub.io/2021/07/30/cache/
from relter.
Two points on this issue.
The caching mechanism could be better implemented in the get_id function. This requires some work for managing internal and system variable and exactly define the flow. I stopped myself on this way after first steps in a branch on my fork. The point is that this sort of local cache could be useful to speed up one's work, but it does not fix the current issue,
e.g. during workshops.
A different approach would be that of implementing a reverse-proxy with caching capabilities. I installed on CNR-IREA internal swarm server a version (with some customization) of the docker image from https://github.com/djmaze/docker-caching-proxy.
I made some benchmark by setting the ReLTER::set_deims_base_url("") and invoking ReLTER::get_ilter_observedProperties(sitesNum = 100) and first results seem promising (50% time of the first -not cached- request).
I'm not an nginx expert and I'm sure it should be fine tuned, but it seems to me a possible strategy to ameliorate the workshop scenario.
from relter.
Hi Paolo:
Thanks for revisiting this. Maybe I am not following, but what is the use case for caching queries? Do we expect multiple queries from one user for the same data? This might happen in a workshop framework, as you mention, but that's a special case, no?
In a broader view, the audience for the ReLTER package would be researchers working on a particular site. They will generally get the site metadata from DEIMS once, then go on to search for real data, elsewhere. Is that your view also?
If that is indeed the intention, then why do we need, i.e. get_ilter_generalinfo()
to return a full list of all sites on DEIMS? (> 1000 sites). Why should we allow that query to run at all without at least a country_name
or a site_name
??
And more importantly, following Christoph's request, we need to always use his API to get site data and never request the whole list of sites. Unless I am missing something, we have not yet implemented that. If I am wrong, please correct me!
from relter.
Hi,
I agree with Micha as I struggling to see the use case here. I think that if we optimise the way the API is queried as well as add additional filter options for the API, then we should have covered most (potential) performance issues.
Setting up a dedicated caching layers seems to be a little too much. Most users are probably not interested in querying all site records and their contents at once.
We can, of course, discuss this during the upcoming eLTER Software workshop.
Best,
Christoph
from relter.
Dear @ptagliolato, @micha-silver and @stopopol.
The caching mechanism is not only for the large query but for improve the performances of all the queries from those dedicated to a single site (get_site_info()) to those dedicated to a network (get_network_observedProperties(), get_network_sites()). I thinking, for example, to have 100 simultaneous calls to obtain information from the list of sites of the LTER-Italia network.
The function get_ilter_generalinfo() was designed to get an overall idea of the global network and to answer network management questions (e.g. what is the altimetric distribution of sites in the ILTER network, what is the distribution of sites with respect to biogeographical regions, etc.). These queries will be made with low temporal frequency and during the training could be made simultaneously by a high number (20?) of people. In both cases, caching could be a good solution.
If we feel that the functions dedicated to ILTER are not important, we can also delete them, but before the workshop we cannot touch them.
from relter.
I've just had a look at the get_ilter_generalinfo function, but the description is not entirely correct (https://docs.ropensci.org/ReLTER/reference/get_ilter_generalinfo.html). There are only ~750 ILTER sites. If the function returns much more than that, it's actually querying all of DEIMS. Looking at the source code (https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_generalinfo.R) the function is actually doing that. The query parameter should be something like this: https://deims.org/api/sites?network=1aa7ccb2-a14b-43d6-90ac-5e0a6bc1d65b&verified=true
EDIT: This also applies to the other ILTER functions, e.g. https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_observedProperties.R
The functions are querying all of DEIMS instead of only querying the ILTER sites. Please update these functions.
EDIT 2: From what I can see, the functions also don't capitalise on the filter functionality of the API. The API can be filtered by site name (https://deims.org/api/sites?name=lago) and country (https://deims.org/api/sites?country=it).
from relter.
@stopopol Last commits (see #83) exploits the DEIMS API filter functionality you suggested (get_ilter_generalinfo). Moreover: it is now mandatory to set at least one of the two filters; the function now avoids any other request per-site (with the exception of cases where the first response from /sites/? returns only 3 sites, this is for temporary compatibility of old tests). I hope this solves the issue on the first function. Other get_ilter* functions must be modified or discarded (their status "questioning" accounts for this).
from relter.
@ptagliolato : mentioning here so I won't forget :-)
I think we need a function similar to get_ilter_generalinfo()
to query for networks. IIUC, There is, so far, no way to get the DEIMS ID of a network. Maybe the same function with an additional parameter?
from relter.
In order not to do the same mistake in communication, I would involve @stopopol and ask him for an appropriate API to retrieve network-only-related info (networkId, name, related networks for example). We can further discuss this online.
from relter.
@ptagliolato Good idea. But it will be after Lyon in any case.
from relter.
We can discuss feature requests in detail in Lyon or other dedicated (virtual) meetings. In principle, if you need additional filter and query functionality for the rest-api I'll be happy to implement but it depends on the field/entity type. networks should be more straightforward to do but there are different ways to realise it.
from relter.
@stopopol Looking at the workshop agenda, the best window for a meeting among us will be Wednesday.
from relter.
Related Issues (20)
- change the logo of the package
- pkgdown build error HOT 53
- Release ReLTER 1.2.0
- Remove dependency on RCurl and XML
- Normalize DESCRIPTION HOT 1
- Resources on dependencies limitations HOT 4
- Viewing maps HOT 1
- Size of repo - figures...
- get_site_MODIS doesn't seem to use credentials from setenv
- PhenoAppLyon
- taxon_id_pesi
- rm docs folder HOT 4
- remove rgeos from description file HOT 2
- What does Rdpack::reprompt() do? HOT 2
- error when knitting sites_contact.Rmd HOT 4
- No need to load testthat in individual test files HOT 1
- suggestion: use the withr package HOT 4
- Remove dependency on RCurl HOT 3
- A package that might be relevant for ReLTER
- Actions fail for no CRAN package (MODIStsp) HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from relter.