Comments (11)
+1, my vote would be a boolean setting to enable ERDDAP to use the values of X-Forwarded-Host
, X-Forwarded-Port
, and X-Forwarded-Proto
to build response URLs, fall back to the Host
header if not present, and fall back again to the baseUrl
setting if no headers are present.
from erddap.
Looking at other web frameworks, this is what they typically do:
- Generate relative URLs (relative to site root, so starting with /) by default
- Allow methods that generate URLs to request an absolute URL instead (e.g. absolute=TRUE) to override this where it can be useful (notably where the URL is going to be used outside of a web context, like in making an XML metadata document for instance).
- Allow ERDDAP installations to configure their default values for absolute URL construction in setup.xml (this will provide defaults for absolute URLs outside of the context of an HTTP request, like when generating the flag URLs in the Daily Report)
- Rely on request headers and validated X-Forwarded-* (and maybe the new Forwarded header) to override those default values in setup.xml. Of note, Tomcat provides RemoteIPValve which provides some of this functionality already and it is probably best to use it since it will also help the Tomcat logs have the proper values.
An ERDDAP server that is public facing is then probably best off generating relative urls and using the forwarded headers.
A private ERDDAP server that should redirect links to a public server is probably behind a proxy. The proxy server can then set appropriate X-Forwarded-* headers to ensure the generated links go to the public server (e.g. send X-Forwarded-Host: public-erddap.example.com).
Upside, non-proxied requests to erddap will then work fine as well and generate appropriate relative URLs.
I actually might have some space to do a first pass at a patch for this (it's impacting my own operations in different way). In addition, I'd update how ERDDAP gets the client IP address to use the same header scheme. I'd propose a configuration like
<!-- setup.xml -->
<!-- for backwards compatibility -->
<baseUrl>http://myerddap.example.com:8080</baseUrl>
<baseHttpsUrl>https://myerddap.example.com:8443</baseHttpsUrl>
<!--
These are the defaults for absolute URLs that are generated where we DON'T have a user request or the user request
is missing information.
If the block is omitted, we can pull from baseHttpsUrl to determine the defaults. Https is chosen since, if you have SSL enabled, using it is never bad but not using it can be bad.
-->
<urlSettings>
<host>myerddap.example.com</host>
<protocol>https</protocol>
<prefix>erddap</prefix> <!-- since you can move the ERDDAP WAR and change the prefix, we should define that here? -->
<port>8080</port>
</urlSettings>
// pseudo-code for URL building
public String make_url(uri_relative_to_prefix, HttpServletRequest request =null, bool make_absolute_url = False) {
// Defaults are from the urlSettings configuration
host = urlSettings.host;
protocol = urlSettings.protocol;
prefix = urlSettings.prefix;
port = urlSettings.port;
// Check and override with the HTTP request values (we can use RemoteIPValve to override all of these)
if (request !== null) {
host = request.getServerName();
protocol = request.getScheme();
port = request.getServerPort();
prefix = request.getContextPath();
// Of note, we can get the remote address like this
remote_addr = request.getRemoteAddr() ;
}
else {
make_absolute_urls = True; // we need to make absolute URLs in this case since we have no way to understand relative URLs
}
relative_uri = "/" + prefix + "/" + uri_relative_to_prefix;
if (make_absolute_urls) {
# Should probably do some sanitization here
return protocol + "://" + host + ":" + port + relative_uri
}
else {
return relative_uri;
}
}
RemoteIPValve will support using X-Forwarded-For and X-Forwarded-Proto then to get the Protocol and Remote Address. I think Server Name and Port should be fine if your proxy sets a proper Host header when proxying (to test). Overriding ContextPath might be harder and may need to be manually implemented - some Java frameworks have support at some versions and some don't, but I would propose X-Forwarded-Prefix or X-Forwarded-Path (instead of X-Script-Name) for consistency.
from erddap.
There is code that relies on the baseUrl being set, and using '/.' will likely cause strange behavior, so I agree that's not a good solution. Using relative URLs (or the headers) should be possible, though it will require updating every place that generates URLs individually.
from erddap.
The original decision to use absolute rather than relative urls was chance, but I think there are a few places where the absolute urls were very advantageous (sorry, I can't think of where).
If you decide to switch, it should be easy (famous last words), because all uses of baseUrl and baseHttpsUrl are funneled through 1 or 2 methods in EDStatic.
If you switch to relative, one situation to be aware of is all the places where private info is being passed (signing up for subscriptions, logging in, etc). These places rely on a method in EDStatic that pushes people to https.
from erddap.
I thought of one of the situations where absolute URLs are an advantage: in a grid of ERDDAPs https://coastwatch.pfeg.noaa.gov/erddap/download/grids.html#grids, this allows the secondary (not officially public) ERDDAPs to push users back to the main, public ERDDAP.
I think there are other cases where absolute urls are useful, but can't remember.
So maybe the best solution is to support both absolute and relative urls. This could be decided by the administrator by either supplying (for absolute) or not supplying (for relative) the baseUrl and baseHttpsUrls. Since the construction of the start of the URLs are handled by 1 or 2 methods in EDStatic, a few small changes there should allow for both options. That has the minor secondary benefit of leading to no change for administrators (unless they want relative urls) and no change for users (e.g., if any are screen scraping URLs).
from erddap.
Ah, I'd forgotten about our friend RemoteIPValve
. We could pretty easily add it into server.xml in the the docker-erddap image here:
https://github.com/axiom-data-science/docker-erddap/blob/main/update-server-xml.sh
Re: the context path, while I like the idea of respecting a X-Forwarded-Path
header I think ERDDAP may make assumptions that it's served at /erddap/
in other places and may break if that assumption isn't true, others can correct me.
from erddap.
@srstsavage is correct: in several places, ERDDAP code makes the assumption that the server is installed at /erddap/. Among other reasons, this assumption makes it easy to identify an ERDDAP URL (e.g., a data source that is a remote ERDDAP).
from erddap.
@turnbullerin, I don't think your use of will work because I think (sorry, it's hard for me to check right now to be certain), that the setup.xml file is read and processed first by turning it into a set of key=value pairs. So using a nested xml construct would require changing all that code (not impossible, but extra work).
And I warn again to everyone who blithely assumes they can easily make significant changes: it may seem easy to make a change (e.g., absolute to relative URLs) and for a "first pass" (as you say) it is, but that ignores all the little places scattered throughout the code that will suddenly become bugs that you won't catch (and which you won't see until various things stop working and people start reporting bugs or just start thinking that ERDDAP is yet another lame, buggy program). And it ignores the fact that there may be situations where the current system has advantages. So again my advice to Chris is: if you decide to make these changes, do them in a way that ensures that the current system can continue to be used with absolutely no changes to the way the current system works, but allow this optional system for people who want to experiment with it (if you choose to support it). But it is up to Chris (or whoever makes the changes which Chris will choose whether or not to accept).
I also want to caution against making changes just because something is now in fashion or "what other frameworks do". Note that netcdf-java development has basically gone sideways (few new features) for the last ~5 years as the code has been refactored and modernized. In this process, all existing code that used the netcdf-java library has been broken (probably a couple of times with different releases) and numerous bugs have been introduced in netcdf-java. It's a mess, and I pity the new developers who are trying to get everything working again. Refactoring should never be undertaken lightly. Bugs are horrible things. They can ruin a software package's reputation. If you refactor, you need to be 100% certain you're not introducing bugs.
I think many people are used to working on small programs (a couple hundred lines), where, if you make a mistake during refactoring, it is easy to find and fix. ERDDAP is now a large complex program. I'm not saying you shouldn't request/make changes like this. I'm just saying that it's a good idea to be far more cautious.
But that's advice from a (re)tired old guy.
from erddap.
Ah, right, another reason to keep setup.xml
flat is that we'll want to have all the settings assignable via ERDDAP_
environment variables, so maybe <urlHost>
, <urlPort>
, etc which map to ERDDAP_urlHost
, ERDDAP_urlPort
, etc or similar would work.
That's a fair and useful warning about the complexity of the codebase and using caution when making changes. I'll offer that this is exactly the kind of thing that a standard, comprehensive jUnit or similar test suite helps with. If the tests run out of the box in any environment and have good coverage, developers can feel more confident that their changes don't introduce unintended bugs because all expected functionality is tested. And obviously this is greatly helpful when upgrading dependencies, Java runtimes, etc. Working up such a suite would obviously be a large undertaking but it seems worthwhile to me if we want to attract a community of contributors.
from erddap.
Your comments about tests (especially about being more confident that changes causing bugs) are correct. Note that ERDDAP already has a huge number of tests (although not in jUnit). Those tests have uncovered unexpected problems related to code changes numerous times. I think Chris is evaluating moving the tests to some framework (Maven?). Although I tried to make them all easy to run in any environment (largely via the erddapTest and erddapBigTest directory), there are still a lot of tests dependent on other files or other things (e.g., ERDDAP, Postgres and Cassandra running on the local computer). And many tests occasionally fail because some remote dependency (e.g., a THREDDS or ERDDAP, or a given dataset in them) is temporarily unavailable. And many speed tests fail on a busy computer (e.g., when running lots of tests) but pass on a not busy computer. Basically, it wasn't and still isn't easy to make all of the tests easy for other people to run. It's a very worthy project, but definitely a time-consuming project and one that will be difficult or impossible to complete because ERDDAP interacts with so many types of live external services and changing datasets. It is well worth continuing to work towards making it easy for any developer to run the 80% of the tests that can be easily run.
That said, I think it would be a mistake to go with the common policy of assuming that the tests will catch all problems and that if all of the tests pass then the change can be accepted. ERDDAP is big and complex. There are a lot of subtle details. It's really hard to make the test coverage so complete that it catches all bad consequences of any possible change (e.g., are you going to test every generated URL on every type of web page?). I think it is much better in general to only make changes where you are pretty darn sure you fully understand all of the consequences and then use the tests as a backup system. If that isn't possible, maybe the change is not worth the risk. I know that is not how things are commonly done now but it still seems like the right way to do things. Again, it would be really bad if ERDDAP devolved into yet another buggy, unreliable program.
Again, that's my opinion. Chris is the person who has now accepted the responsibility for deciding which changes are accepted. He's the one who takes responsibility for stewarding ERDDAP and ensuring that ERDDAP is as bug free as possible and he will have to deal with any problems. Please be understanding of that responsibility. So if he decides a given change isn't worth the risk, please respect that decision.
from erddap.
All great points! It would be wonderful to be able to use the extensive effort that you put into the existing custom tests in a standard framework. Totally agree that even very comprehensive tests couldn't be blindly trusted, but it could give contributors a good sense that their changes were ready for review by others and prevent wasted time if an unexpected bug was introduced.
In my opinion GeoServer is a great analogue for what we're talking about: the code base is massive and complex, end users expect stability, and they have a very comprehensive test suite which ensures expected behaviors and performance, but there's a small team of project leads who understand the project vision and all parts of the code who can make the final say on all changes before merging.
from erddap.
Related Issues (20)
- HTML Entity Name munging in XML listings HOT 1
- Search Multiple ERDDAPs seems to be broken HOT 1
- Overly restrictive S3 limitations HOT 3
- Outer axis overlap between files HOT 1
- Using XInclude HOT 2
- Unicode attributes and localized metadata HOT 46
- Record requests in a structured format HOT 4
- Inaccurate varName in logs when printing for null value attribute HOT 11
- Error: 'Error { code=404; message="Not Found: Currently unknown HOT 1
- Wrong link in the iso19115.xml metadata extraction ! HOT 5
- Dataset Citation and display info
- Add a way for admins to disable/and or adjust the limits for the "Unusual activity" mail HOT 2
- Failing test cases through maven HOT 3
- "EDDTableAggregateRows" - Time, Latitude and Longitude min/max values are calculated from the first child only HOT 4
- Complex ncml test case EDDGridFromNcFilesTests.testNcml failing
- Please show loadDatasets progress diagnostics HOT 1
- How to request a protected dataset from a script ? HOT 19
- S3 Support HOT 1
- ERDDAP not sending emails when emailUserName is not an email address
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from erddap.