
gmtserver-admin's People

Contributors

esteban82, maxrjones, paulwessel, seisman

gmtserver-admin's Issues

Document the decisions we made about the naming, structure, etc.

Recently, we had a lot of discussions via GitHub issues and made some big changes to the GMT data server. GitHub issues are good places for discussion, but they get closed once we make the final decisions and then become less visible to other users and to us. Someday we may forget the decisions we made years ago and make the wrong choice.

We should write down the final decisions in the README file (or a new file), so that we can easily check if a change violates the "rules" we made.

server is not doing a git pull

I added a new file to gmtserver-admin and committed it yesterday. However, it is not on the gmtserver, and only after I did a manual git pull did the new material come over. There must be a problem with the srv_git_update.sh script that is run in crontab: the script runs, but nothing updates.

List the mirrors on the website?

The Europe mirror has been running for several months, but I believe most users don't know it exists.

In PR GenericMappingTools/website#73, I added a separate page listing the mirrors of both the FTP site and the data server. Please see if you agree with the changes.

In addition to that PR, we also need to mention the alternative mirrors in the following places:

Current SRTM15+ version SRTM15_V2.3.nc not found

The source file for the SRTM15+ datasets was not found, possibly due to a new release.

  • current version: SRTM15_V2.3.nc

To-do list:

  • Check https://topex.ucsd.edu/pub/srtm15_plus/ for a new release
  • Update recipes/earth_relief.recipe with the new file name and version
  • Run scripts/srv_downsampler_images.sh earth_relief from the gmtserver-admin top dir
  • Run scripts/srv_tiler.sh earth_relief from the gmtserver-admin top dir
  • Run make server-info from the gmtserver-admin top dir
  • Place the new earth_relief files on the GMT 'test' data server
  • Test the new files (e.g., https://github.com/GenericMappingTools/remote-datasets/blob/main/scripts/remote_map_check.sh)
  • Update srtm_version in .github/workflows/srtm-check.yml
  • Commit changes in a new branch and open a PR
  • Move files to GMT 'oceania' data server before merging PR

The naming of the remote files

Since we are getting close to the release of 6.1 we need to finalize decisions on names. To set the stage, let me paint the picture of the plans ahead. These will be accelerated should our NASA proposal be funded, but are likely to happen regardless. The idea is to make GMT the simplest tool for making maps with remote data. Given that we already serve earth_relief in various resolutions, and from 6.1 also in both pixel and gridline registration, you know the basics. Here are three things from the NASA proposal that will affect our work:

  1. For plotting maps (not computing), it will be allowed to not specify the resolution. I.e., I would just say gmt grdimage earth_relief -pdf map and GMT will select the appropriate grid resolution to render a map of the requested dimensions (implicitly 15 cm here) at a stated resolution (or higher). The stated resolution would be a new GMT default: GMT_IMAGE_RESOLUTION [300]. The reason for this is that the common man cannot be trusted to pick the right grid resolution when making a map. All of this is seamless and under the hood, and new data are automatically downloaded to the user after we refresh the server.
  2. We plan to add relief, gravity, and imagery for other planetary bodies (Mars, Moon, Venus, Mercury, etc). For Earth we have made a deal with EarthByte to distribute earth_age_xxy_g|p.grd and we will work with Sandwell to provide earth_gravity_xxy_g|p.grd as well.
  3. Because the initial download of the 15s earth_relief file takes a long time, we plan to split these files into tiles (similar to SRTM but larger) so that users only need to download the tiles they need; this will dramatically speed up response times and avoid 3.1 Gb initial downloads.
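As a rough sketch of how the automatic selection in point 1 could work (the increment list, the formula, and the function name are all illustrative — this is not the actual gmt_remote.c logic): given the map width, the GMT_IMAGE_RESOLUTION dpi, and the longitude span, pick the coarsest served increment that still renders at or above the stated dpi.

```shell
# Illustrative only: pick the coarsest grid increment (arc seconds) that
# still yields >= the requested dpi for a map of the given width and span.
pick_resolution() {
  local width_cm=$1 dpi=$2 span_deg=$3
  # served increments in arc seconds, coarse to fine (01d ... 15s)
  local incs="3600 1800 900 600 300 120 60 30 15" need inc best
  # arc seconds of data per rendered pixel at the requested dpi
  need=$(awk -v w="$width_cm" -v d="$dpi" -v s="$span_deg" \
    'BEGIN { printf "%.6f", (s * 3600.0) / (w / 2.54 * d) }')
  for inc in $incs; do
    best=$inc
    # stop at the first (coarsest) increment fine enough for the map
    awk -v i="$inc" -v n="$need" 'BEGIN { exit !(i <= n) }' && break
  done
  echo "$best"
}

# A global 15 cm map at 300 dpi needs ~731 arcsec per pixel, so the
# 10m (600 arcsec) grid is the coarsest that suffices:
pick_resolution 15 300 360   # → 600
```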

Given those plans, I would prefer to have a common naming scheme [and this also affects the organization of the server directories (#37) in one new way]. I imagine the layout and names would be something like this:

server
  earth
     relief
       earth_relief_xxy_g|p.grd
       ........
     gravity
       earth_gravity_xxy_g|p.grd
       ........
     mask
       earth_mask_xxy_g|p.grd [land = 1, water = 0]
     images
        earth_daytime_xxy_p.tif
         ........
        earth_nighttime_xxy_p.tif
 moon
    relief
       moon_relief_xxy_g|p.grd
....
 mars
     relief
        mars_relief_xxy_g|p.grd
     images
        .....

From this layout, I hope you understand why I do not support having the BlueMarble and BlackMarble be named that way in GMT. Those names mean something to those who are aware of them, but if you are not then it is not obvious what those data mean. I argue that earth_daytime and earth_nighttime (or similar) would be clearer and fit into the naming hierarchy of the above plan.
Other than the names, this layout means I believe we should move the earth_relief files to that new subdirectory. Since 6.1 is not out yet, now is the time to adopt a permanent directory structure that can easily accommodate new data types and planetary bodies.

A final reminder: when the user first accesses a remote file, we put up a notice with the reference and credits (such as for BlueMarble, etc.), e.g.,

grdinfo [NOTICE]: Earth Relief at 1x1 arc degrees from Gaussian Cartesian filtering (111 km fullwidth) of SRTM15+V2.1 [Tozer et al., 2019].

I hope you will approve of this plan. I would like feedback from @GenericMappingTools/core on this.

Update DOI reference for age grids

I believe this job will require

  1. Running grdedit on all the single-file age grids to change the remark setting. Probably a command that differs for each resolution, like this:
    gmt grdedit earth_age_06m_g.grd -D+s"Obtained by Gaussian Cartesian filtering (11.2 km fullwidth) from age.2020.1.GTS2012.1m.nc [Seton et al., 2020; http://dx.doi.org/10.1029/2020GC009214]"
  2. Update the gmt_data_server.txt with the dates for all age entries.
  3. JP2 tiles are unaffected since the data do not change and there is no citation metadata in those files.

Does this cover it all, @seisman?

Server update error

The file Wessel_GJI_Fig_5.txt that @PaulWessel recently added was not successfully updated on the GMT server.

Running rsync -a --delete cache ../data manually gives the following error:

rsync: failed to set times on "/export/gmtserver/gmt/data/cache": Operation not permitted (1)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
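A possible workaround, assuming the failure is rsync trying to set the modification time on a directory it does not own (the function name is mine, not part of the server scripts):

```shell
# Sketch of a workaround (assumption: the "failed to set times" error comes
# from rsync touching the mtime of a directory owned by another user).
sync_cache() {
  # --omit-dir-times: copy everything as before, but do not try to set
  # modification times on directories (the operation that was refused)
  rsync -a --delete --omit-dir-times "$1" "$2"
}

# e.g. sync_cache cache ../data
```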

Extend 1s and 3s coverage via REMA and ArcticDEM

FYI: From a tip by Brook Tozer, I am downloading REMA and ArcticDEM. These are DEMs with 8 meter pixel resolution, and the plan would be to filter them to 1s and 3s and add a bunch of new "SRTM" tiles beyond the ±60 latitude limitation. We are talking 1 Tb of data per dataset here, so probably not something I can do in time for 6.3, but it would just be an eventual update to earth_relief. The wget command is chugging away as we speak. I can see that the 8 Tb SSD on my new MacBook M1 Max will be full soon enough...

gmtserver-admin installed on gmtserver

I have cloned the gmtserver-admin repo to the gmtserver. As explained in GenericMappingTools/gmt#1645 I have renamed the old cache dir to cache-orig for now and added a symbolic link called cache that points to gmtserver-admin/cache. I have tested that this works as before.

I was going to set up a git pull -q call in crontab, but this CentOS has only git 1.8.3.1, which lacks the -C <path> option. I will see if we can get things updated before I set up an hourly refresh of the repo.
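Since git 1.8.3.1 predates `git -C` (added in 1.8.5), a crontab entry can get the same effect with a subshell cd; a minimal sketch (the function name is mine; the repo path is the one from this issue):

```shell
# Equivalent of `git -C "$repo" pull -q` for git < 1.8.5: cd inside a
# subshell so the caller's working directory is untouched.
update_repo() {
  ( cd "$1" && git pull -q )
}

# The crontab line might then look like (path per this issue):
# 0 * * * * (cd /export/gmtserver/gmt/gmtserver-admin && git pull -q)
```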

Update Makefile to include grav data?

In my local version of the Makefile I added a target to process the faa data:

earth-faa:
	scripts/srv_downsampler_grid.sh earth_faa
	scripts/srv_tiler.sh earth_faa

Then I run make earth-faa and get some messages in the terminal:

scripts/srv_tiler.sh: línea 168: printf: 11.2: número no válido
make: *** [Makefile:55: earth-faa] Error 1

I think the error is due to the fact that earth_faa_01m_g.grd doesn't exist. Apart from this, all the files created look fine.
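The error message above ("número no válido" is Spanish-locale bash for "invalid number") suggests srv_tiler.sh handed a fractional value like 11.2 to a printf integer conversion. A minimal reproduction of that failure mode (this is an assumption about the cause, not a reading of srv_tiler.sh line 168):

```shell
# bash's printf rejects a fractional value where %d expects an integer:
if ! printf '%d km\n' 11.2 2>/dev/null; then
  echo 'printf %d rejected 11.2'
fi
# A float conversion (or rounding beforehand) accepts it:
LC_ALL=C printf '%.1f km\n' 11.2   # prints "11.2 km"
```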

So, should we update the Makefile? Are the commands in the Makefile OK?

SRTM15+ vs SYNBATH V1.2

Hi @GenericMappingTools/gmt-contributors and @GenericMappingTools/core: David Sandwell et al. will release a new version of the SRTM15+ grid we use to derive @earth_relief_???.grd. Normally, we would just get it and rerun the processing (script-driven but run manually by me; not yet automated on the server [tbd]). However, there is a complication: they will also release SYNBATH v1.2. The difference is that the SYNBATH grid adds two additional synthetic improvements:

  1. Newly identified small seamounts are modeled as circular Gaussian shapes: the bathymetric Gaussian (of some height, with a fixed radius-to-height ratio and density contrast) whose VGG anomaly best matches the observed VGG is used to overwrite the regular inversion (which gives too-smooth shapes and too-low heights for these small seamounts).
  2. Using the full seafloor-roughness machinery of Goff, Jordan, etc., they model abyssal-hill roughness statistically, matching synthetic abyssal hills (using spreading directions from the age grid) to VGG anomalies, and fill that into areas where there are no bathymetry constraints.

The result of this is that SRTM15+ and SYNBATH differ in areas of no data constraints: SRTM15+ will be smooth in those areas due to the inversion of gravity superimposed on long-wavelength bathymetry only, while SYNBATH will have "anatomically correct" small seamounts of improved amplitude and slope as well as realistic seafloor roughness in younger unmapped seafloor.

The casual user just wanting a relief map will probably be happy with SYNBATH. Scientists working with the seafloor in any capacity are likely to want both versions, for different uses. I am in that camp myself, and the convenience of using @earth_relief means I would like to see both versions in GMT.

The question is how we handle this. Imagine we called this @earth_shape or something better. We are obviously not storing 41 Gb of earth_shape_01s_g.nc files since they are the same as earth_relief_01s_g.nc; it is only in the oceans that they differ. So if the oceans are 70% of Earth, then quite a few earth_relief tiles (at various resolutions) will get a corresponding earth_shape tile. The entire earth_relief_** directory tree is 51 Gb on the server. Not counting the 03s and 01s tiles it is only 3.2 Gb, and if 70% is representative then about 2.3 Gb. That is not much, if we can get it to work.

The paper describing these grids is concerned that users, especially casual ones, will be confused and think the oceans have been fully mapped. I think those concerns apply to users who only see these data in Google Earth. GMT is usually used by more science-literate folks, but we would need to be clear in the documentation about what the two flavors are.

Let me know your thoughts on this, including a good name that starts with earth_*.

Add a workflow checking for new releases of SRTM15+

We've discussed this before, but I couldn't find an issue tracking it. Prior to placing the SRTM15+v2.4 files on oceania, I would like to add and test a GitHub workflow that checks for new releases of the SRTM15+ grids. Since it looks like Dave Sandwell's group only includes the latest version at https://topex.ucsd.edu/pub/srtm15_plus/, I think we could just use curl to check if the file set as SRC_FILE in recipes/earth_relief.recipe exists.
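A sketch of that check (assuming the recipe stores the source as a plain SRC_FILE=... assignment; the function names are mine):

```shell
# Pull the expected source filename out of a recipe (format assumed to be
# a SRC_FILE=<name> line, possibly quoted):
src_file_from_recipe() {
  sed -n 's/^SRC_FILE=//p' "$1" | tr -d '"'
}

# Ask the server for headers only; --fail makes curl exit non-zero on a
# 404, which would signal that a new SRTM15+ release replaced the file:
check_release() {
  local base=$1 recipe=$2
  curl --silent --head --fail "$base/$(src_file_from_recipe "$recipe")" \
    > /dev/null
}

# e.g. check_release https://topex.ucsd.edu/pub/srtm15_plus recipes/earth_relief.recipe
```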

@PaulWessel and @seisman, is there a reason why there are not currently any GitHub actions workflows in this repository?

Adding a Europe data mirror

@joa-quim may have access to a U Algarve server that can mirror the GMT data and cache materials. My understanding is this is a CentOS server and there may be limited system-admin help to set things up. We need a crontab entry that runs a sync script, possibly just an rsync command, to keep his mirror in sync with oceania.generic-mapping-tools.org. There may be some further system setup to add SSL so that we can access this server via https. Finally, we need to add a forward from europe.generic-mapping-tools.org to that server.

Question about what files can/should be added to the gmt server

Two questions about what files can be added to the gmt server:

  1. Are files in gmtserver-admin/cache only for core GMT tests/example scripts or can files be added for PyGMT, GMT.jl, GMT/MEX tests/example scripts as well (e.g., GenericMappingTools/pygmt#1364)?
  2. What is the preferred file size limit for data added to gmtserver-admin/cache? The current largest file size is ~34 MB, with most files <5 MB. The file referenced in GenericMappingTools/pygmt#1364 is ~47 MB. It would be helpful to have file size guidelines for tests/examples.

Install GMT on the gmt server

I realize I cannot run my update_doi_age.sh on the server since it has no gmt installation.
Seems like that should be fixed, no, @seisman? This is CentOS-7. I am not sure what the best approach is here. We do not plan to do much cutting-edge GMT work on this machine, so it could just have a relatively recent GMT version from CentOS. What is the latest GMT on CentOS-7? As long as it is 6 or better, I think that is OK. Shall I just follow our INSTALL instructions?

Merge the GMT FTP server into Data Server?

The GMT FTP site (ftp://ftp.soest.hawaii.edu/gmt) is the site for distributing GMT tarballs. Before we migrated to GitHub, it was the only official site for downloading GMT tarballs.

As mentioned in the bus-factors repository, it has a bus factor of 1, as Paul is the only person who can manage files on the FTP server.

Considering that "FTP is almost dead" (https://filecamp.com/blog/top-5-reasons-ftp-dead/, https://www.ghacks.net/2019/08/16/google-chrome-82-wont-support-ftp-anymore/), should we migrate the GMT FTP server to the Data Server? For example, put all tarballs in the releases subdirectory?

Pros:

  1. the Data server uses HTTP or HTTPS, not FTP.
  2. @PaulWessel @joa-quim @leouieda @seisman all can manage the files in the data server, so the bus factor is 4.
  3. The data server already has many mirrors.

We can still keep the FTP server for backward compatibility, but we may retire it after 5-10 years.

Add more mirrors

I have now sent reminder emails to UNAVCO and NOAA (Walter), plus asked the S. Africa FTP mirror folks if they would be able to serve data as well, and also Dietmar at U Sydney. I am checking with @leouieda regarding S America - we have Eder doing the FTP mirror but I am not sure he has capacity. Once Leo responds I can send an inquiry that way as well. The main weakness today is North America, but getting a UNAVCO and a NOAA mirror would go a long way (US East Coast, US Mountain). Perhaps we can get Dave to host a us-west-coast mirror?

Originally posted by @PaulWessel in #18 (comment)

Add info for first time users in UH Server

I have these tips for first-time users of the University of Hawaii servers (or any others?). Where should I put them?

Log in to the server with ssh and then use groups to see if you belong to the gmt group (and thus have permission to work):

e.g.:

ssh [email protected]
(If the password is ok, then a message like this will appear)
Last login: Tue Aug 16 16:19:32 2022 from 190.246.68.195
-bash-4.2$ groups
gmt

Does the GMT FTP server support rsync?

The China mirror uses the rsync command to do the mirroring.

The command works well for the GMT data server:

rsync rsync://oceania.generic-mapping-tools.org/gmtdata/

but it doesn't work for the GMT FTP server:

$ rsync rsync://ftp.soest.hawaii.edu/gmt                              
rsync: [Receiver] failed to connect to ftp.soest.hawaii.edu (128.171.151.230): Connection refused (61)
rsync error: error in socket IO (code 10) at clientserver.c(137) [Receiver=3.2.3]

Due to the rsync failure, the China mirror has to "mirror" an unofficial mirror:

rsync rsync://gmt.mirror.ac.za/gmt/

which is not ideal and may be insecure.

Does the GMT FTP server support rsync? If not, can it be enabled?

Gridline-, pixel-, or both registrations?

The earth_relief_xxy DEMs we supply have different registrations: the SRTM 1s is pixel registered, as is the 15s SRTM15+v2. For 30s and up we downsample the 15s grid and write gridline-registered files. This was partly motivated by the registrations of past DEMs (like ETOPO1m) as well as wanting to match the registrations of other useful grids (e.g., crustal ages). However, as we add more remote grids and especially images, there will inevitably be a mismatch, and then we are forcing the user to do painful things like resample one item from pixel to grid or vice versa, losing the highest frequencies in the process.

Another solution is to supply both g and p versions on the downsampled versions. Thus, while 1s, 3s, 15s must remain in their original pixel registration, 30s and up could come in two flavors, e.g.

earth_relief_01m[p|g]

The way this would work is that if no p|g is indicated then you get the default registration [TBD], while if you are specific you get that particular version if it exists; otherwise you get a warning and we serve you the other one. Most users won't care about or understand the difference, but people who need a 1-minute DEM that matches the pixel registration of some global geotiff can ask for the pixel-registered DEM. I would only apply this scheme to geophysical data, not images (which will thus be pixel only).

I am contemplating this for my own needs as well: I make maps of crustal ages using shading from DEMs, and I analyze depth-age from the two grids, and it is always super-annoying if I need to switch registration, as I lose information. And of course, all the global geotiffs are pixel registered, so I would rather use a pixel-registered 01m DEM than fuss with the conversion myself.

If someone naively uses grdgradient on the gridline DEM and then uses that as intensities with a pixel image, we can give a helpful error message when the registrations clash.

I wish I could automate the selection but it seems difficult to know what the user needs ahead of time. Thoughts from @GenericMappingTools/gmt-contributors ?

Finalizing earth_day/night images

I have fussed about this earlier because I want to downsample these images the same way we do data sets, i.e., with a circular Gaussian Cartesian filter that is sensitive to latitude. gdal_translate does not do that. But I think I have the solution:

  1. Extract the red, green, and blue grids from the full-resolution image via grdmix.
  2. Apply the same processing we do for earth_relief to each of the band grids, including both pixel and gridline versions of the downsampled resolutions.
  3. Assemble the final images via grdmix and write geotiffs.

Now we have @earth_night_xxy[_g|p].tif and @earth_day_xxy[_g|p].tif.

I think this is the way to get this done, and done consistently. Any concerns, @joa-quim and @seisman ?

Organization of server data

Currently, we only have earth_relief_xxy files in the gmt/data directory (everything else is under gmt/data/cache). However, we are about to add both blue and black marbles and the global crustal ages, and it is likely there will be more data sets in the future that should not be considered for cache (since they will have multiple resolutions, etc.). To peek ahead, it is likely we will split large global items into tiles, similar to SRTM. Whether we do that right now or not, it seems we should think about organization. How about this:

gmt/data/cache: Odds and ends used for tests, examples, tutorials, etc.
gmt/data/server: Data served by us.  In here there will be subdirectories:
    earth_relief
    earth_ages
    earth_marble [maybe one each for black and blue, or some clever scheme]
    ...

Inside these directories are the actual files: earth_relief_xxy plus srtm1, srtm3 will be in the earth_relief folder, etc.

Perhaps the gmtserver needs to produce or maintain a listing of what is in server so that gmt can discover that we have added more data. We would at least need to know whether a dataset is tiled or not to know what to do. I think the decisions in gmt_remote.c that depend on the earth_relief resolution (get file or get tiles) need to be abstracted away and based on a setup file we refresh, just like we refresh the hashes.

Adding a north America data mirror

When we started the GMT data server project, we were envisioning a series of mirror servers around the world. These could be forwarded to as continent.generic-mapping-tools.org or possibly country.generic-mapping-tools.org. Continents seem like the way to start. We have the master at oceania. Per #15 we may get one for Europe in Portugal. North America definitely needs one or more. Do we go for states (california, colorado, newyork, etc.) or just north-america and let others add their own unofficial ones?

If we are to select a single North America server, then maybe I should approach UNAVCO about this instead of adding to Dave Sandwell's workload?

The data server fails to update

waveform_AV.DOL.txt is still not available in cache. Checking the GMT data server log, I see

Start Time: Thu Mar 18 03:00:01 HST 2021
fatal: Not a git repository (or any parent up to mount point /export/gmtserver)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /export/gmtserver)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /export/gmtserver)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
/export/gmtserver/gmt/gmtserver-admin/scripts/srv_git_update.sh: line 19: [: : integer expression expected
End Time: Thu Mar 18 03:00:01 HST 2021

Something wrong?

What to do when @earth_relief_xxy is given?

Here is my proposal as to what we do when remote files are specified without a choice for registration.

  • GMT <= 6.0.0: Links on the server pointing to the gridline-registered versions in the deeper directories. That way we stay backwards compatible with 6.0.0. Since 6.0.0 does not read gmt_data_server.txt it cannot access the other files on the server.
  • GMT 6.1: The principle is to avoid duplicating data. Thus, I suggest we do the same thing we do when there is no extension: we append it. Hence, a given filename of @earth_relief_02m should first have "_p" appended to it (or "_g" if there is no p version), and then we append ".grd" to make it a file name. This way there is never a file called earth_relief_02m.grd written to the user's directory, only the one in the earth/earth_relief directory. If earth_relief_xxy is actually a directory with tiles, we download the needed tiles, blend them, and return that grid. Therefore, we should not add links inside the earth/earth_relief directory to deal with a missing p or g, as that would only lead to duplication of files or require a more complicated implementation in GMT. We do not want to (and cannot) set symbolic links in the user's directory. I think a consequence of this for 6.1 is that we ignore any old grids. The reason is that they are already outdated by newer data, and if they had the right name and placement they would be re-downloaded anyway.
  • Error: Someone giving @earth_relief_xxy_p|g when there is no such file or directory.
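The proposed 6.1 lookup (append _p, or _g if there is no p, then .grd) could be sketched like this (the function name and directory argument are mine; this is a paraphrase of the proposal, not the gmt_remote.c implementation):

```shell
# Resolve a remote name like earth_relief_02m to an on-server filename:
# append _p (or _g if no pixel version exists) when no registration is
# given, then append the .grd extension.
resolve_remote() {
  local name=$1 dir=$2
  case "$name" in
    *_g|*_p) ;;  # registration requested explicitly; use as-is
    *) if [ -e "$dir/${name}_p.grd" ]; then
         name="${name}_p"               # prefer pixel registration
       else
         name="${name}_g"               # fall back to gridline
       fi ;;
  esac
  echo "${name}.grd"
}
```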

Cannot get past one tile

I want to download all the 30 second tiles:

gmt grdinfo @earth_relief_30s -G

However, I cannot get past this tile: N60W135, even if restarting after failure:

gmt grdinfo @earth_relief_30s -G    
grdblend [NOTICE]: Remote data courtesy of GMT data server oceania [http://oceania.generic-mapping-tools.org]

grdblend [NOTICE]: Earth Relief at 30x30 arc seconds from Gaussian Cartesian filtering (1.0 km fullwidth) of SRTM15+V2.1 [Tozer et al., 2019].
grdblend [NOTICE]:   -> Download 15x15 degree grid tile (earth_relief_30s_p): N60W135
grdblend [ERROR]: Libcurl Error: Couldn't resolve host name
grdblend [WARNING]: You can turn remote file download off by setting GMT_DATA_UPDATE_INTERVAL to "off"
grdconvert [ERROR]: File /Users/pwessel/.gmt/server/earth/earth_relief/earth_relief_30s_p/N60W135.earth_relief_30s_p.jp2 was not found
grdconvert [ERROR]: Cannot find file /Users/pwessel/.gmt/server/earth/earth_relief/earth_relief_30s_p/N60W135.earth_relief_30s_p.jp2
grdblend [ERROR]: ERROR - Unable to convert SRTM file /Users/pwessel/.gmt/server/earth/earth_relief/earth_relief_30s_p/N60W135.earth_relief_30s_p.jp2 to compressed netCDF format

It got 243 tiles before then, and I have 300 Gb of free space on my drive. Permissions on the server are consistent across the tiles. But the message above says the host name cannot be resolved, even though the network is up.

Any ideas? Pinging @joa-quim @seisman @meghanrjones

rsync the server data

Per SOEST IT staff, this is now configured and users need to run

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' <destination_directory>

where <destination_directory> is the full path to where they want to mirror these files on their local computer. The quotes are needed due to the * wildcard. I just tested this on my Mac and it ran fine. We have replaced the symlinks with actual files and directories. Let me know how this is working for you, @joa-quim and @seisman. I notice the files are created with rw for the owner only, but that is probably a umask setting for me rather than in general. Thus, you may need to do a

chmod -R og+r <destination_directory>

to make sure files are readable.

Rules for grids with two types of registrations

For good reasons, GMT 6.1 will serve up both pixel and gridline registered files if possible. For a particular increment, say 01m, it means the server has these two files:

  1. earth_relief_01m_g.grd
  2. earth_relief_01m_p.grd

If you specify any of these with @ then you get that file by that name written to your .gmt/server/earth/earth_relief folder.

If you instead ask for @earth_relief_01m, then what happens? Since those are the only names known to GMT <= 6.0.0, we have for backwards-compatibility reasons set links that point to the corresponding g versions, except when there is no g (i.e., for the 1, 3, 15s files).

Most users could not care less about p vs g and will be confused by suddenly having to pick one. Thus, having a default registration for all files we offer seems like a good idea. Experts can always get what they want, and our updated dataset descriptions will need to explain all this, including when it matters (to use a DEM with other data, both need the same registration; otherwise you must convert one of your files and lose lots of short-wavelength information).

However, what the default registration should be is not obvious, and in many cases it does not matter at all. Yet I think picking the pixel version makes sense from three lines of argument:

  1. For rectangular projections (e.g., Mercator), having pixels may mean the entire image that is projected and filled fits perfectly inside the domain. Gridline registered files will extend half a pixel outside and a clip path makes things only show inside.
  2. If you run grdimage -A to build an image directly then you are definitely already in pixel territory, and you may minimize resampling by starting with a pixel grid.
  3. The highest resolution data set (i.e., the source data) is usually only available in one registration and all lower resolution versions will be available in that registration. For relief, which I assume will always be our most used data, that registration is pixel.

Unless Dave agrees to make both g and p grids at 15s, it would seem that selecting p as the default registration avoids switching from p to g as a function of increment.

Thoughts, @joa-quim and @seisman ?

Make this repo public

Hi developers-

I set this repo up as a private repo since it seemed like it would have no value to outsiders and we don't really need people to comment on our server schemes. But isn't this wrong? Would it not be better to open this up? I think in light of our succession planning the answer is we should only have open repositories. Let me know what you think, @leouieda, @joa-quim, and @seisman, and if you agree I can make the changes.

Check status of GMT data server mirrors

Currently, the GMT data server has two mirrors: Oceania and Europe.

shields provides a simple way to check/show if a site is running. It shows that the Europe mirror is up but the Oceania mirror is down.

The possible reason is that the Oceania mirror returns a 403 Forbidden status code. Is it possible to let the Oceania mirror show an index page and return 200, like the Europe mirror does?

Need recipe for building land/ocean mask grids

Given that creating high-resolution land/ocean masks with grdlandmask takes a while, my earlier list of datasets for earth included earth_mask_xxy.grd grids. While I am swamped with revising the remote machinery, perhaps someone could build an srv_buildmask.sh script to create these files? They would be byte grids, I think (=nb). Unfortunately, only the old native GMT format supports bit grids, but we need these to be netCDF grids.
See the other scripts in the scripts folder for style. We would build these for the same spacings that we have DEMs for.
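A possible starting point for such a script (the script name comes from this issue, but the -D choice and increment list are guesses; the -N values implement land = 1, water = 0 from above, and =nb requests a netCDF byte grid):

```shell
#!/bin/bash
# srv_buildmask.sh sketch: build pixel-registered land/ocean byte masks.
# grdlandmask's -N takes values for ocean/land/lake/island/pond, so
# -N0/1/0/1/0 gives land = 1, water = 0 as described in the issue.
build_mask() {
  local inc=$1
  gmt grdlandmask -Rd -I"$inc" -r -Df \
    -N0/1/0/1/0 -G"earth_mask_${inc}_p.grd=nb"
}

# Same spacings we serve DEMs for (this list is illustrative):
# for inc in 01d 30m 20m 15m 10m 06m 05m 04m 03m 02m 01m; do
#   build_mask "$inc"
# done
```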

What to do if highest resolution grid has dumb increments?

In testing the mars_relief recipe on the highest-resolution MOLA grid (Mars_HRSC_MOLA_BlendDEM_Global_200mp_v2.tif) one learns that it is pixel registered and 106694 x 53347, with a 200 m pixel that works out to 12.1468873601 arc seconds. Not exactly a good number to divide into one degree: basically, there are 296.372222222 pixels per degree. Hence the trusty tiler, which tries to make 10x10 degree JP2 tiles, runs into massive

grdconvert [WARNING]: (e - x_min) must equal (NX + eps) * x_inc), where NX is an integer and |eps| <= 0.0001.

warnings, and the output is of course just junk since one tile cannot match the edge of the next. I had tentatively named this grid mars_relief_12s_p.nc, knowing it is not actually 12s. Of course, anyone studying Mars might want to make the highest quality map they can of a specific region and want that grid, but we are unable to tile it. So, the options I see are:

  1. Let the highest resolution of a data set be the one with a grid increment that divides into 1 degree and yields a whole integer.
  2. Let the highest resolution of a data set be the next standard increment (1,3,5,10,15, ...)
  3. Let the tiler script simply skip grids that cannot be tiled and we only serve it as a single grid (as we do for 6m and coarser).

In case 1, we find the first integer below 296.372222222 that divides nicely into 3600 is 288. Thus, one would select 12.5s as the increment and filter the 12.1468873601s grid marginally to yield a 12.5s grid. We may choose to name it to the nearest integer second for compliance with our patterns (so 12s or 13s). The original highest resolution grid would not be distributed.
In case 2, we know the answer is 15s so we simply produce solutions from 15s and up. The original highest resolution grid would not be distributed.
In case 3 we upload the untiled mars_relief_12s_p.nc grid (like we do for low resolutions) and start tiling at 15s. This means anyone attempting to cut a chunk off @mars_relief_12s will have to wait for the entire 3.1 Gb grid to be downloaded (once), but at least the highest resolution grid is distributed.
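The case 1 arithmetic can be checked directly. Here "divides nicely" is taken to mean the implied increment is a whole number of half-seconds, an assumption that reproduces the 12.5s answer:

```python
# 106694 pixel-registered columns span 360 degrees of longitude
cols, deg = 106694, 360
px_per_deg = cols / deg
print(px_per_deg)            # ~296.372 pixels per degree, not an integer

# Largest integer pixel count at or below that whose implied increment
# (3600/n arc seconds) is a whole number of half-seconds:
n = int(px_per_deg)
while (2 * 3600) % n:
    n -= 1
print(n, 3600 / n)           # 288 pixels/degree -> 12.5 arc seconds
```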

So the casual user might be fine with cases 1 or 2, while Marsophiles will complain that we messed up the high resolution data by filtering.

I don't like dumbing down the original, so I think we should pursue option 3. It is a simple test to add to the tiler to check if we have an integer number of nodes per degree, and if not we skip tiling that grid. In support of this, our unadulterated highest resolution netCDF grid is 3.1 Gb while the original TIF from NASA is 11 Gb, all due to our lossless compression and use of 16-bit integers with a granularity of 50 cm. Comments, please.
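The test the tiler needs might look like this (the tolerance is an assumption; the real script would do the equivalent check in shell or C):

```python
from math import isclose

def tileable(inc_arcsec, tol=1e-6):
    """True if the grid has a whole number of nodes per degree,
    so tile boundaries fall exactly on integer degrees."""
    nodes_per_degree = 3600 / inc_arcsec
    return isclose(nodes_per_degree, round(nodes_per_degree), abs_tol=tol)

print(tileable(15))             # True: 240 nodes per degree
print(tileable(12.1468873601))  # False: skip tiling, serve the single grid
```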

Refreshing tiles

Currently, the gmt_hash_server.txt file contains

  1. All the random files in the cache
  2. The old earth_relief_xxy.grd files

We never added the SRTM tiles since we assumed they will not change. This is probably not 100% true as new versions have come out. If that is the case we have no other way to refresh them than to tell users to remove the srtm? subdirectory.
We are about to distribute more tiled datasets and these tiles will change over time. The good news is that there are way fewer of these, so they could go into the gmt_hash_server.txt file.

Seems to me this is an acceptable trade-off: Add a few hundred tile files to the cache list, but not the 28000+ SRTM 1s and 3s tiles.
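Generating such a list might look like this; the hash program and the exact on-disk layout are assumptions here, not necessarily what gmt_hash_server.txt actually uses:

```shell
# hash_list DIR — print "sha256  path" lines for every *.nc file under DIR,
# skipping the bulky SRTM 1s/3s tile directories
hash_list() {
  find "$1" -type f -name '*.nc' ! -path '*srtm1*' ! -path '*srtm3*' \
    -exec sha256sum {} +
}
```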

OK with that, @joa-quim and @seisman ?

Directory structure on server

Right now, https://oceania.generic-mapping-tools.org points to www.soest.hawaii.edu:/gmt/data. That directory looks like this:

-bash-4.2$ ls
cache earth_relief_02m.grd earth_relief_05m.grd earth_relief_15m.grd earth_relief_30m.grd gmt_hash_server.txt gmt_md5_server.txt.orig srtm3
earth_relief_03m.grd earth_relief_06m.grd earth_relief_15s.grd earth_relief_30s.grd gmt_hash_server_previous.txt gmtserver-admin
earth_relief_01m.grd earth_relief_04m.grd earth_relief_10m.grd earth_relief_20m.grd earth_relief_60m.grd gmt_md5_server.txt srtm1

This is not ideal from the point of view of a mirror operator trying to mirror this via rsync or similar, since there is stuff here that should not be synced, such as the gmtserver-admin local repo, which contains the cache data (the cache you see is just a symbolic link to gmtserver-admin/cache). I think it would be better if the URL we used only showed

cache
earth_relief_*.grd
gmt_hash_server.txt
srtm[13]

We can do this by placing the gmtserver-admin local repo one level up (and changing the cache link accordingly). I would also like to delete the old gmt_md5_server.txt since it is not used by GMT 6. Do you see any issues with this?

Generalizing tiling operations

These notes are mostly for myself, to clarify how I did the tile blending, but also for GMT contributors to comment on in case I am proposing the wrong path.

Currently, we only offer two tiled data sets: SRTM 1s and SRTM 3s. They are not available everywhere (land only), so we also have a tiny srtm_tiles.nc grid that is true/false depending on whether a tile exists. Here is how GMT handles a request like gmt grdimage @earth_relief_01s -RFR -JM20c -B -pdf map:

  1. During module initialization (gmt_init_module) we call gmtlib_file_is_srtmrequest. It checks if this is one of the two earth_relief_01|3s files, and if so it sets ocean to true if we gave earth_relief instead of srtm_relief as the name. If not then we are done.
  2. Next, gmtlib_get_srtmlist is called. It builds the list of tile file names needed. To do this it uses the region (-R) rounded outwards to nearest tile size (here 1 degree). It then loops over all the imaginary tiles needed but if the relevant node in the grid srtm_tiles.nc is false we skip it since that tile is not present on the server. If ocean was determined to be true then we also add the file earth_relief_15s as the last file. The listfile is called =srtm##### (random unique ints) and placed in the temp dir. Its path is then used to replace the initial @earth_relief_01|3s file given to the module.
  3. When the grid is accessed by GMT_Read_Data, we recognize the grid name using the gmtlib_file_is_srtmlist function, and if true then we call gmtlib_assemble_srtm, which passes the list of tiles to grdblend. The resulting grid is passed back out to the module.
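Step 2 above, in Python pseudocode. The tile naming scheme and the availability lookup are simplified stand-ins for the real gmtlib_get_srtmlist and srtm_tiles.nc machinery:

```python
import math

def tile_list(west, east, south, north, available, tile_size=1):
    """Round the region outward to tile boundaries and emit one name per
    tile whose lower-left corner is marked present in `available` (a set
    of (lat, lon) pairs standing in for srtm_tiles.nc)."""
    names = []
    for lat in range(math.floor(south), math.ceil(north), tile_size):
        for lon in range(math.floor(west), math.ceil(east), tile_size):
            if (lat, lon) not in available:
                continue  # no tile on the server (e.g., open ocean)
            ns, ew = 'NS'[lat < 0], 'EW'[lon < 0]
            names.append(f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}.nc")
    return names

# Two tiles available in a small region off Brittany:
print(tile_list(-5.2, -3.8, 47.6, 48.4, {(47, -5), (48, -4)}))
```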

All this works fine, but we need to generalize this to work for any dataset with any tile size, etc. This means we must

  1. Use the dataset name (without _xxy) to mean a directory with tiles (instead of special names like srtm1). E.g., if we tiled the 15s grid then inside earth/earth_relief there would be a directory earth_relief_15s and no earth_relief_15s_p.grd file.
  2. Let the tile size be a variable
  3. Standardize the name of the tiles.nc grid
  4. Generalize the notion of a background default grid for areas with no tiles.

I think the information needed to do this simply goes into the gmt_data_server file which is loaded into memory during GMT_Create_Session.
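For instance, each tiled dataset could carry a few extra fields in that file. A hypothetical sketch (the column names and values are invented here; the real gmt_data_server format will differ):

```
# directory             dataset        inc  reg  tile_size  tiles_grid             background
earth/earth_relief/     earth_relief   15s  p    10         earth_relief_tiles.nc  earth_relief_15s_p
```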

We may also consider dropping the with-or-without-ocean option since it means different input names.

Downsampling SRTM15+V2.grd to other resolutions

Our beta earth_relief_xxy grids were downsampled by spherical Gaussian filtering. This means that the wavelengths of features anywhere on Earth were filtered the same way. However, because the original grid was produced via Cartesian gridding of strips, there is much higher spectral resolution in the east-west direction as we go to higher latitudes, since the number of bins per degree of longitude is fixed. Potentially, there is east-west information at high latitudes at wavelengths shorter than the pixel spacing at the Equator.

My question then is this: Do we continue to do the spherical filtering so all parts of the planet have the same spectral content, or do we use a filter that does not reach wider east-west as we go to high latitudes? Below is a series of plots. The first three are from Africa near (0,0). The first plot is the source (15s) while the others are 5m smoothed versions. The second plot is spherically filtered while the third is Cartesian filtered. There are some tiny differences between the last two:

[Figure panels: 15s source, 5m spherical-filtered, 5m Cartesian-filtered (Africa near 0,0)]

The next three are from NE Greenland at latitudes 75-80. Same order of plots and now you can see a clear difference in that the Cartesian filtering leaves more details in the east-west direction:

[Figure panels: 15s source, 5m spherical-filtered, 5m Cartesian-filtered (NE Greenland)]

I don't really like that the original grid is not spatially uniform, but I cannot do much about that. I tend to like the spherical treatment I gave these grids, but you may feel otherwise. What do you guys think, @GenericMappingTools/core?
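The latitude effect can be quantified: a spherical (great-circle) Gaussian of fixed metric width spans ever more 15s pixels east-west toward the poles, which is exactly the detail the Cartesian variant retains. A rough calculation, where the 9.26 km width is illustrative and not necessarily what the beta grids used:

```python
from math import cos, radians

km_per_deg = 111.2      # approximate length of one degree of latitude
width_km = 9.26         # illustrative filter width for a 5m output grid

# East-west footprint of the filter, in 15s pixels (240 per degree):
for lat in (0, 45, 77.5):
    deg_ew = width_km / (km_per_deg * cos(radians(lat)))
    print(lat, round(deg_ew * 240, 1))
```

At the Equator the footprint is about 20 pixels; at Greenland latitudes it is over four times wider, so spherical filtering smears away east-west detail there.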
