nansencenter / django-geo-spaas
GeoDjango apps for satellite data management in Geo-Scientific Platform as a Service
License: GNU General Public License v3.0
The key 'Revision' does not seem to exist in the dict keys, and the situations handled in geospaas.vocabularies.managers
are not tested:
if 'Revision' in platform.keys():
continue
I.e., the second line is not tested.
Check if this is really needed
Add unit tests for these situations
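A minimal sketch of how this branch could be unit-tested. The helper below is hypothetical (the real check lives inside a loop in geospaas.vocabularies.managers); it only mirrors the `if 'Revision' in platform.keys(): continue` logic so that both branches can be asserted:

```python
import unittest

def filter_platforms(platforms):
    """Hypothetical helper mirroring the managers' skip logic:
    drop pseudo-entries that carry a 'Revision' key."""
    kept = []
    for platform in platforms:
        if 'Revision' in platform.keys():
            continue
        kept.append(platform)
    return kept

class TestRevisionSkip(unittest.TestCase):
    def test_revision_entries_are_skipped(self):
        platforms = [{'Short_Name': 'AQUA'}, {'Revision': '2019-02-13'}]
        self.assertEqual(filter_platforms(platforms),
                         [{'Short_Name': 'AQUA'}])
```

If the check turns out to be needed, factoring it into a helper like this makes the `continue` branch trivially testable.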
When the database models are changed we need all the migrations to keep databases intact.
In 83ff64a, the file geospaas/catalog/migrations/0002_auto_20160705_1331.py
and other autogenerated migrations have been removed. That causes existing geo-spaas installations to fail.
Didn't you test? Remember to also test the dependent apps, like django-geo-spaas-svp-drifters.
This needs to be reverted and solved properly...
Please remember to also follow the Nansat conventions.
My problem is related to scatterometer data, but the solution could be relevant in other cases as well. In geo-spaas, a full scatterometer acquisition will always(?) overlap with other data since the dataset covers a full orbit. The solution in the nansat mapper (scatterometers.py) is to split the scene into four. An extra kwarg quartile is therefore added, with a default value of 0, i.e., the first quartile of the scene is opened.
If we want to add all quartiles to geospaas, we need to create a separate management command but we also need to ensure that this is run every time a scatterometer dataset is ingested. Otherwise we will only ingest the first quartile.
Solution: Ensure the dataset is correctly ingested by adding a new method, named after the nansat mapper, before regular nansat ingestion. The nansat ingestor should always try to run such methods before falling back to default.
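The proposed dispatch could look roughly like this. All names here are hypothetical, not the actual ingestor API; the sketch only shows the pattern of trying a mapper-named method before falling back to the default:

```python
# Sketch: before regular ingestion, look for a method named after the
# nansat mapper (e.g. 'scatterometers') and call it; otherwise fall back.
class NansatIngestor:
    def ingest_default(self, uri, **kwargs):
        return ['default: %s' % uri]

    def ingest_scatterometers(self, uri, **kwargs):
        # ingest all four quartiles instead of only quartile=0
        return ['%s?quartile=%d' % (uri, q) for q in range(4)]

    def ingest(self, uri, mapper_name='', **kwargs):
        method = getattr(self, 'ingest_' + mapper_name, None)
        if method is not None:
            return method(uri, **kwargs)
        return self.ingest_default(uri, **kwargs)
```

With this shape, ingesting a scatterometer file registers all four quartiles, while other datasets go through the default path untouched.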
In order to support a PostgreSQL database, the psycopg2 package needs to be added to the Docker image.
Intermediate models are needed only if there is extra information associated with the connection between two models. DatasetParameter and VisualizationParameter carry no such information and are therefore most likely obsolete.
The URI generic syntax consists of a hierarchical sequence of five components:
URI = scheme:[//authority]path[?query][#fragment]
DatasetURI.uri is presently not validated. This could become a big problem if a lot of data is wrongly ingested. We should add some validation in the get_or_create method of this model:
Check that the generic uri syntax is followed
Check that the uri points to an actual resource
Correct any wrong uris already added to the database - perhaps using migrations?
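A sketch of what such a check could look like, assuming it is called from get_or_create. The function name and the exact rules are assumptions; for local files the existence check is straightforward, while for http(s) a HEAD request would be needed (indicated but not performed here):

```python
import os
from urllib.parse import urlparse

def validate_dataset_uri(uri):
    """Sketch: check generic URI syntax, and for file:// URIs also
    check that the resource actually exists on disk."""
    parts = urlparse(uri)
    if not parts.scheme or not parts.path:
        raise ValueError('Invalid URI: %s' % uri)
    if parts.scheme == 'file' and not os.path.isfile(parts.path):
        raise ValueError('No such file: %s' % uri)
    # for http/https, a HEAD request (e.g. via urllib.request) could
    # verify that the resource responds; omitted to keep the sketch offline
    return True
```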
When a new dataset is ingested, we should always add CF variables to the parameters field. This would allow searches such as:
ds = Dataset.objects.filter(parameters__standard_name='wind_speed')
Important when the databases start to grow...
I am using the new ascat mapper in nansencenter/nansat@185ae9c, but the metadata is not registered when I ingest with the nansat_ingestor. I thought I had done it correctly. I'm leaving on holiday in 25 minutes. It would be good if @akorosov could look at it...
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: entry_title is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: entry_id is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: summary is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: ISO_topic_category is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: gcmd_location is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: data_center is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/nansat/domain.py:567: UserWarning: > 180 deg correction to longitudes - disabled..
warnings.warn("> 180 deg correction to longitudes - disabled..")
Sometimes it is nice to have an overview of the ingested files from the command line. We can add a command 'list', 'summary', or 'list_datasets' that simply prints the list of datasets for a given period, area, etc.
This command can extend the CommandBaseClass from #17
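The output formatting could be as simple as the sketch below. This is a pure helper with hypothetical names; in the real command, the rows would come from a Dataset queryset filtered by the base class's period/area arguments:

```python
def format_dataset_list(datasets):
    """Format rows for a hypothetical './manage.py list_datasets' command.
    Each item is a (entry_title, time_coverage_start) pair."""
    lines = ['%-40s %s' % ('entry_title', 'time_coverage_start')]
    for title, start in datasets:
        lines.append('%-40s %s' % (title, start))
    return '\n'.join(lines)
```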
The data center name is hardcoded to "NERSC". It should rather be:
sname = pti.get_gcmd_provider('nersc')['Short_Name']
Likewise,
isocatname = 'Oceans'
should become:
isocatname = pti.get_iso19115_topic_category('oceans')['iso_topic_category']
gcmd_location should also be added.
Since we are using try-except clauses for these cases, I am not sure how to best test this...
It seems that the migration fails, but it is still possible to connect to the virtual machine.
After logging in to the machine, the new environment must be activated manually with "source activate py3django".
If some Python packages were not properly installed, add them to the list in "provisioning/roles/nansencenter.django/tests/conda_env_requirements.yml".
TASK [geospaas : geospaas | Run migrate on django-geo-spaas project] ***********
fatal: [geospaas]: FAILED! => {"changed": false, "cmd": "./manage.py migrate --noinput --settings=project.settings --pythonpath=/vagrant/project", "msg": "\n:stderr: Traceback (most recent call last):\n File "./manage.py", line 15, in \n execute_from_command_line(sys.argv)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/core/management/init.py", line 381, in execute_from_command_line\n utility.execute()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/core/management/init.py", line 357, in execute\n django.setup()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/init.py", line 24, in setup\n apps.populate(settings.INSTALLED_APPS)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/apps/registry.py", line 112, in populate\n app_config.import_models()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/apps/config.py", line 198, in import_models\n self.models_module = import_module(models_module_name)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/importlib/init.py", line 126, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File "", line 994, in _gcd_import\n File "", line 971, in _find_and_load\n File "", line 955, in _find_and_load_unlocked\n File "", line 665, in _load_unlocked\n File "", line 678, in exec_module\n File "", line 219, in _call_with_frames_removed\n File "/vagrant/geospaas/nansat_ingestor/models.py", line 4, in \n from geospaas.nansat_ingestor.managers import DatasetManager\n File "/vagrant/geospaas/nansat_ingestor/managers.py", line 8, in \n from nansat.nansat import Nansat\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/init.py", line 37, in \n from nansat.domain import Domain\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/domain.py", line 23, in \n from nansat.tools import 
add_logger, initial_bearing, haversine, gdal, osr, ogr\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/tools.py", line 31, in \n from mpl_toolkits.basemap import Basemap\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/mpl_toolkits/basemap/init.py", line 155, in \n pyproj_datadir = os.environ['PROJ_LIB']\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/os.py", line 669, in getitem\n raise KeyError(key) from None\nKeyError: 'PROJ_LIB'\n", "path": "/home/vagrant/anaconda/envs/py3django/bin:/home/vagrant/anaconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games", "state": "absent", "syspath": ["/tmp/ansible_django_manage_payload_jSeB5z/ansible_django_manage_payload.zip", "/vagrant", "/usr/lib/python2.7", "/usr/lib/python2.7/plat-x86_64-linux-gnu", "/usr/lib/python2.7/lib-tk", "/usr/lib/python2.7/lib-old", "/usr/lib/python2.7/lib-dynload", "/usr/local/lib/python2.7/dist-packages", "/usr/lib/python2.7/dist-packages"]}
to retry, use: --limit @/vagrant/provisioning/site.retry
PLAY RECAP *********************************************************************
geospaas : ok=25 changed=2 unreachable=0 failed=1
Sometimes it would be great to have additional fields in the database, for example polarization and pass for SAR data, drogue_lost_date for SVP drifters, etc. Unfortunately, there is only one loophole (an available, nearly empty field) in the current model structure which allows you to push additional information/metadata: catalog.Dataset.summary. But using this field to accumulate additional metadata is a messy and inconvenient solution.
To keep the structure of the db generic, the ingester models are used just as proxies. Thus it is not possible to add fields to an ingester model without creating a whole new table.
I have not found an entirely consistent solution yet, but it may be possible to add a single field to catalog.models.Dataset which allows binding the Dataset to any other ("external") table/model from outside (from an ingester). The generic structure of geo-spaas would not change, but ingesters could define external tables and add additional metadata.
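One possible shape for that single binding field is Django's contenttypes machinery (GenericForeignKey), sketched below. The field names are assumptions, not an agreed design; the point is that one generic link can point at any model an ingester defines:

```python
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Dataset(models.Model):
    # ... existing catalog.Dataset fields ...

    # single generic link to any "external" metadata table
    # defined by an ingester app (hypothetical field names)
    external_type = models.ForeignKey(ContentType, null=True, blank=True,
                                      on_delete=models.SET_NULL)
    external_id = models.PositiveIntegerField(null=True, blank=True)
    external = GenericForeignKey('external_type', 'external_id')
```

An ingester could then attach, e.g., a SARMetadata row carrying polarization and pass without touching the catalog schema.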
There is a many-to-many relationship between the dataset table and the parameter table. Tests should be written for filtering datasets by their parameters. This works in the Django shell, and the tests should assert that the filtering is correct. The tests should consider the notes below:
The ingest or ingest_thredds_crawl command should be used for ingesting a new dataset (the nansatcrawl function is used for this purpose if the ingest_thredds_crawl command is used)
The parameters should be read from the Nansat object
The filtering itself: ds = Dataset.objects.filter(parameters__standard_name='wind_speed')
For this purpose, the Python code below (a snippet, not the exact code) can be used in a separate file:
import os
import django
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.project.settings")
django.setup()

# ingest a local file
import geospaas.nansat_ingestor.management.commands.ingest as ingest
command = ingest.Command()
command.handle(files=["/data/MER_FRS_1PNPDK20120303_093810_000000333112_00180_52349_3561.N1"], nansat_option=[])

# or crawl a thredds catalog
import geospaas.nansat_ingestor.management.commands.ingest_thredds_crawl as ingest_thredds_crawl
command = ingest_thredds_crawl.Command()
command.handle(url=["http://nbstds.met.no/thredds/catalog/NBS/S2A/2017/01/catalog.html"], date='2017/01/10')
Similar to nansencenter/nansat#450
What is this error message that I get when installing geo-spaas-vagrant:
"Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again."
Thanks
platform.short_name and instrument.short_name are used as the natural_key in Source and cannot be empty.
But for level-4 data the short_names are empty, because a higher-level GCMD platform and GCMD instrument must be used (e.g., ACTIVE REMOTE SENSING).
So far, a hack has been introduced in Nansat for GLOBCURRENT data: mapper_opendap_globcurrent sets instrument and platform to Jason-1.
Many processing commands run processing for a given period of time, a given area, and given file types.
Each such command must first find the matching files in the database.
We can add an abstract base class ProcessingBaseCommand(BaseCommand) that defines CLI arguments for this basic filtering and performs the filtering. Processing commands can then extend this class.
Such class can be added e.g. to geospaas.catalog.utils
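The shared filtering could boil down to building a common set of ORM filter kwargs, sketched below. The function and field names are assumptions (loosely following the catalog Dataset model); the extent handling is only indicated:

```python
def dataset_filter_kwargs(start='1900-01-01', end='2100-12-31', extent=None):
    """Sketch of the filter kwargs a ProcessingBaseCommand could build
    from its CLI arguments, to be passed to Dataset.objects.filter()."""
    kwargs = {
        'time_coverage_start__gte': start,
        'time_coverage_end__lte': end,
    }
    if extent is not None:
        # a geometry built from lon/lat bounds -> spatial lookup
        kwargs['geographic_location__geometry__intersects'] = extent
    return kwargs
```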
When we want to create parameters for latitude, the get command raises the "get() returned more than one Parameter -- it returned 2!" exception in this line:
This is because there are several latitude parameters in the database installed by the build_container.sh script.
It would also be good to check that the same problem does not occur for the longitude parameter.
In nansat_ingestor, source.specs is set to n.get_metadata('entry_title'). But that raises an error (something like a UNIQUE constraint error) when adding datasets from the same source but with different entry_title values.
Updating the vocabularies is quite slow: pythesint is forced to update its JSON files first. Maybe an option can be added to vocabularies.update_vocabularies that gets the metadata from the already existing JSON files, e.g.:
./manage.py update_vocabularies fast=True
Individual Vagrantfile for each repository in combination with small, modular ansible roles (also in separate repos) can benefit faster development.
Several models, like Platform and Instrument, have no uniqueness constraint (a natural_key() method does not guarantee uniqueness of the object). This makes them unsafe in a concurrent context, for example multi-threading. The following piece of code inserts three platforms instead of one:
import concurrent.futures

import geospaas.vocabularies.models

def create_platform():
    # without a uniqueness constraint, concurrent get_or_create calls
    # can each insert their own row
    _, _ = geospaas.vocabularies.models.Platform.objects.get_or_create({
        'Category': 'test',
        'Series_Entity': 'test',
        'Short_Name': 'test',
        'Long_Name': 'test'})

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for i in range(3):
        executor.submit(create_platform)
The models I have found which are affected by this issue are:
the vocabularies app models
GeographicLocation
Personnel
Role
A pretty straightforward solution is to add a unique=True parameter on one of the fields, or a unique_together constraint in the Meta of the Model class. This would be the simplest solution to integrate with the existing models.
For the vocabularies models, we could do either of those:
make short_name or long_name unique
add a unique_together constraint
This might be overkill and not really necessary, since a primary key is automatically generated by Django.
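The unique_together variant could look like the sketch below. The field names are illustrative (they loosely mirror the GCMD keys used in the reproduction snippet above, not necessarily the actual model definition):

```python
from django.db import models

class Platform(models.Model):
    category = models.CharField(max_length=100)
    series_entity = models.CharField(max_length=100)
    short_name = models.CharField(max_length=100)
    long_name = models.CharField(max_length=200)

    class Meta:
        # reject concurrent duplicates at the database level:
        # the second INSERT raises IntegrityError instead of
        # silently creating a duplicate row
        unique_together = (('category', 'series_entity',
                            'short_name', 'long_name'),)
```

With the constraint in place, get_or_create becomes safe under concurrency: the losing thread gets an IntegrityError it can catch and retry as a get().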
The ingest_thredds_crawl command ingests datasets available on thredds. We are interested in all services, not only opendap. However, at present the opendap service is ingested before the other services, using the default name and service specified in the DatasetURI model. This results in an IntegrityError, which has so far been ignored. Rather than ignoring it, the opendap uri should be added with the correct attributes. It is a pretty silly error which can be solved simply by changing the input to the dataset get_or_create method...
It is important to keep migrations under proper version control. Therefore, makemigrations should be run manually and added with git instead of this being done in the vm provisioning.
This code from commit ba6d44e is obviously wrong (what if the scheme is http?):
uri_parts = urlparse(uri)
if uri_parts.scheme == 'file' and uri_parts.netloc == 'localhost' and len(uri_parts.path) > 0:
return True
else:
raise ValueError('Invalid URI: %s' % uri)
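A minimal correction sketch: accept any known scheme rather than only 'file' with netloc 'localhost'. The set of allowed schemes below is an assumption, not a project decision:

```python
from urllib.parse import urlparse

# assumed set of schemes the catalog should accept
ALLOWED_SCHEMES = {'file', 'http', 'https', 'ftp'}

def validate_uri(uri):
    uri_parts = urlparse(uri)
    if uri_parts.scheme in ALLOWED_SCHEMES and len(uri_parts.path) > 0:
        return True
    raise ValueError('Invalid URI: %s' % uri)
```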
Sometimes the geometry looks really bad (especially for high-latitude scenes and also close to the dateline). It is greatly improved when the GCPs are first reprojected.
So far the viewer prints the coordinates of the displayed polygons into the HTML. If there are many polygons, that is very slow. It would be better to add the polygons using AJAX and a micro-service that generates them.
The current implementation in sea-ice-drifters may help.
The microservice could be reused, e.g., in Jupyter notebooks.
Speedup provisioning by using docker containers
I have made a couple of apps for in-situ data management, e.g.:
In addition, we have in-situ drifters etc.;
There is some repetitive code in the managers of those apps, and I think we could benefit from making a parent manager in django-geo-spaas.
So far the Dockerfile pins django==2.2. Otherwise, two errors are raised by the tests:
ERROR: test_search_loads (geospaas.viewer.tests.FormAndViewTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/django/template/defaulttags.py", line 1021, in find_library
return parser.libraries[name]
KeyError: 'staticfiles'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
django.template.exceptions.TemplateSyntaxError: 'staticfiles' is not a registered tag library. Must be one of:
admin_list
admin_modify
admin_urls
bootstrap_tags
cache
i18n
l10n
leaflet_tags
log
static
tz
======================================================================
ERROR: geospaas.catalog.tests (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: geospaas.catalog.tests
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/opt/conda/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/src/geospaas/catalog/tests.py", line 10, in <module>
from django.utils.six import StringIO
ModuleNotFoundError: No module named 'django.utils.six'
But Django is evolving and so should we.
Tests currently fail in all branches... I will look into it
For border points in the western hemisphere, negative longitude values are returned.
If a dataset crosses the dateline (180E), the border contains high positive values (near 180) and high negative values (near -180). The border gets distorted and it is impossible to search for it correctly.
Add an option to Domain.get_border() to fix the longitude values if the dataset crosses the dateline.
Call Nansat with this option from nansat_ingestor.
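The fix could amount to shifting negative longitudes into the 0-360 range when a crossing is detected, roughly as sketched below (the function name and the crossing heuristic are assumptions, not the actual Domain.get_border() API):

```python
def fix_border_longitudes(lons, threshold=180):
    """Sketch: if the longitude span suggests the border crosses the
    dateline, shift negative longitudes to the 0-360 range so the
    polygon is contiguous and searchable."""
    crosses_dateline = max(lons) - min(lons) > threshold
    if not crosses_dateline:
        return list(lons)
    return [lon + 360 if lon < 0 else lon for lon in lons]
```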
It is becoming important to have some documentation for this package. Sections to be added and filled out are (please edit and add if anything is missing):
Since these additions do not affect the working Python code, we do not need to create an issue branch for this.
By default, Django does not validate model fields before saving. This can, however, be done using the full_clean() method.
Consider whether we should validate model fields (in particular the Dataset model) when new instances are created (I have added a validator to the entry_id field)
Implement model validation before save() (possibly using signals)
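The signal variant could be sketched as below, using Django's pre_save signal to run full_clean() on every save. This is one possible wiring, not an agreed design; note that pre_save also fires for queryset-less bulk paths differently, so where to connect it would need discussion:

```python
from django.db.models.signals import pre_save
from django.dispatch import receiver

from geospaas.catalog.models import Dataset

@receiver(pre_save, sender=Dataset)
def validate_dataset(sender, instance, **kwargs):
    # full_clean() runs the field validators (e.g. the entry_id
    # validator) and raises ValidationError before the row is written
    instance.full_clean()
```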
Processing systems for SAR, NOAA NDBC buoys, Lance buoys, GNSSR, HAB, ASCAT wind, and AIS should be moved to separate repositories.
Proper citation is now required by many journals.
The entry_ID should be added to our Dataset model. It seems to be the right place to put, e.g., the station identification of a NOAA weather buoy with a common prefix. The selection of an entry_ID depends on how we define our datasets. For example, for the SVP drifter datasets we define one dataset per drifter in 10(?)-day intervals. Thus, the entry_ID should be carefully defined in each case...
Add field entry_id to Dataset
Write tests to ensure it is working
Make migrations
Add code to automatically set a unique entry_id in the nansat_ingestor
Make sure all ingestors add unique entry_id's
Add description to docs
Add migrations in all apps to update tables with correct entry_id's
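For cases where no natural identifier (such as a buoy station id) exists, the automatic entry_id could fall back to a prefixed UUID, roughly as below. Both the prefix and the format are assumptions for illustration:

```python
import uuid

def make_entry_id(prefix='NERSC'):
    """Sketch: generate a unique entry_id with a common prefix,
    for ingestors that have no natural identifier to use."""
    return '%s_%s' % (prefix, uuid.uuid4())
```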
The default value of the end parameter is 2010-12-31, which is too early:
def find_datasets(self, start='1900-01-01',
end='2010-12-31',
extent=None,
geojson=None,
mask='',
**kwargs):
The problem is not so severe, since the default value of the argument for the base command is 2100-12-31:
parser.add_argument('--end',
action='store',
metavar='YYYY-MM-DD',
default='2100-12-31',
help='End of time range',)
Anyway, it should probably be fixed.
The viewer should be able to search the databases by filtering the datasets on the desired parameters.
If you try to include geospaas directly from the installed egg, the viewer won't work because it cannot find its templates.
This can apparently be fixed by adding a MANIFEST.in file, or via the package_data parameter in the setup.py file.
Here is the doc about this.
This happened when I executed ./manage.py migrate from the shell.
This tutorial 'Using the Django test runner to test reusable applications' shows the commonly accepted directory structure for testing reusable applications.
Task:
To create testing environment following the tutorial.
Now, by default, a new version of Django (3.0) is installed in Docker. It conflicts with the outdated version of django-overleaf.