nansencenter / django-geo-spaas
GeoDjango apps for satellite data management in Geo-Scientific Platform as a Service
License: GNU General Public License v3.0
The key 'Revision' does not seem to exist in the dict keys, and the situations handled in geospaas.vocabularies.managers
are not tested:
if 'Revision' in platform.keys():
continue
I.e., the second line is not tested.
Check if this is really needed
Add unit tests for these situations
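A minimal sketch of how this branch could be unit-tested. The helper below is hypothetical (the real check lives inside a loop in geospaas.vocabularies.managers); it only mirrors the `if 'Revision' in platform.keys(): continue` logic so that both branches can be asserted:

```python
import unittest

def filter_platforms(platforms):
    """Hypothetical helper mirroring the managers' skip logic:
    drop pseudo-entries that carry a 'Revision' key."""
    kept = []
    for platform in platforms:
        if 'Revision' in platform.keys():
            continue
        kept.append(platform)
    return kept

class TestRevisionSkip(unittest.TestCase):
    def test_revision_entries_are_skipped(self):
        platforms = [{'Short_Name': 'AQUA'}, {'Revision': '2019-02-13'}]
        self.assertEqual(filter_platforms(platforms),
                         [{'Short_Name': 'AQUA'}])
```

If the check turns out to be needed, factoring it into a helper like this makes the `continue` branch trivially testable.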
When the database models are changed we need all the migrations to keep databases intact.
In 83ff64a, the file geospaas/catalog/migrations/0002_auto_20160705_1331.py
and other autogenerated migrations have been removed. That causes existing geo-spaas installations to fail.
Didn't you test? Remember to also test the dependent apps, like django-geo-spaas-svp-drifters.
This needs to be reverted and solved properly...
Please remember to also follow the Nansat conventions.
My problem is related to scatterometer data, but the solution could be relevant in other cases as well. In geo-spaas, a full scatterometer acquisition will always(?) overlap with other data since the dataset covers a full orbit. The solution in the nansat mapper (scatterometers.py) is to split the scene into four. An extra kwarg quartile is therefore added, with a default value of 0, i.e., the first quartile of the scene is opened.
If we want to add all quartiles to geospaas, we need to create a separate management command but we also need to ensure that this is run every time a scatterometer dataset is ingested. Otherwise we will only ingest the first quartile.
Solution: Ensure the dataset is correctly ingested by adding a new method, named after the nansat mapper, before regular nansat ingestion. The nansat ingestor should always try to run such methods before falling back to default.
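The proposed dispatch could look roughly like this. All names here are hypothetical, not the actual ingestor API; the sketch only shows the pattern of trying a mapper-named method before falling back to the default:

```python
# Sketch: before regular ingestion, look for a method named after the
# nansat mapper (e.g. 'scatterometers') and call it; otherwise fall back.
class NansatIngestor:
    def ingest_default(self, uri, **kwargs):
        return ['default: %s' % uri]

    def ingest_scatterometers(self, uri, **kwargs):
        # ingest all four quartiles instead of only quartile=0
        return ['%s?quartile=%d' % (uri, q) for q in range(4)]

    def ingest(self, uri, mapper_name='', **kwargs):
        method = getattr(self, 'ingest_' + mapper_name, None)
        if method is not None:
            return method(uri, **kwargs)
        return self.ingest_default(uri, **kwargs)
```

With this shape, ingesting a scatterometer file registers all four quartiles, while other datasets go through the default path untouched.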
In order to support a PostgreSQL database, the psycopg2 package needs to be added to the Docker image.
Intermediate models are needed only if there is extra information associated with the connection between two models. DatasetParameter and VisualizationParameter carry no such information and are therefore most likely obsolete.
The URI generic syntax consists of a hierarchical sequence of five components:
URI = scheme:[//authority]path[?query][#fragment]
DatasetURI.uri is presently not validated. This could become a big problem if a lot of data is wrongly ingested. We should add some validation in the get_or_create method of this model:
Check that the generic uri syntax is followed
Check that the uri points to an actual resource
Correct any wrong uris already added to the database - perhaps using migrations?
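A sketch of what such a check could look like, assuming it is called from get_or_create. The function name and the exact rules are assumptions; for local files the existence check is straightforward, while for http(s) a HEAD request would be needed (indicated but not performed here):

```python
import os
from urllib.parse import urlparse

def validate_dataset_uri(uri):
    """Sketch: check generic URI syntax, and for file:// URIs also
    check that the resource actually exists on disk."""
    parts = urlparse(uri)
    if not parts.scheme or not parts.path:
        raise ValueError('Invalid URI: %s' % uri)
    if parts.scheme == 'file' and not os.path.isfile(parts.path):
        raise ValueError('No such file: %s' % uri)
    # for http/https, a HEAD request (e.g. via urllib.request) could
    # verify that the resource responds; omitted to keep the sketch offline
    return True
```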
When a new dataset is ingested, we should always add CF variables to the parameters field. This would allow searches such as:
ds = Dataset.objects.filter(parameters__standard_name='wind_speed')
Important when the databases start to grow...
I am using the new ascat mapper in nansencenter/nansat@185ae9c, but the metadata is not registered when I ingest with the nansat_ingestor. I thought I had done it correctly. I'm leaving on holiday in 25 minutes. It would be good if @akorosov could look at it...
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: entry_title is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: entry_id is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:76: UserWarning: summary is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: ISO_topic_category is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: gcmd_location is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/geospaas/nansat_ingestor/managers.py:86: UserWarning: data_center is not provided in Nansat metadata!
warnings.warn('%s is not provided in Nansat metadata!' % name)
/home/vagrant/miniconda/lib/python2.7/site-packages/nansat/domain.py:567: UserWarning: > 180 deg correction to longitudes - disabled..
warnings.warn("> 180 deg correction to longitudes - disabled..")
Sometimes it is nice to have an overview of the ingested files from the command line. We can add a command 'list', 'summary', or 'list_datasets' that simply prints the list of datasets for a given period, area, etc.
This command can extend the CommandBaseClass from #17
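The output formatting could be as simple as the sketch below. This is a pure helper with hypothetical names; in the real command, the rows would come from a Dataset queryset filtered by the base class's period/area arguments:

```python
def format_dataset_list(datasets):
    """Format rows for a hypothetical './manage.py list_datasets' command.
    Each item is a (entry_title, time_coverage_start) pair."""
    lines = ['%-40s %s' % ('entry_title', 'time_coverage_start')]
    for title, start in datasets:
        lines.append('%-40s %s' % (title, start))
    return '\n'.join(lines)
```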
The data center name is hardcoded to "NERSC". It should rather be:
sname = pti.get_gcmd_provider('nersc')['Short_Name']
Likewise,
isocatname = 'Oceans'
should become:
isocatname = pti.get_iso19115_topic_category('oceans')['iso_topic_category']
gcmd_location should also be added.
Since we are using try-except clauses for these cases, I am not sure how to best test this...
It seems that the migration fails, but it is still possible to connect to the virtual machine.
After logging in to the machine, the new environment must be activated manually with "source activate py3django".
If some Python packages were not properly installed, add them to the list in "provisioning/roles/nansencenter.django/tests/conda_env_requirements.yml".
TASK [geospaas : geospaas | Run migrate on django-geo-spaas project] ***********
fatal: [geospaas]: FAILED! => {"changed": false, "cmd": "./manage.py migrate --noinput --settings=project.settings --pythonpath=/vagrant/project", "msg": "\n:stderr: Traceback (most recent call last):\n File "./manage.py", line 15, in \n execute_from_command_line(sys.argv)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/core/management/init.py", line 381, in execute_from_command_line\n utility.execute()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/core/management/init.py", line 357, in execute\n django.setup()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/init.py", line 24, in setup\n apps.populate(settings.INSTALLED_APPS)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/apps/registry.py", line 112, in populate\n app_config.import_models()\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/django/apps/config.py", line 198, in import_models\n self.models_module = import_module(models_module_name)\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/importlib/init.py", line 126, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File "", line 994, in _gcd_import\n File "", line 971, in _find_and_load\n File "", line 955, in _find_and_load_unlocked\n File "", line 665, in _load_unlocked\n File "", line 678, in exec_module\n File "", line 219, in _call_with_frames_removed\n File "/vagrant/geospaas/nansat_ingestor/models.py", line 4, in \n from geospaas.nansat_ingestor.managers import DatasetManager\n File "/vagrant/geospaas/nansat_ingestor/managers.py", line 8, in \n from nansat.nansat import Nansat\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/init.py", line 37, in \n from nansat.domain import Domain\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/domain.py", line 23, in \n from nansat.tools import 
add_logger, initial_bearing, haversine, gdal, osr, ogr\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/nansat/tools.py", line 31, in \n from mpl_toolkits.basemap import Basemap\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/site-packages/mpl_toolkits/basemap/init.py", line 155, in \n pyproj_datadir = os.environ['PROJ_LIB']\n File "/home/vagrant/anaconda/envs/py3django/lib/python3.6/os.py", line 669, in getitem\n raise KeyError(key) from None\nKeyError: 'PROJ_LIB'\n", "path": "/home/vagrant/anaconda/envs/py3django/bin:/home/vagrant/anaconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games", "state": "absent", "syspath": ["/tmp/ansible_django_manage_payload_jSeB5z/ansible_django_manage_payload.zip", "/vagrant", "/usr/lib/python2.7", "/usr/lib/python2.7/plat-x86_64-linux-gnu", "/usr/lib/python2.7/lib-tk", "/usr/lib/python2.7/lib-old", "/usr/lib/python2.7/lib-dynload", "/usr/local/lib/python2.7/dist-packages", "/usr/lib/python2.7/dist-packages"]}
to retry, use: --limit @/vagrant/provisioning/site.retry
PLAY RECAP *********************************************************************
geospaas : ok=25 changed=2 unreachable=0 failed=1
Sometimes it would be great to have additional fields in the database, for example polarization and pass for SAR data, drogue_lost_date for SVP drifters, etc. Unfortunately, there is only one loophole (an available, nearly empty field) in the current model structure which allows you to push additional information/metadata: catalog.Dataset.summary. But using this field to accumulate additional metadata is a messy and inconvenient solution.
To keep the structure of the db generic, the ingester models are used just as proxies. Thus it is not possible to add fields to an ingester model without creating a whole new table.
I have not found an entirely consistent solution yet, but it may be possible to add a single field to catalog.models.Dataset which allows binding the Dataset to any other ("external") table/model from outside (from an ingester). The generic structure of geo-spaas would not change, but ingesters could define external tables and add additional metadata.
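One possible shape for that single binding field is Django's contenttypes machinery (GenericForeignKey), sketched below. The field names are assumptions, not an agreed design; the point is that one generic link can point at any model an ingester defines:

```python
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Dataset(models.Model):
    # ... existing catalog.Dataset fields ...

    # single generic link to any "external" metadata table
    # defined by an ingester app (hypothetical field names)
    external_type = models.ForeignKey(ContentType, null=True, blank=True,
                                      on_delete=models.SET_NULL)
    external_id = models.PositiveIntegerField(null=True, blank=True)
    external = GenericForeignKey('external_type', 'external_id')
```

An ingester could then attach, e.g., a SARMetadata row carrying polarization and pass without touching the catalog schema.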
There is a many-to-many relationship between the dataset table and the parameter table. Tests should be written for filtering datasets by their parameters. This works in the Django shell, and the tests should assert that the filtering is correct. The tests should consider the notes below:
The ingest or ingest_thredds_crawl command should be used for ingesting a new dataset (the nansatcrawl function is used for this purpose if the ingest_thredds_crawl command is used)
The parameters should be read from the Nansat object
The filtering itself: ds = Dataset.objects.filter(parameters__standard_name='wind_speed')
For this purpose, the Python code below (a snippet, not the exact code) can be used in a separate file:
import os
import django
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.project.settings")
django.setup()

# ingest a local file
import geospaas.nansat_ingestor.management.commands.ingest as ingest
command = ingest.Command()
command.handle(files=["/data/MER_FRS_1PNPDK20120303_093810_000000333112_00180_52349_3561.N1"], nansat_option=[])

# or crawl a thredds catalog
import geospaas.nansat_ingestor.management.commands.ingest_thredds_crawl as ingest_thredds_crawl
command = ingest_thredds_crawl.Command()
command.handle(url=["http://nbstds.met.no/thredds/catalog/NBS/S2A/2017/01/catalog.html"], date='2017/01/10')
Similar to nansencenter/nansat#450
What is this error message that I get when installing geo-spaas-vagrant:
"Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again."
Thanks
platform.short_name and instrument.short_name are used as the natural_key in Source and cannot be empty.
But for level-4 data the short_names are empty, because a higher-level GCMD platform and GCMD instrument must be used (e.g., ACTIVE REMOTE SENSING).
So far, a hack has been introduced in Nansat for GLOBCURRENT data: mapper_opendap_globcurrent sets instrument and platform to Jason-1.
Many processing commands run processing for a given period of time, a given area, and given file types.
Each such command must first find the matching files in the database.
We can add an abstract base class ProcessingBaseCommand(BaseCommand) that defines CLI arguments for this basic filtering and performs the filtering. Processing commands can then extend this class.
Such class can be added e.g. to geospaas.catalog.utils
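The shared filtering could boil down to building a common set of ORM filter kwargs, sketched below. The function and field names are assumptions (loosely following the catalog Dataset model); the extent handling is only indicated:

```python
def dataset_filter_kwargs(start='1900-01-01', end='2100-12-31', extent=None):
    """Sketch of the filter kwargs a ProcessingBaseCommand could build
    from its CLI arguments, to be passed to Dataset.objects.filter()."""
    kwargs = {
        'time_coverage_start__gte': start,
        'time_coverage_end__lte': end,
    }
    if extent is not None:
        # a geometry built from lon/lat bounds -> spatial lookup
        kwargs['geographic_location__geometry__intersects'] = extent
    return kwargs
```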
When we want to create parameters for latitude, the get command raises the "get() returned more than one Parameter -- it returned 2!" exception in this line:
This is because there are several latitude parameters in the database installed by the build_container.sh script.
It would also be good to check that the same problem does not occur for the longitude parameter.
In nansat_ingestor, source.specs is set to n.get_metadata('entry_title'). But that raises an error (something like a UNIQUE constraint error) when adding datasets from the same source but with different entry_title values.
Updating the vocabularies is quite slow: pythesint is forced to update its JSON files first. Maybe an option can be added to vocabularies.update_vocabularies that gets the metadata from the already existing JSON files, e.g.:
./manage.py update_vocabularies fast=True
Individual Vagrantfile for each repository in combination with small, modular ansible roles (also in separate repos) can benefit faster development.
Several models, like Platform and Instrument, have no uniqueness constraint (a natural_key() method does not guarantee uniqueness of the object). This makes them unsafe in a concurrent context, for example multi-threading. The following piece of code inserts three platforms instead of one:
import concurrent.futures

import geospaas.vocabularies.models

def create_platform():
    # without a uniqueness constraint, concurrent get_or_create calls
    # can each insert their own row
    _, _ = geospaas.vocabularies.models.Platform.objects.get_or_create({
        'Category': 'test',
        'Series_Entity': 'test',
        'Short_Name': 'test',
        'Long_Name': 'test'})

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for i in range(3):
        executor.submit(create_platform)
The models I have found which are affected by this issue are:
the vocabularies app models
GeographicLocation
Personnel
Role
A pretty straightforward solution is to add a unique=True parameter on one of the fields, or a unique_together constraint in the Meta of the Model class. This would be the simplest solution to integrate with the existing models.
For the vocabularies models, we could do either of those:
make short_name or long_name unique
add a unique_together constraint
This might be overkill and not really necessary, since a primary key is automatically generated by Django.
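The unique_together variant could look like the sketch below. The field names are illustrative (they loosely mirror the GCMD keys used in the reproduction snippet above, not necessarily the actual model definition):

```python
from django.db import models

class Platform(models.Model):
    category = models.CharField(max_length=100)
    series_entity = models.CharField(max_length=100)
    short_name = models.CharField(max_length=100)
    long_name = models.CharField(max_length=200)

    class Meta:
        # reject concurrent duplicates at the database level:
        # the second INSERT raises IntegrityError instead of
        # silently creating a duplicate row
        unique_together = (('category', 'series_entity',
                            'short_name', 'long_name'),)
```

With the constraint in place, get_or_create becomes safe under concurrency: the losing thread gets an IntegrityError it can catch and retry as a get().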
The ingest_thredds_crawl command ingests datasets available on thredds. We are interested in all services, not only opendap. However, at present the opendap service is ingested before the other services, using the default name and service specified in the DatasetURI model. This results in an IntegrityError, which has so far been ignored. Rather than ignoring it, the opendap uri should be added with the correct attributes. It is a pretty silly error which can be solved simply by changing the input to the dataset get_or_create method...
It is important to keep migrations under proper version control. Therefore, makemigrations should be run manually and added with git instead of this being done in the vm provisioning.
This code from commit ba6d44e is obviously wrong (what if the scheme is http?):
uri_parts = urlparse(uri)
if uri_parts.scheme == 'file' and uri_parts.netloc == 'localhost' and len(uri_parts.path) > 0:
return True
else:
raise ValueError('Invalid URI: %s' % uri)
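A minimal correction sketch: accept any known scheme rather than only 'file' with netloc 'localhost'. The set of allowed schemes below is an assumption, not a project decision:

```python
from urllib.parse import urlparse

# assumed set of schemes the catalog should accept
ALLOWED_SCHEMES = {'file', 'http', 'https', 'ftp'}

def validate_uri(uri):
    uri_parts = urlparse(uri)
    if uri_parts.scheme in ALLOWED_SCHEMES and len(uri_parts.path) > 0:
        return True
    raise ValueError('Invalid URI: %s' % uri)
```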
Sometimes the geometry looks really bad (especially for high-latitude scenes and also close to the dateline). It is greatly improved when the GCPs are first reprojected.
So far the viewer prints the coordinates of the displayed polygons into the HTML. If there are many polygons, that is very slow. It would be better to add the polygons using AJAX and a micro-service that generates them.
The current implementation in sea-ice-drifters may help.
The microservice could be reused, e.g., in Jupyter notebooks.
Speedup provisioning by using docker containers
I have made a couple of apps for in-situ data management, e.g.:
In addition, we have in-situ drifters etc.;
There is some repetitive code in the managers of those apps, and I think we could benefit from making a parent manager in django-geo-spaas.
So far the Dockerfile pins django==2.2. Otherwise, two errors are raised by the tests:
ERROR: test_search_loads (geospaas.viewer.tests.FormAndViewTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/django/template/defaulttags.py", line 1021, in find_library
return parser.libraries[name]
KeyError: 'staticfiles'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
django.template.exceptions.TemplateSyntaxError: 'staticfiles' is not a registered tag library. Must be one of:
admin_list
admin_modify
admin_urls
bootstrap_tags
cache
i18n
l10n
leaflet_tags
log
static
tz
======================================================================
ERROR: geospaas.catalog.tests (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: geospaas.catalog.tests
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/opt/conda/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/src/geospaas/catalog/tests.py", line 10, in <module>
from django.utils.six import StringIO
ModuleNotFoundError: No module named 'django.utils.six'
But Django is evolving and so should we.
Tests currently fail in all branches... I will look into it
For border points in the western hemisphere, negative longitude values are returned.
If a dataset crosses the dateline (180E), the border contains high positive values (near 180) and high negative values (near -180). The border gets distorted and it is impossible to search for it correctly.
Add an option to Domain.get_border() to fix the longitude values if the dataset crosses the dateline.
Call Nansat with this option from nansat_ingestor.
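The fix could amount to shifting negative longitudes into the 0-360 range when a crossing is detected, roughly as sketched below (the function name and the crossing heuristic are assumptions, not the actual Domain.get_border() API):

```python
def fix_border_longitudes(lons, threshold=180):
    """Sketch: if the longitude span suggests the border crosses the
    dateline, shift negative longitudes to the 0-360 range so the
    polygon is contiguous and searchable."""
    crosses_dateline = max(lons) - min(lons) > threshold
    if not crosses_dateline:
        return list(lons)
    return [lon + 360 if lon < 0 else lon for lon in lons]
```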
It is becoming important to have some documentation for this package. Sections to be added and filled out are (please edit and add if anything is missing):
Since these additions do not affect the working Python code, we do not need to create an issue branch for this.
By default, Django does not validate model fields before saving. This can, however, be done using the full_clean() method.
Consider whether we should validate model fields (in particular the Dataset model) when new instances are created (I have added a validator to the entry_id field)
Implement model validation before save() (possibly using signals)
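The signal variant could be sketched as below, using Django's pre_save signal to run full_clean() on every save. This is one possible wiring, not an agreed design; note that pre_save also fires for queryset-less bulk paths differently, so where to connect it would need discussion:

```python
from django.db.models.signals import pre_save
from django.dispatch import receiver

from geospaas.catalog.models import Dataset

@receiver(pre_save, sender=Dataset)
def validate_dataset(sender, instance, **kwargs):
    # full_clean() runs the field validators (e.g. the entry_id
    # validator) and raises ValidationError before the row is written
    instance.full_clean()
```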
Processing systems for SAR, NOAA NDBC buoys, Lance buoys, GNSSR, HAB, ASCAT wind, and AIS should be moved to separate repositories.
Proper citation is now required by many journals.
The entry_ID should be added to our Dataset model. It seems to be the right place to put, e.g., the station identification of a NOAA weather buoy with a common prefix. The selection of an entry_ID depends on how we define our datasets. For example, for the SVP drifter datasets we define one dataset per drifter in 10(?)-day intervals. Thus, the entry_ID should be carefully defined in each case...
Add field entry_id to Dataset
Write tests to ensure it is working
Make migrations
Add code to automatically set a unique entry_id in the nansat_ingestor
Make sure all ingestors add unique entry_id's
Add description to docs
Add migrations in all apps to update tables with correct entry_id's
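For cases where no natural identifier (such as a buoy station id) exists, the automatic entry_id could fall back to a prefixed UUID, roughly as below. Both the prefix and the format are assumptions for illustration:

```python
import uuid

def make_entry_id(prefix='NERSC'):
    """Sketch: generate a unique entry_id with a common prefix,
    for ingestors that have no natural identifier to use."""
    return '%s_%s' % (prefix, uuid.uuid4())
```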
The default value of the end parameter is 2010-12-31, which is too early:
def find_datasets(self, start='1900-01-01',
end='2010-12-31',
extent=None,
geojson=None,
mask='',
**kwargs):
The problem is not so severe, since the default value of the argument for the base command is 2100-12-31:
parser.add_argument('--end',
action='store',
metavar='YYYY-MM-DD',
default='2100-12-31',
help='End of time range',)
Anyway, it should probably be fixed.
The viewer should be able to search the databases by filtering the datasets on the desired parameters.
If you try to include geospaas directly from the installed egg, the viewer won't work because it cannot find its templates.
This can apparently be fixed by adding a MANIFEST.in file, or via the package_data parameter in the setup.py file.
Here is the doc about this.
This happened when I executed ./manage.py migrate from the shell.
This tutorial 'Using the Django test runner to test reusable applications' shows the commonly accepted directory structure for testing reusable applications.
Task:
To create testing environment following the tutorial.
Now, by default, a new version of Django (3.0) is installed in Docker. It conflicts with the outdated version of django-overleaf.