cioos-siooc / ckan

This project forked from ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.

Home Page: http://ckan.org/

License: Other

Dockerfile 27.75% Shell 72.25%

ckan's Introduction

CIOOS-SIOOC

CIOOS Wiki

ckan's People

hakaiinstitute mcuthill t-b-b adelle-pitsas

ckan's Issues

Improve error reporting during harvest

Errors returned during a CKAN spatial harvest are not very helpful. Errors are reported by the spatial harvester but also by scheming if a field fails schema validation.

an example is needed here...

Add distributorTransferOption to resource-locator spatial harvest field

In iso19139, the spatial harvester looked for transfer options under the distributor as well as directly under md_distribution. In the iso19115-3 implementation, the first case was not added. We should add it now so that we capture all transfer option metadata and correctly create resources in CKAN during harvest.

Preview ERDDAP resources in CKAN

Many resource formats are able to be previewed in CKAN. Given our high use of ERDDAP, it may be nice to also allow previewing of ERDDAP datasets.

Solution:

Could preview the lat/long/time of datasets easily as these fields are consistent across all datasets. could use existing geoview extension.
Could preview a limited number of rows (n = 100?) of datasets in table view
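A minimal sketch of the limited-row preview, assuming the dataset is served through ERDDAP tabledap and uses the standard latitude, longitude and time variable names. The server URL and dataset ID are placeholders, and the response encoding is an assumption. It requests only those three columns as .csvp (a single header row with units) and stops reading after n rows, so the view does not have to load the whole dataset.

```python
import csv
import io
import itertools
import urllib.request

def tabledap_csvp_url(server, dataset_id, variables=("latitude", "longitude", "time")):
    """Build an ERDDAP tabledap request for only the given variables.

    The .csvp response has one header row of the form "name (units)".
    """
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.csvp?" + ",".join(variables)

def first_rows(lines, n=100):
    """Parse CSV lines and keep the header plus at most n data rows."""
    return list(itertools.islice(csv.reader(lines), n + 1))

def preview(server, dataset_id, n=100, timeout=10):
    """Fetch the lat/long/time columns and stop reading after n rows."""
    url = tabledap_csvp_url(server, dataset_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Encoding is an assumption; adjust it to match your ERDDAP server.
        text = io.TextIOWrapper(resp, encoding="utf-8", errors="replace")
        return first_rows(text, n)
```

The rows returned by preview could then feed a table view, while the lat/long columns could feed the existing geoview.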

RA Review look and feel (footer / header) changes

Merge RA and National bug fixes back into core cioos code base

We have all made small changes to the CKAN code to make things work in the interest of time. It will benefit everyone if we can merge these changes back into the CIOOS CKAN repos.

Solution:
Review national and RA specific code and merge changes back to base cioos code where appropriate.

Review ckan install docs

Review install doc and update with Windows install instructions.

Typo in the datasets page template

Describe the bug
I'm not sure if this is the right place to point out this issue, but on dataset pages, 'Temporal' is spelled 'Temportal'.

To Reproduce
Steps to reproduce the behavior:

Go to https://catalogue.cioos.ca/
Click on any dataset
Scroll down to 'Additional Info'
The field 'Temportal Extent' should be 'Temporal Extent'

Add citation to dataset page

Adding a citation example to each dataset page has been discussed for a while.

Solution:
Initially, a hard-coded citation in any format would be sufficient. Ultimately it would be nice to be able to provide users with citations in the format they want.

Additional context

Citation example string
- Esip: https://esip.figshare.com/articles/Data_Citation_Guidelines_for_Earth_Science_Data_Version_2/8441816
RefWorks/RIS file export: export a RIS file (example formats below)
- https://www.jstor.org/stable/26897827

Python options:

JS options:

https://citation.js.org/ and https://github.com/citation-js/replacer
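A sketch of a hard-coded citation, assuming the dataset record provides authors, year, title, publisher and an identifier (the function names and parameters are hypothetical, not existing CKAN helpers). The plain-text form loosely follows the ESIP order (Author (Year). Title. Version. Publisher. Identifier); the second function writes a minimal RIS record using the DATA type for datasets.

```python
def format_citation(authors, year, title, publisher, identifier, version=None):
    """Plain-text citation: Authors (Year). Title. Version. Publisher. Identifier"""
    parts = [f"{'; '.join(authors)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)

def to_ris(authors, year, title, publisher, url):
    """Minimal RIS record for a dataset; each line is 'TAG  - value'."""
    lines = ["TY  - DATA"]
    lines += [f"AU  - {author}" for author in authors]
    lines += [f"TI  - {title}", f"PY  - {year}", f"PB  - {publisher}", f"UR  - {url}"]
    lines.append("ER  - ")  # end of record
    return "\n".join(lines)
```

Either string could be rendered in the dataset page template; a library such as citation.js could take over once several output formats are needed.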

Test harvesting of metadata from XML

Document the process in the wiki!

Add how to review pull requests to wiki

Implement organization hide update in National and RA CKAN instances

see cioos-siooc/ckanext-cioos_theme#36
and #60
for details

Review and improve development and deployment workflows

Currently, to deploy CKAN one must pull the git repo and all its sub-repositories, build CKAN (which generates Docker images and containers), then run the containers. This process is error-prone.

Updating CKAN or submodules is a similar process in that one must pull all changes, copy CKAN or submodule changes to the volumes, update production.ini or other config files as needed, and restart the containers to pull changes into them.

While this allows for quite a bit of flexibility during development it is more cumbersome than needed for a simple deployment workflow.

Solution:
One way to address this would be to generate docker images that are pushed to docker-hub. These images would then be used directly to build docker containers for deployment situations. This approach would also improve our release workflow as it would be easy to pick a version of the image to use if needed.

Removing CIOOS CKAN fork in favour of CKAN core? more info needed...

Remove sub repositories? more info needed...

Alternatives:
???

Additional context

A docker hub CIOOS organization has been created here: https://hub.docker.com/orgs/cioos
existing repos can be found here: https://hub.docker.com/orgs/cioos/repositories

Point extent error during harvest

Describe the bug
During spatial harvest, a spatial value of point is transformed to a bbox so that Solr can index it correctly. This is reported as an error; it should only be a warning or not reported at all.
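A minimal sketch, assuming the harvester already has the point as lon/lat: the hypothetical helper below builds a closed GeoJSON polygon around the point and logs the conversion at warning level rather than error. The pad parameter is also an assumption; with pad=0 the box collapses onto the point itself.

```python
import logging

log = logging.getLogger(__name__)

def point_to_bbox(lon, lat, pad=0.0):
    """Expand a point into a closed GeoJSON Polygon that Solr can index as a bbox.

    The conversion is expected behaviour, so it is logged as a warning, not an error.
    """
    log.warning("Point extent (%s, %s) converted to a bounding box", lon, lat)
    west, east = lon - pad, lon + pad
    south, north = lat - pad, lat + pad
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}
```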

Set default minimum zoom level in dataset map

Currently, when looking at the location map in a dataset, if the spatial extent is a point or a very small bbox, the map zooms so far in that only ocean is visible.

Describe the solution you'd like
Set an initial default zoom level so that some coastline is visible. A CKAN-wide default setting would work, but it would be nice if this was also overridable by a setting in the dataset JSON.
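One way to express the requested precedence, as a sketch: a site-wide default cap with an optional per-dataset override. Both the constant and the map_max_zoom extra are hypothetical names. Since higher zoom levels are closer in, the function caps the zoom computed from fitting the extent; in a Leaflet-based widget, such a value could be passed as the maxZoom option of fitBounds.

```python
DEFAULT_MAX_INITIAL_ZOOM = 8  # hypothetical CKAN-wide default

def initial_zoom(fitted_zoom, dataset_extras=None, site_default=DEFAULT_MAX_INITIAL_ZOOM):
    """Cap the zoom chosen by fitting the map to the dataset extent.

    A per-dataset 'map_max_zoom' extra (hypothetical) overrides the site default.
    """
    cap = site_default
    if dataset_extras and dataset_extras.get("map_max_zoom") not in (None, ""):
        cap = int(dataset_extras["map_max_zoom"])
    return min(fitted_zoom, cap)
```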

Review outstanding pull requests

There are quite a few outstanding pull requests in need of review or discussion.

Reorganize repos

Is your feature request related to a problem? Please describe.

The main repository is a forked repo of ckan, but our work doesn't match the use case of a fork in my opinion (contribution to the main repo or building a new product from an existing codebase).
Working with a forked repository will make it harder to upgrade with new CKAN releases.
The cioos ckan code is scattered across several repositories: the main repository and its submodules (some of them are third-party repos, some of them are cioos). The current practice is to create a forked repository when an existing plugin needs to be customized. This practice creates too many repositories for just a few changes and complicates the project with no benefit, in my opinion.

Describe the solution you'd like
Here is what the target could look like:
https://docs.google.com/presentation/d/1O9zf2TToXOPenB9OSGV-R72IgFusoGOoLwouei6PwWE/edit?usp=sharing

Only one repository for CIOOS CKAN common code (one plugin).
One repository per RA for RA-specific code (one plugin per RA to alter theme style or otherwise)
A docker image based on the official CKAN codebase
A docker image based on the CIOOS CKAN code and overriding the previously mentioned image
A docker image per RA based on CKAN RA specific code and overriding the previously mentioned image

Review CKAN GUI and generate ideas for things to change

The current layout of the CKAN front end is either stock or somewhat clumsy. The focus initially was to produce a working prototype.

Solution:
Review the existing interface and suggest improvements. The datasets page has never been reviewed and would be a good place to focus.

Alternatives:
Could focus on home page but this is likely to be more of a CIOOS wide discussion

Additional context:
Many of the components on the datasets page were added by Matt as they seemed to be used on other sites that looked good. In general, we should discuss if the overall focus is on a clean, low clutter, interface with more detail available on request (click on more for example). Or if we want to show as much detail as possible initially to users.

General ideas around what to change on the dataset page

Add more items
Improve gui/ux
Citation example string
- Esip: https://esip.figshare.com/articles/Data_Citation_Guidelines_for_Earth_Science_Data_Version_2/8441816
RefWorks/RIS file export: export a RIS file (example formats below)
- https://www.jstor.org/stable/26897827

Bilingualism is missing

CKAN Version if known (or site URL)

All

Please describe the expected behavior

Requesting a resource through API should return bilingual JSON data.

Please describe the actual behavior

When interacting with CKAN API, the requested URL returns unilingual JSON data.

What steps can be taken to reproduce the issue?

As an example, one could request

http://<ckan_instance>/api/3/action/package_show?id=

while changing the values according to their setup.
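A sketch for checking this from outside CKAN, assuming multilingual fields follow the ckanext-fluent convention of {"en": ..., "fr": ...} dictionaries in fields such as title_translated and notes_translated (those field names are assumptions). package_show wraps the CKAN action API call above; unilingual_fields lists the fields that are missing a language.

```python
import json
import urllib.parse
import urllib.request

def package_show(ckan_url, dataset_id, timeout=10):
    """Call the CKAN action API and return the 'result' dict."""
    query = urllib.parse.urlencode({"id": dataset_id})
    url = f"{ckan_url.rstrip('/')}/api/3/action/package_show?{query}"
    # A missing dataset comes back as an HTTP error, which urlopen raises.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]

def unilingual_fields(package, fields=("title_translated", "notes_translated"), langs=("en", "fr")):
    """Return the fields that are missing or lack a value for one of the languages."""
    missing = []
    for field in fields:
        value = package.get(field)
        if not isinstance(value, dict) or not all(value.get(lang) for lang in langs):
            missing.append(field)
    return missing
```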

Add config setting to add RA CSS to base template

Add a third CSS entry to the base template to pull in the RA CSS file. Add it between the cioos_theme.css and admin CSS entries.

Add contentUrl to schema.org JSON-LD output

as discussed in pull request cioos-siooc/ckanext-cioos_theme#19, we should look into adding contentUrl attribute to dataset resources
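A sketch of what the output could look like, assuming the JSON-LD is built from the standard CKAN resource dict (url, format, name). schema.org models downloads as DataDownload nodes under a Dataset's distribution property, with contentUrl and encodingFormat; the helper names are hypothetical.

```python
def resource_to_data_download(resource):
    """Map a CKAN resource dict to a schema.org DataDownload node."""
    node = {"@type": "DataDownload", "contentUrl": resource["url"]}
    if resource.get("format"):
        node["encodingFormat"] = resource["format"]
    if resource.get("name"):
        node["name"] = resource["name"]
    return node

def add_distribution(dataset_jsonld, resources):
    """Attach one DataDownload per resource that actually has a URL."""
    dataset_jsonld["distribution"] = [
        resource_to_data_download(r) for r in resources if r.get("url")
    ]
    return dataset_jsonld
```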

Test Issue

GanttStart: 2019-06-10
GanttDue: 2016-06-15

CKAN Version if known (or site URL)

Please describe the expected behaviour

Please describe the actual behaviour

What steps can be taken to reproduce the issue?

Delete docker image tags on PR merge

to keep dockerhub organized it may be a good idea to delete tags when a pull request is merged into the default branch (master or cioos).

Log path error during ckan start

The touch commands in the CKAN entrypoint.sh file are failing. the ckan_log environment variable is misused and does not get populated. A potential fix is to create logs directory and touch log files in the Dockerfile.

RUN mkdir -p $CKAN_VENV/src/logs
RUN touch "$CKAN_VENV/src/logs/ckan_access.log"
RUN touch "$CKAN_VENV/src/logs/ckan_default.log"

Stress test SOLR index

We have not stress-tested CKAN. With the possible addition of indexing XML files as well, it would be good to know how CKAN, and specifically Solr, will respond to a high dataset volume and high request load.

Solution:

generate a significant volume of datasets using randomly generated data
add xml files, also randomly generated
index
use Gatling to stress test
examine docker container performance and what is cached in CKAN or SOLR
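A sketch of the first step, assuming datasets are created through package_create. The generated dict only fills core CKAN fields; the CIOOS scheming schema will likely require more, and owner_org must name an existing organization. Names use a UUID so they stay unique and URL-safe; the function name is hypothetical.

```python
import random
import string
import uuid

def random_dataset(owner_org, rng=random):
    """Build a package_create payload filled with random words."""
    words = [
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
        for _ in range(20)
    ]
    return {
        "name": f"stress-{uuid.uuid4().hex}",  # lowercase, unique, URL-safe
        "title": " ".join(words[:5]).title(),
        "notes": " ".join(words),
        "owner_org": owner_org,
        # Deduplicate before sampling so no tag appears twice.
        "tags": [{"name": w} for w in rng.sample(sorted(set(words)), 3)],
    }
```

For example, a list built with random_dataset("stress-test-org") in a loop could be posted to package_create before reindexing and pointing Gatling at the search endpoints.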

change owner of i18n folder on container start

ckan db init may try to build translations. GSA/datagov-ckan-multi#380

On the first start of ckan container, the db init script will build translations as root but ckan runs as the ckan user so will not have access. a chown is needed to allow ckan to access these files after which the container will start.

For example: /bin/bash -c "chown -R ckan:ckan /usr/lib/ckan/venv/src/ckan/ckan/public/base/i18n"

improve speed of xml indexing

currently, the indexer waits quite a long time for a response from the XML URL before timing out. Would be a good idea to lower the timeout so that XML URLs that are non-responsive do not dramatically slow down the indexing process.
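A minimal sketch of a shorter timeout, assuming the XML is fetched with the standard library (the actual indexer code may use a different HTTP client). The timeout applies to each blocking socket operation, not to the total download time. The opener parameter exists only so the behaviour can be tested without network access, and the 5-second default is an arbitrary choice.

```python
import logging
import socket
import urllib.error
import urllib.request

log = logging.getLogger(__name__)

def fetch_xml(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the response body, or None if the server is slow or unreachable."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read()
    except (socket.timeout, urllib.error.URLError) as exc:
        # Skip a slow or dead XML URL instead of stalling the whole reindex.
        log.warning("Skipping XML url %s: %s", url, exc)
        return None
```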

Pull header and footer from wordpress into ckan in an automated way

Currently, the workflow for updating the CKAN header and footer is very manual. it involves scraping the WordPress site HTML and copying the relevant HTML into a CKAN template file. then adjusting CSS classes to match existing ones in CKAN. Inevitably there has been some small CSS change in WordPress that requires significant tweaking to the CKAN CSS to make it all look the same.

Solution:
Find a way to pull menu changes and CSS from WordPress into CKAN. Perhaps a shared CSS file that is used for both the WordPress and CKAN sides of the header/footers? HTML would still need to be pulled across, but this could also be a shared file. This would likely mean doing away with the mega-menu plugin in WordPress. That would work for Pacific and Atlantic but may not work for OGSL.

Alternatives:
???

Additional context:
???

review schema.org content

Currently, we include structured data as part of datasets via the dcat extension. This allows for more detailed google searches.

ERDDAP also supports schema.org. Is there something there we can leverage or include in the ckan datasets?

Format key missing error during harvest

geoview throws an error when a resource does not have a format key. We can hard-code a default of the empty string so that the key will show up even though we may not always be able to guess the format. This happens with poorly formed URLs such as http://rgh. It has only happened with test data so far.
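A sketch of the default, assuming resources are plain dicts as in CKAN's API: it keeps an existing format, otherwise tries the URL's file extension, and falls back to the empty string, which is what happens for http://rgh. The function name is hypothetical.

```python
import os.path
import urllib.parse

def ensure_format_key(resource):
    """Guarantee a 'format' key so geoview does not fail on the resource."""
    if resource.get("format"):
        return resource
    path = urllib.parse.urlparse(resource.get("url") or "").path
    extension = os.path.splitext(path)[1].lstrip(".")
    # Empty string when nothing can be guessed, e.g. a URL with no path.
    resource["format"] = extension.upper()
    return resource
```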

error when indexing empty xml path

When running a search reindex, if there is no harvest object and the xml_metadata_url extra is empty, the indexer throws an error. This situation happens if a dataset is created using the API (so there is no harvest object) and the XML URL is not set. The code should check for an empty field and fail gracefully, perhaps logging a warning.
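A sketch of the guard, with hypothetical argument shapes: harvest_object is the harvest object's content (or None) and extras is the dataset's extras as a plain dict. Instead of raising, the function returns None and logs a warning so the reindex can continue.

```python
import logging

log = logging.getLogger(__name__)

def xml_source(dataset_id, harvest_object, extras):
    """Pick where to read the XML from, or return None without raising."""
    if harvest_object:
        return ("harvest_object", harvest_object)
    url = ((extras or {}).get("xml_metadata_url") or "").strip()
    if not url:
        log.warning(
            "Dataset %s has no harvest object and no xml_metadata_url; skipping XML indexing",
            dataset_id,
        )
        return None
    return ("url", url)
```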

Add search and index info to wiki

Xml indexing
what is in All text search
Querying for fields not in all text/stored in solr
Better document harvester config

Generate wiki page with field mappings

need a way to show how ckan schema, xml harvest, and yaml all map. Would be nice to include a description of what schema fields are intended for.

Depends on cioos-siooc/ckanext-cioos_theme#33

Add config setting to set contact email in page footer

This will allow for custom RA contact emails rather than a single hard-coded email address.

Improve ckan build time

Currently it is very slow to build ckan images.

Etienne suggested removing the extensions from the source when adding ckan to the docker images would allow that stage to be cached and improve build time. The extensions are already copied to the image at a later step so this is redundant code and needlessly bloats the image.

Changed WAF XML not updating during CKAN harvest

Re-harvesting of WAF metadata does not update a dataset even though it has been changed on the WAF. Perhaps an issue with how waf_modified_date is populated?

Reported by Étienne.

improve error messages during xml indexing

currently when indexing a ckan dataset the following error message is displayed if the harvest object is not available or if it is not xml.

Unable to find harvest object "fe0eeab9-c445-46dc-8245-ca25fa9a2fd2" referenced by dataset "7a1e35d0-7784-4bb7-aef2-7ef85afc3b0d". Trying xml url

The error message should indicate if the harvest object is not found (not 200 response) or if it is JSON and thus not parsable by XML parser.
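A sketch of the more specific messages, assuming the indexer has the HTTP status code and the raw body available; the function name and message wording are hypothetical. It distinguishes a non-200 response, an empty body, a JSON body, and other non-XML content.

```python
import json

def xml_fetch_problem(status_code, body):
    """Explain why a response can't be parsed as XML, or return None if it looks fine."""
    if status_code != 200:
        return f"XML url returned HTTP {status_code}, expected 200"
    text = body.decode("utf-8-sig", "replace") if isinstance(body, bytes) else body
    text = text.strip()
    if not text:
        return "XML url returned an empty response"
    if text[0] in "{[":
        try:
            json.loads(text)
        except ValueError:
            pass  # not valid JSON either; fall through to the generic check
        else:
            return "XML url returned JSON, which the XML parser cannot read"
    if not text.startswith("<"):
        return "XML url response does not look like XML"
    return None
```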

change Cited-responsible-party in harvester

Cited-responsible-party currently only pulls in a limited number of roles. should be all roles, do not limit by role name.

Update national footer css

trigger harvest remotely via API

If it were possible to trigger harvest remotely that would be handy, but otherwise a cron job is fine too. I took a quick look but didn't find it in the API
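A sketch of a remote trigger, assuming the ckanext-harvest extension's harvest_job_create action is exposed through the action API and accepts a source_id; verify the action name and parameters against the installed ckanext-harvest version. The function only builds the request; the URL, token and source ID are placeholders.

```python
import json
import urllib.request

def harvest_job_request(ckan_url, api_token, source_id):
    """Build (but do not send) a POST to the assumed 'harvest_job_create' action."""
    body = json.dumps({"source_id": source_id}).encode("utf-8")
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/harvest_job_create",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(req); a cron job could make the same call on a schedule. Creating a job may still depend on the harvester's gather and fetch consumers running.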

Review CKAN's reliance on jsonpdataproxy

When previewing CSV files stored on a WAF, CKAN uses an external service (jsonpdataproxy) to get the data. It would be better if this was not the case.

Additional context

It takes 24 hours for jsonpdataproxy to update a view
makes CKAN reliant on an external service
examples and links:

Harvester failing to link dataset organization to already existing organization

The organisation https://cnckan.cioos.ca/organization/fisheries-and-oceans-canada was created when harvesting pacific datasets

The dataset https://cnckan.cioos.ca/dataset/c2c3e218-7704-48fa-8c18-327a5a88e28e has been harvested from atlantic but cannot be linked to https://cnckan.cioos.ca/organization/fisheries-and-oceans-canada because the organization guid it is using is different. The harvester assigned a default organization to this dataset.

change point of contact in schema

currently point of contact requires individual name. change to accept individual or organization name.
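A sketch of the relaxed rule, using hypothetical key names (individual-name, organisation-name) for the contact sub-fields. In a ckanext-scheming validator this would normally raise CKAN's Invalid exception; ValueError is used here to keep the example self-contained.

```python
def validate_point_of_contact(contact):
    """Accept a point of contact that names an individual, an organisation, or both."""
    individual = (contact.get("individual-name") or "").strip()
    organisation = (contact.get("organisation-name") or "").strip()
    if not (individual or organisation):
        raise ValueError("Point of contact needs an individual name or an organisation name")
    return contact
```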

Content width

It would be better UX to give the CKAN content the same width as in WordPress (WP), which currently has more blank space on each side.

allow single quotes in keywords

currently, keywords only allow alphanumeric characters and spaces. need to allow single quotes as well.
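A sketch of the validation rule, assuming keywords are checked with a regular expression: letters and digits (Unicode-aware, so accented French words pass), spaces, and the ASCII single quote. The pattern excludes the underscore that \w would otherwise allow; the typographic apostrophe (U+2019) is not included and may also be worth allowing.

```python
import re

# [^\W_] = any Unicode letter or digit, but not the underscore.
KEYWORD_RE = re.compile(r"(?:[^\W_]| |')+")

def is_valid_keyword(keyword):
    """True if the keyword contains only letters, digits, spaces and single quotes."""
    return KEYWORD_RE.fullmatch(keyword) is not None
```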

Add layers to map in CKAN

Is there a way to add shapefiles, GeoJSON, or WMS/WFS services to the map widget in CKAN other than the common basemap? It would be useful for another project but could also help to define the RA extent, for example. If possible, it could also allow adding other features to help users orient themselves on the map.

Test spatial harvest of national CKAN into CIOOS Atlantic CKAN

ONC not showing up in Responsible Organization filter in CIOOS Pacific/National CKAN

To Reproduce
The national and Pacific CIOOS catalogues aren't including ONC in the 'Responsible Organization' dataset search filter options (see screenshots).

How to debug python code in docker container

this blog may be all we need: https://hackernoon.com/debugging-using-pdb-in-dockerized-environment-i21n2863

Need some examples of how others expect to debug python code.

This may be how to do it using vscode: https://code.visualstudio.com/docs/containers/debug-common

Needs testing

Rework citation identifier, guid, and unique-resource-identifier(-full) fields

Currently, the Citation Identifier CKAN schema field and the guid harvester field are both derived from the unique-resource-identifier-full field in the spatial harvester. Its xpath is mdb:metadataIdentifier/mcc:MD_Identifier, which is associated with the id of the metadata record rather than a data resource. As such, the following new mapping needs to be implemented:

CKAN	Spatial	xpath
name	guid	mdb:metadataIdentifier/mcc:MD_Identifier
Citation Identifier	unique-resource-identifier	mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier

Note that mcc:MD_Identifier contains code, authority, description, version, and codespace. https://wiki.esipfed.org/MD_Identifier
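A sketch of the proposed mapping, assuming records are parsed with ElementTree. The namespace URIs are taken to be the ISO 19115-3 ones and should be checked against real records. The paths append mcc:code/gco:CharacterString to reach the identifier's code value, so identifiers encoded with gcx:Anchor are not handled.

```python
import xml.etree.ElementTree as ET

# Assumed ISO 19115-3 namespace URIs; confirm against real CIOOS records.
NS = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/2.0",
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
    "mcc": "http://standards.iso.org/iso/19115/-3/mcc/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

# Path from an MD_Identifier element down to its code text.
CODE = "mcc:MD_Identifier/mcc:code/gco:CharacterString"

# CKAN field -> (spatial harvester field, xpath relative to mdb:MD_Metadata)
MAPPING = {
    "name": ("guid", "mdb:metadataIdentifier/" + CODE),
    "citation identifier": (
        "unique-resource-identifier",
        "mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/"
        "cit:CI_Citation/cit:identifier/" + CODE,
    ),
}

def extract_identifiers(xml_text):
    """Return {ckan_field: code or None} for each mapped identifier."""
    root = ET.fromstring(xml_text)  # expected to be mdb:MD_Metadata
    result = {}
    for ckan_field, (_harvest_field, path) in MAPPING.items():
        element = root.find(path, NS)
        text = element.text.strip() if element is not None and element.text else ""
        result[ckan_field] = text or None
    return result
```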

review resource-locator field in spatial harvester

Consider reworking resource-locator to support the full distribution metadata including format. https://wiki.esipfed.org/MD_Distribution
