cioos-siooc / ckan

This project forked from ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.

Home Page: http://ckan.org/

License: Other

Dockerfile 27.75% Shell 72.25%

ckan's Issues

Improve error reporting during harvest

Errors returned during a CKAN spatial harvest are not very helpful. Errors are reported by the spatial harvester but also by scheming if a field fails schema validation.

an example is needed here...

Add distributorTransferOption to resource-locator spatial harvest field

In ISO 19139, the spatial harvester looked for transfer options under the distributor as well as directly under MD_Distribution. In the ISO 19115-3 implementation, the first case was not added. We should add it now so that all transfer option metadata is captured and resources are created correctly in CKAN during harvest.

Preview ERDDAP resources in CKAN

Many resource formats can be previewed in CKAN. Given our heavy use of ERDDAP, it would be useful to preview ERDDAP datasets as well.

Solution:

  • Preview the latitude/longitude/time of datasets, since these fields are consistent across all datasets. This could use the existing geoview extension.
  • Preview a limited number of rows (n = 100?) of each dataset in a table view.
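The two preview ideas above can be sketched roughly as follows. This is a minimal sketch, not an existing CKAN view: it assumes an ERDDAP tabledap endpoint, assumes the variables are named time, latitude and longitude, and relies on ERDDAP's .csv output having a column-name row followed by a units row. The function names are hypothetical.

```python
import csv
import io
from urllib.parse import quote


def tabledap_csv_url(server, dataset_id, variables=("time", "latitude", "longitude")):
    """Build a tabledap CSV request URL for the given variables."""
    query = ",".join(quote(v) for v in variables)
    return f"{server.rstrip('/')}/tabledap/{quote(dataset_id)}.csv?{query}"


def preview_rows(csv_text, limit=100):
    """Split ERDDAP CSV text into (column names, units, first `limit` data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        # No header/units rows: nothing to preview.
        return [], [], []
    names, units = rows[0], rows[1]
    return names, units, rows[2:2 + limit]
```

Truncating on the CKAN side keeps the sketch independent of any server-side row limit; for large datasets a server-side constraint would be preferable.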

Merge RA and National bug fixes back into core cioos code base

We have all made small changes to the CKAN code to get things working quickly. It will benefit everyone if we merge these changes back into the CIOOS CKAN repos.

Solution:
Review national and RA-specific code and merge changes back into the base CIOOS code where appropriate.

Typo in the datasets page template

Describe the bug
I'm not sure if this is the right place to point out this issue, but on dataset pages, 'Temporal' is spelled 'Temportal'.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://catalogue.cioos.ca/
  2. Click on any dataset
  3. Scroll down to 'Additional Info'
  4. The field 'Temportal Extent' should be 'Temporal Extent'

Add citation to dataset page

Adding a citation example to each dataset page has been discussed for a while.

Solution:
Initially, a hard-coded citation in any format would be sufficient. Ultimately it would be nice to be able to provide users with citations in the format they want.
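As a starting point for the hard-coded variant, a citation could be assembled from a few dataset fields. This is only a sketch: the dictionary keys (authors, year, title, publisher, identifier) and the citation layout are assumptions for illustration, not fields from the CIOOS schema.

```python
def format_citation(dataset):
    """Build a single fixed-format citation string from a dataset dict."""
    authors = "; ".join(dataset.get("authors", [])) or "Unknown author"
    year = dataset.get("year", "n.d.")
    title = dataset.get("title", "Untitled dataset")
    parts = [f"{authors} ({year}). {title}."]
    # Publisher and identifier are optional and appended only when present.
    if dataset.get("publisher"):
        parts.append(f"{dataset['publisher']}.")
    if dataset.get("identifier"):
        parts.append(str(dataset["identifier"]))
    return " ".join(parts)
```

Supporting user-selected formats later would mean replacing the fixed layout with a proper citation-style library.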

Additional context

Python options:

JS options:

Review and improve development and deployment workflows

Currently, to deploy CKAN one must pull the Git repo and all its sub repositories, build CKAN (which generates Docker images and containers), then run the containers. This process is error-prone.

Updating CKAN or submodules is a similar process in that one must pull all changes, copy CKAN or submodule changes to the volumes, update production.ini or other config files as needed, and restart the containers to pull changes into them.

While this allows for quite a bit of flexibility during development it is more cumbersome than needed for a simple deployment workflow.

Solution:
One way to address this would be to generate docker images that are pushed to docker-hub. These images would then be used directly to build docker containers for deployment situations. This approach would also improve our release workflow as it would be easy to pick a version of the image to use if needed.

Removing CIOOS CKAN fork in favour of CKAN core? more info needed...

Remove sub repositories? more info needed...

Alternatives:
???

Additional context

Point extent error during harvest

Describe the bug
During spatial harvest, a spatial value of type point is transformed into a bbox so that Solr can index it correctly. This is reported as an error but should only be a warning, or not reported at all.
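One way to handle this is to do the conversion explicitly and log it below error level. The following is a minimal sketch, assuming GeoJSON-style geometry dicts; the function name and the padding value are hypothetical and not taken from the harvester code.

```python
import logging

log = logging.getLogger(__name__)


def point_to_bbox_polygon(geometry, padding=0.01):
    """Return a small box Polygon around a Point; pass other geometries through."""
    if geometry.get("type") != "Point":
        return geometry
    lon, lat = geometry["coordinates"][:2]
    west, east = lon - padding, lon + padding
    south, north = lat - padding, lat + padding
    # Informational only: the conversion is expected, so it is not an error.
    log.info("Point extent (%s, %s) converted to bbox for indexing", lon, lat)
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}
```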

Set default minimum zoom level in dataset map

Currently, when looking at the location map in a dataset, if the spatial extent is a point or a very small bbox, the map zooms in so far that only ocean is visible.

Describe the solution you'd like
Set an initial default zoom level so that some coastline is visible. A CKAN-wide default setting would work, but it would be nice if it could also be overridden by a setting in the dataset JSON.
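The override order described here (dataset setting first, then site-wide setting, then a built-in default) could look like the sketch below. The config key, the dataset field name, and the default and upper bound values are all hypothetical.

```python
DEFAULT_MAX_ZOOM = 8   # hypothetical built-in fallback
ZOOM_UPPER_BOUND = 18  # hypothetical highest zoom the map widget accepts


def resolve_max_zoom(site_config, dataset):
    """Pick the zoom cap: dataset value, else site-wide value, else the default."""
    candidates = (
        dataset.get("map_max_zoom"),                 # hypothetical dataset field
        site_config.get("cioos.dataset_map.max_zoom"),  # hypothetical config key
    )
    for raw in candidates:
        try:
            return max(0, min(int(raw), ZOOM_UPPER_BOUND))
        except (TypeError, ValueError):
            continue  # missing or malformed value: try the next source
    return DEFAULT_MAX_ZOOM
```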

Reorganize repos

Is your feature request related to a problem? Please describe.

  • The main repository is a fork of CKAN, but in my opinion our work doesn't match the use case of a fork (contributing to the main repo or building a new product from an existing codebase).
    Working with a forked repository will make it harder to upgrade to new CKAN releases.
  • The CIOOS CKAN code is scattered across several repositories: the main repo and its submodules (some of them third-party repos, some of them CIOOS repos). The current practice is to create a forked repository whenever an existing plugin needs to be customized. In my opinion, this practice creates too many repositories for just a few changes and complicates the project with no benefit.

Describe the solution you'd like
Here is what the target could look like:
https://docs.google.com/presentation/d/1O9zf2TToXOPenB9OSGV-R72IgFusoGOoLwouei6PwWE/edit?usp=sharing

  • Only one repository for CIOOS CKAN common code (one plugin).
  • One repository per RA for RA-specific code (one plugin per RA to alter theme style or similar)
  • A docker image based on the official CKAN codebase
  • A docker image based on the CIOOS CKAN code and overriding the previously mentioned image
  • A docker image per RA based on CKAN RA specific code and overriding the previously mentioned image

Review CKAN GUI and generate ideas for things to change

The current layout of the CKAN front end is either stock or somewhat clumsy. The focus initially was to produce a working prototype.

Solution:
Review the existing interface and suggest improvements. The datasets page has never been reviewed and would be a good place to focus.

Alternatives:
Could focus on home page but this is likely to be more of a CIOOS wide discussion

Additional context:
Many of the components on the datasets page were added by Matt as they seemed to be used on other sites that looked good. In general, we should discuss whether the focus should be a clean, low-clutter interface with more detail available on request (clicking on 'more', for example), or whether we want to show users as much detail as possible up front.

General ideas around what to change on the dataset page

Bilingualism is missing

CKAN Version if known (or site URL)

All

Please describe the expected behavior

Requesting a resource through API should return bilingual JSON data.

Please describe the actual behavior

When interacting with CKAN API, the requested URL returns unilingual JSON data.

What steps can be taken to reproduce the issue?

As an example, one could request

http://<ckan_instance>/api/3/action/package_show?id=

while changing values according to their setup.
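A small check along these lines could make the report reproducible. It is a sketch under assumptions: the field names title_translated and notes_translated and the language codes en and fr are guesses at how bilingual values might be exposed, and the function names are hypothetical. Only the package_show URL shape and the success/result envelope come from the CKAN action API.

```python
import json
from urllib.parse import urlencode


def package_show_url(site_url, dataset_id):
    """Build the package_show action URL for a dataset id or name."""
    return f"{site_url.rstrip('/')}/api/3/action/package_show?{urlencode({'id': dataset_id})}"


def missing_translations(response_text,
                         fields=("title_translated", "notes_translated"),
                         languages=("en", "fr")):
    """Map each checked field to the languages that are absent or empty."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN API call did not succeed")
    result = payload["result"]
    missing = {}
    for field in fields:
        value = result.get(field)
        present = value if isinstance(value, dict) else {}
        absent = [lang for lang in languages if not present.get(lang)]
        if absent:
            missing[field] = absent
    return missing
```

An empty result from missing_translations would indicate that every checked field carries both languages.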

Test Issue

GanttStart: 2019-06-10
GanttDue: 2016-06-15

CKAN Version if known (or site URL)

Please describe the expected behaviour

Please describe the actual behaviour

What steps can be taken to reproduce the issue?

Log path error during ckan start

The touch commands in the CKAN entrypoint.sh file are failing. The ckan_log environment variable is misused and does not get populated. A potential fix is to create the logs directory and touch the log files in the Dockerfile:

RUN mkdir -p $CKAN_VENV/src/logs
RUN touch "$CKAN_VENV/src/logs/ckan_access.log"
RUN touch "$CKAN_VENV/src/logs/ckan_default.log"

Stress test SOLR index

We have not stress tested CKAN. With the possible addition of XML file indexing, it would be good to know how CKAN, and specifically SOLR, will respond to a high dataset volume and a high request load.

Solution:

  • generate a significant volume of datasets using randomly generated data
  • add xml files, also randomly generated
  • index
  • use Gatling to stress test
  • examine docker container performance and what is cached in CKAN or SOLR
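The first step of the plan above, generating a large volume of random datasets, could be sketched as follows. The dataset dictionaries are shaped for CKAN's package_create action, with the extent stored in a spatial extra as ckanext-spatial expects; the naming scheme, word list and coordinate ranges are arbitrary assumptions. Uploading, XML generation and the Gatling scenario are out of scope here.

```python
import json
import random
import string

WORDS = ["ocean", "salinity", "temperature", "buoy", "tide", "current", "depth", "survey"]


def random_dataset(rng, index):
    """Build one random dataset dict with a valid CKAN name and a bbox extent."""
    suffix = "".join(rng.choices(string.ascii_lowercase + string.digits, k=8))
    west = round(rng.uniform(-140.0, -55.0), 4)
    south = round(rng.uniform(41.0, 80.0), 4)
    east = round(west + rng.uniform(0.1, 5.0), 4)
    north = round(south + rng.uniform(0.1, 5.0), 4)
    extent = {
        "type": "Polygon",
        "coordinates": [[[west, south], [east, south], [east, north],
                         [west, north], [west, south]]],
    }
    return {
        # The index keeps names unique even if two random suffixes collide.
        "name": f"stress-{index:06d}-{suffix}",
        "title": f"Stress test dataset {index}",
        "notes": " ".join(rng.choices(WORDS, k=30)),
        "extras": [{"key": "spatial", "value": json.dumps(extent)}],
    }


def generate_datasets(count, seed=0):
    """Generate `count` datasets; the same seed always yields the same data."""
    rng = random.Random(seed)
    return [random_dataset(rng, i) for i in range(count)]
```

Seeding the generator makes a test run repeatable, which helps when comparing performance before and after a configuration change.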

change owner of i18n folder on container start

ckan db init may try to build translations. GSA/datagov-ckan-multi#380

On the first start of the ckan container, the db init script builds translations as root, but CKAN runs as the ckan user and so will not have access to them. A chown is needed to give the ckan user access to these files, after which the container will start.

For example: chown -R ckan:ckan /usr/lib/ckan/venv/src/ckan/ckan/public/base/i18n

improve speed of xml indexing

Currently, the indexer waits a long time for a response from the XML URL before timing out. It would be a good idea to lower the timeout so that non-responsive XML URLs do not dramatically slow down the indexing process.
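A short, explicit timeout on the fetch is one way to do this. The sketch below uses only the standard library; the function name, the 5-second default and the injectable opener parameter (there to make the function testable without a network) are assumptions, not the indexer's actual code.

```python
import logging
import socket
import urllib.error
import urllib.request

log = logging.getLogger(__name__)


def fetch_xml(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the URL fails or times out."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        # Skip this record instead of stalling the whole reindex.
        log.warning("Skipping XML at %s: %s", url, exc)
        return None
```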

Pull header and footer from wordpress into ckan in an automated way

Currently, the workflow for updating the CKAN header and footer is very manual. It involves scraping the WordPress site HTML, copying the relevant HTML into a CKAN template file, and then adjusting CSS classes to match the existing ones in CKAN. Inevitably, some small CSS change in WordPress requires significant tweaking of the CKAN CSS to make everything look the same.

Solution:
Find a way to pull menu changes and CSS from WordPress into CKAN. Perhaps a shared CSS file could be used for both the WordPress and CKAN sides of the header and footer? HTML would still need to be pulled across, but this could also be a shared file. This would likely mean doing away with the mega-menu plugin in WordPress. That would work for Pacific and Atlantic but may not work for OGSL.

Alternatives:
???

Additional context:
???

review schema.org content

Currently, we include structured data as part of datasets via the dcat extension. This allows for more detailed Google searches.

ERDDAP also supports schema.org. Is there something there we can leverage or include in the ckan datasets?

Format key missing error during harvest

geoview throws an error when a resource does not have a format key. We can hard-code a default of the empty string so that the key is always present, even though we may not always be able to guess the format. This happens with poorly formed URLs such as http://rgh. It has only happened with test data so far.
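The proposed default could be applied wherever resource dicts are built during harvest. A minimal sketch, with a hypothetical helper name:

```python
def ensure_format_key(resource, default=""):
    """Return a copy of the resource dict that always has a 'format' key."""
    fixed = dict(resource)
    if fixed.get("format") is None:
        fixed["format"] = default
    return fixed
```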

error when indexing empty xml path

When running a search reindex, if there is no harvest object and the xml_metadata_url extra is empty, the indexer throws an error. This situation happens when a dataset is created using the API, so there is no harvest object, and the XML URL is not set. The code should check for an empty field and fail gracefully without an error, perhaps showing a warning.
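The guard could be as simple as the sketch below. It assumes the package dict carries extras as a list of key/value dicts; the exact shape of the dict the indexer receives may differ, and the function names are hypothetical.

```python
import logging

log = logging.getLogger(__name__)


def get_extra(pkg_dict, key):
    """Look up one extra by key in a list of {'key': ..., 'value': ...} dicts."""
    for extra in pkg_dict.get("extras") or []:
        if extra.get("key") == key:
            return extra.get("value")
    return None


def xml_url_or_none(pkg_dict):
    """Return the XML URL to index, or None (with a warning) when it is unset."""
    url = (get_extra(pkg_dict, "xml_metadata_url") or "").strip()
    if not url:
        log.warning("Dataset %s has no xml_metadata_url; skipping XML indexing",
                    pkg_dict.get("id"))
        return None
    return url
```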

Improve ckan build time

Currently it is very slow to build ckan images.

Etienne suggested removing the extensions from the source when adding ckan to the docker images would allow that stage to be cached and improve build time. The extensions are already copied to the image at a later step so this is redundant code and needlessly bloats the image.

improve error messages during xml indexing

Currently, when indexing a CKAN dataset, the following error message is displayed if the harvest object is not available or if it is not XML:

Unable to find harvest object "fe0eeab9-c445-46dc-8245-ca25fa9a2fd2" referenced by dataset "7a1e35d0-7784-4bb7-aef2-7ef85afc3b0d". Trying xml url

The error message should indicate whether the harvest object was not found (non-200 response) or whether it is JSON and thus cannot be parsed by the XML parser.
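Distinguishing these cases might look like the following sketch. The function name and message wording are hypothetical; the checks use only the standard library's XML and JSON parsers.

```python
import json
import xml.etree.ElementTree as ET


def describe_xml_failure(status_code, body):
    """Return a specific failure message, or None if the body is usable XML."""
    if status_code != 200:
        return f"harvest object not found or unavailable (HTTP {status_code})"
    try:
        ET.fromstring(body)
        return None  # well-formed XML: nothing to report
    except ET.ParseError:
        pass
    try:
        json.loads(body)
    except ValueError:
        return "harvest object content is not well-formed XML"
    return "harvest object content is JSON and cannot be parsed as XML"
```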

trigger harvest remotely via API

If it were possible to trigger harvest remotely that would be handy, but otherwise a cron job is fine too. I took a quick look but didn't find it in the API
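For reference, a remote trigger would be an authenticated POST to a CKAN action endpoint. The sketch below only builds the request. It assumes that ckanext-harvest exposes a harvest_job_create action taking a source_id, which should be verified against the installed extension version; whether the created job must also be queued separately is likewise version-dependent. The helper name is hypothetical.

```python
import json
import urllib.request


def build_harvest_job_request(site_url, source_id, api_key):
    """Build (but do not send) a POST request that asks CKAN to create a harvest job."""
    # Action name is an assumption about ckanext-harvest; check your version.
    url = f"{site_url.rstrip('/')}/api/3/action/harvest_job_create"
    body = json.dumps({"source_id": source_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a call to urllib.request.urlopen(request, timeout=...) from a script or scheduled job.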

Review CKAN's reliance on jsonpdataproxy

When previewing CSV files stored on a WAF, CKAN uses an external service (jsonpdataproxy) to get the data. It would be better if this were not the case.

Harvester failing to link dataset organization to already existing organization

The organisation https://cnckan.cioos.ca/organization/fisheries-and-oceans-canada was created when harvesting pacific datasets

The dataset https://cnckan.cioos.ca/dataset/c2c3e218-7704-48fa-8c18-327a5a88e28e has been harvested from atlantic but cannot be linked to https://cnckan.cioos.ca/organization/fisheries-and-oceans-canada because the organization guid it is using is different. The harvester assigned a default organization to this dataset.

Content width

It would be better UX for the CKAN content to have the same width as in WordPress, which currently has more blank space on each side.

Add layers to map in CKAN

Is there a way to add shapefiles, GeoJSON, or WMS/WFS services to the map widget in CKAN, other than the common basemap? This would be useful for another project but could also help to define RA extents, for example. If possible, it could also allow adding other features to help users orient themselves on the map.

Rework citation identifier, guid, and unique-resource-identifier(-full) fields

Currently, the Citation Identifier CKAN schema field and the guid harvester field are both derived from the unique-resource-identifier-full field in the spatial harvester. Its XPath is mdb:metadataIdentifier/mcc:MD_Identifier, which is associated with the ID of the metadata record rather than with a data resource. As such, the following new mapping needs to be implemented:

CKAN | Spatial | xpath
name | guid | mdb:metadataIdentifier/mcc:MD_Identifier
Citation Identifier | unique-resource-identifier | mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/cit:CI_Citation/cit:identifier/mcc:MD_Identifier

Note that mcc:MD_Identifier contains code, authority, description, version, and codespace. https://wiki.esipfed.org/MD_Identifier
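To illustrate the proposed mapping, the sketch below reads both identifiers from an ISO 19115-3 record using the standard library. The namespace URIs (and in particular their version suffixes) are assumptions that must match the records being harvested, the code value is assumed to sit in mcc:code/gco:CharacterString, and the function names are hypothetical. The actual change would be made in the spatial harvester's element definitions rather than in standalone code like this.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; versions vary between ISO 19115-3 schema releases.
NS = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/2.0",
    "mcc": "http://standards.iso.org/iso/19115/-3/mcc/1.0",
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

GUID_PATH = "mdb:metadataIdentifier/mcc:MD_Identifier"
CITATION_ID_PATH = (
    "mdb:identificationInfo/mri:MD_DataIdentification/mri:citation/"
    "cit:CI_Citation/cit:identifier/mcc:MD_Identifier"
)


def identifier_code(root, path):
    """Return the code text of the first MD_Identifier found at `path`, or None."""
    node = root.find(f"{path}/mcc:code/gco:CharacterString", NS)
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def extract_identifiers(xml_text):
    """Map harvester field names to identifier codes, per the table above."""
    root = ET.fromstring(xml_text)  # root is expected to be mdb:MD_Metadata
    return {
        "guid": identifier_code(root, GUID_PATH),
        "unique-resource-identifier": identifier_code(root, CITATION_ID_PATH),
    }
```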