
cancerdataexpo's People

Contributors

nutjob4life, yuliujpl

cancerdataexpo's Issues

Support Project Scientist and Program Officer

According to Jackie Dahlgren in <MW3PR11MB4617C5475637324BFFB90006C2A8A@MW3PR11MB4617.namprd11.prod.outlook.com>, there are two new roles appearing in the SOAP XML API for the Committee_Membership operation:

  • Project Scientist
  • Program Officer

Currently, the CancerDataExpo treats these as ordinary "members", but they should have distinct roles in the RDF.
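A minimal sketch of how role strings from Committee_Membership could be dispatched to distinct RDF predicates, falling back to plain membership for any other role. The namespace `urn:example:committee#` and the predicate names `projectScientist`, `programOfficer`, and `member` are hypothetical placeholders, not the CancerDataExpo's actual vocabulary.

```python
# Hypothetical namespace and predicate names; the real RDF vocabulary may differ.
COMMITTEE_NS = "urn:example:committee#"

# Roles from the Committee_Membership operation that get their own predicate.
ROLE_PREDICATES = {
    "Project Scientist": COMMITTEE_NS + "projectScientist",
    "Program Officer": COMMITTEE_NS + "programOfficer",
}
# Any other role is still treated as ordinary membership.
DEFAULT_PREDICATE = COMMITTEE_NS + "member"


def predicate_for_role(role: str) -> str:
    """Return the predicate for a membership role, ignoring surrounding whitespace."""
    return ROLE_PREDICATES.get(role.strip(), DEFAULT_PREDICATE)


def membership_triples(committee_uri, memberships):
    """Yield (committee, predicate, person) triples for (person_uri, role) pairs."""
    for person_uri, role in memberships:
        yield (committee_uri, predicate_for_role(role), person_uri)
```

Keeping the mapping in a dictionary means a future role only needs one new entry rather than another branch.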

Support DataSharingPolicyDT slot on Site() API

BW said:

In our TEST region, Perdy, both the Registered_Person() with new fields InterestNameList/ InterestDescList and Site() with new field DataSharingPolicyDT are available for you to test. You’ve previously tested there before, so hopefully you’ve still got access to that connection point and everything tests out smoothly. Let us know.

GenericSetup error

Running buildout results in:

Got Products.PloneLDAP 1.2.
Version and requirements information containing products.genericsetup:
  [versions] constraint on products.genericsetup: 1.7.7
  Requirement of plone.app.ldap: Products.GenericSetup>=1.8.2
  Requirement of plone.app.dexterity: Products.GenericSetup
  Requirement of plone.app.caching: Products.GenericSetup
  Requirement of Products.CMFPlone: Products.GenericSetup>=1.4
  Requirement of Products.CMFPlacefulWorkflow: Products.GenericSetup
While:
  Installing zope.
Error: The requirement ('Products.GenericSetup>=1.8.2') is not allowed by your [versions] constraint (1.7.7)

Pinning Products.GenericSetup to 1.8.2 might help.

LabCAS Collaborative Group inconsistency

Map LabCAS's inconsistent naming of collaborative groups to the official group names.

The names currently in EDRN LabCAS Solr are:

  • Breast and Gynecologic (missing "Cancers Research Group") ❌
  • Breast/GYN ❌
  • GI and Other Associated (missing periods, "Cancers Research Group") ❌
  • Lung and Upper Aerodigestive Cancers Research Group ✅
  • Lung and Upper Aerodigestive (missing "Cancers Research Group") ❌
  • Lung and Upper Areodigestive (misspelled "aerodigestive", missing words) ❌
  • Not Applicable (not a collaborative group) ❌
  • Prostate and Urologic (missing "Cancers Research Group") ❌
  • TBD (not a collaborative group) ❌

The official names are:

  • Breast and Gynecologic Cancers Research Group
  • G.I. and Other Associated Cancers Research Group
  • Lung and Upper Aerodigestive Cancers Research Group
  • Prostate and Urologic Cancers Research Group

See also EDRN/labcas-backend#3.

RDF for Protocols has incorrect cancerType

The RDF for protocols from the CancerDataExpo looks like this:

  <ns2:Protocol rdf:about="http://edrn.nci.nih.gov/data/protocols/288">
        …
        <ns1:cancerType>174, 182, 183, 182                                                                                  </ns1:cancerType>
        …

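That cancerType value is useless: it's a single padded string of comma-separated IDs (with 182 repeated). Instead, each cancer type should be an rdf:resource pointing to the corresponding Disease object.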
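As a sketch of the fix, the padded, comma-separated value can be split into unique IDs and turned into one rdf:resource element per Disease. The base URI `urn:example:diseases/` is a hypothetical placeholder for wherever Disease objects actually live in the CancerDataExpo RDF.

```python
# Hypothetical base URI for Disease objects; substitute the real one.
DISEASE_BASE = "urn:example:diseases/"


def disease_resources(cancer_type: str) -> list:
    """Turn a padded, comma-separated cancerType string into Disease URIs.

    Blank entries are skipped and duplicates removed, keeping first-seen order.
    """
    ids = []
    for token in cancer_type.split(","):
        token = token.strip()
        if token and token not in ids:
            ids.append(token)
    return [DISEASE_BASE + token for token in ids]


def cancer_type_elements(cancer_type: str) -> list:
    """Render one <ns1:cancerType rdf:resource="..."/> element per Disease."""
    return [f'<ns1:cancerType rdf:resource="{uri}"/>' for uri in disease_resources(cancer_type)]
```

Applied to the value from protocol 288, this yields three resources (174, 182, 183) instead of one opaque string.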

Collaborative Group Filter

In LabCAS, we need to map these to proper values:

  • Breast and Gynecologic (missing "Cancers Research Group") ❌
  • Breast/GYN ❌
  • GI and Other Associated (missing periods, "Cancers Research Group") ❌
  • Lung and Upper Aerodigestive (missing "Cancers Research Group") ❌
  • Lung and Upper Areodigestive (misspelled "aerodigestive", missing words) ❌
  • Prostate and Urologic (missing "Cancers Research Group") ❌

And these should drop the collaborative group predicates:

  • Not Applicable (not a collaborative group) ❌
  • TBD (not a collaborative group) ❌
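One way to express both rules is a lookup table that maps the legacy LabCAS values to the official names, plus a set of values that should produce no collaborative group predicate at all. This is a sketch. Raising on an unrecognized value is a design choice, made here so that new spelling variants surface instead of slipping through silently.

```python
from typing import Iterable, List, Optional

OFFICIAL_BREAST = "Breast and Gynecologic Cancers Research Group"
OFFICIAL_GI = "G.I. and Other Associated Cancers Research Group"
OFFICIAL_LUNG = "Lung and Upper Aerodigestive Cancers Research Group"
OFFICIAL_PROSTATE = "Prostate and Urologic Cancers Research Group"
OFFICIAL = {OFFICIAL_BREAST, OFFICIAL_GI, OFFICIAL_LUNG, OFFICIAL_PROSTATE}

# Inconsistent names found in LabCAS Solr, mapped to the official names.
LEGACY_NAMES = {
    "Breast and Gynecologic": OFFICIAL_BREAST,
    "Breast/GYN": OFFICIAL_BREAST,
    "GI and Other Associated": OFFICIAL_GI,
    "Lung and Upper Aerodigestive": OFFICIAL_LUNG,
    "Lung and Upper Areodigestive": OFFICIAL_LUNG,  # misspelling in the source data
    "Prostate and Urologic": OFFICIAL_PROSTATE,
}
# Values that are not collaborative groups; their predicates are dropped.
NOT_GROUPS = {"Not Applicable", "TBD"}


def normalize_group(value: str) -> Optional[str]:
    """Return the official group name, or None if the predicate should be dropped."""
    value = value.strip()
    if value in NOT_GROUPS:
        return None
    if value in OFFICIAL:
        return value
    if value in LEGACY_NAMES:
        return LEGACY_NAMES[value]
    raise ValueError(f"Unknown collaborative group: {value!r}")


def collection_groups(values: Iterable[str]) -> List[str]:
    """Normalize a collection's group values, deduplicating and dropping non-groups."""
    groups = {normalize_group(v) for v in values}
    groups.discard(None)
    return sorted(groups)
```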

Old Plone Hotfixes prevent running

A number of old Plone hotfixes are installed; these prevent HTTP requests from being serviced:

AttributeError: 'module' object has no attribute 'decode_htmlentity

Removing the hotfixes from the Zope instance's eggs list helps. However, we should determine whether newer or better hotfixes exist for the version of Plone used in the CancerDataExpo.

BioMuta data uses incorrect object class

Apparently, when David added the BioMuta data, he reused the object class for biomarkers, i.e., http://edrn.nci.nih.gov/rdf/rdfs/bmdb-1.0.0#Biomarker. But that's the class for actual biomarkers, not mutation data.

The new portal merges all the RDF into a single statement database, so mutations and biomarkers end up treated the same, yet the portal cannot ingest mutations as biomarkers.

A better object class would be urn:edrn:types:biomarkers:mutation.
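Ideally the generator would emit the new class directly. As a stopgap, the rdf:type statements could be rewritten after generation. The sketch below works on plain (subject, predicate, object) tuples. How BioMuta subjects are recognized is an assumption: it is passed in as a caller-supplied function, and the `urn:example:biomuta/` prefix in the usage line is hypothetical.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OLD_CLASS = "http://edrn.nci.nih.gov/rdf/rdfs/bmdb-1.0.0#Biomarker"
NEW_CLASS = "urn:edrn:types:biomarkers:mutation"


def reclassify_mutations(triples: List[Triple], is_biomuta_subject: Callable[[str], bool]) -> List[Triple]:
    """Retype BioMuta subjects from the Biomarker class to the mutation class.

    Real biomarkers (subjects the predicate rejects) keep their original class.
    """
    result = []
    for s, p, o in triples:
        if p == RDF_TYPE and o == OLD_CLASS and is_biomuta_subject(s):
            result.append((s, p, NEW_CLASS))
        else:
            result.append((s, p, o))
    return result
```

For example, `reclassify_mutations(triples, lambda s: s.startswith("urn:example:biomuta/"))` (hypothetical prefix) leaves genuine biomarkers alone.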

Cull Old RDF

The CancerDataExpo app saves every single RDF file it generates whenever its upstream data changes. While this is nice in theory, there's probably no use for reviewing a list of protocols from 2017 when the ones in 2020 are authoritative.

The app should automatically archive or just outright delete older RDF.
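A simple retention policy would keep the newest few RDF files for each upstream source and archive or delete the rest. The sketch assumes each saved file can be described as (source, filename, generation time); `keep` and the example filenames are hypothetical. It only decides *which* files to cull, leaving the archive-versus-delete choice to the caller.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def files_to_cull(files: Iterable[Tuple[str, str, datetime]], keep: int = 3) -> List[str]:
    """Return names of RDF files older than the `keep` newest for each source."""
    if keep < 1:
        raise ValueError("keep must be at least 1 so each source retains its current RDF")
    by_source = defaultdict(list)
    for source, name, generated in files:
        by_source[source].append((generated, name))
    doomed = []
    for entries in by_source.values():
        entries.sort(reverse=True)  # newest first
        doomed.extend(name for _, name in entries[keep:])
    return doomed
```

Grouping by source matters: a rarely changing source should not lose its only current file just because another source regenerates often.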

Dockerize

What it says: use Docker to "containerize" the application and add a Docker Compose configuration so that deployment doesn't depend on sysadmins.

Consortium Filter

For LabCAS RDF generation, filter on the "Consortium" field from Solr; include "EDRN" only.
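In Solr, this kind of restriction is naturally a filter query (`fq`), which narrows results without affecting relevance scoring. A minimal sketch of building such a request follows. The endpoint URL and the default `rows` value are hypothetical. `q`, `fq`, `rows`, and `wt` are standard Solr select parameters, and `Consortium` is the field named above.

```python
from urllib.parse import urlencode

# Hypothetical LabCAS Solr endpoint; substitute the real core URL.
SOLR_COLLECTIONS = "https://labcas.example.org/solr/collections/select"


def edrn_collections_url(rows: int = 1000) -> str:
    """Build a Solr select URL that returns only EDRN collections."""
    params = {
        "q": "*:*",                # match everything...
        "fq": "Consortium:EDRN",   # ...then keep only the EDRN consortium
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_COLLECTIONS}?{urlencode(params)}"
```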

Cannot buildout

A new buildout of this software fails while trying to find distributions for z3c.recipe.staticlxml. In addition, the pinned setuptools prevents buildout from working, and zc.buildout must be unpinned as well.

Adding these allow-hosts seems to help:

[buildout]
allow-hosts +=
    oodt.jpl.nasa.gov
    pypi.fury.io
    *.githubusercontent.com
    *.github.com
    *.python.org
    *.plone.org
    launchpad.net
    files.pythonhosted.org
    pypi.org
    effbot.org

In addition, these version unpins are needed:

[versions]
setuptools =
zc.buildout =

Furthermore, these version pins are needed:

[versions]
biopython = 1.66
Products.GenericSetup = 1.8.2
plone.recipe.zope2instance = 4.4.1
Products.LDAPUserFolder = 2.27

Need separate DB server

Currently, the CancerDataExpo uses a single Zope instance as both app server and database server. This means we cannot do database maintenance (specifically, packing the database) without also bringing down the app server.

The application should instead use a separate Zope instance as the app server, backed by a Zope Enterprise Objects (ZEO) instance as the database server.

This will also let us do daily database backups.

Provide more information on LabCAS to the portal

The issue EDRN/P5#102 requests that the same graphs that appear on EDRN LabCAS also appear on the portal. However, the RDF from LabCAS produced by the CancerDataExpo doesn't include the additional information necessary to produce these graphs, specifically:

  • Number of datasets
  • Number of files

In addition, it may be more efficient to have the CancerDataExpo produce the graphs, which the portal can then ingest for rapid display; this could also be addressed in this issue.
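Per-collection dataset and file counts could come from Solr faceting rather than from fetching every document. A request with `rows=0&facet=true&facet.field=CollectionId` (the field name `CollectionId` is a hypothetical placeholder for whatever LabCAS uses) returns counts as Solr's flat `[value, count, value, count, ...]` list. The sketch below converts that list into a dictionary.

```python
from typing import Dict


def facet_counts(solr_json: dict, field: str) -> Dict[str, int]:
    """Convert Solr's flat facet list [value1, count1, value2, count2, ...] to a dict."""
    flat = solr_json["facet_counts"]["facet_fields"][field]
    # Even positions are facet values; odd positions are their counts.
    return dict(zip(flat[0::2], flat[1::2]))
```

Running the same facet query against the datasets core and the files core would give the two numbers listed above for each collection.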

LabCAS RDF needs additional fields

In order to support statistical charting in the new Wagtail portal, LabCAS RDF generation needs these predicates:

  • discipline (proteomics, genomics…)
  • data category (mass spectrometry, DNA microarray…)
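A minimal sketch of emitting these two predicates with the standard library's ElementTree, repeating a predicate once per value so that multi-valued fields stay queryable. The `urn:example:labcas#` namespace and the `Collection`, `discipline`, and `dataCategory` element names are hypothetical. The caller is assumed to supply the values from whichever Solr fields hold them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LABCAS_NS = "urn:example:labcas#"  # hypothetical vocabulary

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("labcas", LABCAS_NS)


def collection_element(uri, disciplines, data_categories):
    """Build a collection description with one predicate element per value."""
    coll = ET.Element(f"{{{LABCAS_NS}}}Collection", {f"{{{RDF_NS}}}about": uri})
    for discipline in disciplines:
        ET.SubElement(coll, f"{{{LABCAS_NS}}}discipline").text = discipline
    for category in data_categories:
        ET.SubElement(coll, f"{{{LABCAS_NS}}}dataCategory").text = category
    return coll
```

`ET.tostring(element, encoding="unicode")` serializes the result with the registered prefixes.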

LabCAS Collection numbers incorrect

The LabCAS RDF generator produces this statement:

  <ns2:statistics rdf:about="https://edrn-labcas.jpl.nasa.gov/data-access-api/collections">
    <ns1:cardinality>44</ns1:cardinality>
  </ns2:statistics>

but then has just 36 <ns2:collection> objects in its output.

The reason is that it asks Solr for the count of all collections and uses that for the <ns2:statistics> object. It then iterates over the collections and discards every non-EDRN consortium collection, which is why only 36 of the 44 appear. We should constrain the searches to just Consortium:EDRN.
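A sketch of deriving the cardinality from the same filtered result set that produces the collection objects, so the two can't disagree. It assumes the Solr request already carries `fq=Consortium:EDRN` and uses Solr's standard JSON response shape (`response.numFound`, `response.docs`). The `id` field is an assumption. The check on `numFound` guards against a `rows` limit silently truncating the results.

```python
def collection_statistics(solr_json: dict) -> dict:
    """Compute the statistics cardinality from the same docs that become collections."""
    response = solr_json["response"]
    docs = response["docs"]
    if response["numFound"] != len(docs):
        # More matches than returned rows: raise `rows` or page through results.
        raise ValueError(f"got {len(docs)} of {response['numFound']} collections")
    return {"cardinality": len(docs), "collections": [doc["id"] for doc in docs]}
```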