edrn / cancerdataexpo
Buildout for the EDRN backend data application server we affectionately call the CancerDataExpo
Home Page: https://edrn.jpl.nasa.gov/cancerdataexpo
License: Apache License 2.0
The buildout of this software fails with an error:

```
TypeError: 'Version' object has no attribute '__getitem__'
```

The issue appears to be an older version of plone.recipe.zope2instance that is incompatible with a later setuptools. Pinning plone.recipe.zope2instance to 4.4.1 might help.
For more info, see: https://community.plone.org/t/buildout-typeerror-version-object-has-no-attribute-getitem/6607
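Assuming the buildout already has a [versions] section, that pin would look like:

```ini
[versions]
plone.recipe.zope2instance = 4.4.1
```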
According to Jackie Dahlgren in <MW3PR11MB4617C5475637324BFFB90006C2A8A@MW3PR11MB4617.namprd11.prod.outlook.com>, there are two new roles appearing in the SOAP XML API for the Committee_Membership operation:

- Project Scientist
- Program Officer

Currently, the CancerDataExpo treats these as mere "members", but they should have distinguished roles in the RDF.
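One way to give the new roles distinguished predicates might look like the sketch below. The namespace and predicate local names are assumptions for illustration, not the CancerDataExpo's actual RDF vocabulary.

```python
# Hypothetical predicate namespace; the real CancerDataExpo vocabulary may differ.
_EDRN_NS = 'http://edrn.nci.nih.gov/rdf/schema.rdf#'

# Role string from the Committee_Membership SOAP XML → RDF predicate URI.
ROLE_PREDICATES = {
    'Project Scientist': _EDRN_NS + 'projectScientist',
    'Program Officer':   _EDRN_NS + 'programOfficer',
}

def predicate_for_role(role: str) -> str:
    '''Return the RDF predicate URI for a committee role, defaulting to plain membership.'''
    return ROLE_PREDICATES.get(role, _EDRN_NS + 'member')
```

Unrecognized roles fall through to the existing plain-member predicate, so current behavior is preserved for everything else.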
BW said:
In our TEST region, Perdy, both the Registered_Person() with new fields InterestNameList/ InterestDescList and Site() with new field DataSharingPolicyDT are available for you to test. You’ve previously tested there before, so hopefully you’ve still got access to that connection point and everything tests out smoothly. Let us know.
Building out results in:

```
Got Products.PloneLDAP 1.2.
Version and requirements information containing products.genericsetup:
  [versions] constraint on products.genericsetup: 1.7.7
  Requirement of plone.app.ldap: Products.GenericSetup>=1.8.2
  Requirement of plone.app.dexterity: Products.GenericSetup
  Requirement of plone.app.caching: Products.GenericSetup
  Requirement of Products.CMFPlone: Products.GenericSetup>=1.4
  Requirement of Products.CMFPlacefulWorkflow: Products.GenericSetup
While:
  Installing zope.
Error: The requirement ('Products.GenericSetup>=1.8.2') is not allowed by your [versions] constraint (1.7.7)
```

Pinning Products.GenericSetup to 1.8.2 might help.
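Assuming the buildout already has a [versions] section, that pin would be:

```ini
[versions]
Products.GenericSetup = 1.8.2
```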
Map LabCAS's inconsistent naming of collaborative groups to the official group names.
The names currently in EDRN LabCAS Solr are:
The official names are:
See also EDRN/labcas-backend#3.
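A mapping table is probably the simplest fix. The Solr variants and official names below are illustrative assumptions standing in for the elided lists above, not the actual values.

```python
# Illustrative only: both the Solr-side keys and the official-name values here
# are assumptions; substitute the real elided lists.
GROUP_NAME_MAP = {
    'Breast/GYN': 'Breast and Gynecologic Cancers Research Group',
    'GI and Other Associated': 'G.I. and Other Associated Cancers Research Group',
}

def official_group_name(solr_name: str) -> str:
    '''Map a LabCAS Solr collaborative group name to its official name; pass unknowns through.'''
    return GROUP_NAME_MAP.get(solr_name.strip(), solr_name)
```

Passing unknown names through unchanged keeps the generator from silently dropping groups whose spelling hasn't been catalogued yet.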
The RDF for protocols from the CancerDataExpo looks like this:

```xml
<ns2:Protocol rdf:about="http://edrn.nci.nih.gov/data/protocols/288">
  …
  <ns1:cancerType>174, 182, 183, 182 </ns1:cancerType>
  …
```

That cancerType is useless; it should be an rdf:resource pointing to the corresponding Disease object.
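The desired output might look like the following sketch; the disease URI pattern is an assumption based on the protocol URI pattern above.

```xml
<!-- Sketch only: the /data/diseases/ URI pattern is assumed -->
<ns1:cancerType rdf:resource="http://edrn.nci.nih.gov/data/diseases/174"/>
<ns1:cancerType rdf:resource="http://edrn.nci.nih.gov/data/diseases/182"/>
<ns1:cancerType rdf:resource="http://edrn.nci.nih.gov/data/diseases/183"/>
```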
In LabCAS, we need to map these to proper values:
And these should drop the collaborative group predicates:
A number of old Plone hotfixes are installed; these prevent HTTP requests from being serviced:

```
AttributeError: 'module' object has no attribute 'decode_htmlentity'
```

Removing the hotfixes from the Zope instance's eggs list helps. However, we should ascertain whether there are newer and/or better hotfixes that go with the version of Plone used in the CancerDataExpo.
Apparently when David added BioMuta data he reused the object class for Biomarkers, i.e., http://edrn.nci.nih.gov/rdf/rdfs/bmdb-1.0.0#Biomarker. But that's the class for actual biomarkers, not mutation data.
The new portal mixes all the RDF into a single statement database, which results in mutations and biomarkers being treated the same, and it cannot ingest mutations as biomarkers.
A better object class would be urn:edrn:types:biomarkers:mutation.
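Re-typed mutation data might then look like this sketch; the subject URI is hypothetical.

```xml
<!-- Sketch only: the subject URI is hypothetical -->
<rdf:Description rdf:about="http://edrn.nci.nih.gov/data/biomuta/example">
  <rdf:type rdf:resource="urn:edrn:types:biomarkers:mutation"/>
</rdf:Description>
```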
Grab a copy of https://edrn.jpl.nasa.gov/cancerdataexpo/rdf-data/registered-person/@@rdf and look at, for example, Dean Brenner's photo URI:

http://edrn.jpl.nasa.gov/dmcc/staff-photographs/piPhoto67.gif

That URI is a URL, but it returns 404 Not Found. It should be found. The correct URI is:

https://edrn.jpl.nasa.gov/cancerdataexpo/staff-photographs/piphoto67.gif/@@images/image.gif
DMCC has added a new API called MemberGroup to their SOAP API.
As the title says, we will need to convert it from proprietary SOAP to neutral RDF.
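The conversion might be sketched as below, using only the standard library. The record shape (an "id" and a "name" per MemberGroup), the urn:edrn:member-groups URI scheme, and the predicate namespace are all assumptions; the real SOAP response fields will dictate the final shape.

```python
import xml.etree.ElementTree as ET

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
# Hypothetical predicate namespace; the real CancerDataExpo vocabulary may differ.
EDRN_NS = 'http://edrn.nci.nih.gov/rdf/schema.rdf#'

def member_groups_to_rdf(groups):
    '''Serialize MemberGroup records (already parsed out of the SOAP response)
    into RDF/XML. Each record is assumed to be a dict with "id" and "name" keys.'''
    ET.register_namespace('rdf', RDF_NS)
    ET.register_namespace('edrn', EDRN_NS)
    root = ET.Element('{%s}RDF' % RDF_NS)
    for group in groups:
        desc = ET.SubElement(root, '{%s}Description' % RDF_NS)
        # Assumed URI scheme for member-group subjects:
        desc.set('{%s}about' % RDF_NS, 'urn:edrn:member-groups:%s' % group['id'])
        title = ET.SubElement(desc, '{%s}title' % EDRN_NS)
        title.text = group['name']
    return ET.tostring(root, encoding='unicode')

rdf = member_groups_to_rdf([{'id': '1', 'name': 'Steering Committee'}])
```

Keeping the SOAP parsing separate from the RDF serialization means the serializer won't churn when the DMCC revises the WSDL.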
The CancerDataExpo app saves every single RDF file it generates whenever its upstream data changes. While this is nice in theory, there's probably no use for reviewing a list of protocols from 2017 when the ones from 2020 are authoritative.
The app should automatically archive or just outright delete older RDF.
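A pruning pass might look like the sketch below. The `<source>-<YYYYMMDD>.rdf` naming convention is an assumption, not the app's actual file-naming scheme.

```python
import re
from pathlib import Path

def prune_old_rdf(directory, keep=2):
    '''Sketch: keep only the newest `keep` generations of each RDF file.
    Assumes generated files are named <source>-<YYYYMMDD>.rdf (an assumption).'''
    by_source = {}
    for path in Path(directory).glob('*.rdf'):
        match = re.match(r'(.+)-(\d{8})\.rdf$', path.name)
        if match:
            by_source.setdefault(match.group(1), []).append(path)
    removed = []
    for paths in by_source.values():
        # Lexicographic order equals chronological order for YYYYMMDD suffixes.
        for stale in sorted(paths)[:-keep]:
            stale.unlink()
            removed.append(stale.name)
    return removed
```

Archiving instead of deleting would just mean replacing the `unlink()` with a move into an archive directory.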
The SOAP API at the DMCC is being upgraded so that the Registered_Person() endpoint will have two new slots:

- Interest (varchar 500) is a |-separated list of interests that a person possesses
- Interest Description (varchar 8000) is a |-separated list of in-depth summaries of those interests

What it says: use Docker to "containerize" the application and make a Docker Composition so we don't need to rely on sysadmins to participate in deployment of this.
For LabCAS RDF generation, filter on the "Consortium" field from Solr; include "EDRN" only.
A new buildout of this software fails trying to find distributions for z3c.recipe.staticlxml. In addition, a pinned setuptools prevents it from working, plus it's missing an unpinned zc.buildout.
Adding these allow-hosts seems to help:

```ini
[buildout]
allow-hosts +=
    oodt.jpl.nasa.gov
    pypi.fury.io
    *.githubusercontent.com
    *.github.com
    *.python.org
    *.plone.org
    launchpad.net
    files.pythonhosted.org
    pypi.org
    effbot.org
```

In addition, these version unpins are needed:

```ini
[versions]
setuptools =
zc.buildout =
```

Furthermore, these version pins are needed:

```ini
[versions]
biopython = 1.66
Products.GenericSetup = 1.8.2
plone.recipe.zope2instance = 4.4.1
Products.LDAPUserFolder = 2.27
```
Currently, the CancerDataExpo uses a single Zope instance as both app server and database server. This means we cannot do database maintenance (specifically, packing the database) without also bringing down the app server.
The application should instead use a separate Zope instance as app server that talks to a Zope Enterprise Objects (ZEO) instance as database server.
This will also let us do daily database backups.
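A minimal buildout sketch of that split, assuming the standard plone.recipe.zeoserver and plone.recipe.zope2instance recipes and a hypothetical port choice:

```ini
[zeoserver]
recipe = plone.recipe.zeoserver
zeo-address = 8100

[instance]
recipe = plone.recipe.zope2instance
zeo-client = on
zeo-address = ${zeoserver:zeo-address}
```

With this arrangement the ZEO server can be packed or backed up while the app-server instance keeps running.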
So we don't have to hand-build multi-architecture images anymore.
The clearance = CL № 22-6806
As the title says: LabCAS RDF should have public data only. Currently it has everything.
The issue EDRN/P5#102 requests that the same graphs that appear on EDRN LabCAS also appear on the portal. However, the RDF from LabCAS produced by the CancerDataExpo doesn't include the additional information necessary to produce these graphs, specifically:
In addition, it may be more efficient to have the CancerDataExpo produce the graphs themselves, which the portal could then ingest for rapid display; that could also happen in this issue.
In order to support statistical charting in the new Wagtail portal, LabCAS RDF generation needs these predicates:
The LabCAS RDF generator produces a statement

```xml
<ns2:statistics rdf:about="https://edrn-labcas.jpl.nasa.gov/data-access-api/collections">
  <ns1:cardinality>44</ns1:cardinality>
</ns2:statistics>
```

but then has just 36 <ns2:collection> objects in its output.
The reason is that it asks Solr for the count of all collections and uses that for the <ns2:statistics> object. It then iterates over the collections and drops all non-EDRN consortium collections. That's why we end up with 36 < 44. We should constrain the searches to just Consortium:EDRN.
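Applying the constraint to both queries might look like the sketch below; the Solr endpoint URL and parameter set are assumptions about how the generator queries LabCAS.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the generator's actual URL may differ.
SOLR_COLLECTIONS = 'https://edrn-labcas.jpl.nasa.gov/solr/collections/select'

def edrn_collections_query(rows):
    '''Build a Solr query URL filtered to the EDRN consortium. Using the same
    fq for the count (rows=0) and the listing keeps cardinality consistent.'''
    params = {'q': '*:*', 'fq': 'Consortium:EDRN', 'rows': rows, 'wt': 'json'}
    return SOLR_COLLECTIONS + '?' + urlencode(params)

count_url = edrn_collections_query(0)      # for <ns2:statistics>
listing_url = edrn_collections_query(100)  # for the <ns2:collection> objects
```

Because the filter query is identical in both calls, the reported cardinality and the number of emitted collections can no longer diverge.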