
ingest-client's Introduction


This repository was part of HCA DCP/1 and is not maintained anymore. DCP/2 development of this component continues in the forked repository at https://github.com/ebi-ait/ingest-client.

Ingest Client

This repository contains the hca-ingest Python package, which can be shared across ingest services.

Installation

pip install hca-ingest

Usage

API package

To use the Ingest API interface in your Python script:

from ingest.api.ingestapi import IngestApi

Configure the ingest URL by setting the INGEST_API environment variable:

INGEST_API=http://localhost:8080
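As a minimal sketch, a script might resolve the ingest URL from the environment before using the client. The helper name resolve_ingest_url and the fallback to http://localhost:8080 are assumptions made for illustration; they are not part of the hca-ingest package, and how IngestApi itself reads INGEST_API is not shown here.

```python
import os

# Hypothetical default, taken from the example value above.
DEFAULT_INGEST_URL = "http://localhost:8080"


def resolve_ingest_url(environ=None):
    """Return the ingest URL from INGEST_API, or the default if unset."""
    environ = os.environ if environ is None else environ
    url = environ.get("INGEST_API", "").strip()
    # Drop a trailing slash so paths can be appended consistently.
    return (url or DEFAULT_INGEST_URL).rstrip("/")


if __name__ == "__main__":
    print(resolve_ingest_url())
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise in tests.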

Schema template package

The schema template package provides convenient lookup of properties in the HCA JSON schema. Each property in the JSON schema is represented as a simple key that is prefixed with the schema name.

The first element is the short name for the schema, followed by the path to the property. For example, the key for the biomaterial_id property in the donor_organism schema is donor_organism.biomaterial_core.biomaterial_id.
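The key layout described above can be sketched as a small parsing helper. The function split_key is hypothetical and is not part of the schema template package; it only illustrates how a key breaks down into a schema name and a property path.

```python
def split_key(key):
    """Split a schema template key into (schema_name, property_path).

    The first dotted element is the schema short name; the remaining
    elements form the path to the property inside that schema.
    """
    schema_name, _, rest = key.partition(".")
    if not schema_name:
        raise ValueError("key must start with a schema name: %r" % (key,))
    property_path = rest.split(".") if rest else []
    return schema_name, property_path


if __name__ == "__main__":
    print(split_key("donor_organism.biomaterial_core.biomaterial_id"))
```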

The schema template provides access to attributes of each key that are useful for developing schema-aware applications that need to query or generate JSON documents against the JSON schema.

| Key | Description | Examples |
| --- | --- | --- |
| {key}.schema.high_level_entity | Tells you if the property is part of a type, module or core schema | donor_organism.biomaterial_core.schema.high_level_entity = core, donor_organism.schema.high_level_entity = type, donor_organism.medical_history.schema.high_level_entity = module |
| {key}.schema.domain_entity | Tells you if the property is in a biomaterial, file, process, protocol, analysis or project schema | donor_organism.schema.domain_entity = biomaterial, dissociation_protocol.schema.domain_entity = protocol, sequence_file.schema.domain_entity = File |
| {key}.schema.module | Tells you the name of the schema where this property is defined | dissociation_protocol.schema.module = dissociation_protocol, donor_organism.medical_history.schema.module = medical_history |
| {key}.schema.url | Gives you the full URL to the schema where this property is defined | donor_organism.medical_history.schema.url = https://schema.humancellatlas.org/module/biomaterial/5.1.0/medical_history |
| {key}.value_type | Tells you the expected value type for this property. Can be one of object, string, integer | donor_organism.medical_history.medication.value_type = string, sequence_file.lane_index.value_type = integer, sequence_file.file_core.value_type = object |
| {key}.multivalue | Tells you if the value is a single value or an array of values | sequence_file.insdc_run.multivalue = True |
| {key}.user_friendly | The user-friendly name for the property | sequence_file.insdc_run.user_friendly = INSDC run |
| {key}.description | A short description of the property | sequence_file.insdc_run.description = An INSDC (International Nucleotide Sequence Database Collaboration) run accession. Accession must start with DRR, ERR, or SRR. |
| {key}.format | Tells you if the property has a specific format, like a date format | project.contact.email.format = email |
| {key}.required | Tells you if the property is required | donor_organism.biomaterial_core.biomaterial_id.required = True |
| {key}.identifiable | Tells you if the property is an identifiable field for the current entity | donor_organism.biomaterial_core.biomaterial_id.identifiable = True, donor_organism.biomaterial_core.biomaterial_name.identifiable = False |
| {key}.external_reference | Tells you if the property is globally identifiable and therefore a retrievable object from ingest | donor_organism.uuid.external_reference = True |
| {key}.example | An example of the expected value for this property | project.contact.contact_name.example = John,D,Doe |
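To illustrate how dotted keys like those above map onto attribute values, here is a minimal sketch that walks a nested dictionary. The TEMPLATE dictionary and the lookup function are hypothetical stand-ins written for this example, with values copied from the examples above; the real schema template is built from the HCA JSON schema and its internal representation may differ.

```python
# Hypothetical in-memory stand-in for the schema template.
TEMPLATE = {
    "donor_organism": {
        "schema": {"high_level_entity": "type", "domain_entity": "biomaterial"},
        "biomaterial_core": {
            "schema": {"high_level_entity": "core"},
            "biomaterial_id": {"required": True, "identifiable": True},
        },
    },
}


def lookup(template, key):
    """Walk a dotted key through nested dicts; raise KeyError if missing."""
    node = template
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError("Cannot find key %s" % key)
        node = node[part]
    return node


if __name__ == "__main__":
    print(lookup(TEMPLATE, "donor_organism.biomaterial_core.biomaterial_id.required"))
    print(lookup(TEMPLATE, "donor_organism.schema.domain_entity"))
```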

Developer Notes

Requirements for this project are listed in two files: requirements.txt and requirements-dev.txt. The requirements-dev.txt file contains dependencies needed only for development. Install both:

pip install -r requirements.txt
pip install -r requirements-dev.txt

Note: This package is currently only compatible with Python 3.

Running the Tests

To run all the tests, use the nose package:

nosetests

Developing Code in Editable Mode

Using pip's editable mode, client projects can refer to the latest code in this repository directly without installing it through PyPI. This can be done either by cloning the code base manually and installing from the local path:

pip install -e path/to/ingest-client

or by having pip do it automatically by providing a reference to this repository:

pip install -e \
git+https://github.com/HumanCellAtlas/ingest-client.git\
#egg=hca_ingest

For more information on version control support with pip, refer to the VCS support documentation.

Publish to PyPI

  1. Create a PyPI account through the registration page.

    Take note that PyPI requires email addresses to be verified before publishing.

  2. Package the project for distribution.

     python setup.py sdist
    

    Take note that setup.py is configured to build a distribution named hca_ingest. This PyPI project is currently owned privately, and uploading to it may require access rights. Alternatively, the project name in setup.py can be changed so that the distribution can be built and uploaded to a different PyPI entry.

  3. Install Twine

     pip install twine        
    
  4. Upload the distribution package to PyPI.

     twine upload dist/*
    

    Running python setup.py sdist creates a package in the dist directory under the project base directory. A specific package can be uploaded instead of using the wildcard *:

     twine upload dist/hca_ingest-0.1a0.tar.gz
    

ingest-client's People

Contributors

aaclan-ebi, danielvaughan, daniwelter, javfg, malloryfreeberg, mightyax, mweiden, prabh-t, rolando-ebi, sampierson, simonjupp, tburdett


ingest-client's Issues

Failure to display user-friendly names in generated spreadsheet.

Using build_spreadsheet_with_links.py in @hewgreen's branch (mg-links_builderv2), the following fields failed to display their user-friendly names in the column header (row 1):

donor_organism.organism_age_unit.text
donor_organism.development_stage.text
donor_organism.gestational_age_unit.text
donor_organism.height_unit.text
donor_organism.weight_unit.text
donor_organism.timecourse.unit.text
specimen_from_organism.organ.text
specimen_from_organism.organ_part.text
specimen_from_organism.preservation_storage.storage_time_unit.text
cell_suspension.cell_morphology.cell_size_unit.text
cell_suspension.timecourse.unit.text
collection_protocol.method.text
dissociation_protocol.method.text
enrichment_protocol.enrichment_method.text
library_preparation_protocol.input_nucleic_acid_molecule.text
library_preparation_protocol.library_construction_method.text
library_preparation_protocol.library_preamplification_method.text
library_preparation_protocol.cdna_library_amplification_method.text
sequencing_protocol.instrument_manufacturer_model.text
sequencing_protocol.method.text

It looks like all of these fields reference ontologies, but not all ontologized fields have this failure. For example, the fields donor_organism.human_specific.ethnicity.text and donor_organism.genus_species.text generate the correct user-friendly names.

Excel file attached.

generic_with_links.xlsx

UUID fields aren't recognised as external reference fields in the Schema Template lib

Steps To Repro:
schema_template.lookup_property_attributes_in_metadata('project.uuid')

Actual Behavior:
{UnknownKeySchemaException}ERROR: Cannot find key uuid in any schema!

Expected Behavior:
Should return the same result as the old behavior:
{
'external_reference' : True,
'identifiable': True
}

It looks like uuid fields were previously added explicitly as part of the schema template output:

property.uuid = {'external_reference': True, 'identifiable': True}

Since the uuid field isn't really part of the metadata schema, the lookup fails before the logic that handles it in the current code is ever applied:

if property_name in EXTERNAL_REFERENCE_PROPERTIES:
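One possible way to restore the expected behavior is to check for external reference properties before consulting the schema. The sketch below is only an illustration of that ordering: the names EXTERNAL_REFERENCE_PROPERTIES and UnknownKeySchemaException come from this issue, but their definitions here, the lookup_attributes function, and the flat schema_properties dictionary are all assumptions and do not reflect the library's actual code.

```python
# Assumed contents; the real constant in the library may differ.
EXTERNAL_REFERENCE_PROPERTIES = ("uuid",)


class UnknownKeySchemaException(Exception):
    """Stand-in for the exception named in this issue."""


def lookup_attributes(schema_properties, key):
    """Return the attributes for key, special-casing external references.

    schema_properties is a hypothetical flat dict of key -> attributes.
    """
    property_name = key.rsplit(".", 1)[-1]
    # Handle uuid-like fields first, since they are not in the schema.
    if property_name in EXTERNAL_REFERENCE_PROPERTIES:
        return {"external_reference": True, "identifiable": True}
    if key not in schema_properties:
        raise UnknownKeySchemaException(
            "Cannot find key %s in any schema!" % property_name
        )
    return schema_properties[key]


if __name__ == "__main__":
    print(lookup_attributes({}, "project.uuid"))
```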

Fill out documentation in spreadsheet generator code

Documentation is missing/needs to be fixed in the following locations:

  • build method of linked_spreadsheet_builder.py
  • all private methods of linked_spreadsheet_builder.py
  • build method of vanilla_spreadsheet_builder.py
  • backbone and protocol_pairings parameter documentation is unclear in spreadsheet_builder_constants.py

New Schema Template doesn't recognize some fields in the schema

Steps to repro:

schema_template.lookup_property_attributes_in_metadata('project.contributors.corresponding_contributor')

Expected behavior:
Should be same output as the old schema template

old_schema_template.lookup('project.contributors.corresponding_contributor')
{
	'multivalue': False,
	'format': None,
	'required': False,
	'identifiable': False,
	'external_reference': False,
	'user_friendly': 'Corresponding contributor',
	'description': 'Whether the individual is a primary point of contact for the project.',
	'example': 'Should be one of: yes, or no.',
	'guidelines': None,
	'value_type': 'boolean'
}

Enable generation of ProcessId in generated spreadsheets

This will allow multiple files to be included in a single bundle.

This work is needed in order to enable the metadata schema integration testing work. We will be generating metadata schemas dynamically to test metadata schema changes.
