A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.

License: MIT License

JavaScript 84.38% CSS 0.93% HTML 7.49% Python 6.83% Makefile 0.37%
data harmonization spreadsheet javascript application linkml linkml-schema

DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which works off of LinkML data specifications. This open source project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH) at Simon Fraser University, is now a collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others. Read the open-source DataHarmonizer manuscript for more about the application's theory and design.

Watch Rhiannon Cameron and Damion Dooley describe this application on YouTube at the Canadian Research Software Conference (CRSC2021).

Chrome Firefox Edge
49+ 34+ 12+

Pathogen Genomics Templates

Note that the Pathogen Genomics Package of DataHarmonizer templates, which includes Covid-19 and Monkeypox, is available now as a simpler stand-alone zip file at https://github.com/cidgoh/pathogen-genomics-package. This version does not require the use of the developer environment and can be run simply by loading the index.html or main.html files in your local web browser.

Manuscript

The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information

Microbial Genomics (9:1) 2023

DOI: https://doi.org/10.1099/mgen.0.000908

Ivan S. Gill, Emma J. Griffiths, Damion Dooley, Rhiannon Cameron, Sarah Savić Kallesøe, Nithu Sara John, Anoosha Sehar, Gurinder Gosal, David Alexander, Madison Chapel, Matthew A. Croxen, Benjamin Delisle, Rachelle Di Tullio, Daniel Gaston, Ana Duggan, Jennifer L. Guthrie, Mark Horsman, Esha Joshi, Levon Kearny, Natalie Knox, Lynette Lau, Jason J. LeBlanc, Vincent Li, Pierre Lyons, Keith MacKenzie, Andrew G. McArthur, Emily M. Panousis, John Palmer, Natalie Prystajecky, Kerri N. Smith, Jennifer Tanner, Christopher Townend, Andrea Tyler, Gary Van Domselaar, William W. L. Hsiao

Installation

This repository contains a full DataHarmonizer development environment, including the scripts necessary to generate a code library for API use, as well as the stand-alone version. Instructions for setting this up are in the Development section below. The API is used by the https://data.microbiomedata.org/ project for data collection. Using the API allows DataHarmonizer to be presented in a custom user interface, for example with a specific template pre-loaded and with selected controls and menu items constructed as desired in the interface.

Stand-Alone DataHarmonizer Functionality

In addition to API use, as detailed in the Development section, the development environment includes a script for generating a stand-alone browser-based version of DataHarmonizer that includes templates for detailing SARS-CoV-2 and Monkeypox sample contextual data. More infectious disease templates will be included in the coming year. Other organizations are adopting this version of DataHarmonizer for their own data management purposes.

Select Template

The default template loaded is the "CanCOGeN Covid-19" template. To change the spreadsheet template, either select the white text box to the right of Template (it always contains the name of the currently active template) or navigate to File followed by Change Template. An in-app window will appear that allows you to select from the available templates in the drop-down menu. After selecting the desired template, click Open to activate the template.

change template

A second way to access templates, directly rather than through the hard-coded menu system, is to specify the DataHarmonizer template subfolder via a "template" URL parameter. This enables development and use of customized or new templates that DH does not have programmed into its menu.

For example, http://genepio.org/DataHarmonizer/main.html?template=gisaid accesses the /template/gisaid/ subfolder's template directly.
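As a rough illustration of the URL parameter described above, the sketch below reads a "template" query parameter with the standard URL API and maps it to a subfolder path. The function name templateFolderFromUrl and the fallback behaviour are assumptions made for this example; this is not DataHarmonizer's actual loading code.

```javascript
// Hypothetical helper (not part of DataHarmonizer): resolve the template
// subfolder named by the "template" URL parameter.
function templateFolderFromUrl(pageUrl, fallback = null) {
  const name = new URL(pageUrl).searchParams.get("template");
  // A missing or empty parameter means "use the default menu" in this sketch.
  if (!name) return fallback;
  return `/template/${name}/`;
}

console.log(
  templateFolderFromUrl("http://genepio.org/DataHarmonizer/main.html?template=gisaid")
);
```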

See more on the Wiki DataHarmonizer templates page.

Usage

You can edit the cells manually, or upload xlsx, xls, tsv, csv and json files via File > Open. You can also save the spreadsheet's contents to your local hard drive in the aforementioned formats, or use File > Export to write your data as a document formatted for submission to a specified portal, database, or repository.

saving and exporting files

Click the Validate button to validate your spreadsheet's values against a standardized vocabulary. You can then browse through the errors using the Next Error button. Missing values are indicated in dark red, while incorrect values are light red. After resolving these errors, revalidate to see if any remain. If there are no more errors, the “Next Error” button will change to “No Errors” and then disappear.

validating cells and checking next error

Double click any column headers for information on the template's vocabulary. This usually includes the definition of the field, guidance on filling in the field, and examples of how data might look structured according to the constraints of the validator.

double click headers for more info

You can quickly navigate to a column by selecting Settings > Jump to.... In the in-app window that appears, select the desired column header from the drop-down list or begin typing its name to narrow down the list options. Selecting the column header from the drop-down list will immediately relocate you to that column on the spreadsheet.

jump to column

You can also automatically fill a column with a specified value, but only in rows with corresponding values in the first sample ID column. To use this feature select Settings > Fill column.... Select the desired column header from the drop-down list or begin typing its name to narrow down the list options, then specify the value to fill with and click Ok to apply.
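The fill rule described above can be sketched as follows. This is only an illustration under stated assumptions: rows are plain arrays, the sample ID sits in column 0, and fillColumn is a hypothetical name rather than a DataHarmonizer function.

```javascript
// Sketch: fill one column with a value, but only in rows that have a sample ID.
// Assumes each row is an array and the sample ID is in column `idColumn`.
function fillColumn(rows, columnIndex, value, idColumn = 0) {
  return rows.map((row) => {
    const id = row[idColumn];
    const hasId = id !== null && id !== undefined && String(id).trim() !== "";
    if (!hasId) return row; // rows without a sample ID are left untouched
    const copy = row.slice();
    copy[columnIndex] = value;
    return copy;
  });
}

const filled = fillColumn([["S1", ""], ["", ""], ["S2", "x"]], 1, "Canada");
console.log(filled);
```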

fill column, in rows with corresponding sample IDs, with specified value

For more information on available application features, select the Help button followed by Getting Started from within the DataHarmonizer application, or navigate to the Getting Started GitHub wiki.

Example Data

The stand-alone version of DataHarmonizer when built is placed in the /web/dist/ folder, with the following structure. Templates with example data testing functionalities can be found within the following folder structure leading to the "exampleInput/" folder, when available:

. TOP LEVEL DIRECTORY
├── images
├── libraries
├── script
└── template
│   ├── templateOfInterest
│   │   └── exampleInput
│   └── ...

Note that the source of the built "template/" folder above is actually in /web/templates/, where example input data should be placed before performing the build process. Here is an example that links to all available test data for the CanCOGeN Covid-19 template:

Version Control

Versioning of templates, features, and functionality is modeled on semantic versioning (i.e. versions are expressed as “DataHarmonizer X.Y.Z”). Changes to vocabulary in template pick lists are updated by incremental increases to the third position in the version (i.e. “Z” position). Changes to fields and features are updated by incremental increases to the second position in the version (i.e. “Y” position). Changes to basic infrastructure or major changes to functionality are updated by incremental increases to the first position in the version (i.e. “X” position).
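The versioning rules above can be expressed as a small helper. This is a sketch only: the change-type names are invented for the example, and resetting the lower positions to zero follows the usual semantic versioning convention, which the text above does not state explicitly.

```javascript
// Hypothetical helper illustrating the X.Y.Z rules described above.
// Change types ("vocabulary", "field", "infrastructure") are names made up here.
function bumpVersion(version, change) {
  const [x, y, z] = version.split(".").map(Number);
  switch (change) {
    case "infrastructure": // basic infrastructure or major functionality
      return `${x + 1}.0.0`; // assumption: lower positions reset to 0
    case "field": // fields and features
      return `${x}.${y + 1}.0`;
    case "vocabulary": // pick list vocabulary
      return `${x}.${y}.${z + 1}`;
    default:
      throw new Error(`Unknown change type: ${change}`);
  }
}

console.log(bumpVersion("0.13.1", "vocabulary"));
```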

Descriptions of updates are provided in release notes for every new version.

Discussions contributing to updates may be tracked on the DataHarmonizer GitHub issue tracker.

Development

Code in this repository is split mainly between two folders: lib and web. The lib folder contains the core interface components which are published to NPM and can be used by any client to build a user interface. The web folder contains an implementation of one such interface, using the components defined in lib. The interface implemented in the web folder is packaged and made available to users as releases of this repository.

Prerequisites

For development, you must have Node.js and Yarn installed. If you have Node.js version 16.10 or later (highly recommended) and you have not used Yarn before, you can enable it by running:

corepack enable

Installing

To install the dependencies of this package for development simply run:

yarn

Running Locally

Developing either the library components in lib or the interface in web can be done using the same command:

yarn dev

This will start a webpack development server running locally on localhost:8080. You can connect to localhost:8080 by entering it into your browser URL bar while yarn dev is running. Changes to either lib or web should be loaded automatically in your browser. This serves as an interface for testing and debugging both the core library components (in the lib directory) and the interface itself (in the web directory).

Publishing and Releasing

To bundle the canonical interface run:

yarn build:web

You can open web/dist/index.html in your browser to test the distributable bundle and verify that it runs offline.

To bundle the library components into lib/dist for downstream clients to use via API instead of the canonical interface, run:

yarn build:lib

Making templates

With a [schema name] of your choice, work in /web/templates/[schema name]/

  • Add one almost empty file export.js to the same folder. It contains:
// A dictionary of possible export formats
export default {};
  • Assemble one schema.yaml file by hand. It should be a merger of a valid linkml schema.yaml file (your existing schema) and at least an extra dh_interface class. The dh_interface class signals to DH to show the given class as a template menu option. Below we are using an AMBR class as an example:
classes:
  dh_interface:
    name: dh_interface
    description: A DataHarmonizer interface
    from_schema: https://example.com/AMBR # HERE CHANGE TO [schema name] URI
  AMBR:    # HERE CHANGE TO [schema name]
    name: AMBR
    description: The AMBR Project, led by the Harrison Lab at the University of Calgary,
      is an interdisciplinary study aimed at using 16S sequencing as part of a culturomics
      platform to identify antibiotic potentiators from the natural products of microbiota.
      The AMBR DataHarmonizer template was designed to standardize contextual data
      associated with the isolate repository from this work.
    is_a: dh_interface
  • Optionally add all the types: {} from one of the other schema.core.yaml specification examples in /web/templates/. This is not essential, but it gives DH features such as the "provenance" slot and allows use of the WhitespaceMinimizedString datatype, which blocks unnecessary spaces.
types:
  WhitespaceMinimizedString:
    name: 'WhitespaceMinimizedString'
    typeof: string
    description: 'A string that has all whitespace trimmed off of beginning and end, and all internal whitespace segments reduced to single spaces. Whitespace includes #x9 (tab), #xA (linefeed), and #xD (carriage return).'
    base: str
    uri: xsd:token
  Provenance:
    name: 'Provenance'
    typeof: string
    description: 'A field containing a DataHarmonizer versioning marker. It is issued by DataHarmonizer when validation is applied to a given row of data.'
    base: str
    uri: xsd:token
  • Generate the schema.json file in that file’s template folder (/web/templates/[schema name]/) by running
python ../../../script/linkml.py -i schema.yaml

This will also add a menu item for your specification by adjusting /web/templates/menu.json.

  • Check the updated /web/templates/menu.json. With this example, the template menu will be "ambr/AMBR". Make sure the right class is listed for DH and that "display" is set to true.
 "ambr": {
    "AMBR": {
      "name": "AMBR",
      "status": "published",
      "display": true
    }
  }
  • Test your template by going to the DH root folder and typing (as documented on the GitHub main code page):
yarn dev

You can then browse to http://localhost:8080 to try out the template.

  • You can then build a stand-alone set of JS files in /web/dist/
yarn build:web

The /web/dist/ folder can then be zipped or copied separately to wherever you want to make the app available.
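When checking the generated menu.json entry from the steps above, a quick script can list which templates would be shown. The sketch below assumes the menu has the folder > class > { name, status, display } shape of the AMBR example; listDisplayedTemplates is a hypothetical helper, not part of DataHarmonizer.

```javascript
// Sketch: list "folder/Class" paths whose menu entry has display set to true.
// Assumes the structure shown in the AMBR menu.json example above.
function listDisplayedTemplates(menu) {
  const paths = [];
  for (const [folder, classes] of Object.entries(menu)) {
    for (const [className, entry] of Object.entries(classes)) {
      if (entry.display === true) paths.push(`${folder}/${className}`);
    }
  }
  return paths;
}

const exampleMenu = {
  ambr: { AMBR: { name: "AMBR", status: "published", display: true } },
};
console.log(listDisplayedTemplates(exampleMenu));
```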

TODO: describe how to use the DataHarmonizer javascript API.

Roadmap

This project is now in production, with new features being added occasionally. The Projects tab indicates anticipated functionality.

Support

If you have any ideas for improving the application, or have encountered any problems running the application, please open an issue for discussion.

Additional Information

For more information about the DataHarmonizer, its templates, and how to use them, check out the DataHarmonizer Wiki.

Acknowledgement

  • Handsontable was used to build the grid. DataHarmonizer is configured to reference the "non-commercial-and-evaluation" handsontable license "for purposes not intended toward monetary compensation such as, but not limited to, teaching, academic research, evaluation, testing and experimentation"; if this application is used for commercial purposes, this should be revised as per https://handsontable.com/docs/license-key/
  • SheetJS was used to open and save local files. The community edition was used under the Apache 2.0 license.

License

DataHarmonizer javascript, python and other code not mentioned in the Acknowledgement above is covered by the MIT license.

dataharmonizer's People

Contributors

cmrn-rhi, cmungall, cpauvert, ddooley, dependabot[bot], griffie, ivansg44, kennethbruskiewicz, mgopez, pkalita-lbl, subdavis, sujaypatil96, takadonet, turbomam

dataharmonizer's Issues

Improve mapping to GISAID field "Passage details/history"

Prepend specimen processing if "virus passage" was selected.

Replace all "unknown" values with blanks. So no more "lab host;unknown;passage method".

Suggested algorithm:

If virus passage is selected from specimen processing:
  Add virus passage

If lab host is selected and not null value:
  Add lab host
If passage number is selected and not a null value:
  Add passage number
If passage method is selected and not a null value:
  Add passage method
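A minimal sketch of the suggested algorithm, assuming a row object with hypothetical keys (specimenProcessing, labHost, passageNumber, passageMethod) and a semicolon separator as in the "lab host;unknown;passage method" example. The list of null values is also an assumption for illustration.

```javascript
// Assumed null-value labels (compared case-insensitively); adjust to the template.
const NULL_VALUES = new Set(["", "unknown", "not applicable", "missing", "not collected", "not provided"]);

function isNullValue(value) {
  return value === null || value === undefined || NULL_VALUES.has(String(value).trim().toLowerCase());
}

// Build the GISAID "Passage details/history" string from one row.
// Field names on `row` are hypothetical, chosen for this example.
function buildPassageDetails(row) {
  const parts = [];
  const processing = row.specimenProcessing || [];
  if (processing.some((p) => String(p).trim().toLowerCase() === "virus passage")) {
    parts.push("Virus passage");
  }
  for (const key of ["labHost", "passageNumber", "passageMethod"]) {
    if (!isNullValue(row[key])) parts.push(String(row[key]).trim());
  }
  return parts.join(";");
}

console.log(buildPassageDetails({
  specimenProcessing: ["Virus passage"],
  labHost: "Vero E6 cell line",
  passageNumber: "unknown",
  passageMethod: "AVL buffer",
}));
```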

Adding age bin 80-89 and 90+

Is it possible to add an age bin for 80-89 and another for 90+?

Our partners have indicated that this would add more granularity to the data given they test several individuals above 100 yo.

Failed Import Notification

When an import fails (e.g. someone is missing a row and then fails to correctly declare which row has the column headers) it would be nice to have a notification of failed import. Especially since a failed import doesn’t show a new/empty sheet if the user has already been using it to validate a previous one. Depending on the user's attention to detail, they may not recognize that they are still working on their previous upload and proceed to work/export as if they were on the expected import.

CNPHI Export - "Animal Type" Issue.

CNPHI "Animal Type" field doesn't accept Humans.
Looks like we are mapping "host (common name)" to "Animal Type"; can we do so but omit "human"?

CNPHI Field Context:

Field: Specimen Source
Input: Human, Animal, Environment

Field: Animal Type
Input: Bat, Cat, Chicken, Civets, Cow, Dog, Lion, Pangolin, Pig, Pigeon, Tiger, Human

Suggestion: Required Fields

Hi,

Considering that we want submitters to include as much information as possible beyond the required fields, perhaps we could amend the DH to show the purple optional columns alongside the yellow required ones when "Show required columns" is selected.

As it stands, showing just the required column headers may not be enabling or encouraging people to add in additional information.

Cheers,

Sarah

Accession Prefix Validation

Implement validation on accessions that have controlled prefixes.

BioProject Accession - Prefix: PRJNA
BioSample Accession - Prefix: SAMN
SRA (run) Accession - Prefix: SRR
GISAID Accession - Prefix: EPI_ISL_

GenBank Accessions - Allowable prefixes for nucleotide direct submissions: U, AF, AY, DQ, EF, EU, FJ, GQ, GU, HM, HQ, JF, JN, JQ, JX, KC, KF, KJ, KM, KP, KR, KT, KU, KX, KY, MF, MG, MH, MK, MN, MT (source: https://www.ncbi.nlm.nih.gov/Sequin/acc.html)
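One possible shape for this check is sketched below. It only tests the prefixes listed above and deliberately does not enforce the number of digits that follow, since that is not specified here. The field keys and the function name hasAllowedPrefix are hypothetical.

```javascript
// Prefix lists copied from the issue text above; keys are names made up here.
const ACCESSION_PREFIXES = {
  bioproject: ["PRJNA"],
  biosample: ["SAMN"],
  sra: ["SRR"],
  gisaid: ["EPI_ISL_"],
  genbank: ["U", "AF", "AY", "DQ", "EF", "EU", "FJ", "GQ", "GU", "HM", "HQ",
    "JF", "JN", "JQ", "JX", "KC", "KF", "KJ", "KM", "KP", "KR", "KT", "KU",
    "KX", "KY", "MF", "MG", "MH", "MK", "MN", "MT"],
};

function hasAllowedPrefix(kind, accession) {
  const prefixes = ACCESSION_PREFIXES[kind];
  if (!prefixes) throw new Error(`Unknown accession kind: ${kind}`);
  const value = String(accession).trim();
  if (kind === "genbank") {
    // The letters before the first digit must exactly match an allowed prefix.
    const match = /^([A-Z]+)\d+/.exec(value);
    return match !== null && prefixes.includes(match[1]);
  }
  // Require something after the prefix so a bare prefix is rejected.
  return prefixes.some((p) => value.startsWith(p) && value.length > p.length);
}

console.log(hasAllowedPrefix("gisaid", "EPI_ISL_402124"));
```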

Negative numbers not being invalidated

Fields tagged as "decimal" and "integer" allow negative values. This does not make sense for any of the current fields.

In addition, the reference guide for several of these fields tells users to insert a generic "numerical value". I don't know if it's common sense, but stating "positive numerical value" would be more accurate. "Positive integer value" would be even more accurate for integer fields.

"decimal" and "integer" may also not be the best datatype values if we only allow positive values. Wikipedia has a list of descriptors for sets of numbers: https://en.wikipedia.org/wiki/List_of_types_of_numbers. Possible improvements on "integer" are "whole numbers", "natural numbers", and "non-negative integers". Possible improvements on "decimal" are "positive numbers" or "positive decimals". However, I don't believe the users ever see the datatype values, so this is unimportant from a UI point of view.
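To make the request concrete, here is a sketch of validators that reject negative values. It treats zero as allowed (that is, "non-negative"), which is an assumption; whether zero should pass is not settled above. The function names are hypothetical.

```javascript
// Sketch: accept only digits (optionally with a fractional part), so a leading
// minus sign or any non-numeric text fails. Zero is allowed in this sketch.
function isNonNegativeInteger(text) {
  return /^\d+$/.test(String(text).trim());
}

function isNonNegativeDecimal(text) {
  return /^\d+(\.\d+)?$/.test(String(text).trim());
}

console.log(isNonNegativeInteger("-3"), isNonNegativeDecimal("1.5"));
```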

include some example datasets

Can we include some example files that people can use to make sure the validator behaves as it should? I think we should include a good Excel file, a good text file, and a file with some errors in it. Thoughts?

Invalid Data Notification on Save/Export

Could have a notification for when a user tries to export/save a validated spreadsheet that has invalid fields. We don't want to prevent export, but this could be helpful for users who may have missed correcting a field by accident. Perhaps the notification could have a 'more info' option that would then list out the invalid cells.

Blank rows between data are not read

We currently ignore blank rows when reading data, which helps reduce the size of the data we're working with (there are sometimes a lot of trailing empty rows). But blank rows between non-blank rows are ignored too. They should not be.
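One way to get the behaviour described here is to drop only the trailing blank rows and keep any blank rows that sit between data rows. A minimal sketch, assuming rows are arrays of cell values (trimTrailingBlankRows is a hypothetical name):

```javascript
// Sketch: remove blank rows from the end only; interior blank rows are kept.
function trimTrailingBlankRows(rows) {
  const isBlank = (row) =>
    row.every((cell) => cell === null || cell === undefined || String(cell).trim() === "");
  let end = rows.length;
  while (end > 0 && isBlank(rows[end - 1])) end -= 1;
  return rows.slice(0, end);
}

console.log(trimTrailingBlankRows([["a"], [""], ["b"], [""], [null]]));
```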

Valid imported numbers flagged invalid

All imported passage numbers (from a csv file) were displayed as invalid. If I entered the cell, did not change any values in it and then validated, the numbers were considered valid.

The same thing happened for Number Base Pairs, Consensus Genome Length, Mean Contig Length, and N50.

Ns per 100 kbp immediately recognized all values (int and float) as valid, and recognized non-numerals as invalid (as it should).

COVID-19 Vocab Update

COVID-19 vocab update due to changes in DataHarmonizer Templates "CanCOGEn-COVID" tab. To be completed before next release.

Field: geo_loc_name (province/territory)
Value: Yukon
Comments: Changed from "Yukon Territory" to "Yukon" to be compatible with CNPHI upload.

Field: signs and symptoms
Value: Abnormal lung auscultation
Comments: Corrected spelling mistake "ausculation" -> "auscultation".

Date validation showing YYYY-MM imports as invalid

All dates imported with the YYYY-MM format were designated invalid when they should be valid. Occurred for all three file types (csv, tsv, xlsx).

Dates Tested: 2019-12, 2020-05, 2020-03, 2020-01, 2020-04.
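For reference, a validator that accepts YYYY, YYYY-MM and YYYY-MM-DD could look like the sketch below. It works on the text directly rather than on a parsed Date object; isValidPartialDate is a hypothetical name and this is not the validator's current code.

```javascript
// Sketch: accept YYYY, YYYY-MM, or YYYY-MM-DD and check month/day ranges.
function isValidPartialDate(text) {
  const m = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(String(text).trim());
  if (!m) return false;
  const [, year, monthText, dayText] = m;
  if (monthText === undefined) return true; // year only
  const month = Number(monthText);
  if (month < 1 || month > 12) return false;
  if (dayText === undefined) return true; // year and month
  const day = Number(dayText);
  // Day 0 of the following month is the last day of this month.
  const daysInMonth = new Date(Date.UTC(Number(year), month, 0)).getUTCDate();
  return day >= 1 && day <= daysInMonth;
}

console.log(["2019-12", "2020-05", "2020-03", "2020-01", "2020-04"].map(isValidPartialDate));
```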

CNPHI Export Header Issues - sequencing centre, primary specimen id, #version

Issues with DataHarmonizer -> CNPHI Export Headers:

  • Does not include "Sequencing Centre" field which is mandatory for CNPHI upload.
  • CNPHI will not accept the field name "primary specimen identification number", it is expecting "primary specimen id".
  • CNPHI will not accept uploads that have the DataHarmonizer version information, e.g. "#validated using data harmonizer version 0.13.1". At least, not on the header row.

import feedback when user upload fails

It would be useful to get some sort of visual feedback/prompt when an import fails, perhaps with troubleshooting suggestions and/or a link to the SOP. Maybe some sort of check that lets the user know they are missing the “database identifiers” row or a column header.

Additionally, is it necessary for the first row to specifically be labelled “database identifiers”, can’t the contents be ignored?

CNPHI Export... Patient Travelled

If there is data in the DataHarmonizer template under travel history, shouldn't the CNPHI export say yes under Patient Travelled? This should hold even when there is no data in the Country of Travel|Province of Travel|City of Travel|Travel start date|Travel End Date field; it could be that a user didn't fill in those optional fields and just chose to use travel history.

image

Provenance - validation info only outputting to 1 row

The provenance validation version information is only being outputted to one row. Would like to see it for all rows so if the data gets subset or merged (etc.) the version information will be present for all specimen.

image

data loss when importing with 1st row column headers

If the user imports a file where the headers are on the first row, and they declare so on the prompt (‘Which row in your file has the column headers?’ ‘1’) then the second row of the document (first row of data) is truncated (i.e. the row is gone, an empty row does not remain in place). This happens for all three file types (csv, tsv, xlsx).
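The expected behaviour can be stated as a small function: the data starts on the row immediately after the declared header row, with nothing skipped. This sketch assumes a 1-based header row number, as in the prompt, and uses a hypothetical name; it does not claim to show where the current bug is.

```javascript
// Sketch: split imported rows into headers and data using a 1-based header row.
function splitHeaderAndData(rows, headerRowNumber) {
  if (!Number.isInteger(headerRowNumber) || headerRowNumber < 1 || headerRowNumber > rows.length) {
    throw new RangeError(`Header row must be between 1 and ${rows.length}`);
  }
  const headerIndex = headerRowNumber - 1; // convert to 0-based
  return {
    headers: rows[headerIndex],
    data: rows.slice(headerIndex + 1), // first data row is kept
  };
}

const imported = [["sample ID", "date"], ["S1", "2020"], ["S2", "2021"]];
console.log(splitHeaderAndData(imported, 1).data);
```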

High Priority: Collection Date - level of precision

For incomplete collection dates (to year, or to month) we need a "Date Unit" field with values "Year", "Month" and "Day".
In the validation step, if a collection date only specifies a month or year, the Date Unit field will specify that. The DH should then automatically fill in the missing date parts with "01" so that the date can be accepted by downstream programs that require year-month-day (YYYY-MM-DD).
In the export file for CNPHI, the Date Unit field should be called Precision. We'll map that once the DH adopts the changes above.
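A sketch of the proposed behaviour, assuming the input is already one of YYYY, YYYY-MM or YYYY-MM-DD. The function name normalizeCollectionDate and the returned object shape are made up for this example.

```javascript
// Sketch: pad incomplete dates with "01" and report the original precision.
function normalizeCollectionDate(text) {
  const m = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(String(text).trim());
  if (!m) return null; // not a recognized date format
  const [, year, month, day] = m;
  if (day !== undefined) return { date: `${year}-${month}-${day}`, unit: "Day" };
  if (month !== undefined) return { date: `${year}-${month}-01`, unit: "Month" };
  return { date: `${year}-01-01`, unit: "Year" };
}

console.log(normalizeCollectionDate("2020-04"));
```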

CNPHI Export - .CSV option

The DataHarmonizer allows for export to the CNPHI template in .xls, but CNPHI only accepts .csv
Would be very helpful to have the option to export to .csv, ideally as the CNPHI export default.

⭐ Thanks to Sarah Savić Kallesøe for identifying this issue!

DataHarmonizer Version Information in Output Files

I know there is currently discussion on how to add version metadata to DataHarmonizer output files; but while a more sophisticated implementation is being determined - perhaps for the time being commented metadata values should be added directly onto the spreadsheet (e.g. in one of the empty cells in the 'database identifiers' row) to ensure there is some version information on files for current users.

Import fails if first row empty

If the user leaves the first row empty, lists headers in the second row, and declares that the headers start on the second row - the import will fail. This happens for all three file types (csv, tsv, xlsx).

Improve Spreadsheet Navigation

It would be nice to have a way to search for and go to a column based on its header.
With the headers not being alphabetical, and ctrl+f only working for the text currently displayed in the viewport, it can be frustrating trying to navigate to a specific column. E.g. I know I want to edit 'host age' but I can't recall where it is in the spreadsheet, so I have to slowly scroll through the sheet to find it. This gets tedious quickly and will only get more frustrating over time as we add more fields.

Is there a way we could have a header search of some sort? When the user types input, it pulls from a header pick-list and then takes you to your selection. Perhaps using something like scrollViewportTo() - with the header names mapped to a header col number that it jumps you to?
scrollViewportTo jsfiddle example
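The lookup half of this idea could be sketched as below: map header names to column numbers, then resolve typed input by exact match first and prefix match second. The helper names are hypothetical, and the resulting column number would still need to be handed to something like scrollViewportTo() in the grid, which is not done here.

```javascript
// Sketch: build a case-insensitive header -> column index lookup.
function buildHeaderIndex(headers) {
  const index = new Map();
  headers.forEach((header, col) => index.set(header.trim().toLowerCase(), col));
  return index;
}

// Return the column for an exact match, else the first header starting with
// the query, else -1.
function findColumn(index, query) {
  const key = query.trim().toLowerCase();
  if (index.has(key)) return index.get(key);
  for (const [name, col] of index) {
    if (name.startsWith(key)) return col;
  }
  return -1;
}

const headerIndex = buildHeaderIndex(["specimen collector sample ID", "host age", "host age unit"]);
console.log(findColumn(headerIndex, "host age"));
```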

Imported csv date validation problems.

Dates imported in the appropriate ISO 8601 standards are being flagged as invalid and sometimes reformatted into an invalid format and/or a different date upon upload.

EXAMPLES:

  • "2020" (YYYY valid format) is being flagged invalid.
  • "2020-04" (YYYY-MM valid format) converted to "3/31/20" upon import.
  • "2020-01" (YYYY-MM valid format) converted to "12/31/19" upon import.
  • "4/4/2020" (invalid format) converted to "4/4/20".
  • "2019-12-20" (YYYY-MM-DD valid format) converted to "12/19/19" upon import.

CNPHI Export - Travel History

Incorrect mappings for travel history.

DataHarmonizer: destination of most recent travel (city)
is mapping to...
CNPHI: Country of Travel

DataHarmonizer: destination of most recent travel (country)
is mapping to...
CNPHI: City of Travel

BEFORE (DataHarmonizer format):
image

AFTER (CNPHI export):
image

Organism Example Update

Noticed that the example for the organism field says "Severe acute respiratory coronavirus 2" but that is no longer an acceptable input. It appears to have been changed to "Severe acute respiratory syndrome coronavirus 2". Have updated the vocab sheet and the SOP example to the new label.

I think people might get very frustrated if we don't get this updated quickly because it's a field everyone has to use and there isn't any source telling them what it's actually supposed to be (from what I can tell).

`Export To... CNPHI` error

DataHarmonizer Release 0.13.1

Tried exporting the validTestData.csv to CNPHI .xls format and ended up with a column variable that looks like it's supposed to be two columns:

Related Specimen ID|Related Specimen Relationship Type

...and cell contents that don't appear valid (unless this is what CNPHI requests to have as a null value): |

image

Body product term listed under anatomical material

Body product "Fluid (seminal)" is being labelled invalid. Appears to still be listed under "anatomical material" rather than "body product" in data.js. Is correctly listed under body product in data.tsv.

Non-Frozen Cols not matching Frozen Cols size

Release 0.6.0: The non-frozen column rows do not resize to match the contents/height of the frozen column ‘specimen collect sample ID’ rows. I know this might sound confusing so you can reference figures here for further clarification.

Improve performance with large datasets

With the potential of several thousand samples needing validation from stakeholders in Quebec and Ontario, we should improve performance for larger datasets. Telling users to break their data up into chunks should be a last resort.

Scrolling through large datasets is fine, due to only a subset of the data being rendered.

Importing, saving and exporting large datasets freezes the page for a significant amount of time. The bottleneck seems to be the ability of Sheet-JS to read/write large files. Sheet-JS has a XLSX.writeFileAsync file that's worth investigating. For reading, perhaps we can break the binary string Filereader obtains into chunks and have Sheet-JS read them in parallel using promises.

Validating large datasets freezes the page for an incredibly significant amount of time. The bottleneck seems to be with Handsontable's updateSettings function. Perhaps we can take a different approach, that involves iterating through the matrix in parallel and making calls to hot.setCellMeta.

Webworkers could also be useful. We could offload the tasks to the background and provide a loading dialog. Webworkers don't work in Chrome offline, but there may be a workaround https://stackoverflow.com/a/33432215/11472358

Post Selection Pick-List Shrinkage

Once something is selected from a pick-list, if you try to change it (say you clicked the wrong one by accident) the drop-down menu only shows your current selection, and any existing subclasses, as options. Re-selecting doesn't remove the current selection either.

If this cannot be changed it should be noted somewhere in the SOP (as much as it may seem obvious to some that the user should just delete it - causing the pick-list to reform - this may not occur to all users).

DataHarmonizer Provenance in CNPHI export

CNPHI has agreed that we can output the DataHarmonizer provenance version information into a (free-text) column called additional comments.


Example DH-CNPHI Export CSV:

image


Example Successful CNPHI Upload:

image

Maintain extra fields on upload

Potential issue: If we insert the extra fields close to the same index they were originally, the first header row would become inaccurate. Could highlight fields to indicate the first header has no bearing on them. Not a super clean solution.

'Help'/'Support' Section in Validator

This may have been discussed in the past (my apologies if I touch on things that have already been addressed) but can we include a 'help'/'support' section that directs the users to:

  • The SOP
  • Where to make issue/term requests (either linking to GitHub Issues or to a curator email)

There is now a published version of the SOP that we can link to (it will update every time we edit the SOP gdoc), but perhaps we should also have a pdf copy that users can use if they are offline? The latter would certainly require more maintenance; we could include a note at the top that it may not be the latest version and to go to the published version if possible.

Host Age Bin Paste Issue

I've noticed that if I type in 12 months into host age and host age unit, then host age bin will produce the correct bin (0-9), but if I paste it in I get the wrong bin (10-19) and it considers it valid (See picture).

image
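For comparison, the expected binning can be written so that typed and pasted values go through the same conversion. This sketch assumes 10-year bins labelled like "0-9" and "10-19" with a top bin of "90+", and units named "year" and "month"; both are assumptions for illustration, and hostAgeBin is a hypothetical name.

```javascript
// Sketch: convert age + unit to a 10-year bin label.
// Assumes units "year"/"month" and a top bin of "90+".
function hostAgeBin(age, unit) {
  if (age === null || age === undefined || String(age).trim() === "") return null;
  const value = Number(age); // pasted values arrive as text, so convert first
  if (!Number.isFinite(value) || value < 0) return null;
  const normalizedUnit = String(unit).trim().toLowerCase();
  let years;
  if (normalizedUnit === "year") years = value;
  else if (normalizedUnit === "month") years = value / 12;
  else return null;
  if (years >= 90) return "90+";
  const lower = Math.floor(years / 10) * 10;
  return `${lower}-${lower + 9}`;
}

console.log(hostAgeBin("12", "month"));
```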

CNPHI Export - Symptom Label Conversions

Some symptoms have to have their label converted for the CNPHI export, as CNPHI already has established, equivalent labels and does not wish to change them.

Data Harmonizer "Host (common name)" Label | CNPHI "Animal Type" Label
Cow | bovine
Pig | porcine

Data Harmonizer "Signs and Symptoms" Label | CNPHI "Symptoms" Label
Acute Respiratory Distress Syndrome | ARDS
Chills (sudden cold sensation) | Chills
Conjunctivitis (pink eye) | Conjunctivitis
Diarrhea (watery stool) | Diarrhea, watery
Encephalitis (brain inflammation) | Encephalitis
Fatigue (tiredness) | Fatigue
Fever (>=38°C) | Fever
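The conversions listed above could be held in simple lookup tables, with any label not in a table passed through unchanged. A sketch, with hypothetical names; the pass-through behaviour for unlisted labels is an assumption.

```javascript
// Lookup tables built from the label pairs listed above.
const CNPHI_ANIMAL_LABELS = new Map([
  ["Cow", "bovine"],
  ["Pig", "porcine"],
]);

const CNPHI_SYMPTOM_LABELS = new Map([
  ["Acute Respiratory Distress Syndrome", "ARDS"],
  ["Chills (sudden cold sensation)", "Chills"],
  ["Conjunctivitis (pink eye)", "Conjunctivitis"],
  ["Diarrhea (watery stool)", "Diarrhea, watery"],
  ["Encephalitis (brain inflammation)", "Encephalitis"],
  ["Fatigue (tiredness)", "Fatigue"],
  ["Fever (>=38°C)", "Fever"],
]);

// Return the CNPHI label if one is defined, otherwise the original label.
function toCnphiLabel(labelMap, label) {
  return labelMap.has(label) ? labelMap.get(label) : label;
}

console.log(toCnphiLabel(CNPHI_SYMPTOM_LABELS, "Diarrhea (watery stool)"));
```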
