

Schemasheets - make datamodels using spreadsheets



Create a data dictionary / schema for your data using simple spreadsheets - no coding required.

About

Schemasheets is a framework for managing your schema using spreadsheets (Google Sheets, Excel). It works by compiling down to LinkML, which can itself be compiled to a variety of formalisms or used for other purposes such as data validation.

Documentation

See the Schema Sheets Manual

Quick Start

pip install schemasheets

You should then be able to run the following commands:

  • sheets2linkml - Convert schemasheets to a LinkML schema
  • linkml2sheets - Convert a LinkML schema to schemasheets
  • sheets2project - Generate an entire set of schema files (JSON-Schema, SHACL, SQL, ...) from Schemasheets

As an example, take a look at the different tabs in the Google Sheet with ID 1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ.

The personinfo tab contains the bulk of the metadata elements:

| record | field | key | multiplicity | range | desc | schema.org |
|---|---|---|---|---|---|---|
| > class | slot | identifier | cardinality | range | description | exact_mappings: {curie_prefix: sdo} |
| > |  |  |  |  |  |  |
|  | id | yes | 1 | string | any identifier | identifier |
|  | description | no | 0..1 | string | a textual description | description |
| Person |  | n/a | n/a | n/a | a person, living or dead | Person |
| Person | id | yes | 1 | string | identifier for a person | identifier |
| Person, Organization | name | no | 1 | string | full name | name |
| Person | age | no | 0..1 | decimal | age in years |  |
| Person | gender | no | 0..1 | decimal | age in years |  |
| Person | has medical history | no | 0..* | MedicalEvent | medical history |  |
| Event |  |  |  |  | grouping class for events |  |
| MedicalEvent |  | n/a | n/a | n/a | a medical encounter |  |
| ForProfit |  |  |  |  |  |  |
| NonProfit |  |  |  |  |  |  |

This demonstration schema contains both record types (e.g. Person, MedicalEvent) and fields (e.g. id, age, gender).
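The multiplicity column above uses LinkML-style cardinality strings (1, 0..1, 0..*). As a rough illustration of how such strings map onto LinkML's required/multivalued slot flags, here is a hedged sketch; the function name is illustrative and not part of the schemasheets API:

```python
def parse_multiplicity(card: str):
    """Map a cardinality string such as '1', '0..1' or '0..*'
    to (required, multivalued) flags as LinkML uses them.

    Illustrative sketch only, not schemasheets' actual parser.
    """
    card = card.strip()
    if '..' in card:
        lo, hi = card.split('..', 1)
    else:
        lo, hi = card, card
    required = lo not in ('0', '')       # lower bound 0 means optional
    multivalued = hi in ('*', 'n')       # open upper bound means many
    return required, multivalued
```

For example, `parse_multiplicity('0..*')` yields an optional, multivalued slot.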

You can convert this to LinkML as follows:

sheets2linkml --gsheet-id 1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ personinfo types prefixes -o personinfo.yaml

This will generate a LinkML YAML file, personinfo.yaml, from 3 of the tabs in the Google Sheet.

You can also work directly with TSVs:

wget https://raw.githubusercontent.com/linkml/schemasheets/main/tests/input/personinfo.tsv 
sheets2linkml personinfo.tsv  -o personinfo.yaml

We recommend using COGS to synchronize your Google Sheets with local files using a git-like mechanism.

Finding out more


Issues

problematic urllib3 or chardet versions for new verbatim stuff

When running schemasheets/get_metaclass_slotvals.py or schemasheets/verbatim_sheets.py:

/Users/MAM/Library/Caches/pypoetry/virtualenvs/schemasheets-FMUhH2LU-py3.9/lib/python3.9/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn(

linkml2sheets not picking up on annotation's inner_keys

even with an inner_key specification:

slot display_hint
> slot annotations
> inner_key: display_hint
poetry run linkml2sheets specification.tsv \
  		--schema path/to/nmdc.yaml  \
  		--output-directory output \
  		--overwrite

generates

slot display_hint
> slot annotations
> inner_key: display_hint
ess dive datasets  
has credit associations {'display_hint': Annotation(tag='display_hint', value='Other researchers associated with this study.', extensions={}, annotations={})}
study image  
relevant protocols  
funding sources  
applied role  
applied roles {'display_hint': Annotation(tag='display_hint', value='Identify all CRediT roles associated with this contributor. CRediT Information: https://info.orcid.org/credit-for-research-contribution ; CRediT: https://credit.niso.org/', extensions={}, annotations={})}
applies to person  

etc.

linkml2sheets doesn't work when given a directory of templates

@putmantime and I have observed that running linkml2sheets on a directory of templates doesn't work, even when each individual template works on its own.

the linkml2sheets help gives this example:

linkml2sheets -s my_schema.yaml sheets/*.tsv -d sheets --overwrite

In the nmdc-schema repo, the following two work

schemasheets/tsv_output/slots.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/slots.tsv

schemasheets/tsv_output/classes.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/classes.tsv

but this doesn't work

schemasheets/tsv_output/all.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/*.tsv

Even though

ls -l schemasheets/templates 

-rw-r--r--@ 1 MAM staff 71 Aug 16 17:22 classes.tsv
-rw-r--r--@ 1 MAM staff 58 Aug 16 17:26 prefixes.tsv
-rw-r--r--@ 1 MAM staff 2005 Aug 16 18:01 slots.tsv

The error is

Traceback (most recent call last):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/bin/linkml2sheets", line 8, in <module>
sys.exit(export_schema())
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 297, in export_schema
exporter.export(sv, specification=f, to_file=outpath)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 90, in export
writer.writerow(row)
File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 154, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 149, in _dict_to_list
raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'class'
make: *** [schemasheets/tsv_output/all.tsv] Error 1
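The ValueError arises because csv.DictWriter is strict by default: writerow raises when a row dict contains a key (here 'class') that is missing from the template's fieldnames. A minimal sketch of the failure mode and a possible workaround using the standard library's extrasaction='ignore' option (a demonstration of the mechanism, not a patch to schemasheets):

```python
import csv
import io

# Columns declared in one template; the row carries an extra 'class' key,
# mirroring what happens when templates with different columns are mixed.
fieldnames = ['slot', 'description']
row = {'slot': 'id', 'description': 'an id', 'class': 'Person'}

buf = io.StringIO()
# extrasaction='ignore' silently drops keys not in fieldnames,
# instead of raising "dict contains fields not in fieldnames".
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter='\t',
                        extrasaction='ignore')
writer.writeheader()
writer.writerow(row)
```

With the default extrasaction='raise', the same writerow call reproduces the traceback above.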

add a column for directing rows to different YAML files

The NMDC and MIxS models (and presumably many more) consist of several YAML files.

In order to support faithful round-tripping, we could add a column that specifies that a row from a template should go to a particular YAML file when running sheets2linkml
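One possible shape for this feature: each row carries a column naming its destination file, and rows are grouped by that value before emission. A minimal sketch, assuming a hypothetical yaml_file column (both the column name and the function are illustrative, not existing schemasheets behavior):

```python
from collections import defaultdict

def partition_rows(rows, file_column='yaml_file', default='main.yaml'):
    """Group template rows by a (hypothetical) column naming the
    destination YAML file, so each group can be written separately."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without an explicit target fall back to the default file.
        target = row.get(file_column) or default
        groups[target].append(row)
    return dict(groups)
```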

Three tests failing on Mark's laptop but not in GH Actions or on several other people's computers

FAILED                               [ 30%]
test_schema_exporter.py:174 (test_types)
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'

    def merge_sheet(self, file_name: str, delimiter='\t') -> None:
        """
        Merge information from the given schema sheet into the current schema
    
        :param file_name: schema sheet
        :param delimiter: default is tab
        :return:
        """
        logging.info(f'READING {file_name} D={delimiter}')
        #with self.ensure_file(file_name) as tsv_file:
        #    reader = csv.DictReader(tsv_file, delimiter=delimiter)
        with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
            schemasheet = SchemaSheet.from_dictreader(reader)
            line_num = schemasheet.start_line_number
            # TODO: check why this doesn't work
            #while rows and all(x for x in rows[-1] if not x):
            #    print(f'TRIMMING: {rows[-1]}')
            #    rows.pop()
            logging.info(f'ROWS={len(schemasheet.rows)}')
            for row in schemasheet.rows:
                try:
>                   self.add_row(row, schemasheet.table_config)

../schemasheets/schemamaker.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)

    def add_row(self, row: Dict[str, Any], table_config: TableConfig):
>       for element in self.row_focal_element(row, table_config):

../schemasheets/schemamaker.py:111: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)
column = None

    def row_focal_element(self, row: Dict[str, Any], table_config: TableConfig,
                          column: COL_NAME = None) -> Generator[None, Element, None]:
        """
        Each row must have a single focal element, i.e the row is about a class, a slot, an enum, ...
    
        :param row:
        :param table_config:
        :return:
        """
        vmap = {}
        main_elt = None
        if table_config.metatype_column:
            tc = table_config.metatype_column
            if tc in row:
                typ = self.normalize_value(row[tc], table_config.columns[tc])
                if not table_config.name_column:
                    raise ValueError(f'name column must be set when type column ({tc}) is set; row={row}')
                name_val = row[table_config.name_column]
                if not name_val:
                    raise ValueError(f'name column must be set when type column ({tc}) is set')
                if typ == 'class':
                    vmap[T_CLASS] = [self.get_current_element(ClassDefinition(name_val))]
                elif typ == 'slot':
                    vmap[T_SLOT] = [self.get_current_element(SlotDefinition(name_val))]
                else:
                    raise ValueError(f'Unknown metatype: {typ}')
        if table_config.column_by_element_type is None:
            raise ValueError(f'No table_config.column_by_element_type')
        for k, elt_cls in tmap.items():
            if k in table_config.column_by_element_type:
                col = table_config.column_by_element_type[k]
                if col in row:
                    v = self.normalize_value(row[col])
                    if v:
                        if '|' in v:
                            vs = v.split('|')
                        else:
                            vs = [v]
                        if elt_cls == Prefix:
                            if len(vs) != 1:
                                raise ValueError(f'Cardinality of prefix col must be 1; got: {vs}')
                            pfx = Prefix(vs[0], 'TODO')
                            self.schema.prefixes[pfx.prefix_prefix] = pfx
                            vmap[k] = [pfx]
                        elif elt_cls == SchemaDefinition:
                            if len(vs) != 1:
                                raise ValueError(f'Cardinality of schema col must be 1; got: {vs}')
                            self.schema.name = vs[0]
                            vmap[k] = [self.schema]
                        else:
                            vmap[k] = [self.get_current_element(elt_cls(v)) for v in vs]
        def check_excess(descriptors):
            diff = set(vmap.keys()) - set(descriptors + [T_SCHEMA])
            if len(diff) > 0:
                raise ValueError(f'Excess slots: {diff}')
        if column:
            cc = table_config.columns[column]
            if cc.settings.applies_to_class:
                if T_CLASS in vmap and vmap[T_CLASS]:
                    raise ValueError(f'Cannot use applies_to_class in class-focused row')
                else:
                    cls = self.get_current_element(ClassDefinition(cc.settings.applies_to_class))
                    vmap[T_CLASS] = [cls]
        if T_SLOT in vmap:
            check_excess([T_SLOT, T_CLASS])
            if len(vmap[T_SLOT]) != 1:
                raise ValueError(f'Cardinality of slot field must be 1; got {vmap[T_SLOT]}')
            main_elt = vmap[T_SLOT][0]
            if T_CLASS in vmap:
                # TODO: attributes
                c: ClassDefinition
                for c in vmap[T_CLASS]:
                    #c: ClassDefinition = vmap[T_CLASS]
                    if main_elt.name not in c.slots:
                        c.slots.append(main_elt.name)
                    if self.unique_slots:
                        yield main_elt
                    else:
                        c.slot_usage[main_elt.name] = SlotDefinition(main_elt.name)
                        main_elt = c.slot_usage[main_elt.name]
                        yield main_elt
            else:
                yield main_elt
        elif T_CLASS in vmap:
            check_excess([T_CLASS])
            for main_elt in vmap[T_CLASS]:
                yield main_elt
        elif T_ENUM in vmap:
            check_excess([T_ENUM, T_PV])
            if len(vmap[T_ENUM]) != 1:
                raise ValueError(f'Cardinality of enum field must be 1; got {vmap[T_ENUM]}')
            this_enum: EnumDefinition = vmap[T_ENUM][0]
            if T_PV in vmap:
                for pv in vmap[T_PV]:
                    #pv = PermissibleValue(text=v)
                    this_enum.permissible_values[pv.text] = pv
                    yield pv
            else:
                yield this_enum
        elif T_PREFIX in vmap:
            for main_elt in vmap[T_PREFIX]:
                yield main_elt
        elif T_TYPE in vmap:
            for main_elt in vmap[T_TYPE]:
                yield main_elt
        elif T_SUBSET in vmap:
            for main_elt in vmap[T_SUBSET]:
                yield main_elt
        elif T_SCHEMA in vmap:
            for main_elt in vmap[T_SCHEMA]:
                yield main_elt
        else:
>           raise ValueError(f'Could not find a focal element for {row}')
E           ValueError: Could not find a focal element for {'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}

../schemasheets/schemamaker.py:318: ValueError

The above exception was the direct cause of the following exception:

    def test_types():
        """
        tests a specification that is dedicated to types
        """
        sb = SchemaBuilder()
        schema = sb.schema
        # TODO: add this functionality to SchemaBuilder
        t = TypeDefinition('MyString', description='my string', typeof='string')
        schema.types[t.name] = t
>       _roundtrip(schema, TYPES_SPEC)

test_schema_exporter.py:184: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_schema_exporter.py:94: in _roundtrip
    schema2 = sm.create_schema(MINISHEET)
../schemasheets/schemamaker.py:61: in create_schema
    self.merge_sheet(f, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'

    def merge_sheet(self, file_name: str, delimiter='\t') -> None:
        """
        Merge information from the given schema sheet into the current schema
    
        :param file_name: schema sheet
        :param delimiter: default is tab
        :return:
        """
        logging.info(f'READING {file_name} D={delimiter}')
        #with self.ensure_file(file_name) as tsv_file:
        #    reader = csv.DictReader(tsv_file, delimiter=delimiter)
        with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
            schemasheet = SchemaSheet.from_dictreader(reader)
            line_num = schemasheet.start_line_number
            # TODO: check why this doesn't work
            #while rows and all(x for x in rows[-1] if not x):
            #    print(f'TRIMMING: {rows[-1]}')
            #    rows.pop()
            logging.info(f'ROWS={len(schemasheet.rows)}')
            for row in schemasheet.rows:
                try:
                    self.add_row(row, schemasheet.table_config)
                    line_num += 1
                except ValueError as e:
>                   raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
E                   schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}

../schemasheets/schemamaker.py:108: SchemaSheetRowException

Generated linkml schema has title in range instead of name

While generating the LinkML schema for GA4GH VA in the ga4gh-va repo, we noticed (see the source schema) that the range for some induced slots was populated with the element's title, i.e., a space-separated name, rather than its name.

For example: for induced slot variability in the schema, range gets populated as Data Item instead of DataItem.

CC: @gaurav

Improve #23 with a test

Improve #23 (modelling of example values) with a test (as opposed to the current illustration in the Makefile)

cast minimum_value and maximum_value to int

When we add numeric values for the minimum_value and maximum_value properties, say 0 and 999, they are written to the LinkML file as strings rather than numbers.
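A sketch of the kind of coercion being requested: treat cells that parse as numbers as numbers, and leave everything else as strings (the function name is illustrative, not part of schemasheets):

```python
def coerce_numeric(value):
    """Coerce a spreadsheet cell that looks numeric into int/float,
    leaving everything else as a string.

    Illustrative sketch of the requested behavior, not the fix itself.
    """
    s = str(value).strip()
    try:
        return int(s)          # '0' -> 0, '999' -> 999
    except ValueError:
        try:
            return float(s)    # '1.5' -> 1.5
        except ValueError:
            return value       # non-numeric cells pass through unchanged
```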

Feature Requests for GA4GH-VA schema Web Docs

Summarizing requests related to Web documentation content and format in this ticket. Providing as one long list for now, but happy to break out into tickets for specific feature requests as needed. @sujaypatil96 @sierra-moxon hope we can coordinate soon on these!

Content/Sections I’d like to see in each Class page (in the following order):

  1. Definition:
    a. already provided, and looks fine
    b. content comes from the s/s "description" field.

  2. UML-style diagram:
    a. already provided using YUML, but I find these YUML diagrams hard to read and not all that useful.
    b. It sounds like a new framework will be used to generate diagrams in the near future, so I will hold off on any requests here until I see how the new diagrams look.

  3. Parents:
    a. already present as a section on the page, and looks fine

  4. Description:
    a. A new section with the title "Description".
    b. This should contain content in the 'comments' field of the s/s. Ideally as a bulleted list of sentences rather than one long paragraph/block of text, for improved readability.
    c. At present, text from the 'comments' column is in a table at the end of each Class page - but I’d like it front and center, directly under the Definition.

  5. Implementation and Use:
    a. A new section with the title "Implementation and Use"
    b. content would ideally be derived from the s/s - but I'm not sure how to do this in practice . . . I hear that the Annotations feature might let me just create a new 'Implementation and Use' column and give it whatever name I want. Not sure what tooling would be needed to generate a proper section in the Class web page that holds the content.
    c. I'd also want this presented as a bulleted list of sentences/short paragraphs, rather than one long blob of text.

  6. Own Attributes:
    a. This section already exists in each Class page
    b. content of course comes from the s/s
    c. prefer 'expanded' form - not tables - as this better accommodates the types and amount of text I want to provide in describing each attribute. (see below)
    d. don't think we need the class -> attribute pattern for 'own' attributes (no need for class context when you are already on the class page and the section says 'own')

  7. Inherited Attributes
    a. This section already exists in each Class page
    b. content generated from s/s but pulling in all attributes from parents of a given class

  8. Data Examples:
    a. A new section called "Data Examples"
    b. content would be nicely formatted YAML or JSON data examples - e.g. like those in the VRS RTD docs - ideally with some lead-in text that describes what is being represented (but this could be part of the data example text block, as a # comment preceding the data itself)
    c. Chris suggested housing these in a 'Data Examples' directory in the repo - and pulling relevant examples into a Class web page from these example files automatically. These data examples could then serve multiple purposes (documentation, testing/validation, etc.)

Content/Fields I’d like to see in for each Attribute of a class, as shown in a Class page

  1. The attribute name, description, cardinality, and range are already provided and look good as is.
  2. I’d also like to include a 'Comments:' field that holds text from the ‘comments’ column in the s/s - to provide additional clarification on meaning and usage of an attribute.

Add default template and ability to derive templates

schemasheets is powerful and flexible with its template mechanism

It would be useful to have some standard templates:

  • for people to start filling in de-novo schemas
  • for use as a starting point for linkml2sheets

These could be standard TSVs distributed along with the PyPI package, with convenient commands for seeding files from them.

When going from an existing linkml schema it might be useful to also autogenerate a template that includes all used metaslots

See also

no such option: -d

% poetry run sheets2linkml --help

gives the output below, but the script is called sheets2linkml, not schemasheets, and the -d option doesn't seem to be implemented:

/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
Usage: sheets2linkml [OPTIONS] [TSV_FILES]...

Convert schemasheets to a LinkML schema

schemasheets -d . my_schema/*tsv

Options:
-o, --output FILENAME output file
-v, --verbose
--help Show this message and exit.

Add ignore rows feature

I love the ignore column specification. Is there some way to ignore rows? That would help illustrate content from an upstream provider that is being excluded from the model.

Could the metatype specification be repurposed to allow for ignoring rows? (If it doesn't support that already?)
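One possible shape for this feature, sketched here with a hypothetical row-level ignore marker in the first cell (the marker and the function name are illustrative, not existing schemasheets behavior):

```python
def filter_rows(rows, ignore_marker='#'):
    """Skip rows whose first cell starts with a (hypothetical) ignore
    marker, mirroring the existing column-level 'ignore' feature."""
    kept = []
    for row in rows:
        # Inspect the first cell of the row (dicts preserve column order).
        first = str(next(iter(row.values()), ''))
        if first.startswith(ignore_marker):
            continue
        kept.append(row)
    return kept
```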

Issue locating module 'fairstructure'

I installed the latest using pip install schemasheets and when attempting to execute sheets2project as per the README, I get the following:

ModuleNotFoundError: No module named 'fairstructure'

This was using the example TSVs from the repo, sheets2project -d . examples/input/*.tsv, but the same behavior is seen when just calling sheets2project.

Column designator "type" conflicts with LinkML 1.3 metaslot "type"

Given

Type
> type

"type" is a reserved word for schemasheets, a shorthand for stating that this is the name of a TypeDefinition.

In LinkML 1.3, "type" is introduced as a metaslot, which could cause ambiguity.

Proposal: in the rare cases where disambiguation is required, use metaslot.type.

See #74

put structured_pattern values from sheets2linkml in the syntax sub-slot

sheets2linkml does honor structured_pattern column specifications

slot structured_pattern
name {firstname} {lastname}

but just serializes them like this:

structured_pattern: {firstname} {lastname}

I also tried using syntax as the column specification but got an error.

My minimal desired outcome is that the cell contents are placed in a structured pattern's syntax slot:

structured_pattern:
  syntax: {firstname} {lastname}

It would be nice to allow specifications for interpolated and partial_match, too
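The desired transformation could be sketched as a small post-processing step that moves a flat structured_pattern string into its syntax sub-slot (illustrative only, not the schemasheets implementation):

```python
def wrap_structured_pattern(slot: dict):
    """If a slot carries a flat structured_pattern string, nest it
    under the 'syntax' key, as the LinkML metamodel expects.

    Sketch of the desired outcome; interpolated/partial_match could
    be added to the nested dict the same way.
    """
    val = slot.get('structured_pattern')
    if isinstance(val, str):
        slot['structured_pattern'] = {'syntax': val}
    return slot
```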

linkml2sheets alternative?

I have found that the experimental linkml2sheets can't sheetify several of the elements and attributes I care about, and it can't seem to do even a minimal dump of complex/large schemas like MIxS. I have written some code that approximates a LinkML/sheets round trip on the following metaclasses:

See the Makefile in the turbomam/linkml-abuse project.

  • annotations
  • class_definitions
  • enum_definitions
  • prefixes
  • schema_definitions
  • slot_definitions
  • subset_definitions
  • type_definitions

The prefixes and subsets don't seem to include the content of imports

I'm not using a template to determine what gets written to the sheets. I'm iterating over all slots, except for the skipped slots listed below.

If my code were going to be included in any LinkML repo, it would need refactoring for performance and readability. I can do some of that. Even as it is, I have already used this for QC'ing the MIxS schema and plan to use it for round-tripping the NMDC submission portal schema (within sheets_and_friends).

There are some minor systematic changes between the before and after schemas. That's crudely reported in target/roundtrip.yaml

skipped slots:

  • all_of
  • alt_descriptions
  • annotations
  • any_of
  • attributes
  • classes
  • classification_rules
  • default_curi_maps
  • enum_range
  • enums
  • exactly_one_of
  • extensions
  • from_schema
  • implicit_prefix
  • imports
  • local_names
  • name
  • none_of
  • prefixes
  • rules
  • slot_definitions
  • slot_usage
  • slots
  • structured_aliases
  • subsets
  • unique_keys
  • type_uri

Option to not write "from_schema" slots in sheets2linkml rendered yaml

It doesn't appear to be possible to silence the "from_schema" slots in the sheets2linkml command.
They are very redundant and cause the LinkML YAML to balloon.
It would be great to have a parameter where this could be toggled on or off depending on how suitable it is for the given model.

Example:

  laboratory_procedure:
    name: laboratory_procedure
    from_schema: https://w3id.org/include_portal_v1_schema
  parent_sample_id:
    name: parent_sample_id
    from_schema: https://w3id.org/include_portal_v1_schema
  parent_sample_type:
    name: parent_sample_type
    from_schema: https://w3id.org/include_portal_v1_schema
  sample_availability:
    name: sample_availability
    from_schema: https://w3id.org/include_portal_v1_schema
  sample_id:
    name: sample_id
    from_schema: https://w3id.org/include_portal_v1_schema
  sample_type:
    name: sample_type
    from_schema: https://w3id.org/include_portal_v1_schema
  volume:
    name: volume
    from_schema: https://w3id.org/include_portal_v1_schema
  volume_unit:
    name: volume_unit
    from_schema: https://w3id.org/include_portal_v1_schema
  access_url:
    name: access_url
    from_schema: https://w3id.org/include_portal_v1_schema
  data_access:
    name: data_access
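One possible shape for this option, sketched as a post-processing step over the slot dictionaries (illustrative; prune_from_schema is a hypothetical helper, not an existing CLI flag):

```python
def prune_from_schema(slots: dict, schema_id: str):
    """Drop from_schema entries that merely repeat the schema's own id,
    the redundant case shown in the example above.

    Hypothetical post-processing sketch, not a schemasheets feature.
    """
    for slot in slots.values():
        if slot.get('from_schema') == schema_id:
            slot.pop('from_schema')
    return slots
```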

unintuitively, non-string values require protection by a leading `'` in sheets2linkml gsheet-id mode

This works

sheets2linkml \
		--output $@ \
		--gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core 

In that sheet, I protected numerical values and Booleans in the examples column by preceding them with '. I think the same is required for dates, and the affirmative Boolean value must be represented as 'true, not the magic value TRUE.

But switch term MIXS:0000001's example to 555, and you get

sheets2linkml \
		--output $@ \
		--gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core_example_555_num 

Traceback (most recent call last):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 105, in merge_sheet
self.add_row(row, schemasheet.table_config)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 111, in add_row
for element in self.row_focal_element(row, table_config):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 233, in row_focal_element
raise ValueError(f'No table_config.column_by_element_type')
ValueError: No table_config.column_by_element_type

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/bin/sheets2linkml", line 8, in <module>
sys.exit(convert())
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 578, in convert
schema = sm.create_schema(list(tsv_files))
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 61, in create_schema
self.merge_sheet(f, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 108, in merge_sheet
raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
schemasheets.schemamaker.SchemaSheetRowException: Error in line 1, row={'Structured comment name > slot > >': 'samp_size', 'Item (rdfs:label) title ': 'amount or size of sample collected', 'Definition description ': 'The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected.', 'Expected value annotations inner_key: expected_value': 'measurement value', 'Value syntax structured_pattern ': '{float} {unit}', 'Example examples internal_separator: "|"': '555', 'Section slot_group ': 'nucleic acid sequence source', 'migs_eu annotations applies_to_class: migs_eu inner_key: cardinality': 'X', 'migs_ba annotations applies_to_class: migs_ba inner_key: cardinality': 'X', 'migs_pl annotations applies_to_class: migs_pl inner_key: cardinality': 'X', 'migs_vi annotations applies_to_class: migs_vi inner_key: cardinality': 'X', 'migs_org annotations applies_to_class: migs_org inner_key: cardinality': 'X', 'mims annotations applies_to_class: mims inner_key: cardinality': 'C', 'mimarks_s annotations applies_to_class: mimarks_s inner_key: cardinality': 'C', 'mimarks_c annotations applies_to_class: mimarks_c inner_key: cardinality': 'X', 'misag annotations applies_to_class: misag inner_key: cardinality': 'C', 'mimag annotations applies_to_class: mimag inner_key: cardinality': 'C', 'miuvig annotations applies_to_class: miuvig inner_key: cardinality': 'C', 'Preferred unit annotations inner_key: preferred_unit': 'millliter, gram, milligram, liter', 'Occurrence multivalued vmap: {s: false, m: true}': 's', 'MIXS ID slot_uri ': 'MIXS:0000001', 'MIGS ID (mapping to GOLD) annotations inner_key: gold_migs_id': ''}
make: *** [generated/MIxS6_from_gsheet_templates_bad.yaml] Error 1

schemasheets export functionality missing linkml column descriptors

The schemasheets functionality that exports a provided LinkML schema to a schemasheets-specific TSV file, based on a specification TSV file, is incomplete: it does not write the second row containing the LinkML column descriptors to the output TSV file as expected.

export_spec.tsv:

```
Class	Field	Description	Key	Range
>class	slot	description	identifier	range
```

Run the export command as follows:

```bash
linkml2sheets ~/path/to/export_spec.tsv -s tests/input/personinfo.yaml -o personinfo.tsv
```

The output file is missing the second row with the LinkML column descriptors.
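For reference, the expected output would repeat both header rows from the spec before the data rows; an illustrative fragment (data-row values taken from the personinfo example, exact formatting may differ):

```
Class	Field	Description	Key	Range
>class	slot	description	identifier	range
Person	id	identifier for a person	yes	string
```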

make all -> No such file or directory: ~/edirect/pytest

% make all

poetry run pytest
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
Creating virtualenv schemasheets-FMUhH2LU-py3.9 in /Users/MAM/Library/Caches/pypoetry/virtualenvs

FileNotFoundError

[Errno 2] No such file or directory: b'/Users/MAM/edirect/pytest'

at /usr/local/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py:607 in _execvpe
603│ path_list = map(fsencode, path_list)
604│ for dir in path_list:
605│ fullname = path.join(dir, file)
606│ try:
→ 607│ exec_func(fullname, *argrest)
608│ except (FileNotFoundError, NotADirectoryError) as e:
609│ last_exc = e
610│ except OSError as e:
611│ last_exc = e
make: *** [test] Error 1

three tests failing in main

FAILED tests/test_schema_exporter.py::test_types - schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}
FAILED tests/test_schemamaker.py::test_types - AttributeError: 'TypeDefinition' object has no attribute 'type'
FAILED tests/test_schemamaker.py::test_combined - AttributeError: 'TypeDefinition' object has no attribute 'type'

allow for roundtripping of structured_patterns using inner keys

given a schema:

```yaml
classes:
  Person:
    attributes:
      first:
      last:
      full:
        structured_pattern:
          syntax: "{token} {token}"
```

we'd like a header column of:

```
structured_pattern
> inner_key: syntax
```

such that flat values like `{token} {token}` can be used in the data file.

This should work, but an exception is currently thrown.

discovered by @turbomam
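A sketch of the desired sheet (tab-separated), following the descriptor-row convention used in the other schemasheets examples; column names and the data row are illustrative:

```
record	field	pattern
> class	slot	structured_pattern
>		inner_key: syntax
Person	full	{token} {token}
```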

invoke with sheets2linkml?

As opposed to schemasheets? See #10

% sheets2linkml --help

Traceback (most recent call last):
File "/Users/MAM/my_first_ss/venv/bin/sheets2linkml", line 5, in <module>
from fairstructure.schemamaker import convert
ModuleNotFoundError: No module named 'fairstructure'

Reorganize package structure

We have a flat list of things under

https://github.com/linkml/schemasheets/tree/main/schemasheets

  • schema_exporter: linkml2sheets
  • schemamaker: sheets2linkml
  • schemasheet_datamodel
  • sheets_to_project: primarily CLI
  • schemaview_vs_example: ???

This is a little ad hoc. For other projects we subdivide into packages, e.g.

  • import
  • export
  • datamodel
  • cli

This may be overkill here, but we should at least have consistent naming conventions; e.g., if schema_exporter goes from the metamodel to sheets, then the opposite should be called schema_importer.

Add better documentation for when to use metatype

Notes from @cmungall:

metatype is useful for cases where you want a single column to always represent the element name, with the element type switching depending on the row. If you use this style, you always need a "name" column. Further up in the stack trace it was complaining about the missing name field.
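For illustration, a metatype-style sheet keeps one name column while the element type varies per row (column titles are hypothetical; the element names come from the personinfo example):

```
name	kind	desc
> name	metatype	description
Person	class	a person
age	slot	age in years
```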

Examples should roundtrip

When converting from sheets to LinkML, a column that maps to examples is hard-wired to generate `Example(value=v)` for each value `v` in the cell.

When this is reversed back from LinkML to sheets, this causes an error.

The behavior should be symmetric.

More broadly, there should be a well-documented solution for mapping complex values to flat spreadsheet cells, rather than relying on hardwiring.
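A minimal sketch of what a symmetric mapping could look like, using hypothetical helper functions and a stand-in `Example` dataclass (not the schemasheets or linkml-runtime API), with an internal separator like the one in the sheet headers above:

```python
# Hypothetical helpers sketching a symmetric round trip between a list of
# example objects and a single flat spreadsheet cell.
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """Stand-in for a LinkML Example (value only)."""
    value: str


SEP = "|"  # internal_separator


def examples_to_cell(examples: List[Example]) -> str:
    """LinkML -> sheets: join example values into one flat cell."""
    return SEP.join(e.value for e in examples)


def cell_to_examples(cell: str) -> List[Example]:
    """Sheets -> LinkML: split the cell back into Example objects."""
    return [Example(value=v) for v in cell.split(SEP) if v]
```

Because the two functions are exact inverses (for values not containing the separator), `cell_to_examples(examples_to_cell(xs)) == xs`, which is the symmetry the issue asks for.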
