linkml / schemasheets
Structure your data in a FAIR way using google sheets or TSVs. These are then converted to LinkML, and from there other formats.
Home Page: https://linkml.io/schemasheets/
https://linkml.io/schemasheets/datamodel/Shortcuts/
Is the meaning of schema really linkml:EnumDefinition?
Notes from @cmungall:
metatype is useful for cases where you want a single column to always represent the element name, and the element type to switch depending on the row. If you use this style, you always need a "name" column. Further up in the stack trace it was complaining about the name field missing.
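A minimal sketch of this metatype style (the column headers are illustrative, and the descriptor row follows the `>`-prefixed convention used elsewhere in schemasheets templates; check the docs for exact syntax):

```tsv
record	type	description
> name	metatype	description
Person	class	a person
age	slot	age in years
```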
We have a flat list of things under
https://github.com/linkml/schemasheets/tree/main/schemasheets
this is a little ad-hoc. For other projects we subdivide into packages, e.g.
This may be overkill here, but we should at least have consistent naming conventions, e.g. if schema_exporter goes from the metamodel to sheets, then the opposite should be called schema_importer
Per feedback from @mbrush, he would like the LinkML YAML generated from schemasheets to use the "attributes" element instead of defining "slots" independently, with slot_usage holding the definitions and descriptions.
I see them in the MIxS example sheet but I haven't been successful in converting that
The NMDC and MIxS models (and presumably many more) consist of several YAML files.
In order to support faithful round-tripping, we could add a column that specifies that a row from a template should go to a particular YAML file when running sheets2linkml
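A hypothetical extra column for this (both the `source_file` column name and its descriptor are invented for illustration; a new descriptor would be needed, so `ignore` is shown only as a placeholder):

```tsv
slot	source_file
> slot	ignore
depth	core.yaml
samp_size	checklists.yaml
```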
even with an inner_key specification:
|slot|display_hint|
|---|---|
|> slot|annotations|
|> |inner_key: display_hint|
```shell
poetry run linkml2sheets specification.tsv \
  --schema path/to/nmdc.yaml \
  --output-directory output \
  --overwrite
```
generates
|slot|display_hint|
|---|---|
|> slot|annotations|
|> |inner_key: display_hint|
|ess dive datasets| |
|has credit associations|{'display_hint': Annotation(tag='display_hint', value='Other researchers associated with this study.', extensions={}, annotations={})}|
|study image| |
|relevant protocols| |
|funding sources| |
|applied role| |
|applied roles|{'display_hint': Annotation(tag='display_hint', value='Identify all CRediT roles associated with this contributor. CRediT Information: https://info.orcid.org/credit-for-research-contribution ; CRediT: https://credit.niso.org/', extensions={}, annotations={})}|
|applies to person| |
etc.
```
FAILED tests/test_schema_exporter.py::test_types - schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}
FAILED tests/test_schemamaker.py::test_types - AttributeError: 'TypeDefinition' object has no attribute 'type'
FAILED tests/test_schemamaker.py::test_combined - AttributeError: 'TypeDefinition' object has no attribute 'type'
```
```
% make all
poetry run pytest
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Creating virtualenv schemasheets-FMUhH2LU-py3.9 in /Users/MAM/Library/Caches/pypoetry/virtualenvs

FileNotFoundError
[Errno 2] No such file or directory: b'/Users/MAM/edirect/pytest'

at /usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py:607 in _execvpe
    603│ path_list = map(fsencode, path_list)
    604│ for dir in path_list:
    605│     fullname = path.join(dir, file)
    606│     try:
  → 607│         exec_func(fullname, *argrest)
    608│     except (FileNotFoundError, NotADirectoryError) as e:
    609│         last_exc = e
    610│     except OSError as e:
    611│         last_exc = e
make: *** [test] Error 1
```
I have found that the experimental linkml2sheets can't sheetify several of the elements and attributes I care about, and it can't seem to do even a minimal dump on complex/large schemas like MIxS. I have written some code that approximates a LinkML/sheets round trip on the following metaclasses:
See the turbomam/linkml-abuse project's Makefile.
The prefixes and subsets don't seem to include the content of imports
I'm not using a template to determine what gets written to the sheets. I'm iterating over all slots, except for the skipped slots listed below.
If my code was going to be included in any LinkML repo, it would need refactoring for performance and readability. I can do some of that. Even as it is, I have already used this for QC'ing the MIxS schema and plan to use it for round-tripping the NMDC submission portal schema (within sheets_and_friends)
There are some minor systematic changes between the before and after schemas. That's crudely reported in target/roundtrip.yaml
skipped slots:
sheets2linkml does honor structured_pattern column specifications
|slot|structured_pattern|
|---|---|
|name|{firstname} {lastname}|
but just serializes them like this:
```yaml
structured_pattern: {firstname} {lastname}
```
I also tried using `syntax` as the column specification but got an error.
My minimal desired outcome is that the cell contents are placed in a structured pattern's syntax slot:
```yaml
structured_pattern:
  syntax: {firstname} {lastname}
```
It would be nice to allow specifications for interpolated and partial_match, too.
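The sheet-side column specification might then look like this (a sketch, assuming inner_key could apply to structured_pattern the same way it applies to annotations):

```tsv
slot	pattern
> slot	structured_pattern
> 	inner_key: syntax
name	{firstname} {lastname}
```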
Currently it's possible to fill in an enum range for the slot and define the enum elsewhere, but it can be convenient to list the PVs in the same row. This would have to be for simple PVs, with no mapping/meaning.
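A hypothetical layout for this request (the permissible_values column and the pipe separator are invented for illustration; this is not currently supported):

```tsv
slot	range	permissible_values
> slot	range	permissible_values
status	StatusEnum	draft|active|retired
```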
FAILED [ 30%]
test_schema_exporter.py:174 (test_types)
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'
def merge_sheet(self, file_name: str, delimiter='\t') -> None:
"""
Merge information from the given schema sheet into the current schema
:param file_name: schema sheet
:param delimiter: default is tab
:return:
"""
logging.info(f'READING {file_name} D={delimiter}')
#with self.ensure_file(file_name) as tsv_file:
# reader = csv.DictReader(tsv_file, delimiter=delimiter)
with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
schemasheet = SchemaSheet.from_dictreader(reader)
line_num = schemasheet.start_line_number
# TODO: check why this doesn't work
#while rows and all(x for x in rows[-1] if not x):
# print(f'TRIMMING: {rows[-1]}')
# rows.pop()
logging.info(f'ROWS={len(schemasheet.rows)}')
for row in schemasheet.rows:
try:
> self.add_row(row, schemasheet.table_config)
../schemasheets/schemamaker.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)
def add_row(self, row: Dict[str, Any], table_config: TableConfig):
> for element in self.row_focal_element(row, table_config):
../schemasheets/schemamaker.py:111:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
row = {'Desc': 'my string', 'Extends': 'string', 'Type': '', 'base': '', ...}
table_config = TableConfig(name=None, columns={'Type': ColumnConfig(name='Type', maps_to='type', settings=ColumnSettings(curie_prefix...], all_of=[]), is_element_type=None)}, column_by_element_type={'type': 'Type'}, metatype_column=None, name_column=None)
column = None
def row_focal_element(self, row: Dict[str, Any], table_config: TableConfig,
column: COL_NAME = None) -> Generator[None, Element, None]:
"""
Each row must have a single focal element, i.e the row is about a class, a slot, an enum, ...
:param row:
:param table_config:
:return:
"""
vmap = {}
main_elt = None
if table_config.metatype_column:
tc = table_config.metatype_column
if tc in row:
typ = self.normalize_value(row[tc], table_config.columns[tc])
if not table_config.name_column:
raise ValueError(f'name column must be set when type column ({tc}) is set; row={row}')
name_val = row[table_config.name_column]
if not name_val:
raise ValueError(f'name column must be set when type column ({tc}) is set')
if typ == 'class':
vmap[T_CLASS] = [self.get_current_element(ClassDefinition(name_val))]
elif typ == 'slot':
vmap[T_SLOT] = [self.get_current_element(SlotDefinition(name_val))]
else:
raise ValueError(f'Unknown metatype: {typ}')
if table_config.column_by_element_type is None:
raise ValueError(f'No table_config.column_by_element_type')
for k, elt_cls in tmap.items():
if k in table_config.column_by_element_type:
col = table_config.column_by_element_type[k]
if col in row:
v = self.normalize_value(row[col])
if v:
if '|' in v:
vs = v.split('|')
else:
vs = [v]
if elt_cls == Prefix:
if len(vs) != 1:
raise ValueError(f'Cardinality of prefix col must be 1; got: {vs}')
pfx = Prefix(vs[0], 'TODO')
self.schema.prefixes[pfx.prefix_prefix] = pfx
vmap[k] = [pfx]
elif elt_cls == SchemaDefinition:
if len(vs) != 1:
raise ValueError(f'Cardinality of schema col must be 1; got: {vs}')
self.schema.name = vs[0]
vmap[k] = [self.schema]
else:
vmap[k] = [self.get_current_element(elt_cls(v)) for v in vs]
def check_excess(descriptors):
diff = set(vmap.keys()) - set(descriptors + [T_SCHEMA])
if len(diff) > 0:
raise ValueError(f'Excess slots: {diff}')
if column:
cc = table_config.columns[column]
if cc.settings.applies_to_class:
if T_CLASS in vmap and vmap[T_CLASS]:
raise ValueError(f'Cannot use applies_to_class in class-focused row')
else:
cls = self.get_current_element(ClassDefinition(cc.settings.applies_to_class))
vmap[T_CLASS] = [cls]
if T_SLOT in vmap:
check_excess([T_SLOT, T_CLASS])
if len(vmap[T_SLOT]) != 1:
raise ValueError(f'Cardinality of slot field must be 1; got {vmap[T_SLOT]}')
main_elt = vmap[T_SLOT][0]
if T_CLASS in vmap:
# TODO: attributes
c: ClassDefinition
for c in vmap[T_CLASS]:
#c: ClassDefinition = vmap[T_CLASS]
if main_elt.name not in c.slots:
c.slots.append(main_elt.name)
if self.unique_slots:
yield main_elt
else:
c.slot_usage[main_elt.name] = SlotDefinition(main_elt.name)
main_elt = c.slot_usage[main_elt.name]
yield main_elt
else:
yield main_elt
elif T_CLASS in vmap:
check_excess([T_CLASS])
for main_elt in vmap[T_CLASS]:
yield main_elt
elif T_ENUM in vmap:
check_excess([T_ENUM, T_PV])
if len(vmap[T_ENUM]) != 1:
raise ValueError(f'Cardinality of enum field must be 1; got {vmap[T_ENUM]}')
this_enum: EnumDefinition = vmap[T_ENUM][0]
if T_PV in vmap:
for pv in vmap[T_PV]:
#pv = PermissibleValue(text=v)
this_enum.permissible_values[pv.text] = pv
yield pv
else:
yield this_enum
elif T_PREFIX in vmap:
for main_elt in vmap[T_PREFIX]:
yield main_elt
elif T_TYPE in vmap:
for main_elt in vmap[T_TYPE]:
yield main_elt
elif T_SUBSET in vmap:
for main_elt in vmap[T_SUBSET]:
yield main_elt
elif T_SCHEMA in vmap:
for main_elt in vmap[T_SCHEMA]:
yield main_elt
else:
> raise ValueError(f'Could not find a focal element for {row}')
E ValueError: Could not find a focal element for {'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}
../schemasheets/schemamaker.py:318: ValueError
The above exception was the direct cause of the following exception:
def test_types():
"""
tests a specification that is dedicated to types
"""
sb = SchemaBuilder()
schema = sb.schema
# TODO: add this functionality to SchemaBuilder
t = TypeDefinition('MyString', description='my string', typeof='string')
schema.types[t.name] = t
> _roundtrip(schema, TYPES_SPEC)
test_schema_exporter.py:184:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_schema_exporter.py:94: in _roundtrip
schema2 = sm.create_schema(MINISHEET)
../schemasheets/schemamaker.py:61: in create_schema
self.merge_sheet(f, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = SchemaMaker(schema=SchemaDefinition(name='TEMP', id_prefixes=[], definition_uri=None, local_names={}, conforms_to=None...), element_map=None, metamodel=None, cardinality_vocabulary=None, default_name=None, unique_slots=None, gsheet_id=None)
file_name = '/Users/MAM/Documents/gitrepos/schemasheets/tests/output/mini.tsv'
delimiter = '\t'
def merge_sheet(self, file_name: str, delimiter='\t') -> None:
"""
Merge information from the given schema sheet into the current schema
:param file_name: schema sheet
:param delimiter: default is tab
:return:
"""
logging.info(f'READING {file_name} D={delimiter}')
#with self.ensure_file(file_name) as tsv_file:
# reader = csv.DictReader(tsv_file, delimiter=delimiter)
with self.ensure_csvreader(file_name, delimiter=delimiter) as reader:
schemasheet = SchemaSheet.from_dictreader(reader)
line_num = schemasheet.start_line_number
# TODO: check why this doesn't work
#while rows and all(x for x in rows[-1] if not x):
# print(f'TRIMMING: {rows[-1]}')
# rows.pop()
logging.info(f'ROWS={len(schemasheet.rows)}')
for row in schemasheet.rows:
try:
self.add_row(row, schemasheet.table_config)
line_num += 1
except ValueError as e:
> raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
E schemasheets.schemamaker.SchemaSheetRowException: Error in line 2, row={'Type': '', 'base': '', 'uri': '', 'Desc': 'my string', 'Extends': 'string'}
../schemasheets/schemamaker.py:108: SchemaSheetRowException
@putmantime and I have observed that running linkml2sheets on a directory of templates doesn't work, even when all of the individual templates do work. The linkml2sheets help gives this example:

```shell
linkml2sheets -s my_schema.yaml sheets/*.tsv -d sheets --overwrite
```
In the nmdc-schema repo, the following two work:

```makefile
schemasheets/tsv_output/slots.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/slots.tsv

schemasheets/tsv_output/classes.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/classes.tsv
```
but this doesn't work:

```makefile
schemasheets/tsv_output/all.tsv: clean_schemasheets
	linkml2sheets \
		--schema src/schema/nmdc.yaml \
		--output-directory schemasheets/tsv_output/ \
		schemasheets/templates/*.tsv
```
even though the templates directory contains:

```
% ls -l schemasheets/templates
-rw-r--r--@ 1 MAM staff    71 Aug 16 17:22 classes.tsv
-rw-r--r--@ 1 MAM staff    58 Aug 16 17:26 prefixes.tsv
-rw-r--r--@ 1 MAM staff  2005 Aug 16 18:01 slots.tsv
```
The error is:

```
Traceback (most recent call last):
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/bin/linkml2sheets", line 8, in <module>
    sys.exit(export_schema())
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 297, in export_schema
    exporter.export(sv, specification=f, to_file=outpath)
  File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/nmdc-schema-MTtWF7zd-py3.9/lib/python3.9/site-packages/schemasheets/schema_exporter.py", line 90, in export
    writer.writerow(row)
  File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 154, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/usr/local/Cellar/python@3.9/3.9.13_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'class'
make: *** [schemasheets/tsv_output/all.tsv] Error 1
```
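A possible workaround, pending a fix: invoke linkml2sheets once per template rather than passing a glob. A minimal sketch using the same flags shown above (the helper name is invented for illustration):

```python
# Sketch of a per-template workaround: build one linkml2sheets invocation
# per template file instead of passing them all at once, sidestepping the
# "dict contains fields not in fieldnames" clash across mixed templates.
def per_template_commands(schema: str, outdir: str, templates: list) -> list:
    """Return one command (as an argv list) per template file."""
    return [
        ["linkml2sheets", "--schema", schema, "--output-directory", outdir, t]
        for t in templates
    ]

for cmd in per_template_commands(
    "src/schema/nmdc.yaml",
    "schemasheets/tsv_output/",
    ["schemasheets/templates/classes.tsv", "schemasheets/templates/slots.tsv"],
):
    print(" ".join(cmd))
```

Each command could then be run with `subprocess.run(cmd, check=True)`.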
It doesn't appear to be possible to silence the "from_schema" slot in the sheets2linkml command.
It is very redundant and causes the LinkML YAML to balloon.
It would be great to have a parameter to toggle this on or off depending on how suitable it is for the given model.
Example:
```yaml
laboratory_procedure:
  name: laboratory_procedure
  from_schema: https://w3id.org/include_portal_v1_schema
parent_sample_id:
  name: parent_sample_id
  from_schema: https://w3id.org/include_portal_v1_schema
parent_sample_type:
  name: parent_sample_type
  from_schema: https://w3id.org/include_portal_v1_schema
sample_availability:
  name: sample_availability
  from_schema: https://w3id.org/include_portal_v1_schema
sample_id:
  name: sample_id
  from_schema: https://w3id.org/include_portal_v1_schema
sample_type:
  name: sample_type
  from_schema: https://w3id.org/include_portal_v1_schema
volume:
  name: volume
  from_schema: https://w3id.org/include_portal_v1_schema
volume_unit:
  name: volume_unit
  from_schema: https://w3id.org/include_portal_v1_schema
access_url:
  name: access_url
  from_schema: https://w3id.org/include_portal_v1_schema
data_access:
  name: data_access
```
Hi! Please update the documentation: at https://linkml.io/schemasheets/intro/converting/ the second call is wrong and should be `sheets2linkml -o my.yaml src/*.tsv`, not `schemasheets ...`.
It's not clear why this is here and why it's in the core
https://github.com/linkml/schemasheets/tree/main/schemasheets
Currently fairstructure, but can change
Could go for generic, e.g. schemasheets, sheets2linkml...?
I installed the latest using pip install schemasheets, and when attempting to execute sheets2project as per the README, I get the following:
ModuleNotFoundError: No module named 'fairstructure'
This was using the example tsvs from the repo (sheets2project -d . examples/input/*.tsv), but the same behavior is seen just calling sheets2project.
I love the ignore column specification. Is there some way to ignore rows? That would help illustrate content from an upstream provider that is being excluded from the model.
Could the metatype specification be repurposed to allow for ignoring rows (if it doesn't support that already)?
|prefix|prefix_reference|
|---|---|
|> prefix|prefix_reference|
`poetry run sheets2linkml --help` gives the output below, but the script is called sheets2linkml, not schemasheets, and the -d option doesn't seem to be implemented:

```
/usr/local/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Usage: sheets2linkml [OPTIONS] [TSV_FILES]...

  Convert schemasheets to a LinkML schema

      schemasheets -d . my_schema/*tsv

Options:
  -o, --output FILENAME  output file
  -v, --verbose
  --help                 Show this message and exit.
```
Summarizing requests related to Web documentation content and format in this ticket. Providing as one long list for now, but happy to break out into tickets for specific feature requests as needed. @sujaypatil96 @sierra-moxon hope we can coordinate soon on these!
Definition:
a. already provided, and looks fine
b. content comes from the s/s "description" field.
UML-style diagram:
a. already provided using YUML, but I find these YUML diagrams hard to read and not all that useful.
b. It sounds like a new framework will be used to generate diagrams in the near future, so I will hold off on any requests here until I see how the new diagrams look.
Parents:
a. already present as a section on the page, and looks fine
Description:
a. A new section with the title "Description".
b. This should contain content in the 'comments' field of the s/s. Ideally as a bulleted list of sentences rather than one long paragraph/block of text, for improved readability.
c. At present, text from the 'comments' column is in a table at the end of each Class page - but I'd like it front and center directly under the Definition.
Implementation and Use:
a. A new section with the title "Implementation and Use".
b. content would ideally be derived from the s/s - but not sure how to do this in practice? . . . I hear that the Annotations feature might let me just create a new 'Implementation and Use' column and give it whatever name I want. Not sure what tooling would be needed to generate a proper section in the Class web page that holds the content.
c. I'd also want this presented as a bulleted list of sentences/short paragraphs, rather than one long blob of text.
Own Attributes:
a. This section already exists in each Class page.
b. content of course comes from the s/s.
c. prefer 'expanded' form - not tables - as this better accommodates the types and amount of text I want to provide in describing each attribute. (see below)
d. don't think we need the class -> attribute pattern for 'own' attributes (no need for class context when you are already on the class page and the section says 'own').
Inherited Attributes:
a. This section already exists in each Class page.
b. content generated from the s/s, but pulling in all attributes from parents of a given class.
Data Examples:
a. A new section called "Data Examples".
b. content would be nicely formatted yaml or json data examples - e.g. like those in the VRS RTD docs here). Ideally with some lead-in text that describes what is being represented (but this could be part of the data example text block, as a # comment preceding the data itself).
c. Chris suggested housing these in a 'Data Examples' directory in the repo - and pulling relevant examples in to a Class web page from these example files automatically. These data examples could then serve multiple purposes (documentation, testing/validation, etc.)
When we add number values for the minimum_value and maximum_value properties, like say 0 and 999, the program writes them to the linkml file as strings rather than numbers.
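A minimal sketch of the fix, as a hypothetical pre-processing helper (not current schemasheets code) that coerces numeric-looking cells before they reach the schema:

```python
# Hypothetical helper: coerce numeric-looking cell values so that
# minimum_value/maximum_value land in the generated LinkML YAML as
# numbers, not strings.
def coerce_numeric(value: str):
    """Return an int or float when the cell looks numeric, else the raw string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

print(coerce_numeric("0"))     # → 0
print(coerce_numeric("999"))   # → 999
print(coerce_numeric("text"))  # → text
```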
recommend not checking in output files
where's a good place to put the expected output? Just as a string in the test? It's relatively small in this case.
I will be editing schemasheets/schemamaker.py
, based on the Annotations code block around line 303
was:
clone from schemamaker ~ 303
For example, if a user wants to use the "test enums" tab from https://docs.google.com/spreadsheets/d/1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ/edit#gid=823426713 , then their sheets2linkml
command would look like this
```shell
sheets2linkml --gsheet-id 1wVoaiFg47aT9YWNeRfTZ8tYHN8s8PAuDx5i2HUcDpvQ test+enums
```
We should discourage tab names that contain characters requiring more aggressive URL encoding than space -> +.
When we were trying to generate the linkml schema for GA4GH VA in the ga4gh-va repo, if you look at the source schema, you'll see that the range for some induced slots was populated with their title, i.e., space-separated names, rather than their name.
For example: for the induced slot variability in the schema, range gets populated as Data Item instead of DataItem.
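The expected normalization can be sketched with a hypothetical helper (not the actual generator code) that collapses a space-separated title into a CamelCase class name:

```python
# Hypothetical normalizer: collapse a space-separated title like
# "Data Item" into the CamelCase class name "DataItem", so that slot
# ranges reference names rather than titles.
def title_to_class_name(title: str) -> str:
    return "".join(part[:1].upper() + part[1:] for part in title.split())

print(title_to_class_name("Data Item"))  # → DataItem
```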
CC: @gaurav
```
% pip list | grep schemasheets
schemasheets               0.1.4
% schemasheets
zsh: command not found: schemasheets
```
This works
```shell
sheets2linkml \
  --output $@ \
  --gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core
```
In that sheet, I protected numerical values and Booleans in the examples column by preceding them with '. I think the same thing is required for dates, and the affirmative boolean value must be represented as 'true, not the magic value of TRUE.
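Reading such cells back then requires stripping the protective apostrophe; a minimal sketch of that convention (hypothetical helper, not schemasheets API):

```python
# Hypothetical helper: strip the protective leading apostrophe that keeps
# Google Sheets from coercing values like 'true or '555 into native types.
def unprotect(cell: str) -> str:
    return cell[1:] if cell.startswith("'") else cell

print(unprotect("'true"))  # → true
print(unprotect("plain"))  # → plain
```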
But switch term MIXS:0000001's example to 555, and you get
```shell
sheets2linkml \
  --output $@ \
  --gsheet-id 1zsxvjvifDcmkt72v9m1_VKa2m73_THDJapJYK6dqidw core_example_555_num
```
Traceback (most recent call last):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 105, in merge_sheet
self.add_row(row, schemasheet.table_config)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 111, in add_row
for element in self.row_focal_element(row, table_config):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 233, in row_focal_element
raise ValueError(f'No table_config.column_by_element_type')
ValueError: No table_config.column_by_element_type
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/bin/sheets2linkml", line 8, in <module>
sys.exit(convert())
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 578, in convert
schema = sm.create_schema(list(tsv_files))
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 61, in create_schema
self.merge_sheet(f, **kwargs)
File "/Users/MAM/Library/Caches/pypoetry/virtualenvs/mixs-linkml-GchukLmP-py3.9/lib/python3.9/site-packages/schemasheets/schemamaker.py", line 108, in merge_sheet
raise SchemaSheetRowException(f'Error in line {line_num}, row={row}') from e
schemasheets.schemamaker.SchemaSheetRowException: Error in line 1, row={'Structured comment name > slot > >': 'samp_size', 'Item (rdfs:label) title ': 'amount or size of sample collected', 'Definition description ': 'The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected.', 'Expected value annotations inner_key: expected_value': 'measurement value', 'Value syntax structured_pattern ': '{float} {unit}', 'Example examples internal_separator: "|"': '555', 'Section slot_group ': 'nucleic acid sequence source', 'migs_eu annotations applies_to_class: migs_eu inner_key: cardinality': 'X', 'migs_ba annotations applies_to_class: migs_ba inner_key: cardinality': 'X', 'migs_pl annotations applies_to_class: migs_pl inner_key: cardinality': 'X', 'migs_vi annotations applies_to_class: migs_vi inner_key: cardinality': 'X', 'migs_org annotations applies_to_class: migs_org inner_key: cardinality': 'X', 'mims annotations applies_to_class: mims inner_key: cardinality': 'C', 'mimarks_s annotations applies_to_class: mimarks_s inner_key: cardinality': 'C', 'mimarks_c annotations applies_to_class: mimarks_c inner_key: cardinality': 'X', 'misag annotations applies_to_class: misag inner_key: cardinality': 'C', 'mimag annotations applies_to_class: mimag inner_key: cardinality': 'C', 'miuvig annotations applies_to_class: miuvig inner_key: cardinality': 'C', 'Preferred unit annotations inner_key: preferred_unit': 'millliter, gram, milligram, liter', 'Occurrence multivalued vmap: {s: false, m: true}': 's', 'MIXS ID slot_uri ': 'MIXS:0000001', 'MIGS ID (mapping to GOLD) annotations inner_key: gold_migs_id': ''}
make: *** [generated/MIxS6_from_gsheet_templates_bad.yaml] Error 1
same issue as #70
They should be |-delimited
Preparing test inputs and outputs now
Given

```tsv
Type
> type
```
"type" is a reserved word for schemasheets, as a shorthand for stating this is the name of the TypeDefinition
in linkml 1.3, "type" is introduced as a metaslot, this could cause ambiguity
proposal: in rare cases where disambiguation is required, use metaslot.type
See #74
https://linkml.io/schemasheets/intro/converting/ link Edit on GitHub uses branch name "master" but the correct branch is "main".
Please fix the mkdoc configuration.
schemasheets is powerful and flexible with its template mechanism
It would be useful to have some standard templates:
These could be standard TSVs that are distributed along with the PyPI package, with convenient commands for seeding files with these.
When going from an existing linkml schema it might be useful to also autogenerate a template that includes all used metaslots
See also
The schemasheets functionality that exports a provided linkml schema to a schemasheets-specific tsv file, based on a specification tsv file, is incomplete: it does not write the second row containing linkml column descriptors to the output tsv file as expected.
export_spec.tsv:

```tsv
Class	Field	Description	Key	Range
>class	slot	description	identifier	range
```
Run the export command as follows:
```shell
linkml2sheets ~/path/to/export_spec.tsv -s tests/input/personinfo.yaml -o personinfo.tsv
```
See the output file missing the second row with linkml column descriptors.
When running schemasheets/get_metaclass_slotvals.py or schemasheets/verbatim_sheets.py:

```
/Users/MAM/Library/Caches/pypoetry/virtualenvs/schemasheets-FMUhH2LU-py3.9/lib/python3.9/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn(
```
Improve #23 (modelling of example values) with a test (as opposed to the current illustration in the Makefile)
given a schema
```yaml
classes:
  Person:
    attributes:
      first:
      last:
      full:
        structured_pattern:
          syntax: "{token} {token}"
```
we'd like a header column of:

```tsv
structured_pattern
> inner_key: syntax
```
such that flat values like "{token} {token}" can be used in the datafile
This should work but an exception is currently thrown
discovered by @turbomam
see also linkml/linkml#1006
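The intended expansion can be sketched as follows, with a hypothetical helper standing in for the real schemasheets internals:

```python
# Sketch of the desired inner_key expansion: a flat cell value is nested
# under the column's metaslot (hypothetical helper, not the actual
# schemasheets implementation).
def apply_inner_key(element: dict, metaslot: str, inner_key: str, cell: str) -> dict:
    element.setdefault(metaslot, {})[inner_key] = cell
    return element

attr = apply_inner_key({}, "structured_pattern", "syntax", "{token} {token}")
print(attr)  # → {'structured_pattern': {'syntax': '{token} {token}'}}
```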
When converting from sheets to linkml, a column that maps to examples will be hard-wired to generate Example(value=v) for each v in the value cell. When this is reversed back from linkml to sheets, this causes an error.
The behavior should be symmetric.
More broadly, there should be a well-documented solution for mapping complex values to flat spreadsheet cells, rather than relying on hardwiring
```python
if 'mappings' in metaslot.name and ' ' in v:
```

etc.
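A symmetric mapping could be sketched like this, assuming an Example(value=...) shape similar to linkml-runtime's (the helper functions are illustrative, not the current implementation):

```python
from dataclasses import dataclass

# Sketch of a symmetric flat-cell mapping for examples: sheets→linkml and
# linkml→sheets round-trip through the same separator.
@dataclass
class Example:
    value: str

def cell_to_examples(cell: str, sep: str = "|") -> list:
    """Split a flat cell into Example objects, skipping empty segments."""
    return [Example(value=v) for v in cell.split(sep) if v]

def examples_to_cell(examples: list, sep: str = "|") -> str:
    """Flatten Example objects back into a single delimited cell."""
    return sep.join(e.value for e in examples)

cell = "5 mg|10 mg"
assert examples_to_cell(cell_to_examples(cell)) == cell
```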
Using the personinfo.yaml schema as an example, the export functionality doesn't create a tsv file with the Person class containing its slots, as expected.
As opposed to schemasheets? See #10
```
% sheets2linkml --help
Traceback (most recent call last):
  File "/Users/MAM/my_first_ss/venv/bin/sheets2linkml", line 5, in <module>
    from fairstructure.schemamaker import convert
ModuleNotFoundError: No module named 'fairstructure'
```