Expected behavior
The XML/INI files should conform to a schema (or similar) that we can validate against when exporting and importing models. The schema should be simple to modify, and the export/import round trip should run smoothly.
Actual behavior
Something similar to a schema is used right now: a DTD. But the process for using it, and the export/import logic inside the code, are not smooth. https://github.com/kmcos/kmcos/blob/master/kmcos/kmc_project_v0.3.dtd
The Problem
I wanted to add a string to the XML in order to put that information into kmc_settings, so that the string could be accessed from the model during runtime. Doing so was a big pain. Accordingly, I ended up generalizing the string into a connected_variables dictionary, so that a person is less likely to need to go through this hassle again. However, the code and the overall way of doing things should still be improved in the future.
The Current Logic
The current logic is something like this:
- A person's build file makes a Project class object (typically "kmc_model")
- A person's build file makes an xml file (or ini file).
- That xml is then read back in and validated against a DTD. A new Project class object is made from what is read back in.
- Then the source code compilation occurs, along with making kmc_settings. What is in kmc_settings roughly mirrors the original Project class object, but it actually comes from the new Project class object that was created from the xml. Unfortunately, creating the xml, reading the xml, and the DTD all contain hardcoded, format-specific parsing logic, because the way the data is structured in these different formats is currently not compatible. It is not a mechanical conversion like JSON to YAML or XML to JSON. Due to this hardcoding, it is non-trivial to add new variables.
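The round trip above can be illustrated with a toy sketch. This is not kmcos code: ToyProject, to_xml, and from_xml are hypothetical stand-ins, and DTD validation is left out. It only shows why an attribute added to the class never reaches kmc_settings until the writer and reader are both updated by hand.

```python
import xml.etree.ElementTree as ET


class ToyProject:
    """Hypothetical stand-in for the Project class; not kmcos code."""

    def __init__(self):
        self.model_name = "my_model"
        self.connected_variables = {}  # the newly added attribute


def to_xml(project):
    # The writer lists each field by hand, so a new attribute is
    # silently skipped until someone adds a line for it here.
    root = ET.Element("kmc")
    ET.SubElement(root, "meta", model_name=project.model_name)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    # The reader is hardcoded too; it only knows about <meta>.
    root = ET.fromstring(text)
    project = ToyProject()
    project.model_name = root.find("meta").get("model_name")
    return project


original = ToyProject()
original.model_name = "CO_oxidation"
original.connected_variables["note"] = "hello"

# kmc_settings would be generated from `reloaded`, not from `original`.
reloaded = from_xml(to_xml(original))
print(reloaded.model_name)           # CO_oxidation
print(reloaded.connected_variables)  # {} (the new data was lost)
```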
The current steps for adding a variable
I found the correct order by trial and error. It should not be like this.
- I first added a new variable into the Project class init function, inside types.py. self.connected_variables = {}
- I needed to make a new dtd file: https://github.com/kmcos/kmcos/blob/master/kmcos/kmc_project_v0.4.dtd
- Inside types.py, I needed to change several hardcoded places:
xml_api_version = (0,4)
kmcproject_v0_04_dtd = 'kmc_project_v0.4.dtd' # This is not good variable naming and should change. It also should not be hardcoded like this, it should be parsing the string so that arbitrary numbers of versions can be added without updating the hardcoding each time.
# There were other lines lower down inside types.py, in def import_xml_file, that I needed to update. For example, supported_versions = .... These hardcodings need to be improved. I improved one of the hardcoded lines, but additional hardcoded lines still need to be removed there.
# It was especially confusing because all of these ways of writing the version number are used: 0.4, 0_4, (0,4). And the variable names do not make it obvious that these hardcoded places need to be updated!
- I needed to add my new variable to the xml-writing code in types.py so that connected_variables would be written to the xml, and I needed to give a "heading" to the subitem contained in it, which I ended up calling connected_variables_string.
- I needed to add both connected_variables and the connected_variables_string into the dtd file as things to expect. The DTD is not user friendly. JSON Schema would be better.
- In the write_settings of io.py I needed to add
out.write("connected_variables = " + )
- I needed to add 'kmc_project_v0.4.dtd' into package_data of setup.py
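One way to avoid juggling 0.4, 0_4, and (0,4) by hand would be to keep a single version tuple and derive every other spelling from it. The sketch below is only an illustration: XML_API_VERSION and the helper functions are hypothetical names, not what types.py currently contains. The DTD file name pattern follows the existing kmc_project_v0.4.dtd.

```python
XML_API_VERSION = (0, 4)  # the single place a version bump would touch


def dotted(version):
    """(0, 4) -> '0.4'"""
    return ".".join(str(part) for part in version)


def underscored(version):
    """(0, 4) -> '0_4'"""
    return "_".join(str(part) for part in version)


def dtd_filename(version):
    """(0, 4) -> 'kmc_project_v0.4.dtd'"""
    return "kmc_project_v" + dotted(version) + ".dtd"


def parse_version(text):
    """'0.4' or '0_4' -> (0, 4)"""
    return tuple(int(part) for part in text.replace("_", ".").split("."))


print(dotted(XML_API_VERSION))        # 0.4
print(underscored(XML_API_VERSION))   # 0_4
print(dtd_filename(XML_API_VERSION))  # kmc_project_v0.4.dtd
print(parse_version("0_4"))           # (0, 4)
```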
Suggested Solution
We should clean up the logic inside types.py so that ...
(a) the newest dtd is detected automatically rather than being hardcoded (just read filenames in the directory with some specific namestring in front, like kmcos_dtd_0.4)
(b) the internal variable names and supported versions should similarly not have hardcoding.
(c) We should switch to JSON and JSON Schema. Today the ecosystem for that is better. While JSON Schema was not really available at the time the core code was written, XML Schema was. Using an XML schema probably would have been a better choice than the DTD, but at this point we should probably just migrate to JSON Schema.
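For (a) and (b), a minimal sketch of detecting the available DTD versions from file names might look like the following. It assumes the files keep the existing kmc_project_v0.4.dtd naming rather than a new prefix. The functions find_dtd_versions and newest_dtd are hypothetical, and kmc_project_v0.10.dtd is a made-up file used only to show that the comparison is numeric.

```python
import re
import tempfile
from pathlib import Path

# Matches names such as kmc_project_v0.4.dtd and captures the two numbers.
DTD_PATTERN = re.compile(r"^kmc_project_v(\d+)\.(\d+)\.dtd$")


def find_dtd_versions(directory):
    """Return {(major, minor): Path} for every DTD file found in directory."""
    found = {}
    for path in Path(directory).iterdir():
        match = DTD_PATTERN.match(path.name)
        if match:
            found[(int(match.group(1)), int(match.group(2)))] = path
    return found


def newest_dtd(directory):
    """Return (version, path) of the highest-numbered DTD; raise if none."""
    versions = find_dtd_versions(directory)
    if not versions:
        raise FileNotFoundError(f"no kmc_project_v*.dtd files in {directory}")
    newest = max(versions)  # tuples compare numerically, so (0, 10) > (0, 4)
    return newest, versions[newest]


# Demonstration on a throwaway directory with empty placeholder files.
names = ("kmc_project_v0.3.dtd", "kmc_project_v0.4.dtd",
         "kmc_project_v0.10.dtd", "README.txt")
with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        (Path(tmp) / name).touch()
    version, path = newest_dtd(tmp)
    supported = sorted(find_dtd_versions(tmp))

print(version)    # (0, 10)
print(path.name)  # kmc_project_v0.10.dtd
print(supported)  # [(0, 3), (0, 4), (0, 10)]
```

With something like this, supported_versions could be computed from the directory listing, and adding a new DTD file would not require touching types.py.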
These changes are low priority, since right now the focus is on changes that would increase kmcos adoption (not on kmcos development for specialized applications). Also, the new connected_variables will allow a variety of specialized applications, as long as they are intended to be driven from python and not from the backend.
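For reference, here is a rough sketch of what suggestion (c) could look like. Everything in it is an assumption: the schema and its field names are invented for illustration, and export_project and import_project are hypothetical helpers. Validation uses the third-party jsonschema package if it is installed and is skipped otherwise.

```python
import json

try:
    import jsonschema  # third-party; optional for this sketch
except ImportError:
    jsonschema = None

# Hypothetical schema; the field names are illustrative, not an agreed format.
PROJECT_SCHEMA = {
    "type": "object",
    "properties": {
        "version": {"type": "string"},
        "model_name": {"type": "string"},
        "connected_variables": {"type": "object"},
    },
    "required": ["version", "model_name"],
}


def export_project(project_dict):
    """Serialize a plain dict; no per-field writer code is needed."""
    return json.dumps(project_dict, indent=2, sort_keys=True)


def import_project(text):
    """Parse the text and validate it if jsonschema is installed."""
    data = json.loads(text)
    if jsonschema is not None:
        jsonschema.validate(instance=data, schema=PROJECT_SCHEMA)
    return data


project = {
    "version": "0.4",
    "model_name": "my_model",
    "connected_variables": {"note": "hello"},
}
reloaded = import_project(export_project(project))
print(reloaded == project)  # True
```

In this layout, adding a variable like connected_variables would mean adding one entry under "properties" instead of editing a writer, a reader, and a DTD separately.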