As I worked through the massive changes to the logic introduced by the last merge, I noticed that the new declaration logic doesn't work well with the previous logic for resolving module references (e.g. foreign key references) in the table definition string.
When a table definition refers to a table in another schema, as in the following case:

```python
definition = """
A.Subject (manual)  # subjects defined in schema A
-> B.Setups
"""
```
then this reference to module `B` has to be resolved appropriately. Unless `B` is a module known by that very name to the Python interpreter (that is, you actually have `B.py` at the top level, not inside any package), `B` is not descriptive enough to be resolved to the module that holds the definition of the target table `B.Setups`. To get around this problem, I previously introduced a three-step resolution strategy that utilizes information about the module holding the definition (in this case, the module `A`). The steps are as follows:
- Within the module `A`, look to see if there is a module imported under the name `B`. This means that if inside the module `A` I have a statement like `import v1_project.schemata.B as B` (so that there is a local variable `B` that is a module object), then the module name `B` in the table definition string resolves to the module `v1_project.schemata.B`, and the table `Setups` is looked up there.
- If no such import is found, and if `A` actually resides in a package (so that `A` is really `package.A`), then look for a module named `B` in the same immediate parent package. Thus it checks for `package.B` and, if found, uses that.
- If neither of the above two approaches works, then assume that `B` is a globally accessible module name and attempt to import it. From an organization and distribution point of view, I think it is simply unreasonable to assume that all modules with table definitions are top-level modules.
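The three steps above could be sketched roughly as follows (a hypothetical helper for illustration only, not the actual implementation):

```python
import importlib
import sys
import types


def resolve_module(name, context_module):
    """Resolve a module reference such as 'B' in a table definition,
    using the module that holds the definition as context.
    Hypothetical sketch only -- not the actual implementation."""
    # Step 1: a module imported under that name inside the context module
    candidate = getattr(context_module, name, None)
    if isinstance(candidate, types.ModuleType):
        return candidate
    # Step 2: a sibling module in the same immediate parent package
    package = context_module.__name__.rpartition('.')[0]
    if package:
        try:
            return importlib.import_module(package + '.' + name)
        except ImportError:
            pass
    # Step 3: fall back to treating the name as a top-level module
    return importlib.import_module(name)
```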
The point here is that the resolution of any reference to a module other than the one containing the table definition (i.e. `-> B.Setups`) requires the information provided by the module holding the definition (and the package it belongs to).
An alternative is of course to make the module reference in the definition explicit, so that in the above case, rather than `-> B.Setups`, it would read `-> v1_project.schemata.B.Setups` instead. The problem with this is that it is not only wordy and tedious to write, but it also binds the location of the module to a strict package structure. The beauty of the previous implementation was that you could make relative references to another schema (module), with the ability to make the target explicit via an import, if preferred.
The move of making table declaration into a separate function really breaks the above logic, because it assumes that the table definition can exist independently of the module in which it is found. Unlike Matlab, where you can pull any class name up to the top level by adding it to the path, Python modules inside a package really require the package name to reach the module, and as far as I know there is no easy way to make a module in a package accessible at the top-level namespace throughout the Python interpreter; even if there were, I'd think it would be rather awkward.
Given these, I suggest that we place `declare` and related functionality back into the `Base` class. I could of course make the `declare` function take in the name of the module that holds the definition, but if the most common use of `declare` is by `Base` derivatives, which thus pretty much always need information about the module in which the derived class is defined, I really don't see a whole lot of benefit in separating `declare` into an independent function.
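As a rough sketch of why the class-based placement is convenient (hypothetical names, not the actual API): a `declare` classmethod on `Base` can always recover the defining module from the derived class itself, so no extra argument is needed.

```python
import sys


class Base:
    """Hypothetical sketch: with declare() as a classmethod on Base,
    the module holding the table definition is always recoverable
    from the derived class."""

    definition = None

    @classmethod
    def declare(cls):
        # cls.__module__ names the module in which the derived class
        # (and therefore its definition string) was defined.
        context = sys.modules[cls.__module__]
        # References like 'B' in cls.definition would then be
        # resolved against this context module.
        return cls.definition, context
```

A standalone `declare(definition)` function, by contrast, would have to be handed this module information explicitly on every call.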