Here is what I think the new DSL (in R) should look like! I use 👍 to indicate the versions of the syntax that I think we should implement/use among the ones I've brainstormed.
Feedback requested:
- What are your reactions to the ✨ New ✨ ideas? Are you convinced they are good ideas based on your gut + our qual lab study? What are your biggest concerns about introducing/incorporating them?
- For the ✨ New ✨ language constructs that you like/are on board with, which of the syntaxes do you prefer? If you don't like any of them, why not? Note: I expect the query ideas will become clearer when we discuss the interaction model.
Variable declarations
Not many changes here, just a couple:
- new Participant class
- new condition function (still a measure under the hood)
# Specify participant unit (new class)
member <- Participant("member", cardinality=386)
motivation <- ordinal(unit=member, name="motivation", order=c(1, 2, 3, 4, 5, 6))
age <- numeric(unit=member, name="age", number_of_instances=1)
pounds_lost <- numeric(unit=member, name="pounds_lost")
# Specify group unit
group <- Unit("group", cardinality=40) # 40 groups
# Specify condition (new function)
condition <- condition(unit=group, "treatment", cardinality=2, number_of_instances=1)
Data measurement
No changes. Keep above variable declarations + nests within:
nests_within(member, group)
Conceptual modeling
cm <- ConceptualModel() # explicit handle, to fit R's functional programming model
✨New: Distinguish known vs. suspected relationships
Four different ways to realize this idea in the syntax are listed below.
👍 Syntax 1: Unique functions (seems the easiest to remember and write)
suspect_causes(motivation, pounds_lost, cm)
suspect_causes(condition, pounds_lost, cm)
know_causes(age, pounds_lost, cm)
suspect_causes(age, motivation, cm)
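One thing to settle for Syntax 1 is how a call like suspect_causes(motivation, pounds_lost, cm) updates cm without reassigning it. Below is a minimal sketch of one option, assuming cm is an R environment (which has reference semantics) and that variables are identified by name strings. The edge table, the belief column, and the add_cause helper are all hypothetical, not a settled design.

```r
# Hypothetical sketch, not the DSL's actual implementation.
ConceptualModel <- function() {
  cm <- new.env()
  # One row per causal edge; `belief` records "know" or "suspect".
  cm$edges <- data.frame(from = character(), to = character(),
                         belief = character(), stringsAsFactors = FALSE)
  cm
}

# Shared helper (hypothetical): append one edge to the model in place.
add_cause <- function(from, to, belief, cm) {
  cm$edges <- rbind(cm$edges,
                    data.frame(from = from, to = to, belief = belief,
                               stringsAsFactors = FALSE))
  invisible(cm)
}

suspect_causes <- function(from, to, cm) add_cause(from, to, "suspect", cm)
know_causes    <- function(from, to, cm) add_cause(from, to, "know", cm)

cm <- ConceptualModel()
suspect_causes("motivation", "pounds_lost", cm)
know_causes("age", "pounds_lost", cm)
print(cm$edges)
```

If we would rather stay purely functional, the same functions could return a modified copy instead, at the cost of writing cm <- suspect_causes(..., cm) on every line.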
Syntax 2: Nested know/suspect function calls
suspect(causes(motivation, pounds_lost), cm) # Modify cm object
suspect(causes(condition, pounds_lost), cm)
know(causes(age, pounds_lost), cm)
suspect(causes(age, motivation), cm)
Syntax 3: Add parameter
causes(motivation, pounds_lost, "suspect", cm)
causes(condition, pounds_lost, "suspect", cm)
causes(age, pounds_lost, "know", cm)
Syntax 4: Assume vs. Assert (check and error out) vs. Assess
Pro: Familiar concepts to programmers (e.g., assert statements)
Note: Maybe this fits better later on in the interaction, depending on the query?
assess(causes(motivation, pounds_lost), cm) # suspect
assess(causes(condition, pounds_lost), cm) # suspect
assume(causes(age, pounds_lost), cm) # know
assert(causes(age, pounds_lost), cm) # know
✨New: Replace associates_with, use only causes + unobserved variables
Declare unobserved variable:
# Mediation
# age -> midlife_crisis -> pounds_lost
# age -> midlife_crisis -> motivation
midlife_crisis <- ts.Unobserved()
suspect_causes(age, midlife_crisis, cm)
suspect_causes(midlife_crisis, pounds_lost, cm)
suspect_causes(midlife_crisis, motivation, cm)
# Could have used know_causes as well - Does not change behavior later on.
# Common ancestor
# latent_var -> age, latent_var -> motivation
latent_var <- ts.Unobserved()
causes(latent_var, age, cm)
causes(latent_var, motivation, cm)
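To check that causes + unobserved variables can cover what associates_with used to express, here is a rough sketch that recovers the implied associations from a common unobserved ancestor. It assumes edges are stored as a from/to table of variable names and that unobserved variables are tracked in a separate character vector; both representations and the implied_associations function are hypothetical.

```r
# Hypothetical representation: causal edges as a from/to table of names.
edges <- data.frame(
  from = c("latent_var", "latent_var"),
  to   = c("age", "motivation"),
  stringsAsFactors = FALSE
)
unobserved <- c("latent_var")

# For each unobserved variable, pair up its observed children.
# Each pair is associated without either variable causing the other.
implied_associations <- function(edges, unobserved) {
  pairs <- list()
  for (u in unobserved) {
    children <- setdiff(edges$to[edges$from == u], unobserved)
    if (length(children) >= 2) {
      combos <- combn(sort(children), 2)
      for (j in seq_len(ncol(combos))) {
        pairs[[length(pairs) + 1]] <- combos[, j]
      }
    }
  }
  pairs
}

implied_associations(edges, unobserved)
```

For the common-ancestor example this yields the single pair age/motivation, which is what associates_with(age, motivation) would have stated directly.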
✨New: Express relationships with more specificity
- In these scenarios, "causes()" would be syntactic sugar (or vice versa, depending on how you view it)
- The benefits of these functions are (i) a closer mapping to the level of detail at which analysts think about causal relationships and (ii) more thorough documentation to facilitate analysts' reflection
👍 Syntax 1: Add when/then constructs
- "if" is a keyword in R, so we have to use when. Could also use "_if" or something like that, but I think when reads nicer and is easier to remember.
- Have to infer/follow up when not all categories in a categorical variable are stated - e.g., condition below
# Read as: I suspect that when motivation increases, then pounds_lost also increases in my conceptual model.
suspect(when(motivation, "increases").then(pounds_lost, "increases"), cm)
# Read as: I suspect that when condition is treatment, then pounds_lost increases in my conceptual model.
suspect(when(condition, "==`treatment`").then(pounds_lost, "increases"), cm) # implied: when(condition, "!=`treatment`").then(pounds_lost, "decreases")?
# Read as: I suspect that when condition is not treatment, then pounds_lost decreases in my conceptual model.
suspect(when(condition, "!=`treatment`").then(pounds_lost, "decreases"), cm) # implied: when(condition, "==`treatment`").then(pounds_lost, "increases")?
# Read as: I know that when age increases, then pounds_lost increases in my conceptual model.
know(when(age, "increases").then(pounds_lost, "increases"), cm)
# Read as: I suspect that when age increases, then motivation increases in my conceptual model.
suspect(when(age, "increases").then(motivation, "increases"), cm)
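One caveat on the when(...).then(...) spelling: a dot after a closing parenthesis does not parse in R, so the chained form would need a different surface syntax. Below is a minimal sketch of one way to keep the same reading order with plain functions. It assumes variables are referred to by name, and the list structure for a relationship is hypothetical.

```r
# Hypothetical sketch: when() starts a relationship, then() attaches the outcome.
when <- function(variable, comparison) {
  list(conditions = list(list(variable = variable, comparison = comparison)))
}

then <- function(clause, variable, comparison) {
  clause$outcome <- list(variable = variable, comparison = comparison)
  clause
}

# Record the relationship in the conceptual model as a suspected one.
suspect <- function(relationship, cm) {
  relationship$belief <- "suspect"
  cm$relationships[[length(cm$relationships) + 1]] <- relationship
  invisible(cm)
}

cm <- new.env()
cm$relationships <- list()

# Read as: I suspect that when motivation increases, then pounds_lost increases.
suspect(then(when("motivation", "increases"), "pounds_lost", "increases"), cm)
```

With R 4.1 or later, the native pipe gets close to the chained look: when("motivation", "increases") |> then("pounds_lost", "increases").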
Syntax 2: Add when construct (no explicit then construct)
Note: what comparisons (i.e., "increases") are valid depends on data type
# Read as: I suspect that when motivation increases, pounds_lost also increases in my conceptual model.
suspect(when(motivation, "increases", pounds_lost, "increases"), cm)
# Read as: I suspect that when condition is treatment, pounds_lost increases in my conceptual model.
suspect(when(condition, "==`treatment`", pounds_lost, "increases"), cm) # implied: when(condition, "==control", pounds_lost, "decreases")
# Read as: I suspect that when condition is not treatment, pounds_lost decreases in my conceptual model.
suspect(when(condition, "!=`treatment`", pounds_lost, "decreases"), cm) # implied: when(condition, "==`treatment`", pounds_lost, "increases")?
# Read as: I know when age increases, pounds_lost increases in my conceptual model.
know(when(age, "increases", pounds_lost, "increases"), cm)
# Read as: I suspect that when age increases, motivation increases in my conceptual model.
suspect(when(age, "increases", motivation, "increases"), cm)
✨New: Specify potential interactions
Trying to express the following interaction: motivation * age on pounds_lost
👍 Syntax 1: Multiple when clauses: Each tuple could be its own when clause (conjunction)
Note: what comparisons (i.e., "increases") are valid depends on data type
suspect(when((motivation, "==low"), (age, "increases")).then(pounds_lost, "increases"), cm)
# allow when to have many tuples of relationships
suspect(when((motivation, "==low"), (age, "increases")).then(pounds_lost, "baseline"), cm)
suspect(when((motivation, "==high"), (age, "increases")).then(pounds_lost, "increases"), cm)
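R has no tuple literal, so the (variable, comparison) pairs above need some constructor. Here is a rough sketch, assuming a hypothetical cond() helper stands in for each tuple and that a clause with more than one condition is treated as an interaction. All names are placeholders.

```r
# Hypothetical stand-in for the (variable, comparison) tuple.
cond <- function(variable, comparison) {
  list(variable = variable, comparison = comparison)
}

# when() accepts any number of conditions, read as a conjunction.
when <- function(...) list(conditions = list(...))

then <- function(clause, variable, comparison) {
  clause$outcome <- cond(variable, comparison)
  clause
}

rel <- then(
  when(cond("motivation", "==high"), cond("age", "increases")),
  "pounds_lost", "increases"
)

# More than one condition in a clause signals an interaction.
is_interaction <- length(rel$conditions) > 1
interacting <- vapply(rel$conditions, function(x) x$variable, character(1))
```

Here interacting holds the names motivation and age, which is the information we would need to add a motivation * age term on pounds_lost.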
Syntax 2: No tuple distinction, similar to some existing R/Python libraries
when(age, "increases", motivation, "decreases").then(pounds_lost, "decreases")
Queries to issue
How to use: In the "iv"/"ivs" parameter, include the variables the end-user really cares about; the system adds adjustment sets of other variables that should be included to account for confounding.
✨New: Assess a conceptual model holistically
Returns set of statistical models + conditional independencies we would expect to observe if the data supports the conceptual model.
Based on the conditional independencies that we test, we can see if we have evidence for the conceptual model. If we have evidence for all the conditional independencies (which suggests the conceptual model is not wrong), yay. If only some of the conditional independencies are supported, this suggests the conceptual model needs to change.
ts.assess(conceptual_model=cm)
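To make "conditional independencies we would expect to observe" concrete, here is a minimal sketch that lists them for the running example using the local Markov property: each variable is independent of its non-descendants given its parents. This is a simpler stand-in for a full d-separation enumeration, and the edge table and function names are assumptions. An existing package such as dagitty (impliedConditionalIndependencies) may be a better basis for the real thing.

```r
# Running example as a from/to edge table (hypothetical representation).
edges <- data.frame(
  from = c("motivation", "condition", "age", "age"),
  to   = c("pounds_lost", "pounds_lost", "pounds_lost", "motivation"),
  stringsAsFactors = FALSE
)

parents_of <- function(node, edges) edges$from[edges$to == node]

# All nodes reachable by following edges forward from `node`.
descendants_of <- function(node, edges) {
  found <- character()
  frontier <- node
  while (length(frontier) > 0) {
    children <- edges$to[edges$from %in% frontier]
    frontier <- setdiff(children, found)
    found <- union(found, frontier)
  }
  found
}

# Local Markov property: x _||_ (non-descendants minus parents) | parents.
local_markov <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  statements <- character()
  for (x in nodes) {
    pa <- parents_of(x, edges)
    others <- setdiff(nodes, c(x, pa, descendants_of(x, edges)))
    given <- if (length(pa) > 0) paste(" |", paste(pa, collapse = ", ")) else ""
    for (y in others) {
      statements <- c(statements, paste0(x, " _||_ ", y, given))
    }
  }
  statements
}

print(local_markov(edges))
```

Each statement is something we could test against the data, e.g. that motivation and condition are independent once we condition on age. Symmetric duplicates are not removed in this sketch.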
Assess direct effect of an IV on a DV
Returns a statistical model that includes the IV and the minimal adjustment set needed to measure the influence of the IV on the DV
#Example: How does condition affect pounds_lost?
ts.query(iv=condition, dv=pounds_lost, assuming=cm)
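As a rough illustration of what a query could return, the sketch below builds a model formula from the IV plus an adjustment set. To keep it short it adjusts for the parents of the IV, which blocks back-door paths for the total effect when all parents are observed but is not guaranteed to be minimal, so it is a simplification of what the query should eventually do. The function names and the edge table are assumptions.

```r
edges <- data.frame(
  from = c("motivation", "condition", "age", "age"),
  to   = c("pounds_lost", "pounds_lost", "pounds_lost", "motivation"),
  stringsAsFactors = FALSE
)

# Simplified adjustment set: the direct causes (parents) of the IV.
# Assumption: only usable when every parent is observed.
parent_adjustment <- function(iv, edges, unobserved = character()) {
  pa <- edges$from[edges$to == iv]
  if (any(pa %in% unobserved)) {
    stop("Some parents of '", iv, "' are unobserved; parent adjustment does not apply.")
  }
  pa
}

# Build the formula dv ~ iv + adjustment set.
query_formula <- function(iv, dv, edges, unobserved = character()) {
  reformulate(c(iv, parent_adjustment(iv, edges, unobserved)), response = dv)
}

query_formula("condition", "pounds_lost", edges)   # pounds_lost ~ condition
query_formula("motivation", "pounds_lost", edges)  # pounds_lost ~ motivation + age
```

In the running example condition has no causes, so nothing is added; motivation is caused by age, so age is added as a covariate.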
Assess direct effects of multiple IVs on a DV
Treat the IVs as the "exposures" and find the minimal adjustment set; its variables are added as the other covariates.
If the IVs should not all be in one model because doing so may bias the causal effect estimates (e.g., due to a mediating causal structure), show a warning.
#Example: How do these IVs influence pounds_lost?
ts.query(ivs=list(condition, motivation), dv=pounds_lost, assuming=cm)
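The mediation warning could be driven by a check like the one sketched below: flag any pair of requested IVs where one is a descendant of the other in the conceptual model. This assumes the from/to edge table used for illustration here, and the function names are hypothetical.

```r
edges <- data.frame(
  from = c("motivation", "condition", "age", "age"),
  to   = c("pounds_lost", "pounds_lost", "pounds_lost", "motivation"),
  stringsAsFactors = FALSE
)

# All nodes reachable by following edges forward from `node`.
descendants_of <- function(node, edges) {
  found <- character()
  frontier <- node
  while (length(frontier) > 0) {
    children <- edges$to[edges$from %in% frontier]
    frontier <- setdiff(children, found)
    found <- union(found, frontier)
  }
  found
}

# Report each requested IV that lies downstream of another requested IV.
mediation_conflicts <- function(ivs, edges) {
  conflicts <- character()
  for (a in ivs) {
    downstream <- intersect(setdiff(ivs, a), descendants_of(a, edges))
    for (b in downstream) {
      conflicts <- c(conflicts, paste0(a, " -> ", b))
    }
  }
  conflicts
}

mediation_conflicts(c("condition", "motivation"), edges)  # character(0)
mediation_conflicts(c("age", "motivation"), edges)        # "age -> motivation"
```

The example query with condition and motivation passes cleanly, while asking for age and motivation together would trigger the warning because motivation sits on a path from age to pounds_lost.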
✨New: What are all the variables that influence a DV?
- This is mostly useful for suggesting queries/statistical models the end-user could issue to make the interaction model more interactive.
- Enumerates all sets of variables that influence the DV while still avoiding confounding among the possible IVs (due to mediation, for example)
- Likely computationally intensive
ts.query(dv=pounds_lost, assuming=cm)
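A brute-force sketch of this enumeration, under the assumption that "avoiding confounding among the possible IVs" means no IV in a set is an ancestor of another IV in the same set. It walks every subset of the DV's ancestors, which is exponential in the number of ancestors and is one reason this query is likely to be expensive. Function names and the edge table are hypothetical.

```r
edges <- data.frame(
  from = c("motivation", "condition", "age", "age"),
  to   = c("pounds_lost", "pounds_lost", "pounds_lost", "motivation"),
  stringsAsFactors = FALSE
)

# All nodes from which `node` can be reached by following edges forward.
ancestors_of <- function(node, edges) {
  found <- character()
  frontier <- node
  while (length(frontier) > 0) {
    parents <- edges$from[edges$to %in% frontier]
    frontier <- setdiff(parents, found)
    found <- union(found, frontier)
  }
  found
}

# Keep every subset of the DV's ancestors in which no member causes another.
candidate_iv_sets <- function(dv, edges) {
  anc <- ancestors_of(dv, edges)
  sets <- list()
  for (k in seq_along(anc)) {
    for (s in combn(anc, k, simplify = FALSE)) {
      mediated <- any(vapply(
        s,
        function(v) any(setdiff(s, v) %in% ancestors_of(v, edges)),
        logical(1)
      ))
      if (!mediated) sets[[length(sets) + 1]] <- s
    }
  }
  sets
}

candidate_iv_sets("pounds_lost", edges)
```

For the running example this keeps five sets and drops the ones that contain both age and motivation.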