Parser for IG 2.0 Statements encoded in IG Script Notation

License: GNU General Public License v3.0

IG Parser

Parser for IG 2.0 Statements based on the IG Script Notation.

Contact: Christopher Frantz ([email protected])

Institutional Grammar 2.0 Website: https://newinstitutionalgrammar.org

Deployed IG Parser:

Tabular output: https://ig-parser.newinstitutionalgrammar.org
Visual output: https://ig-parser.newinstitutionalgrammar.org/visual

Note: Either version allows interactive switching to the respective other while preserving encoded statement information.

See Revision history for a detailed overview of changes.

See Contributors for an overview of contributions to the project. We explicitly encourage external contributions. Please feel free to get in touch if you plan to contribute to the repository. Please create an issue to report bugs or to propose features (or, alternatively, do so by email).

Overview

IG Parser is a parser for IG Script, a formal notation for the representation of institutional statements (e.g., policy statements) used in the Institutional Grammar 2.0. The parser can be used locally, as well as via a web interface that produces tabular output of parsed statements (in Google Sheets or CSV format) or a visual tree representation. In the following, you will find a brief introduction to the user interface, followed by a comprehensive introduction to the syntactic principles and essential features of IG Script. This includes a set of examples showcasing all features and various levels of complexity, while highlighting typical mistakes in the encoding. Finally, the deployment instructions for IG Parser are provided.

The conceptual background of the Institutional Grammar 2.0 is provided in the corresponding article and book, augmented with supplementary operational coding guidelines.

User Interface Guide

The user interface consists of various entry fields, followed by parameters (specific to each version of the parser) and an output section that will contain the generated output. The following subsections highlight the key features of each element.

General Entry Fields

The initial entry field holds the 'Original Statement'. The purpose of this field is to keep track of the original statement during coding, but also to include it in the output if you choose to do so (see 'Parameters' section in the UI; discussed below). Prior to coding, it also allows you to vet the statement for obvious coding challenges (e.g., imbalanced parentheses, unsupported symbols), which you can do by clicking on 'Validate 'Original Statement' input'. This will allow you to copy the content to the 'Encoded Statement' field (a warning is shown if an encoded statement already exists, to prevent accidental overwriting of the existing coding).

The 'Encoded Statement' area provides the actual coding editor. It features both an 'Advanced Mode' that includes additional input options (see below) as well as color-coding of the encoded statement. Alternatively, you can use the standard mode that does not provide any advanced features beyond basic bracket matching, and operates by directly encoding in the IG Script syntax (described later). This version is more useful if using mobile devices or facing accessibility issues based on the color-coding.

Advanced Editor

The advanced user interface has four main input options available to annotate statements:

First, you can simply write out the symbols manually in the encoded statement field to annotate the text. This means manually typing each symbol and creating the brackets for each symbol.
Second, you can highlight the section or span you want to annotate with a symbol and then click on the button for that specific symbol below the editor. This encapsulates that section with the specified symbol.
Third, there is a "Selection Mode" toggle above the editor, which reverses this interaction flow. Instead of first highlighting and then selecting a symbol, you first select a symbol and then highlight text to annotate the selection with that symbol. The symbol stays selected until either a new symbol is selected or the selection mode is toggled off. The selection mode is only available on devices using a mouse as an input device.
Finally, there is the "edit mode". Clicking the "Text mode" button switches the editor to edit mode, which disables typing in the editor in exchange for allowing keybinds to be used for annotation. This mode works by first selecting the text you want to annotate, and then pressing the key that corresponds to the symbol you want to annotate the selection with. For example, the key for a direct object is "r". For symbols that support properties, the same key combined with the "shift" key adds a property instead, so the keybinding for a direct object property is "shift+r".

In addition to these features, there are toggles for nesting, which enables the correct brackets for nested symbols, and for semantic annotation, which adds the semantic annotation brackets to symbols on creation. There are also buttons for undo, redo, and for displaying a quick guide of the supported symbols and keybinds in the editor.

Another feature of the website is keyboard interaction: each element can be reached using the "tab" key, or "shift+tab" to navigate in reverse. Additionally, each button can be pressed using the "enter" key, and the parameter toggles can be switched using the "space" key. To select or highlight text in the editor, use "ctrl+shift" in combination with the arrow keys. This makes the website accessible using a keyboard as the only input device.

Toggling between variants (editors, parser versions)

You can interactively toggle between the basic and the advanced mode by clicking on 'Toggle advanced editor features'. You can further interactively switch between the tabular and the visual output mode of the parser. In all those cases no coding information is lost.

Output-specific parameters

Below the editor area you will find output-specific parameters, all of which have a simple help (by hovering over their label; some allow for opening of external pages for more extensive guidance).

Tabular output

For the tabular parser version, the parameters include
- the 'Statement ID' (which is the ID based on which individual statement IDs are generated),
- options to generate output based on the different levels of expressiveness (IG Core, IG Extended, IG Logico),
- the selective inclusion of a header row in the generated output,
- the option to include the original statement (from the field 'Original Statement') as well as the IG-Script-encoded statement (Field 'Encoded Statement') in the output (either only in the first content line, or for all generated output lines),
- the selection of the output format, which currently includes Google Sheets-parseable output (you can paste it directly into Google Sheets spreadsheets), or as CSV (which can be used in statistical programming tools, or in Excel, etc.)

By clicking on 'Generate tabular output', the input is parsed and output generated, which can be copied into the clipboard for transferral into a tool of your choice.

Visual output

For the visual parser version, the parameters include
- the inclusion of 'IG Logico' annotations in the output
- the inclusion of the Degree of Variability (a metric introduced as part of the IG 2.0 - see the conceptual guidance),
- the inclusion of component properties as tree nodes attached to their parent components (as opposed to just labels),
- the display of a fully decomposed binary tree structure (this is useful for "debugging" your interpretation of the tree structure)
- the choice to print activation conditions on top of the tree (to make statements more readable by reflecting the logical precedence of activation conditions over the rest of the statement),
- the specification of the canvas dimensions (this is useful to customize the size to your needs and graph scale -- a future feature will be to automate the scaling)

By clicking on 'Generate visual output', the input information is parsed and the statement tree structure displayed.

Usage considerations

To support efficient coding, specifically for complex statements, it is often useful to encode and evaluate those in visual mode before generating the tabular output for downstream processing. Use the interactive switching features for this purpose.
As indicated before, when hovering over a label in the UI, a short description is displayed highlighting its key function (some labels instead open a new tab where more support is provided, e.g., for the 'Encoded Statement').
Note that the UI stores the last entry in your browser. When reopening your browser, it should hence display the statement you have been working on (not the history). This feature is useful if internet connection is lost, or if you work with interruptions (so you can continue at a later stage), or if your browser crashes. No information is stored on the server side, but only in your specific browser (not across browsers). To delete this information, you can delete your browser cache.

In the following, you will find an overview of the actual IG Script syntax used for the encoding of institutional statements entered into the 'Encoded Statement' field.

IG Script

IG Script is a notation introduced in the context of the Institutional Grammar 2.0 (IG 2.0) that aims at a deep structural representation of legal statements alongside selected levels of expressiveness. While IG 2.0 highlights the conceptual background, the objective of IG Script is to provide an accessible, but formal approach to provide a format-independent representation of institutional statements of any type (e.g., regulative, constitutive, hybrid). While the parser currently supports exemplary export formats (e.g., tabular format and visual output), the tool is open to be extended to support other output formats (e.g., XML, JSON, YAML). The introduction below focuses on the operational coding. Syntactic and semantic foundations are provided elsewhere.

Principles of IG Script Syntax

IG Script centers around a set of fundamental primitives that can be combined to parse statements comprehensively, including:

Basic component coding
Component combinations
Nested statements
Nested statement combinations
Component pair combinations
Object-Property relationships
Semantic Annotations

Component Coding

Component Coding provides the basic building block for any statement encoding. A component is represented as

componentSymbol(naturalLanguageText)

componentSymbol is one of the supported symbols for the different component types (see below). An example for an annotated Attributes component is A(Farmer).

naturalLanguageText is the human-readable text annotated as the entity the annotation describes. The text is open-ended and can include special symbols, including parentheses (e.g., A(Farmer (e.g., organic farmer))). Exceptions to this rule are discussed in the context of combinations.

The scope of a component is specified by opening and closing parentheses.

All components of a statement are annotated correspondingly, without concern for order, or repetition. The parser further tolerates multiple component annotations of the same kind. Multiple Attributes, for example (e.g., A(Farmer) D(must) I(comply) A(Certifier)), are effectively interpreted as a combination (i.e., A(Farmer [AND] Certifier) D(must) I(comply)) in the parsing process.

Any symbols outside the encoded components and combinations of components are ignored by the parser.

The parser supports a fixed set of component type symbols that uniquely identify a given component type.

Supported Component Type Symbols:

A - Attributes
A,p - Attributes Property
D - Deontic
I - Aim
Bdir - Direct Object
Bdir,p - Direct Object Property
Bind - Indirect Object
Bind,p - Indirect Object Property
Cac - Activation Condition
Cex - Execution Constraint
E - Constituted Entity
E,p - Constituted Entity Property
M - Modal
F - Constitutive Function
P - Constituting Properties
P,p - Constituting Properties Property
O - Or else
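
To make the component syntax concrete, the following Go sketch extracts the components of a flat statement (no nesting, no combinations). It is an illustration of the notation only and is not taken from the IG Parser sources; the names componentNames, Component and extractComponents are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// componentNames maps the component type symbols listed above to their names.
var componentNames = map[string]string{
	"A": "Attributes", "A,p": "Attributes Property",
	"D": "Deontic", "I": "Aim",
	"Bdir": "Direct Object", "Bdir,p": "Direct Object Property",
	"Bind": "Indirect Object", "Bind,p": "Indirect Object Property",
	"Cac": "Activation Condition", "Cex": "Execution Constraint",
	"E": "Constituted Entity", "E,p": "Constituted Entity Property",
	"M": "Modal", "F": "Constitutive Function",
	"P": "Constituting Properties", "P,p": "Constituting Properties Property",
	"O": "Or else",
}

// symbolStart matches a candidate symbol that is directly followed by "(".
var symbolStart = regexp.MustCompile(`\b(?:Bdir|Bind|Cac|Cex|[ADIEMFPO])(?:,p)?\(`)

// Component is a hypothetical record type for one annotated component.
type Component struct {
	Symbol string
	Text   string
}

// extractComponents returns the components of a flat statement in order.
// Text outside components is skipped, as is anything inside a component's
// parentheses (so "A(Farmer (e.g., organic farmer))" yields one component).
func extractComponents(s string) []Component {
	var out []Component
	pos := 0
	for pos < len(s) {
		loc := symbolStart.FindStringIndex(s[pos:])
		if loc == nil {
			break
		}
		start, open := pos+loc[0], pos+loc[1] // open: index just after "("
		symbol := s[start : open-1]
		if _, known := componentNames[symbol]; !known {
			pos = open // e.g. "D,p" is not a supported symbol
			continue
		}
		// Find the parenthesis that closes this component.
		depth, end := 1, -1
		for i := open; i < len(s) && end < 0; i++ {
			switch s[i] {
			case '(':
				depth++
			case ')':
				depth--
				if depth == 0 {
					end = i
				}
			}
		}
		if end < 0 {
			break // unbalanced parentheses: stop rather than guess
		}
		out = append(out, Component{Symbol: symbol, Text: s[open:end]})
		pos = end + 1
	}
	return out
}

func main() {
	for _, c := range extractComponents("A(Farmer (e.g., organic farmer)) D(must) I(comply)") {
		fmt.Printf("%-10s %s\n", componentNames[c.Symbol], c.Text)
	}
}
```

The sketch deliberately ignores braces, suffixes and semantic annotations, which are introduced in the following sections.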

Component Combinations

Statements often express alternatives or combinations of activities that need to be administered, thus reflecting logical combinations of entities, such as actors, actions, conditions, etc. To combine individual components logically, IG Script supports the notion of Component Combinations. As indicated above, separate components of the same type are interpreted as AND-combined. To explicitly specify the nature of the logical relationship (e.g., conjunction, inclusive/exclusive disjunction), combinations need to be explicitly specified in the following format:

componentSymbol(componentValue1 [logicalOperator] componentValue2)

Note: Per default, combinations are scoped to cover both component values (i.e., componentValue1 and componentValue2). In practice, it may be relevant to scope combinations more narrowly within a statement to capture the statement semantics, e.g., A((production [AND] handling) responsible) reflecting production responsible and handling responsible.

Examples:

A(actor [XOR] owner) (Alternative: A((actor [XOR] owner)))
A(involved (driver [OR] passenger)) (showcasing partial scoping of combination -- resulting in involved driver or involved passenger)
I(report [XOR] review)

Component combinations can be nested arbitrarily deep, i.e., any component can be a combination itself, for example:

componentSymbol(componentValue1 [logicalOperator] (componentValue2 [logicalOperator] componentValue3))

Example:

A(certifier [AND] (owner [XOR] inspector))
A((operator [OR] certifier) [AND] (owner [XOR] inspector))

Where components are linked by the same logical operator, the indication of precedence is optional.

Example:

A(certifier [AND] owner [AND] inspector)

Supported logical operators:

[AND] - conjunction (i.e., "and")
[OR] - inclusive disjunction (i.e., "and/or")
[XOR] - exclusive disjunction (i.e., "either or")
[NOT] - negation (i.e., "not") -- Note: not yet fully supported by parser

Invalid operators (e.g., [AN]) will be ignored in the parsing process.
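
As an illustration of how parentheses scope a combination, the following Go sketch splits the content of a component at logical operators that sit outside any parentheses. It is a simplified reading of the rules above, not the parser's actual algorithm; splitTopLevel and binaryOperators are hypothetical names, and [NOT] is left out because it is unary.

```go
package main

import (
	"fmt"
	"strings"
)

// binaryOperators are the operators that link two values. [NOT] is left out
// of this sketch because it is unary.
var binaryOperators = map[string]bool{"AND": true, "OR": true, "XOR": true}

// splitTopLevel splits the content of a component at logical operators that
// are not enclosed in parentheses. Parenthesised groups stay intact so they
// can be split again in a further call (after stripping their parentheses).
func splitTopLevel(content string) (operands, operators []string) {
	depth, last := 0, 0
	for i := 0; i < len(content); i++ {
		switch content[i] {
		case '(':
			depth++
		case ')':
			depth--
		case '[':
			end := strings.IndexByte(content[i:], ']')
			if depth != 0 || end < 0 {
				continue
			}
			op := content[i+1 : i+end]
			if !binaryOperators[op] {
				continue // unknown tokens such as [AN] stay in place as text
			}
			operands = append(operands, strings.TrimSpace(content[last:i]))
			operators = append(operators, op)
			last = i + end + 1
			i += end // continue scanning after the closing "]"
		}
	}
	operands = append(operands, strings.TrimSpace(content[last:]))
	return operands, operators
}

func main() {
	operands, operators := splitTopLevel("certifier [AND] (owner [XOR] inspector)")
	fmt.Printf("operands: %q\noperators: %q\n", operands, operators)
}
```

For the content of A(certifier [AND] (owner [XOR] inspector)), the split yields the operand certifier, the operator AND, and the still-parenthesised group (owner [XOR] inspector), mirroring the precedence expressed by the parentheses.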

Nested Statements

Selected components can be substituted by statements entirely. For example, the activation condition could consist of a statement on its own. The syntax is as follows:

componentSymbol{ componentSymbol(naturalLanguageText) ... }

As before, components are annotated using parentheses, and are now augmented with braces that delineate the nested statements.

Essentially, a fully annotated statement (e.g., A(), I(), Cex()) is framed by the statement annotation (e.g., Cac{ A(), I(), Cex() }).

Nesting can occur to arbitrary depth, i.e., a nested statement can contain another nested statement, e.g., Cac{ A(), I(), Cac{ A(), I(), Cac() } }, and can further include combinations as introduced in the following.

It is important to note that component-level nesting is limited to specific component types and properties.

Components for which nested statements are supported:

A,p - Attributes Property
Bdir - Direct Object
Bdir,p - Direct Object Property
Bind - Indirect Object
Bind,p - Indirect Object Property
Cac - Activation Condition
Cex - Execution Constraint
E,p - Constituted Entity Property
P - Constituting Properties
P,p - Constituting Properties Property
O - Or else

Nested Statement Combinations

Nested statements can, analogous to components, be combined to an arbitrary depth and using the same logical operators as for component combinations. Two constraints are to be considered:

Combinations of statements need to be surrounded by braces (i.e., { and }).
Only components of the same kind can be combined, with the component being indicated to the left of the surrounding braces (e.g., Cac{ ... }).

The following example correctly reflects the combination of two nested AND-combined activation conditions:

Cac{ Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() } }

This example, in contrast, will fail (note the differing component types Cac and Cex):

Cac{ Cac{ A(), I(), Cex() } [AND] Cex{ A(), I(), Cex() } }

Another important aspect is the outer braces surrounding the nested statement combination, which form a componentSymbol{ ... [AND] ... } pattern (where logical operators can vary, of course).

Unlike the first example, the following one will result in an error due to missing outer braces:

Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() }

Nesting can span multiple levels, i.e., similar to component combinations, braces can be used to signal precedence when linking multiple component-level nested statements.

Example: Cac{ Cac{ A(), I(), Cex() } [AND] { Cac{ A(), I(), Cex() } [XOR] Cac{ A(), I(), Cex() } } }

Note: the inner brace indicating precedence (the expression ... { Cac{ A(), I(), Cex() } [XOR] Cac{ A(), I(), Cex() } } in the previous example) does not require the leading component symbol.

Non-nested and nested components can be used in the same statement (e.g., ... Cac( text ), Cac{ A(text) I(text) Cac(text) } ... ). Those are implicitly AND-combined.
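
The same-kind constraint for nested statement combinations can be checked mechanically. The Go sketch below is a heuristic written for this guide, not the validation used by IG Parser: the function name checkSameKind is hypothetical, it only inspects the first level below each outer component, and it assumes symbols without semantic annotations and balanced parentheses and braces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token matches an opening brace with an optional leading symbol, a closing
// brace, a parenthesis, or one of the binary logical operators.
var token = regexp.MustCompile(`([A-Za-z][A-Za-z0-9,]*)?\{|\}|\(|\)|\[(?:AND|OR|XOR)\]`)

// checkSameKind reports an error if a nested statement combination links
// nested statements whose symbol differs from the surrounding symbol.
func checkSameKind(s string) error {
	braces, parens := 0, 0
	outer, combined := "", false
	var children []string
	for _, m := range token.FindAllStringSubmatch(s, -1) {
		switch {
		case strings.HasSuffix(m[0], "{"):
			braces++
			if braces == 1 {
				outer, children, combined = m[1], nil, false
			} else if braces == 2 {
				children = append(children, m[1])
			}
		case m[0] == "}":
			// Closing the outer component: validate only if an operator
			// was seen directly inside it (i.e., it is a combination).
			if braces == 1 && combined && outer != "" {
				for _, child := range children {
					if child != "" && child != outer {
						return fmt.Errorf("cannot combine %s{...} inside %s{...}", child, outer)
					}
				}
			}
			braces--
		case m[0] == "(":
			parens++
		case m[0] == ")":
			parens--
		default: // a logical operator
			if braces == 1 && parens == 0 {
				combined = true
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(checkSameKind("Cac{ Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() } }"))
	fmt.Println(checkSameKind("Cac{ Cac{ A(), I(), Cex() } [AND] Cex{ A(), I(), Cex() } }"))
}
```

Operators inside parentheses (component combinations) are ignored, so a nested statement such as Cac{ A(a [AND] b) I(x) Bdir{ ... } } is not mistaken for a combination. Braces without a leading symbol (precedence braces, component pairs) are skipped rather than validated.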

Component Pair Combinations

Where a range of components together form an alternative in a given statement (i.e., require linkage to another range of components by logical operators), IG Script supports the ability to indicate such so-called component pair combinations or component tuple combinations.

For instance, the statement A(actor) D(must) {I(perform action) on Bdir(object1) Cex(in a particular way) [XOR] I(prevent action) on Bdir(object2) by Cex(some specific means)} draws on the same actor, but -- in contrast to combinations of nested statements -- we see combinations of pairs of different components in this statement, effectively rendering those as two distinct statements. Braces (without indication of the component symbol as done for nested statement combinations) are used to indicate the scope of a given component pair (here I(perform action) on Bdir(object1) Cex(in a particular way) as the first component pair, and I(prevent action) on Bdir(object2) by Cex(some specific means) as the second one, both of which are combined by [XOR]; both resulting statements share the same attribute actor and deontic must). Operationally, the parser expands this into two distinct (but logically linked) statements:

A(actor) D(must) I(perform action) on Bdir(object1) Cex(in a particular way)
[XOR]
A(actor) D(must) I(prevent action) on Bdir(object2) by Cex(some specific means)

Note further that either pair or tuple can consist of an arbitrary number of components and can be imbalanced and use different components on either side. For example the statement A(actor) D(must) {I(perform action) [XOR] I(prevent action) on Bdir(object2) and affects Bind(object3) by Cex(some specific means)}, with a single component on the left side (I(perform action)) and multiple on the right side (I(prevent action) on Bdir(object2) and affects Bind(object3) by Cex(some specific means)) is equally valid input and decomposes into the statements

A(actor) D(must) I(perform action)
[XOR]
A(actor) D(must) I(prevent action) on Bdir(object2) and affects Bind(object3) by Cex(some specific means)
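
The expansion described above can be sketched in a few lines of Go. This is an illustration under simplifying assumptions, not the parser's implementation: expandPair and join are hypothetical names, the statement is assumed to contain exactly one component pair combination with a single operator, and the first opening brace in the statement is assumed to start that pair.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// join concatenates statement fragments and normalises whitespace.
func join(parts ...string) string {
	return strings.Join(strings.Fields(strings.Join(parts, " ")), " ")
}

// expandPair splits a statement with one component pair combination into
// the two statements it stands for, replicating the shared parts.
func expandPair(s string) (first, operator, second string, err error) {
	open, end := strings.Index(s, "{"), strings.LastIndex(s, "}")
	if open < 0 || end < open {
		return "", "", "", errors.New("no component pair braces found")
	}
	prefix, inner, suffix := s[:open], s[open+1:end], s[end+1:]
	depth := 0
	for i := 0; i < len(inner); i++ {
		switch inner[i] {
		case '(', '{':
			depth++
		case ')', '}':
			depth--
		case '[':
			if depth != 0 {
				continue // operator belongs to an embedded combination
			}
			for _, op := range []string{"AND", "OR", "XOR"} {
				if tok := "[" + op + "]"; strings.HasPrefix(inner[i:], tok) {
					first = join(prefix, inner[:i], suffix)
					second = join(prefix, inner[i+len(tok):], suffix)
					return first, op, second, nil
				}
			}
		}
	}
	return "", "", "", errors.New("no top-level logical operator inside the braces")
}

func main() {
	first, op, second, err := expandPair("A(actor) D(must) {I(perform action) [XOR] I(prevent action) on Bdir(object2)}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n[%s]\n%s\n", first, op, second)
}
```

Components before and after the braces (e.g., a trailing Cac(condition1)) are copied into both resulting statements, which matches the replication of shared components described above.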

The use of component pairs can occur on any level of nesting, including top-level statements (as shown in the first example), within nested statements, and within statement combinations, and component pairs can embed basic component combinations (e.g., A(actor) D(must) {I(perform action) on Bdir((objectA [AND] objectB)) Cex(in a particular way) [XOR] I(prevent action) on Bdir(objectC) by Cex(some specific means)}).

Furthermore, an arbitrary number of component pairs/tuples can be combined by using the syntax (similar to nested statement combinations) to indicate precedence amongst multiple component pairs with varying logical operators.

Example: A(actor1) {I(action1) Bdir(directobject1) [XOR] {I(action2) Bdir(directobject2) Bind(indirectobject2) [AND] I(action3) Bdir(directobject3) Cex(constraint3)}} Cac(condition1)

In this example the center part reflects the combination of different component pairs using the following logical pattern { ... [XOR] { ... [AND] ... }}, all of which share the same attribute (actor1) and activation condition (condition1).

Object-Property Relationships

Entities such as Attributes, Direct Object, Indirect Object, Constituted Entity and Constituting Properties often carry private properties specific to a particular instance of that component (e.g., where multiple components of the same type exist).

An example is Bdir,p(shared) Bdir1,p(private) Bdir1(object1) Bdir(object2), where both Direct Objects (Bdir) have a shared property (Bdir,p(shared)), but only one has an additional private property (Bdir1,p(private)) that is exclusively linked to object1 (Bdir1(object1)).

In IG Script this is reflected by suffixes associated with the privately related components, where both need to carry the same suffix (i.e., 1 to signal direct linkage between Bdir1,p and Bdir1 in the above example).

The basic syntax (without annotations -- see below) is componentSymbolSuffix(component content), where the component symbol (componentSymbol) reflects the entity or property of concern, and the suffix (Suffix) is the identifier of the private linkage between particular instances of the related components (i.e., the suffix 1 identifies the relationship between Bdir1,p and Bdir1). The syntax further supports suffix information on properties (e.g., Bdir1,p1(content), Bdir1,p2(content2)) to reflect dependency structures embedded within given components or their properties (here: content as the first property, and content2 as the second property of Bdir1 -- where of analytical relevance).

The coding of component-property relationships ensures that the specific intra-statement relationships are correctly captured and accessible to downstream analysis.

Suffixes can be attached to any component type, but private property linkages (i.e., linkages between particular types of components/properties) are currently supported for the following component-property pairs:

A and A,p
Bdir and Bdir,p
Bind and Bind,p
E and E,p
P and P,p
I and Cex
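
The role of the two suffix positions can be illustrated with a small Go sketch that splits a component header such as Bdir1,p2 into its parts. The type Header and the functions parseHeader and privatelyLinked are hypothetical names introduced for this illustration; the sketch covers component/property pairs of the same symbol only (i.e., not the I and Cex pair) and ignores semantic annotations.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerPattern captures: symbol, component suffix, property marker and
// property suffix (e.g., "Bdir1,p2" -> "Bdir", "1", ",p2", "2").
var headerPattern = regexp.MustCompile(`^([A-Za-z]+)([0-9]*)(,p([0-9]*))?$`)

// Header is a hypothetical structure for the part before "(" or "{".
type Header struct {
	Symbol         string // e.g., "Bdir"
	Suffix         string // identifies a specific component instance
	IsProperty     bool   // true if the header carries ",p"
	PropertySuffix string // orders properties (first, second, ...)
}

// parseHeader splits a header into its parts; ok is false if it does not
// follow the componentSymbolSuffix[,pSuffix] shape.
func parseHeader(h string) (Header, bool) {
	m := headerPattern.FindStringSubmatch(h)
	if m == nil {
		return Header{}, false
	}
	return Header{Symbol: m[1], Suffix: m[2], IsProperty: m[3] != "", PropertySuffix: m[4]}, true
}

// privatelyLinked reports whether property is a private property of
// component, i.e., both share the same symbol and a non-empty suffix.
func privatelyLinked(component, property Header) bool {
	return !component.IsProperty && property.IsProperty &&
		component.Symbol == property.Symbol &&
		component.Suffix != "" && component.Suffix == property.Suffix
}

func main() {
	object, _ := parseHeader("Bdir1")
	for _, h := range []string{"Bdir1,p", "Bdir,p", "Bdir,p1"} {
		property, _ := parseHeader(h)
		fmt.Printf("%s privately linked to Bdir1: %t\n", h, privatelyLinked(object, property))
	}
}
```

Note how Bdir,p1 is not linked to Bdir1: the digit after ,p numbers the property, whereas only a digit directly after the component symbol establishes the private linkage.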

Note that the extended syntax that supports Object-Property Relationships is further augmented with the ability to capture IG Logico's Semantic Annotations as discussed in the following.

Semantic Annotations

In addition to the parsing of component annotations and combinations of various kinds, the parser further supports semantic annotations of components according to the taxonomies outlined in the Institutional Grammar 2.0 Codebook.

The syntax (including support for suffixes introduced above) is componentSymbolSuffix[semanticAnnotation](component content), i.e., any component can be augmented with [semantic annotation content], e.g., Cac[context=state](Upon certification).

This also applies to nested components, e.g., Cac[condition=violation]{A[entity=actor,animate](actor) I[act=violate](violates) Bdir[entity=target,inanimate](something)}, as well as for compound components, e.g., Bdir[type=target](leftObject [XOR] rightObject), in which case the annotation type=target is attached to both leftObject and rightObject in the generated output.
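
As a rough illustration of the annotated syntax, the Go sketch below separates a single, non-nested component into symbol, semantic annotation and content. The function name parseAnnotated is hypothetical, the annotation is returned as raw text without further interpretation, and nested components using braces are outside the scope of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// annotated matches componentSymbolSuffix[semanticAnnotation](content),
// where the annotation part in square brackets is optional.
var annotated = regexp.MustCompile(`^([A-Za-z]+[0-9]*(?:,p[0-9]*)?)(?:\[([^\]]*)\])?\((.*)\)$`)

// parseAnnotated splits one component into symbol (including suffixes),
// raw annotation text and content; ok is false if the input does not match.
func parseAnnotated(component string) (symbol, annotation, content string, ok bool) {
	m := annotated.FindStringSubmatch(component)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, c := range []string{
		"Cac[context=state](Upon certification)",
		"Bdir[type=target](leftObject [XOR] rightObject)",
		"A(Farmer)",
	} {
		symbol, annotation, content, ok := parseAnnotated(c)
		fmt.Printf("ok=%t symbol=%q annotation=%q content=%q\n", ok, symbol, annotation, content)
	}
}
```

For a compound component such as Bdir[type=target](leftObject [XOR] rightObject), the annotation is captured once; attaching it to each value of the combination, as the parser does in its output, would be a separate step.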

Examples

In the following, you will find selected examples that highlight the practical use of the features introduced above. These can be tested and explored using the parser.

Simple regulative statement:
- A(Operator) D(must) I(comply) Bdir(with regulations)
Component combination:
- A(Operator) D(must) I(comply with [AND] respond to) Bdir(regulations)
Component-level nesting:
- A(Farmer) D(must) I(comply) with Bdir(provisions) Cac{A(Farmer) I(has lodged) for Bdir(application) Bdir,p(certification)}
Component-level nesting over multiple levels:
- A(Farmer) D(must) I(comply) with Bdir(Organic Farming provisions) Cac{A(Farmer) I(has lodged) for Bdir(application) Bdir,p(certification) Cex(successfully) with the Bind(Organic Farming Program) Cac{E(Organic Program) F(covers) P,p(relevant) P(region)}}
Component-level nesting on various components (e.g., Bdir and Cac) embedded in combinations of nested statements (Cac):
- A(Program Manager) D(may) I(administer) Bdir(sanctions) Cac{Cac{A(Program Manager) I(suspects) Bdir{A(farmer) I(violates [OR] does not comply) with Bdir(regulations)}} [OR] Cac{A(Program Manager) I(has witnessed) Bdir,p(farmer's) Bdir(non-compliance) Cex(in the past)}}
Complex statement; showcasing combined use of various features (e.g., component-level combinations, nested statement combinations (activation conditions, Or else)):
- A,p(National Organic Program's) A(Program Manager), Cex(on behalf of the Secretary), D(must) I(inform), I(review [AND] (recertify [XOR] sanction)) Bdir(approved (certified production [AND] handling operations [AND] accredited certifying agents)) Cex(according to the guidelines provided in the (Act [XOR] regulations in this part)) if Cac{Cac{A(Program Manager) I(suspects [OR] establishes) Bdir(violations)} [AND] Cac{E(Program Manager) F(is authorized) for the P,p(relevant) P(region)}}, or else O{O{A,p(Manager's) A(supervisor) D(may) I(suspend [XOR] revoke) Bdir,p(Program Manager's) Bdir(authority)} [XOR] O{A(regional board) D(may) I(warn [OR] fine) Bdir,p(violating) Bdir(Program Manager)}}
Object-Property Relationships; showcasing linkage of private nodes with specific component instances (especially where implicit component-level combinations occur), alongside a shared property that applies to both objects (note that for this example, the AND-linkage between the different direct objects is implicit):
- A,p(Certified) A(agent) D(must) I(request) Bdir,p(independently) Bdir1,p(authorized) Bdir1(report) and Bdir2,p(audited) Bdir2(financial documents).
Semantic annotations; showcasing the basic use of annotations on arbitrary components:
- A,p[property=qualitative](Certified) A[role=responsible](organic farmers) D[stringency=obligation](must) I[type=act](submit) Bdir[type=inanimate](report) to Bind[role=authority](Organic Program Representative) Cac[context=time](at the end of each year).

Common issues

Parentheses/braces need to match. The parser calls out if a mismatch exists. Parentheses are applied when components are specified (or combinations thereof), and braces are used to signal component-level nesting (i.e., statements embedded within components) and statement combinations (i.e., combinations of component-level nested statements).
- One important special case to bear in mind is that the input itself (i.e., the content to be encoded) may contain parentheses (e.g., ... under the following conditions 1) the actor must not ..., 2) the actor must not ...). In such cases, the parentheses must be removed as part of the preprocessing of the text input, since the parser cannot differentiate between parentheses used for the encoding and the ones contained in the content.
Include whitespace between symbols and logical operators to ensure correct parsing. However, excess words between parentheses and logical operators are permissible. (Note: The parser has built-in tolerance toward various issues, but may not handle this correctly under all circumstances.)
- This example works: A(actor) I(act)
- This one is incorrect: A(actor)I(act) due to missing whitespace.
- This example works: (A(actor1) [AND] A(actor2))
- This one is incorrect: (A(actor1)[AND] A(actor2)) due to missing whitespace between Attributes component and operator.
Suffixes to indicate object-property relationships need to be specified immediately following the component identifier, not following the property indicator.
- This example works: A1(actor1) A1,p(approved)
- This one does not: A1(actor1) A,p1(approved) -- here the suffix 1 would not be linked to the Attributes (A1), but qualify the property as the first property (as opposed to the second, i.e., A,p2, etc.).
In nested statement combinations, only components of the same kind can be logically linked (e.g., Cac{Cac{ ... } [AND] Cac{ ... }} will parse; Cac{Cac{ ... } [AND] Bdir{ ... }} will not parse).
Component pairs have a similar syntax to nested statement combinations (e.g., {I() Bdir() [AND] I() Bdir()}), but carry no leading component symbol since they, unlike nested statement combinations, allow for the combination of pairs of different components (e.g., action-object pairs), not just single components of the same kind (see Cac{ ... } syntax in the previous comment on nested statement combinations). The two encodings lead to distinctly different outcomes in the statement structure (observable in tabular and visual output): For nested statement combinations, the individual nested components generate individual nested statements linked to the same main statement (i.e., everything is still retained as a single statement). Component pair combination encoding, in contrast, leads to the generation (extrapolation) of entirely separate but logically linked statements in which the shared parts of the encoding are replicated across those additional statements. The following statements may be useful to observe the difference in the generated output structure (best done using the visual version of the parser):
Nested statement combination -- Combinations of a specific nested component type
- A(actor1) I(action1) Bdir{Bdir{A(actor2) I(action2)} [XOR] Bdir{A(actor2) I(action2)}} Cac(condition1)
Component pair combination -- Combination of multiple different component types (nested or non-nested)
- A(actor1) {I(action1) Bdir(object1) [XOR] I(action2) Bdir(object2)} Cac(condition1)
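
The first issue in the list above (matching parentheses and braces) can be checked before submitting a statement to the parser. The following Go sketch is a generic stack-based balance check written for this guide; it is not the validation routine used by IG Parser, and checkBalance is a hypothetical name.

```go
package main

import "fmt"

// checkBalance verifies that parentheses and braces are properly nested
// and closed, reporting the first offending position otherwise.
func checkBalance(s string) error {
	opening := map[byte]byte{')': '(', '}': '{'}
	var stack []byte
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch c {
		case '(', '{':
			stack = append(stack, c)
		case ')', '}':
			if len(stack) == 0 || stack[len(stack)-1] != opening[c] {
				return fmt.Errorf("unexpected %q at position %d", c, i)
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) > 0 {
		return fmt.Errorf("%d opening symbol(s) left unclosed", len(stack))
	}
	return nil
}

func main() {
	for _, s := range []string{
		"A(actor1) I(action1) Cac{A(actor2) I(action2)}",
		"A(actor1) I(action1) Cac{A(actor2) I(action2)",
		"A(actor1})",
	} {
		fmt.Printf("%-50s -> %v\n", s, checkBalance(s))
	}
}
```

Because such a check cannot tell encoding parentheses from parentheses in the statement content, list markers such as 1) in the original text are reported as errors, which is consistent with the advice to remove them during preprocessing.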

Deployment

This section focuses on the setup of IG Parser, not the practical use discussed above. IG Parser can either be run on a local machine or deployed on a server. The corresponding instructions are provided in the following, alongside links to the prerequisites.

Note that the server-deployed version is more reliable when considering production use.

Local deployment

The purpose of building a local executable is to run IG Parser on a local machine (primarily for personal use on your own machine).

Prerequisites:
- Install Go (Programming Language)
- Open a console (e.g., Linux terminal, Windows command line (cmd) or PowerShell (powershell))
- Clone this repository into a dedicated folder on your local machine
- Navigate to the repository folder
- Compile IG Parser in the corresponding console
  - Under Windows, execute go build -o ig-parser.exe ./web
    - This creates the executable ig-parser.exe in the repository folder
  - Under Linux, execute go build -o ig-parser ./web
    - This creates the executable ig-parser in the repository folder
- Run the created executable
  - Under Windows, run ig-parser (or ig-parser.exe) either via command line or by double-clicking
  - Under Linux (or Windows PowerShell), run ./ig-parser
- Once started, it should automatically open your browser and navigate to http://localhost:8080/visual. Alternatively, use your browser to manually navigate to one of the URLs listed in the console output. By default, this is the URL http://localhost:8080 (and http://localhost:8080/visual respectively)
- Press Ctrl + C in the console window to terminate the execution (or simply close the console window)

Server deployment

The purpose of deploying IG Parser on a server is to provide a deployment that allows remote use on the local network or the internet, as well as for production-level deployment (see comments at the bottom).

Prerequisites:
- Install Docker
  - Quick installation of docker under Ubuntu LTS: sudo apt install docker.io
- Install Docker Compose
  - Quick installation of Docker Compose under Ubuntu LTS: sudo apt install docker-compose
- If not already installed (check with git version), install Git (optional if IG Parser sources are downloaded as zip file)
  - Quick installation of Git under Ubuntu LTS: sudo apt install git

Deployment Guidelines

- Clone (or download and unzip) this repository into a dedicated local folder
- Navigate into the cloned (or unzipped) folder
- Make deploy.sh executable (chmod 740 deploy.sh)
- Run deploy.sh with superuser permissions (sudo ./deploy.sh)
  - This script automatically deploys the latest version of IG Parser by undeploying old versions, before pulling the latest version, building and deploying it.
  - For manual start, run sudo docker-compose up -d. Run sudo docker-compose down to stop the execution.
- Open a browser and enter the server address and port 4040 (e.g., http://server-ip:4040)

Service Configuration & Additional Considerations

- By default, the Docker-deployed web service listens on port 4040, and logging is enabled in the subfolder ./logs.
- The service automatically restarts if it crashes or if the docker daemon restarts.
- Adjust the docker-compose.yml file to modify any of these characteristics.
- The service is exposed as an HTTP service by default. For production-level deployment, consider using an environment that provides additional security features (e.g., SSL, DDoS protection, etc.), as is the case for the deployed version linked at the top of this page.


ig-parser's Issues

Print shared elements in visual output

Currently, only first-order components and corresponding nested elements are included in visual output.

A useful feature would be the inclusion of shared information (sharedLeft, sharedRight) in the output in order to retrace the full statement content from the output.

Extend syntax to capture component combinations with distinctive logical operators and private properties

Components that carry private properties and are embedded in a component combination cannot be represented with logical operators other than an implicit AND.

In the following example, it is not possible to reflect an [OR] linkage between A1(farmer) and A2(citizen), where only farmer has a private property (A1,p(first)).

Example: A(Program Manager) D(may) I(administer) Bdir(sanctions) {Cac{A(Program Manager) I(suspects) Bdir{A1,p(first) A1(farmer) A2(citizen) I((violates [OR] does not comply)) with Bdir(regulations)}} [OR] Cac{A(Program Manager) I(has witnessed) Bdir,p(farmer's) Bdir(non-compliance) Cex(in the past)}}

Proposed syntax: ... Bdir{A1,p(first) (A1(farmer) [OR] A2(citizen)) ... or ... Bdir{(A1,p(first) A1(farmer) [OR] A2(citizen)) ...

Wrong parsing of private properties of component combinations

The following statements' P7,p is attached to all properties in the output, even though the properties are private.

Statement: P5,p1(public) P5(access) P5,p2(to the coasts for recreation purposes), (G) P7((coordination [AND] simplification)) P7,p(in order to ensure expedited governmental decision making for the management of coastal resources),

The problem appears to occur when combinations in constituting properties are present.

Simpler example: Bdir1((left [OR] right)) Bdir1,p(private) Bdir(general object) Bdir,p(shared)

Compound component linkages cannot be parsed correctly

Compound linkages such as "inspect facility" [AND] "report violations" cannot be captured by the coding, since they would read I(inspect [AND] report) Bdir(facility [AND] violations), resulting in four statements, as opposed to two, i.e., "inspect facility, inspect violations, report facility, report violations".

Suggested syntax: ((I(inspect) Bdir(facility)) [AND] (I(report) Bdir(violations)))

This would equally apply to other forms of combinations (e.g., A,p(Regional) A(Manager) [AND] A,p(Local) A(Inspector)).
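
The multiplication described above can be illustrated with a minimal Go sketch. It is not taken from the IG Parser sources; the function name expand is hypothetical, and the sketch only assumes that independently combined components are paired exhaustively, as the issue describes.

```go
package main

import "fmt"

// expand (hypothetical helper) returns every pairing of an aim with an
// object. This mirrors how two independent combinations multiply out
// into atomic statements.
func expand(aims, objects []string) []string {
	var out []string
	for _, a := range aims {
		for _, o := range objects {
			out = append(out, a+" "+o)
		}
	}
	return out
}

func main() {
	// I(inspect [AND] report) Bdir(facility [AND] violations)
	for _, s := range expand([]string{"inspect", "report"}, []string{"facility", "violations"}) {
		fmt.Println(s)
	}
}
```

Two combinations of two elements each yield four pairings, whereas the intended reading contains only two of them ("inspect facility" and "report violations"), which is why a dedicated syntax for compound linkages is suggested.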

Nested multi-statement combinations with different logical operators (and precedence indication) do not resolve correctly

Combinations consisting of three or more horizontally nested statements that are linked by different logical operators (e.g., [AND] and [OR]) are not correctly resolved in output.

Where parentheses are used to indicate precedence, the logical operators are not correctly resolved (all resolve to [AND] irrespective of actual value).

Example: Bdir{A(actor2) I(aim2) (Cac{A(actor3) I(aim3) Bdir(something)} [OR] (Cac{A(actor4) I(aim4) Bdir(something else)} [AND] Cac{A(actor55)}))}

Where braces are used to indicate precedence, the parsing fails on the nesting component and output for this component is entirely suppressed.

Example: Bdir{A(actor2) I(aim2) {Cac{A(actor3) I(aim3) Bdir(something)} [OR] {Cac{A(actor4) I(aim4) Bdir(something else)} [AND] Cac{A(actor55)}}}}

Add raw and encoded statement to tabular output

When generating the decomposed tabular output, introduce additional preceding columns that hold the original statement (i.e., the statement as it appears in text) and the encoded statement (i.e., the statement encoded in IG Script). These should appear in addition to the regular decomposed output and might be controllable via options.

Parsing continues (erroneously) due to wrong annotation of combinations (but correct bracket count)

If a statement has a correct bracket count (equal number of opening and closing parentheses or brackets) but incorrectly annotated combinations, the parsing nevertheless continues.

Example: Bdir,p(the poor [AND] those in vulnerable situations)) Bdir((exposure [AND] vulnerability)

Note the incomplete combinations - but errors balance out.
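
Why a pure count lets this input through can be shown with a small Go sketch. It is an illustration only and does not reflect how IG Parser validates input; both function names are hypothetical, and only parentheses are considered.

```go
package main

import "fmt"

// balancedByCount (hypothetical) reports whether the numbers of opening
// and closing parentheses match, ignoring their order.
func balancedByCount(s string) bool {
	open, closed := 0, 0
	for _, r := range s {
		switch r {
		case '(':
			open++
		case ')':
			closed++
		}
	}
	return open == closed
}

// balancedByDepth (hypothetical) additionally rejects input in which a
// closing parenthesis appears without a preceding unmatched opening one.
func balancedByDepth(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

func main() {
	stmt := "Bdir,p(the poor [AND] those in vulnerable situations)) Bdir((exposure [AND] vulnerability)"
	fmt.Println(balancedByCount(stmt)) // true: three opening, three closing
	fmt.Println(balancedByDepth(stmt)) // false: the second ")" has no match
}
```

Tracking nesting depth while scanning is one possible way to catch cases in which errors balance out; whether it fits the parser's existing validation is an open question.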

Logical operators in multi-combination component entries do not reflect all logical linkages in output

The Google Sheets output generation does not correctly resolve all logical linkages for multi-combination component entries. For instance, the component Cex(shared left (left [XOR] right) as well as (green [AND] blue) shared right) does not correctly reflect the second logical linkage (AND) in the output.

Make header output optional

The Google Sheets output includes header output for each parsed statement. As a user, I would want to be able to selectively suppress this output.

Nested component-pair combinations as properties don't expand correctly in tabular output

The statement Such E(notification) M(shall) F(provide): (1) A P(description of each noncompliance); (2) The P(facts upon which the notification of noncompliance is based); and (3) The P1(date) P1,p{by which the A(certified operation) D(must) {I(rebut [XOR] correct) Bdir,p(each) Bdir(noncompliance) [AND] I(submit) Bdir,p(supporting) Bdir(documentation) of Bdir,p(each such correction) Cac(when correction is possible)}}. prints incorrectly.

When printing as IG Core output, the last component pair is omitted (i.e., ... [AND] I(submit) Bdir,p(supporting) Bdir(documentation) of Bdir,p(each such correction) Cac(when correction is possible)}}.).

When printing as IG Extended, the references to the nested statements are displayed incorrectly or are missing (e.g., only showing {123}.1 as opposed to {123}.1,{123}.2 in Constituting Properties Property Reference field).

Furthermore, the logical linkages between the expanded component pairs statements are incomplete for the first statement ([][{123}.2]) and missing for the second pair.

Output (elements of concern emphasized):

Statement ID|Attributes|Attributes Property|Attributes Property Reference|Deontic|Aim|Direct Object|Direct Object Reference|Direct Object Property|Direct Object Property Reference|Indirect Object|Indirect Object Reference|Indirect Object Property|Indirect Object Property Reference|Activation Condition|Activation Condition Reference|Execution Constraint|Execution Constraint Reference|Constituted Entity|Constituted Entity Property|Constituted Entity Property Reference|Modal|Constitutive Function|Constituting Properties|Constituting Properties Reference|Constituting Properties Properties|Constituting Properties Properties Reference|Or Else Reference|Logical Linkage (Statements)|Logical Linkage (Components)|
'123.1| | | | | | | | | | | | | | | | | |notification| | |shall|provide|description of each noncompliance| | | | | |[bAND].P.[123.2];[bAND].P.[123.3]|
'123.2| | | | | | | | | | | | | | | | | |notification| | |shall|provide|facts upon which the notification of noncompliance is based| | | | | |[bAND].P.[123.1];[bAND].P.[123.3]|
'123.3| | | | | | | | | | | | | | | | | |notification| | |shall|provide|date| | |{123}.1| | |[bAND].P.[123.1];[bAND].P.[123.2]|
'{123}.1.1|certified operation| | |must|rebut|noncompliance| |each| | | | | | | | | | | | | | | | | | | |[][{123}.2]|[XOR].I.[{123}.1.2]|
'{123}.1.2|certified operation| | |must|correct|noncompliance| |each| | | | | | | | | | | | | | | | | | | |[][{123}.2]|[XOR].I.[{123}.1.1]|
'{123}.2.1|certified operation| | |must|submit|documentation| |supporting| | | | | |when correction is possible| | | | | | | | | | | | | | |[bAND].Bdir,p.[{123}.2.2]|
'{123}.2.2|certified operation| | |must|submit|documentation| |each such correction| | | | | |when correction is possible| | | | | | | | | | | | | | |[bAND].Bdir,p.[{123}.2.1]|

Semantic annotations for nesting components are not shown in visual output

Where components are nesting institutional statements (or combinations thereof), the annotation of this component is not displayed as part of the visual output.

Example:
Cac[state]{A[actor](actor) I[activity](aim) Bdir[artifact](object)}

The annotation state is not displayed in the visual tree, whereas all other annotations are reflected. But the annotation appears in the Google Sheets output.

Combinations of various forms of component-level nesting do not parse

The combined use of different nesting patterns does not resolve during parsing. The example is directly from the README.

Example: A(Program Manager) D(may) I(administer) Bdir(sanctions) {Cac{A(Program Manager) I(suspects) Bdir{A(farmer) I((violates [OR] does not comply)) with Bdir(regulations)}} [OR] Cac{A(Program Manager) I(has witnessed) Bdir,p(farmer's) Bdir(non-compliance) Cex(in the past)}}

Externalize UI configuration into dedicated config file

Currently, the UI can be manipulated from within the handler. This should be externalized.

Complexity calculation in visual output for component pair combinations leads to failure

The complexity calculation (DoV) for statements containing component pair combinations leads to unrecoverable failure in the visual output version.

Add support for batch processing of datasets

At the current stage, the parser only processes individual statements, both for tabular and visual output. It should include a feature to allow iterative processing of large numbers of coded statements and allow for batched outputs (e.g., statement trees in visual mode, concatenated atomic statements in tabular mode).

Annotations on combinations (i.e., preceding the opening brace) are not parsed and ignored in output

If a component-level nested combination is preceded by an annotation (but without component), the entire nested combination is ignored in the output.

Example: [state]{Cac[state]{A[role=monitored,type=animate](Operations) I[act=violate](were (non-compliant [OR] violated)) Bdir[type=inanimate](organic farming provisions)} [AND] Cac[state]{A[role=enforcer,type=animate](Manager) I[act=terminate](has concluded) Bdir[type=activity](investigation)}}

The leading [state] will lead to matching failure --> likely in Regex.

Google Sheets output does not parse if input contains quotation marks (")

Google Sheets cannot parse the generated tabular structure if the original statement (and encoded statement) includes quotation marks.

Example:
E,p(Verifiable) E(observations) F[Rtype=information](are labeled) P("Needs ID") Cac{Cac(until) E(they) {FRtype=boundary P(Research Grade status) [OR] F[Rtype=aggregation](are voted) P(Casual) Cex(via the Data Quality Assessment)}}

One direction would be substitution as part of output generation. For example:

E,p(Verifiable) E(observations) F[Rtype=information](are labeled) P('Needs ID') Cac{Cac(until) E(they) {FRtype=boundary P(Research Grade status) [OR] F[Rtype=aggregation](are voted) P(Casual) Cex(via the Data Quality Assessment)}}
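
A minimal Go sketch of such a substitution step follows. The function name sanitizeQuotes is hypothetical and not part of the IG Parser code base; the sketch only assumes that the substitution is applied to each string before it is written to the Google Sheets output.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeQuotes (hypothetical helper) replaces double quotation marks
// with single ones so that the generated output no longer contains the
// problematic character.
func sanitizeQuotes(s string) string {
	return strings.ReplaceAll(s, `"`, "'")
}

func main() {
	fmt.Println(sanitizeQuotes(`F[Rtype=information](are labeled) P("Needs ID")`))
}
```

Substituting at output time leaves the encoded statement untouched in the editor, at the cost of the output no longer matching the input character for character.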

Nested elements in property tree are not printed in correct color

For nested elements in property trees, in the visual output the elements are not printed in the correct color. The challenge is that the property name (including ,p) is not inherited across all nested nodes (the visual output reacts to the component name).

Example: P10(assistance) P10,p1(to support comprehensive (planning [AND] conservation [AND] management) for living marine resources)

Automated detection of nested statement combinations if coded as component pairs

Introduce ability to automatically recode nested statement combinations encoded as component pairs as nested statement combinations.

Encoding such as { Cac{ ... } [XOR] Cac{ ... } } (i.e., encoding as component pair combination) should be automatically detected (criteria: single component of same type on either side of combination) and recoded as nested statement combination (i.e., Cac{ Cac{ ... } [XOR] Cac{ ... } }) to support coding consistency.
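
The detection criterion can be sketched in Go as follows. This is an illustration under strong assumptions: the combination is taken to be already split into its two sides and logical operator, and the type side and the function recode are hypothetical names that do not exist in IG Parser.

```go
package main

import "fmt"

// side is a hypothetical, already-parsed operand of a combination:
// a component symbol (e.g., "Cac") and the raw content between its braces.
type side struct {
	Symbol string
	Body   string
}

// recode wraps the combination in the shared component symbol when both
// sides are single components of the same type; otherwise it returns the
// component pair encoding unchanged.
func recode(left side, op string, right side) string {
	inner := fmt.Sprintf("%s{%s} [%s] %s{%s}", left.Symbol, left.Body, op, right.Symbol, right.Body)
	if left.Symbol == right.Symbol {
		return fmt.Sprintf("%s{%s}", left.Symbol, inner)
	}
	return "{" + inner + "}"
}

func main() {
	fmt.Println(recode(side{"Cac", "A(actor3) I(aim3)"}, "XOR", side{"Cac", "A(actor4) I(aim4)"}))
}
```

The sketch covers only the two-sided case named in the criterion; sides holding more than one component would have to be excluded before this check.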

Excessive symbols lead to parser fail in statement combinations

In the following statement, the first comma between braces, as well as the period between braces at the end of the combination syntax leads to a parsing error. This will require review of the regex for combination detection.

{,Cac{when the A(Program Manager) I(believes) that Bdir{a A,p(certified) A(operation) I((has violated [OR] is not in compliance)) Bdir(with (the Act [OR] regulations in this part))}}, [OR] Cac{when a A((certifying agent [OR] State organic program’s governing State official)) I(fails to enforce) Bdir((the Act [OR] regulations in this part)).}.}

Parsing of properties in component-level nested statements fails; Or else combinations are not parsed.

The following example statement from the README does not work (yet). The contentious aspects are the nested P,p(relevant), which prevents the parsing of the nested statements, and the combined Or else component, which does not appear in the output.

A,p(National Organic Program's) A(Program Manager), Cex(on behalf of the Secretary), D(must) I(inspect), I((review [AND] (revise [AND] resubmit))) Bdir(approved (certified production and [AND] handling operations and [AND] accredited certifying agents)) Cex(for compliance with the (Act or [XOR] regulations in this part)) if {Cac{A(Program Manager) I((suspects [OR] establishes)) Bdir(violations)} [AND] Cac{E(Program Manager) F(is authorized) for the P,p(relevant) P(region)}}, or else {O{A,p(Manager's) A(supervisor) D(may) I((suspend [XOR] revoke)) Bdir,p(Program Manager's) Bdir(authority)} [XOR] O{A(regional board) D(may) I((warn [OR] fine)) Bdir,p(violating) Bdir(Program Manager)}}

Multi-level nested statements that include nested combinations are not correctly parsed (intermediate structure is ignored in output)

Statements that contain combinations on higher nesting levels are not correctly parsed and represented.

Example: A(actor1) I(aim1) Bdir{A(actor2) I(aim2) {Cac{A(actor3) I(aim3) Bdir(something)} [OR] Cac{A(actor4) I(aim4) Bdir(something else)}}}

During parsing the intermediate nested element (Bdir) is omitted in the output, but the Cac combinations (embedded within the nested object) are correctly parsed.

Linear (multi-level) nesting without combinations parses correctly:

A(actor1) I(aim1) Cac{A(actor2) I(aim2) Cac{A(actor3) I(aim3) Bdir(something) Cac{A(actor4) I(aim4) Bdir(something else) Cac{A(actor5) I(act5) Cac{A(actor6) I(act6) Cac{A(actor7) I(act7)}}}}}}

Linear parsing across components works properly as well:

A(actor1) I(aim1) Cac{A(actor2) I(aim2) Cac{A(actor3) I(aim3) Bdir(something) Bdir{A(actor4) I(aim4) Bdir(something else) Cac{A(actor5) I(act5) Bind{A(actor6) I(act6) Cac{A(actor7) I(act7)}}}}}}

Copying multi-line text on Firefox does not work properly.

When trying to copy multi-line text in the editor using the Firefox browser the text copied to the clipboard is not the full highlighted section. For example in the editor when selecting the whole third example statement only a blank line is copied to the clipboard, while for other examples there are cases where some of the text is copied, but not all. Copying text on the first line only seems to function, but not the text which spans multiple lines.

This was tested using Firefox on a Windows 10 machine without any extensions. The same behaviour was present in the Ace.js kitchen sink demo. The same behaviour could not be recreated using Firefox on Ubuntu 22.04.3 LTS, where the copying worked as expected.

Property annotations are not completely represented in labels if multiple logically linked properties exist

In visual output, properties enumerated as labels associated with particular components are not printed completely. In the following example, the second property (revocation) is omitted in the visual output with collapsed property tree.

Example: A(Program Manager) D(may) I(initiate) Bdir,p((suspension [XOR] revocation)) Bdir(proceedings)

The printing in expanded form (i.e., property tree) is correct.

Second-order nested private properties are not properly associated with corresponding component

In the following statement, the element P10,p1,p1 is not uniquely associated with the P10 property, but interpreted as shared across all properties (i.e., P1 and P10). This requires consideration of multi-level property structures, while additionally private to P10 (i.e., occurs twice in P10 property tree).

Statement: P1(protection) P1,p1(of natural resources, including (wetlands [AND] floodplains [AND] estuaries [AND] beaches [AND] dunes [AND] barrier islands [AND] coral reefs [AND] fish and wildlife and their habitat) within the coastal zone) P10(assistance) P10,p1(to support comprehensive (planning [AND] conservation [AND] management) for living marine resources) P10,p1,p1(including planning for (the siting of (pollution control [AND] aquaculture facilities) within the coastal zone [AND] improved coordination between ((State [AND] Federal) coastal zone management agencies [AND] (State [AND] wildlife) agencies)))
