akoehn / alto
License: Other
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
When a task in Alto Lab has warmup > 0, it will attempt to run the warmup and the experiment at the same time. This defeats the purpose of warmup, and may mess up time measurements.
Cause of the problem: In the warmup iteration (CommandLineInterface.java:256), the call to program.run() returns before all warmup tasks are completed. In fact, it already returns before all warmup tasks have been submitted to the ForkJoinPool in Program.java. These tasks are still submitted to the pool. It is almost as if program#run were called in its own separate thread, and the task submissions took place in the background. But I can find no place where a new thread is being spawned.
This happens both in verbose mode and non-verbose, so the problem is probably not related to the use of the ConsoleProgressBar.
Workaround: Don't use warmup for now.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
In TreeAutomaton.asConcreteAutomatonBottomUp the two lines:
processAllRulesBottomUp(rule -> ret.addRule(ret.createRule(getStateForId(rule.getParent()), rule.getLabel(this), getStatesFromIds(rule.getChildren()))));
finalStates.stream().forEach(finalState -> ret.addFinalState(finalState));
are wrong. The final states are only added correctly if the new automaton happens to number its states in the same way as the old automaton, which is not guaranteed. Also, the rule weights are not copied.
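A possible shape of the fix, sketched on a toy automaton model (the record and class names here are illustrative, not Alto's actual API): transfer final states via the state objects themselves rather than their numeric ids, and carry the weight along when copying each rule.

```java
import java.util.*;

// Toy stand-in for copying a tree automaton. The two points of the fix:
// (1) rule weights are copied explicitly, (2) final states are transferred
// by state object, so it does not matter how the new automaton numbers them.
public class AutomatonCopy {
    public record Rule(String parent, String label, List<String> children, double weight) {}

    public static class Automaton {
        public final List<Rule> rules = new ArrayList<>();
        public final Set<String> finalStates = new HashSet<>();
    }

    public static Automaton copy(Automaton src) {
        Automaton ret = new Automaton();
        for (Rule r : src.rules) {
            // copy the weight along with parent, label, and children
            ret.rules.add(new Rule(r.parent(), r.label(), r.children(), r.weight()));
        }
        // add final states by state object; an id-based transfer would be wrong
        ret.finalStates.addAll(src.finalStates);
        return ret;
    }
}
```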
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
There is currently a menu item Tools -> Visualize input object in GUIMain. This mostly works, but the context menus in the visualization window (e.g. copy as tikz) don't quite work. Fix it.
Original report by rknaebel (Bitbucket: rknaebel, GitHub: rknaebel).
This is not really a bug, but when Alto learns many weights for a grammar, the terminal reports that Alto is done, yet after training it still needs a long time to write the new weights back into the grammar.
At that point I am often unsure whether Alto has crashed or is just updating. Please either speed up the weight update, or add a progress bar (or something similar) to show that Alto is still processing; the first option is preferred.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When running the test in the attached file, the SiblingFinder-based intersection contains trees that the intersection based on GenericCondensedIntersectionAutomaton does not. Running the test will output one of the trees that are not found. The problem seems to be that loops in the decomposition automaton are not processed correctly.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
The Instance class in the corpus package supports optional comments. These can currently be set programmatically, and will be written into the corpus file correctly. But when reading the corpus, they are simply skipped as comment lines.
Change this so the comments before each instance in the corpus are read into the comment field of the Instance object when reading a corpus.
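A minimal sketch of the desired reading behaviour, using illustrative names rather than Alto's actual corpus API: comment lines are accumulated and attached to the next instance, instead of being skipped.

```java
import java.util.*;

// Hypothetical corpus reader: lines starting with "#" are collected as the
// comment of the following instance rather than being discarded.
public class CommentReader {
    public record Instance(String body, String comment) {}

    public static List<Instance> read(List<String> lines) {
        List<Instance> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("#")) {
                // accumulate comment text for the next instance
                if (pending.length() > 0) pending.append("\n");
                pending.append(line.substring(1).trim());
            } else if (!line.isBlank()) {
                // attach the pending comment to this instance, then reset it
                result.add(new Instance(line, pending.toString()));
                pending.setLength(0);
            }
        }
        return result;
    }
}
```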
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
BinarizingAlgebra uses its own internal signature, which is different from the signature of the super class, but when getSignature is changed to return the local signature then a test in CoarseToFineParserTest fails:
testPtb(de.up.ling.irtg.automata.coarse_to_fine.CoarseToFineParserTest) Time elapsed: 0.025 sec <<< ERROR! java.lang.NegativeArraySizeException
The problem seems to be that a certain grammar can no longer be read in binary format. Returning the correct signature would still be better, in case one needs to work with it.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
This would be really useful in debugging.
Challenge is that the derivation tree may be relative to a chart, and there may be multiple rules in the chart that use the same terminal symbol. Thus we have to recompute or somehow remember the rule tree that gave rise to this derivation tree.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
The methods
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
When training a maxent IRTG from the GUI, the GUI can become unresponsive after training is completed. I have seen this happen when training a Geoquery grammar (340 rules, with a RuleNameFeature for each rule) on a Geoquery training corpus (500 instances).
Furthermore, the progress bar that shows during training only updates occasionally, not fluidly.
I suspect this is because the Swing EDT receives so many events during training that the runnables scheduled via invokeLater in withProgressListener are executed late or not at all.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
We now have a large number of such algorithms implemented in Alto, and users need to be able to use them from the GUI. Perhaps we can add dropdown boxes for selecting them to the "Parse ..." dialogue (where we can enter inputs).
Ideally, the dialogue should specify reasonable defaults for the algorithms.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
I last checked this a few months ago, not sure if it still exists.
The Tree#select method uses single digits to describe paths, and fails if a node has more than 10 children (i.e. the digits 0-9 are all used up). There is at least a partial implementation that circumvents this issue, but it is not the default and may have slower runtime.
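The underlying problem and a hypothetical remedy can be illustrated without Alto's Tree class: a digit string such as "2110" is ambiguous once nodes can have more than 10 children, whereas a delimiter-separated path is not. A sketch, assuming hyphen-separated child indices:

```java
// Hypothetical path encoding: "2-11-0" means child 2, then child 11, then
// child 0. Unlike a bare digit string, indices above 9 are unambiguous.
public class TreePath {
    public static int[] parse(String path) {
        if (path.isEmpty()) return new int[0];  // empty path = root
        String[] parts = path.split("-");
        int[] idx = new int[parts.length];
        for (int i = 0; i < parts.length; i++) idx[i] = Integer.parseInt(parts[i]);
        return idx;
    }
}
```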
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
See e.g. Experiment 775, line 14.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
applyRaw computes the homomorphic images of subtrees that never influence the end result. Not doing this might save computation and also enable us to have partially defined homomorphisms.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
TreeAutomaton.accepts(Tree) relies on run(Tree), which in turn relies on runRaw(Tree), which only works if the automaton supports getRulesBottomUp. In the interest of supporting specialized implementations, the method should check what kinds of queries the automaton actually supports and then use an analogue of run(Tree) that works top-down instead of bottom-up (even if that means we may be unable to exploit bottom-up determinism).
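A top-down analogue of run(Tree) can be sketched on a simplified rule representation (state and label mapped to possible child-state sequences; none of these names are Alto's actual API): a tree is accepted from a state if some expansion of that state matches the root label and each child state accepts the corresponding subtree.

```java
import java.util.*;

// Toy top-down membership check: rules map state -> label -> list of
// possible child-state sequences.
public class TopDownRun {
    public record Tree(String label, List<Tree> children) {}

    public static boolean accepts(Map<String, Map<String, List<List<String>>>> rules,
                                  String state, Tree t) {
        List<List<String>> expansions =
            rules.getOrDefault(state, Map.of()).getOrDefault(t.label(), List.of());
        for (List<String> childStates : expansions) {
            if (childStates.size() != t.children().size()) continue;
            boolean ok = true;
            for (int i = 0; i < childStates.size() && ok; i++)
                ok = accepts(rules, childStates.get(i), t.children().get(i));
            if (ok) return true;  // some expansion derives the whole subtree
        }
        return false;
    }
}
```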
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Parse "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 ." with binarized.irtg from the PTB tutorial and open the language window. This will take several seconds to initialize the SortedLanguageIterator. It derives millions of items, despite the fact that we are only looking for the 1-best item and the parse chart only has 60000 rules. Something's wrong here, we should fix it.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When asked for rules bottom up with a certain label (other than concat), getRulesBottomUp will always return all the rules that read that label, independently of the children that were given. This is not in line with the contract of the method, as I understand it. I currently cannot judge how much damage fixing this would do to the whole system; I will test this later.
As a reminder to myself: hasRuleWithPrefix could also easily be made much more restrictive, without sacrificing efficiency.
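The intended contract can be illustrated with a toy label-indexed rule list (illustrative names, not Alto's implementation): the returned rules must be filtered by the given child states, not selected by label alone.

```java
import java.util.*;

// Toy version of a bottom-up rule query that respects the children argument.
public class BottomUpFilter {
    public record Rule(String parent, String label, List<String> children) {}

    public static List<Rule> rulesBottomUp(Map<String, List<Rule>> byLabel,
                                           String label, List<String> children) {
        List<Rule> out = new ArrayList<>();
        for (Rule r : byLabel.getOrDefault(label, List.of()))
            if (r.children().equals(children))  // do not ignore the children
                out.add(r);
        return out;
    }
}
```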
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The WideStringAlgebra redefines the concatenation symbols as conc+num, but the evaluate(String, List<List>) method is not redefined. This should be remedied in order to make its behaviour consistent with the decomposition automata that are generated.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
In altolab experiments -- e.g. for Task 41, try
alab 41 -Vinvhom='veryLazyInvhom(decomp, irtg.[graph].hom)' -Vintersection='explicitFromVeryLazy(veryLazyIntersection(irtg.auto, invhom))'
-- the warmup and the main experiment instances seem to run in parallel, instead of all warmup first.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Currently, Alto does not work with Java 9 because of the following reflection-related problems:
Ideas for fixing these issues:
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
InverseHomomorphismAutomaton seems to have problems, especially when computing rules top down that have no children on the rhs automaton side. But there needs to be more thorough overall testing of the algorithms in the class.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Right now, it happens immediately when the corpus is loaded from the GUI, and all charts are held in memory. This is infeasible for larger corpora. Instead, we should simply load the unannotated instances and then compute the charts by need.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
If the IRTG used to read a corpus has more interpretations than the corpus declares, then the instances will have input object maps with null values in them.
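A sketch of the guard that would avoid the null values (names are illustrative, not Alto's corpus-reading code): only create input-map entries for interpretations the corpus actually provides.

```java
import java.util.*;

// Hypothetical input-map construction: iterate over the IRTG's
// interpretations, but skip those for which the corpus has no value,
// instead of storing null.
public class InputMapGuard {
    public static Map<String, Object> buildInputs(Set<String> irtgInterpretations,
                                                  Map<String, Object> corpusValues) {
        Map<String, Object> inputs = new HashMap<>();
        for (String interp : irtgInterpretations) {
            Object value = corpusValues.get(interp);
            if (value != null) inputs.put(interp, value);  // no null entries
        }
        return inputs;
    }
}
```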
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
When exporting a CSV for an experiment with NaN entries (e.g. experiment 775), an Internal Server Error is produced and no CSV is exported.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Coarse-to-fine parsing is currently only available through Alto Lab etc. It should also be usable from the GUI. Maybe it can be merged into the "Parse ..." dialog, but there is the added challenge that we have to ask for an extra input file (the fine-to-coarse mapping).
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Right now, evaluateInSemiring will return an (incorrect) value if the tree automaton has cycles. We should check for cycles in TreeAutomaton#getStatesInBottomUpOrder and throw an exception if a cycle occurs.
This has two advantages. First, it ensures that evaluateInSemiring is only used when it returns the correct value. Second, it makes the use of TreeAutomaton#isCyclic in JLanguageViewer unnecessary (the if block can just be replaced by catching the exception). For some reason, this method is incredibly slow for some grammars, and it would be good to get rid of it.
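The proposed check can be sketched on a generic successor-map stand-in for the automaton's state graph (not Alto's actual TreeAutomaton API): a depth-first search that throws as soon as a state on the current path is re-entered, and memoizes finished states so nothing is explored twice.

```java
import java.util.*;

// Toy cycle check: throws instead of silently computing a wrong value.
// Using hash sets rather than a fixed-size array also means it would not
// break on automata whose state set grows lazily.
public class CycleCheck {
    public static void checkAcyclic(Map<Integer, List<Integer>> successors) {
        Set<Integer> onPath = new HashSet<>(), done = new HashSet<>();
        for (Integer s : successors.keySet()) dfs(s, successors, onPath, done);
    }

    private static void dfs(Integer s, Map<Integer, List<Integer>> succ,
                            Set<Integer> onPath, Set<Integer> done) {
        if (done.contains(s)) return;  // already fully explored, skip
        if (!onPath.add(s))
            throw new IllegalStateException("automaton is cyclic at state " + s);
        for (Integer t : succ.getOrDefault(s, List.of())) dfs(t, succ, onPath, done);
        onPath.remove(s);
        done.add(s);
    }
}
```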
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
There is currently a menu item Tools -> Compute decomposition automaton in GUIMain. I think this function works 90%, but I seem to remember that there are sometimes bugs. Test it and fix if needed.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
add multi-edge handling to both the SGraph and BoundaryRepresentation classes
Original report by Nikos Engonopoulos (Bitbucket: engonopoulos).
The atomic interpretations field which used to be present on the GUI in the Inputs dialog is now missing. As a result one cannot parse set objects with a grammar for REG (e.g. with reg.irtg in the examples/ directory), because the first order model is always empty.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The method for reading inputs for algebras in the GUI relies on the parseString() method. It would be more flexible if input codecs were used.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
SetAlgebra is just a SubsetAlgebra (with slightly different operations), specialized to relations over a model. Let's refactor it so it just creates a SubsetAlgebra internally and we can avoid code duplication.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When using alto-lab on falken-3 with e.g.:
java -Xmx8G -cp alto-2.1-SNAPSHOT-jar-with-dependencies.jar de.up.ling.irtg.laboratory.CommandLineInterface 64 --data 24 -c "with additional data 24" --reload
sometimes NULL results (e.g. if there is no parse tree for a given input) will cause errors such as:
Exception in thread "ForkJoinPool-2-worker-1" java.lang.NullPointerException
at de.up.ling.irtg.laboratory.JsonResultManager.acceptResult(JsonResultManager.java:92)
at de.up.ling.irtg.laboratory.Program.lambda$run$2(Program.java:781)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
This does not stop the experiment from running, however.
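A minimal illustration of the kind of null-guard that would avoid the NullPointerException (the method and class here are hypothetical; the real site is JsonResultManager.acceptResult): record an explicit "no result" marker instead of dereferencing a null parse result.

```java
// Hypothetical guard: turn a null result (e.g. no parse tree for an input)
// into an explicit marker string rather than letting toString() throw.
public class ResultGuard {
    public static String describe(Object result) {
        return result == null ? "null (no parse)" : result.toString();
    }
}
```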
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Now works for direct loops (q -> {...}(q)), as far as I can tell. I need to come back to this later for general recursion.
The GenericCondensedIntersectionAlgorithm only works if the right (condensed) automaton has no recursion. It will e.g. undergenerate if this automaton has rules of the form "q -> {f}(q)" and this rule is processed before the other rules that can expand q.
Condensed automata arise, for example, in parsing where the IRTG has rules like
A -> r(B)
[i] ?1
where i is an input interpretation. This is not a rare case, so we should figure out how to deal with this correctly.
One situation in which this happens in particular is when we introduce a "super-start-symbol" in order to model a probability distribution over different "real" start symbols. The rules for the super-start-symbol map to ?1 on all interpretations.
Original report by rknaebel (Bitbucket: rknaebel, GitHub: rknaebel).
If one selects a feature-weight pair after training a maximum entropy model
then the selected entry becomes invisible.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
Calling makeAllRulesExplicit processes all rules, but does not reliably store them. In particular, some decomposition automata appear empty in the GUI, even when adding makeAllRulesExplicit to the code.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
isCyclic may revisit states that it has already explored, which makes it very inefficient. It also uses a fixed-size array to keep track of the states seen on the current path down towards a terminal; this array is sized at the start for the states known at that point, which means the code does not work for lazy automata.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The getWeightRaw method creates a pair for every state and weight it finds bottom up. When multiple child-state combinations produce the same parent state, this leads to multiple pairs that could be summed up instead; keeping them separate might lead to exponential behaviour for certain trees and automata.
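The proposed summing can be sketched independently of Alto's types: collapse a list of (parent state, weight) pairs into one summed weight per state, so the number of entries at each node stays bounded by the number of states instead of growing with the number of derivations.

```java
import java.util.*;

// Toy version of the fix: sum weights per parent state instead of keeping
// one pair per derivation.
public class WeightSum {
    public static Map<String, Double> collapse(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> summed = new HashMap<>();
        for (Map.Entry<String, Double> p : pairs)
            summed.merge(p.getKey(), p.getValue(), Double::sum);  // accumulate
        return summed;
    }
}
```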
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
BinarizingAlgebra always creates its own signature, which means there is no way to ensure that the signature is the same as some other signature.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).