akoehn / alto
License: Other
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
When a task in Alto Lab has warmup > 0, it will attempt to run the warmup and the experiment at the same time. This defeats the purpose of warmup, and may mess up time measurements.
Cause of the problem: In the warmup iteration (CommandLineInterface.java:256), the call to program.run() returns before all warmup tasks are completed. In fact, it already returns before all warmup tasks have been submitted to the ForkJoinPool in Program.java. These tasks are still submitted to the pool. It is almost as if program#run were called in its own separate thread, and the task submissions took place in the background. But I can find no place where a new thread is being spawned.
This happens both in verbose mode and non-verbose, so the problem is probably not related to the use of the ConsoleProgressBar.
Workaround: Don't use warmup for now.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
In TreeAutomaton.asConcreteAutomatonBottomUp the two lines:
processAllRulesBottomUp(rule -> ret.addRule(ret.createRule(getStateForId(rule.getParent()), rule.getLabel(this), getStatesFromIds(rule.getChildren()))));
finalStates.stream().forEach(finalState -> ret.addFinalState(finalState));
are wrong. The final states are only added correctly if the new automaton happens to number its states in the same way as the old automaton, which is not guaranteed. Also, the rule weights are not copied.
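A possible shape of the fix, sketched on a toy automaton model (the record and class names here are illustrative, not Alto's actual API): transfer final states via the state objects themselves rather than their numeric ids, and carry the weight along when copying each rule.

```java
import java.util.*;

// Toy stand-in for copying a tree automaton. The two points of the fix:
// (1) rule weights are copied explicitly, (2) final states are transferred
// by state object, so it does not matter how the new automaton numbers them.
public class AutomatonCopy {
    public record Rule(String parent, String label, List<String> children, double weight) {}

    public static class Automaton {
        public final List<Rule> rules = new ArrayList<>();
        public final Set<String> finalStates = new HashSet<>();
    }

    public static Automaton copy(Automaton src) {
        Automaton ret = new Automaton();
        for (Rule r : src.rules) {
            // copy the weight along with parent, label, and children
            ret.rules.add(new Rule(r.parent(), r.label(), r.children(), r.weight()));
        }
        // add final states by state object; an id-based transfer would be wrong
        ret.finalStates.addAll(src.finalStates);
        return ret;
    }
}
```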
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
There is currently a menu item Tools -> Visualize input object in GUIMain. This mostly works, but the context menus in the visualization window (e.g. copy as tikz) don't quite work. Fix it.
Original report by rknaebel (Bitbucket: rknaebel, GitHub: rknaebel).
This is not really a bug, but when Alto learns many weights for a grammar, the terminal reports that Alto is done, yet after training it still needs a long time to write the new weights back into the grammar.
At that point I am often unsure whether Alto has crashed or is just updating. Please either speed up the weight update, or add a progress bar (or something similar) to show that Alto is still processing; the first option is preferred.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When running the test in the attached file, the SiblingFinder-based intersection contains trees that the intersection based on GenericCondensedIntersectionAutomaton does not. Running the test will output one of the trees that are not found. The problem seems to be that loops in the decomposition automaton are not processed correctly.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
The Instance class in the corpus package supports optional comments. These can currently be set programmatically, and will be written into the corpus file correctly. But when reading the corpus, they are simply skipped as comment lines.
Change this so the comments before each instance in the corpus are read into the comment field of the Instance object when reading a corpus.
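A minimal sketch of the desired reading behaviour, using illustrative names rather than Alto's actual corpus API: comment lines are accumulated and attached to the next instance, instead of being skipped.

```java
import java.util.*;

// Hypothetical corpus reader: lines starting with "#" are collected as the
// comment of the following instance rather than being discarded.
public class CommentReader {
    public record Instance(String body, String comment) {}

    public static List<Instance> read(List<String> lines) {
        List<Instance> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("#")) {
                // accumulate comment text for the next instance
                if (pending.length() > 0) pending.append("\n");
                pending.append(line.substring(1).trim());
            } else if (!line.isBlank()) {
                // attach the pending comment to this instance, then reset it
                result.add(new Instance(line, pending.toString()));
                pending.setLength(0);
            }
        }
        return result;
    }
}
```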
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
BinarizingAlgebra uses its own internal signature, which is different from the signature of the super class, but when getSignature is changed to return the local signature then a test in CoarseToFineParserTest fails:
testPtb(de.up.ling.irtg.automata.coarse_to_fine.CoarseToFineParserTest) Time elapsed: 0.025 sec <<< ERROR! java.lang.NegativeArraySizeException
The problem seems to be that a certain grammar can no longer be read in binary format. Returning the correct signature would still be better, in case one needs to work with it.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
This would be really useful in debugging.
Challenge is that the derivation tree may be relative to a chart, and there may be multiple rules in the chart that use the same terminal symbol. Thus we have to recompute or somehow remember the rule tree that gave rise to this derivation tree.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
The methods
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
When training a maxent IRTG from the GUI, the GUI can become unresponsive after training is completed. I have seen this happen when training a Geoquery grammar (340 rules, with a RuleNameFeature for each rule) on a Geoquery training corpus (500 instances).
Furthermore, the progress bar that shows during training only updates occasionally, not fluidly.
I suspect this is because the Swing EDT receives so many events during training that the runnables scheduled via invokeLater in withProgressListener are executed late or not at all.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
We now have a large number of such algorithms implemented in Alto, and users need to be able to use them from the GUI. Perhaps we can add dropdown boxes for selecting them to the "Parse ..." dialogue (where we can enter inputs).
Ideally, the dialogue should specify reasonable defaults for the algorithms.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
I last checked this a few months ago, not sure if it still exists.
The Tree#select method uses single digits to describe paths, and fails if a node has more than 10 children (i.e. the digits 0-9 are all used up). There is at least a partial implementation that circumvents this issue, but it is not the default and may have slower runtime.
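The underlying problem and a hypothetical remedy can be illustrated without Alto's Tree class: a digit string such as "2110" is ambiguous once nodes can have more than 10 children, whereas a delimiter-separated path is not. A sketch, assuming hyphen-separated child indices:

```java
// Hypothetical path encoding: "2-11-0" means child 2, then child 11, then
// child 0. Unlike a bare digit string, indices above 9 are unambiguous.
public class TreePath {
    public static int[] parse(String path) {
        if (path.isEmpty()) return new int[0];  // empty path = root
        String[] parts = path.split("-");
        int[] idx = new int[parts.length];
        for (int i = 0; i < parts.length; i++) idx[i] = Integer.parseInt(parts[i]);
        return idx;
    }
}
```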
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
See e.g. Experiment 775, line 14.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
applyRaw computes the homomorphic images of subtrees that never influence the end result. Not doing this might save computation and also enable us to have partially defined homomorphisms.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
TreeAutomaton.accepts(Tree) relies on run(Tree), which in turn relies on runRaw(Tree), which only works if the automaton supports getRulesBottomUp. In the interest of supporting specialized implementations, the method should check what kinds of queries the automaton actually supports and then use an analogue of run(Tree) that works top-down instead of bottom-up (even if that means we may be unable to exploit bottom-up determinism).
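A top-down analogue of run(Tree) can be sketched on a simplified rule representation (state and label mapped to possible child-state sequences; none of these names are Alto's actual API): a tree is accepted from a state if some expansion of that state matches the root label and each child state accepts the corresponding subtree.

```java
import java.util.*;

// Toy top-down membership check: rules map state -> label -> list of
// possible child-state sequences.
public class TopDownRun {
    public record Tree(String label, List<Tree> children) {}

    public static boolean accepts(Map<String, Map<String, List<List<String>>>> rules,
                                  String state, Tree t) {
        List<List<String>> expansions =
            rules.getOrDefault(state, Map.of()).getOrDefault(t.label(), List.of());
        for (List<String> childStates : expansions) {
            if (childStates.size() != t.children().size()) continue;
            boolean ok = true;
            for (int i = 0; i < childStates.size() && ok; i++)
                ok = accepts(rules, childStates.get(i), t.children().get(i));
            if (ok) return true;  // some expansion derives the whole subtree
        }
        return false;
    }
}
```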
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Parse "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 ." with binarized.irtg from the PTB tutorial and open the language window. This will take several seconds to initialize the SortedLanguageIterator. It derives millions of items, despite the fact that we are only looking for the 1-best item and the parse chart only has 60000 rules. Something's wrong here, we should fix it.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When asked for rules bottom up with a certain label (other than concat), getRulesBottomUp will always return all the rules that read that label, independently of the children that were given. This is not in line with the contract of the method, as I understand it. I currently cannot judge how much damage fixing this would do to the whole system; I will test this later.
As a reminder to myself: hasRuleWithPrefix could also easily be made much more restrictive, without sacrificing efficiency.
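The intended contract can be illustrated with a toy label-indexed rule list (illustrative names, not Alto's implementation): the returned rules must be filtered by the given child states, not selected by label alone.

```java
import java.util.*;

// Toy version of a bottom-up rule query that respects the children argument.
public class BottomUpFilter {
    public record Rule(String parent, String label, List<String> children) {}

    public static List<Rule> rulesBottomUp(Map<String, List<Rule>> byLabel,
                                           String label, List<String> children) {
        List<Rule> out = new ArrayList<>();
        for (Rule r : byLabel.getOrDefault(label, List.of()))
            if (r.children().equals(children))  // do not ignore the children
                out.add(r);
        return out;
    }
}
```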
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The WideStringAlgebra redefines the concatenation symbols as conc+num, but the evaluate(String, List<List>) method is not redefined. This should be remedied in order to make its behaviour consistent with the decomposition automata that are generated.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
In altolab experiments -- e.g. for Task 41, try
alab 41 -Vinvhom='veryLazyInvhom(decomp, irtg.[graph].hom)' -Vintersection='explicitFromVeryLazy(veryLazyIntersection(irtg.auto, invhom))'
-- the warmup and the main experiment instances seem to run in parallel, instead of all warmup first.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Currently, Alto does not work with Java 9 because of the following reflection-related problems:
Ideas for fixing these issues:
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
InverseHomomorphismAutomaton seems to have problems, especially when computing rules top down that have no children on the rhs automaton side. But there needs to be more thorough overall testing of the algorithms in the class.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Right now, it happens immediately when the corpus is loaded from the GUI, and all charts are held in memory. This is infeasible for larger corpora. Instead, we should simply load the unannotated instances and then compute the charts by need.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
If the IRTG used to read a corpus has more interpretations than the corpus declares, then the instances will have input object maps with null values in them.
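A sketch of the guard that would avoid the null values (names are illustrative, not Alto's corpus-reading code): only create input-map entries for interpretations the corpus actually provides.

```java
import java.util.*;

// Hypothetical input-map construction: iterate over the IRTG's
// interpretations, but skip those for which the corpus has no value,
// instead of storing null.
public class InputMapGuard {
    public static Map<String, Object> buildInputs(Set<String> irtgInterpretations,
                                                  Map<String, Object> corpusValues) {
        Map<String, Object> inputs = new HashMap<>();
        for (String interp : irtgInterpretations) {
            Object value = corpusValues.get(interp);
            if (value != null) inputs.put(interp, value);  // no null entries
        }
        return inputs;
    }
}
```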
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
When exporting a CSV for an experiment with NaN entries (e.g. experiment 775), an Internal Server Error is produced and no CSV is exported.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Coarse-to-fine parsing is currently only available through Alto Lab etc. It should also be usable from the GUI. Maybe it can be merged into the "Parse ..." dialog, but there is the added challenge that we have to ask for an extra input file (the fine-to-coarse mapping).
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Right now, evaluateInSemiring will return an (incorrect) value if the tree automaton has cycles. We should check for cycles in TreeAutomaton#getStatesInBottomUpOrder and throw an exception if a cycle occurs.
This has two advantages. First, it ensures that evaluateInSemiring is only used when it returns the correct value. Second, it makes the use of TreeAutomaton#isCyclic in JLanguageViewer unnecessary (the if block can just be replaced by catching the exception). For some reason, this method is incredibly slow for some grammars, and it would be good to get rid of it.
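The proposed check can be sketched on a generic successor-map stand-in for the automaton's state graph (not Alto's actual TreeAutomaton API): a depth-first search that throws as soon as a state on the current path is re-entered, and memoizes finished states so nothing is explored twice.

```java
import java.util.*;

// Toy cycle check: throws instead of silently computing a wrong value.
// Using hash sets rather than a fixed-size array also means it would not
// break on automata whose state set grows lazily.
public class CycleCheck {
    public static void checkAcyclic(Map<Integer, List<Integer>> successors) {
        Set<Integer> onPath = new HashSet<>(), done = new HashSet<>();
        for (Integer s : successors.keySet()) dfs(s, successors, onPath, done);
    }

    private static void dfs(Integer s, Map<Integer, List<Integer>> succ,
                            Set<Integer> onPath, Set<Integer> done) {
        if (done.contains(s)) return;  // already fully explored, skip
        if (!onPath.add(s))
            throw new IllegalStateException("automaton is cyclic at state " + s);
        for (Integer t : succ.getOrDefault(s, List.of())) dfs(t, succ, onPath, done);
        onPath.remove(s);
        done.add(s);
    }
}
```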
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
There is currently a menu item Tools -> Compute decomposition automaton in GUIMain. I think this function works 90%, but I seem to remember that there are sometimes bugs. Test it and fix if needed.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
add multi-edge handling to both the SGraph and BoundaryRepresentation classes
Original report by Nikos Engonopoulos (Bitbucket: engonopoulos).
The atomic interpretations field which used to be present on the GUI in the Inputs dialog is now missing. As a result one cannot parse set objects with a grammar for REG (e.g. with reg.irtg in the examples/ directory), because the first order model is always empty.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The method for reading inputs for algebras in the GUI relies on the parseString() method. It would be more flexible if input codecs were used.
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
SetAlgebra is just a SubsetAlgebra (with slightly different operations), specialized to relations over a model. Let's refactor it so it just creates a SubsetAlgebra internally and we can avoid code duplication.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
When using alto-lab on falken-3 with e.g.:
java -Xmx8G -cp alto-2.1-SNAPSHOT-jar-with-dependencies.jar de.up.ling.irtg.laboratory.CommandLineInterface 64 --data 24 -c "with additional data 24" --reload
sometimes NULL results (e.g. if there is no parse tree for a given input) will cause errors such as:
Exception in thread "ForkJoinPool-2-worker-1" java.lang.NullPointerException
at de.up.ling.irtg.laboratory.JsonResultManager.acceptResult(JsonResultManager.java:92)
at de.up.ling.irtg.laboratory.Program.lambda$run$2(Program.java:781)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
This does not stop the experiment from running, however.
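A minimal illustration of the kind of null-guard that would avoid the NullPointerException (the method and class here are hypothetical; the real site is JsonResultManager.acceptResult): record an explicit "no result" marker instead of dereferencing a null parse result.

```java
// Hypothetical guard: turn a null result (e.g. no parse tree for an input)
// into an explicit marker string rather than letting toString() throw.
public class ResultGuard {
    public static String describe(Object result) {
        return result == null ? "null (no parse)" : result.toString();
    }
}
```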
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Now works for direct loops (q -> {...}(q)), as far as I can tell. I need to come back to this later for general recursion.
The GenericCondensedIntersectionAlgorithm only works if the right (condensed) automaton has no recursion. It will e.g. undergenerate if this automaton has rules of the form "q -> {f}(q)" and this rule is processed before the other rules that can expand q.
Condensed automata arise, for example, in parsing where the IRTG has rules like
A -> r(B)
[i] ?1
where i is an input interpretation. This is not a rare case, so we should figure out how to deal with this correctly.
One situation in which this happens in particular is when we introduce a "super-start-symbol" in order to model a probability distribution over different "real" start symbols. The rules for the super-start-symbol map to ?1 on all interpretations.
Original report by rknaebel (Bitbucket: rknaebel, GitHub: rknaebel).
If one selects a feature-weight pair after training a maximum entropy model
then the selected entry becomes invisible.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).
Calling makeAllRulesExplicit processes all rules, but does not reliably store them. In particular, some decomposition automata appear empty in the GUI, even when adding makeAllRulesExplicit to the code.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
isCyclic may revisit states that it has already explored, which makes it very inefficient. It also uses a fixed-size array to keep track of the states seen on the current path down towards a terminal; this array is sized at the start for the states known at that point, which means the code does not work for lazy automata.
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
The getWeightRaw method creates a pair for every state and weight it finds bottom up. When multiple child-state combinations produce the same parent state, this leads to multiple pairs that could be summed up instead; keeping them separate might lead to exponential behaviour for certain trees and automata.
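The proposed summing can be sketched independently of Alto's types: collapse a list of (parent state, weight) pairs into one summed weight per state, so the number of entries at each node stays bounded by the number of states instead of growing with the number of derivations.

```java
import java.util.*;

// Toy version of the fix: sum weights per parent state instead of keeping
// one pair per derivation.
public class WeightSum {
    public static Map<String, Double> collapse(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> summed = new HashMap<>();
        for (Map.Entry<String, Double> p : pairs)
            summed.merge(p.getKey(), p.getValue(), Double::sum);  // accumulate
        return summed;
    }
}
```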
Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).
Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).
BinarizingAlgebra always creates its own signature, which means there is no way to ensure that the signature is the same as some other signature.
Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).