
alto's People

Contributors

akoehn, alexanderkoller, ctnlp, jgroschwitz

alto's Issues

Warmup in Alto Lab

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


When a task in Alto Lab has warmup > 0, Alto Lab attempts to run the warmup and the experiment at the same time. This defeats the purpose of warmup and may distort the time measurements.

Cause of the problem: In the warmup iteration (CommandLineInterface.java:256), the call to program.run() returns before all warmup tasks are completed. In fact, it already returns before all warmup tasks have been submitted to the ForkJoinPool in Program.java. These tasks are still submitted to the pool. It is almost as if program#run were called in its own separate thread, and the task submissions took place in the background. But I can find no place where a new thread is being spawned.

This happens both in verbose mode and non-verbose, so the problem is probably not related to the use of the ConsoleProgressBar.

Workaround: Don't use warmup for now.
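A minimal sketch of the behavior the warmup phase needs, assuming the warmup work is a list of Runnables submitted to a ForkJoinPool (WarmupSketch and runAndWait are hypothetical names, not Alto's Program API): keep the ForkJoinTask handles that submit returns and join each one, so the caller cannot start the timed iterations until every warmup task has finished. This does not explain why run() currently returns early; it only shows what waiting for completion looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

class WarmupSketch {
    /**
     * Submits all tasks to the pool and returns only after every one of them
     * has completed, so a subsequent timed run cannot overlap with them.
     */
    static void runAndWait(ForkJoinPool pool, List<Runnable> tasks) {
        List<ForkJoinTask<?>> handles = new ArrayList<>();
        for (Runnable task : tasks) {
            handles.add(pool.submit(task));
        }
        // join() blocks until the task is done and rethrows any exception it threw.
        for (ForkJoinTask<?> handle : handles) {
            handle.join();
        }
    }
}
```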

asConcreteAutomatonBottomUp is broken

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


In TreeAutomaton.asConcreteAutomatonBottomUp the two lines:

processAllRulesBottomUp(rule -> ret.addRule(ret.createRule(getStateForId(rule.getParent()), rule.getLabel(this), getStatesFromIds(rule.getChildren()))));
finalStates.stream().forEach(finalState -> ret.addFinalState(finalState));

are wrong. The final states are only added correctly if the new automaton numbers the states in the same way as the old automaton (which is not guaranteed). For the rules, the weight is not copied.
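The fix amounts to translating every state through its state object rather than reusing the old automaton's integer IDs, and passing the rule weight along. The sketch below uses a hypothetical ToyAutomaton with String states, not Alto's TreeAutomaton or ConcreteTreeAutomaton. In it, the copy numbers its states differently from the original, so copying final-state IDs directly would mark the wrong state as final.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical minimal automaton that interns states to automaton-local integer IDs. */
class ToyAutomaton {
    static final class Rule {
        final int parent;
        final String label;
        final int[] children;
        final double weight;

        Rule(int parent, String label, int[] children, double weight) {
            this.parent = parent;
            this.label = label;
            this.children = children;
            this.weight = weight;
        }
    }

    final List<String> stateForId = new ArrayList<>();
    final Map<String, Integer> idForState = new HashMap<>();
    final Set<Integer> finalStates = new HashSet<>();
    final List<Rule> rules = new ArrayList<>();

    int addState(String state) {
        Integer id = idForState.get(state);
        if (id == null) {
            id = stateForId.size();
            stateForId.add(state);
            idForState.put(state, id);
        }
        return id;
    }

    void addRule(String parent, String label, List<String> children, double weight) {
        int[] childIds = children.stream().mapToInt(this::addState).toArray();
        rules.add(new Rule(addState(parent), label, childIds, weight));
    }

    void addFinalState(String state) {
        finalStates.add(addState(state));
    }

    /** Copies the automaton, translating IDs via state objects and keeping rule weights. */
    ToyAutomaton copy() {
        ToyAutomaton ret = new ToyAutomaton();
        for (Rule r : rules) {
            List<String> children = new ArrayList<>();
            for (int c : r.children) {
                children.add(stateForId.get(c));
            }
            ret.addRule(stateForId.get(r.parent), r.label, children, r.weight);  // weight is copied
        }
        for (int f : finalStates) {
            ret.addFinalState(stateForId.get(f));  // translated via the state object, not the old ID
        }
        return ret;
    }
}
```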

long waiting time for weight updates

Original report by rknaebel (Bitbucket: rknaebel, GitHub: rknaebel).


This is not really a bug, but when Alto learns many weights for a grammar, the terminal often reports that training is done, and Alto then needs a long time to update the grammar with the new weights.
At this point, I'm often not sure whether Alto has crashed or is still updating. Please reduce the time Alto needs to update the weights, or add a progress bar or some other indication that Alto is still working. I would prefer the first option.

GenericCondensedIntersectionAutomaton problems with loop in decomposition automaton

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


When running the test in the attached file, there are trees that the SiblingFinder-based intersection contains but the intersection based on GenericCondensedIntersectionAutomaton does not. Running the test will output one of the trees that are not found. The problem seems to be that loops in the decomposition automaton are not processed correctly.

Read instance comments

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


The Instance class in the corpus package supports optional comments. These can currently be set programmatically, and will be written into the corpus file correctly. But when reading the corpus, they are simply skipped as comment lines.

Change this so the comments before each instance in the corpus are read into the comment field of the Instance object when reading a corpus.
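A sketch of the requested reading behavior, under simplifying assumptions: comment lines start with `#`, each instance consists of a fixed number of consecutive non-comment lines, and Instance is a hypothetical stand-in for the corpus package's class. Comments are buffered and attached to the next instance instead of being skipped. Any file header lines that also use the comment prefix would have to be excluded separately.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentedCorpusReader {
    /** Hypothetical stand-in for the corpus Instance class. */
    static final class Instance {
        final List<String> lines = new ArrayList<>();     // one line per interpretation
        final List<String> comments = new ArrayList<>();  // comment lines preceding the instance
    }

    /**
     * Reads instances of linesPerInstance non-comment lines each. Comment lines
     * (assumed to start with "#") are attached to the next instance instead of
     * being skipped. An incomplete instance at the end of the input is ignored.
     */
    static List<Instance> read(Reader in, int linesPerInstance) {
        List<Instance> corpus = new ArrayList<>();
        List<String> pendingComments = new ArrayList<>();
        Instance current = null;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;                                  // blank lines carry no content
                }
                if (line.startsWith("#")) {
                    pendingComments.add(line.substring(1).trim());
                    continue;
                }
                if (current == null) {                         // first line of a new instance
                    current = new Instance();
                    current.comments.addAll(pendingComments);
                    pendingComments.clear();
                }
                current.lines.add(line);
                if (current.lines.size() == linesPerInstance) {
                    corpus.add(current);
                    current = null;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return corpus;
    }
}
```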

BinarizingAlgebra returns wrong signature

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


BinarizingAlgebra uses its own internal signature, which differs from the signature of its superclass. However, when getSignature is changed to return the local signature, a test in CoarseToFineParserTest fails:

testPtb(de.up.ling.irtg.automata.coarse_to_fine.CoarseToFineParserTest) Time elapsed: 0.025 sec <<< ERROR! java.lang.NegativeArraySizeException

The problem seems to be that a certain grammar can no longer be read in binary format. Returning the correct signature would be better, in case one needs to work with it.

unify ckyDfsInBottomUpOrder methods

Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).


The methods

  • GenericCondensedIntersectionAutomaton.ckyDfsForStatesInBottomUpOrder,
  • NonCondensedIntersectionAutomaton.ckyDfsForStatesInBottomUpOrder,
  • CondensedCoarsestParser.ckyDfsForStatesInBottomUpOrder, and
  • TreeAutomaton.ckyDfsInBottomUpOrder

seem to do mostly the same thing. Can we write one more abstract version to unify them all?
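One possible shape for a unified method, sketched under the assumption that the four variants differ mainly in how they enumerate the child states of a state: a generic post-order DFS that takes that enumeration as a function. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class BottomUpOrder {
    /**
     * Returns all states reachable from the roots in bottom-up order: every state
     * appears after all the states it is built from. The only automaton-specific
     * part is childrenOf (the child states of all rules with a given parent).
     */
    static <S> List<S> inBottomUpOrder(Collection<S> roots, Function<S, ? extends Iterable<S>> childrenOf) {
        List<S> order = new ArrayList<>();
        Set<S> visited = new HashSet<>();
        for (S root : roots) {
            visit(root, childrenOf, visited, order);
        }
        return order;
    }

    private static <S> void visit(S state, Function<S, ? extends Iterable<S>> childrenOf,
                                  Set<S> visited, List<S> order) {
        if (!visited.add(state)) {
            return;                      // already handled (or on the current path, if cyclic)
        }
        for (S child : childrenOf.apply(state)) {
            visit(child, childrenOf, visited, order);
        }
        order.add(state);                // post-order: children are emitted first
    }
}
```

Each specialized class would then only supply `childrenOf`. The recursion depth grows with the height of the automaton, and on a cyclic automaton the DFS still terminates, but the result is not a valid bottom-up order.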

Maxent training hangs GUI

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


When training a maxent IRTG from the GUI, the GUI can become unresponsive after training is completed. I have seen this happen when training a Geoquery grammar (340 rules, with a RuleNameFeature for each rule) on a Geoquery training corpus (500 instances).

Furthermore, the progress bar that shows during training only updates occasionally, not fluidly.

I suspect this is because, during training, the Swing EDT receives a flood of events from the invokeLater call in withProgressListener, which it never manages to execute.
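If this suspicion is right, a common mitigation is to coalesce progress events so that at most one update is pending on the event dispatch thread at any time and intermediate values are dropped. In the sketch below, the UI executor is passed in (in Swing, `SwingUtilities::invokeLater`) so that the logic can be tested without a display. CoalescingProgress is a hypothetical name, and the sketch has not been tried against withProgressListener.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/**
 * Coalesces progress reports from a worker thread: at most one update is queued
 * on the UI thread at a time, and it always displays the most recent value.
 */
class CoalescingProgress {
    private final Executor uiExecutor;   // e.g. SwingUtilities::invokeLater
    private final LongConsumer display;  // runs on the UI thread
    private final AtomicLong latest = new AtomicLong();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    CoalescingProgress(Executor uiExecutor, LongConsumer display) {
        this.uiExecutor = uiExecutor;
        this.display = display;
    }

    /** Called from the worker thread; cheap even when called very often. */
    void report(long value) {
        latest.set(value);
        if (scheduled.compareAndSet(false, true)) {
            uiExecutor.execute(() -> {
                scheduled.set(false);    // clear first, so later reports schedule a new update
                display.accept(latest.get());
            });
        }
    }
}
```

A progress bar would be wired up as `new CoalescingProgress(SwingUtilities::invokeLater, v -> bar.setValue((int) v))`, where `bar` stands for whatever component displays the progress.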

Select intersection and invhom algorithms in GUI

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


We now have a large number of such algorithms implemented in Alto, and users need to be able to use them from the GUI. Perhaps we can add dropdown boxes for selecting them to the "Parse ..." dialogue (where we can enter inputs).

Ideally, the dialogue should specify reasonable defaults for the algorithms.

fix Tree#select for nodes with more than 10 children

Original report by Jonas Groschwitz (Bitbucket: jgroschwitz, GitHub: jgroschwitz).


I last checked this a few months ago, not sure if it still exists.

The Tree#select method uses single digits to describe paths and therefore fails if a node has more than 10 children (i.e., once the digits 0-9 are used up). There is at least a partial implementation that circumvents this issue, but it is not the default and may be slower.
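A sketch of a path format without the ten-children limit, assuming paths are written as dot-separated child indices such as `11.0.3`. PathTree is a hypothetical minimal tree class, not Alto's Tree, and this is not necessarily how the existing partial implementation works.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal tree; not Alto's Tree class. */
class PathTree {
    final String label;
    final List<PathTree> children;

    PathTree(String label, PathTree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /**
     * Selects the subtree at a path such as "11.0.3". Child indices are separated
     * by dots, so multi-digit indices are unambiguous; the empty path is the root.
     */
    PathTree select(String path) {
        PathTree node = this;
        if (path.isEmpty()) {
            return node;
        }
        for (String step : path.split("\\.")) {
            node = node.children.get(Integer.parseInt(step));  // throws if out of range
        }
        return node;
    }
}
```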

Accepts works only for Automata that support getRulesBottomUp

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


TreeAutomaton.accepts(Tree) relies on run(Tree), which in turn relies on runRaw(Tree), which only works if the automaton supports getRulesBottomUp. In the interest of supporting specialized implementations, the method should check which kinds of queries the automaton actually supports and, if necessary, use an analogue of run(Tree) that works top-down instead of bottom-up (even if that means we may be unable to exploit bottom-up determinism).
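A sketch of such a top-down analogue, assuming an automaton that can only enumerate rules for a given parent state and label; TopDownAutomaton and its methods are hypothetical. A tree is accepted if some final state derives it, and a state derives a tree if some rule for that state and the root label has child states that derive the subtrees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical automaton that can only enumerate its rules top-down. */
class TopDownAutomaton {
    /** Minimal labeled tree. */
    static final class Tree {
        final String label;
        final List<Tree> children;

        Tree(String label, Tree... children) {
            this.label = label;
            this.children = List.of(children);
        }
    }

    // parent state -> label -> alternative sequences of child states
    private final Map<String, Map<String, List<List<String>>>> rulesTopDown = new HashMap<>();
    final Set<String> finalStates = new HashSet<>();

    void addRule(String parent, String label, String... children) {
        rulesTopDown.computeIfAbsent(parent, k -> new HashMap<>())
                    .computeIfAbsent(label, k -> new ArrayList<>())
                    .add(List.of(children));
    }

    /** Top-down analogue of run(Tree): can the given state derive the tree? */
    boolean derives(String state, Tree tree) {
        List<List<String>> alternatives = rulesTopDown
                .getOrDefault(state, Map.of())
                .getOrDefault(tree.label, List.of());
        for (List<String> childStates : alternatives) {
            if (childStates.size() != tree.children.size()) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < childStates.size() && ok; i++) {
                ok = derives(childStates.get(i), tree.children.get(i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    boolean accepts(Tree tree) {
        for (String q : finalStates) {
            if (derives(q, tree)) {
                return true;
            }
        }
        return false;
    }
}
```

Without memoization over (state, subtree) pairs, a highly nondeterministic automaton can make this recursion expensive.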

SortedLanguageIterator computes MANY items

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


Parse "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 ." with binarized.irtg from the PTB tutorial and open the language window. Initializing the SortedLanguageIterator then takes several seconds: it derives millions of items, even though we are only looking for the 1-best item and the parse chart has only 60000 rules. Something is wrong here, and we should fix it.

getRulesBottomUp in StringAlgebra.CkyAutomaton not restrictive enough

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


When asked for rules bottom-up with a certain label (other than concat), getRulesBottomUp always returns all the rules that read that label, independent of the children that were given. As I understand it, this is not in line with the contract of the method. I cannot currently judge how much damage fixing this would do to the rest of the system; I will test this later.

As a reminder to myself: hasRuleWithPrefix could also easily be made much more restrictive, without sacrificing efficiency.
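A sketch of what the stricter contract could look like for a CKY-style string automaton whose states are spans of the input. The concatenation label is taken to be `*` here, and CkySketch is hypothetical. A word label yields rules only when no children are given, and concatenation only combines adjacent spans.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CKY-style string automaton over the spans of a fixed input. */
class CkySketch {
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Span && ((Span) o).start == start && ((Span) o).end == end;
        }

        @Override public int hashCode() { return 31 * start + end; }

        @Override public String toString() { return "[" + start + "," + end + "]"; }
    }

    private final List<String> words;

    CkySketch(List<String> words) {
        this.words = words;
    }

    /** Parent states of all rules with this label whose children are exactly the given states. */
    List<Span> getRulesBottomUp(String label, List<Span> children) {
        List<Span> parents = new ArrayList<>();
        if (label.equals("*")) {
            // binary concatenation: only adjacent spans combine
            if (children.size() == 2 && children.get(0).end == children.get(1).start) {
                parents.add(new Span(children.get(0).start, children.get(1).end));
            }
        } else if (children.isEmpty()) {
            // words are constants: with any non-empty child list there is no matching rule
            for (int i = 0; i < words.size(); i++) {
                if (words.get(i).equals(label)) {
                    parents.add(new Span(i, i + 1));
                }
            }
        }
        return parents;
    }
}
```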

Reflection issues with Java 9

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


Currently, Alto does not work with Java 9 because of the following reflection-related problems:

  • org.simplericity.macify triggers an error "Illegal reflective access by org.simplericity.macify.eawt.DefaultApplication (file:/private/tmp/alto/target/alto-2.2-SNAPSHOT-jar-with-dependencies.jar) to method com.apple.eawt.Application.getApplication()". There does not seem to be a more recent version of macify that fixes this.
  • Alto Lab doesn't find static methods in Program#getAllAnnotatedStaticMethods, probably because no classes are found in Program#getAllClassesIn_irtg_Classpath (but this needs to be double-checked).

Ideas for fixing these issues:

  • Replace macify with something that does the same thing, but works with Java 9.
  • Replace Alto Lab's class-finding mechanism with a class-finding mechanism like in InputCodec and OutputCodec, where the classes that can be used from Alto Lab tasks must be explicitly declared in a file.
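A sketch of the second idea, assuming the declaration file simply lists fully qualified class names and that callable methods carry a marker annotation (`LabCallable`, `DeclaredClassFinder`, and `ExampleLabFunctions` are hypothetical names, not existing Alto classes). Because only the declared classes are loaded by name, nothing depends on enumerating the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical marker for static methods that Alto Lab tasks may call. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface LabCallable {}

class DeclaredClassFinder {
    /**
     * Loads only the explicitly declared classes (e.g. one fully qualified name
     * per line of a declaration file) and returns their annotated static methods by name.
     */
    static Map<String, Method> findLabMethods(List<String> declaredClassNames, ClassLoader loader) {
        Map<String, Method> methods = new TreeMap<>();
        for (String name : declaredClassNames) {
            Class<?> cls;
            try {
                cls = Class.forName(name.trim(), false, loader);  // load without initializing
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Declared class not found: " + name, e);
            }
            for (Method m : cls.getDeclaredMethods()) {
                if (Modifier.isStatic(m.getModifiers()) && m.isAnnotationPresent(LabCallable.class)) {
                    methods.put(m.getName(), m);  // overloads with the same name overwrite each other
                }
            }
        }
        return methods;
    }
}

/** Example of a declared class. */
class ExampleLabFunctions {
    @LabCallable
    static int twice(int x) { return 2 * x; }

    static int notCallable(int x) { return x; }  // not annotated, so not returned
}
```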

Make CTF parsing available in GUI

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


Coarse-to-fine parsing is currently only available through Alto Lab etc. It should also be usable from the GUI. Maybe it can be merged into the "Parse ..." dialog, but there is the added challenge that we have to ask for an extra input file (the fine-to-coarse mapping).

Check for cyclicity as part of TreeAutomaton#evaluateInSemiring

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


Right now, evaluateInSemiring will return an (incorrect) value if the tree automaton has cycles. We should check for cycles in TreeAutomaton#getStatesInBottomUpOrder and throw an exception if a cycle occurs.

This has two advantages. First, it ensures that evaluateInSemiring is only used when it returns the correct value. Second, it makes the use of TreeAutomaton#isCyclic in JLanguageViewer unnecessary (the if block can just be replaced by catching the exception). For some reason, this method is incredibly slow for some grammars, and it would be good to get rid of it.
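A sketch of the proposed check, using Kahn's algorithm: a state is emitted only once the child states of all its rules have been emitted. Any states left over are on a cycle or depend on one, so an exception is thrown instead of returning a partial order. The map-based input and all names are assumptions, not the signature of TreeAutomaton#getStatesInBottomUpOrder.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BottomUpOrderWithCycleCheck {
    /** Thrown when the automaton has a cycle, so no bottom-up order exists. */
    static class CyclicAutomatonException extends RuntimeException {
        CyclicAutomatonException(String message) { super(message); }
    }

    /** childrenOf maps each state to the child states of all rules with that state as parent. */
    static <S> List<S> statesInBottomUpOrder(Map<S, ? extends Collection<S>> childrenOf) {
        Set<S> states = new HashSet<>(childrenOf.keySet());
        childrenOf.values().forEach(states::addAll);

        Map<S, Integer> pending = new HashMap<>();  // child occurrences not yet emitted
        Map<S, List<S>> parentsOf = new HashMap<>();
        for (S s : states) {
            pending.put(s, 0);
        }
        for (Map.Entry<S, ? extends Collection<S>> e : childrenOf.entrySet()) {
            for (S child : e.getValue()) {
                pending.merge(e.getKey(), 1, Integer::sum);
                parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        Deque<S> ready = new ArrayDeque<>();
        pending.forEach((s, n) -> { if (n == 0) ready.add(s); });
        List<S> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            S s = ready.poll();
            order.add(s);
            for (S parent : parentsOf.getOrDefault(s, List.of())) {
                if (pending.merge(parent, -1, Integer::sum) == 0) {
                    ready.add(parent);  // all children of this parent are now emitted
                }
            }
        }
        if (order.size() < states.size()) {
            throw new CyclicAutomatonException((states.size() - order.size())
                    + " states cannot be ordered bottom-up because of a cycle");
        }
        return order;
    }
}
```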

Merge SetAlgebra into SubsetAlgebra

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


SetAlgebra is just a SubsetAlgebra (with slightly different operations), specialized to relations over a model. Let's refactor it so it just creates a SubsetAlgebra internally and we can avoid code duplication.

Alto Lab Handling of null results

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


When using alto-lab on falken-3 with e.g.:

java -Xmx8G -cp alto-2.1-SNAPSHOT-jar-with-dependencies.jar de.up.ling.irtg.laboratory.CommandLineInterface 64 --data 24 -c "with additional data 24" --reload

sometimes NULL results (e.g. if there is no parse tree for a given input) will cause errors such as:

Exception in thread "ForkJoinPool-2-worker-1" java.lang.NullPointerException
at de.up.ling.irtg.laboratory.JsonResultManager.acceptResult(JsonResultManager.java:92)
at de.up.ling.irtg.laboratory.Program.lambda$run$2(Program.java:781)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

This does not, however, stop the experiment from running.
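A minimal sketch of a null-safe conversion, assuming results end up as JSON values. The class and method names are hypothetical, and what JsonResultManager.acceptResult does at line 92 is not known here. A missing result is written as JSON `null` instead of being dereferenced.

```java
class NullSafeResults {
    /**
     * Converts a task result to a JSON value. A null result (e.g. no parse tree
     * was found for the instance) becomes JSON null instead of causing an NPE.
     */
    static String toJsonValue(Object result) {
        if (result == null) {
            return "null";
        }
        if (result instanceof Number) {
            double d = ((Number) result).doubleValue();
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                return "null";  // JSON has no NaN or Infinity
            }
            return result.toString();
        }
        if (result instanceof Boolean) {
            return result.toString();
        }
        return "\"" + escape(result.toString()) + "\"";
    }

    /** Escapes the characters that may not appear unescaped in a JSON string. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```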

Condensed intersection for recursive decomposition grammars

Original report by Alexander Koller (Bitbucket: akoller, GitHub: akoller).


Now works for direct loops (q -> {...}(q)), as far as I can tell. I need to come back to this later for general recursion.


The GenericCondensedIntersectionAlgorithm only works if the right (condensed) automaton has no recursion. It will e.g. undergenerate if this automaton has rules of the form "q -> {f}(q)" and this rule is processed before the other rules that can expand q.

Condensed automata arise, for example, in parsing where the IRTG has rules like

A -> r(B)
[i] ?1

where i is an input interpretation. This is not a rare case, so we should figure out how to deal with this correctly.

One situation in which this happens in particular is if we introduce a "super-startsymbol" in order to model a probability distribution over different "real" start symbols. The rules for the super-start-symbol map to ?1 on all interpretations.
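A toy model of the ordering problem, not of GenericCondensedIntersectionAutomaton itself (single labels instead of label sets; all names hypothetical). One sweep over the right automaton's rules misses pairs when the recursive rule `q -> f(q)` comes before `q -> a`, while repeating the sweep until nothing changes finds them. An agenda that re-examines only the rules affected by a newly derived pair would reach the same result with less work.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RecursiveIntersectionSketch {
    static final class Rule {
        final String parent;
        final String label;
        final List<String> children;

        Rule(String parent, String label, String... children) {
            this.parent = parent;
            this.label = label;
            this.children = List.of(children);
        }
    }

    /** One sweep over the right rules, in order; returns true if a new pair was derived. */
    private static boolean sweep(List<Rule> left, List<Rule> right, Set<List<String>> pairs) {
        boolean changed = false;
        for (Rule r : right) {
            for (Rule l : left) {
                if (!l.label.equals(r.label) || l.children.size() != r.children.size()) {
                    continue;
                }
                boolean childrenDerived = true;
                for (int i = 0; i < r.children.size(); i++) {
                    childrenDerived &= pairs.contains(List.of(l.children.get(i), r.children.get(i)));
                }
                if (childrenDerived && pairs.add(List.of(l.parent, r.parent))) {
                    changed = true;
                }
            }
        }
        return changed;
    }

    /** A single sweep: undergenerates if q -> f(q) is seen before the rules that expand q. */
    static Set<List<String>> onePass(List<Rule> left, List<Rule> right) {
        Set<List<String>> pairs = new HashSet<>();
        sweep(left, right, pairs);
        return pairs;
    }

    /** Repeats sweeps until nothing changes, so recursive rules see all derived children. */
    static Set<List<String>> untilFixpoint(List<Rule> left, List<Rule> right) {
        Set<List<String>> pairs = new HashSet<>();
        while (sweep(left, right, pairs)) {
            // keep sweeping until no new pair is derived
        }
        return pairs;
    }
}
```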

isCyclic is very inefficient and only works correctly when applied to concrete automata

Original report by Christoph Teichmann (Bitbucket: cteichmann, GitHub: cteichmann).


isCyclic may revisit states that it has already explored, which makes it very inefficient. It also tracks the states seen on the current path down towards a terminal in a fixed-size array, which is allocated at the beginning for the number of states known at that point; this means that the code does not work for lazy automata.
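A sketch of a cycle check that avoids both problems: a three-color DFS in which a fully explored state is never expanded again, with colors stored in a hash map that grows as states are discovered, so no array has to be sized up front. The successor function and all names are assumptions.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CycleCheck {
    private enum Color { ON_PATH, DONE }

    /** Returns true iff some state reachable from the roots can reach itself. */
    static <S> boolean isCyclic(Collection<S> roots, Function<S, ? extends Iterable<S>> childrenOf) {
        Map<S, Color> color = new HashMap<>();  // absent = not yet visited
        for (S root : roots) {
            if (visit(root, childrenOf, color)) {
                return true;
            }
        }
        return false;
    }

    private static <S> boolean visit(S state, Function<S, ? extends Iterable<S>> childrenOf, Map<S, Color> color) {
        Color c = color.get(state);
        if (c == Color.ON_PATH) {
            return true;                        // back edge: the state is on the current path
        }
        if (c == Color.DONE) {
            return false;                       // fully explored earlier, no cycle below it
        }
        color.put(state, Color.ON_PATH);
        for (S child : childrenOf.apply(state)) {
            if (visit(child, childrenOf, color)) {
                return true;
            }
        }
        color.put(state, Color.DONE);
        return false;
    }
}
```

The recursion depth is bounded by the longest path in the automaton; very deep automata would need an explicit stack.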
