Pros
- some initial progress
- semantically interesting topic (people like movies)
Cons
- Graph sparsity
Ice's dataset (paper)
Pros
- the data is there
Cons
- too small: 20k questions
- IR-heavy task where simple algos can do very well
- small splash
WebQueryTable + [turk/generated] questions (paper)
Pros
- 200k tables to generate questions from
Cons
- IR-heavy
- many tables may not be interesting
- small splash
Pros
- dense graph
- open domain
- would make a big splash
Cons
- encoding a subset of the KB as a graph for traversal/generation would take significant effort
- IR-heavy: scaling to massive KBs requires significant algorithmic innovation, which might detract from the reasoning focus
Pros
- encoding is easy (not as much of an IR focus)
- would make a big splash ("AI solves the GRE!")
- encoding text is more fun than encoding graphs/KBs (NLP FTW!)
Cons
- where to get stories? Reddit? generate them? have Turkers write them? use other short-story datasets?
- how to get questions? extract a graph from the story? Turkers (how to encourage compositionality)?
- you start from a SQuAD paragraph, a ROC Stories story, or a photo (we can try out several options).
- you extract a small structured representation (a graph) from it, using CoreNLP tools.
- two options: (a) somehow (not sure how yet) you prime the Turkers to ask compositional questions, or (b) you use the small graph to auto-generate compositional questions, as we planned to do with WikiMovies
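Option (b) can be sketched as follows. This is a minimal toy illustration, not the planned implementation: the triple format, relations, and question templates are all hypothetical stand-ins for whatever a CoreNLP-style extraction would actually produce; the core idea is just chaining two triples that share an entity into a 2-hop question.

```python
# Toy graph: (subject, relation, object) triples, as an extraction
# pipeline might produce from a short story. All names hypothetical.
TRIPLES = [
    ("Alice", "sister_of", "Bob"),
    ("Bob", "works_at", "the bakery"),
    ("the bakery", "located_in", "Springfield"),
]

# Hypothetical natural-language templates, one per relation.
TEMPLATES = {
    "sister_of": "the sister of {}",
    "works_at": "the workplace of {}",
    "located_in": "the location of {}",
}

def compose_questions(triples, templates):
    """Chain two triples that share an entity into a 2-hop question."""
    questions = []
    for s1, r1, o1 in triples:
        for s2, r2, o2 in triples:
            if s2 == o1:  # second hop starts where the first ends
                inner = templates[r1].format(s1)   # e.g. "the sister of Alice"
                q = "What is " + templates[r2].format(inner) + "?"
                questions.append((q, o2))          # answer is the 2-hop object
    return questions

for q, a in compose_questions(TRIPLES, TEMPLATES):
    print(q, "->", a)
# e.g. "What is the workplace of the sister of Alice?" -> "the bakery"
```

Deeper compositions (3-hop and beyond) would follow by recursing on the same chaining step; the hard part flagged above (whether such template questions sound natural enough, or Turkers are still needed to rephrase them) is untouched by this sketch.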
Pros
- lots of small graphs
Cons
- lots of unknowns: what works best?
- how good are the predicted graphs?
- how to prime Turkers for compositionality?