allenai / allennlp-demo
Code for the AllenNLP demo.
Home Page: https://demo.allennlp.org
License: Apache License 2.0
From @maartensap: it would be better not to have to click through the 10 generated annotations, and instead show them all at once, e.g. in a box or a bulleted list.
See the dependency parser with "He ate spaghetti with meatballs". http://staging.allennlp.org/dependency-parsing/MjU5Njg0
For example, http://demo.allennlp.org/event2mind works fine but http://demo.allennlp.org/event2mind/ 404's. I frequently share URLs in the latter format because I copy and paste a permalinked URL and then manually remove the suffix. Ideally either format would work.
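One way to make both URL forms work is to normalize the request path before routing; in Flask, registering routes with strict_slashes=False achieves the same effect. A minimal sketch of the idea, with a hypothetical route table:

```python
ROUTES = {"/event2mind": "event2mind demo", "/": "home"}

def normalize_path(path):
    """Collapse a trailing slash so /event2mind/ and /event2mind
    map to the same route key (the root path "/" is left alone)."""
    return path if path == "/" else path.rstrip("/")

def resolve(path):
    """Return the handler for a path, or None (i.e. a 404) if unknown."""
    return ROUTES.get(normalize_path(path))
```

With this in place, pasting a permalink URL and trimming the suffix yields a working link whether or not the trailing slash survives the edit.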
In the coreference example, sometimes the color squares do not fill the entire box.
This task is based on the recent aspirational design for the new Attention Visualization:
Sub-tasks:
- <Pane /> (#88)
- <Drawer /> component
- <DrawerTrigger /> component that is anchored to the bottom of the output pane
- <HeatMap /> component tree as data object
- Remove <Collapsible /> component and NPM dependency from demo

Re: Item 4, the "Model Internals" <Drawer /> component will probably live outside the model component as a child of <App />. Thus, we will need to send the attention visualization tree to <Drawer /> via a callback function passed down as props from <App />.
Later this year we anticipate investing in our demo infrastructure. Our goals are to 1) add the ability to easily host external models and 2) improve robustness (in the engineering sense).
Motivation: We’d like to add many more models to the demo, but the present architecture isn’t scalable as everything runs in a single process on a single thread. In addition, we would like to be able to add demos for contributed models that are not merged into AllenNLP.
Success criteria: we close out the following issues:
"Memory leak in the demo" - #37
"Add the ability to host external models in the demo" - #38
"Support hosting multiple models for the same task" - #49
We'll flesh this issue out as we refine our plan.
The Constituency Parsing demo was changed to have vertical sections so the visualization fits on screens, small ones in particular. However, the section titles are now centered differently, which is jarring.
We have a visualization of attention, which has helped researchers significantly in debugging their models and understanding how they are working. The current demo is reasonable but a bit rough, particularly the doubly-nested folding menus that animate open and closed, and the handling of longer words.
Aspirational design
Issues:
Features:
Design Update (10/1/18): see #44 (comment)
@joelgrus is working on creating a new introductory tutorial. Once we have that, we want to re-work the tutorial page so a single tutorial is highlighted, with links out to the more advanced tutorials.
This issue is based on this comment by @matt-gardner.
demo.allennlp.org should have a standard format for displaying this information. Something like
Original paper: [title and venue]
Original authors: [list, with affiliation]
This version implemented by: [list, with affiliation]
Description:
It'll make it more obvious who should get credit for what when someone is looking at the demo.
Going to a link such as http://staging.allennlp.org/constituency-parsing/MjU5NjQy shows the demo output but does not fill in the fields that were used for input. Ideally it would populate those fields as well.
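As an aside on these permalinks: the slugs are consistent with a URL-safe base64 encoding of the decimal link id ("MjU5NjQy" decodes to "259642"). The actual int_to_slug/slug_to_int in allennlp.service.permalinks may be implemented differently, but a sketch of that scheme looks like:

```python
import base64

def int_to_slug(link_id):
    """Encode a numeric permalink id as a URL-safe slug."""
    raw = base64.urlsafe_b64encode(str(link_id).encode("ascii"))
    return raw.decode("ascii").rstrip("=")

def slug_to_int(slug):
    """Decode a slug back into the numeric permalink id."""
    padded = slug + "=" * (-len(slug) % 4)  # restore base64 padding
    return int(base64.urlsafe_b64decode(padded).decode("ascii"))
```

Populating the input fields would then just be a matter of looking up the stored request by id and writing it back into the form state.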
We need to:
There are structural issues with the current <Pane />
system that are causing annoying scrolling bugs. A refactor is necessary for:
Presently many users build models in their own source repositories. If we wanted to host such a model in the AllenNLP demo today, we would need to port it into http://github.com/allenai/allennlp
itself. However, this is often impractical or even undesirable.
One potential strategy for serving external models is to containerize each model behind a REST interface. This would also allow the demo to scale more easily, and it would let us isolate bad models (such as the model that's leaking memory in #37).
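As a rough sketch of that strategy (the predict function here is a hypothetical stand-in; a real container would load an AllenNLP archive once at startup, and every container would expose the same POST contract):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(inputs):
    # Hypothetical stand-in for running a real AllenNLP predictor.
    return {"inputs": inputs, "prediction": "placeholder"}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run the model, return JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve one model per container:
#   HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()
```

The demo frontend would then only need the container's hostname and port per model, and a misbehaving model could be restarted or resource-capped without touching the others.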
Replace <XLabels /> and <DataGrid /> with a single <HeatMap /> component.
The demo takes a very long time to start when deploying a new version, because several large models need to be downloaded. We deploy our demo to Google Cloud, but models are downloaded from AWS S3. They also seem to take an unusually long time to download...
Having all CSS in a single file is difficult to maintain and encourages weird bugs. We want to split out the CSS into separate modules. Ideally, there would be a 1:1 correspondence between each React component and its relevant CSS module.
Something about modularizing CSS (by importing a CSS file at the component level) creates a broken <link>
tag (the resource returns a 404) instead of an inline <style>
tag. The hunch is that it's some outdated React-related dependency in the Dockerfile.
This is what gets built locally with latest npm:
This is what gets built via Docker:
The resource pointed to in the <link>
tag does not exist.
Currently we have duplicated data on the Models page and in the AllenNLP demo. For example, both maintain a list of models and descriptions for each of those models.
We should integrate the two more closely so we duplicate less, and "rebrand" beyond "demo".
"Interactive AllenNLP" is more than a demo--it's AllenNLP in action.
Specifically I would like to move the model information from https://allennlp.org/models into https://allennlp.org/demo. We would then rebrand the page to just contain our gallery of contributed models.
This issue is a follow-up to the Multi-NER PR that has already been merged: #45
Improvements to be made:
Motivation: we have multiple models for many of our tasks, but the demo design supports only one model per task. Researchers would like to compare different approaches to the same task interactively through our demo.
Success criteria: we have a clean design for multiple models per task that can scale to many (3x) more models than we have today. We also implement multiple models for at least two tasks. It should also be easy (e.g. just a configuration change) to add an additional model for an existing task.
Currently the AllenNLP demo seems to cache aggressively. If I release a new model, I need to force-refresh to see the option. Similarly, if a model is removed, it continues to appear as an option for the user and 404's on POST.
We need to break the cache on the demo so the user always sees the latest and greatest.
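One common way to break the cache for static assets is content-hashed filenames (which Create React App already produces, e.g. main.bb9de2e1.css), combined with Cache-Control: no-cache on the HTML shell and the model-list endpoint so browsers revalidate those on every load. A sketch of the fingerprinting half (the function name is ours):

```python
import hashlib

def fingerprint(filename, content):
    """Return a content-addressed filename like main.bb9de2e1.css.
    When the bytes change the name changes, so browsers can cache the
    asset forever yet always pick up the latest version via the new URL."""
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}"
```

Since the fingerprinted URLs are referenced from the (never-cached) HTML shell, a release that adds or removes a model is visible on the next normal page load, no force-refresh required.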
Currently the demo has some glaring bugs. We want to fix these so that it's somewhat usable, even though it won't be fully usable until we improve the design.
On https://allennlp.org/models we list each of the models, but don't have any way of connecting back to the demo.
We could add a fourth "button" that opens a window and pulls up the corresponding demo for that model.
We're looking to grow our demo significantly, and when we do, it's less clear what we should host on https://allennlp.org/models. We could move the dropdown code examples for prediction, evaluation, and training from that page to the demo itself, so we would have easy code snippets for all of our demoed models.
Presently, we already duplicate the model descriptions across the website and the demo.
Example:
Given the following input data:
requestData = "Björk and Thom Yorke are musicians. They have fans. Thom Yorke is English.";
And the response data coming from the server:
responseData = {
document: [ // Token list
"Björk", // 0
"and", // 1
"Thom", // 2
"Yorke", // 3
"are", // 4
"musicians", // 5
".", // 6
"They", // 7
"have", // 8
"fans", // 9
".", // 10
"Thom", // 11
"Yorke", // 12
"is", // 13
"English", // 14
"." // 15
],
clusters: [
[ // Cluster 0 spans:
[0,3], // "Björk and Thom Yorke"
[7,7] // "They"
],
[ // Cluster 1 spans:
[2,3], // "Thom Yorke"
[11,12] // "Thom Yorke"
]
]
};
How do we translate that data into a tree object like this?
tokenTree = {
contents: [
{
cluster: 0,
contents: [
"Björk",
"and",
{
cluster: 1,
contents: [
"Thom",
"Yorke"
]
}
]
},
"are",
"musicians",
".",
{
cluster: 0,
contents: [
"They"
]
},
"have",
"fans",
".",
{
cluster: 1,
contents: [
"Thom",
"Yorke"
]
},
"is",
"English",
"."
]
};
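One way to do this translation (sketched here in Python for clarity; the real implementation would live in CorefComponent.js): sort the spans outermost-first and walk the tokens with a stack of open cluster nodes. This assumes spans nest cleanly and never partially overlap.

```python
def spans_to_tree(tokens, clusters):
    """Nest inclusive [start, end] cluster spans into a token tree."""
    # Outer spans first: sort by start ascending, then end descending.
    spans = sorted(
        [(start, end, cid)
         for cid, cluster in enumerate(clusters)
         for start, end in cluster],
        key=lambda span: (span[0], -span[1]),
    )
    root = {"contents": []}
    stack = [(root, len(tokens) - 1)]  # (open node, index where it closes)
    span_index = 0
    for i, token in enumerate(tokens):
        # Open every span that starts at this token, outermost first.
        while span_index < len(spans) and spans[span_index][0] == i:
            _, end, cid = spans[span_index]
            node = {"cluster": cid, "contents": []}
            stack[-1][0]["contents"].append(node)
            stack.append((node, end))
            span_index += 1
        stack[-1][0]["contents"].append(token)
        # Close every span that ends at this token.
        while len(stack) > 1 and stack[-1][1] == i:
            stack.pop()
    return root
```

Running this on the response data above produces the tokenTree shown, and a renderer can then walk the tree, emitting a <Highlight cluster={n}> for each dict node and a <span> for each plain token.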
Translating the response data into a tree will make it easier to build the target JSX markup:
<p>
<Highlight cluster="0">
<span>Björk</span> {/* Token 0 */}
<span>and</span> {/* Token 1 */}
<Highlight cluster="1">
<span>Thom</span> {/* Token 2 */}
<span>Yorke</span> {/* Token 3 */}
</Highlight>
</Highlight>
<span>are</span> {/* Token 4 */}
<span>musicians</span> {/* Token 5 */}
<span>.</span> {/* Token 6 */}
<Highlight cluster="0">
<span>They</span> {/* Token 7 */}
</Highlight>
<span>have</span> {/* Token 8 */}
<span>fans</span> {/* Token 9 */}
<span>.</span> {/* Token 10 */}
<Highlight cluster="1">
<span>Thom</span> {/* Token 11 */}
<span>Yorke</span> {/* Token 12 */}
</Highlight>
<span>is</span> {/* Token 13 */}
<span>English</span> {/* Token 14 */}
<span>.</span> {/* Token 15 */}
</p>
Which will in turn give us this UI:
Relevant links:
allennlp-demo/demo/src/components/CorefComponent.js
euclid/panda/web/src/fragment/Fragment.tsx
We're prioritizing the visualization of semantic parsing output first. Eventually, we'll want to revisit the mechanics of inputting tabular data. Currently, this is handled via raw text in a <textarea>, which is not an ideal user experience.
Not only do we want to improve our demo visualizations (e.g. #76 and #64) but we would like to add more semantic parser visualizations to complete the semantic parsing toolkit. Doing this will probably involve reworking the left pane so we can group the Semantic Parsing demos together--otherwise the list starts to become unmanageable.
In keeping with the recent push to brand all web properties with the full Allen AI logo.
from allennlp.common.util import JsonDict, peak_memory_mb
from allennlp.service.db import DemoDatabase, PostgresDemoDatabase
from allennlp.service.permalinks import int_to_slug, slug_to_int
from allennlp.service.predictors import Predictor, DemoModel
service.db should be server.db.
DemoModel isn't in master anymore.
You need to manually click on "Run". Ideally you could hit Enter as well.
This issue is related to the PR: Adding (hidden) demo for wikitables parser (#25).
The AllenNLP demo has a memory leak that causes pods to fail in Kubernetes. Recently we added the dependency parser and pods were restarting every few minutes, so I doubled the memory available on each machine. Regardless, I'm still seeing a few restarts overnight.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
allennlp-demo-prod-597b67546-mglpq 2/2 Running 3 18h
allennlp-demo-prod-597b67546-tfdnj 2/2 Running 2 18h
allennlp-demo-prod-597b67546-xzvjh 2/2 Running 4 18h
The user isn't seeing any impact because the pods restart when they crash and we have three replicas, but we are presently addressing the situation by spending more money (on replicas) and our ability to add more models is limited.
$ kubectl describe pod allennlp-demo-prod-597b67546-mglpq
Name: allennlp-demo-prod-597b67546-mglpq
Namespace: default
Node: gke-allennlp-demo-cluster-new-pool-bdd08a8c-z0c5/10.9.254.14
Start Time: Mon, 27 Aug 2018 13:46:54 -0700
Labels: app=allennlp-demo-prod
pod-template-hash=153623102
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container cloudsql-proxy
Status: Running
IP: 10.4.14.10
Controlled By: ReplicaSet/allennlp-demo-prod-597b67546
Containers:
allennlp:
Container ID: docker://61c5625be2673d770391807f03e0060eb3948076a216fe75a8f1f616f048de2c
Image: allennlp/demo:e68ab81a7a52a1ee13831f90358f2698e9c90209
Image ID: docker-pullable://allennlp/demo@sha256:c16c141bba76bd4b73015ffb22739210e2ebc7c8427a70d22a60884579194761
Port: <none>
State: Running
Started: Tue, 28 Aug 2018 08:04:03 -0700
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Tue, 28 Aug 2018 05:06:44 -0700
Finished: Tue, 28 Aug 2018 08:04:02 -0700
Ready: True
Restart Count: 3
Limits:
cpu: 1
memory: 3000Mi
Requests:
cpu: 1
memory: 3000Mi
Readiness: http-get http://:8000/ delay=15s timeout=1s period=3s #success=1 #failure=3
Environment:
ALLENNLP_DEBUG: TRUE
DEMO_POSTGRES_HOST: 127.0.0.1
DEMO_POSTGRES_PORT: 5432
DEMO_POSTGRES_DBNAME: demo
DEMO_POSTGRES_USER: <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
DEMO_POSTGRES_PASSWORD: <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-790xb (ro)
cloudsql-proxy:
Container ID: docker://23cdfbe24b70f664d6b94ddd913f08649d165c1ff34efa2f767c3d4c7c559c68
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:5c690349ad8041e8b21eaa63cb078cf13188568e0bfac3b5a914da3483079e2b
Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=ai2-allennlp:us-central1:allennlp-demo-database=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Mon, 27 Aug 2018 13:47:35 -0700
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-790xb (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
default-token-790xb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-790xb
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 16m (x72 over 18h) kubelet, gke-allennlp-demo-cluster-new-pool-bdd08a8c-z0c5 Readiness probe failed: Get http://10.4.14.10:8000/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 13m (x227 over 18h) kubelet, gke-allennlp-demo-cluster-new-pool-bdd08a8c-z0c5 Readiness probe failed: Get http://10.4.14.10:8000/: dial tcp 10.4.14.10:8000: getsockopt: connection refused
I first noticed these issues around January 1, 2018.
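For tracking down leaks like this, it helps to log the process's peak resident memory over time; allennlp.common.util.peak_memory_mb exists for this, and a stdlib-only sketch along the same lines (Unix-only; units differ by platform) looks like:

```python
import resource
import sys

def peak_memory_mb():
    """Peak resident set size of the current process, in MB.
    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / 1_048_576 if sys.platform == "darwin" else peak / 1024.0
```

Logging this value before and after each prediction request should narrow down which model is growing the heap before the OOMKilled limit is hit.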
The different demos were largely created by copying and pasting code to create new visualizations. We should go back over the demos and see how much code we can consolidate.
Here is an example of the approach we've taken to date: #41 (comment)
To replicate, fill in some input text and hit "run". The text will have disappeared. Ideally it would stay even after the model has been run.
Try out "Constituency Parsing" and "Dependency Parsing". Others might have this issue too.
Motivation
Right now the front-end code for our demo is sort of a mess. All of the Components for each model/task contain a ton of duplicated code and lots of bespoke UX components.
This means that adding a new model to the demo currently takes hundreds of lines of JavaScript. By factoring out common code (and cleaning it up), we should be able to make it so that a new model takes only a few dozen lines of JavaScript. This will make it much simpler to add new models and will help us avoid subtle UI bugs as we grow the demo.
As part of this process, we should add unit tests and (potentially) TypeScript/Flow type annotations to make our demo harder to break and easier to keep running.
Success Criteria
This bug became known during the effort to modularize CSS.
Steps to reproduce:
Check out modular-css
branch from Modularizing CSS (#56):
git pull https://github.com/aaronsarnat/allennlp-demo.git modular-css
Build via docker and ssh into the docker image:
$ docker build -t foobar .
...
$ docker run -it --entrypoint /bin/bash foobar
#
Verify that the bundled css file was created in the docker image:
# cd demo/build/static/css
# ls
main.bb9de2e1.css main.bb9de2e1.css.map
Exit out of the docker image and run the server via docker:
# exit
$ docker run -p 8000:8000 foobar
Visit http://localhost:8000/
in browser.
Hard-refresh to make sure nothing is being cached (hold shift key and click the refresh browser toolbar button). Notice the page appears unstyled.
Open up Chrome inspector and locate the link tag that looks like this: <link href="/static/css/main.bb9de2e1.css" rel="stylesheet">
Right click on the href
attribute for that <link>
tag and select "Open in new tab". Notice that the resource is not found.
As of a few weeks ago, the demo was demanding upwards of 10GB of local memory to run. Today, I was unable to run the demo without increasing my local memory allocation to 14GB and swap to 4GB (see the log of the docker build/run script and the resulting errors below).
I was monitoring memory usage while running docker run
and it peaked at over 13GB. I was able to get the demo to run after increasing the memory allocation, but I'm concerned about the memory requirements as more demos are added and my machine only has 16GB of RAM.
$ docker build -t allennlp/demo:$GIT_HASH .
... Omitting build script log -Aaron
Successfully built a1ef1def8fb2
Successfully tagged allennlp/demo:26048b42e199e90bcd6e4e9608c9c16d416ef6ed
$
$ docker run -p 8000:8000 allennlp/demo:$GIT_HASH
/usr/local/lib/python3.6/site-packages/allennlp/service/predictors/__init__.py:23: FutureWarning: allennlp.service.predictors.* has been depreciated. Please use allennlp.predictors.*
"Please use allennlp.predictors.*", FutureWarning)
/usr/local/lib/python3.6/site-packages/allennlp/service/predictors/predictor.py:6: FutureWarning: allennlp.service.predictors.* has been deprecated. Please use allennlp.predictors.*
" Please use allennlp.predictors.*", FutureWarning)
demo db credentials not provided, so not using demo db
/usr/local/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
"num_layers={}".format(dropout, num_layers))
INFO:__main__:loading open-information-extraction model
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/local/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/local/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./app.py", line 300, in <module>
main()
File "./app.py", line 72, in main
predictor = demo_model.predictor()
File "/stage/allennlp/server/models.py", line 21, in predictor
archive = load_archive(self.archive_file)
File "/usr/local/lib/python3.6/site-packages/allennlp/models/archival.py", line 105, in load_archive
resolved_archive_file = cached_path(archive_file)
File "/usr/local/lib/python3.6/site-packages/allennlp/common/file_utils.py", line 86, in cached_path
return get_from_cache(url_or_filename, cache_dir)
File "/usr/local/lib/python3.6/site-packages/allennlp/common/file_utils.py", line 174, in get_from_cache
response = requests.head(url, allow_redirects=True)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 98, in head
return request('head', url, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
[INFO/MainProcess] process shutting down
$
Related PR:
Draft pitch:
"Model Internals" <Drawer />
component will probably live outside the model component as a child of <App />
. Thus, we will need to send the attention visualization tree to <Drawer />
via a callback function passed down as props from <App />
.