
docs's People

Contributors

amir-zeldes, bsandra, dan-zeman, ermanh, fginter, gcelano, ialfina, jipgogak, jnivre, kajad, l12maro, languagestructure, liljao, manning, marinecourtin, mcdm, mcguis26, msimi, nschneid, olesar, osenova, phelanj9292, rueter, sebschu, simonettamontemagni, spyysalo, stellamarks, tatiana-merz, tlynn747, ulyavedenina


docs's Issues

Shorter feature names?

Some feature names, originally taken from Interset, may seem too long. In general, I do not favor extremely short names because longer names are more self-explanatory. However, I would not mind shortening two names: Definiteness and Negativeness. By removing the ness part, we would get Definite and Negative, which is probably understandable enough. Any opinions?

Link to relation page in the auto-generated "all" page

It would be really nice if the auto-generated "all" page could automatically include links to the per-relation pages (and possibly also to the USD relation documentation). This would be useful for editing these pages as well as when details are omitted.

Files named "aux" cause problems

The docs.git repository cannot be fully cloned in Microsoft Windows because, for silly historical reasons, this system disallows files with certain three-letter names, including "aux" (regardless of case and .extension). The files affected in this repository are

_en-dep/aux.md
_fi-dep/aux.md
_ud-dep/aux.md
_ud-pos/AUX.md

Any chance to rename these so that the repository gets more portable?

Thanks
Dan
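
A check along the following lines could flag such files before they are committed. This is only a minimal sketch and not part of the repository's tooling: the names `is_reserved` and `find_reserved_names` are hypothetical, and the reserved set is assumed from the commonly documented list of DOS device names.

```python
import os

# DOS device names that Windows reserves regardless of case or extension
# (assumed list: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_reserved(filename):
    """Return True if the part of the name before the first dot is reserved."""
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED

def find_reserved_names(root):
    """Yield paths under root whose file name Windows would reject."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_reserved(name):
                yield os.path.join(dirpath, name)
```

With this check, `aux.md` and `AUX.md` are flagged while a name such as `auxpass.md` is not, because only the full stem is compared.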

validate.py crashes on empty HEAD

For https://github.com/universaldependencies/tools:

git pull
$ cat test-cases/nonvalid/empty-head.conll
# not valid: HEAD is empty
1   have    have    VERB    VB  Tens=Pres       root    _   _
$ python validate.py --no-lists < test-cases/nonvalid/empty-head.conll
[Line         2]: Empty value in column HEAD
Traceback (most recent call last):
File "validate.py", line 302, in <module>
validate(inp,out,args,tagsets)
File "validate.py", line 239, in validate
validate_tree(tree)
File "validate.py", line 223, in validate_tree
deps.setdefault(int(cols[HEAD]),set()).add(int(cols[ID]))
ValueError: invalid literal for int() with base 10: ''
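
The traceback suggests that the empty HEAD is reported but the value is still passed to `int()` afterwards. One possible guard is sketched below; the helper names `parse_index` and `collect_deps` are hypothetical, the column positions are assumed from the CoNLL-U column order, and the actual structure of validate.py may differ.

```python
ID, HEAD = 0, 6  # zero-based column positions, assumed from the CoNLL-U column order

def parse_index(value):
    """Return value as a non-negative int, or None if it is not a plain ASCII integer."""
    return int(value) if value.isascii() and value.isdigit() else None

def collect_deps(tree):
    """Map each head index to the set of its dependents, skipping malformed rows."""
    deps = {}
    for cols in tree:
        head, dep = parse_index(cols[HEAD]), parse_index(cols[ID])
        if head is None or dep is None:
            continue  # assumed to be reported by an earlier check; do not crash here
        deps.setdefault(head, set()).add(dep)
    return deps
```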

Two styles of hyperlinks

I often use HTML to create hyperlinks, partly because I like it and partly because some types of links are currently not supported in the []() syntax (see #40): <a href="../ud-pos/ADJ.html">adjectives</a>.

Occasionally I also use the []() syntax, as in [ud-dep/case]().

Now I realized that the results differ in style: the HTML-defined link is underlined while the []()-defined link is not. (For example, see the first three paragraphs in http://universaldependencies.github.io/docs/ud-feat/Case.html) Is this intentional?

Phrase-level vs. word-level modification

Yoav says: Maybe we should consider some mechanism for distinguishing phrase level vs. word level modification.

Context: Constituent trees can naturally express distinctions like
(almost (at (my house)))
vs.
(at ((almost my) house))
while in dependency trees this distinction could get lost (disregarding word order, which may be different in other languages anyway).

Another example is a shared modifier of coordination:
(Peter ((bought and ate) (an apple))) ... Peter and apple are arguments of both verbs, i.e. the whole coordination. In dependencies, they will look as if they modify only the first conjunct, i.e. "bought".
(((Peter bought) and (Mary ate)) (an apple))
((Peter (bought (an apple))) and (Mary (ate (a pear))))

No milestone set—we should think about this for the next version.

What to do with words that are mentioned rather than used?

Copying from e-mails:

@dan-zeman: I am quite okay with "yes" and "no" being interjections rather than particles. Either way would be a bit arbitrary. It is also clear that ambiguous usages should be separated ("no" in "no way", or in other languages where the same word translates as English responsive "no" or functional "not"). What about usages such as "I am waiting for his "yes" on the matter." Still interjection, or a noun?

@jnivre: The “waiting for his ‘yes’ example” is not peculiar to interjections, but is the general problem of what to do with words that are mentioned rather than used. Is ‘precede’ a verb or a noun in “He pronounced ‘precede’ in a funny way”?

SD parser fails on sentence-terminal space

To replicate:

~~~ sdparse
extra space 
dep(extra, space)
~~~

Adding an extra space character to the end of the line "extra space" causes the SD parser to read the whole entry as text (no dependencies), producing a visualization where the text is "extra space dep(extra, space)".

relation table in merged document

suggestion from @manning :

is there a way to get additional content to propagate
over to the single document version? E.g., at present
the USD relation table doesn’t appear in the single
document version.

Update documentation of USD relations in the light of new general principles

We need to go through the documentation of universal relations and minimally make sure that all the text and examples are compatible with the general principles. Ideally, we should also add more information for relations to which some general principle applies (for example, explain how to treat multiple auxiliaries under "aux" and "auxpass"). Whoever does this may want to address issue #50 (avoid phrase structure language) at the same time.

format documentation: feature documentation details

(More format documentation nitpicking, thought I'd avoid spamming everybody and try issues instead.)

A few questions re: http://universaldependencies.github.io/docs/format.html#morphological-annotation:

  • What characters are permitted in feature names and values? Presumably at least = and , are disallowed to keep the syntax unambiguous, but is e.g. non-ASCII alphabetical OK? Or just [a-zA-Z]?
  • What is the precise definition of alphabetical sort? As the documentation doesn't enforce capitalization, resources may define e.g. a case feature. Is then case < Def (intuitively correct) or case > Def (ASCII order)? (Naive implementations would tend to produce the latter.)
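
The two candidate orderings from the second question can be reproduced in a few lines. This is just an illustration of the ambiguity, not a statement of which order the format requires.

```python
names = ["case", "Def"]

# Default string comparison uses code points: 'D' (68) sorts before 'c' (99).
ascii_order = sorted(names)

# Comparing lowercased keys puts 'case' before 'def'.
caseless_order = sorted(names, key=str.lower)

print(ascii_order)     # ['Def', 'case']
print(caseless_order)  # ['case', 'Def']
```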

@fginter , @jnivre : clarifications would be much appreciated!

more sensible information popups

As of 7dacf61, mousing over span and relation annotations produces a basic information popup with type and (for spans) marked text.

This information is entirely redundant with that already shown in the visualization.

The info popup is likely to be useful for displaying feature values, but should probably not be shown at all in cases where no additional information (wrt. base visualization) is available.

(comments welcome!)

Broken links in merged documents

Many links between pages in a single collection are broken in merged (single-page) documents (see e.g. http://universaldependencies.github.io/docs/ud-pos-all.html).

Documents created as automatic merges of pages in particular collections, such as http://universaldependencies.github.io/docs/ud-pos-all.html, are currently found in the documentation root directory (docs/), while the individual documents are found in the collection-specific subdirectory (e.g. docs/ud-pos/). Consequently, relative links that work between the individual collection documents (e.g. <a href="DET">DET</a>) are broken in the merged document.

Possible ways to resolve this:

  • Place each merged document in the same directory as the documents it merges
  • Only use absolute links (e.g. href="http://universaldependencies.github.io/docs/ud-pos/DET.html")
  • Use the special variable {{ relative }} in links (e.g. href="{{ relative }}ud-pos/DET.html")
  • Use the auto-linking mechanism (e.g. [DET]()) and update the code to adjust accordingly
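
The breakage can be reproduced with standard URL resolution. The sketch below uses Python's `urllib.parse.urljoin`; the helper name `resolve` and the page `ud-pos/ADJ.html` are chosen for illustration only.

```python
from urllib.parse import urljoin

BASE = "http://universaldependencies.github.io/docs/"

def resolve(page, href):
    """Resolve href the way a browser would when it appears on the given page."""
    return urljoin(urljoin(BASE, page), href)

# The same relative link, seen from a collection page and from the merged page.
in_collection = resolve("ud-pos/ADJ.html", "DET")
in_merged = resolve("ud-pos-all.html", "DET")

print(in_collection)  # http://universaldependencies.github.io/docs/ud-pos/DET
print(in_merged)      # http://universaldependencies.github.io/docs/DET
```

The link resolves inside `ud-pos/` on the collection page but directly under `docs/` on the merged page, which is why the first three options above all work by removing the dependence on the page's own directory.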

(Related to #16, but distinct.)

Allow different text for auto-link

from @dan-zeman :

"""
can I link to a label using a text other than the label itself? For example, how would I rewrite the following in your syntax?

<a href="../ud-pos/NOUN.html">nouns</a>

"""

this should be supported.

jekyll conflicts w/visualizations w/words ending in dash

For the input

<div class="sd-parse" tabs="yes">
Go to the righ- to the left .
reparandum(left-7, righ--4)
[...]
</div>

jekyll replaces the double-dash in righ--4 with an mdash, which causes the embedded visualization to fail.

General case: jekyll processing should be off in all visualization divs.
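
To illustrate why the substitution is harmful, assume token references have the shape word, ASCII hyphen, index. This is a guess at the format based on the example above, and the real visualization code may parse references differently; `split_ref` is a hypothetical helper.

```python
import re

# Assumed token reference shape: any word text, an ASCII hyphen, then a numeric index.
TOKEN_REF = re.compile(r"(.+)-(\d+)")

def split_ref(ref):
    """Split 'word-index' into (word, index), or return None if the shape does not match."""
    m = TOKEN_REF.fullmatch(ref)
    return (m.group(1), int(m.group(2))) if m else None

print(split_ref("left-7"))        # ('left', 7)
print(split_ref("righ--4"))       # ('righ-', 4)
print(split_ref("righ\u20144"))   # None: the em dash replaced both hyphens
```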

yes / no

We need a survey of existing corpora w.r.t. the words "yes" and "no" (the latter as a response, not as in "we have no bananas"). How are they annotated there? Do we want to tag them PART, or INTJ?

Multi-line glossing

Sampo, is it possible for brat to support interlinear glossing, as is standard in linguistics for texts in other languages, or simply when you want to give more information about the morphology, etc. (http://en.wikipedia.org/wiki/Interlinear_gloss)? I think that would be very useful for giving examples in different languages.

validate.py crashes on duplicate ID

To replicate, for https://github.com/universaldependencies/tools,

git pull
$ cat test-cases/nonvalid/duplicate-id.conll 
# not valid: IDs must be sequential integers (1, 2, ...)
1   valid   valid   NOUN    SP  _   0   ROOT    _   _
1   .   .   .   FS  _   1   p   _   _
$ cat test-cases/nonvalid/duplicate-id.conll | python validate.py 
[...]
File "validate.py", line 211, in proj
proj(dependent,s,deps)
RuntimeError: maximum recursion depth exceeded
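
The test file's own comment states that IDs must be sequential integers, so checking this before the tree is traversed would presumably avoid the runaway recursion. A minimal sketch follows; `check_sequential_ids` is a hypothetical helper, not a function in validate.py.

```python
def check_sequential_ids(ids):
    """Return error messages for IDs that are not 1, 2, 3, ... in order."""
    errors = []
    for expected, found in enumerate(ids, start=1):
        if found != str(expected):
            errors.append(f"expected ID {expected}, found {found!r}")
    return errors

print(check_sequential_ids(["1", "1"]))  # ["expected ID 2, found '1'"]
```

A validator could skip tree validation for any sentence where this list is non-empty.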

merged documents presenting particular linguistic constructions

suggestion from @manning :

it would also be useful to have sections that presented
particular linguistic constructions. The existing manual
section on copulas is an example of this, but I imagine
others ranging from linguistic topics (tough movement,
correlative comparatives, …) to practical topics
(address blocks, itemized lists, …).

Guidelines for language-specific documentation

We need a page summarizing what a language-specific description should contain as a minimum. @jnivre suggested the following:

For the language-specific entry pages, I am not sure how to structure it, but I think the following info should be compulsory:

  1. A description of how words and tokens are defined (including whether range tokens are used, etc.)
  2. A description of the morphological annotation, including the use of universal tags, universal features used, and language-specific features if any.
  3. A description of the syntactic annotation, including the use of universal dependencies, and language-specific relations if any.
  4. A list of known discrepancies with the universal guidelines (Ryan’s language-specific diff).

What else do we need?

Joakim

Pages about features cannot be edited on-line

I am at
http://universaldependencies.github.io/docs/ud-feat/Gender.html
and I click on "edit page". It takes me to
https://github.com/universaldependencies/docs/edit/pages-source/ud-feat/Gender.md
which does not exist and triggers a 404 error.

The missing bit is probably that the server forgets to translate "ud-feat" to "_ud-feat". I do not know where this is done for the other folders.

The source file "_ud-feat/Gender.md" exists and can be edited off-line, then pushed to GitHub. It is just the on-line editor that does not work.
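
The missing translation could look roughly like the sketch below. It is written in Python purely for illustration (the site itself may generate the link elsewhere), `edit_url` is a hypothetical helper, and it assumes that only pages inside a collection directory need the leading underscore.

```python
from urllib.parse import urlparse

SITE_PREFIX = "/docs/"
EDIT_BASE = "https://github.com/universaldependencies/docs/edit/pages-source/"

def edit_url(page_url):
    """Map a published page URL to the assumed GitHub edit URL of its source file."""
    path = urlparse(page_url).path
    if not path.startswith(SITE_PREFIX) or not path.endswith(".html"):
        raise ValueError(f"unexpected page URL: {page_url}")
    relative = path[len(SITE_PREFIX):-len(".html")]
    if "/" in relative:
        # Assumption: collection directories carry a leading underscore in the source tree.
        relative = "_" + relative
    return EDIT_BASE + relative + ".md"

print(edit_url("http://universaldependencies.github.io/docs/ud-feat/Gender.html"))
```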

format documentation: some (mostly) HEAD and DEPS questions

Some more questions regarding the format spec (http://universaldependencies.github.io/docs/format.html):

  1. It was decided (#33) that feature names and values must have the form [A-Z0-9][a-zA-Z0-9]*. Are dependency relations similarly required to have a particular form (e.g. [a-z]+)?
  2. May a CoNLL-U sentence have no words with HEAD = 0 (root relations)?
  3. May a CoNLL-U sentence have more than one word with HEAD = 0 (root relations)? (yes for CoNLL-X)
  4. May head 0 (root) dependencies occur also in secondary dependencies (DEPS)?

(+extra non-dep question:) May multiword token ranges overlap? Intuitively no, but this doesn't seem to be explicit in the docs. Consider e.g. (nonsense example)

1   I   I   PRON    PRN Num=Sing|Per=1  2   nsubj   _   _
2-3 haven't _   _   _   _   _   _   _   _
2   have    have    VERB    VB  Tens=Pres   0   root    _   _
3-4 nota    _   _   _   _   _   _   _   _
3   not not ADV RB  _   2   neg _   _
4   a   a   DET DT  _   5   det _   _
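
Both the feature form from question 1 and the overlap question can be expressed as small checks. The sketch below is illustrative only: `valid_feature` and `overlapping_ranges` are hypothetical helpers, and treating overlap as an error is an assumption, since the question is still open.

```python
import re

FEATURE_PART = re.compile(r"[A-Z0-9][a-zA-Z0-9]*")  # the form quoted in question 1

def valid_feature(pair):
    """Check that both sides of a Name=Value pair match the required form."""
    name, sep, value = pair.partition("=")
    return bool(sep) and all(FEATURE_PART.fullmatch(p) for p in (name, value))

def overlapping_ranges(ids):
    """Return pairs of multiword token ranges, adjacent by start index, that share a word."""
    ranges = []
    for token_id in ids:
        if "-" in token_id:
            start, end = token_id.split("-")
            ranges.append((int(start), int(end)))
    ranges.sort()
    return [(a, b) for a, b in zip(ranges, ranges[1:]) if b[0] <= a[1]]

print(valid_feature("Num=Sing"))                                # True
print(overlapping_ranges(["1", "2-3", "2", "3-4", "3", "4"]))   # [((2, 3), (3, 4))]
```

Applied to the ID column of the nonsense example, the second check reports that ranges 2-3 and 3-4 both claim word 3.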

Standardize on `~~~ sdparse` syntax (instead of `<div class="sd-parse">`)?

The visualization system supports two equivalent ways to write examples, one using Markdown block syntax (~~~) and the other HTML (<div class="...">). For example, the first visualization here http://universaldependencies.github.io/docs/embedsd.html can be equivalently written either as

~~~ sdparse
Dogs run
nsubj(run, Dogs)
~~~

or

<div class="sd-parse">
Dogs run
nsubj(run, Dogs)
</div>

(The hyphen in sd-parse, which appears only in the second form, is a historical accident.)

The documentation currently contains a mix of both forms, which is potentially confusing for both authors and readers. It would be better to standardize on just one.

IMHO, the Markdown block (~~~) syntax is not only more compact but also easier to read and write as well as more consistent with the overall preference for Markdown over HTML in the documentation.

On the other hand, the HTML (<div class="...">) syntax does have the benefit of being more readily recognized by contributors who are familiar with basic web technologies.

On balance, I'd like to propose to use the Markdown block syntax consistently. If this is an acceptable choice, I'd be happy to write a script to implement this change globally in the documentation.

(Pre-empting one potential objection: attributes can also be specified for a Markdown block: http://kramdown.gettalong.org/syntax.html#attribute-list-definitions)
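
Such a conversion script might start from something like the following. It is a rough sketch under a deliberate limitation: it only rewrites the plain form with no other attributes on the div, so blocks such as `<div class="sd-parse" tabs="yes">` are left for manual handling. The name `to_markdown_blocks` is hypothetical.

```python
import re

# Matches only the plain form, with no additional attributes on the div.
DIV_BLOCK = re.compile(r'<div class="sd-parse">\n(.*?)\n</div>', re.DOTALL)

def to_markdown_blocks(text):
    """Rewrite plain sd-parse divs as ~~~ sdparse blocks; leave everything else untouched."""
    return DIV_BLOCK.sub(lambda m: "~~~ sdparse\n" + m.group(1) + "\n~~~", text)

source = '<div class="sd-parse">\nDogs run\nnsubj(run, Dogs)\n</div>'
print(to_markdown_blocks(source))
```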

Avoid phrase structure language

Parts of the documentation make frequent use of phrase structure ideas and terminology in definitions. For example (http://universaldependencies.github.io/docs/u/dep/relcl.html):

A relative clause modifier of an NP is a relative clause modifying the NP.
The relation points from the head noun of the NP to the head of the relative
clause, normally a verb.

It would be better to reduce such usage, as the (exact) definitions of NP, VP, etc. are neither found in the documentation nor always obvious, not all languages intended to be covered by the UD documentation have a broadly accepted standard for phrase structure analysis, and (IMHO) it would be preferable if the dependency analyses could be defined without first defining or assuming a phrase structure analysis.

(Phrase structure terminology is particularly common in the English documentation, suggesting it is at least in part simply left over from the old SD documentation, where the scheme was defined as a conversion from a phrase structure analysis.)

validate.py crashes on DOS newlines

to replicate:

result:

[...]
File "validate.py", line 201, in proj
proj(dependent,s,deps)
File "validate.py", line 201, in proj
proj(dependent,s,deps)
RuntimeError: maximum recursion depth exceeded

expected: either accept or reject the file w/o crashing.

related issue: I didn't find any comment in the format document on Unix (LF) vs. DOS (CR/LF) vs. other newline conventions.
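
If the decision is to accept such files, one option is to normalize line endings before any column is parsed. This is a sketch of that option only; `normalize_newlines` is a hypothetical helper, and rejecting the file with a clear message would be an equally valid outcome.

```python
def normalize_newlines(text):
    """Convert CR/LF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(normalize_newlines("1\thave\r\n\r\n")))  # '1\thave\n\n'
```

Replacing CR/LF first matters: handling bare CR first would turn each CR/LF into two line breaks and introduce spurious blank lines, which separate sentences in this format.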

Move "Interset features that are not part of this standard"?

The bottom of the morphology page (http://universaldependencies.github.io/docs/morphology.html) has a list of Interset features that are not part of the selected subset of universal features (http://universaldependencies.github.io/docs/ud-feat-index.html).

This is probably not the best possible location for this information. @mcdm suggested creating a separate page summarizing language-specific extensions to relations; perhaps we could either 1) create a page for language-specific extensions to features, or 2) create a general language-specific extensions page containing information on both features and relations, and move this material there?

(Assigning @dan-zeman , please reassign or clear if you don't want this!)
