
incubator-annotator's Introduction

Apache Annotator (incubating)

Apache Annotator (incubating) provides libraries to enable annotation related software, with an initial focus on identification of textual fragments in browser environments.

Installation, usage, API documentation

See documentation on the website: https://annotator.apache.org/docs/

How to build

Building Annotator libraries requires Node.js (>= 18). All other dependencies are automatically installed as part of the build.

  • npm run build -- builds the project
  • npm test -- runs the tests
  • npm run start -- starts the demo application

Getting Involved

License

This project is available as open source under the terms of the Apache 2.0 License. For accurate information, please check individual files.

Disclaimer

Apache Annotator is currently undergoing incubation at The Apache Software Foundation.

See the accompanying DISCLAIMER file for details.

incubator-annotator's People

Contributors

ajs6f, bigbluehat, dependabot[bot], fredster33, jakehartnell, jccr, krismeister, lnceballosz, permissionerror, reckart, tilgovi, treora, vrish88


incubator-annotator's Issues

highlight range: "not a perfect undo: split text nodes are not merged again"

Source code reference:

// Returns a function that cleans up the created highlight (not a perfect undo: split text nodes are
// not merged again).

// Remove a highlight element created with wrapNodeInHighlight.
function removeHighlight(highlightElement) {
  // If it has somehow been removed already, there is nothing to be done.
  if (!highlightElement.parentNode) return;
  if (highlightElement.childNodes.length === 1) {
    highlightElement.parentNode.replaceChild(
      highlightElement.firstChild,
      highlightElement,
    );
  } else {
    // If the highlight somehow contains multiple nodes now, move them all.
    while (highlightElement.firstChild) {
      highlightElement.parentNode.insertBefore(
        highlightElement.firstChild,
        highlightElement,
      );
    }
    highlightElement.remove();
  }
}

(also see https://github.com/Treora/dom-highlight-range/blob/master/highlight-range.js )

Could Node.normalize() be used on highlightElement.parentNode?

https://developer.mozilla.org/en-US/docs/Web/API/Node/normalize

See MarkJS usage:

https://github.com/julmot/mark.js/blob/9b0efaf07b869b45cfc4815faad167940143016b/src/lib/mark.js#L675-L687
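For reference, Node.normalize() joins adjacent Text nodes under an element and drops empty ones. A minimal sketch of that merging behaviour over plain objects (a simplified stand-in for DOM Text nodes, not the real API; in a browser one would simply call highlightElement.parentNode.normalize() after unwrapping):

```javascript
// Sketch of what Node.normalize() does, on a simplified node model where a
// node is { type: 'text', data } or { type: 'element', children: [...] }.
function mergeAdjacentTextNodes(parent) {
  const merged = [];
  for (const child of parent.children) {
    if (child.type === 'text' && child.data === '') continue; // normalize() also drops empty text nodes
    const last = merged[merged.length - 1];
    if (last && last.type === 'text' && child.type === 'text') {
      last.data += child.data; // join neighbouring text nodes into one
    } else {
      merged.push(child.type === 'text' ? { ...child } : child); // copy so input nodes stay untouched
    }
  }
  parent.children = merged;
  return parent;
}
```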

Requirements?

I just successfully ran the demo with node 7.10.0 and npm 4.2.0

However, I've not yet attempted the hot-reload stuff.

@tilgovi are there specific things that node 8 and npm 5 enable that I'm missing?

Continuous Integration

I've started a basic travis-ci.org setup via a personal fork.

This has raised a few questions:

  • do we want to use travis-ci?
    • I've no preference; Infra's set it up for others before...but also may not care in practice
  • @tilgovi mentioned that the top-level package needs node 8+ whereas the packages/ need node 6+
    • however...I've not been able to sort out configuring travis in such a way as to accommodate that...

Here's what I've got so far:

My last attempt includes using node as the node_js version via .travis.yml and attempting to use .nvmrc files for local packages/ folders. Turns out Travis doesn't check all those...

There's also the question of narrowing the test runs to just each individual package.

Anyhow. It's doable, but there's a lot more to do...

Interleaving selections within the DOM

Finding selections within the DOM and even wrapping them in an element is easy enough, and most developers just "roll their own" highlighter/selector for things like that--hence, they don't "shop" for tools like Apache Annotator for that.

However, juggling interleaved selections in the DOM is tricky and not standardized.

The DOM is a tree. Selections point at regions all over that tree, often intermixed.

We should build tooling to handle that interleaving to manage the display, removal, eventing, etc, for such selections.

See also #45 and #22.

Example:

<div>
<mark id="a1">Call me <mark id="a2">Ishmael</mark></mark>. Some years ago—never mind how long precisely—<mark id="a3">having little or <mark id="a4">no money in my purse</mark></mark><mark id="a4">, and nothing particular to interest me on shore</mark>, I thought I would sail about a little and see the watery part of the world.
</div>
  • a2 is within a1 and so will have eventing and display related trickiness
  • a4 is made up of 2 marks, but is currently invalid as they share an id--which conceptually "relates" them as a unit, but the DOM doesn't work that way.
    • both sets of <mark/> elements would need shared events, display, removal, etc.
  • a3 also includes 1 part of a4, but not all of it, so weird eventing and display issues again

Solving this (or even just exploring it) is something developers know they need, so likely it should be near the top of our list to solve. 😄
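One common approach to the interleaving problem is to flatten overlapping annotation ranges into disjoint segments, each labelled with the set of annotations covering it; every segment can then be wrapped in its own element, and elements sharing an annotation id get shared events, display, and removal. A sketch over plain character offsets (the shapes here are hypothetical, not an Annotator API):

```javascript
// Flatten possibly-overlapping annotations (given as character offsets)
// into disjoint segments, each labelled with the ids that cover it.
function segment(textLength, annotations) {
  // Collect every boundary where the covering set can change.
  const cuts = new Set([0, textLength]);
  for (const { start, end } of annotations) { cuts.add(start); cuts.add(end); }
  const points = [...cuts].sort((a, b) => a - b);

  const segments = [];
  for (let i = 0; i < points.length - 1; i++) {
    const [start, end] = [points[i], points[i + 1]];
    const ids = annotations
      .filter((a) => a.start <= start && end <= a.end)
      .map((a) => a.id);
    segments.push({ start, end, ids });
  }
  return segments;
}
```

With this representation the duplicate-id problem above disappears: a2-inside-a1 simply yields a middle segment tagged with both ids, and the two halves of a4 are two segments sharing one id.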

sync vs async API

As discussed in the call yesterday, I open this issue to park thoughts about whether/how to provide synchronous APIs, mainly for future consideration.

So far, we have been using async iterables/generators as the system for returning a selector’s matches to the caller. The intention of this approach is to not block the thread for too long when dealing with e.g. a fuzzy text search (once that’s implemented) in a large document:

  • Using an iterable instead of returning all matches in an array allows the user to get the first results before the search has completed, and also pause or abort the continuation of the search if desired — this is great, no need to question this I think.
  • Using an async iterable allows such a search implementation to e.g. break up work in small chunks.

In our planned modular system, with support for different selector types and different implementations for anchoring them, some implementations may want to use the asynchronous approach, while others are quick enough to run synchronously. So far, our code only uses synchronous implementations, but it exposes them as asynchronous functions so that in the future one could swap out an implementation for an async one, and we can pass functions around without needing to distinguish between sync and async ones. Unfortunately, as Javascript lacks a way to turn a resolved promise into its value synchronously, a sync implementation with an async API cannot be wrapped to make it sync again.

Now a question may be whether, for situations where the implementations are synchronous anyway, it is a burden for users to have to use the asynchronous API when it is not needed. If so, one option would be to provide a sync and an async API, much like NodeJS does for many of its functions. It may require some reorganisation/duplication in our code and documentation, but we could consider this if the async approach is a show-stopper for some (potential) users; do leave a reply if that is the case for you.
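The asymmetry is easy to see concretely: a synchronous matcher is trivially exposed through the async-iterable interface, but there is no way back. A sketch (the matcher here is hypothetical, not the Annotator API):

```javascript
// A synchronous matcher: yields the start index of every occurrence of
// `quote` in `text`.
function* findMatchesSync(text, quote) {
  let index = text.indexOf(quote);
  while (index !== -1) {
    yield index;
    index = text.indexOf(quote, index + 1);
  }
}

// Wrapping it as an async generator is a one-liner; the reverse (turning a
// resolved async iterable back into a sync one) is impossible in JavaScript.
async function* findMatchesAsync(text, quote) {
  yield* findMatchesSync(text, quote);
}
```

A caller that only has the async interface must consume results with `for await...of` even when the underlying work was synchronous, which is exactly the burden being weighed here.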

Ability to use multiple layers of annotation.

This kind of thing cannot be represented in a DOM easily, unless we have a tree and a way for each tree to just represent the index of the annotations in it.

Working in a DOM:
The cat is on the roof

Not working in a DOM:
...............

I was envisioning something similar to (not following any W3C recommendation here, just an idea)

Something is here

<annotation_layer>is</annotation_layer>
<annotation_layer>Something is here</annotation_layer>
<annotation_layer>Something is</annotation_layer>
<annotation_layer>is here</annotation_layer>

But copying the text over and over is probably not a good idea…

‘Chunking’ abstraction

In recent calls we (especially @tilgovi — so feel free to improve my description) have discussed an approach to allow text selector matching/describing implementations on other ‘document models’ than the DOM. A typical use case would be a (web) application that uses some framework (ProseMirror, React, …) to display documents, and therefore would not want the result of anchoring an annotation to be a Range object, but rather something that matches their internal representation of the document.

A discussed requirement is also that the document can be provided piecemeal and asynchronously, so that an application can try to anchor selectors on documents that are not fully available yet (or just not fully converted to text yet, think e.g. PDF.js). We have been calling such pieces of text ‘chunks’ for now.

Currently, our text quote anchoring function (in the dom package) is hard-coded to search for text quote using Range, NodeIterator, TreeWalker. When using the chunk approach, this functionality should be composed of two parts: one generic text quote anchoring function that takes a stream of Chunks of text; and one dom-to-chunk converter that uses TreeWalkers and such to present the DOM as a stream of text Chunks.

I am creating this issue to discuss what exactly a Chunk would be (a string?), and what a stream of chunks would be (an AsyncIterable<Chunk>?), and how our generalised anchoring functions interact with chunk providers (e.g. do we need an equivalent of Range, how do we pass back string offsets, …?). And also to discuss the assumptions and requirements (are we on the right track?).
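To make the questions concrete, here is one possible shape, purely as an assumption for discussion: a chunk is a plain string, the stream is an iterable of strings, and a match is reported as global character offsets that a converter could map back to its own document model. A synchronous sketch that also handles quotes spanning chunk boundaries:

```javascript
// Search for `quote` across a stream of text chunks, reporting matches as
// global character offsets. Only a small tail of previous text is carried
// over, so boundary-spanning matches are found without buffering everything.
function findQuoteInChunks(chunks, quote) {
  const matches = [];
  let tail = '';       // unsearched suffix carried between chunks
  let tailOffset = 0;  // global offset of tail[0]
  for (const chunk of chunks) {
    const text = tail + chunk;
    let index = text.indexOf(quote);
    while (index !== -1) {
      matches.push({ start: tailOffset + index, end: tailOffset + index + quote.length });
      index = text.indexOf(quote, index + 1);
    }
    // Keep the last quote.length - 1 characters: long enough for boundary
    // matches, short enough that no match is ever reported twice.
    const keep = Math.min(quote.length - 1, text.length);
    tailOffset += text.length - keep;
    tail = text.slice(text.length - keep);
  }
  return matches;
}
```

An AsyncIterable<Chunk> version would look the same with `for await`; the open questions (a Range equivalent, richer chunk objects carrying back-references) sit on top of this skeleton.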

Do not install husky git hooks when building a release tarball

I just discovered that husky will install its git hooks at the nearest git repository. I keep my dotfiles in git and was surprised to find husky hooks in my ~/.git/hooks directory, from when I was testing an Annotator release.

Here's my comment on a husky issue:

I'd like to use husky in a project that will be distributed as a tarball. I don't want husky to install hooks when users build the project.

Would it be possible to default to the old behavior of assuming that the git directory is at the same level as package.json, install no hooks if it's not found, and add a configuration switch for setting an explicit root? That would make the default safer while allowing those with different layouts the ability to still use husky.

typicode/husky#36 (comment)

I think we need to resolve this before release because I really don't want to install hooks into unknowable locations on users' machines.

error: context not equal to range's container

Hi, I'm trying to use annotator to allow generating an annotation from any arbitrary selection in a web page.

I'm having an issue with describeTextQuoteByRange from @annotator/dom.

If I modify the demo so that the context passed to describeTextQuoteByRange is anything other than the original value (selectable), it fails with the error:

Uncaught (in promise) Error: Context not equal to range's container; not implemented.
    at quote.js:48
    at Generator.next (<anonymous>)
    at step (asyncToGenerator.js:10)
    at _next (asyncToGenerator.js:25)
    at asyncToGenerator.js:32
    at new Promise (<anonymous>)
    at asyncToGenerator.js:5
    at describeTextQuoteByRange (quote.js:21)
    at HTMLDocument.onSelectionChange (mount.js:26)

I copied the approach used in the demo - I create a range using var selectableRange = document.createRange(), then select a node's contents using selectableRange.selectNodeContents(el).

I tried using document.body as el, and various other elements larger than selectable, but they all give the same error. I also tried leaving out the context key from the options object, same result.

Can you provide any insight into how this should be done?

Fragment identifier license chore

A small script should replace the pegjs invocation in the fragment-identifier prepublish step. This script would programmatically invoke pegjs and prepend the project license header.

Fuzzy text quote matching

Many annotation tools want to match a quote also when it has been modified slightly, but we have yet to implement this.

Enabling approximate/fuzzy string matching could be an option to our existing implementation (perhaps a parameter that tells how fuzzy the match should be, where 0 means exact matching); alternatively it could be exposed as a separate implementation.

A second question is whether the matcher should return information about the quality of the match, and if so, how the API would look. I suppose a match object could have an extra attribute expressing the ‘match quality’; though in case of refined/range selectors we should figure out how to propagate this information.
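As an illustration of the fuzziness-parameter idea, here is a sketch using plain Levenshtein edit distance, where maxDistance 0 degenerates to exact matching and the returned distance could serve as the ‘match quality’ attribute (all names here are hypothetical, not an Annotator API):

```javascript
// Levenshtein edit distance between two strings (dynamic programming).
function editDistance(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Slide a window over `text` and report positions whose window is within
// `maxDistance` edits of `quote`; 0 means exact matching.
function fuzzyFind(text, quote, maxDistance = 0) {
  const matches = [];
  for (let i = 0; i + quote.length <= text.length; i++) {
    const distance = editDistance(text.slice(i, i + quote.length), quote);
    if (distance <= maxDistance) matches.push({ index: i, distance });
  }
  return matches;
}
```

A real implementation would want a proper approximate-matching algorithm (e.g. Myers' bit-parallel search) so that matches of a different length than the quote are also found; the fixed-size window above only handles substitutions cleanly.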

Prior art we could borrow from:

Failing during `yarn start`

The tests reference a ../lib directory which is absent (perhaps auto-generated at runtime or something?) and it causes the following errors:

ERROR in ./packages/selector/test/index.mjs
Module not found: Error: Can't resolve '../lib' in 'C:\Users\byoung2\dev\annotator\annotator\p
ackages\selector\test'
 @ ./packages/selector/test/index.mjs 1:0-35 4:21-29
 @ ./node_modules/multi-entry-loader?include=./packages/*/test/**/*.mjs
 @ ./node_modules/mocha-loader!./node_modules/multi-entry-loader?include=./packages/*/test/**/
*.mjs
 @ multi chai/register-assert mocha-loader!multi-entry-loader?include=./packages/*/test/**/*.m
js

ERROR in ./packages/range/test/cartesian.mjs
Module not found: Error: Can't resolve '../lib/cartesian.mjs' in 'C:\Users\byoung2\dev\annotat
or\annotator\packages\range\test'
 @ ./packages/range/test/cartesian.mjs 5:0-47 48:19-26
 @ ./node_modules/multi-entry-loader?include=./packages/*/test/**/*.mjs
 @ ./node_modules/mocha-loader!./node_modules/multi-entry-loader?include=./packages/*/test/**/
*.mjs
 @ multi chai/register-assert mocha-loader!multi-entry-loader?include=./packages/*/test/**/*.m
js
i 「wdm」: Failed to compile.

Full log (containing the above) is below:

$ yarn start
yarn run v1.5.1
$ webpack-serve
i 「hot」: webpack: Compiling...
i 「hot」: WebSocket Server Listening at localhost:8081
i 「serve」: Project is running at http://localhost:8080
i 「serve」: Server URI copied to clipboard
i 「hot」: webpack: Compiling Done
× 「wdm」: Hash: 48894b03e48351b3f485
Version: webpack 4.1.1
Time: 13734ms
Built at: 2018-3-28 14:25:09
  Asset      Size  Chunks                    Chunk Names
demo.js   949 KiB    demo  [emitted]  [big]  demo
test.js  2.48 MiB    test  [emitted]  [big]  test
Entrypoint demo [big] = demo.js
Entrypoint test [big] = test.js
[./demo/index.js] 5.17 KiB {demo} [built]
[./demo/mark.js] 804 bytes {demo} [built]
[./node_modules/@babel/runtime/helpers/builtin/es6/asyncToGenerator.js] 709 bytes {test} {demo
} [built]
[./node_modules/chai/index.js] 40 bytes {test} [built]
[./node_modules/chai/register-assert.js] 38 bytes {test} [built]
[./node_modules/mocha-loader/index.js!./node_modules/multi-entry-loader/index.js?include=./pac
kages/*/test/**/*.mjs!./] ./node_modules/mocha-loader!./node_modules/multi-entry-loader?includ
e=./packages/*/test/**/*.mjs 776 bytes {test} [built]
[./node_modules/webpack-hot-client/client/index.js?69a6d200-876a-4461-8380-b5c66ef55c63] (webp
ack)-hot-client/client?69a6d200-876a-4461-8380-b5c66ef55c63 1.78 KiB {test} {demo} [built]
[./node_modules/webpack-hot-client/client/socket.js] (webpack)-hot-client/client/socket.js 1.3
7 KiB {test} {demo} [built]
[./node_modules/webpack/buildin/global.js] (webpack)/buildin/global.js 509 bytes {test} {demo}
 [built]
[./packages/dom/src/index.mjs] 1.74 KiB {demo} [built]
[./packages/fragment-identifier/src/index.mjs] 1.31 KiB {demo} [built]
   [0] multi ./demo/index.js 28 bytes {demo} [built]
   [1] multi chai/register-assert mocha-loader!multi-entry-loader?include=./packages/*/test/**
/*.mjs 40 bytes {test} [built]
   [2] multi webpack-hot-client/client?69a6d200-876a-4461-8380-b5c66ef55c63 ./demo/index.js 40
 bytes {demo} [built]
   [3] multi webpack-hot-client/client?69a6d200-876a-4461-8380-b5c66ef55c63 chai/register-asse
rt mocha-loader!multi-entry-loader?include=./packages/*/test/**/*.mjs 52 bytes {test} [built]
    + 209 hidden modules

ERROR in ./packages/selector/test/index.mjs
Module not found: Error: Can't resolve '../lib' in 'C:\Users\byoung2\dev\annotator\annotator\p
ackages\selector\test'
 @ ./packages/selector/test/index.mjs 1:0-35 4:21-29
 @ ./node_modules/multi-entry-loader?include=./packages/*/test/**/*.mjs
 @ ./node_modules/mocha-loader!./node_modules/multi-entry-loader?include=./packages/*/test/**/
*.mjs
 @ multi chai/register-assert mocha-loader!multi-entry-loader?include=./packages/*/test/**/*.m
js

ERROR in ./packages/range/test/cartesian.mjs
Module not found: Error: Can't resolve '../lib/cartesian.mjs' in 'C:\Users\byoung2\dev\annotat
or\annotator\packages\range\test'
 @ ./packages/range/test/cartesian.mjs 5:0-47 48:19-26
 @ ./node_modules/multi-entry-loader?include=./packages/*/test/**/*.mjs
 @ ./node_modules/mocha-loader!./node_modules/multi-entry-loader?include=./packages/*/test/**/
*.mjs
 @ multi chai/register-assert mocha-loader!multi-entry-loader?include=./packages/*/test/**/*.m
js
i 「wdm」: Failed to compile.

I changed the ../lib references to ../src in that case...but it happened again in another test. I'm not sure (yet) if this was a path change that got missed in the code, or if there's something not running at runtime that should be run. 🏃‍

Build a highlighter

This does edge into the realm of UX, so I want to proceed cautiously here. However, juggling the "mixed tree" environment discussed in #21 is likely to be a near-everyday occurrence for folks building annotation tools.

If we can collectively solve that issue well, we'll all win. 😁

Range Normalization

We're currently using the range-normalize package in the describeTextQuoteByRange function. This package is useful, but I once submitted a complete rewrite that has no external dependencies: webmodules/range-normalize#2

I would like to pull that into the tree.

Build infrastructure

Set up build infrastructure for outputting ESM and CJS builds for the supported node/browser platforms.

Upstream bug in babel?

I had to edit node_modules/@babel/runtime/helpers/builtin/es6/wrapAsyncGenerator.js
and change the AsyncGenerator.js include to asyncGenerator.js

I couldn't find that in the original babel code, they seem to use a generator.

Filed a bug upstream: babel/babel#6938


Add composer.json to let PHP developers keep track of annotator.js on packagist.org

Hello!

I would like to propose a composer.json file and ask you to add your library on packagist.org. It will make it easy for PHP developers to keep track of your updates and refer to your library as a project dependency. I bet many CMS and blog developers would take advantage of this facility.

Unfortunately, all I can do is propose this file. If you agree, there are 3 more steps:

i) create packagist.org account;
ii) submit your package and
iii) wire github webhooks in packagist

I would love to have your library on packagist.org

Explore WorkerDOM

WorkerDOM is part of the growing list of AMP Project open source and potentially standardized outputs.

This one in particular could be useful for highlighting implementations:

Use Cases:

  1. Embedded content from a third party living side by side with first party code.
  2. Mitigation of expensive rendering for content not requiring synchronous updates to user actions.
  3. Retaining main thread availability for high priority updates by async updating elsewhere in a document.

Firefox selection weirdness in the demo

When I select content in the left-hand box in the demo, the selection jumps outside of the box and sometimes leaves earlier selections in place. Double clicking a word works fine, however.

Also, if I double click near the , or . in the sentence the highlighting happens within the scope of the whole page--not just within the right-hand side box.

Chrome does not seem to have these issues.

This trouble is in Firefox Dev Edition 57.0b12 (64-bit).

Create tests for dom package

I made a first small step in the dom-tests branch. Not yet sure what approach to take, but I thought to just start with some tests for text quote anchoring on small, artificial html documents, to test for various kinds of edge cases.

A technical hurdle is that to test things in Node, we would need to run the code inside e.g. jsdom; does anybody know a simple way to do this? Something like jsdom-global seems convenient, but that module looks outdated. jest provides a jsdom environment by default, but we’d have to switch everything from mocha to jest.

Currently I run tests in the browser (using yarn start), which works but I wonder how to get more useful output messages (it would not give me inlineDiffs). Help welcome.

yarn install errors on a clean checkout

I just did a clean checkout, and when I run yarn (or yarn install), I get this:

yarn install v1.2.1
error An unexpected error occurred: "patterns.map is not a function".
info If you think this is a bug, please open a bug report with the information provided in "C:\\Users\\user-name\\dev\\annotator\\incubator-annotator\\yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

Yarn version: 1.2.1
Node version: 8.8.1
Platform: win32 x64

Trace:
  TypeError: patterns.map is not a function
      at C:\Program Files (x86)\Yarn\lib\cli.js:53963:78
      at Generator.next (<anonymous>)
      at step (C:\Program Files (x86)\Yarn\lib\cli.js:92:30)
      at C:\Program Files (x86)\Yarn\lib\cli.js:110:14
      at new Promise (<anonymous>)
      at new F (C:\Program Files (x86)\Yarn\lib\cli.js:29382:28)
      at C:\Program Files (x86)\Yarn\lib\cli.js:89:12
      at Config.resolveWorkspaces (C:\Program Files (x86)\Yarn\lib\cli.js:53999:7)
      at C:\Program Files (x86)\Yarn\lib\cli.js:22025:49
      at Generator.next (<anonymous>)

Incorporate Web Annotation testing tools

This will ease testing (by us or others) of the Web Annotation JSON documents that we (and others) pass to the client libraries we're currently building.

There are several repos which contain code we can use:

Web Annotation Data Model

Web Annotation Protocol

Web Annotation Vocab

All of these are licensed such that we can incorporate them into this code. The command line protocol tester (which uses mocha.js) is probably the thing of most interest for us--especially if we expand it to use the JSON Schema's which are used for W3C validation.

Add support for ARIA Annotations output

I'm contributing to a draft of a spec called ARIA Annotations. It will be going up for review within the ARIA WG fairly soon. I also wanted to surface it here because implementing it could help a wide range of folks.

The core (if you're familiar with ARIA) leans on the aria-details property and then provides some clarifying roles (which mostly have Web Annotation Data Model analogues) to differentiate annotations from each other (for easier find-ability, potentially adding additional UX affordances, etc).

Here's an example:

<p><mark aria-details="comment-123">The proposal</mark> contains...</p>
...
<div id="comment-123" role="annotation-commentary">This proposal is great!</div>

It should be fairly simple for us to make annotations (which have annotation bodies) be expressible this way. There's also a bonus value (for us) when managing interleaving highlights--we can use a single aria-details value across multiple DOM nodes which can help us curate (edit/remove) annotations which cross DOM boundaries as if they were a single whole (see #47 for explanation of that issue).

So, an annotation crossing DOM boundaries might look like this:

<p><mark aria-details="comment-123">The</mark> sort of OK <mark aria-details="comment-123">proposal</mark> contains...</p>
...
<div id="comment-123" role="annotation-commentary">I think it's more than OK! I think the proposal is great!</div>

Now, the two <mark> tags could be treated, conceptually (and technically, by Apache Annotator), as a single whole. The overhead, though, is that using aria-details requires a DOM element to point at (i.e. the annotation body), so it doesn't cover simple highlighting usage (because those have no annotation body...).

Would love thoughts and ideas for implementing this (noting, though, that we don't yet have an "annotation body" API...). 😃
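Grouping the pieces of a split annotation would then mostly be a matter of indexing highlight elements by their aria-details value. Sketched over plain objects so it stands alone; in a browser the input would come from document.querySelectorAll('[aria-details]') and the attribute would be read with getAttribute('aria-details'):

```javascript
// Group highlight elements by the annotation body they point at, so a split
// annotation (several <mark>s sharing one aria-details value) can be styled,
// removed, or wired to events as a single unit.
function groupByDetails(elements) {
  const groups = new Map();
  for (const el of elements) {
    const id = el.ariaDetails; // stand-in for el.getAttribute('aria-details')
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(el);
  }
  return groups;
}
```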

Add apache-annotator meta-package

In order to follow suggestions for releasing npm packages as apache-<project> without violating the spirit of producing convenience packages from released source code, I copied, unmodified, the two packages (selector and dom) into subdirectories of a new apache-annotator package and published that to GitHub with an abridged README.

If we complete #89, I think it should be safe to publish the individual packages to the annotator scope on npm because the individual packages will all make clear that they belong to apache-annotator. However, even leaving aside policy motivations it is a great convenience to users if we have a meta-package, I think, especially while we are still experimenting with our internal code organization.

Let's check in an apache-annotator meta-package with a clear README and have it depend on the other packages. That will get us in a place to publish directly and simply with lerna for the next release.

Adding the demo to the site

Right now the demo in this repo is as much a dev environment as it is a demo--which is great! 😸

However, I'd also like to package up the demo code (occasionally) and add it to the apache/incubator-annotator-website repo.

I'm new to webpack, so untangling the dev/build environment bits is what I need guidance in.

Thanks!
🎩

Document Identity Determination?

Curious to get thoughts from everyone on whether having document identification determination code would be useful for this project.

By "document identification determination" I mean the process of sorting out which one (or more!) identifiers should be stored as the target.

For instance:

GET /?utm_source=twitter&utm_medium=social
Host: http://example.com/
<html>
<head>
  <base href="http://cdn.example.com/">
  <link rel="canonical" href="http://www.example.com/">
  <link rel="latest-version" href="index.html">
  <link rel="working-copy" href="newer.html">
  <link rel="ogp:url" href="https://www.example.com/">
  <link rel="schema:url" href="https://www.example.com/index.html">
</head>

The ?utm_-prefixed query params are typical marketing-bot tracking thingies.
The canonical rel is from https://tools.ietf.org/html/rfc6596
The latest-version and working-copy rel's are from https://tools.ietf.org/html/rfc5829
The ogp:url is from http://ogp.me/
The schema:url is from http://schema.org/

At some level all (or most) of these are the same (presumably 😉). However, determining their "sameness" is outside of the scope of an annotation tool (I'd reckon), but storing the right one (or more) is mandatory for the annotation to make sense.

What I'm wondering is if we should provide a basic retrieval mechanism for determining the existence and potential value of them to the annotation. At the very least it would be handy to get back a list of all stated identifiers for the current document.
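One small, mechanical piece of such a mechanism is normalizing away known tracking parameters before comparing or storing URLs. A sketch with the WHATWG URL API (the utm_ convention is from the example above; the rest of the prefix list is an assumption):

```javascript
// Strip common tracking query parameters so that otherwise-identical
// document URLs compare equal. The prefix list is illustrative only.
const TRACKING_PREFIXES = ['utm_', 'fbclid', 'gclid'];

function stripTrackingParams(url) {
  const u = new URL(url);
  for (const key of [...u.searchParams.keys()]) {
    if (TRACKING_PREFIXES.some((p) => key.startsWith(p))) {
      u.searchParams.delete(key);
    }
  }
  return u.href;
}
```

This only handles the query-string case, of course; the rel="canonical"/rel="latest-version" questions above need actual document inspection.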

Real world scenario (which I just tripped over) is W3C Editorial Draft specs with GitHub URLs (or hosted locally) have their future Technical Recommendation (TR) URLs set as the rel="canonical" (which is injected by ReSpec post-page loading). Consequently, annotating the Verifiable Claims Data Model is hampered if only the canonical URL is stored (because it's not yet hit TR).

It's that "other" part of annotation creation that's so fun. 😁

💭's?

Documentation link error

Hello,
The development documentation link in the README file seems to be broken. Could you look into it?

Also, is there any document which describes this project more clearly?

Thanks

Support TextPositionSelector (in the dom package)

Following its specification.

Although it looks simple, there may be challenges in ensuring we count characters correctly. From the spec (in the TextQuoteSelector section, but that is then referred to by the TextPositionSelector section):

The selection of the text MUST be in terms of unicode code points (the "character number"), not in terms of code units (that number expressed using a selected data type). Selections SHOULD NOT start or end in the middle of a grapheme cluster. The selection MUST be based on the logical order of the text, rather than the visual order, especially for bidirectional text. For more information about the character model of text used on the web, see charmod.

The text MUST be normalized before recording in the Annotation. Thus HTML/XML tags SHOULD be removed, and character entities SHOULD be replaced with the character that they encode.

The referenced ‘charmod’ (Character Model for the WWW) has a section on string indexing that may be relevant.
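The distinction the spec draws can be seen directly in JavaScript, where String.length counts UTF-16 code units while string iteration yields code points:

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane,
// so it occupies two UTF-16 code units but is a single code point.
const text = 'a\u{1D11E}b';

const codeUnits = text.length;        // counts UTF-16 code units
const codePoints = [...text].length;  // iteration is by code point

// A TextPositionSelector offset of 2 (just after the clef) therefore does
// not equal a code-unit index of 2, which would split the surrogate pair.
console.log(codeUnits, codePoints);
```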

What still confuses me a little is what constitutes the exact text of a DOM. Given that normalisation should (why not must?) remove html tags, I suppose this assumes we deal with the source html.

What then to do with comments: are those text, or are their <!-- and --> parts to be removed? In the latter case, would the document’s total text equal the textContent of all children of the Document? (one may think document.documentElement.textContent, but that excludes whitespace and comments outside the <html> element)

Possibly more problematic, can one even access the source html accurately enough through the DOM? Might a source parser have modified whitespace, thus leading to miscounts? I am not even talking about executed scripts that may modify the DOM too, I suppose we have to disregard that scenario.

Of course there are implementations already whose approach and behaviour we could copy, but it may be good to do the exercise of implementing based on the spec to ensure that it matches up, also to help detect discrepancies between implementations and spots where the spec may need to be improved/updated.

Any differences in implementations would likely result in misanchored annotations, so doing this imprecisely seems of little value; unless the use is explicitly limited to only apply to e.g. selector refinement within text nodes, which could be a strategy to take.

@tilgovi (or others): what are your thoughts about this, and about the implementation as it is done in dom-anchor-text-position, in Hypothesis, or elsewhere?

Multiple packages have conflicting node version dependency

I was unable to proceed further than yarn install. I'm interested in contributing to the project but unable to set up the environment. I tried node 6.11.5 and yarn 1.5.1.

error @shellscape/[email protected]: The engine "node" is incompatible with this module. Expected version ">= 7.6.0".
error An unexpected error occurred: "Found incompatible module".

And when I try with node 7.6.0, I get the error below:
screenshot 25

LICENSE and NOTICE in packages

If packages are to be published to NPM, I think they should each have their own copy of LICENSE and NOTICE. It's possible we should concatenate all the NOTICE files into the root NOTICE file when preparing the "official" Apache release, or maintain such an aggregation by hand, but I think we want both.

  • When we publish to NPM, I want individual packages to have licenses and notices for their constituent code.

  • When we publish a source release of the whole repo, I want a LICENSE and NOTICE file that conforms to ASF expectations.

Web Annotation Protocol Server

This has come up a few times, but I'd like us to discuss building a Web Annotation Protocol server implementation as part of this project.

It was part of the original project proposal.

The application mention in the Proposal was MangoServer which is JavaScript-based (required MongoDB), but hasn't really been maintained since 2017--not a blocker, per se, but something to consider.

The Web Annotation Protocol, based on the Linked Data Platform (LDP) is fairly lightweight and ostensibly could be built upon existing LDP code.

A handful of LDP implementations exist in and around Apache land:

It would also be very possible to build an implementation that uses a JSON document database (like Apache CouchDB).

If we ignore authentication--pushing that to a different layer of the application--then creating this server code should be fairly minimal work, and complete our "stack" for those wanting to do front-to-back annotation stuff.

Anyone interested? 😁
