fortext / catma

Computer Assisted Text Markup and Analysis

Home Page: https://www.catma.de

License: GNU General Public License v3.0

CSS 12.26% HTML 0.05% GAP 0.70% Java 86.00% XSLT 0.02% SCSS 0.97%

java webapp annotations text-markup text-analysis digital-humanities

catma's People

Contributors

Stargazers

Watchers

Forkers

evelyn-gius alexalias mathias-goebel hagerhaf dondealban mariapapadopoulou pcdi lenguyenphuc2003 fpasquinisantos

catma's Issues

Tag Manager/Tagger: the way we display properties and their values seems to be suboptimal

This needs some discussion of what the optimal solution could look like.
See issue #48
Some proposals:

decrease indentation of property nodes (and possibly of tag nodes as well)
do not show properties as tree nodes at all, to prevent confusing them with tag nodes

Visualizations: editable labels (especially for export purposes)

Tagger: prefer to display tags in a discontinuous manner

Depending on the order in which tags are marked visible, shorter tags sometimes disrupt longer tags. Longer tags should be inserted into the DOM before shorter tags. The problem is that this involves temporary removal and reinsertion of tags, and browser performance for this is already poor.
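The ordering step can be sketched independently of the DOM work. This is a minimal illustration only, not CATMA's actual code: the `TagRange` type and the `longestFirst` method are hypothetical names, and tags are assumed to be plain character ranges.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TagInsertionOrder {

    /** Hypothetical value type: a tagged character range [start, end). */
    public record TagRange(String tagName, int start, int end) {
        public int length() {
            return end - start;
        }
    }

    /** Returns a copy sorted longest-first; ties are broken by start offset. */
    public static List<TagRange> longestFirst(List<TagRange> ranges) {
        List<TagRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(TagRange::length).reversed()
                .thenComparingInt(TagRange::start));
        return sorted;
    }

    public static void main(String[] args) {
        List<TagRange> visible = List.of(
                new TagRange("short", 5, 8),
                new TagRange("long", 0, 20));
        // Insert into the DOM in this order so that longer tags come first.
        longestFirst(visible).forEach(System.out::println);
    }
}
```

Sorting once before a bulk insertion would avoid some of the removal and reinsertion, but it does not help when a single longer tag is made visible after shorter ones are already in the DOM.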

Integrate the new handbook as HTML

The handbook should be accessible from within the program.

Visualizations: predictable order of documents in distribution graph

sharing: readonly taglibraries are writable

Tagger: possibility to close User Markup Collections

Tagger: performance problems when visualizing tags

Especially when the page zoom is set to 100% and many tags are visualized at once, the browser has performance problems while manipulating the DOM. For bulk operations, a server-side computation could be a solution. Another option is to perform the manipulation on a copy of the DOM and then replace the whole DOM with that copy.

sharing: share changes of corpora as well

add a mission control center

The menu has the disadvantage that it is easily overlapped by other windows. It would be nice to have some kind of quickly accessible mission control center that gives a bird's-eye view of the CATMA desktop.

Allow multiple selection in Document Manager

Allow the analysis of multiple Source Documents by multi-selection in the Document Manager. This is something like an ad hoc, temporary Corpus creation.

Collocation analysis

Compute a collocation analysis similar to what CATMA 3 offered.
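As a starting point for discussion, here is a minimal sketch of the simplest form of collocation counting: raw co-occurrence counts within a fixed token window around a keyword. How CATMA 3 computed its collocations is not described here, so the class name, the method signature and the window-based approach are all assumptions, and no statistical significance measure is included.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollocationSketch {

    /**
     * Counts how often each token occurs within `span` tokens to the left
     * or right of an occurrence of `keyword`. Hypothetical helper.
     */
    public static Map<String, Integer> collocates(List<String> tokens, String keyword, int span) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(keyword)) {
                continue;
            }
            int from = Math.max(0, i - span);
            int to = Math.min(tokens.size() - 1, i + span);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("true", "love", "never", "dies", "love", "wins");
        System.out.println(collocates(tokens, "love", 2));
    }
}
```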

sharing: allow resharing from readable to writable

Tagger: add a system property "catma_tagtime"

The timestamp of each tagging operation should be recorded with a system property.

Proper internationalization of the user interface

corpus: prevent adding the same sourcedocument twice

sharing: when sharing corpora sourcedocuments must be handled independently from UMCs

Tagger: property value display

Consider displaying values as a comma-separated list together with the property name. That would make it easier to check the property values of tag instances. See issue #47.

Add more openID providers to the login dialog

openID.org, Yahoo, ...

Tagger: do not break lines before a whitespace

Analyzer: untag phrase results

This is difficult. Right now, when untagging search results, we use the markup of the results to do the untagging. With phrase results there is no markup information. The markup information would need to be gathered afterwards from the relevant User Markup Collections. The next problem is that the phrase results do not have to match tagged phrases exactly. So do we remove the tag from the phrase only? This would involve tag splitting. Or do we remove the whole tag?
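To make the "tag splitting" option concrete, here is a minimal sketch that removes a phrase range from a tagged range and returns what is left of the tag. The `Range` type and the `subtract` method are hypothetical, and ranges are assumed to be half-open character offsets [start, end).

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitting {

    /** Hypothetical half-open character range [start, end). */
    public record Range(int start, int end) {}

    /**
     * Removes `phrase` from `tag` and returns the remaining parts of the tag:
     * zero parts if the phrase covers the tag, one if it cuts off an end,
     * two if it lies strictly inside the tag.
     */
    public static List<Range> subtract(Range tag, Range phrase) {
        List<Range> remaining = new ArrayList<>();
        boolean noOverlap = phrase.end() <= tag.start() || phrase.start() >= tag.end();
        if (noOverlap) {
            remaining.add(tag);
            return remaining;
        }
        if (phrase.start() > tag.start()) {
            remaining.add(new Range(tag.start(), phrase.start()));
        }
        if (phrase.end() < tag.end()) {
            remaining.add(new Range(phrase.end(), tag.end()));
        }
        return remaining;
    }

    public static void main(String[] args) {
        // A phrase inside a longer tag splits the tag into two parts.
        System.out.println(subtract(new Range(0, 20), new Range(5, 10)));
    }
}
```

Removing the whole tag instead corresponds to discarding the tag whenever the ranges overlap at all, which is simpler but loses the markup outside the phrase.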

Add code documentation

Analyzer: default sort order of results should be "by frequency descending"

Visualizer: clickable distribution charts

Link the distribution chart with the original Source Documents.

sharing: make user identification case insensitive

In principle, user identification should be case sensitive, but we use email addresses for identification, and those are usually case insensitive.
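One possible approach is to normalize the identifier before comparing or storing it. The sketch below is an assumption about how this could be done, not a description of the existing code; `UserIdentifiers`, `normalize` and `sameUser` are hypothetical names. It lowercases the whole address, which treats the local part as case insensitive as well.

```java
import java.util.Locale;

public class UserIdentifiers {

    /** Trims and lowercases an email address used as user identifier. */
    public static String normalize(String email) {
        // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
        return email.trim().toLowerCase(Locale.ROOT);
    }

    /** Compares two identifiers after normalization. */
    public static boolean sameUser(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameUser("Jane.Doe@Example.org", "jane.doe@example.org"));
    }
}
```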

Tagger: possibility to remove Tagsets from the "Active Tagsets" tab

Analyzer: the property query does not respect the restriction on certain documents

When restricting the analysis to a subset of documents, the property query doesn't respect that.

Markup export formats

More export formats like OWL and HTML5 would be nice.

Corpus support with zip uploads

Full support for index options

Index options are stored and used for indexing and during search operations by the Analyzer module. When analyzing corpora different Source Documents of a Corpus can have different index options, up to one for each Source Document. So what's needed is the possibility for the user to select the index options to be used for a specific Analyzer session and the possibility to change the index options and perform a reindex. The latter is important because in the end you would want to give all the Source Documents in a Corpus the same index options to be able to combine the search results.

Visualizer: display of additional results to existing tab failed

Needs verification, as this could not be reproduced on the development system.

sharing: mark shared items as such

shared state should be recorded in a database flag
items need a different rendering in GUI

Visualizer: better corpus support

The algorithm for the distribution chart and the chart itself is not well suited for corpora. This is partly due to problems with the comparison of word frequencies of texts of different lengths.
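A common way to make frequencies comparable across texts of different lengths is to report them relative to text length. The following sketch shows that normalization only; it is not the chart algorithm, and the class name, method name and the per-10,000-tokens scale are assumptions.

```java
public class RelativeFrequency {

    /** Occurrences per 10,000 tokens of the containing text. */
    public static double perTenThousand(long occurrences, long totalTokens) {
        if (totalTokens <= 0) {
            throw new IllegalArgumentException("totalTokens must be positive");
        }
        return occurrences * 10_000.0 / totalTokens;
    }

    public static void main(String[] args) {
        // 50 hits in a 100,000-token text and 5 hits in a 10,000-token text
        // have the same relative frequency.
        System.out.println(perTenThousand(50, 100_000));
        System.out.println(perTenThousand(5, 10_000));
    }
}
```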

Tagger: visual separation of adjacent tag instances of the same tag definition

This is rather difficult. As a temporary solution, we added the instance ID column to the list of Tag Instances so that they can at least be distinguished by clicking on them.
Maybe border-radius could be an easy solution. It doesn't help with discontinuous tags, though: spotting the parts that belong to a tag instance with discontinuous tag references is difficult for the user.

export Source Document meta data as cite-link

Export the meta data of the Source Document as a cite-link. This is probably a nice feature for other meta data, too.

Ability to move/copy tags and properties between tagsets/tags

A copy/paste functionality for tag and property nodes.

Better feedback about what's going on while the backend is working

This involves turning on background processing for certain long-running processes. The current background processing level is too deep and error-prone.

update all user markup collections in a corpus by dragging a tagset onto the corpus

Analyzer: integrate the property query into the Query Builder

Visualizer: make distribution charts exportable

Tagger: edit property value dialog

should open on double-click on the property name
additionally, open all properties for editing on double-click on a tag instance

Visualizations: capture page total and use it for distribution chart

The division into chunks for the distribution chart could be enriched by computing page counts from the percent-based chunks and the page total of the Source Document. If known by the user the page total can be captured on document addition.
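The arithmetic could look roughly like the sketch below. It assumes that pages are numbered from 1 and that text is spread evenly across pages, which is only an approximation; `PageEstimate` and `pageForPercent` are hypothetical names.

```java
public class PageEstimate {

    /**
     * Estimates the page on which the given position (in percent of the
     * document, 0 to 100) falls, for a document with `pageTotal` pages.
     */
    public static int pageForPercent(double percent, int pageTotal) {
        if (percent < 0 || percent > 100 || pageTotal < 1) {
            throw new IllegalArgumentException("percent must be in 0..100 and pageTotal >= 1");
        }
        int page = (int) Math.floor(percent * pageTotal / 100.0) + 1;
        // A position of exactly 100% belongs to the last page.
        return Math.min(page, pageTotal);
    }

    public static void main(String[] args) {
        // Start pages of 10% chunks for a 200-page document.
        for (int percent = 0; percent < 100; percent += 10) {
            System.out.println(percent + "% starts on page " + pageForPercent(percent, 200));
        }
    }
}
```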

Load and index static markup

CATMA 3 could generate basic Static Markup in the form of TEI encoded paragraphs. We would like to improve the generation of this simple Static Markup. Additionally, we want to support pure XML documents as Source Documents, where the XML elements are loaded as Static Markup. Static Markup will be indexed, and there will be a query that lets the user search for it. Taking a simple TEI encoded document as an example: let's assume there are TEI encoded chapters with sections. A user could then search for all occurrences of the word "love" in the first chapter with "love" where static="div[@n=1 and @type=chapter]" (the syntax of the static query is not specified yet). This would work with any XML, not just TEI.
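Since the static query syntax is not specified yet, the following is only a sketch of the idea using the standard Java XPath API on an unindexed document. It assumes XML without namespaces (real TEI documents declare a namespace, which would need a namespace context), and `StaticMarkupQuery` and `countWordIn` are hypothetical names.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StaticMarkupQuery {

    /** Counts occurrences of `word` in the text of the first element matching `xpathExpr`. */
    public static int countWordIn(String xml, String xpathExpr, String word) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODE);
            if (node == null) {
                return 0;
            }
            String wanted = word.toLowerCase(Locale.ROOT);
            int count = 0;
            // Naive tokenization on non-word characters; a real index would differ.
            for (String token : node.getTextContent().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(wanted)) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate static markup query", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<text>"
                + "<div n=\"1\" type=\"chapter\">Love is love.</div>"
                + "<div n=\"2\" type=\"chapter\">No love here.</div>"
                + "</text>";
        System.out.println(countWordIn(xml, "//div[@n=1 and @type='chapter']", "love"));
    }
}
```

Note that in XPath the attribute value must be quoted (`@type='chapter'`), so the final static query syntax may need to differ from the unquoted form in the example above.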