fortext / catma
Computer Assisted Text Markup and Analysis
Home Page: https://www.catma.de
License: GNU General Public License v3.0
This needs some discussion of what the optimal solution could look like.
See issue #48
Some proposals:
Depending on the order in which tags are marked visible, shorter tags sometimes disrupt longer tags. Longer tags should be inserted into the DOM before shorter tags. The problem is that this involves temporarily removing and reinserting tags, and browser performance for this is already poor.
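A minimal sketch of the ordering rule, assuming visible tag references are modeled as half-open character ranges; the `TagRange` class and `insertion_order` function are hypothetical and not part of CATMA's code base:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRange:
    """Hypothetical model of a visible tag reference covering characters [start, end)."""
    tag_id: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def insertion_order(ranges):
    """Order ranges so that longer ones are inserted into the DOM first.

    Ties are broken by start offset so the order is deterministic.
    """
    return sorted(ranges, key=lambda r: (-r.length, r.start))
```

Inserting in this order means a shorter tag is always nested inside markup that already exists, instead of forcing an existing longer tag to be split and reinserted.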
The handbook should be accessible from within the program.
Especially when the page zoom is set to 100% and many tags are visualized at once, the browser has performance problems manipulating the DOM. For bulk operations, a server-side computation, or performing the manipulation on a copy of the DOM and then replacing the whole DOM with that copy, could be a solution.
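One way to do the bulk computation on the server is to split the text at every tag boundary, so that each segment is covered by a constant set of tags, and to send the finished markup to the browser for a single replacement. The sketch below assumes tags arrive as `(tag_id, start, end)` tuples and uses a hypothetical `data-tags` attribute; it is quadratic in the worst case and only illustrates the idea:

```python
import html


def segment(text_length, ranges):
    """Split [0, text_length) into segments that are each covered by a constant set of tags.

    ranges: (tag_id, start, end) tuples with 0 <= start < end <= text_length.
    Returns a list of (start, end, frozenset of tag_ids).
    """
    ranges = list(ranges)
    # Every tag start and end is a potential segment boundary.
    boundaries = {0, text_length}
    for _, start, end in ranges:
        boundaries.update((start, end))
    points = sorted(boundaries)
    segments = []
    for seg_start, seg_end in zip(points, points[1:]):
        covering = frozenset(
            tag_id for tag_id, start, end in ranges
            if start <= seg_start and seg_end <= end
        )
        segments.append((seg_start, seg_end, covering))
    return segments


def render_html(text, ranges):
    """Build the complete markup in one pass so the client replaces the DOM only once."""
    parts = []
    for start, end, tags in segment(len(text), ranges):
        chunk = html.escape(text[start:end])
        if tags:
            tag_attr = html.escape(" ".join(sorted(tags)))
            parts.append(f'<span data-tags="{tag_attr}">{chunk}</span>')
        else:
            parts.append(chunk)
    return "".join(parts)
```

Because overlapping tags become flat, non-nested segments, this also sidesteps the insertion-order problem for shorter and longer tags.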
The menu has the disadvantage that it is easily overlapped by other windows. It would be nice to have some kind of quickly accessible mission control center that gives a bird's-eye view of the CATMA desktop.
Allow the analysis of multiple Source Documents by multi-selection in the Document Manager. This amounts to creating an ad hoc, temporary Corpus.
Compute a collocation analysis similar to what CATMA 3 offered.
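The exact measure CATMA 3 used is not stated here. As an assumption, the sketch below only counts which tokens occur within a fixed window (`span`) around each occurrence of a node word; association scores such as mutual information are typically computed from counts like these:

```python
from collections import Counter


def collocates(tokens, node, span=5):
    """Count tokens occurring within `span` positions left or right of each occurrence of `node`.

    Overlapping windows are not merged, so a token near two occurrences of the node
    is counted once per occurrence.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts
```

For example, `collocates("the cat sat on the mat".split(), "cat", span=1)` counts "the" and "sat" once each.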
The timestamp of each tagging operation should be recorded as a system property.
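A minimal sketch, assuming system properties are kept as a name-to-value mapping and the timestamp is stored as an ISO 8601 string in UTC; the property name `catma_markuptimestamp` and the function `with_tagging_timestamp` are hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical name for the system property; CATMA's actual naming may differ.
TIMESTAMP_PROPERTY = "catma_markuptimestamp"


def with_tagging_timestamp(system_properties, now=None):
    """Return a copy of a tag instance's system properties with the tagging time recorded.

    The time is converted to UTC and stored as an ISO 8601 string. Pass a
    timezone-aware `now`; a naive datetime would be interpreted as local time.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    updated = dict(system_properties)
    updated[TIMESTAMP_PROPERTY] = moment.astimezone(timezone.utc).isoformat()
    return updated
```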
Consider displaying values as a comma-separated list together with the property name. That would make it easier to check the property values of tag instances. See issue #47.
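A sketch of the proposed display, assuming a hypothetical mapping of property names to lists of values:

```python
def format_properties(properties):
    """Render each property as 'name: value1, value2', one property per line.

    properties: hypothetical mapping of property name -> list of string values.
    """
    return "\n".join(f"{name}: {', '.join(values)}" for name, values in properties.items())
```

For a tag instance with `{"mood": ["sad", "angry"], "speaker": ["Anna"]}` this yields the two lines `mood: sad, angry` and `speaker: Anna`.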
openID.org, Yahoo, ...
This is difficult. Right now, when untagging search results, we use the markup of the results to do the untagging. Phrase results carry no markup information, so it would have to be gathered afterwards from the relevant User Markup Collections. The next problem is that phrase results do not have to match tagged phrases exactly. So do we remove the tag from the phrase only, which would involve splitting the tag, or do we remove the whole tag?
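If the tag is to be split rather than removed entirely, the phrase range has to be subtracted from the character ranges of the tag instance. A minimal sketch, assuming half-open `(start, end)` ranges; `subtract_range` is a hypothetical helper. A tag instance covered completely by the phrase ends up with no ranges (the whole tag is removed), while a partially covered one is split:

```python
def subtract_range(tag_ranges, phrase):
    """Remove the phrase range from a tag instance's ranges, splitting ranges where necessary.

    tag_ranges: list of (start, end) half-open character ranges of one tag instance.
    phrase: (start, end) of the phrase result to untag.
    """
    p_start, p_end = phrase
    result = []
    for start, end in tag_ranges:
        if end <= p_start or start >= p_end:
            result.append((start, end))  # no overlap: keep unchanged
            continue
        if start < p_start:
            result.append((start, p_start))  # keep the part before the phrase
        if end > p_end:
            result.append((p_end, end))  # keep the part after the phrase
    return result
```

Splitting a continuous tag this way turns it into a discontinuous one, which is the trade-off raised in the question above.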
Link the distribution chart with the original Source Documents.
In principle, user identification should be case sensitive. However, we use email addresses for identification, and those are usually case insensitive...
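RFC 5321 allows the local part of an address to be case sensitive, but in practice most providers treat it case-insensitively, so normalizing addresses before comparing them is a common compromise. A sketch with a hypothetical in-memory `UserRegistry`; a real implementation would apply the same normalization in the database query or a unique index:

```python
def normalize_user_id(email: str) -> str:
    """Normalize an email address for case-insensitive comparison."""
    return email.strip().casefold()


class UserRegistry:
    """Hypothetical in-memory registry keyed by the normalized address."""

    def __init__(self):
        self._users = {}  # normalized address -> address as originally entered

    def register(self, email: str) -> None:
        key = normalize_user_id(email)
        if key in self._users:
            raise ValueError(f"a user with this address already exists: {self._users[key]}")
        self._users[key] = email.strip()

    def lookup(self, email: str):
        """Return the stored address, or None if no user matches."""
        return self._users.get(normalize_user_id(email))
```

Keeping the address as originally entered preserves it for display and mail delivery, while the normalized key prevents duplicate accounts that differ only in case.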
When restricting the analysis to a subset of documents, the property query doesn't respect that.
More export formats like OWL and HTML5 would be nice.
Index options are stored and used for indexing and during search operations by the Analyzer module. When analyzing corpora different Source Documents of a Corpus can have different index options, up to one for each Source Document. So what's needed is the possibility for the user to select the index options to be used for a specific Analyzer session and the possibility to change the index options and perform a reindex. The latter is important because in the end you would want to give all the Source Documents in a Corpus the same index options to be able to combine the search results.
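A sketch of the check an Analyzer session could perform before combining results, assuming index options can be compared for equality; the `IndexOptions` fields shown are hypothetical placeholders, not CATMA's actual options:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexOptions:
    # Hypothetical placeholder fields, not CATMA's actual index options.
    locale: str
    case_sensitive: bool


def common_options(doc_options):
    """Return the index options shared by all documents, or None if they differ (or there are none)."""
    distinct = set(doc_options.values())
    return next(iter(distinct)) if len(distinct) == 1 else None


def documents_needing_reindex(doc_options, chosen):
    """IDs of documents that must be reindexed with `chosen` before results can be combined."""
    return sorted(doc_id for doc_id, options in doc_options.items() if options != chosen)
```

If `common_options` returns None, the session could offer the user a choice of options and list the documents that `documents_needing_reindex` reports for that choice.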
Needs verification, as this could not be reproduced in the development system.
Shared state should be recorded in a database flag.
Items need a different rendering in the GUI.
The algorithm for the distribution chart and the chart itself are not well suited for corpora. This is partly due to problems comparing word frequencies of texts of different lengths.
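A common way to compare word frequencies across texts of different lengths is to normalize raw counts to a relative frequency, for example occurrences per 10,000 tokens. This sketch shows only that normalization and does not address the chart itself:

```python
def relative_frequency(count, total_tokens, per=10_000):
    """Occurrences per `per` tokens, so texts of different lengths can be compared."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return count * per / total_tokens
```

For instance, 30 occurrences in a 60,000-token text and 10 occurrences in a 10,000-token text give 5.0 and 10.0 per 10,000 tokens, so the shorter text uses the word twice as densely even though its raw count is lower.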
This is rather difficult. As a temporary solution, we added the instance ID column to the list of Tag Instances so that they can at least be distinguished by clicking on them.
Maybe border-radius could be an easy solution. It doesn't help with discontinuous tags, though: spotting the parts that belong to a tag instance with discontinuous tag references is difficult for the user.
Export the metadata of the Source Document as a cite-link. This would probably be a nice feature for other metadata, too.
A copy/paste functionality for tag and property nodes.
This involves turning on background processing for certain long-running processes. The current background processing level is too deep and error-prone.
Should open by double-clicking on the property name.
Additionally, open all properties for editing by double-clicking on a tag instance.
The division into chunks for the distribution chart could be enriched by computing page counts from the percent-based chunks and the page total of the Source Document. If the user knows the page total, it can be captured when the document is added.
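A sketch of the page computation, assuming the text is spread evenly across pages; `chunk_page_ranges` is hypothetical, and a chunk boundary that falls inside a page makes that page appear in both neighboring chunks:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def chunk_page_ranges(page_total: int, chunk_count: int):
    """Map equal, percent-based chunks of a document to approximate 1-based page ranges.

    Chunk i covers the fraction [i/chunk_count, (i+1)/chunk_count) of the text, so it
    starts on page floor(i*page_total/chunk_count) + 1 and ends on page
    ceil((i+1)*page_total/chunk_count). Integer arithmetic avoids rounding errors.
    """
    if page_total < 1 or chunk_count < 1:
        raise ValueError("page_total and chunk_count must be positive")
    ranges = []
    for i in range(chunk_count):
        first = i * page_total // chunk_count + 1
        last = max(first, ceil_div((i + 1) * page_total, chunk_count))
        ranges.append((first, last))
    return ranges
```

With a page total of 200 and 10 chunks, the first chunk maps to pages 1 to 20 and the last to pages 181 to 200.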
CATMA 3 could generate basic Static Markup in the form of TEI-encoded paragraphs. We would like to improve the generation of this simple Static Markup. Additionally, we want to support pure XML documents as Source Documents, where the XML elements are loaded as Static Markup. Static Markup will be indexed, and a query will let the user search for it. Taking a simple TEI-encoded document as an example: let's assume there are TEI-encoded chapters with
The "Result by markup" tab could show the Markup that is attached to the results of the "Result by phrase" tab.
The database repo already supports sharing. The GUI and a notification mechanism are missing.
Always bring the "Result by phrase" tab to the front when returning from a non-Tag-based query. Remaining in the "Result by markup" tab gives the wrong impression that there are no results.
For consistency across the Query Builder wizard pages, and to avoid confusion, show preview results by button click only and remove the actions that some pages trigger when focus leaves a field.
When tagging KWIC results, the Analyzer asks for the destination User Markup Collection. A "remember my decision" mechanism for the Analyzer session would be convenient.
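A minimal sketch of such a mechanism, assuming the dialog is a callback that returns the chosen collection and whether to remember it; `AnalyzerSession` and its methods are hypothetical:

```python
class AnalyzerSession:
    """Hypothetical Analyzer session that can remember the destination User Markup Collection."""

    def __init__(self, ask_user):
        # ask_user: callback showing the dialog, returns (collection_id, remember_flag)
        self._ask_user = ask_user
        self._remembered = None

    def target_collection(self):
        """Return the remembered collection, or ask the user and optionally remember the answer."""
        if self._remembered is not None:
            return self._remembered
        collection_id, remember = self._ask_user()
        if remember:
            self._remembered = collection_id
        return collection_id

    def forget(self):
        """Clear the remembered decision, e.g. when the session is closed."""
        self._remembered = None
```

Scoping the decision to the session object means it disappears automatically when the Analyzer is closed, so a stale choice never carries over into unrelated work.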