fraunhoferisst / trend

Traceability Enforcement of Datatransfers (TREND)

Home Page: https://fraunhoferisst.github.io/TREND/

License: Other

Kotlin 99.44% Dockerfile 0.26% HTML 0.08% JavaScript 0.22%

data-sovereignty steganography watermarking information-hiding

trend's People

eschrewe gemdav

trend's Issues

Analyse and Improve Watermark Compression

🚀 Feature Request

Current Problem

Watermark compression is an optional feature of the watermarker library. Many other compression algorithms exist, some of which may perform better than the one currently used.

Proposed Solution

It should be analyzed and checked if other compression algorithms are available in scientific publications or state-of-the-art implementations that can be used.

Additional Context

Notice: Remember that the build target of the watermarker library can be set to Java or JavaScript. When implementing an external library, it must be available in Java and JavaScript.

Add Generated Watermarker Library Documentation

There should be a pipeline to create auto-generated documentation for the watermarker library. The documentation should be created based on the comments in the source code.

It should further be checked if publishing it via GitHub pages in a gh-pages branch makes sense or if there are better possibilities.

Dependabot Fails Updating kvisionVersion

🐞 Bug Report

Describe the Bug

In the webinterface, the kvisionVersion is defined twice:

As shown by #25, Dependabot only updates one of the version numbers.

To Reproduce

Steps to reproduce the behavior:

Wait for another kvisionVersion update PR from dependabot for the webinterface
See that only one version number is updated

Expected Behavior

Both version numbers should be updated.

System Information

Doesn't matter in this case.

Additional Context

./.

Add More Details to the Contributing File

🚀 Feature Request

Current Problem

Currently, this repository uses a squash and merge strategy with conventional commits and pull requests linked to issues. All of this information is missing from the CONTRIBUTING.md file.

Proposed Solution

Update the CONTRIBUTING.md file and add all relevant information besides the code style.

Additional Context

./.

Increase convenience of the StartEndSeparatorChars separator strategy

🚀 Feature Request

Current Problem

The StartEndSeparatorChars separator strategy is not very convenient to use as the users of the library have to define the start- and end-separators themselves.

TREND/watermarker/src/commonMain/kotlin/fileWatermarker/TextWatermarker.kt

Line 31 in 261b35c

 class StartEndSeparatorChars(val start: Char, val end: Char) : SeparatorStrategy() 

Proposed Solution

It would be nice to have default start- and end-separator characters defined, such that the users of the library do not have to look for promising white space characters themselves. I am thinking of something like two additional constants in the DefaultTranscoding object (there is already one for the SingleSeparatorChar strategy).

TREND/watermarker/src/commonMain/kotlin/fileWatermarker/TextWatermarker.kt

Line 56 in 261b35c

const val SEPARATOR_CHAR = '\u2004' // Three-per-em space

Possible extension:

const val SEPARATOR_CHAR = '\u2004' // Three-per-em space

const val START_SEPARATOR_CHAR = '\u2004' // Three-per-em space
const val END_SEPARATOR_CHAR = '\u2005' // Four-per-em space (or whatever replaces normal white spaces best)

That's where, in my opinion, a small refactoring of the code would also contribute to its convenience. The SingleSeparatorChar and StartEndSeparatorChars should get default arguments in their constructor in order to easily get a default instance of these classes.
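A minimal sketch of what such default arguments could look like. It assumes the constant names proposed above (START_SEPARATOR_CHAR and END_SEPARATOR_CHAR do not exist in the library yet) and uses simplified stand-ins for DefaultTranscoding and SeparatorStrategy; the property name `char` and the sealed-class layout are illustrative assumptions, and the real declarations in TextWatermarker.kt may differ.

```kotlin
// Simplified stand-in for the library's DefaultTranscoding object.
// START_SEPARATOR_CHAR and END_SEPARATOR_CHAR are the constants proposed in this issue.
object DefaultTranscoding {
    const val SEPARATOR_CHAR = '\u2004' // Three-per-em space
    const val START_SEPARATOR_CHAR = '\u2004' // Three-per-em space
    const val END_SEPARATOR_CHAR = '\u2005' // Four-per-em space
}

// Simplified stand-in for the separator strategies (layout is an assumption).
sealed class SeparatorStrategy {
    class SingleSeparatorChar(
        val char: Char = DefaultTranscoding.SEPARATOR_CHAR,
    ) : SeparatorStrategy()

    class StartEndSeparatorChars(
        val start: Char = DefaultTranscoding.START_SEPARATOR_CHAR,
        val end: Char = DefaultTranscoding.END_SEPARATOR_CHAR,
    ) : SeparatorStrategy()
}

fun main() {
    // Default instance: no need to pick white space characters by hand.
    val defaults = SeparatorStrategy.StartEndSeparatorChars()
    // Custom characters remain possible through named arguments.
    val custom = SeparatorStrategy.StartEndSeparatorChars(start = '\u2006', end = '\u2009')
    println(defaults.start == DefaultTranscoding.START_SEPARATOR_CHAR)
    println(custom.end == '\u2009')
}
```

With defaults in the constructor, existing call sites that pass explicit characters keep compiling unchanged.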

Additional Context

Why choose the `StartEndSeparatorChars` strategy at all?

Adding watermarks to a text using the default SingleSeparatorChar separator strategy is relatively inflexible in use cases like watermarking of mails or collaboratively created documents where there are multiple parties that all might want to add their own watermark: When two watermarked texts are merged, the last repetition (fragment) of the first watermark and the first repetition of the second watermark "blur" into one another, as the last repetition of the first watermark most likely does not perfectly end on its separation character.

For example when reading the watermark from this merged text ...

# Contains (complete) watermarks ["AA", "AA"] + fragment
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
# Contains (complete) watermarks ["BB", "BB"] + fragment
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.

... it does not result in ["AA", "AA", "BB", "BB"] but in ["AA", "AA", "A\t\t\u0001", "BB"]. This is due to the last repetition of the first watermark being fragmented. It does not conclude with a separation character and is thus interpreted as part of the next watermark.

Now one could propose a solution to this problem that includes removing the watermarks from both texts and re-adding them to the whole text in a combined way (e.g. "AA;BB"), but in my eyes that procedure has 2 main disadvantages:

The information on which watermark belongs to which part of the text gets lost
The watermark becomes less robust as removals of characters threaten the integrity of the watermark-combination and thus all watermarks and not only one.

That's where the StartEndSeparatorChars strategy comes in handy, as it prevents watermarks from blurring into one another in the first place.
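The difference between the two strategies can be illustrated with a toy model. This is only a sketch and not the library's encoding: the hidden channel is modelled as a plain string in which letters are payload and `|`, `<`, `>` play the role of the separator characters. The function names are hypothetical.

```kotlin
/** Single separator: every chunk terminated by [sep] counts as a complete watermark. */
fun parseSingle(channel: String, sep: Char): List<String> =
    channel.split(sep).dropLast(1) // the last chunk has no terminating separator

/** Start/end separators: only payload enclosed by [start] ... [end] counts. */
fun parseStartEnd(channel: String, start: Char, end: Char): List<String> {
    val result = mutableListOf<String>()
    var current: StringBuilder? = null
    for (c in channel) {
        when {
            c == start -> current = StringBuilder() // discards an unfinished fragment
            c == end -> {
                current?.let { result.add(it.toString()) }
                current = null
            }
            else -> current?.append(c)
        }
    }
    return result
}

fun main() {
    // Two texts, each ending in a fragment of its own watermark, merged into one.
    val single = "AA|AA|A" + "BB|BB|B"
    println(parseSingle(single, '|')) // [AA, AA, ABB, BB]

    val startEnd = "<AA><AA><A" + "<BB><BB><B"
    println(parseStartEnd(startEnd, '<', '>')) // [AA, AA, BB, BB]
}
```

In the single-separator case the trailing fragment of the first text merges with the first repetition of the second text, while the start separator lets the parser drop the unfinished fragment.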

Add Support for all Watermarker in the Webinterface

🚀 Feature Request

Current Problem

Currently, the watermarker library is able to watermark Strings / .txt files and .zip files. However, the webinterface only supports watermarking for Strings. Adding a watermark to a .zip or .txt file is impossible.

Proposed Solution

Add an upload functionality to be able to watermark all supported file types of the watermarker library in the webinterface.

Additional Context

./.

Adjust usage of Pako library once Kotlin Bug is fixed

🐞 Bug Report

Describe the Bug

The current implementation of the watermarker library has a workaround implemented to prevent a crash caused by a bug in Kotlin. This workaround should be removed once the bug is fixed in Kotlin.

Additional Context

Adjust the usage of the Pako library in jsMain/kotlin/helper/Compression.kt as follows as soon as this bug is fixed:

The functions in the external object Pako should change:
- fun deflateRaw(data: IntArray, options: Any? = definedExternally): IntArray ->
  fun deflateRaw(data: UByteArray, options: Any? = definedExternally): UByteArray
- fun inflateRaw(data: IntArray, options: Any? = definedExternally): IntArray ->
  fun inflateRaw(data: UByteArray, options: Any? = definedExternally): UByteArray
The functions Compression.{inflate, deflate} must be changed accordingly

Remove file specific watermarks

🚀 Feature Request

Current Problem

Currently the library has specific Watermark classes depending on the source of the watermark (e.g., TextWatermark for Watermarks extracted from text files) that can contain additional information (e.g., the positions of the watermarks in the text file).

These specific watermarks can lead to confusion (e.g., is a TextWatermark a watermark from a text file or a watermark from any file that contains plain text?). It also leads to naming problems (e.g., How should we name a watermark containing plain text when we already have a TextWatermark?).

Proposed Solution

Remove the file-specific watermarks. The additional information is currently not used anywhere. We can add it later in a better way if required.

Additional Context

Once this is done we can rename Textmark to TextWatermark.

Make watermarker library usable as JavaScript library

🚀 Feature Request

Current Problem

The watermarker library is usable in Java and Kotlin. It is possible to use the library in the browser by using a Kotlin frontend like KVision. However, it is currently not possible to import the library into plain JavaScript.

Proposed Solution

Evaluate what is required to export the library as a JavaScript library.

Additional Context

More information: https://kotlinlang.org/docs/js-to-kotlin-interop.html

Add Codecov

Kover is currently used to generate test coverage reports. Codecov should be added to better display the reports and directly include them in pull requests.

This issue is currently blocked by the following upstream issue for integrating Kover with Codecov: Kotlin/kotlinx-kover#16

Increase Watermark Robustness

🚀 Feature Request

Current Problem

When changing a watermarked text, the watermark inside the cover text can get destroyed. This can occur by moving sentences inside a text, deleting content, adding new content, or copying existing content.

Proposed Solution

The overall robustness of the watermarker library needs to be increased.

One possible example:
If a small watermark is included inside an extended cover text (e.g., 10 times), the watermarker library should be able to extract the watermark even if 4 of the 10 watermark repetitions got destroyed.
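One conceivable way to reach that goal is a majority vote over the extracted repetitions. The issue does not prescribe a mechanism, so the following is only a sketch under that assumption; `mostFrequentWatermark` and `minVotes` are hypothetical names, and watermarks are modelled as plain strings.

```kotlin
/**
 * Returns the most frequent candidate if it occurs at least [minVotes] times,
 * or null if there is no candidate with enough votes.
 */
fun mostFrequentWatermark(candidates: List<String>, minVotes: Int = 1): String? {
    val best = candidates.groupingBy { it }.eachCount().maxByOrNull { it.value }
    return best?.takeIf { it.value >= minVotes }?.key
}

fun main() {
    // 10 repetitions were inserted; 4 of them got damaged in different ways.
    val extracted = List(6) { "Test" } + listOf("Te\u0001t", "T", "", "est")
    println(mostFrequentWatermark(extracted, minVotes = 6)) // Test
}
```

A vote like this only helps when damaged repetitions differ from each other; a strategy for systematic damage would need additional redundancy.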

Additional Context

If a control char is implemented (see #18), the watermarker library needs a strategy if this control char gets destroyed.

Update GitHub Actions to Latest Node Version

🚀 Feature Request

Current Problem

The following information is displayed in the GitHub actions, like the build of the GitHub page for the documentation:

Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: gradle/gradle-build-action@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

See: https://github.com/FraunhoferISST/TREND/actions/runs/9204230070

Proposed Solution

Update the workflows so that the latest versions are used.

Additional Context

Note that there are changes in the gradle-build-action. See the README of their GitHub Repo for more information.

Add Watermarker Library Text-Based Documentation

🚀 Feature Request

Current Problem

Currently, the documentation of the watermarker library is only available as in-line code blocks. People unfamiliar with the project don't know how to use the library in Java and JS code.

Proposed Solution

Besides the automated documentation generation from code (as discussed in #9), there should be text-based documentation. It should contain additional details and all relevant information on using the watermarker library. Therefore, a Getting Started Guide or Quick Start Guide is needed for newcomers.

Further, it needs to be checked if the structure should be based on existing templates like arc42.

Documentation Framework

The documentation should be published via GitHub Pages and made available directly via this repository. It might make sense to use the gh-pages branch for it.
A framework is needed to create a baseline for the documentation. Existing frameworks like Docusaurus, Just the Docs, Docsify, Nextra, etc., should be checked and evaluated. A necessary requirement is that the documentation should be written in Markdown so that it is easy to integrate inline source code and be independent of the framework itself.

Pipeline

Additionally, a GitHub action pipeline might be needed to deploy and update the documentation directly without manual work.

Additional Context

After the first version of the documentation is created, the pull request template should be updated with the checklist (like a definition of done or acceptance criteria) to check that the documentation is still up-to-date after every code change.

Use Architecture Decision Records (ADR)

🚀 Feature Request

Current Problem

During the development of this project, various architectural decisions were made. For outsiders, it is hard to understand why these decisions were made and what their background is. Further, future discussions may reopen aspects that were already settled in the past.

Proposed Solution

To prevent duplicate discussions and be transparent, architecture decision records (ADRs) should be used. There should be an ADR template with a location where all ADRs are stored (like in a docs/adr folder). All ADRs should have the same structure and be easily accessible (in Markdown format).

Additional Context

Existing open-source examples or templates should be used.

Enable Runtime Security of GitHub Actions

In order to increase the runtime security of GitHub actions, it should be checked if harden-runner can be used.

Create GitHub Issue Templates

GitHub issue templates should be used and integrated for this repository. There should be at least three different templates for:

Bug reports
Feature request
Any other things

Improve JS compatibility by using types that are supported by JS

🚀 Feature Request

Current Problem

Some of the types we expose are not supported by JavaScript. These types can only be passed to other functions that take them as arguments, but it is not possible to access or modify their data.

Proposed Solution

To improve the usability of our library in JavaScript it should be evaluated if some unsupported types can be replaced by supported types.

Additional Context

Such a change was already done by changing the type of placement in TextWatermarker from Sequence<..> to List<..> (see here)

Related: #40

Use Control Char / Tag as Watermarking Start

🚀 Feature Request

Current Problem

The watermarker library is able to add a watermark with or without compression. When analyzing a watermarked text or file, the library needs to know which type of watermark is used (compressed, uncompressed, specific format, etc.). This is currently not possible. There might be specific use cases that have specific requirements towards the style, compression, linting or format of the watermark.

Proposed Solution

Every watermark should start with a 2-digit control character (like a number) that identifies the type of watermark. Using a 2-digit control char instead of a 1-digit one provides a larger namespace for future formats.

Example:
Instead of adding Test as a watermark, 00Test will be watermarked if the watermark is uncompressed, 01Test will be added as a watermark if the watermark is compressed.
The first control char must be inserted without compression to get it working.

Additional Context

The other components, like the CLI tool and the webinterface, need to be updated after the issue is implemented since it is a breaking change.

Further, a table in the documentation is needed to document the control char and its meaning, for example:

| Control Char | Meaning |
| --- | --- |
| `00` | Uncompressed Watermark |
| `01` | Compressed Watermark using X compression technique |
| `02` | Specialized compression for use case Y |
| `03` | ... |
| ... | ... |
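The tagging scheme could be sketched as follows. This is an illustration only: `WatermarkType`, `addTag`, and `parseTag` are hypothetical names, the two enum entries mirror the first two rows of the example table, and the payload is treated as an opaque string (no actual compression happens here).

```kotlin
/** Watermark types keyed by their 2-digit tag (entries mirror the example table). */
enum class WatermarkType(val tag: String) {
    UNCOMPRESSED("00"),
    COMPRESSED("01"),
}

/** Prepends the uncompressed 2-digit tag to the (possibly compressed) payload. */
fun addTag(type: WatermarkType, payload: String): String = type.tag + payload

/** Splits a tagged watermark into its type and payload; null if the tag is unknown. */
fun parseTag(tagged: String): Pair<WatermarkType, String>? {
    if (tagged.length < 2) return null
    val type = WatermarkType.values().firstOrNull { it.tag == tagged.take(2) } ?: return null
    return type to tagged.drop(2)
}

fun main() {
    println(addTag(WatermarkType.UNCOMPRESSED, "Test")) // 00Test
    println(parseTag("01Test")) // (COMPRESSED, Test)
    println(parseTag("99Test")) // null
}
```

Returning null for unknown tags leaves room for a later decision on how to treat watermarks written by newer versions of the library.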

unzip shows warnings and errors on watermarked zips

🐞 Bug Report

Describe the Bug

When trying to unzip a watermarked zip file, the unzip command shows warnings and sometimes even errors:

Archive:  multiple_files_watermarked.zip
warning [multiple_files_watermarked.zip]:  32 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  32
  (attempting to re-compensate)
 extracting: a.txt
A

error: invalid zip file with overlapped components (possible zip bomb)
 To unzip the file anyway, rerun the command with UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE environmnent variable

unzip version:

UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

To Reproduce

Steps to reproduce the behavior:

go to samples/
execute unzip -c multiple_files_watermarked.zip

Expected Behavior

Files are extracted without warnings or errors

System Information

Additional Context

It might not be possible to prevent a warning. It depends on the implementation of the specific application.
Details about the file format: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Remove Maven Local Repository

Current Problem

In order to build and run the CLI tool or the webinterface, the watermarker library has to be built locally and published to maven local.

Proposed Solution

It should be checked whether it makes sense to publish the watermarker library directly (in GitHub, Docker Hub, etc.) to make it directly usable without an additional build and publish step.

Additional Context

After the watermarker library is published, the mavenLocal() repository can be removed from the CLI tool and the webinterface.

Monitor GitHub Action Permissions

In order to strengthen security, actions-permissions should be configured to assign the correct permissions to GitHub actions.

Include a function to calculate available insert positions in a text

🚀 Feature Request

Current Problem

Currently, while there is a function to create the required insert positions for a watermark, there is none to calculate the available insert positions in a text.

TREND/watermarker/src/commonMain/kotlin/fileWatermarker/TextWatermarker.kt

Lines 438 to 442 in 752be7c

/** Counts the minimum number of insert positions needed in a text to insert the [watermark] */
fun getMinimumInsertPositions(watermark: Watermark): Int {
    val separatedWatermark = getSeparatedWatermark(watermark)
    return separatedWatermark.count()
}

Such a function would come in handy, especially when needing to calculate whether or how often a given watermark fits in a given text.

Proposed Solution

Add a function that calculates the available insert positions in a text to the TextWatermarker class. This function could look like this:

/** Counts the available number of insert positions in a [file] */
fun getAvailableInsertPositions(file: TextFile): Int {
    return placement(file.content).count()
}
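With both counts available, the question of how often a watermark fits into a text reduces to an integer division. The sketch below assumes that every repetition consumes exactly the minimum number of insert positions; `maxRepetitions` is a hypothetical helper that takes plain integers so that it runs without the library.

```kotlin
/**
 * Number of complete watermark repetitions that fit into a text.
 * [availablePositions] would come from getAvailableInsertPositions,
 * [requiredPositions] from getMinimumInsertPositions.
 */
fun maxRepetitions(availablePositions: Int, requiredPositions: Int): Int {
    require(requiredPositions > 0) { "A watermark needs at least one insert position" }
    return availablePositions / requiredPositions
}

fun main() {
    println(maxRepetitions(availablePositions = 25, requiredPositions = 10)) // 2
    println(maxRepetitions(availablePositions = 5, requiredPositions = 10)) // 0
}
```

A result of 0 means the watermark does not fit even once, which a caller could turn into a warning before attempting the insertion.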

Additional Context

Specific Text Breaks Watermark

When adding the text Hello World as a Watermark to the text

Test ads asd asd asd as dasmlkjl lk lklk j lkafdas fsdbfsdhf k kjh kjh hkjfhf kjhkj hdkjahsdkj hadkahd kjhaskjhd kjashfdhiu u hj h hahdkja kj kjh kjn nkashdkjhwkjhwhw wqe qw ejkds,m askjhd,mandhakjd asdhc,mxyncndsa da sd asd as  asd  sa  d  d d d d d d d d d

and extracting the watermark from the generated watermarked text

Test ads asd asd asd as dasmlkjl lk lklk j lkafdas fsdbfsdhf k kjh kjh hkjfhf kjhkj hdkjahsdkj hadkahd kjhaskjhd kjashfdhiu u hj h hahdkja kj kjh kjn nkashdkjhwkjhwhw wqe qw ejkds,m askjhd,mandhakjd asdhc,mxyncndsa da sd asd as  asd  sa  d  d d d d d d d d d

the watermark Hello Worl$ is shown.

The problem couldn't be reproduced with other texts or watermarks.

Display Warnings and Results in Webinterface

🚀 Feature Request

Current Problem

The webinterface returns either the successfully watermarked text or an error/warning message, but never both. Cases exist where a watermark can be extracted even though it is problematic.

Proposed Solution

This should be changed so that the frontend can return a warning/error message and a watermarked text, not only one of them.

Additional Context

./.

Increase Webinterface Usability for Newcomers

🚀 Feature Request

Current Problem

When first using the webinterface, it is hard for newcomers to understand how it works. Sometimes, people wonder why using the "Add Watermark" button is impossible. Further, it is hard to understand the percentage slider.

Proposed Solution

Different aspects can be implemented to improve the overall usability (incl. some styling adjustments):

Add some small (i) Icons with additional information on mouseover
Improve the introduction text that describes the tool
Add different hints depending on current typed-in information (for example, a hint like: "The watermark is too long for the short cover text. Please try to increase the cover text length or decrease the length of the watermark so that it fits in.")
Add more details (description) in the success dialog
Add a "Copy to clipboard" icon to the success dialog
Rename the "OK" Button to "Close"
Add the TREND logo to the header
Disable the "File" tab
Change the tab icon for the watermark insertion

Additional Context

./.

Include Watermark in all Files of a ZIP Archive

🚀 Feature Request

Current Problem

The ZipWatermarker currently adds the watermark directly inside the .zip file (the archive itself). After a user extracts the archive, the watermark is removed.

Proposed Solution

To solve this problem, the watermarker library should check all files inside the ZIP archive and include the watermark in all files that have a supported file type.
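A rough sketch of the idea, under several assumptions: it is JVM-only (it relies on java.util.zip, which is not available for the JavaScript target), `transformZipEntries` is a hypothetical helper, and the `transform` parameter merely stands in for whatever watermarking the library would apply to a supported file.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

/**
 * Rewrites a ZIP archive, applying [transform] to every entry whose name ends with
 * one of [supportedExtensions]. All other entries are copied unchanged.
 */
fun transformZipEntries(
    zipBytes: ByteArray,
    supportedExtensions: Set<String> = setOf(".txt"),
    transform: (ByteArray) -> ByteArray,
): ByteArray {
    val output = ByteArrayOutputStream()
    ZipInputStream(ByteArrayInputStream(zipBytes)).use { zipIn ->
        ZipOutputStream(output).use { zipOut ->
            var entry = zipIn.nextEntry
            while (entry != null) {
                val name = entry.name
                zipOut.putNextEntry(ZipEntry(name))
                if (!entry.isDirectory) {
                    val content = zipIn.readBytes() // reads the current entry only
                    val supported = supportedExtensions.any { name.endsWith(it) }
                    zipOut.write(if (supported) transform(content) else content)
                }
                zipOut.closeEntry()
                entry = zipIn.nextEntry
            }
        }
    }
    return output.toByteArray()
}
```

The sketch drops entry metadata such as timestamps and comments; a real implementation would have to decide which of these to preserve.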

Additional Context

./.
