odt.js

odt.js is a JavaScript library that converts odt to html and back.

Limitations

Currently, setHTML only supports html as returned by getHTML, possibly with minor modifications such as changed text. This means setHTML will throw when given arbitrary html.

odt.js currently depends on the browser's XML parser, DOM parser and DOM serializer. If you want to use odt.js on the server, one way forward is to modify it to support pure-JavaScript parsers. Keep in mind that for strictness parity with the in-browser parser, you need a DOM parser which breaks up the <p> in <p><div></div></p>.
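To illustrate the <p> requirement above, here is a minimal sketch of a check that a candidate parser behaves like a browser for that input. The function name and its `ParserCtor` parameter are assumptions made for this example and are not part of odt.js; any constructor with a DOMParser-compatible parseFromString can be passed in.

```javascript
// Sketch: check whether an HTML parser splits <p><div></div></p> the way
// browsers do. `ParserCtor` is a hypothetical parameter: any
// DOMParser-compatible constructor (it is not an odt.js option).
function splitsParagraphLikeBrowser(ParserCtor) {
	if(typeof ParserCtor !== 'function') {
		return null; // no parser available to test
	}
	var doc = new ParserCtor().parseFromString('<p><div></div></p>', 'text/html');
	// Browsers close the <p> before the <div>; the stray </p> that follows
	// then creates a second, empty paragraph.
	return doc.body.innerHTML === '<p></p><div></div><p></p>';
}

// In a browser: splitsParagraphLikeBrowser(window.DOMParser)
```

A parser for which this returns false is more lenient than the browser, so html that odt.js would reject in the browser might slip through on the server.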

Unsupported odt features:

  • Features on which getHTML throws
    • Any encoding other than utf-8
    • Annotations
    • Tracked changes
    • Charts
  • Features on which getHTML doesn't throw
    • Various types of images
    • Non-"manual" styles
    • Unordered lists
    • List styles (e.g. bullets)
    • Strikethrough
    • Underlined text nested inside non-underlined text
    • Underline-color
    • Loads of other styles

And much more.

Strictness

Warning: currently, the following holds only within a single version of odt.js. You can't call getHTML, store the html, call setHTML with another version of odt.js and expect a correct odt.

getHTML throws when a getHTML -> setHTML roundtrip would otherwise not produce the original odt file (barring xml encoding and zip file changes). This is the case for most unsupported odt features, except unsupported text styles.

getHTMLUnsafe will usually produce usable html, since browsers are very forgiving. The html might not accurately represent the odt, though getHTML has the same limitation for unsupported styles.

setHTML throws when a setHTML -> getHTML roundtrip would otherwise not produce the original html (barring style and html encoding changes). This is the case for most unsupported html features, except unsupported text styles. The hope is that this keeps the resulting odt from being broken, but there is no guarantee.

setHTMLUnsafe might produce completely broken odt files.
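The roundtrip rule described above can be sketched generically. This is only an illustration of the idea, not necessarily how odt.js implements it; `convertStrict`, `convert` and `convertBack` are hypothetical names standing in for the two conversion directions.

```javascript
// Generic roundtrip guard: convert, convert back, and refuse to return a
// result from which the input cannot be recovered.
// `convert` and `convertBack` are hypothetical stand-ins, not odt.js functions.
function convertStrict(input, convert, convertBack) {
	var output = convert(input);
	if(convertBack(output) !== input) {
		throw new Error('Roundtrip would not reproduce the original input');
	}
	return output;
}

// Toy conversion pair: upper-casing loses information, so the strict
// variant accepts input that is already upper case and rejects the rest.
function toUpper(s) { return s.toUpperCase(); }
function identity(s) { return s; }

var ok = convertStrict('ABC', toUpper, identity);
var rejected = false;
try {
	convertStrict('Abc', toUpper, identity);
} catch(e) {
	rejected = true;
}
```

The Unsafe variants correspond to calling `convert` directly and skipping the comparison.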

Usage

<script src="jszip.js"></script>
<script src="lib/odt.js"></script>

odt2html

var html;
try {
	html = new ODTDocument(odt).getHTML();
} catch(e) {
	alert("Couldn't parse odt file.");
	throw e;
}

If you definitely want html while caring less about whether or not it is correct:

var html = new ODTDocument(odt).getHTMLUnsafe();

If you want fallback html:

var odtdoc = new ODTDocument(odt);
var html = odtdoc.getHTMLUnsafe();
try {
	html = odtdoc.getHTML();
} catch(e) {
	console.error('html is probably broken');
}
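The fallback pattern above can also be wrapped in a small helper that reports which path was taken. This is a sketch; `getHTMLWithFallback` and the shape of its return value are assumptions for this example, and only getHTML and getHTMLUnsafe come from odt.js.

```javascript
// Try the strict conversion first and fall back to the unsafe one.
// Returns {html, strict, error}; `strict` tells the caller whether the
// html is known to roundtrip. (Hypothetical helper, not part of odt.js.)
function getHTMLWithFallback(odtdoc) {
	try {
		return {html: odtdoc.getHTML(), strict: true, error: null};
	} catch(e) {
		// If getHTMLUnsafe throws as well (e.g. no DOMParser), that error
		// propagates to the caller.
		return {html: odtdoc.getHTMLUnsafe(), strict: false, error: e};
	}
}
```

Callers can then decide, for example, to open the document read-only when `strict` is false.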

html2odt

As mentioned in Limitations, this example is currently non-functional for arbitrary html.

var req = new XMLHttpRequest();
req.open('GET', 'res/empty.odt');
req.responseType = 'arraybuffer';
req.addEventListener('load', function() {
	var empty = req.response;
	
	var odtdoc = new ODTDocument(empty);
	try {
		odtdoc.setHTML(html);
	} catch(e) {
		alert("Couldn't generate odt document.");
		throw e;
	}
	var odt = odtdoc.getODT();
});
req.send();
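In browsers that support fetch, the template can be loaded with a promise instead of XMLHttpRequest. The sketch below assumes that; `loadTemplate` and its optional `fetchImpl` parameter are hypothetical names introduced here, the latter only so that the function can be exercised without a network.

```javascript
// Load an odt template as an ArrayBuffer.
// `fetchImpl` is a hypothetical injection point; it defaults to the
// global fetch.
function loadTemplate(url, fetchImpl) {
	var doFetch = fetchImpl || fetch;
	return doFetch(url).then(function(response) {
		if(!response.ok) {
			throw new Error('Could not load ' + url + ' (status ' + response.status + ')');
		}
		return response.arrayBuffer();
	});
}

// Usage in a browser, mirroring the example above:
// loadTemplate('res/empty.odt').then(function(empty) {
// 	var odtdoc = new ODTDocument(empty);
// 	odtdoc.setHTML(html);
// 	var odt = odtdoc.getODT();
// });
```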

If you definitely want odt while caring less about whether or not it is a valid odt file:

	var odtdoc = new ODTDocument(empty);
	odtdoc.setHTMLUnsafe(html);
	var odt = odtdoc.getODT();

If you want a fallback odt:

	var odtdoc = new ODTDocument(empty);
	try {
		odtdoc.setHTML(html);
	} catch(e) {
		odtdoc.setHTMLUnsafe(html);
		console.error('odt is probably broken');
	}
	var odt = odtdoc.getODT();

Simple odt editor:

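var iframe = document.createElement('iframe');
// The iframe must be attached to the document, otherwise contentDocument is null.
document.body.appendChild(iframe);
var odtdoc = new ODTDocument(odt);
var html = odtdoc.getHTMLUnsafe();
try {
	html = odtdoc.getHTML();
} catch(e) {
	// Keep the fallback html so the editor still opens.
	console.error('html is probably broken');
}
iframe.contentDocument.write(html);
iframe.contentDocument.close();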
iframe.contentDocument.documentElement.addEventListener('input', function save() {
	try {
		odtdoc.setHTML(iframe.contentDocument.documentElement.outerHTML);
	} catch(e) {
		alert("Generating ODT file failed.");
		throw e;
	}
	odt = odtdoc.getODT();
});

Documentation

ODTDocument

new ODTDocument(String|ArrayBuffer|Uint8Array|Buffer odt[, Object options]) -> ODTDocument | Error

Initialize an ODTDocument.

For arguments and errors, see the JSZip documentation.

ODTDocument#getHTML

ODTDocument#getHTML() -> html | TypeError | Error

Convert the odt document to html.

Throws TypeError if JSZip or DOMParser is undefined or if DOMParser does not support parsing text/xml and text/html.

Throws Error if the odt uses unsupported features (it doesn't throw on unsupported text styles, though). For more details, see Strictness above.
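An application can run a similar check up front, before constructing any documents. The following is a sketch under stated assumptions: `checkEnvironment` and its `env` parameter are hypothetical and not part of odt.js, and the error messages are made up for the example.

```javascript
// Pre-flight check mirroring the TypeError conditions described above.
// `env` is a hypothetical parameter holding the globals to inspect;
// in a browser you would pass `window`.
function checkEnvironment(env) {
	if(typeof env.JSZip === 'undefined') {
		throw new TypeError('JSZip is not loaded');
	}
	if(typeof env.DOMParser === 'undefined') {
		throw new TypeError('DOMParser is not available');
	}
	var parser = new env.DOMParser();
	['text/xml', 'text/html'].forEach(function(type) {
		var doc;
		try {
			doc = parser.parseFromString('<p></p>', type);
		} catch(e) {
			doc = null; // some older parsers throw on unsupported types
		}
		if(!doc) {
			throw new TypeError('DOMParser does not support ' + type);
		}
	});
	return true;
}

// In a browser: checkEnvironment(window)
```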

ODTDocument#getHTMLUnsafe

ODTDocument#getHTMLUnsafe() -> html | TypeError

Throws TypeError if JSZip or DOMParser is undefined or if DOMParser does not support parsing text/xml and text/html.

ODTDocument#setHTML

ODTDocument#setHTML(String html) -> undefined | TypeError | Error

Throws TypeError if DOMParser is undefined or if DOMParser does not support parsing text/xml.

Throws Error if the html uses unsupported features. For more details, see Strictness above.

ODTDocument#setHTMLUnsafe

ODTDocument#setHTMLUnsafe(String html) -> undefined | TypeError

Throws TypeError if DOMParser is undefined or if DOMParser does not support parsing text/xml.

ODTDocument#getODT

ODTDocument#getODT([Object options]) -> String|ArrayBuffer|Uint8Array|Buffer | Error

Generate an odt file from the ODTDocument.

For options and errors, see the JSZip documentation.

Contributing

Tips

odt.js is very strict towards its own code (except in the Unsafe functions). getHTML throws when an odt -> html -> odt roundtrip doesn't produce exactly the same odt file, and also when it produces invalid html. It is less strict about the latter, though: it throws, for example, when one <p> ends up inside another.

One way to go about adding features to odt.js is to use getHTMLUnsafe and iterate until that generates something sane.

Another way is to use getHTML and set a breakpoint on the line that throws, diff (using a word-granular diff tool) the two things that were different, and work backwards from there.

It also helps to decide in advance on a strategy for producing html from which the original odt can be derived losslessly.
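When working from the breakpoint as described above, a small helper that locates the first differing position can narrow things down before reaching for a full diff tool. This is a sketch; `firstDifference` is a hypothetical debugging helper and not part of odt.js.

```javascript
// Find the first index at which two strings differ and return a little
// context around it. Returns null when the strings are identical.
// (Hypothetical debugging helper.)
function firstDifference(a, b, context) {
	context = context || 20;
	var len = Math.min(a.length, b.length);
	var i = 0;
	while(i < len && a[i] === b[i]) {
		i++;
	}
	if(i === len && a.length === b.length) {
		return null;
	}
	var start = Math.max(0, i - context);
	return {
		index: i,
		expected: a.slice(start, i + context),
		actual: b.slice(start, i + context)
	};
}
```

From the console at the breakpoint, calling it with the two strings being compared shows where they start to diverge.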

Guidelines

Keep in mind that the html produced should be useful on both screen and print media.

Please follow the code style of surrounding code, so single quotes unless the string contains a single quote, if( instead of if (, etc.

If you want to modify odt.js for use outside the browser, see tips in Limitations.


odt.js's Issues

Broaden ODT support

Summary

ODT.js is growing in its support for the ODT format, but it is still incomplete.

We would love it if you could implement any of the listed missing features, or anything else that is missing.

I am here if you have any questions 😄

Missing Features

Features on which getHTML throws

  • Any encoding other than utf-8
  • Annotations
  • Tracked changes
  • Charts

Features on which getHTML doesn't throw

  • Various types of images
  • Non-"manual" styles
  • Unordered lists
  • List styles (e.g. bullets)
  • Strikethrough
  • Underlined text nested inside non-underlined text
  • Underline-color
  • Loads of other styles

JSZIP VERSION?

Hello

Which version of JSZip am I supposed to use? None of them seem to work.
