js-word

Parser and writer for various word processing document formats. Pure-JS cleanroom implementation from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and maximal browser compatibility.

Test Files

Test files should be placed in the test_files directory, in the appropriate subdirectory for the filetype. For example, DOCX files should be placed in test_files\docx\wordjs and RTF files should be in test_files\rtf\wordjs.

Every test file should be accompanied by a plain text .txt representation whose filename is the original filename with .txt appended. For example, the DOCX file test_files\docx\wordjs\foo.docx pairs with the plain text file test_files\docx\wordjs\foo.docx.txt.
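The pairing rule can be checked mechanically. The following is a minimal Node.js sketch, not part of the project: the helper names baselinePath and findMissingBaselines are hypothetical, and it assumes that files ending in .txt or .skip are companion files rather than test inputs.

```javascript
const fs = require("fs");
const path = require("path");

// Baseline name = original filename with ".txt" appended
// (foo.docx -> foo.docx.txt).
function baselinePath(testFile) {
  return testFile + ".txt";
}

// Hypothetical helper: list the test files in `dir` that have no .txt baseline.
// Assumption: names ending in .txt or .skip are companions, not test inputs.
function findMissingBaselines(dir) {
  return fs.readdirSync(dir)
    .filter((name) => !name.endsWith(".txt") && !name.endsWith(".skip"))
    .filter((name) => fs.statSync(path.join(dir, name)).isFile())
    .filter((name) => !fs.existsSync(baselinePath(path.join(dir, name))))
    .sort();
}
```

For example, findMissingBaselines("test_files/docx/wordjs") would return the names of the DOCX files in that folder that still need a baseline.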

Generating Baselines using Word for Windows

  1. Ensure that PowerShell 7.0 or later is installed.
  2. In an elevated (administrator) PowerShell 7 session, run Set-ExecutionPolicy RemoteSigned or Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass.
  3. Place the PowerShell script generate_txt.ps1 in the root of the repository.
  4. Run .\generate_txt.ps1 .\test_files\EXT_TYPE\FOLDER (for example, .\generate_txt.ps1 .\test_files\docx\apachepoi).
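To show how the EXT_TYPE and FOLDER placeholders in step 4 are filled in, here is a small Node.js sketch that assembles the command line. The helper name baselineCommand is hypothetical, and launching the script through the pwsh executable with -File is an assumption rather than something the project prescribes.

```javascript
const path = require("path");

// Build the argument vector for step 4. `extType` and `folder` fill the
// EXT_TYPE and FOLDER placeholders. Using `pwsh -File` is an assumption.
function baselineCommand(extType, folder) {
  // path.win32 forces backslash separators regardless of the host OS.
  const target = ".\\" + path.win32.join("test_files", extType, folder);
  return ["pwsh", "-File", ".\\generate_txt.ps1", target];
}

console.log(baselineCommand("docx", "apachepoi").join(" "));
// prints: pwsh -File .\generate_txt.ps1 .\test_files\docx\apachepoi
```

On a Windows machine with Word installed, the resulting array could be handed to child_process.spawnSync (first element as the command, the rest as arguments) from the repository root.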

On first run, if a test file does not have an accompanying .txt file, the script will open Word and save the file as plaintext. Word will rapidly open and close during this process.

The script will not attempt to open Word or try to generate .txt files if they already exist. After a clean run, Word should not open on future runs.

The script will halt for documents that are broken in certain ways. Word will display a prompt, stalling the automated process. Those documents can be skipped by creating a .skip file as described below.

Skipping Files

The script will look for files with the .skip extension and skip processing the corresponding base file. For example, if test_files\docx\wordjs\Hello.docx.skip exists, the script will not attempt to process test_files\docx\wordjs\Hello.docx.

When the UI blocks (for example, on a VBA error with ThisDocument), the corresponding .skip file should be created manually. The script merely tests if the file exists, so the content is immaterial and a single letter suffices.
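The rules described so far can be summarized as a small decision function. This is a Node.js sketch of the described behavior, not the PowerShell script itself. The names planAction and markSkipped are hypothetical. Checking .skip before .txt is an assumption, since the text does not say which takes precedence (Word is not opened in either case).

```javascript
const fs = require("fs");

// Decide what a baseline run would do for one test file.
// Returns "skip", "up-to-date", or "generate". Hypothetical helper.
function planAction(testFile) {
  if (fs.existsSync(testFile + ".skip")) return "skip";       // marker file present
  if (fs.existsSync(testFile + ".txt")) return "up-to-date";  // baseline already exists
  return "generate";                                           // Word would be opened
}

// Create the marker manually. Only its existence is tested,
// so the content is immaterial and a single letter suffices.
function markSkipped(testFile) {
  fs.writeFileSync(testFile + ".skip", "x");
}
```

Calling markSkipped on a document that blocks the UI makes planAction report "skip" for it from then on.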

Generating .skip files

The script will attempt to open password-protected documents using the password "WordJS". For these documents the script will not halt, but it will not generate a text file. Instead, it writes a message to the terminal indicating the skip and generates a .skip file for the document.

License

Please consult the attached LICENSE file for details. All rights not explicitly granted by the Apache 2.0 License are reserved by the Original Author.

References

OSP-covered Specifications
  • MS-CFB: Compound File Binary File Format
  • MS-DOC: Word (.doc) Binary File Format
  • RTF: Rich Text Format
  • ISO/IEC 29500:2012(E) "Information technology -- Document description and processing languages -- Office Open XML File Formats"
  • Open Document Format for Office Applications Version 1.3 (25 December 2019)

Contributors

barronwei, garrettluu, janiewang26, mohammedsahl, penguingovernor, sheetjsdev, srijonsaha, wlawt