isodoc's Introduction

isodoc: Processor to generate HTML/Word from Metanorma XML

Purpose

This Gem converts documents in the Metanorma document model into HTML and Microsoft Word.

Usage

The Gem contains the subclasses IsoDoc::HtmlConvert (for HTML output) and IsoDoc::WordConvert (for Word output). They are initialised with the following rendering parameters:

i18nyaml

YAML file giving internationalisation equivalents for keywords in rendering output; see https://github.com/metanorma/metanorma-iso#document-attributes for further documentation

bodyfont

Font for body text

headerfont

Font for header text

monospacefont

Font for monospace text

titlefont

Font for document title text (currently used only in GB)

script

The ISO 15924 code for the main script that the standard document is in; used to pick the default fonts for the document

alt

Generate alternate rendering (currently used only in ISO)

compliance

Generate alternate rendering (currently used only in GB)

htmlstylesheet

Stylesheet for HTML output

htmlcoverpage

Cover page for HTML output

htmlintropage

Introductory page for HTML output

scripts

Scripts page for HTML output

scripts-pdf

Scripts page for HTML > PDF output

wordstylesheet

Stylesheet for Word output

standardstylesheet

Secondary stylesheet for Word output

header

Header file for Word output

wordcoverpage

Cover page for Word output

wordintropage

Introductory page for Word output

ulstyle

Style identifier in Word stylesheet for unordered lists

olstyle

Style identifier in Word stylesheet for ordered lists

suppressheadingnumbers

Suppress heading numbers for clauses (does not apply to annexes)

The IsoDoc gem classes themselves are abstract (though their current implementation contains rendering specific to the ISO standard). Subclasses of the IsoDoc gem classes are specific to different standards, and are associated with templates and stylesheets specific to the rendering of those standards. Subclasses also provide the default values for the rendering parameters above; the parameters should be passed only as overrides.

e.g.

IsoDoc::Iso::HtmlConvert.new(
  bodyfont: "Zapf Chancery",
  headerfont: "Comic Sans",
  monospacefont: "Andale Mono",
  alt: true,
  script: "Hans",
  i18nyaml: "i18n-en.yaml"
)

The conversion takes place with a convert method, with three arguments: the filename to be used for the output (once its file type suffix is stripped), the XML document string to be converted (optional), and a "debug" argument (optional), which stops execution before the output file is generated. If the document string is nil, its contents are read in from the filename provided. So:

# generates test.html
IsoDoc::Iso::HtmlConvert.new({}).convert("test.xml")

# generates test.doc, with Chinese font defaults rather than Roman
IsoDoc::Iso::WordConvert.new({script: "Hans"}).convert("test.xml")

# generates test.html, based on file1.xml
IsoDoc::Iso::HtmlConvert.new({}).convert("test", File.read("file1.xml"))

# generates HTML output for the given input string, but does not save it to disk.
IsoDoc::Iso::HtmlConvert.new({}).convert("test", <<~"INPUT", true)
  <iso-standard xmlns="http://riboseinc.com/isoxml">
    <preface><foreword>
    <note>
      <p id="_f06fd0d1-a203-4f3d-a515-0bdba0f8d83f">These results are based on a
      study carried out on three different types of kernel.</p>
    </note>
    </foreword></preface>
  </iso-standard>
  INPUT

Note: In the HTML stylesheets specific to standards, the Cover page and Intro page must be XHTML fragments, not HTML fragments. In particular, unlike Word HTML, all HTML attributes need to be quoted: <p class="MsoToc2">, not <p class=MsoToc2>.

Converting Word output into “Native Word” (.docx)

This gem relies on html2doc to generate Microsoft Word documents.

Please see this post-processing procedure to convert output into a native-docx document.

isodoc's People

Contributors

ahmohsen46, alexeymorozov, andrew2net, camobap, intelligent2013, maxirmx, opoudjis, ronaldtse, strogonoff, w00lf, zoras

isodoc's Issues

Dynamic ToC for Word

Generate the headings dynamically for the ToC of the Word document; the page numbers will still need to be frozen.

Sort symbols & abbreviations

ISO/IEC DIR 2 p. 52

Criteria:

  • Upper case Latin then lower case Latin
  • Letter without indices, letter with letter index, letter with numeric index
  • Latin then Greek then Other
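The three criteria can be read as a composite sort key. The following is a minimal sketch under stated assumptions, not isodoc's implementation: it assumes each symbol is a plain string whose first character is the base letter and whose remaining characters are the index, and it assumes upper and lower case are interleaved per letter (A, a, B, b). The method names are hypothetical.

```ruby
# Rank scripts: Latin first, then Greek, then anything else.
def script_rank(char)
  case char
  when /\A[A-Za-z]\z/ then 0
  when /\A\p{Greek}\z/ then 1
  else 2
  end
end

# Rank indices: no index, then letter index, then numeric index.
def index_rank(index)
  return 0 if index.empty?

  index.match?(/\A\d/) ? 2 : 1
end

# Assumption: first character is the base letter, the rest is the index.
def symbol_sort_key(symbol)
  base = symbol[0]
  index = symbol[1..-1]
  case_rank = base == base.upcase ? 0 : 1 # upper case before lower case
  [script_rank(base), base.downcase, case_rank, index_rank(index), index]
end

def sort_symbols(symbols)
  symbols.sort_by { |s| symbol_sort_key(s) }
end

puts sort_symbols(%w[d1 c C2 dint b Cm B d dext C]).join(", ")
```

Symbols with real markup (subscripts in MathML, for instance) would need their base and index extracted before a key like this could be applied.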

Internationalise

Need output to deal with at least English, French, Chinese, and Russian. The metalanguage needs to be translated.

Dynamic TOC for IsoDoc HTML

Given that IsoDoc already has the sections themselves available when it generates IsoDoc HTML, we can easily insert the TOC with proper anchors. This way we can get rid of the jQuery code in it.
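As a rough illustration of generating the TOC at conversion time instead of in the browser, the sketch below builds the TOC markup from a list of headings. The heading hashes (:level, :id, :title), the toc_html method, and the toc-hN class names are all hypothetical; isodoc's real section model is an XML tree rather than an array.

```ruby
require "cgi"

# Build a flat TOC: one anchor link per heading, with the heading level
# exposed as a CSS class so a stylesheet can indent it.
def toc_html(headings)
  items = headings.map do |h|
    title = CGI.escapeHTML(h[:title])
    %(<li class="toc-h#{h[:level]}"><a href="##{h[:id]}">#{title}</a></li>)
  end
  "<ul>\n#{items.join("\n")}\n</ul>"
end

headings = [
  { level: 1, id: "scope", title: "1 Scope" },
  { level: 2, id: "a1", title: "A & B" }
]
puts toc_html(headings)
```

Because the anchors are emitted together with the section ids, no client-side script is needed to discover the headings.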

Separate HTML styling

Right now the .out.html output is the basic Microsoft-ready HTML file. We need to take out the Microsoft-specific features (the TOC becomes a generic Asciidoc TOC), and introduce cogent HTML styling.

Left align, right align OMML

Stem expressions in Word are center aligned. To make them left or right aligned, isodoc needs to wrap them in:

<m:oMathPara><m:oMathParaPr><m:jc m:val="left"/></m:oMathParaPr>
<m:oMath>
....
</m:oMath>
</m:oMathPara>
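The wrapping can be sketched as a small string helper. This is only an illustration: align_omml is a hypothetical name, it works on strings rather than on a parsed XML tree, and it accepts only the "left" and "right" values discussed in this issue.

```ruby
# Wrap an <m:oMath> fragment in <m:oMathPara> with an explicit justification.
def align_omml(omath, justification)
  unless %w[left right].include?(justification)
    raise ArgumentError, "unsupported justification: #{justification}"
  end

  <<~XML.chomp
    <m:oMathPara><m:oMathParaPr><m:jc m:val="#{justification}"/></m:oMathParaPr>
    #{omath}
    </m:oMathPara>
  XML
end

puts align_omml("<m:oMath><m:r><m:t>x</m:t></m:r></m:oMath>", "left")
```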

Use templating language instead of `gsub`

HTML templates are currently "filled-in" within isodoc using gsub. If the content includes a phrase that matches a key that is being replaced, then the templating won't work properly. For CSS we should use SCSS templates, as you can see in asciidoctor-reveal.js.

We should use template engines to interpolate the HTML instead.

For example, asciidoctor-reveal.js uses the Slim template language. Personally, I think we can use the Liquid template language to create HTML templates.

This way the content of the templates can be easily changed.
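To make the failure mode concrete, here is a minimal sketch using only the Ruby standard library. The {{key}} placeholder syntax and both method names are assumptions made for illustration; they are not isodoc's actual template keys. A template engine such as Liquid avoids the problem in the same way as the second method: substituted content is never rescanned for placeholders.

```ruby
# Naive approach: one gsub per key, applied in sequence. Content inserted
# for an earlier key is rescanned when later keys are replaced.
def fill_sequential(template, values)
  values.reduce(template) { |out, (key, val)| out.gsub("{{#{key}}}", val) }
end

# Single pass: each placeholder in the template is looked up once, and the
# substituted text is not scanned again.
def fill_single_pass(template, values)
  template.gsub(/\{\{(\w+)\}\}/) { values.fetch(Regexp.last_match(1)) }
end

template = "<h1>{{title}}</h1><p>{{body}}</p>"
values = { "title" => "Use {{body}} as a key", "body" => "text" }

puts fill_sequential(template, values)  # the title is corrupted
puts fill_single_pass(template, values) # the title survives intact
```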

Formula referencing

Formulae are referenced by container and formula number; e.g. "see A.2, Formula (A.5)."

For all container referencing, the container is skipped if we are currently in that container: if we are in A.2, we just reference "Formula (A.5)."
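The rule can be sketched as follows. The method name and its arguments are hypothetical; in practice the container of the current location would have to be derived from the document tree.

```ruby
# Render a cross-reference to a formula. The container is omitted when the
# reference is made from inside that same container.
def formula_xref(formula_label, container, current_container)
  ref = "Formula (#{formula_label})"
  container == current_container ? ref : "#{container}, #{ref}"
end

puts formula_xref("A.5", "A.2", "5.1") # referenced from another clause
puts formula_xref("A.5", "A.2", "A.2") # referenced from within A.2
```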

Ensuring references follow Directives Part 2

@opoudjis you probably have gone through this but ideally we should document in the README how the references are being rendered.

In Directives Part 2 (https://www.iso.org/sites/directives/2016/part2/index.xhtml), different referential targets are rendered differently:

22.4 Referencing
Clauses and subclauses need not be specifically referred to in the text.
Use, for example, the following forms for references to clauses and subclauses:
• “in accordance with Clause 4”;
• “details as given in 4.1.1”;
• “the requirements given in B.2”;
• “the methods described in 5.3 provide further information on...”.

23.4 Referencing
The purpose of a list should be made clear by its context. For example, an introductory proposition or a subclause title can serve to introduce the list. Lists need not be specifically referred to in the text.
If cross-references to list items are necessary, a numbered list shall be used. Within a subdivision, each list item in a numbered list shall have a unique identifier. Numbering restarts at each new clause or subclause.
Use, for example, the following forms for references to lists:
• “as specified in 3.1 b)”;
• “the requirements given in B.2 c)”.

24.4 Referencing
Notes need not be specifically referred to in the text.
If notes are referred to, use for example, the following forms for references:
• “an explanation is provided in 7.1, Note 2”;
• “see 8.6, Note 3”.

25.4 Referencing
Examples need not be specifically referred to in the text.
If examples are referred to, use for example, the following forms for references:
• “see 6.6.3, Example 5”;
• “Clause 4, Example 2 lists …”.

27.4 Referencing
If a formula is numbered, it should be referred to in the text. The purpose of a formula should be made clear by its context, for example, with an introductory proposition.
Use, for example, the following forms for references to mathematical formulae:
• “see 10.1, Formula (3)”;
• “see A.2, Formula (A.5)”.

28.4 Referencing
Each figure shall be explicitly referred to within the text.
Use, for example, the following forms for references to figures and subfigures:
• “Figure 3 illustrates…”;
• “See Figure 6 b)”.

29.4 Referencing
Each table shall be explicitly referred to within the text.
Use, for example, the following forms for references to tables:
• “Table 3 lists…”;
• “See Table B.1”.

Note numbering & referencing

Outside of termnotes, all notes shall be numbered, but only if there is more than one in their container.

All notes are referenced by their container and Note number.
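A minimal sketch of the numbering rule, assuming notes are counted per container and labelled with the upper-case word "NOTE" (the label text and both method names are assumptions, not taken from the gem):

```ruby
# Labels for the notes in one container: a lone note is unnumbered,
# two or more notes are numbered from 1.
def note_labels(note_count)
  return [] if note_count.zero?
  return ["NOTE"] if note_count == 1

  (1..note_count).map { |n| "NOTE #{n}" }
end

# Cross-reference to a numbered note: container first, then the note number.
def note_xref(container, number)
  "#{container}, Note #{number}"
end

puts note_labels(1).inspect
puts note_labels(3).inspect
puts note_xref("7.1", 2)
```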

Example numbering, rendering, referencing

  1. Examples are numbered sequentially within a container, but not if they are the only example in their container.

  2. Examples are referenced by container and example number.

  3. Examples are rendered as boxed text.

List item crossreferencing

23.4 Referencing
The purpose of a list should be made clear by its context. For example, an introductory proposition or a subclause title can serve to introduce the list. Lists need not be specifically referred to in the text.
If cross-references to list items are necessary, a numbered list shall be used. Within a subdivision, each list item in a numbered list shall have a unique identifier. Numbering restarts at each new clause or subclause.
Use, for example, the following forms for references to lists:
• “as specified in 3.1 b)”;
• “the requirements given in B.2 c)”.
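The reference forms quoted above can be sketched as below. The sketch covers only first-level list items labelled a), b), c) and so on, takes a zero-based item index, and uses hypothetical method names; nested lists would need their own label schemes.

```ruby
# Label for a first-level numbered list item: index 0 is "a)", 1 is "b)".
def list_item_label(index)
  unless (0..25).cover?(index)
    raise ArgumentError, "only items 0..25 are covered by this sketch"
  end

  "#{('a'.ord + index).chr})"
end

# Cross-reference to a list item: subclause number followed by the item label.
def list_item_xref(subclause, index)
  "#{subclause} #{list_item_label(index)}"
end

puts list_item_xref("3.1", 1)
puts list_item_xref("B.2", 2)
```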

Terms: title on next line

The term being defined in asciidoctor-iso (and gb) does not appear in the title of the section: the term number is on its own as a title, and the term being defined is a separate boldface para, indented to line up with body text.

This gem (`isodoc`) should be used as the basis for building an IsoDoc

Currently a lot of XML processing happens in the asciidoctor-iso gem.

However, this gem (isodoc) should be used as the central place to create and edit isodoc files.

  • The asciidoctor-iso gem should use the isodoc gem to build the ISOXML.
  • The isoxml-html gem should use the isodoc gem to read into the ISOXML, and transform it into HTML.

For example, code like (asciidoctor/iso/blocks.rb):

      def stem(node)
        # NOTE: html escaping is performed by Nokogiri
        stem_content = node.lines.join("\n")

        noko do |xml|
          xml.formula **id_attr(node) do |s|
            s.stem stem_content, **{ type: "AsciiMath" }
            style(node, stem_content)
          end
        end
      end

should be:

      def stem(node, isodoc_current_node)
        # NOTE: html escaping is performed by Nokogiri
        stem_content = node.lines.join("\n")

        isodoc_current_node.stem(
          content: stem_content,
          type: "AsciiMath"
        )
      end

Update according to isodoc changes

  • isodoc title is now isolocalizedtitle (gbdoc title is inherited)
  • bibliography now takes multiple dates in one array
  • textelement is now a localizedstring

Rice document: Table footnotes

Move table footnotes inside of table, and do not treat them as footnotes but as notes within table, with letter references rather than numbers. HTML2Doc needs to ignore these footnotes.

Figure footnotes will do the same.

Refactor isodoc to isolate HTML and Word HTML generation

Have Generic HTML generation as one class, and have the Word HTML class inherit from it. This will undo a lot of the transformation functionality in postprocessing.rb and html.rb.

In the process, strip out any Word-specific HTML from the Generic HTML branch.
