Git Product home page Git Product logo

docx's Introduction

docx

Gem Version Ruby Coverage Status Gitter

A ruby library/gem for interacting with .docx files. currently capabilities include reading paragraphs/bookmarks, inserting text at bookmarks, reading tables/rows/columns/cells and saving the document.

Usage

Prerequisites

  • Ruby 2.6 or later

Install

Add the following line to your application's Gemfile:

gem 'docx'

And then execute:

bundle install

Or install it yourself as:

gem install docx

Reading

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('example.docx')

# Retrieve and display paragraphs
doc.paragraphs.each do |p|
  puts p
end

# Retrieve and display bookmarks, returned as hash with bookmark names as keys and objects as values
doc.bookmarks.each_pair do |bookmark_name, bookmark_object|
  puts bookmark_name
end

Don't have a local file but a buffer? Docx handles those to:

require 'docx'

# Create a Docx::Document object from a remote file
doc = Docx::Document.open(buffer)

# Everything about reading is the same as shown above

Rendering html

require 'docx'

# Retrieve and display paragraphs as html
doc = Docx::Document.open('example.docx')
doc.paragraphs.each do |p|
  puts p.to_html
end

Reading tables

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('tables.docx')

first_table = doc.tables[0]
puts first_table.row_count
puts first_table.column_count
puts first_table.rows[0].cells[0].text
puts first_table.columns[0].cells[0].text

# Iterate through tables
doc.tables.each do |table|
  table.rows.each do |row| # Row-based iteration
    row.cells.each do |cell|
      puts cell.text
    end
  end

  table.columns.each do |column| # Column-based iteration
    column.cells.each do |cell|
      puts cell.text
    end
  end
end

Writing

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('example.docx')

# Insert a single line of text after one of our bookmarks
doc.bookmarks['example_bookmark'].insert_text_after("Hello world.")

# Insert multiple lines of text at our bookmark
doc.bookmarks['example_bookmark_2'].insert_multiple_lines_after(['Hello', 'World', 'foo'])

# Remove paragraphs
doc.paragraphs.each do |p|
  p.remove! if p.to_s =~ /TODO/
end

# Substitute text, preserving formatting
doc.paragraphs.each do |p|
  p.each_text_run do |tr|
    tr.substitute('_placeholder_', 'replacement value')
  end
end

# Save document to specified path
doc.save('example-edited.docx')

Writing to tables

require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('tables.docx')

# Iterate over each table
doc.tables.each do |table|
  last_row = table.rows.last
  
  # Copy last row and insert a new one before last row
  new_row = last_row.copy
  new_row.insert_before(last_row)

  # Substitute text in each cell of this new row
  new_row.cells.each do |cell|
    cell.paragraphs.each do |paragraph|
      paragraph.each_text_run do |text|
        text.substitute('_placeholder_', 'replacement value')
      end
    end
  end
end

doc.save('tables-edited.docx')

Advanced

require 'docx'

d = Docx::Document.open('example.docx')

# The Nokogiri::XML::Node on which an element is based can be accessed using #node
d.paragraphs.each do |p|
  puts p.node.inspect
end

# The #xpath and #at_xpath methods are delegated to the node from the element, saving a step
p_element = d.paragraphs.first
p_children = p_element.xpath("//child::*") # selects all children
p_child = p_element.at_xpath("//child::*") # selects first child

Writing and Manipulating Styles

require 'docx'

d = Docx::Document.open('example.docx')
existing_style = d.styles_configuration.style_of("Heading 1")
existing_style.font_color = "000000"

# see attributes below
new_style = d.styles_configuration.add_style("Red", name: "Red", font_color: "FF0000", font_size: 20)
new_style.bold = true

d.paragraphs.each do |p|
  p.style = "Red"
end

d.paragraphs.each do |p|
  p.style = "Heading 1"
end

d.styles_configuration.remove_style("Red")

Style Attributes

The following is a list of attributes and what they control within the style.

  • id: The unique identifier of the style. (required)
  • name: The human-readable name of the style. (required)
  • type: Indicates the type of the style (e.g., paragraph, character).
  • keep_next: Boolean value controlling whether to keep a paragraph and the next one on the same page. Valid values: true/false.
  • keep_lines: Boolean value specifying whether to keep all lines of a paragraph together on one page. Valid values: true/false.
  • page_break_before: Boolean value indicating whether to insert a page break before the paragraph. Valid values: true/false.
  • widow_control: Boolean value controlling widow and orphan lines in a paragraph. Valid values: true/false.
  • shading_style: Defines the shading pattern style.
  • shading_color: Specifies the color of the shading pattern. Valid values: Hex color codes.
  • shading_fill: Indicates the background fill color of shading.
  • suppress_auto_hyphens: Boolean value controlling automatic hyphenation. Valid values: true/false.
  • bidirectional_text: Boolean value indicating if the paragraph contains bidirectional text. Valid values: true/false.
  • spacing_before: Defines the spacing before a paragraph.
  • spacing_after: Specifies the spacing after a paragraph.
  • line_spacing: Indicates the line spacing of a paragraph.
  • line_rule: Defines how line spacing is calculated.
  • indent_left: Sets the left indentation of a paragraph.
  • indent_right: Specifies the right indentation of a paragraph.
  • indent_first_line: Indicates the first line indentation of a paragraph.
  • align: Controls the text alignment within a paragraph.
  • font: Sets the font for different scripts (ASCII, complex script, East Asian, etc.).
  • font_ascii: Specifies the font for ASCII characters.
  • font_cs: Indicates the font for complex script characters.
  • font_hAnsi: Sets the font for high ANSI characters.
  • font_eastAsia: Specifies the font for East Asian characters.
  • bold: Boolean value controlling bold formatting. Valid values: true/false.
  • italic: Boolean value indicating italic formatting. Valid values: true/false.
  • caps: Boolean value controlling capitalization. Valid values: true/false.
  • small_caps: Boolean value specifying small capital letters. Valid values: true/false.
  • strike: Boolean value indicating strikethrough formatting. Valid values: true/false.
  • double_strike: Boolean value defining double strikethrough formatting. Valid values: true/false.
  • outline: Boolean value specifying outline effects. Valid values: true/false.
  • outline_level: Indicates the outline level in a document's hierarchy.
  • font_color: Sets the text color. Valid values: Hex color codes.
  • font_size: Controls the font size.
  • font_size_cs: Specifies the font size for complex script characters.
  • underline_style: Indicates the style of underlining.
  • underline_color: Specifies the color of the underline. Valid values: Hex color codes.
  • spacing: Controls character spacing.
  • kerning: Sets the space between characters.
  • position: Controls the position of characters (superscript/subscript).
  • text_fill_color: Sets the fill color of text. Valid values: Hex color codes.
  • vertical_alignment: Controls the vertical alignment of text within a line.
  • lang: Specifies the language tag for the text.

Development

todo

  • Calculate element formatting based on values present in element properties as well as properties inherited from parents
  • Default formatting of inserted elements to inherited values
  • Implement formattable elements.
  • Easier multi-line text insertion at a single bookmark (inserting paragraph nodes after the one containing the bookmark)

docx's People

Contributors

satoryu avatar chrahunt avatar bitops avatar mportiz08 avatar rickychilcott avatar visoft avatar higginsdragon avatar manovasanth1227 avatar aforward avatar bishopandco avatar artembaikuzin avatar abartov avatar joelbarker2011 avatar mvz avatar neilbartley avatar pjsk-stripe avatar tmikoss avatar yannp avatar cichaczem avatar oslivan avatar qelphybox avatar wakematta avatar nathanvda avatar ahorek avatar petergoldstein avatar gopeter avatar fig avatar misdoro avatar mohanapriya2308 avatar supraja-trd avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.