DocX

A small framework that converts NSAttributedString to .docx Word files on iOS and macOS.

Motivation

On iOS, NSAttributedString.DocumentType supports only HTML and Rich Text, while on macOS .doc and .docx are also available. Even then, the .docx exporter on macOS supports only a subset of the attributes of NSAttributedString.

This library is used in SimpleFurigana for macOS and SimpleFurigana for iOS, hence the focus on furigana annotation export.

Installation

Swift Package Manager

Add

.package(name: "DocX", url: "https://github.com/shinjukunian/DocX.git", .branch("master"))

to dependencies in your Package.swift file. This requires Swift 5.3, which shipped with Xcode 12. Alternatively, add DocX in Xcode via File->Swift Packages->Add Package Dependency, paste https://github.com/shinjukunian/DocX.git as the URL, and specify master as the branch.

CocoaPods

Add

pod 'DocX-Swift'

to your Podfile.

Usage

let string = NSAttributedString(string: "This is a string", attributes: [.font: UIFont.systemFont(ofSize: UIFont.systemFontSize), .backgroundColor: UIColor.blue])
let url = URL(fileURLWithPath:"myPath")
try? string.writeDocX(to: url)

Starting from iOS 15 / macOS 12, you can use the new AttributedString.

var att=AttributedString("Lorem ipsum dolor sit amet")
att.font = NSFont(name: "Helvetica", size: 12)
att[att.range(of: "Lorem")!].backgroundColor = .blue
let url = URL(fileURLWithPath:"myPath")
try att.writeDocX(to: url)

Naturally, this works for Markdown as well:

let mD="~~This~~ is a **Markdown** *string*."
let att=try AttributedString(markdown: mD)
try att.writeDocX(to: url)

DocXOptions allow the customization of DocX output.

  • you can optionally specify metadata using DocXOptions:
let font = NSFont(name: "Helvetica", size: 13)! //on macOS
let string = NSAttributedString(string: "The Foundation For Law and Government favours Helvetica.", attributes: [.font: font])

var options = DocXOptions()
options.author = "Michael Knight"
options.title = "Helvetica Document"

let url = URL(fileURLWithPath:"myPath")
try string.writeDocX(to: url, options:options)
  • you can specify character and paragraph styling based on a style document using the NSAttributedString.Key.characterStyleId and NSAttributedString.Key.paragraphStyleId attributes. Use DocXStyleConfiguration to specify the style document.

  • you can use DocXStyleConfiguration to specify that Word should use standard fonts instead of explicitly specified font names. This is useful for cross-platform compatibility when using Apple system fonts. Other font attributes (size, bold / italic) will be preserved if possible.

  • you can specify a page size using .pageDefinition. Page definitions consist of a paper size and margins to determine the printable area. If no page definition is specified, Word will fall back to useful defaults based on the current system.

let A4 = PageDefinition(pageSize: .A4) // an A4 page with default margins
let square = PageDefinition(pageSize: .init(width: Measurement(value: 10, unit: .centimeters), height: Measurement(value: 10, unit: .centimeters))) // a custom square page size with default (72 pt) margins
let custom = PageDefinition(pageSize: .init(width: .init(value: 30, unit: .centimeters), height: .init(value: 20, unit: .centimeters)), pageMargins: .init(top: .init(value: 1, unit: .centimeters), bottom: .init(value: 1, unit: .centimeters), left: .init(value: 1, unit: .centimeters), right: .init(value: 1, unit: .centimeters))) // a page with custom paper and custom margins

See the attached sample projects (for iOS and macOS) for usage and limitations. On iOS, DocX also includes a UIActivityItemProvider subclass (DocXActivityItemProvider) for exporting .docx files through UIActivityViewController.

NSAttributedString has no concept of pagination. For manual pagination, use

try DocXWriter.write(pages:[NSAttributedString], to url:URL)

to render each NSAttributedString as a separate page.

Screenshot macOS

A sample output on macOS opened in Word365.

Screenshot Lenna

A sample output on macOS with an embedded image (via NSTextAttachment). In the macOS sample application (which is a simple NSTextView), this can be achieved using drag and drop. Note that there is little control over the placement of the image; it will be inline with the text.

Screenshot iOS

A sample output on iOS opened in Word for iOS. Furigana annotations are preserved. The link is clickable. Please note that Quick Look (on both platforms) only supports a limited number of attributes.

Supported Attributes

  • most things in NSAttributedString.Key (fonts, colors, underline, indents etc.) except
    • NSAttributedString.Key.expansion
    • NSAttributedString.Key.kern
    • NSAttributedString.Key.ligature
    • NSAttributedString.Key.obliqueness
    • NSAttributedString.Key.superscript (macOS only, doesn't really work for most fonts anyway). Use NSAttributedString.Key.baselineOffset with a positive value for superscript and a negative value for subscript instead
    • NSAttributedString.Key.textEffect
  • CTRubyAnnotation for furigana (ruby) annotations in CoreText
  • NSTextAttachment embedded images (inline with text)

For AttributedString, DocX supports the attributes present in NSAttributedString, i.e. most attributes in AttributeScopes.AppKitAttributes or AttributeScopes.UIKitAttributes (see above for omissions). For AttributedStrings initialized from Markdown (AttributeScopes.FoundationAttributes), DocX supports links (AttributeScopes.FoundationAttributes.LinkAttribute), bold, italic, and strikethrough (AttributeScopes.FoundationAttributes.InlinePresentationIntentAttribute), and inline images (AttributeScopes.FoundationAttributes.ImageURLAttribute). Please note that images are not rendered by SwiftUI's Text.

Some attributes don't have a direct correspondence. For example, NSAttributedString typically has no concept of a page size. The page size can be specified by creating a PageDefinition and assigning it to DocXOptions.pageDefinition.

Dependencies

Alternatives

References

Licence

MIT

docx's People

Contributors

andalman, shinjukunian

docx's Issues

Long multi-page strings can take an *extremely* long time to write

I constructed a test case for a very long “book” that consists of 400 “chapters” of 5,000 words each. When I write it as a docx using DocXWriter.write(pages:), it takes a prohibitively long time: about 100 seconds on my computer (a 2021 MacBook Pro M1).

Here’s the test:

    func testLongBookString() throws {
        // Two paragraphs / 100 words
        let twoParagraphs =
"""
This property contains the space (measured in points) added at the end of the paragraph to separate it from the following paragraph. This value is always nonnegative. The space between paragraphs is determined by adding the previous paragraph’s paragraphSpacing and the current paragraph’s paragraphSpacingBefore.
Specifies the border displayed above a set of paragraphs which have the same set of paragraph border settings. Note that if the adjoining paragraph has identical border settings and a between border is specified, a single between border will be used instead of the bottom border for the first and a top border for the second.
"""
        // Create a "book" that consists of 400 chapters each with 5,000 words
        // (2 million words total).
        let chapterString = String(repeating: twoParagraphs, count: 50)
        let chapterAttributedString = NSAttributedString(string: chapterString)
        let chapters = Array(repeating: chapterAttributedString, count: 400)
        
        let url=self.tempURL.appendingPathComponent(UUID().uuidString + "_myDocument_\(chapterString.prefix(10))").appendingPathExtension("docx")
        
        try DocXWriter.write(pages: chapters, to: url)
        try validateDocX(url: url)
    }

Profiling revealed that significant time is spent bridging between NSString/String (and allocating Strings) in paragraphRanges.

I have a fix that reduces the time for this test from ~100 seconds to ~1.4s.
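As a rough illustration of the kind of change that avoids String/NSString bridging, the sketch below walks an NSString directly with paragraphRange(for:) and collects plain NSRange values. This is not the fix referred to above; the function name paragraphRanges(in:) is only an illustrative stand-in, and the real implementation in DocX may look quite different.

```swift
import Foundation

/// Collects the range of every paragraph (terminator included) by walking
/// the NSString directly, so no intermediate Swift Strings are allocated.
/// Hypothetical helper for illustration; not DocX's actual implementation.
func paragraphRanges(in text: NSString) -> [NSRange] {
    var ranges: [NSRange] = []
    var location = 0
    while location < text.length {
        // paragraphRange(for:) extends the given range to paragraph boundaries.
        let range = text.paragraphRange(for: NSRange(location: location, length: 0))
        ranges.append(range)
        // Always advance, even if an unexpected empty range comes back.
        location = max(NSMaxRange(range), location + 1)
    }
    return ranges
}

let sample = NSString(string: "First paragraph.\nSecond paragraph.\n\nLast one")
for range in paragraphRanges(in: sample) {
    print(range.location, range.length)
}
```

Working on ranges and only materializing substrings when a run is actually written keeps the per-paragraph cost small for very long inputs.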

Suggestion: simple support for styles

I am working on an application that can import a docx file. On import, text that uses particular Word styles will be imported in specific ways. For instance, text that uses the “Heading 1” style will be imported as a title.

I’d like to be able to export a docx file that looks exactly the same when it is re-imported into my application. To do this perfectly, I need the exported docx to have styles applied to some text.

I imagine that writeDocX() could take an optional dictionary of style information, which would map an identifier (e.g. “Heading1Identifier”) to a dictionary of style information (e.g. [“name”: “Heading 1”, “kind”: “paragraphStyle”, “bold”: “true”]). Then, an attributed string could have an attribute (e.g. NSAttributedString.Key.docXStyleIdentifier) that contains the style identifier value (e.g. “Heading1Identifier”). On export, text with that attribute would have the appropriate style applied.

This may be outside the scope of this project, but I figured it couldn’t hurt to submit it.

Suggestion: set image extent so that it fits on a page

When I create a docx, I’d like for any images to fit on a single page. I realize I can resize the image on my end, but that isn’t ideal as I’d like the original image to remain untouched.

I’ve attached two docx files that illustrate the issue:
Under-the-Wave-Created-DocX.docx
Under-the-Wave-Created-Word.docx

The first docx file was created using DocX, and you can see that my image, since it is large, is pushed to the next page and also clipped. When I add the original image in Word, you can see that it is resized to fit the page. (Word also, annoyingly, resizes the original image. However, when I unzip the docx, replace the resized image with my original image, and then re-create the docx file, Word still displays the image correctly.)

I’ve only looked quickly, but it seems to me that this is controlled by the wp:extent element. In the DocX package, it looks like the extent is determined by the size of the image. This seems like a great fallback. It would be nice, though, if the image was scaled to fit when the image’s width/height is greater than the page’s usable width/height (i.e. not including margins).

Bonus suggestion #1: It would also be nice if there were a way to set the image’s display width more directly. Perhaps NSTextAttachment could be extended to have a “docxExportWidthFraction” attribute. If that were set to “0.5”, then the display width would be set to 50% of the page’s usable width.

Bonus suggestion #2: When the “docxExportWidthFraction” is set, it might be nice to center the image horizontally. (Though I wouldn’t complain if images were just always centered, rather than left-aligned.)
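The scaling rule suggested above can be sketched in a few lines. Everything here is hypothetical: Size, fittedSize(image:usable:widthFraction:) and emu(fromPoints:) are illustrative names, not part of DocX, and the 451 x 698 pt usable area assumes an A4 page with 72 pt margins. The only fixed fact used is that drawing extents are measured in EMUs, with 12,700 EMU per point.

```swift
/// A plain width/height pair in points (hypothetical helper type).
struct Size {
    var width: Double
    var height: Double
}

/// Scales `image` down so it fits inside `usable`, preserving aspect ratio.
/// If `widthFraction` is given, the image is instead sized to that fraction
/// of the usable width, still clamped to the usable height.
func fittedSize(image: Size, usable: Size, widthFraction: Double? = nil) -> Size {
    guard image.width > 0, image.height > 0 else { return image }
    let heightLimit = usable.height / image.height
    let scale: Double
    if let fraction = widthFraction {
        scale = min(usable.width * fraction / image.width, heightLimit)
    } else {
        // Never enlarge: the image's own size is the fallback.
        scale = min(1, usable.width / image.width, heightLimit)
    }
    return Size(width: image.width * scale, height: image.height * scale)
}

/// Converts points to English Metric Units (914,400 per inch, 72 pt per inch).
func emu(fromPoints points: Double) -> Int {
    Int((points * 12_700).rounded())
}

let usableArea = Size(width: 451, height: 698)   // assumed A4 with 72 pt margins
let fitted = fittedSize(image: Size(width: 1000, height: 500), usable: usableArea)
print(emu(fromPoints: fitted.width), emu(fromPoints: fitted.height))
```

Keeping the original image size as the fallback means small images are left untouched, which matches the behaviour described above for images that already fit.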

Multi-page support?

Hello @shinjukunian,

Thank you for this beautiful implementation. I tried to convert an NSMutableAttributedString that contains two other attributed strings, but the resulting .docx document only contains the first attributed string.

Here is what I did:

let rootAttributedString = NSMutableAttributedString()
rootAttributedString.append(NSAttributedString(string: "blah blah blah 1 ... but more text"))
rootAttributedString.append(NSAttributedString(string: "blah blah blah 2 ... more text here also"))

and then
try rootAttributedString.writeDocX(to: myURL)

but the output was a .docx file with "blah blah blah 1 ... but more text" only.

Any ideas how to fix this?

B. Regards,
John

Other ways to add DocX to the project

Currently, Swift Package Manager is the only way to install this library. However, it is not always possible to use packages in large projects. Could you add either manual installation or a CocoaPods option?

Setting the NSTextAttachment bounds is useless

        let image1Attachment = NSTextAttachment()
        image1Attachment.bounds = CGRect(x: 0, y: 0, width: 40 , height: 300)
        image1Attachment.image = image
        // wrap the attachment in its own attributed string so we can append it
        let image1String = NSAttributedString(attachment: image1Attachment)

        // add the NSTextAttachment wrapper to our full string, then add some more text.
        fullString.append(image1String)


      
I have set the image's display bounds, but in the generated docx the image is displayed very large, extends beyond the text width, and only half of it is visible.

Pagination doesn’t work when there are empty paragraphs before an intended page break

I’m using DocXWriter.write(pages:to:options:) to write an array of attributed strings with page breaks in between, and noticed that a page break isn’t always inserted. In particular, when a string ends with multiple empty paragraphs in a row, the page break isn’t inserted.

If I trim the attributed strings so that they don’t end with any newline characters, everything behaves as expected. This is a fine workaround for me, but I still think this is a confusing bug.

You can repro by adding two empty lines to the end of the “string” in the testMultiPage() or testMultiPageWriter() tests. With that change, the generated docx file won’t contain any page breaks. I believe this is due to the “early out” in ParagraphElement’s buildRuns():
guard subString.length>0 else{return [AEXMLElement]()}
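The workaround mentioned above (trimming trailing newlines before writing) could look roughly like the following. The helper name trimmingTrailingNewlines(_:) is made up for this sketch; it only uses Foundation and does not touch DocX itself.

```swift
import Foundation

/// Returns a copy of `string` without trailing newline characters,
/// keeping all attributes on the remaining text.
/// Hypothetical helper for the workaround; not part of DocX.
func trimmingTrailingNewlines(_ string: NSAttributedString) -> NSAttributedString {
    let text = NSString(string: string.string)
    var end = text.length
    // Walk backwards over UTF-16 units while they are newline characters.
    while end > 0,
          let scalar = Unicode.Scalar(text.character(at: end - 1)),
          CharacterSet.newlines.contains(scalar) {
        end -= 1
    }
    return string.attributedSubstring(from: NSRange(location: 0, length: end))
}

let page = NSAttributedString(string: "End of chapter one.\n\n\n")
print(trimmingTrailingNewlines(page).string)
```

Each page would be passed through this helper before being handed to DocXWriter.write(pages:to:options:), so that no page ends in an empty paragraph.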

Suggestion: docx files should NOT display "Compatibility Mode" when opened in Word

When you open a DocX-written file in Word, you'll see "Compatibility Mode" in the title bar. And, if you use File > Save As to create a new version of your file, Word will warn you that "you are about to update the file format, which might result in layout changes."

Compatibility Mode is a file format that allows these docx files to be opened in Word 2010 or earlier. However, it also means that some post-2010 Word features can't be used (unless, of course, you update the file).

Seems like maintaining compatibility with Word 2010 isn't necessary, and it would be preferable for docx files to appear like other "modern" docx files when opened in Word.

I have a fix.

Can’t build and run tests in `master` branch

I think I probably broke this, but I can no longer build and run tests in the master branch. (I usually test DocX in the context of my own application, so it’s rare that I build this way these days).

Anyhow, it looks like this is because DocXStyleConfiguration.swift and styles.xml haven’t been added to the project file.

I have a fix.

Underlining doesn't work when color isn't also specified

If you have an attributed string and apply NSAttributedString.Key.underlineStyle to it, the generated docx will not include that underline. This is because DocX requires that foregroundColor must be set, otherwise the <w:u> element won't be output.

Since the w:color value is optional in the docx, I don't think this should be required. I have a fix ready that makes the color argument Optional in the underlineElement function.

That said, I'm not sure using the foregroundColor is correct either, since NSAttributedString.Key.underlineColor exists for the purpose. If existing code relies on this behavior, though, we could have DocX check for the underlineColor and, if that isn't found, then use the foregroundColor if it exists. Thoughts?
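The fallback order proposed here (underlineColor first, then foregroundColor, then no color at all) can be sketched as below. Colors are represented as hex strings to keep the example platform-independent, and both function names are hypothetical; the real underlineElement function in DocX has a different signature.

```swift
/// Picks the color for the underline: an explicit underline color wins,
/// otherwise the text color is used, otherwise no color is written.
/// Hypothetical helper; colors are plain hex strings such as "FF0000".
func resolvedUnderlineColor(underline: String?, foreground: String?) -> String? {
    underline ?? foreground
}

/// Builds a <w:u> element. The w:color attribute is optional in the
/// docx format, so it is only emitted when a color is available.
func underlineElement(style: String = "single", colorHex: String?) -> String {
    if let colorHex = colorHex {
        return "<w:u w:val=\"\(style)\" w:color=\"\(colorHex)\"/>"
    }
    return "<w:u w:val=\"\(style)\"/>"
}

let color = resolvedUnderlineColor(underline: nil, foreground: nil)
print(underlineElement(colorHex: color))
```

With this ordering, existing documents that relied on the foreground color keep their appearance, while strings without any color still get an underline.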

Replacing an existing file does not work

When trying to replace an existing docx file with one written by DocX, I get the following error:

“[filename].zip” couldn’t be copied to “Desktop” because an item with the same name already exists.
To save the file, either provide a different name, or move aside or delete the existing file, and try again.

The reason is that writeDocX_builtin() uses FileManager.default.copyItem() rather than replaceItemAt(). The writing function is all set up to use replaceItemAt(), since it already creates the temp folder in a location that’s appropriate for the final URL.

(I will submit a pull request for this shortly)
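A minimal sketch of the idea, assuming the finished archive already sits at a temporary URL on the same volume as the destination: replace the destination if it exists, otherwise move the file into place. The function name installFile(from:to:) is illustrative and not taken from DocX.

```swift
import Foundation

/// Moves a freshly written file to its final location, replacing any
/// existing file at the destination instead of failing.
/// Hypothetical helper; DocX's writeDocX_builtin() is structured differently.
func installFile(from temporaryURL: URL, to destinationURL: URL) throws {
    let fileManager = FileManager.default
    if fileManager.fileExists(atPath: destinationURL.path) {
        // replaceItemAt swaps the contents in place of the existing item.
        _ = try fileManager.replaceItemAt(destinationURL, withItemAt: temporaryURL)
    } else {
        try fileManager.moveItem(at: temporaryURL, to: destinationURL)
    }
}
```

Unlike copyItem(), this does not throw when an item with the same name already exists at the destination.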

Hello, there was an error when using the package.

When executing main.swift

import DocX
import SwiftUI

let string = NSAttributedString(string: "This is a string", attributes: [:])
let url = URL(fileURLWithPath:"/Users/lilun/Downloads")
try? string.writeDocX(to: url)

Thread 1:"-[NSConcreteAttributedString writeDocXTo:error:] unrecognized selector sent to instance 0x600000204f60"
2023-05-24 08:12:28.904180+0800 AddDocx[7329:105841] -[NSConcreteAttributedString writeDocXTo:error:]: unrecognized selector sent to instance 0x600000204f60
2023-05-24 08:12:28.904517+0800 AddDocx[7329:105841] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSConcreteAttributedString writeDocXTo:error:]: unrecognized selector sent to instance 0x600000204f60'
*** First throw call stack:
(
0 CoreFoundation 0x000000019599f154 __exceptionPreprocess + 176
1 libobjc.A.dylib 0x00000001954be4d4 objc_exception_throw + 60
2 CoreFoundation 0x0000000195a46110 -[NSObject(NSObject) __retain_OA] + 0
3 CoreFoundation 0x00000001959070a0 forwarding + 1600
4 CoreFoundation 0x00000001959069a0 _CF_forwarding_prep_0 + 96
5 AddDocx 0x0000000100005028 main + 604
6 dyld 0x00000001954eff28 start + 2236
)
libc++abi: terminating due to uncaught exception of type NSException
(lldb)

Where is the problem? I am a novice and my foundations are not yet solid, so sorry for the trouble. Thank you.

Question for use of code...

Thanks very much for the help closing my issue. I tried it and it works now. I was wondering if I could get your opinion on what I am trying to do. I am looking for a way to generate a Word document in code. Users enter a large amount of data, which I store in Core Data. When they tap a button, my app should take all of that data plus some images and generate a Word document that they can export from the app to OneDrive and edit further on a laptop. I am trying to see whether this library would help me create that document in code from the data they have entered. They do not need to edit anything directly in the app; they just enter the data and the app generates the Word document. The document would contain text, pictures, and tables, and would be very large. Do you think your library would be helpful for that?

Docx is invalid if the written attributed string contains an ampersand (&) or less-than (<) character

This is easy to repro using these simple tests:

    func testAmpersand() {
        let string="Key & Peele"
        let attributedString=NSAttributedString(string: string)

        testWriteDocX(attributedString: attributedString)
    }

    func testLessThan() {
        let string="0 < 1"
        let attributedString=NSAttributedString(string: string)

        testWriteDocX(attributedString: attributedString)
    }

If you then run these tests, they will fail with: [DocXTests.DocXTests testAmpersand] : failed - The file…couldn’t be opened because it isn’t in the correct format.
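The failure is consistent with text being written into the XML without escaping. A minimal sketch of the escaping that XML character data needs is shown below; xmlEscaped(_:) is a hypothetical helper, and how DocX (or the XML library it uses) actually handles this is not shown in the report.

```swift
/// Escapes the characters that are not allowed verbatim in XML text content.
/// Hypothetical helper for illustration only.
func xmlEscaped(_ text: String) -> String {
    var escaped = ""
    escaped.reserveCapacity(text.utf8.count)
    for character in text {
        switch character {
        case "&": escaped += "&amp;"   // must be escaped first-class: starts entities
        case "<": escaped += "&lt;"    // would otherwise open a tag
        case ">": escaped += "&gt;"    // escaped for symmetry and safety
        default: escaped.append(character)
        }
    }
    return escaped
}

print(xmlEscaped("Key & Peele"))   // Key &amp; Peele
print(xmlEscaped("0 < 1"))         // 0 &lt; 1
```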

Line space issue

The line spacing is zero. I got this result after using .writeDocX on iOS.

Written language: Arabic, with a custom font (Cairo).

image

Images exported to docx should use an extension that matches their format

When DocX writes images to the media folder during docx creation, it always uses the “png” extension, regardless of the format of that image.

For instance, if I initialize an NSTextAttachment with a file wrapper that points to a JPEG image (e.g. “image.jpg”), then DocX will export it using a name like “rId3.png”. However, the format for that image is still a JPEG:

$ file -I rId3.png 
rId3.png: image/jpeg; charset=binary

Instead, it should use an extension that matches the format.
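One way to pick a matching extension is to look at the first bytes of the image data, which is also what the file command does. The sketch below checks the well-known PNG, JPEG and GIF signatures; imageFileExtension(forHeader:) is a hypothetical name, and DocX may prefer to take the extension from the file wrapper instead.

```swift
/// Guesses a file extension from the leading bytes of image data.
/// Returns nil for unknown formats so the caller can choose a fallback.
/// Hypothetical helper; only three common signatures are checked.
func imageFileExtension(forHeader bytes: [UInt8]) -> String? {
    let signatures: [(ext: String, magic: [UInt8])] = [
        (ext: "png", magic: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
        (ext: "jpg", magic: [0xFF, 0xD8, 0xFF]),
        (ext: "gif", magic: [0x47, 0x49, 0x46, 0x38]),   // "GIF8"
    ]
    for signature in signatures where bytes.starts(with: signature.magic) {
        return signature.ext
    }
    return nil
}

let jpegHeader: [UInt8] = [0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]
let fileName = "rId3." + (imageFileExtension(forHeader: jpegHeader) ?? "png")
print(fileName)   // rId3.jpg
```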
