

tendril

Warning: This library is at a very early stage of development, and it contains a substantial amount of unsafe code. Use at your own risk!


API Documentation

Introduction

Tendril is a compact string/buffer type, optimized for zero-copy parsing. Tendrils have the semantics of owned strings, but are sometimes views into shared buffers. When you mutate a tendril, an owned copy is made if necessary. Further mutations occur in-place until the string becomes shared, e.g. with clone() or subtendril().
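The copy-on-write behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not tendril's actual representation: `CowStr` and its methods are hypothetical names, and the sharing is modelled with `Rc<String>` and `Rc::make_mut`.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a shared, copy-on-write string.
/// This is NOT how tendril is implemented; it only mirrors the semantics.
#[derive(Clone)]
pub struct CowStr {
    buf: Rc<String>,
}

impl CowStr {
    pub fn new(s: &str) -> Self {
        CowStr { buf: Rc::new(s.to_owned()) }
    }

    /// Appends text. `Rc::make_mut` copies the buffer only when it is
    /// currently shared; otherwise the mutation happens in place.
    pub fn push_str(&mut self, s: &str) {
        Rc::make_mut(&mut self.buf).push_str(s);
    }

    pub fn as_str(&self) -> &str {
        &self.buf
    }

    /// True when both handles point at the same heap buffer.
    pub fn shares_buffer_with(&self, other: &CowStr) -> bool {
        Rc::ptr_eq(&self.buf, &other.buf)
    }
}
```

Cloning a `CowStr` makes the two handles share one buffer. The first `push_str` on either handle then triggers a copy, and later mutations of that handle stay in place until it is cloned again.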

Buffer sharing is accomplished through thread-local (non-atomic) reference counting, which has very low overhead. The Rust type system will prevent you at compile time from sending a tendril between threads. (See below for thoughts on relaxing this restriction.)

Whereas String allocates in the heap for any non-empty string, Tendril can store small strings (up to 8 bytes) in-line, without a heap allocation. Tendril is also smaller than String on 64-bit platforms: 16 bytes versus 24. Option<Tendril> is the same size as Tendril, thanks to NonZero.
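The size claim about Option can be illustrated with a small sketch. `Handle` below is a hypothetical 16-byte struct, not tendril's real layout; it only shows how a field that can never be zero lets the compiler store `None` without extra space.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical handle whose first field can never be zero.
/// The field layout is an assumption made for illustration only.
#[allow(dead_code)]
pub struct Handle {
    ptr: NonZeroUsize,
    len: u32,
    aux: u32,
}

/// Returns (size of Handle, size of Option<Handle>).
/// In practice the compiler uses the forbidden zero value of `ptr`
/// to represent `None`, so the two sizes match.
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<Handle>(), size_of::<Option<Handle>>())
}

/// Size of the standard `String` handle (pointer, capacity, length).
pub fn string_size() -> usize {
    size_of::<String>()
}
```

On a typical 64-bit target `string_size()` is three machine words, which is where the figure of 24 bytes comes from.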

The maximum length of a tendril is 4 GB. The library will panic if you attempt to go over the limit.

Formats and encoding

Tendril uses phantom types to track a buffer's format. This determines at compile time which operations are available on a given tendril. For example, Tendril<UTF8> and Tendril<Bytes> can be borrowed as &str and &[u8] respectively.
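As a rough illustration of the phantom-type technique, the sketch below tags a byte buffer with a zero-sized format marker. The names `Buf`, `Utf8`, `Bytes` and every method here are hypothetical and do not reflect tendril's API.

```rust
use std::marker::PhantomData;

/// Zero-sized format markers (hypothetical names).
pub struct Utf8;
pub struct Bytes;

/// A byte buffer tagged at compile time with its format.
pub struct Buf<F> {
    data: Vec<u8>,
    _format: PhantomData<F>,
}

impl Buf<Bytes> {
    pub fn from_bytes(data: &[u8]) -> Self {
        Buf { data: data.to_vec(), _format: PhantomData }
    }

    /// Only available on byte buffers.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }

    /// Validates the bytes once, then changes the type-level tag.
    /// On failure the original buffer is handed back.
    pub fn try_into_utf8(self) -> Result<Buf<Utf8>, Buf<Bytes>> {
        if std::str::from_utf8(&self.data).is_ok() {
            Ok(Buf { data: self.data, _format: PhantomData })
        } else {
            Err(self)
        }
    }
}

impl Buf<Utf8> {
    /// Only available on UTF-8 buffers; validity was checked on conversion.
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.data).expect("validated when the tag was set")
    }
}
```

Calling `as_str` on a `Buf<Bytes>` is a compile error, because that method exists only in the `impl Buf<Utf8>` block. The marker types carry no data, so the tag costs nothing at run time.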

Tendril also integrates with rust-encoding and has preliminary support for WTF-8 buffers.

Plans for the future

Ropes

html5ever will use Tendril as a zero-copy text representation. It would be good to preserve this all the way through to Servo's DOM. This would reduce memory consumption, and possibly speed up text shaping and painting. However, DOM text may conceivably be larger than 4 GB, and in any case it will not be contiguous in memory around, e.g., a character entity reference.

Solution: Build a rope on top of these strings and use that as Servo's representation of DOM text. We can perhaps do text shaping and/or painting in parallel for different chunks of a rope. html5ever can additionally use this rope type as a replacement for BufferQueue.

Because the underlying buffers are reference-counted, the bulk of this rope is already a persistent data structure. Consider what happens when appending two ropes to get a "new" rope. A vector-backed rope would copy a vector of small structs, one for each chunk, and would bump the corresponding refcounts. But it would not copy any of the string data.
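A sketch of the append case, assuming a rope backed by a `VecDeque` of reference-counted chunks. `Rope` and its methods are hypothetical, and `Rc<str>` stands in for a tendril here.

```rust
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical rope: a queue of shared, immutable string chunks.
#[derive(Clone, Default)]
pub struct Rope {
    chunks: VecDeque<Rc<str>>,
}

impl Rope {
    pub fn from_chunk(s: &str) -> Self {
        let mut chunks = VecDeque::new();
        chunks.push_back(Rc::from(s));
        Rope { chunks }
    }

    /// Builds a new rope from two existing ones. Only the chunk handles
    /// are copied and their refcounts bumped; no string data is copied.
    pub fn concat(&self, other: &Rope) -> Rope {
        let chunks = self
            .chunks
            .iter()
            .chain(other.chunks.iter())
            .cloned()
            .collect();
        Rope { chunks }
    }

    /// Total length in bytes across all chunks.
    pub fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Copies every chunk into one contiguous String.
    pub fn flatten(&self) -> String {
        self.chunks.iter().map(|c| &**c).collect()
    }

    /// Number of handles currently sharing the chunk at `index`.
    pub fn chunk_refcount(&self, index: usize) -> usize {
        Rc::strong_count(&self.chunks[index])
    }
}
```

After `a.concat(&b)`, both inputs remain valid and unchanged, and each of their chunks has a refcount of two. That is the sense in which the bulk of the structure is already persistent.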

If we want more sharing, then a 2-3 finger tree could be a good choice. We would probably stick with VecDeque for ropes under a certain size.

UTF-16 compatibility

SpiderMonkey expects text to be in UCS-2 format for the most part. The semantics of JavaScript strings are difficult to implement on UTF-8. This also applies to HTML parsing via document.write. Also, passing SpiderMonkey a string that isn't contiguous in memory will incur additional overhead and complexity, if not a full copy.

Solution: Use WTF-8 in parsing and in the DOM. Servo will convert to contiguous UTF-16 when necessary. The conversion can easily be parallelized, if we find a practical need to convert huge chunks of text all at once.
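The conversion step might look roughly like the sketch below. The standard library has no WTF-8 type, so this assumes well-formed UTF-8 chunks and does not handle lone surrogates; `to_contiguous_utf16` is a hypothetical helper name.

```rust
/// Flattens non-contiguous UTF-8 chunks into one contiguous UTF-16 buffer.
/// Assumption: chunks are well-formed UTF-8 (`&str`), not WTF-8.
pub fn to_contiguous_utf16(chunks: &[&str]) -> Vec<u16> {
    let mut out = Vec::new();
    for chunk in chunks {
        // `encode_utf16` yields surrogate pairs for code points above U+FFFF.
        out.extend(chunk.encode_utf16());
    }
    out
}
```

Each chunk is encoded independently of the others, which is why the work could be split across threads and the results joined afterwards.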

Source span information

Some html5ever API consumers want to know the originating location in the HTML source file(s) of each token or parse error. An example application would be a command-line HTML validator with diagnostic output similar to rustc's.

Solution: Accept some metadata along with each input string. The type of metadata is chosen by the API consumer; it defaults to (), which has size zero. For any non-inline string, we can provide the associated metadata as well as a byte offset.
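One way to picture this is a generic wrapper whose metadata parameter defaults to the unit type. The sketch below is illustrative only: `Input` and `locate` are hypothetical names, and `String` stands in for a tendril.

```rust
use std::mem::size_of;

/// Hypothetical input string carrying consumer-chosen metadata.
/// With the default `M = ()`, the metadata occupies no space.
pub struct Input<M = ()> {
    pub text: String,
    pub meta: M,
}

impl<M: Clone> Input<M> {
    /// Returns the metadata together with the byte offset of `needle`.
    pub fn locate(&self, needle: &str) -> Option<(M, usize)> {
        self.text.find(needle).map(|offset| (self.meta.clone(), offset))
    }
}

/// Size of the default metadata type, which is zero.
pub fn default_metadata_size() -> usize {
    size_of::<()>()
}
```

A validator could set `M` to a file name or file ID and report `(file, offset)` pairs in its diagnostics. Consumers that do not need spans keep the default and pay nothing for it.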

