ccf_draft's Introduction

Cadence Compact Format (CCF)

Cadence Compact Format (CCF) is a data format designed for compact, efficient, and deterministic encoding of Cadence external values. CCF is defined in ccf_specs.md.

Cadence is a resource-oriented programming language that introduces new features to smart contract programming. It's used by Flow blockchain and has a syntax inspired by Swift, Kotlin, and Rust. Its use of resource types maps well to the Move language.

CCF can be used as a hybrid data format. CCF-based messages can be fully self-describing or partially self-describing. Both are more compact than JSON-based messages. CCF-based protocols can send Cadence metadata just once for all messages of that type. Malformed data can be detected without Cadence metadata and without creating Cadence objects.

CCF obsoletes JSON-Cadence Data Interchange Format for use cases that do not require JSON.

Introduction

CCF is a data format that allows compact, efficient, and deterministic encoding of Cadence external values.

Cadence external values (e.g. events, transaction arguments, etc.) have been encoded using JSON-CDC, which is inefficient, verbose, and doesn't define deterministic encoding.

The same FeesDeducted event on the Flow blockchain can encode to:

  • 298 bytes in JSON-CDC (minified).
  • 118 bytes in CCF (fully self-describing mode).
  •  20 bytes in CCF (partially self-describing mode).

CCF defines all requirements for deterministic encoding (sort orders, smallest encoded forms, and Cadence-specific requirements) to allow CCF codecs implemented in different programming languages to produce the same deterministic encodings.
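
To illustrate two of the rules named above (smallest encoded forms and sort orders), here is a minimal Go sketch of the corresponding RFC 8949 core deterministic rules: integers use the shortest CBOR head, and map keys are sorted by the bytewise order of their encodings. The function names (encodeHead, encodeText, encodeMap) are hypothetical and are not the CCF codec's API. CCF's Cadence-specific requirements are defined in ccf_specs.md and are not covered here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeHead writes a CBOR head (major type plus argument) using the
// shortest form that can represent arg.
func encodeHead(major byte, arg uint64) []byte {
	m := major << 5
	switch {
	case arg < 24:
		return []byte{m | byte(arg)}
	case arg <= 0xff:
		return []byte{m | 24, byte(arg)}
	case arg <= 0xffff:
		b := []byte{m | 25, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(arg))
		return b
	case arg <= 0xffffffff:
		b := []byte{m | 26, 0, 0, 0, 0}
		binary.BigEndian.PutUint32(b[1:], uint32(arg))
		return b
	default:
		b := make([]byte, 9)
		b[0] = m | 27
		binary.BigEndian.PutUint64(b[1:], arg)
		return b
	}
}

// encodeText encodes a string as a CBOR text string (major type 3).
func encodeText(s string) []byte {
	return append(encodeHead(3, uint64(len(s))), s...)
}

// encodeMap encodes a string-to-uint map (major type 5). Go map iteration
// order is random, so the pairs are sorted by their encoded keys to make
// the output identical on every run.
func encodeMap(m map[string]uint64) []byte {
	type pair struct{ key, val []byte }
	pairs := make([]pair, 0, len(m))
	for k, v := range m {
		pairs = append(pairs, pair{encodeText(k), encodeHead(0, v)})
	}
	sort.Slice(pairs, func(i, j int) bool {
		return bytes.Compare(pairs[i].key, pairs[j].key) < 0
	})
	out := encodeHead(5, uint64(len(pairs)))
	for _, p := range pairs {
		out = append(out, p.key...)
		out = append(out, p.val...)
	}
	return out
}

func main() {
	// The field names and values here are made up for illustration.
	fmt.Printf("%x\n", encodeMap(map[string]uint64{"amount": 1000, "id": 7, "b": 24}))
}
```

Because both rules depend only on the bytes being produced, an encoder written in another language that follows the same rules would emit the same bytes for the same input.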

Some requirements (such as "Deterministic CCF Encoding Requirements") are defined as optional. Each CCF-based format or protocol can have its specification state how CCF options are used. This allows each protocol to balance tradeoffs such as compatibility, determinism, speed, encoded data size, etc.

CCF uses CBOR and is designed to allow efficient detection and rejection of malformed messages without creating Cadence objects. This allows more costly checks for validity, etc. to be performed only on well-formed messages.
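
As a rough illustration of checking well-formedness without creating objects, the Go sketch below walks a CBOR byte slice and only reads heads and lengths. It never allocates values for the items it skips. The names skipItem and wellFormed are hypothetical, indefinite-length items are rejected for simplicity (an assumption of this sketch), and none of the CCF-specific checks from the specification are included.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errMalformed = errors.New("malformed CBOR")

// skipItem checks that one definite-length CBOR data item starts at data[0]
// and returns the bytes that follow it. depth bounds the nesting level.
func skipItem(data []byte, depth int) ([]byte, error) {
	if depth <= 0 || len(data) == 0 {
		return nil, errMalformed
	}
	major, info := data[0]>>5, data[0]&0x1f
	data = data[1:]

	// Read the argument that follows the initial byte.
	var arg uint64
	switch {
	case info < 24:
		arg = uint64(info)
	case info <= 27:
		n := 1 << (info - 24) // 1, 2, 4, or 8 bytes
		if len(data) < n {
			return nil, errMalformed
		}
		var buf [8]byte
		copy(buf[8-n:], data[:n])
		arg = binary.BigEndian.Uint64(buf[:])
		data = data[n:]
	default:
		// 28-30 are reserved; 31 (indefinite length) is rejected here.
		return nil, errMalformed
	}

	switch major {
	case 0, 1: // unsigned and negative integers: head only
		return data, nil
	case 7: // simple values and floats: head only
		if info == 24 && arg < 32 {
			return nil, errMalformed // two-byte form of a small simple value
		}
		return data, nil
	case 2, 3: // byte string, text string: skip the payload
		if arg > uint64(len(data)) {
			return nil, errMalformed
		}
		return data[arg:], nil
	}

	// Arrays (4), maps (5), and tags (6) contain nested items.
	children := uint64(1) // a tag wraps exactly one item
	if major != 6 {
		// Every nested item needs at least one byte, so a declared count
		// larger than the remaining input can be rejected immediately.
		if arg > uint64(len(data)) {
			return nil, errMalformed
		}
		children = arg
		if major == 5 {
			children *= 2 // each map pair is a key item and a value item
		}
	}
	for i := uint64(0); i < children; i++ {
		var err error
		if data, err = skipItem(data, depth-1); err != nil {
			return nil, err
		}
	}
	return data, nil
}

// wellFormed reports whether msg is exactly one well-formed data item.
func wellFormed(msg []byte, maxDepth int) bool {
	rest, err := skipItem(msg, maxDepth)
	return err == nil && len(rest) == 0
}

func main() {
	fmt.Println(wellFormed([]byte{0x82, 0x01, 0x02}, 8)) // array of two integers
	fmt.Println(wellFormed([]byte{0x82, 0x01}, 8))       // array missing an element
}
```

A decoder structured this way can reject truncated or oversized input cheaply and run type and validity checks only on messages that pass.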

CBOR is an Internet Standard defined by IETF STD 94. CBOR is designed to be relevant for decades and is used by data formats and protocols such as W3C WebAuthn, C-DNS (IETF RFC 8618), COSE (IETF STD 96), CWT (IETF RFC 8392), etc.

Internet Standards

CCF uses a subset of CBOR and the Core Deterministic Encoding Requirements, both of which are defined in RFC 8949. The CCF specification uses CDDL (Concise Data Definition Language) notation and EDN (Extended Diagnostic Notation). CDDL and EDN are defined in RFC 8610.

RFC 8949 is an Internet Standard (STD 94) and RFC 8610 is an IETF Standards Track RFC. Both are designed to be relevant for many years.

Status

  • CCF specification is currently a release candidate (RC1) with cleanup underway.
  • CCF codec (written in Go) was merged into Cadence repository with API compatible with JSON-CDC codec.

Next steps

  • CCF specification will be cleaned up and RC1 status will be replaced by RC2.
  • Fuzz tests will be run by the Cadence team for each PR that has changes affecting CCF codec.

Timeline

  • Sep-Oct 2022 - As requested, paused onboarding in order to work full time on reviewing checkpointer v6 (onflow/flow-go repo).
  • Oct 18, 2022 - Resumed onboarding of Cadence external value encoding and requirements.
  • Nov 17, 2022 - Shared the abridged first draft of CCF with Cadence team and Ramtin for initial sanity check.
  • Nov 22, 2022 - First team meeting about abridged draft of CCF to present and discuss revision 20221122b.
  • Nov 29, 2022 - Second team meeting about draft of CCF to present and discuss revision 20221129b.
  • Dec 9, 2022 - Merged PR 35 to add more Cadence types and reassign CBOR tag values. The only Cadence type that is known to be missing from CCF specs is cadence.PathLink (blocked by onflow/cadence#2167).
  • Dec 15, 2022 - Updated CCF codec (WIP) to incorporate latest CCF specs. For example, updates to CCF specs from PRs 30, 31, 32, and 35. All existing CCF codec tests pass (e.g. JSON-Cadence tests ported and modified to CCF).
  • Feb 14, 2023 - ABRIDGED DRAFT -> DRAFT. Third team meeting about draft of CCF to present and discuss unmerged revision 20230214a.
  • Feb 17, 2023 - DRAFT -> RC1 with revision 20230217a
  • Mar 1, 2023 - Opened PR 2364 to add CCF codec (about +15,000 lines and go test -cover reporting 77%).
  • Mar 2023 - Paused in order to work on Atree v0.6: onflow/atree#295
  • Mar-Apr 2023 - Updated PR 2364 to match new changes to Cadence that affect external values, incorporate review feedback, and add more tests.
  • Apr 5, 2023 - As requested, reduced hours spent on CCF to begin work on Atree register inlining (onflow/atree#292).
  • Apr 13, 2023 - Paused after merging PR 2364 to add CCF codec (+20,857 lines, go test -cover reported 83%, fuzz tested many billions of executions). onflow/cadence#2364
  • May 26, 2023 - Resumed as requested, to use CCF codec for events encoding for deployment to testnet by June 7. Paused work on Atree register inlining (onflow/atree#292).
  • June 9, 2023 - Fixed backward compatibility with programs relying on JSON-CDC sort order because they were accessing event fields by index rather than field name. Added options to CCF codec and made events encoding opt out of "Deterministic CCF Encoding Requirements".
  • June 13, 2023 - Paused as requested ("CCF showed enough impact" in fully self-describing mode) and switched back to Atree register inlining (onflow/atree#292). Postponed work on CCF Specs and CCF Codec (e.g. partially self-describing mode, which would reduce events encoding size by 14x instead of 2x).
  • August 1-4, 2023 - Resumed updating CCF specs part-time after inquiry about moving it to onflow/ccf (#92).

Preliminary Size and Benchmark Comparisons

We are not comparing apples to apples. Prior formats (CBF and JSON-Cadence Data Interchange) didn't specify requirements for validity, sorting, etc.

  • CCF encoder sorts events data for deterministic encoding.
  • CCF decoder verifies that events data are well-formed and sorted.

At this time, CCF decoder doesn't include the option to check for "Preferred Serialization" (encoding to smallest size).

Size Comparisons

Encoding  Event Count  Encoded Size (bytes)  Comments
JSON      48,309       13,858,836            JSON-Cadence Data Interchange Format
CCF       48,309       6,159,931             CCF in fully self-describing and deterministic mode
CCF       48,309       TBD                   Est. 1/14 size of JSON-CDC with CCF in partially self-describing mode

CCF's partially self-describing mode would be even smaller (roughly 1/14 the size of JSON) in some use cases.
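
The reductions quoted in this README follow directly from the reported sizes. The short Go sketch below reproduces the arithmetic using only numbers stated in this document (the 48,309-event totals from the table and the single FeesDeducted event sizes from the introduction). The helper name ratio is hypothetical.

```go
package main

import "fmt"

// ratio reports how many times larger the baseline encoding is than other.
func ratio(baseline, other float64) float64 {
	return baseline / other
}

func main() {
	// 48,309 events: JSON-CDC vs CCF in fully self-describing mode.
	fmt.Printf("events batch, fully self-describing: %.2fx smaller\n", ratio(13858836, 6159931))

	// Single FeesDeducted event: 298 bytes in JSON-CDC (minified).
	fmt.Printf("FeesDeducted, fully self-describing: %.2fx smaller\n", ratio(298, 118))
	fmt.Printf("FeesDeducted, partially self-describing: %.2fx smaller\n", ratio(298, 20))
}
```

The batch ratio is a little over 2x, and the single-event ratio for partially self-describing mode is in line with the roughly 1/14 estimate in the table.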

Preliminary Speed and Memory Comparisons (obsolete)

These informal and preliminary benchmarks used commit f911063 in onflow/cadence#2364.

These results are obsolete because events encoding now opts out of "Deterministic CCF Encoding Requirements". Not using that mode makes CCF faster and more memory efficient than shown here.

$ benchstat bench_json_events_48k.log bench_ccf_events_48k.log 
goos: linux
goarch: amd64
pkg: github.com/onflow/cadence/encoding/ccf
cpu: 13th Gen Intel(R) Core(TM) i5-13600K
                     │ bench_json_events_48k.log │      bench_ccf_events_48k.log       │
                     │          sec/op           │   sec/op     vs base                │
EncodeBatchEvents-20                 96.61m ± 4%   70.73m ± 3%  -26.79% (p=0.000 n=10)
DecodeBatchEvents-20                 647.7m ± 3%   157.5m ± 3%  -75.68% (p=0.000 n=10)
geomean                              250.1m        105.5m       -57.81%

                     │ bench_json_events_48k.log │       bench_ccf_events_48k.log       │
                     │           B/op            │     B/op      vs base                │
EncodeBatchEvents-20                32.45Mi ± 0%   25.82Mi ± 0%  -20.45% (p=0.000 n=10)
DecodeBatchEvents-20               234.97Mi ± 0%   56.16Mi ± 0%  -76.10% (p=0.000 n=10)
geomean                             87.32Mi        38.08Mi       -56.39%

                     │ bench_json_events_48k.log │      bench_ccf_events_48k.log       │
                     │         allocs/op         │  allocs/op   vs base                │
EncodeBatchEvents-20                 756.6k ± 0%   370.4k ± 0%  -51.05% (p=0.000 n=10)
DecodeBatchEvents-20                 4.746M ± 0%   1.288M ± 0%  -72.86% (p=0.000 n=10)
geomean                              1.895M        690.7k       -63.55%

Event Data Details

The 48,309 events used in comparisons are from a transaction on mainnet with an unusually high number of events.

There were 9 event types. These 3 event types had over 15,000 events each: FlowToken.TokensDeposited, FlowToken.TokensWithdrawn, FlowIDTableStaking.DelegatorRewardsPaid.

To simplify benchmark code (it's Sunday night), all event values for each event type are the same (i.e. the values are from the first event of that type).

These benchmark results are preliminary and subject to change.

Notes

The draft of CCF was originally in README.md and was moved to ccf_specs.md on Nov 29, 2022. Given this, the initial commit history of the CCF specification is associated with the README.md file rather than ccf_specs.md.

License

CCF is licensed under the terms of the Apache License, Version 2.0. See LICENSE for more information.

ccf_draft's Issues

Update outdated benchmarks in README

Benchmarks are from initial proof-of-concept CCF codec. We should update benchmarks and comparisons with a reminder that we are not comparing apples to apples.

Prior formats (CBF and JSON-Cadence Data Interchange) didn't specify requirements for validity, sorting, etc.

Sorting data, deterministic encoding, etc. are not free.

Add section about canonical encoding

For example, make it clear that CCF-based protocols and data formats must use a deterministic sequence when encoding more than one Cadence composite type.

Update specs for composite-type-value.initializers

Some changes to specs are required:

  • Update composite-type-value.initializers from "one or many" to "zero or one" since only one initializer is supported and sorting is hard for multiple initializers.

  • Remove deterministic sorting requirement for composite-type-value.initializers since only one initializer is supported and initializer parameters have natural sorting and shouldn't be changed.

Thanks @turbolent for great discussion and suggesting this today!

Remove unnecessary cadence-type-id

Problem

In JSON-CDC, sometimes cadence-type-id was encoded when it was just a "stringification" of other encoded data and not necessary to encode.

In CCF, we don't need to keep this inefficiency for the sake of compatibility.

Thanks @turbolent for spotting this! 👍

Proposed Solution

Remove the inefficiency by removing unnecessary cadence-type-id from

  • restricted-type-value
  • restricted-type
  • function-value

Update Security Considerations

Mention decoding limits can be stricter for untrusted inputs and less strict for trusted inputs. For example, CBOR limits such as MaxArrayElements, MaxMapPairs, and MaxNestedLevels can be set differently for decoders processing trusted and untrusted inputs. CCF-based protocols can also specify different limits to balance tradeoffs.

The main tradeoff for decoder limits:

  • too high will allow memory exhaustion attacks, etc. to succeed.
  • too low will create the possibility of being unable to decode a non-malicious message that exceeds limits.
    NOTE: Encoders usually don't enforce limits because it's much simpler and more efficient for apps to enforce them.

Example Limit for Max Array Elements

A gRPC limit of 20 MB can support (at most) a 20,000,000 element array (for an unrealistic message with zero overhead and 1-byte elements).

In practice, it would take many thousands of non-malicious CCF messages (like average-sized events) to reach a 20 MB gRPC limit, so it doesn't make sense to allow more than 20,000,000 elements for each array within a single CCF message.
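
The reasoning above can be sketched in Go as follows. The Limits type, the preset values, and the helper names are all hypothetical and chosen only for illustration. They borrow the limit names mentioned earlier (MaxArrayElements, MaxMapPairs, MaxNestedLevels) but are not a real codec API, and the untrusted preset is an arbitrary example rather than a recommendation.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits is a hypothetical bundle of decoder limits.
type Limits struct {
	MaxArrayElements uint64
	MaxMapPairs      uint64
	MaxNestedLevels  int
}

// grpcLimitBytes is the 20 MB transport limit from the example above.
const grpcLimitBytes = 20_000_000

// maxElementsForMessage returns an upper bound on array elements in a
// message of the given size, assuming zero overhead and 1-byte elements.
func maxElementsForMessage(messageBytes uint64) uint64 {
	return messageBytes
}

// Illustrative presets: looser for trusted input, stricter for untrusted.
var (
	trustedLimits = Limits{
		MaxArrayElements: maxElementsForMessage(grpcLimitBytes),
		MaxMapPairs:      maxElementsForMessage(grpcLimitBytes) / 2, // a pair needs at least 2 bytes
		MaxNestedLevels:  64,
	}
	untrustedLimits = Limits{
		MaxArrayElements: 100_000,
		MaxMapPairs:      100_000,
		MaxNestedLevels:  16,
	}
)

var errTooManyElements = errors.New("array exceeds MaxArrayElements")

// checkArrayLen rejects an array whose declared length exceeds the limit,
// before any element is decoded.
func (l Limits) checkArrayLen(declared uint64) error {
	if declared > l.MaxArrayElements {
		return errTooManyElements
	}
	return nil
}

func main() {
	const declared = 1_000_000 // length taken from an array head
	fmt.Println("trusted:", trustedLimits.checkArrayLen(declared))
	fmt.Println("untrusted:", untrustedLimits.checkArrayLen(declared))
}
```

Checking the declared length against the limit before decoding any element is what keeps a small malicious message from triggering a large allocation.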

This update to CCF specs can be done after opening PR to add CCF Codec to onflow/cadence and before CCF Specs RC2.

README has outdated status and introduction can be improved

The current status and timeline are outdated. Also, the introduction can be improved by copying this Introduction section from the CCF specification:

Introduction

Cadence external values (e.g. transaction arguments, events, etc.) have been encoded using JSON-Cadence Data Interchange format, which is human-readable, verbose, and doesn't define deterministic encoding.

CCF is a binary data format that allows more compact, efficient, and deterministic encoding of Cadence external values. Consequently, the CCF codec in Cadence is faster, uses less memory, encodes deterministically, and produces smaller messages than the JSON-CDC codec.

A real FeesDeducted event can encode to:

  • 298 bytes in JSON-CDC (minified).
  • 118 bytes in CCF (fully self-describing mode).
  • ~20 bytes in CCF (partially self-describing mode) with 12 bytes for data and ~8 bytes for type id (counter, hash, etc.)

Unlike prior formats, CCF defines all requirements for deterministic encoding (sort orders, smallest encoded forms, and Cadence-specific requirements) to allow CCF codecs implemented in different programming languages to deterministically produce identical messages.

For security, CCF was designed to allow efficient detection and rejection of malformed messages without creating Cadence objects. This allows more costly checks (e.g. validity) to be performed only on well-formed messages.

CCF leverages vendor-neutral Internet Standards such as CBOR (RFC 8949), which is designed to be relevant for decades.

Add more types and reassign CBOR tag numbers in CDDL

The following improvements were identified by reading Cadence Core Contracts with CCF specs and codec in mind.

More types need to be added in CDDL:

  • struct interface type
  • resource interface type
  • contract interface type
  • reference type
  • restricted type

Reassign tag numbers to reserve some tag numbers in each (sub)group.

Add interface types as options to ccf-composite-type-message. This is more extensible than using simple type at the cost of encoding a little more data.

Add reference and restricted types as options to inline-type.

Add support for function value.

Refactor CDDL to separate type objects from type value objects for readability and cleaner implementation.
