
proto-lens's Introduction

proto-lens

The proto-lens library provides an API for protocol buffers using modern Haskell language and library patterns. Specifically, it provides:

  • Composable field accessors via lenses
  • Simple field name resolution/overloading via type-level literals
  • Type-safe reflection and encoding/decoding of messages via GADTs

This is not an official Google product.
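As a quick taste, here is a sketch of what the lens-based API looks like; the Proto.Person modules and the name/address/city fields are invented for illustration, and the tutorial below walks through real examples:

{-# LANGUAGE OverloadedStrings #-}

import Data.Function ((&))
import Data.ProtoLens (defMessage)
import Data.Text (Text)
import Lens.Family2 ((.~), (^.))
import Proto.Person (Person)
import qualified Proto.Person_Fields as P

alice :: Person
alice = defMessage                        -- start from the default message
    & P.name .~ "Alice"                   -- set a scalar field
    & P.address . P.city .~ "Wonderland"  -- lenses compose into submessages

cityOf :: Person -> Text
cityOf p = p ^. P.address . P.city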

Tutorial

You can find tutorial documentation in the proto-lens-tutorial subdir.

There is also a reference document showing the mapping from protobuf scalar types to Haskell types used by the generated lenses.

Instructions

Setup

First, install the "protoc" binary somewhere in your PATH. You can get it by following these instructions.

This project requires at least protoc version 3.12.0.

Building from HEAD

To build and test this repository from HEAD, run:

git submodule update --init --recursive
stack test

Note: building this repository requires stack-2.3.1 or newer.

Using in a Cabal or Stack package

proto-lens is available on Hackage and Stackage. Cabal and Stack projects can use it to auto-generate Haskell source files from the original protocol buffer specifications (.proto files).

Note: if using Stack, these instructions require v1.4.0 or newer.

First, edit the .cabal file of your project to:

  • Specify build-type: Custom, and add a custom-setup clause that depends on proto-lens-setup.
  • Add build-tool-depends: proto-lens-protoc:proto-lens-protoc.
  • List the .proto files in extra-source-files. Note that the field belongs at the top level of the .cabal file, rather than once per library/executable/etc.
  • List the generated modules (e.g. Proto.Foo.Bar) in exposed-modules or other-modules of the rule(s) that use them (e.g. the library or executables).
  • Add proto-lens-runtime to the build-depends of those rules.

For example, in foo-bar-proto.cabal:

...
build-type: Custom
extra-source-files: src/foo/bar.proto
...
custom-setup
  setup-depends: base, Cabal, proto-lens-setup

library
    exposed-modules: Proto.Foo.Bar, Proto.Foo.Bar_Fields
    autogen-modules: Proto.Foo.Bar, Proto.Foo.Bar_Fields
    build-depends: proto-lens-runtime, ...
    build-tool-depends: proto-lens-protoc:proto-lens-protoc

(Note: if you do not have proto-lens-{runtime/setup}, you are probably using a version earlier than 0.4 and should replace those packages with proto-lens-protoc.)

Next, write a Setup.hs file that uses Data.ProtoLens.Setup and specifies the directory containing the .proto files. For example:

import Data.ProtoLens.Setup

main = defaultMainGeneratingProtos "src"

Then, when you run cabal build or stack build, Cabal will generate a Haskell file from each .proto file, and use it as part of building the library/executable/etc.

See the proto-lens-tests package for some more detailed examples.

Manually running the protocol compiler

Suppose you have a file foo.proto. Then to generate bindings, run:

protoc --plugin=protoc-gen-haskell=`which proto-lens-protoc` \
    --haskell_out=. foo.proto

This will generate a file Proto/Foo.hs which contains Haskell definitions corresponding to the messages and fields in the protocol buffer file.

Use --haskell_out to control the location of the output file.

Use --proto_path to specify the location of input .proto files. For example, suppose we have the files src/project/{foo,bar}.proto, and bar.proto has the line

import "project/foo.proto";

Then running:

protoc --plugin=protoc-gen-haskell=`which proto-lens-protoc` \
    --haskell_out=. \
    --proto_path=src \
    src/project/foo.proto src/project/bar.proto

will generate the Haskell files Proto/Project/{Foo,Bar}.hs.
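The generated modules are then ordinary Haskell. A minimal round-trip sketch (assuming foo.proto defines a message Foo, and foo.bin holds a serialized copy):

import qualified Data.ByteString as B
import Data.ProtoLens (decodeMessage, encodeMessage)
import Proto.Project.Foo (Foo)

main :: IO ()
main = do
  bytes <- B.readFile "foo.bin"
  case decodeMessage bytes :: Either String Foo of
    Left err  -> putStrLn ("decoding failed: " ++ err)
    Right msg -> B.writeFile "foo-copy.bin" (encodeMessage msg)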

Current differences from the standard

  • Extensions (proto2-only) are not supported.
  • Unknown proto2 enum values cause a decoding error, instead of being preserved round-trip.

Troubleshooting

Rebuilding

Due to stack issue #1891, if you only change the .proto files then stack won't rebuild the package (that is, it won't regenerate the Proto.* modules).

Loading into ghci with Stack

stack ghci can get confused when trying to directly load a package that generates Proto.* modules (for example: stack ghci <proto-package>). To work around this issue, run instead:

stack exec ghci --package <proto-package>

And then manually import the generated modules within ghci, for example:

Prelude> import Proto.Foo.Bar
Prelude Proto.Foo.Bar>
...

Alternatively, you can make those modules available at the same time as another local package, by running:

stack ghci <another-package> --package <proto-package>

Linking errors

Due to the limitations of how we can specify the dependencies of Setup files, stack may try to link them against the terminfo package. You may get an error from stack build similar to:

/usr/bin/ld: cannot find -ltinfo

On a Debian-based system (like Ubuntu), the remedy is:

sudo apt-get install libncurses5-dev


proto-lens's Issues

Consolidate packages

The proto-lens, proto-lens-protoc and proto-lens-descriptors packages are tied pretty closely together. Consider consolidating some or all of them.

The main concern is bootstrapping. Changing the internals of lens-labels or proto-lens effectively breaks proto-lens-descriptors until the descriptor modules can be regenerated -- but regenerating them requires a working proto-lens-protoc, which depends on proto-lens-descriptors, introducing a cycle. The current bootstrap script solves this using the fact that they're all separate packages: it builds a new proto-lens-protoc against an old, working version of lens-labels, proto-lens and proto-lens-descriptors, and uses that compiler to generate the new descriptor modules. I'm not sure how to implement that process if they're all in the same Cabal package.

Convert to/from JSON

We can provide functions to convert proto messages to/from JSON, using the existing reflection capabilities of Data.ProtoLens.Message.

This is the canonical mapping:
https://developers.google.com/protocol-buffers/docs/proto3#json

Some example language bindings:

https://github.com/google/protobuf/blob/master/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java

Note that they don't include proto2-only features; extensions and unknown fields are dropped from the JSON output.

Better support for "oneof" fields

We currently treat "oneof" fields similarly to optional fields. This matches the wire encoding's intended backwards-compatible behavior, but it makes some use cases more awkward.

https://developers.google.com/protocol-buffers/docs/proto#oneof

One possible approach is to store the value as a sum type internally, and provide lenses that return a default value when their case isn't set (as well as "maybe'foo" variants). Another option (less memory efficient) is to store the fields normally, but make each field's lens clear out all the other fields when it's being set.
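A rough sketch of the first approach, using plain functions instead of lenses for brevity; every name below is invented, for a message Foo with oneof bar { int32 x = 1; string y = 2; }:

import Data.Int (Int32)
import Data.Text (Text)

data Foo = Foo { _Foo'bar :: !(Maybe Foo'Bar) }

-- Internally, the oneof is a sum type: at most one case is ever set.
data Foo'Bar = Foo'X !Int32 | Foo'Y !Text

-- The plain accessor falls back to the field's default when its case
-- isn't the active one...
getX :: Foo -> Int32
getX f = case _Foo'bar f of
  Just (Foo'X n) -> n
  _ -> 0  -- the default value for int32

-- ...while a maybe'x variant distinguishes "unset" from "default".
maybe'x :: Foo -> Maybe Int32
maybe'x f = case _Foo'bar f of
  Just (Foo'X n) -> Just n
  _ -> Nothing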

Better support for chunked parsers and streaming

Currently, when parsing messages we force a strict ByteString for every submessage. It might be better to parse in a more chunked fashion, e.g., for use with lazy ByteStrings or with conduits/pipes.

Counterpoint: regardless, parsing has to read the whole input into a Haskell object that's at least as big as the input data anyway. So it's not clear how much you'd gain from this change.

If we did want this, it might be easier to move from attoparsec to another library. Protobufs are a little tricky to parse because the wire format is not naturally delimited; messages are just sequences of tagged field/value pairs, and sub-messages are encoded as a varint followed by the message (with no "ending" marker).

For example, from the binary package we could use isolate :: Int -> Get a -> Get a which restricts a sub-parser to a specific number of bytes, and isEmpty :: Get Bool which detects end-of-input correctly within a call to isolate. In comparison:

  • attoparsec doesn't provide an isolate function, AFAIK; currently we mimic it by running a parser on the output of take :: Int -> Parser ByteString.
  • cereal provides an isolate function, but it still reads the full span into a single ByteString.
  • store's isolate doesn't work yet for our use case (mgsloan/store#40) and the library also lacks support for architecture-independent serialization (mgsloan/store#36). See #5 for more discussion.
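For concreteness, here is a minimal sketch of the binary-based approach described above; parseField stands in for a real per-field parser:

import Data.Binary.Get (Get, isEmpty, isolate)

-- Run a field parser repeatedly until exactly `len` bytes have been
-- consumed.  `isEmpty` reports end-of-input correctly inside `isolate`,
-- which is what makes this work without an explicit end marker.
getMessageOfLength :: Int -> Get field -> Get [field]
getMessageOfLength len parseField = isolate len (go [])
  where
    go acc = do
      done <- isEmpty
      if done
        then return (reverse acc)
        else do
          f <- parseField
          go (f : acc)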

showMessage and related functions use Haskell, not C, string escaping conventions

showMessage and the related pprintMessage and showMessageShort functions use the Haskell string escaping conventions instead of the C ones. This means that non-printing characters get written as, e.g., "\SOH", which https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.cc#L1039 won't parse. Worse, in Haskell the escape "\101" means decimal 101, whereas the tokenizer.cc code (following C convention) interprets it as octal, i.e. decimal 65.
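A ghci session showing both problems (the comments note what a C tokenizer does instead):

Prelude> putStrLn (show "\SOH")  -- Haskell escape for the byte 0x01; tokenizer.cc rejects it
"\SOH"
Prelude> '\101'                  -- Haskell reads this escape as decimal 101
'e'
Prelude> -- C convention reads "\101" as octal, i.e. decimal 65, the character 'A'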

Build failure

Hello,

I was trying to build this with stack install and got this error:

-- Dumping log file due to warnings: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log

[1 of 2] Compiling Main ( /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/Setup.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/Main.o )
[2 of 2] Compiling StackSetupShim ( /home/g/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/StackSetupShim.o )
Linking /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup ...
Configuring proto-lens-protobuf-types-0.2.2.0...
proto-src: warning: directory does not exist.
proto-src/google/protobuf/any.proto: No such file or directory
callProcess: /usr/local/bin/protoc
"--plugin=protoc-gen-haskell=/home/g/src/haskell/proto-lens/.stack-work/install/x86_64-linux-nopie/lts-9.0/8.0.2/bin/proto-lens-protoc"
"--haskell_out=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/build/autogen"
"--proto_path=proto-src" "proto-src/google/protobuf/any.proto"
"proto-src/google/protobuf/duration.proto"
"proto-src/google/protobuf/wrappers.proto" (exit 1): failed

-- End of log file: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log

Log files have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/
Progress: 33/36
-- While building package proto-lens-protobuf-types-0.2.2.0 using:
/home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup --builddir=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0 build lib:proto-lens-protobuf-types --ghc-options " -ddump-hi -ddump-to-file"
Process exited with code: ExitFailure 1
Logs have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log

Apparently it is looking for proto-src/google/protobuf/any.proto and that might be causing the issue?

Add a benchmark for encoding/decoding

We should have a benchmark for encoding to/from the wire format, to make sure that our reflection and abstraction in the parser doesn't cause a significant slowdown compared to other Haskell code, and to give us more confidence when refactoring. (So far, we haven't done any performance tuning of the code.)

One arbitrary data point: decoding a 1MB proto took ~60ms on my desktop, and decoding followed by encoding (which includes forcing all the fields) took ~230ms. (The code was compiled with -O2.)

We should probably also benchmark the text format, though that's usually less performance-critical.
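A minimal criterion sketch of such a benchmark; Proto.Foo.Bar, Bar, and msg.bin are placeholders, and it assumes the generated message type has an NFData instance:

import Criterion.Main (bench, defaultMain, nf)
import qualified Data.ByteString as B
import Data.ProtoLens (decodeMessage, encodeMessage)
import Proto.Foo.Bar (Bar)

main :: IO ()
main = do
  bytes <- B.readFile "msg.bin"  -- a pre-serialized Bar message
  let decode = decodeMessage :: B.ByteString -> Either String Bar
  defaultMain
    [ bench "decode" (nf decode bytes)
      -- decoding then re-encoding forces every field, like the data point above
    , bench "decode+encode" (nf (fmap encodeMessage . decode) bytes)
    ]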

Generate insertion points to allow integration with other protoc plugins.

Protobuf compiler plugins can request that their generated code be pasted into an existing generated file immediately above a specified insertion point, using the File.insertion_point field. These insertion points are specified by placing "@@protoc_insertion_point(some_name)" in the generated source. proto-lens-protoc should have insertion points for at least the imports and the module top-level scope.

Generate `Ord` instances for messages.

It would be nice if proto-lens automatically generated Ord instances for messages. This could be hidden behind a cabal flag if it turned out to be egregiously slow.

Test failure (test build failure)

In stackage nightly. Full trace:

> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
[1 of 1] Compiling Main             ( Setup.hs, Setup.o )
Linking Setup ...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup configure --enable-tests --package-db=clear --package-db=global --package-db=/var/stackage/work/builds/nightly/pkgdb --libdir=/var/stackage/work/builds/nightly/lib --bindir=/var/stackage/work/builds/nightly/bin --datadir=/var/stackage/work/builds/nightly/share --libexecdir=/var/stackage/work/builds/nightly/libexec --sysconfdir=/var/stackage/work/builds/nightly/etc --docdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --htmldir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --haddockdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --flags=
Configuring proto-lens-combinators-0.1.0.8...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup build
unrecognized option `--plugin=protoc-gen-haskell=/var/stackage/work/builds/nightly/bin/proto-lens-protoc'
unrecognized option `--haskell_out=dist/build/global-autogen'
unrecognized option `--proto_path=tests'
Usage: protoc [OPTION]... FILES
  -h  --help     show usage
  -v  --version  show version number

Preprocessing library for proto-lens-combinators-0.1.0.8..
Building library for proto-lens-combinators-0.1.0.8..
[1 of 1] Compiling Data.ProtoLens.Combinators ( src/Data/ProtoLens/Combinators.hs, dist/build/Data/ProtoLens/Combinators.o )
Preprocessing test suite 'combinators_test' for proto-lens-combinators-0.1.0.8..
Setup: can't find source for Proto/Combinators in tests,
dist/build/combinators_test/autogen, dist/build/global-autogen

Protobuf generation does not preserve capitalisation

I've got the following protobuf definition:

syntax = "proto2";

message Request {
  required string uri = 1;
  required string userUuid = 2;
}

But the generated code downcases the userUuid field, so the "accessor" function is useruuid, which doesn't seem right. The only reference to changing case is to do with groups, so I assume this isn't correct.

I've put the resultant file inline as I can't attach it:

{- This file was auto-generated from Request.proto by the proto-lens-protoc program. -}
{-# LANGUAGE ScopedTypeVariables, DataKinds, TypeFamilies,
  MultiParamTypeClasses, FlexibleContexts, FlexibleInstances,
  PatternSynonyms #-}
{-# OPTIONS_GHC -fno-warn-unused-imports#-}
module Proto.Request where
import qualified Prelude
import qualified Data.Int
import qualified Data.Word
import qualified Data.ProtoLens.Reexport.Data.ProtoLens
       as Data.ProtoLens
import qualified
       Data.ProtoLens.Reexport.Data.ProtoLens.Message.Enum
       as Data.ProtoLens.Message.Enum
import qualified Data.ProtoLens.Reexport.Lens.Family2
       as Lens.Family2
import qualified Data.ProtoLens.Reexport.Lens.Family2.Unchecked
       as Lens.Family2.Unchecked
import qualified Data.ProtoLens.Reexport.Data.Default.Class
       as Data.Default.Class
import qualified Data.ProtoLens.Reexport.Data.Text as Data.Text
import qualified Data.ProtoLens.Reexport.Data.Map as Data.Map
import qualified Data.ProtoLens.Reexport.Data.ByteString
       as Data.ByteString

data Request = Request{_Request'uri :: !Data.Text.Text,
                       _Request'useruuid :: !Data.Text.Text}
             deriving (Prelude.Show, Prelude.Eq)

type instance Data.ProtoLens.Field "uri" Request = Data.Text.Text

instance Data.ProtoLens.HasField "uri" Request Request where
        field _
          = Lens.Family2.Unchecked.lens _Request'uri
              (\ x__ y__ -> x__{_Request'uri = y__})

type instance Data.ProtoLens.Field "useruuid" Request =
     Data.Text.Text

instance Data.ProtoLens.HasField "useruuid" Request Request where
        field _
          = Lens.Family2.Unchecked.lens _Request'useruuid
              (\ x__ y__ -> x__{_Request'useruuid = y__})

instance Data.Default.Class.Default Request where
        def
          = Request{_Request'uri = Data.ProtoLens.fieldDefault,
                    _Request'useruuid = Data.ProtoLens.fieldDefault}

instance Data.ProtoLens.Message Request where
        descriptor
          = let uri__field_descriptor
                  = Data.ProtoLens.FieldDescriptor "uri"
                      (Data.ProtoLens.StringField ::
                         Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
                      (Data.ProtoLens.PlainField Data.ProtoLens.Required uri)
                useruuid__field_descriptor
                  = Data.ProtoLens.FieldDescriptor "userUuid"
                      (Data.ProtoLens.StringField ::
                         Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
                      (Data.ProtoLens.PlainField Data.ProtoLens.Required useruuid)
              in
              Data.ProtoLens.MessageDescriptor
                (Data.Map.fromList
                   [(Data.ProtoLens.Tag 1, uri__field_descriptor),
                    (Data.ProtoLens.Tag 2, useruuid__field_descriptor)])
                (Data.Map.fromList
                   [("uri", uri__field_descriptor),
                    ("userUuid", useruuid__field_descriptor)])

uri ::
    forall msg msg' . Data.ProtoLens.HasField "uri" msg msg' =>
      Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "uri" msg)
        (Data.ProtoLens.Field "uri" msg')
uri
  = Data.ProtoLens.field
      (Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "uri")

useruuid ::
         forall msg msg' . Data.ProtoLens.HasField "useruuid" msg msg' =>
           Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "useruuid" msg)
             (Data.ProtoLens.Field "useruuid" msg')
useruuid
  = Data.ProtoLens.field
      (Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "useruuid")

In proto3, repeated fields of scalar numeric types use packed encoding by default.

See https://developers.google.com/protocol-buffers/docs/proto3#specifying-field-rules. proto-lens generated code does not implement this correctly. Trying to decode repeated fields of scalar numeric types (like int32 and float) results in a message like "Field 1 expects wire type 0 but found 2", because packed repeated fields are wire type 2 (see https://developers.google.com/protocol-buffers/docs/encoding#structure).

Switch FieldDescriptor name from String to Text

The FieldDescriptor name is a String, which is probably less efficient than Text. The same goes for fieldsByTextFormatName.

This probably only affects text format decoding, since the wire format uses it only for error messages. We can add a benchmark and see whether it makes a difference.

Static type-checking of required fields.

Currently, required fields are defaulted to the "zero" value for that type. We should instead provide a smart constructor that checks at compile time whether all the required fields have been set.

Note that this is moot for proto3, which got rid of the concept of required fields altogether.

One possible, nebulously-described approach: for every datatype Foo, also define a Foo'Builder which is parametrized by the type of each required field (which may be () if it's not set). This Foo'Builder can be an instance of Default (instead of Foo), and we can provide lenses to build up its individual fields, as well as a class to "freeze" a Foo'Builder into Foo once all its fields have been set.
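A hedged sketch of that idea for message Foo { required string name = 1; optional int32 n = 2; }; every name below is invented:

import Data.Int (Int32)
import Data.Text (Text)

data Foo = Foo { _Foo'name :: !Text, _Foo'n :: !Int32 }

-- Each required field's type is a parameter of the builder:
-- () while unset, the field's real type once set.
data Foo'Builder name = Foo'Builder { _builder'name :: name, _builder'n :: Int32 }

emptyBuilder :: Foo'Builder ()
emptyBuilder = Foo'Builder () 0

setName :: Text -> Foo'Builder a -> Foo'Builder Text
setName t b = b { _builder'name = t }

-- Freezing only type-checks once every required field has been set:
-- freeze emptyBuilder is rejected at compile time.
freeze :: Foo'Builder Text -> Foo
freeze b = Foo (_builder'name b) (_builder'n b)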

Preserve unknown proto2 fields and enums

We should preserve unknown fields and enums in proto2 messages.

From the docs:
https://developers.google.com/protocol-buffers/docs/proto

Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.

Also, enums with unknown values should be treated like unknown fields and preserved for reserialization. (Note that this behavior is proto2-only; #28 describes the desired behavior for proto3.)

Haskell files being touched causes unnecessary recompilation

proto-lens-protoc touches (updates the modification time of) generated .hs files even when the contents did not change, causing unnecessary rebuilds when compiling with GHC.

This is because currently GHC considers only the mtime of the input file for determining whether something has to be recompiled, not its contents.

The problematic code is here:

callProcess protoc $
    [ "--plugin=protoc-gen-haskell=" ++ protoLensProtoc
    , "--haskell_out=" ++ output
    ]
    ++ ["--proto_path=" ++ p | p <- imports]
    ++ files

It seems that it is protoc that touches / rewrites the files.

Which way should this be fixed?

  • Generate the files into a different, temporary output directory, then move the .hs files over only if they aren't identical (sketched below)?
  • Change protoc itself to not write the files if the contents are the same?
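A sketch of the first option; writeFileIfChanged is an invented helper that the Setup code could apply to each generated file after running protoc into a temporary directory:

import qualified Data.ByteString as B
import System.Directory (doesFileExist)

-- Overwrite a generated .hs file only when its contents actually
-- changed, preserving the mtime that GHC's recompilation check uses.
writeFileIfChanged :: FilePath -> B.ByteString -> IO ()
writeFileIfChanged path new = do
  exists <- doesFileExist path
  unchanged <- if exists then (new ==) <$> B.readFile path else return False
  if unchanged then return () else B.writeFile path new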

Support parametrized "Any"

We can use google.protobuf.Any to support basic parametric polymorphism. For example, if we define a file haskell_type_variables.proto with custom options:

extend google.protobuf.MessageOptions {
    repeated string haskell_type_var = 50000;
}

extend google.protobuf.FieldOptions {
    optional string haskell_type_var = 50000;
}

Then we can use those options to annotate a type:

import ".../haskell_type_variables.proto";

message Foo {
    option (haskell_type_var) = "a";
    option (haskell_type_var) = "b";
    int32 x = 1;
    google.protobuf.Any y = 2 [(haskell_type_var)="a"];
    google.protobuf.Any z = 3 [(haskell_type_var)="b"];
}

And generate the following type from that file:

data Foo a b = Foo { _Foo'x :: Int32, _Foo'y :: Maybe a, _Foo'z :: Maybe b}

instance (Message a, Message b) => Message (Foo a b) where ...

encodeMessage/decodeMessage could serialize the submessage as an Any, but in Haskell code represent it as a regular (not serialized) Haskell type. This makes it easier to use proto types directly in Haskell (instead of via wrapper types).

Following the above option-based design will require us to support extensions (#27) so that proto-lens-protoc can understand the new options that we add.

Compile failure with `proto-lens-combinators-0.1.0.8`

In order, the following will be built (use -v for more details):
 - proto-lens-combinators-0.1.0.8 {proto-lens-combinators-0.1.0.8-inplace} (lib:proto-lens-combinators) (first run)
[1 of 1] Compiling Main             ( /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup.hs, /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/Main.o )
Linking /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup ...
<<ghc: 195021320 bytes, 90 GCs, 9019646/22334576 avg/max bytes residency (7 samples), 57M in use, 0.001 INIT (0.001 elapsed), 0.123 MUT (1.892 elapsed), 0.185 GC (0.185 elapsed) :ghc>>
Configuring proto-lens-combinators-0.1.0.8...
<<ghc: 190663776 bytes, 110 GCs, 12726334/39654304 avg/max bytes residency (8 samples), 102M in use, 0.001 INIT (0.001 elapsed), 0.101 MUT (0.101 elapsed), 0.218 GC (0.221 elapsed) :ghc>>
==========
Error: couldn't find the executable "proto-lens-protoc" in your $PATH.
    Please file a bug at https://github.com/google/proto-lens/issues .
==========
Missing executable "proto-lens-protoc"
CallStack (from HasCallStack):
  error, called at src/Data/ProtoLens/Setup.hs:297:13 in proto-lens-protoc-0.2.2.0-6fbfcc9fefb6f837231240070e1fad9e51f23d5d830dd28e2a4fa31f1e705ca4:Data.ProtoLens.Setup

Improve the field-related documentation/API

Some related changes to help simplify the API around fields in proto-lens:

  • Hide the constructors for proto messages. They're even less useful than before now that we have unknown fields. (Developers can still click source in the Haddock docs to see the underlying implementation.)

  • Make the Show instance not display the internal fields, instead using the text format, for example:

    showsPrec _ x = showChar '{' . showString (showMessageShort x) . showChar '}' 
    
  • Add Haddock comments for every proto message that list the names and types of all available lenses. Note: this is a little tricky (but doable) since haskell-src-exts doesn't easily support inserting top-level comments. (Note: include the accessor for unknown fields.)

How to parse delimited messages

I have a socket connection that is used to stream data as delimited protobuf messages (see this). In the Java API there is a convenient method called parseDelimitedFrom. How can this be achieved using this library?
I am new to Haskell, so I might have overlooked something. I am sorry if this is very obvious.
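There's no built-in equivalent as far as I know, but the framing is just a base-128 varint length prefix followed by that many message bytes. A hedged sketch, assuming the whole frame is already buffered (a real version would also handle partial reads from the socket):

import Data.Bits ((.&.), (.|.), shiftL, testBit)
import qualified Data.ByteString as B
import Data.ProtoLens (Message, decodeMessage)

-- Split one length-delimited message off the front of a buffer, in the
-- style of Java's parseDelimitedFrom, returning the leftover bytes.
splitDelimited :: Message msg => B.ByteString -> Either String (msg, B.ByteString)
splitDelimited bs0 = do
  (len, rest) <- varint 0 0 bs0
  let (body, leftover) = B.splitAt len rest
  msg <- decodeMessage body
  return (msg, leftover)
  where
    -- Base-128 varint: 7 payload bits per byte, high bit set on every
    -- byte except the last.
    varint shift acc bs = case B.uncons bs of
      Nothing -> Left "truncated length prefix"
      Just (b, bs') ->
        let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
        in if testBit b 7 then varint (shift + 7) acc' bs' else Right (acc', bs')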

Autogenerate the list of modules

Currently every proto file needs to be specified twice in the .cabal file: the raw .proto file in extra-source-files, and the Proto.* module in exposed-modules/other-modules.

From very basic experiments with stack, I think it's possible to drop the latter requirement and have our Setup script populate the list of Haskell modules automatically, by changing the PackageDescription and/or LocalBuildInfo.

In addition to less redundancy, this will help with:

  • Refactorings such as #100
  • Better integration with hpack (in particular, its ability to autodetect the exposed-modules and other-modules)

The exact design of this feature is still an open question: can (should?) we provide control to the user over whether their protos end up in exposed-modules or other-modules? Or in individual components (e.g. tests, executables or benchmarks)? For example, proto-lens-combinators contains a proto test file which is test-only and not intended to be exported from the library.

Finish support for "import public"

.proto files with "import public" statements don't re-export the public imports.

For example, if foo.proto contains

public import "bar.proto"

Then the generated module Foo.hs should re-export all the names defined in Bar.hs.

I think this is doable, but it might be a little tricky to avoid name conflicts between the autogenerated field accessors in Foo.hs and Bar.hs.

Document Combinators

While writing the implementation for prisms I had to go back and forth between the language extension library, Combinators.hs, and Generate.hs to figure out what did what.

I think some documentation of Combinators.hs would be super helpful for any future development on those files.

Support filepaths with `dots`.

Hi, again. On top of #152 I had to make another workaround for the project I was using: the project keeps its protos under a UNIX hidden directory, e.g. .protodir/myfilename.proto. The current plugin will generate modules named Proto..protodir.MyFileName, which is invalid Haskell.

As for the other bug I filed, I wanted to discuss with you the best implementation choices before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.

Correct handling of unknown proto3 enum values

Currently proto-lens returns an error when decoding unknown enum values. It should instead accept and preserve such values.

Quoting the proto3 docs:

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.

readMessage chokes on '-delimited strings

The TextFormat encoding may use either single or double quotes (though they must match). However, proto-lens only supports double quotes so far.

The error message looks like:

unexpected "'"
expecting "-", number, literal string or identifier

I found documentation of this behavior in the protobuf sources:
https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.h#L116

Currently proto-lens's TextFormat parser uses Text.Parsec.Token.stringLiteral which doesn't support single-quoted strings:
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Token.html#v:stringLiteral
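A sketch of a quote-agnostic replacement in plain parsec; escape handling is reduced to a pass-through, and real code would need to decode the escapes:

import Text.Parsec
import Text.Parsec.String (Parser)

-- Accept either quote character, but require the closing quote to
-- match the opening one, as tokenizer.cc does.
protoStringLiteral :: Parser String
protoStringLiteral = do
  q <- oneOf "'\""
  s <- many (escaped <|> noneOf [q, '\\', '\n'])
  _ <- char q
  return s
  where
    escaped = char '\\' >> anyChar  -- keeps the escaped char undecoded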

Use `autogenPackageModulesDir`

Cabal 2.0 added a function autogenPackageModulesDir which we should use in Data.ProtoLens.Setup if it's available. That would let us generate modules separately for each component (e.g. library vs exe vs tests), rather than generating them all in one place.

At minimum, this would prevent confusing GHC/Cabal errors when an executable imports a proto module but doesn't list it in its other-modules, or when the module is listed for the library but a test accidentally doesn't depend on that library.

Handle non-capitalized enum values

Hi, thanks for proto-lens; I got it to work on a gRPC client project with somewhat complicated .protos. While making it work, I had to patch in a workaround to support valid enum definitions which do not follow the recommended style guide and use lower-cased enum value names.

My workaround is to call toUpper on enum names. This solution is not really great, so I wanted to discuss with you the best implementation choices before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.

Support Kythe metadata to crosslink Haskell code and protos

Hello - this is to start a discussion about whether proto-lens & haskell-indexer could support this feature. For background, please read https://kythe.io/docs/schema/indexing-protobuf.html .

TLDR for proto-lens: the generated Haskell code (on specific request) should be annotated with proto2.GeneratedCodeInfo-equivalent data, most importantly path of proto file and the "magic path string" of the proto entity.

A complication is that proto-lens, AFAIU, doesn't generate direct field lenses, but rather string-proxy lenses (what's the correct term for these?), so maybe the specific typeclass instance methods (these are the lenses, am I right?) need to be annotated.

Then a complication for haskell-indexer is that now it emits a reference to the class method instead of the instance method from the use site (assuming the instance is fixed at the use-site). The indexer should rather reference the instance method (and of course emit a generates edge from the proto VName to the instance method lens VName), which might be possible to find out from the AST, though some digging is required here.

Open questions:

  1. Does this sound reasonable for proto-lens?

  2. How to parametrize proto-lens to get the metadata emitted? How do we arrange that this happens only in haskell-indexer mode?

  3. How should the metadata be emitted? In the C++ example, a new .pb.meta include is generated and included into the .pb.h. I think the main point is that the indexer should have somewhat convenient access to this - for example the data (generated Haskell spans -> proto source info) could also be shipped in a side-channel file.

+@judah @blackgnezdo for proto-lens

Support extensions

Extensions (proto2-only) aren't supported yet. It's not clear what the API should look like.

This would primarily be useful for legacy code, since proto3 replaces extensions with the Any type (#22).

Inconsistent naming of coproduct oneof fields

message AcmeObservation {
  oneof status {
    ActionWin win = 2;
    CompletedHurdleStatus completed_hurdle = 3;
    QualifyTransaction qualify_transaction = 4;
  }
}

results in the generated Haskell:

data AcmeObservation'Status = AcmeObservation'Win !ActionWin
                            | AcmeObservation'Completed_hurdle !CompletedHurdleStatus
                            | AcmeObservation'Qualify_transaction !QualifyTransaction
                            deriving (Prelude.Show, Prelude.Eq)

Notice AcmeObservation'Completed_hurdle, which should be AcmeObservation'CompletedHurdle according to the renaming applied to all other snake-case identifiers.

Change proto3 enums (back) to a sum type

Description of proto3 enums (reformatted from the docs):

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent.

  1. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation.
  2. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.

Currently (i.e., on HEAD) we're using option #1. That is, if we had enum Foo { A = 1; B = 2; } then we generate newtype Foo = Foo Int32 and define A and B as pattern synonyms:

pattern A = Foo 1
pattern B = Foo 2

This is simpler, but limits our ability to get exhaustiveness checking from the compiler. Specifically,
if someone adds a new enum case to the proto, the type checker won't tell us that we're now missing a case. This issue happened to us in real code.

GHC 8.2.1 does have COMPLETE pragmas for pattern synonyms, but (a) it's too soon to drop support for 8.0, and (b) that's a newer and less-well-understood feature.


The proposal for the new API is similar to what already exists for Scala and Java. For:
enum Foo { A = 1; B = 2; }
generate the following code:

data Foo = A | B | Foo'Unrecognized Foo'UnrecognizedValue

-- | Representation of an unknown value.  Uses a newtype to make
-- the different branches of `Foo` provably distinct.
-- For example, this way we don't have to worry about whether
-- `A == Foo'Unrecognized (Foo'UnrecognizedValue 1)`.
newtype Foo'UnrecognizedValue = Foo'UnrecognizedValue Int32 -- hidden constructor

unrecognizedValue'Foo :: Foo'UnrecognizedValue -> Int32

instance Enum Foo where
    toEnum 1 = A
    toEnum 2 = B
    toEnum n = Foo'Unrecognized (Foo'UnrecognizedValue n)
    fromEnum = ...
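Continuing the sketch, ordinary case analysis then regains exhaustiveness checking, which the pattern-synonym encoding can't provide without COMPLETE pragmas:

describe :: Foo -> String
describe A = "a"
describe B = "b"
describe (Foo'Unrecognized u) =
    "unrecognized: " ++ show (unrecognizedValue'Foo u)
-- If a new case C is later added to the enum, -Wincomplete-patterns
-- flags this function at compile time.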

Error on using maps

I defined a file LinkParser.proto:

syntax = "proto3";

message LinkParseResult {
  string title = 1;
  map<string, string> og = 2;
  repeated string imgs = 3;
}

When I run stack build, and also when I run:

protoc --plugin=protoc-gen-haskell=`which proto-lens-protoc` --haskell_out . LinkParser.proto

manually, I get the following error:

proto-lens-protoc: definedFieldType: Field type .LinkParseResult.OgEntry not found in environment.
--haskell_out: protoc-gen-haskell: Plugin failed with status code 1.

Happens on both libprotoc 3.0.0 and libprotoc 2.6.1 (where the latter used the "equivalent syntax" mentioned in the protoc docs).

Support enum aliases

Enums can have "aliases" where two different constructors may map to the same int value. (In both proto2 and proto3). This breaks our codegen, in particular the fromEnum instances.

Documentation:
https://developers.google.com/protocol-buffers/docs/proto3#enum

The user enables this feature by adding option allow_alias = true to the enum declaration. I don't know whether the protobuf compiler is the one doing the checking, or if our proto-lens-protoc plugin needs to check it manually.

Support "Any"

We should support the Any type that was introduced in proto3:
https://developers.google.com/protocol-buffers/docs/proto3#any

At its core, Any is just another protocol buffer message (defined in google/protobuf/any.proto), so we should already be able to handle protos that reference it. However, we can add a nicer API on top for converting to/from an arbitrary message type, similar to what the C++ and Java bindings provide.
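A hedged sketch of what such helpers could look like; the Any_Fields module and its typeUrl/value lenses are assumed from the usual codegen conventions, and a real implementation would treat URL prefixes more carefully:

{-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}

import Data.Function ((&))
import Data.Proxy (Proxy (..))
import Data.ProtoLens (Message, decodeMessage, defMessage, encodeMessage, messageName)
import qualified Data.Text as T
import Lens.Family2 ((.~), (^.))
import Proto.Google.Protobuf.Any (Any)
import qualified Proto.Google.Protobuf.Any_Fields as F

-- Wrap any message, recording its type under the conventional prefix.
pack :: forall msg. Message msg => msg -> Any
pack m = defMessage
    & F.typeUrl .~ ("type.googleapis.com/" <> messageName (Proxy :: Proxy msg))
    & F.value .~ encodeMessage m

-- Recover a message of the expected type, checking the recorded URL.
unpack :: forall msg. Message msg => Any -> Either String msg
unpack a
  | snd (T.breakOnEnd "/" (a ^. F.typeUrl)) == messageName (Proxy :: Proxy msg)
      = decodeMessage (a ^. F.value)
  | otherwise = Left "Any: type URL does not match the expected message type"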

Include generated files in releases of packages depending on proto-lens

We should do something similar to happy/alex/etc for generated files, i.e., bundle them into the release archive that's uploaded to Hackage/Stackage. That way, packages that depend on protos won't require installing the protoc executable.

Cabal has special logic for happy and alex, but the logic around when to rebuild the generated files is somewhat flaky: haskell/cabal#2940, haskell/cabal#2311, haskell/cabal#2362. Part of the problem is that when Cabal unpacks the tarball of the package, it doesn't set the modification times consistently (this may be fixed on newer versions of Cabal, not sure though).

One option is for us to do something simpler than Cabal:

  • Never run protoc when building from an archive that was created by cabal sdist
  • Always run protoc otherwise (in particular: when building from the git repo).
    We'd need to make cabal sdist do something special in order for cabal build to tell the difference. One hacky option is to include an extra dummy file in extra-source-files. A more involved option would be to copy the generated files from the autogen dir (where they are now) to one of the hs-source-dirs; but that may be complicated in the presence of multiple binaries/tests.
