galoisinc / cereal
License: Other
If there are no objections, I can open a PR to fix this.
The comment at https://github.com/GaloisInc/cereal/blob/master/src/Data/Serialize/IEEE754.hs#L11 says that cereal uses an ST-based method to cast between IEEE types, but that change has been reverted: cereal again uses Foreign to cast.
Ideally, instead of simply removing the comment, it could explain why the implementation switched back. The git history (7cfabb6) makes it clear the decision was made for a good reason:
When serializing/deserializing a lot of nested structures with the intermediate array causes significant GC slowdowns.
...but a newcomer (e.g. me) might wonder why we use something that may be slower than other methods (e.g. http://hackage.haskell.org/package/reinterpret-cast claims a 5x speedup).
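To make the trade-off concrete for newcomers, here is a minimal sketch of a Foreign-style cast (write the Word64 into a temporary buffer, read the same bytes back as a Double) next to the `castWord64ToDouble` primitive that GHC.Float has exported since base 4.10. The function names are hypothetical, this is not cereal's actual code, and it makes no claim about which variant is faster.

```haskell
import Data.Word (Word64)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek, poke)
import GHC.Float (castWord64ToDouble)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Foreign-style cast: poke the Word64 into a buffer obtained from alloca,
-- then peek the same eight bytes back at type Double.
wordToDoubleViaForeign :: Word64 -> Double
wordToDoubleViaForeign w = unsafeDupablePerformIO $ alloca $ \p -> do
  poke p w
  peek (castPtr p)

-- Primitive reinterpretation from GHC.Float (base >= 4.10); no buffer needed.
wordToDoubleViaPrim :: Word64 -> Double
wordToDoubleViaPrim = castWord64ToDouble
```

Both should agree on every bit pattern; 0x3FFC000000000000, for instance, is the IEEE 754 encoding of 1.75.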
Hello,
Is there any plan to support decimal128? I don't have the knowledge of binary formats needed to implement it myself, but I'm ready to help in other ways if need be.
Thanks
Hi,
I'm just wondering if there was any particular reason why instances for UTCTime, Day, etc. from the time package, and StdGen from the random package, are missing. These packages are both considered pretty stable and central to the Haskell ecosystem; is there a reason other than minimizing dependencies?
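Until such instances exist upstream, a minimal sketch of orphan instances for Day and UTCTime follows, assuming the time package's `ModifiedJulianDay` representation for Day and picosecond resolution for DiffTime. The byte layout is an arbitrary choice made here and would not necessarily match instances cereal might add later; StdGen is left out, and `roundTrip`, `sampleDay`, and `sampleTime` are hypothetical helpers.

```haskell
{-# OPTIONS_GHC -Wno-orphans #-}
import Data.Serialize (Serialize (..), decode, encode)
import Data.Time.Calendar (Day (..), fromGregorian)
import Data.Time.Clock (UTCTime (..), diffTimeToPicoseconds, picosecondsToDiffTime)

-- Day is a newtype over the Modified Julian Day number (an Integer).
instance Serialize Day where
  put = put . toModifiedJulianDay
  get = ModifiedJulianDay <$> get

-- UTCTime is a Day plus the time since midnight; the DiffTime is stored
-- as a whole number of picoseconds, which is its full resolution.
instance Serialize UTCTime where
  put (UTCTime d t) = put d >> put (diffTimeToPicoseconds t)
  get = UTCTime <$> get <*> (picosecondsToDiffTime <$> get)

-- Helpers for trying the instances out.
roundTrip :: Serialize a => a -> Either String a
roundTrip = decode . encode

sampleDay :: Day
sampleDay = fromGregorian 2014 4 9

sampleTime :: UTCTime
sampleTime = UTCTime sampleDay 45296.5
```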
Using cereal 0.5.4.0, putting an Integer causes memory use to grow superlinearly in the size (in bits) of the Integer. Here's a table of memory usage (in bytes) produced by the following program:
import Control.Monad (forM_)
import qualified Data.ByteString as BS
import Data.Serialize (encode)
import Weigh
weighEncode :: Weigh ()
weighEncode =
  forM_ ([0, 1000 .. 10000] ++ [20000, 30000 .. 100000] ++ [200000, 300000 .. 1000000]) $
    \x -> func (show x) weighExponent x
  where
    weighExponent n = BS.length $ encode ((2 ^ n) :: Integer)
memProf :: IO ()
memProf = mainWith weighEncode
Exponent n  Allocated (bytes)     GCs
0                       4,688       0
1000                   58,016       0
2000                  125,600       0
3000                  208,720       0
4000                  307,408       0
5000                  421,744       0
6000                  551,776       1
7000                  697,224       1
8000                  858,472       1
9000                1,035,264       2
10000               1,227,728       2
20000               4,011,416      10
30000               8,357,160      19
40000              14,305,160      30
50000              21,776,120      44
60000              30,815,032      61
70000              41,414,856      80
80000              53,575,848     104
90000              67,299,584     129
100000             82,582,672     158
200000            321,357,784     602
300000            716,433,208   1,331
400000          1,267,721,672   2,341
500000          1,975,251,800   3,622
600000          2,839,080,936   5,150
700000          3,859,106,568   6,944
800000          5,035,376,032   9,034
900000          6,367,966,256  11,378
1000000         7,856,746,304  13,884
A 10^6-bit number occupies only about 125 KB in GHC; there is little reason why encoding it should allocate almost 8 gigabytes.
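The growth in the table is roughly quadratic: going from n = 100,000 to n = 1,000,000 multiplies allocation by about 95. The profile further down this page shows `unroll.step` cost centres in Data.Serialize, which suggests Integers are unrolled one byte at a time by repeated shifting; assuming that (it is not verified against 0.5.4.0), the sketch below contrasts byte-at-a-time unrolling with a divide-and-conquer split. All names are hypothetical and neither function is cereal's code; both handle non-negative inputs only.

```haskell
import Data.Bits (bit, shiftR, (.&.))
import Data.Word (Word8)

-- Byte-at-a-time, little-endian. Every `shiftR 8` allocates a new Integer
-- almost as large as the old one, so allocation grows with the square of
-- the bit length.
unrollNaive :: Integer -> [Word8]
unrollNaive 0 = []
unrollNaive n = fromIntegral (n .&. 0xff) : unrollNaive (n `shiftR` 8)

-- Bit length of n >= 1: find an upper bound by doubling, then binary-search.
bitLength :: Integer -> Int
bitLength n = search 0 upper
  where
    upper = until (\e -> n < bit e) (* 2) 1
    -- invariant: bit lo <= n < bit hi
    search lo hi
      | hi - lo <= 1 = hi
      | n < bit mid = search lo mid
      | otherwise = search mid hi
      where
        mid = (lo + hi) `div` 2

-- Divide and conquer: split into low and high halves and recurse. Each
-- recursion level touches about n bits in total, and there are O(log n) levels.
unrollDC :: Integer -> [Word8]
unrollDC 0 = []
unrollDC n = go n ((bitLength n + 7) `div` 8)
  where
    go _ 0 = []
    go x 1 = [fromIntegral x]
    go x len = go lo h ++ go hi (len - h)
      where
        h = len `div` 2
        lo = x .&. (bit (8 * h) - 1)
        hi = x `shiftR` (8 * h)
```

Both produce the same bytes; only the amount of copying differs.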
I may be missing something, but I would expect get to be faster than put. Instead, it takes twice as long. Here's a simple test:
import Data.Serialize
main :: IO ()
main = do
  let x = encode ([1 .. 1000000] :: [Double])
  (decode x :: Either String [Double]) `seq` return ()
And here's the profile (see the encode and decode entries in the call tree; the inherited %time and %alloc columns are the rightmost ones):
Wed Apr 9 13:09 2014 Time and Allocation Profiling Report (Final)
realtra-benchmark2 +RTS -p -RTS
total time = 28.11 secs (28105 ticks @ 1000 us, 1 processor)
total alloc = 11,402,369,328 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
>>=.\.ks' Data.Serialize.Get 25.4 20.1
>>=.\ Data.Serialize.Get 15.8 9.9
put Data.Serialize 9.6 12.8
roll.unstep Data.Serialize 6.1 11.9
unroll.step Data.Serialize 5.2 11.3
>>= Data.Serialize.Get 3.9 0.0
withSize.\ Data.Serialize.Builder 3.2 3.4
return.\ Data.Serialize.Get 3.0 0.0
put Data.Serialize 3.0 2.5
>> Data.Serialize.Put 2.7 3.2
fmap.\.ks' Data.Serialize.Get 2.5 8.8
getListOf.go Data.Serialize.Get 2.3 4.6
put Data.Serialize 2.2 2.0
return Data.Serialize.Get 2.1 0.0
unroll Data.Serialize 1.9 3.4
fmap.\ Data.Serialize.Get 1.7 1.3
getWord8 Data.Serialize.Get 1.3 0.0
getListOf Data.Serialize.Get 1.3 0.0
put Data.Serialize 1.1 1.3
main.x Main 0.5 1.5
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 53 0 0.0 0.0 100.0 100.0
CAF Main 105 0 0.0 0.0 100.0 100.0
main Main 106 1 0.0 0.0 100.0 100.0
main.x Main 117 1 0.5 1.5 32.4 43.3
encode Data.Serialize 118 1 0.0 0.0 31.9 41.8
sndS Data.Serialize.Put 126 1 0.0 0.0 0.0 0.0
put Data.Serialize 125 1 6.1 8.6 21.2 31.2
put Data.Serialize 137 1000000 0.6 1.0 15.0 22.6
put Data.Serialize 138 1000000 2.2 1.6 14.4 21.6
put Data.Serialize 184 1000000 0.2 0.4 0.6 1.1
put Data.Serialize 185 1000000 0.4 0.7 0.4 0.7
>>.(...) Data.Serialize.Put 161 1000000 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 162 1000000 0.0 0.0 0.0 0.0
>>.w' Data.Serialize.Put 160 1000000 0.1 0.0 0.1 0.0
mappend Data.Serialize.Builder 150 1000000 0.0 0.0 0.0 0.0
put.hi Data.Serialize 148 1000000 0.0 0.0 0.0 0.0
put Data.Serialize 147 1000000 1.4 1.4 10.2 17.6
unroll Data.Serialize 177 0 1.9 3.4 7.1 14.7
unroll.step Data.Serialize 178 8000000 5.2 11.3 5.2 11.3
put.sign Data.Serialize 172 1000000 0.3 0.1 0.3 0.1
>> Data.Serialize.Put 163 1000000 1.1 1.3 1.4 1.3
>>.(...) Data.Serialize.Put 174 1000000 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 175 1000000 0.0 0.0 0.0 0.0
>>.w' Data.Serialize.Put 173 1000000 0.1 0.0 0.1 0.0
>>.w Data.Serialize.Put 170 1000000 0.0 0.0 0.0 0.0
>>.(...) Data.Serialize.Put 167 1000000 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 168 1000000 0.0 0.0 0.0 0.0
mappend Data.Serialize.Builder 164 1000000 0.0 0.0 0.0 0.0
put.lo Data.Serialize 146 1000000 0.0 0.0 0.0 0.0
>> Data.Serialize.Put 139 2000000 0.9 1.3 1.2 1.3
>>.(...) Data.Serialize.Put 182 1000000 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 183 1000000 0.0 0.0 0.0 0.0
>>.w' Data.Serialize.Put 181 1000000 0.1 0.0 0.1 0.0
>>.w Data.Serialize.Put 149 1000000 0.0 0.0 0.0 0.0
>>.(...) Data.Serialize.Put 144 1000000 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 145 1000000 0.0 0.0 0.0 0.0
mappend Data.Serialize.Builder 140 1000000 0.0 0.0 0.0 0.0
sndS Data.Serialize.Put 130 9000001 0.1 0.0 0.1 0.0
unPut Data.Serialize.Put 129 9000001 0.0 0.0 0.0 0.0
mappend Data.Serialize.Builder 127 1000001 0.0 0.0 0.0 0.0
unPut Data.Serialize.Put 124 1 0.0 0.0 0.0 0.0
toByteString Data.Serialize.Builder 119 1 0.1 0.2 10.7 10.7
flush.\ Data.Serialize.Builder 200 1 0.0 0.0 0.0 0.0
put Data.Serialize 128 0 3.5 4.2 10.6 10.5
put Data.Serialize 141 0 0.0 0.0 1.7 1.4
put Data.Serialize 142 0 0.5 0.4 1.7 1.4
put Data.Serialize 187 0 0.0 0.0 0.3 0.2
put Data.Serialize 188 0 0.3 0.2 0.3 0.2
put Data.Serialize 165 0 0.2 0.2 0.5 0.5
>> Data.Serialize.Put 166 0 0.3 0.3 0.3 0.3
>> Data.Serialize.Put 143 0 0.4 0.3 0.4 0.3
withSize Data.Serialize.Builder 133 0 0.9 0.0 5.4 4.9
withSize.\ Data.Serialize.Builder 134 11000001 3.2 3.4 4.5 4.9
flush.\ Data.Serialize.Builder 192 763 0.0 0.2 0.0 0.2
put Data.Serialize 193 0 0.0 0.0 0.0 0.0
put Data.Serialize 194 0 0.0 0.0 0.0 0.0
put Data.Serialize 197 0 0.0 0.0 0.0 0.0
put Data.Serialize 195 0 0.0 0.0 0.0 0.0
put Data.Serialize 196 0 0.0 0.0 0.0 0.0
put Data.Serialize 157 0 0.0 0.0 1.3 1.3
put Data.Serialize 158 0 0.3 0.4 1.3 1.3
put Data.Serialize 189 0 0.0 0.0 0.4 0.4
put Data.Serialize 190 0 0.4 0.4 0.4 0.4
put Data.Serialize 159 0 0.6 0.4 0.6 0.4
runBuilder Data.Serialize.Builder 135 11000001 0.0 0.0 0.0 0.0
runBuilder Data.Serialize.Builder 123 1 0.0 0.0 0.0 0.0
decode Data.Serialize 107 1 0.0 0.0 67.6 56.7
get Data.Serialize 109 1 0.0 0.0 67.6 56.7
getListOf Data.Serialize.Get 110 1 1.3 0.0 67.6 56.7
getListOf.go Data.Serialize.Get 229 8000000 1.3 4.3 1.3 4.3
return Data.Serialize.Get 231 1000000 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 230 7000000 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 111 14000002 3.9 0.0 65.1 52.4
>>=.\ Data.Serialize.Get 112 49000004 15.8 9.9 61.2 52.4
get Data.Serialize 242 0 0.1 0.0 0.1 0.0
get Data.Serialize 239 0 0.1 0.0 0.1 0.0
getWord8 Data.Serialize.Get 216 0 1.3 0.0 2.9 1.3
fmap Data.Serialize.Get 218 0 0.5 0.0 1.6 1.3
fmap.\ Data.Serialize.Get 219 9000000 1.1 1.3 1.1 1.3
unGet Data.Serialize.Get 220 9000000 0.0 0.0 0.0 0.0
get Data.Serialize 212 0 0.1 0.0 0.1 0.0
get Data.Serialize 209 0 0.1 0.0 0.1 0.0
>>=.\.ks' Data.Serialize.Get 201 49000004 25.4 20.1 41.9 41.3
get.v Data.Serialize 234 1000000 0.9 0.0 7.1 11.9
roll.unstep Data.Serialize 236 7000000 6.1 11.9 6.1 11.9
get Data.Serialize 225 0 0.1 0.0 0.1 0.0
getListOf.go Data.Serialize.Get 206 1000001 1.0 0.4 1.3 0.4
return Data.Serialize.Get 232 1 0.1 0.0 0.3 0.0
return.\ Data.Serialize.Get 233 1000001 0.1 0.0 0.1 0.0
finalK Data.Serialize.Get 244 1 0.0 0.0 0.0 0.0
shiftl_w64 Data.Serialize.Get 205 14000007 0.1 0.0 0.1 0.0
return Data.Serialize.Get 203 26000002 2.0 0.0 7.9 8.8
return.\ Data.Serialize.Get 204 26000002 2.9 0.0 5.9 8.8
getWord8 Data.Serialize.Get 221 0 0.0 0.0 3.1 8.8
fmap Data.Serialize.Get 222 0 0.0 0.0 3.1 8.8
fmap.\ Data.Serialize.Get 223 0 0.6 0.0 3.1 8.8
fmap.\.ks' Data.Serialize.Get 224 9000000 2.5 8.8 2.5 8.8
unGet Data.Serialize.Get 202 49000004 0.0 0.0 0.0 0.0
getWord64be Data.Serialize.Get 116 0 0.3 0.0 0.3 0.0
unGet Data.Serialize.Get 113 49000004 0.0 0.0 0.0 0.0
unGet Data.Serialize.Get 108 1 0.0 0.0 0.0 0.0
CAF Data.Serialize 104 0 0.0 0.0 0.0 0.0
get Data.Serialize 243 1 0.0 0.0 0.0 0.0
get Data.Serialize 240 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 241 1 0.0 0.0 0.0 0.0
get Data.Serialize 237 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 238 1 0.0 0.0 0.0 0.0
roll Data.Serialize 235 1 0.0 0.0 0.0 0.0
get Data.Serialize 213 1 0.0 0.0 0.0 0.0
get Data.Serialize 210 1 0.0 0.0 0.0 0.0
get Data.Serialize 226 1 0.0 0.0 0.0 0.0
getListOf Data.Serialize.Get 227 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 228 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 211 2 0.0 0.0 0.0 0.0
get Data.Serialize 207 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 208 1 0.0 0.0 0.0 0.0
put Data.Serialize 186 1 0.0 0.0 0.0 0.0
unroll Data.Serialize 176 1 0.0 0.0 0.0 0.0
put Data.Serialize 169 1 0.0 0.0 0.0 0.0
put Data.Serialize 151 0 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 156 1 0.0 0.0 0.0 0.0
>> Data.Serialize.Put 152 0 0.0 0.0 0.0 0.0
>>.w Data.Serialize.Put 155 1 0.0 0.0 0.0 0.0
>>.(...) Data.Serialize.Put 154 1 0.0 0.0 0.0 0.0
unPut Data.Serialize.Put 153 1 0.0 0.0 0.0 0.0
put Data.Serialize 131 1 0.0 0.0 0.0 0.0
mempty Data.Serialize.Builder 198 1 0.0 0.0 0.0 0.0
mappend Data.Serialize.Builder 136 1 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 132 1 0.0 0.0 0.0 0.0
CAF Data.Serialize.Put 103 0 0.0 0.0 0.0 0.0
mempty Data.Serialize.Builder 180 1 0.0 0.0 0.0 0.0
mappend Data.Serialize.Builder 179 1 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 171 2 0.0 0.0 0.0 0.0
CAF Data.Serialize.Get 102 0 0.0 0.0 0.0 0.0
getWord8 Data.Serialize.Get 214 1 0.0 0.0 0.0 0.0
fmap Data.Serialize.Get 217 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 215 2 0.0 0.0 0.0 0.0
getWord64be Data.Serialize.Get 114 1 0.0 0.0 0.0 0.0
>>= Data.Serialize.Get 115 2 0.0 0.0 0.0 0.0
CAF Data.Serialize.Builder 101 0 0.0 0.0 0.0 0.0
flush Data.Serialize.Builder 191 1 0.0 0.0 0.0 0.0
defaultSize Data.Serialize.Builder 120 1 0.0 0.0 0.0 0.0
defaultSize.overhead Data.Serialize.Builder 122 1 0.0 0.0 0.0 0.0
defaultSize.k Data.Serialize.Builder 121 1 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 93 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 83 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 78 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 69 0 0.0 0.0 0.0 0.0
The main use I have in mind is using mutable state (the ST monad) to perform object initialization. That isn't possible now, but it is sometimes desirable for performance reasons. I tried implementing Get as a transformer and there wasn't any trouble, just a lot of work.
There are some API choices to be made. First of all, GetT requires changing the Result data type to:
data Result m r = Fail String B.ByteString
                | Partial (B.ByteString -> m (Result m r))
                | Done r B.ByteString
So it's not possible to export GetT from Data.Serialize.Get: that would break old code, and GetT is not convenient to work with when it isn't used as a transformer. The same applies to the runGet functions.
So two modules, Data.Serialize.Get and Data.Serialize.Get.Trans, have to be created. The second point is whether Data.Serialize.Get should just re-export the generic functions from Data.Serialize.Get.Trans or export functions with specialized type signatures.
I prefer the first variant because the definition of the Serialize class would have to change too:
class Serialize a where
  get :: Monad m => GetT m a
  put :: a -> Put
But Data.Serialize re-exports Data.Serialize.Get. If the former exported type-restricted functions, it wouldn't be possible to define Serialize instances.
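The transformer idea can be sketched in a self-contained way. Everything here (`GetT`, `liftT`, `getWord8T`, `feedChunks`, `sumBytesST`) is hypothetical and independent of cereal's internals; it only illustrates that a parser whose `Partial` continuation runs in an arbitrary monad, such as `ST s`, can interleave mutable-state actions with incremental input, using the Result type proposed above.

```haskell
import Control.Monad (ap, liftM, replicateM_)
import Control.Monad.ST (runST)
import qualified Data.ByteString as B
import Data.STRef (modifySTRef', newSTRef, readSTRef)
import Data.Word (Word8)

-- Result type as proposed above: continuations run in the base monad m.
data Result m r
  = Fail String B.ByteString
  | Partial (B.ByteString -> m (Result m r))
  | Done r B.ByteString

-- Hypothetical transformer: run on the current chunk of input.
newtype GetT m a = GetT {unGetT :: B.ByteString -> m (Result m a)}

instance Monad m => Functor (GetT m) where
  fmap = liftM

instance Monad m => Applicative (GetT m) where
  pure a = GetT $ \s -> pure (Done a s)
  (<*>) = ap

instance Monad m => Monad (GetT m) where
  GetT p >>= f = GetT $ \s -> p s >>= continue
    where
      continue (Fail e rest) = pure (Fail e rest)
      continue (Done a rest) = unGetT (f a) rest
      continue (Partial k) = pure (Partial (\more -> k more >>= continue))

-- Run an action of the base monad without consuming input.
liftT :: Monad m => m a -> GetT m a
liftT m = GetT $ \s -> m >>= \a -> pure (Done a s)

-- Read one byte, suspending for more input when the chunk is exhausted;
-- an empty chunk signals end of input.
getWord8T :: Monad m => GetT m Word8
getWord8T = GetT go
  where
    go s = case B.uncons s of
      Just (w, rest) -> pure (Done w rest)
      Nothing -> pure (Partial more)
    more chunk
      | B.null chunk = pure (Fail "too few bytes" B.empty)
      | otherwise = go chunk

-- Drive a parser over a list of chunks in the base monad.
feedChunks :: Monad m => GetT m a -> [B.ByteString] -> m (Either String a)
feedChunks p chunks = case chunks of
  [] -> unGetT p B.empty >>= step []
  c : cs -> unGetT p c >>= step cs
  where
    step _ (Done a _) = pure (Right a)
    step _ (Fail e _) = pure (Left e)
    step [] (Partial k) = k B.empty >>= step []
    step (c : cs) (Partial k) = k c >>= step cs

-- Example: accumulate n bytes into an STRef while parsing.
sumBytesST :: Int -> [B.ByteString] -> Either String Int
sumBytesST n chunks = runST $ do
  ref <- newSTRef 0
  r <- feedChunks (replicateM_ n (getWord8T >>= \w -> liftT (modifySTRef' ref (+ fromIntegral w)))) chunks
  total <- readSTRef ref
  pure (total <$ r)
```

For instance, `sumBytesST 3 [B.pack [1, 2], B.pack [3]]` reads across the chunk boundary and yields `Right 6`.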
There is no way to get PutM a -> (a, Builder) in a single call.
This is also a wart in the design of binary.
Hello,
I have a question whose answer I cannot find anywhere.
Cereal performs binary serialization, which suggests to me that it serializes primitive types as they are laid out in memory.
So, for example, when a float is serialized, the bytes written represent the float in whatever format the hardware uses, such as IEEE 754. This would mean that the bytestring cannot be loaded by another machine that uses some other format instead of IEEE 754.
Is my understanding correct?
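One way to avoid depending on the in-memory format is to pick the wire format explicitly. A minimal sketch using the big-endian IEEE 754 helpers from Data.Serialize.IEEE754 follows (the wrapper names are hypothetical). The byte order is fixed by the function rather than by the host, so little- and big-endian machines produce the same bytes; the helpers still assume the host's Double is IEEE 754 binary64. According to another report on this page, cereal's default `Serialize Double` instance does not write raw bits at all but a (mantissa, exponent) pair from decodeFloat.

```haskell
import qualified Data.ByteString as B
import Data.Serialize (runGet, runPut)
import Data.Serialize.IEEE754 (getFloat64be, putFloat64be)

-- Write a Double as IEEE 754 binary64 in big-endian byte order.
ieeeBytes :: Double -> B.ByteString
ieeeBytes = runPut . putFloat64be

-- Read it back from the same fixed 8-byte layout.
fromIeeeBytes :: B.ByteString -> Either String Double
fromIeeeBytes = runGet getFloat64be
```

For 1.75 the bytes are 3F FC 00 00 00 00 00 00: sign 0, biased exponent 0x3FF, and a fraction whose top two bits are set.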
And probably other functions, given the large number of B.appends in Data.Serialize.Get.
The practical consequence is that deserializing from many chunks is quadratic in the number of chunks.
See kolmodin/binary#76 for a related problem in binary.
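Assuming the parser appends each newly supplied chunk onto its buffered input, the quadratic cost can be seen with plain ByteStrings. The helper names and the cost model below are illustrative only: `bytesCopiedByAppend` counts the running total that each append copies (bytestring skips the copy when one side is empty, so the model slightly overestimates).

```haskell
import qualified Data.ByteString as B
import Data.List (foldl')

-- Appending each new chunk to an accumulated buffer copies the whole buffer
-- every time.
joinByAppend :: [B.ByteString] -> B.ByteString
joinByAppend = foldl' B.append B.empty

-- Simple cost model for joinByAppend: each step copies the running total,
-- so k equal chunks of size c cost about c * k * (k + 1) / 2 bytes.
bytesCopiedByAppend :: [Int] -> Int
bytesCopiedByAppend = sum . scanl1 (+)

-- Keeping the chunks and concatenating once copies each byte exactly once.
joinOnce :: [B.ByteString] -> B.ByteString
joinOnce = B.concat
```

With four 10-byte chunks the model gives 10 + 20 + 30 + 40 = 100 bytes copied, against 40 for a single concatenation.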
This is example code showing an inconsistency in cereal-0.4.1.1 with bytestring-0.10.4.0.
Both inputs are built from the same list of ByteString chunks.
import Control.Applicative
import Data.Word
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LB
import Data.Serialize.Get
data Bug =
  Bug Word8 [(Word8, Word8)]
  deriving (Eq, Show)
chunks :: [BS.ByteString]
chunks = [BS.pack [0, 1], BS.pack [2], BS.pack [3], BS.pack [4]]
getPs :: Get [(Word8, Word8)]
getPs = do
  e <- isEmpty
  if e
    then return []
    else do
      f <- getWord8
      s <- getWord8
      tl <- getPs
      return $ (f, s) : tl
getBug :: Get Bug
getBug = Bug <$> getWord8 <*> getPs
lazy :: [BS.ByteString] -> Either String Bug
lazy =
  runGetLazy
    getBug
    . LB.fromChunks
eager :: [BS.ByteString] -> Either String Bug
eager =
  runGet
    getBug
    . BS.concat
expect :: Bool
expect = lazy chunks == eager chunks
main :: IO ()
main = do
  print expect
  print $ lazy chunks
  print $ eager chunks
Execution output is as follows.
False
Right (Bug 0 [(1,2)])
Right (Bug 0 [(1,2),(3,4)])
Shouldn't both give the same result in this case?
I am using the Generic support in cereal to serialize my data types to binary format, which works fine. However, I am now facing a thorny issue related to migration: I have old representations of my data types that need to be deserialized to a type with a (slightly) different structure. Note that I am storing the known version along with the serialized form of the data, which lets me know which transformation to apply when deserializing (assuming a monotonically increasing version number).
I previously serialized to/from JSON using aeson, which makes this migration issue easy to handle: first deserialize to a JSON value, then apply the needed changes to reconstruct a value consistent with the new structure of the type.
However, I am stuck applying the same strategy with cereal because it is not clear to me what the underlying bytestring representation of generic types is. I understand that it should read enough bytes to hold the constructor number of my sum-of-products type, then dispatch on that number to pick the constructor and walk the generic tree. Am I correct? Any idea how to do this in the simplest possible way?
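Since the version is already stored, one option that sidesteps the generic byte layout entirely is to keep the old type around, decode with its own (generic) instance, and convert. The sketch assumes the version is a Word8 tag written before the payload; `PersonV1`, `Person`, and the helpers are hypothetical.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Serialize
import GHC.Generics (Generic)

-- Old and new shapes of the same record.
data PersonV1 = PersonV1 String Int
  deriving (Generic, Show)

data Person = Person String Int Bool
  deriving (Generic, Show, Eq)

-- Generic-derived instances: each version keeps its own wire format.
instance Serialize PersonV1
instance Serialize Person

-- How an old value is upgraded (the new field gets a default).
migrate :: PersonV1 -> Person
migrate (PersonV1 name age) = Person name age False

-- Always write the current version tag before the payload.
putVersioned :: Person -> Put
putVersioned p = putWord8 2 >> put p

-- Dispatch on the stored tag and decode with the matching old type.
getVersioned :: Get Person
getVersioned = do
  v <- getWord8
  case v of
    1 -> migrate <$> get
    2 -> get
    _ -> fail ("unknown version " ++ show v)
```

The old types never need to change once written, so the question of how the generic instance lays out bytes only has to be consistent with itself.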
src/Data/Serialize/Get.hs:726:36: error:
• Couldn't match expected type ‘Word16#’ with actual type ‘Word#’
• In the first argument of ‘W16#’, namely
‘(w `uncheckedShiftL#` i)’
In the expression: W16# (w `uncheckedShiftL#` i)
In an equation for ‘shiftl_w16’:
shiftl_w16 (W16# w) (I# i) = W16# (w `uncheckedShiftL#` i)
|
726 | shiftl_w16 (W16# w) (I# i) = W16# (w `uncheckedShiftL#` i)
| ^^^^^^^^^^^^^^^^^^^^^^^^
src/Data/Serialize/Get.hs:727:36: error:
• Couldn't match expected type ‘Word32#’ with actual type ‘Word#’
• In the first argument of ‘W32#’, namely
‘(w `uncheckedShiftL#` i)’
In the expression: W32# (w `uncheckedShiftL#` i)
In an equation for ‘shiftl_w32’:
shiftl_w32 (W32# w) (I# i) = W32# (w `uncheckedShiftL#` i)
|
727 | shiftl_w32 (W32# w) (I# i) = W32# (w `uncheckedShiftL#` i)
| ^^^^^^^^^^^^^^^^^^^^^^^^
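These errors come from GHC 9.2 changing the `W16#` and `W32#` constructors to wrap the sized primitives `Word16#` and `Word32#` instead of `Word#` (GHC 9.4 does the same for `W64#`, as in a later report on this page). One possible fix, sketched below and not necessarily what upstream adopted, is to drop the primops and use `unsafeShiftL` from Data.Bits, which works across GHC versions.

```haskell
import Data.Bits (unsafeShiftL)
import Data.Word (Word16, Word32, Word64)

-- Portable replacements: Data.Bits hides whether W16#/W32#/W64# wrap a
-- sized primitive or a plain Word#. As with the primop, shifting by the
-- bit size or more is not checked.
shiftl_w16 :: Word16 -> Int -> Word16
shiftl_w16 = unsafeShiftL

shiftl_w32 :: Word32 -> Int -> Word32
shiftl_w32 = unsafeShiftL

shiftl_w64 :: Word64 -> Int -> Word64
shiftl_w64 = unsafeShiftL
```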
https://hackage.haskell.org/package/cereal-0.5.8.1/docs/Data-Serialize-Get.html#v:runGetState
It says that the function returns the number of consumed bytes, but it doesn't: it returns the remaining input.
Come on :|
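Until the documentation is fixed, the count can be recovered from what `runGetState` does return, the unconsumed input. A small sketch (the helper name `runGetConsumed` is hypothetical), assuming that passing 0 as `runGetState`'s Int argument means starting at the beginning of the input:

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, getWord16be, runGetState)

-- Derive the consumed-byte count from the input and remainder lengths.
runGetConsumed :: Get a -> B.ByteString -> Either String (a, Int)
runGetConsumed g input = do
  (a, rest) <- runGetState g input 0
  pure (a, B.length input - B.length rest)

-- Example: a big-endian Word16 consumes 2 of the 3 input bytes.
exampleConsumed :: Either String (Word16Value, Int)
exampleConsumed = runGetConsumed getWord16be (B.pack [1, 2, 3])

type Word16Value = Word16'
type Word16' = GHCWord16
type GHCWord16 = WordAlias
type WordAlias = W16
type W16 = Data16
type Data16 = Word16Type
type Word16Type = Prelude16
type Prelude16 = Word16Real
type Word16Real = ImportedWord16
type ImportedWord16 = ConcreteWord16
type ConcreteWord16 = FinalWord16
type FinalWord16 = Word16Final
type Word16Final = TheWord16
type TheWord16 = Word16Target
type Word16Target = DW.Word16
```

The returned tuple pairs the parsed value with the number of bytes the parser used.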
#61 added Int-type helpers such as putInthost; however, it did not also add the corresponding getters. These should be added for symmetry.
The documentation of GHC.Generics says this:
However, users /should not rely on a specific nesting strategy/ for :+: and :*: being used.
From reading the code, it seems to me that the code depends on the structure being balanced.
I was able to construct the following example, which breaks the logic of gPut by nesting all constructors to the left:
{-# LANGUAGE DataKinds, TypeOperators #-}
import Data.Serialize
import GHC.Generics
type MyGeneric a = D1 ('MetaData "Foo" "Ghci2" "interactive" 'False) (((C1 ('MetaCons "Foo1" 'PrefixI 'False) U1 :+: C1 ('MetaCons "Foo2" 'PrefixI 'False) U1) :+: C1 ('MetaCons "Foo3" 'PrefixI 'False) U1) :+: C1 ('MetaCons "Foo1" 'PrefixI 'False) U1) a
x :: MyGeneric a
x = M1 $ L1 $ L1 $ L1 $ M1 U1
y :: MyGeneric a
y = M1 $ L1 $ L1 $ R1 $ M1 U1
test :: Either String (Bool, Bool)
test = (\z -> (z == x, z == y)) <$> runGet gGet (runPut $ gPut x)
main :: IO ()
main = print test
When I run the code, it outputs Right (False,True), which shows that encoding and then decoding x gives a value that isn't equal to x itself, but is equal to y.
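One nesting-independent scheme, sketched here as an illustration rather than as cereal's gPut/gGet, is to number constructors by counting how many live in each subtree: a value in the right branch gets the left branch's constructor count added to its index. An encoder could write this index, and a decoder could compare a stored index against the left branch's count to choose L1 or R1, so balanced and left-nested trees with the same constructor order agree. The class names `ConCount` and `ConIndex`, and the example types, are hypothetical.

```haskell
{-# LANGUAGE DataKinds, DeriveGeneric, FlexibleInstances, KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Number of constructors in a generic sum, however it is nested.
class ConCount (f :: Type -> Type) where
  conCount :: Proxy f -> Int

instance (ConCount f, ConCount g) => ConCount (f :+: g) where
  conCount _ = conCount (Proxy :: Proxy f) + conCount (Proxy :: Proxy g)

instance ConCount (C1 c f) where
  conCount _ = 1

-- Zero-based constructor index: everything in the left branch comes first,
-- so the result depends only on constructor order, not on the tree shape.
class ConCount f => ConIndex f where
  conIndex :: f p -> Int

instance (ConIndex f, ConIndex g) => ConIndex (f :+: g) where
  conIndex (L1 x) = conIndex x
  conIndex (R1 y) = conCount (Proxy :: Proxy f) + conIndex y

instance ConIndex (C1 c f) where
  conIndex _ = 0

-- The left-nested representation from the report (fourth name changed to Foo4).
type LeftNested =
  ( ( C1 ('MetaCons "Foo1" 'PrefixI 'False) U1
        :+: C1 ('MetaCons "Foo2" 'PrefixI 'False) U1
    )
      :+: C1 ('MetaCons "Foo3" 'PrefixI 'False) U1
  )
    :+: C1 ('MetaCons "Foo4" 'PrefixI 'False) U1

-- A derived (balanced) representation for comparison.
data Suit = Clubs | Diamonds | Hearts | Spades
  deriving (Generic)

suitIndex :: Suit -> Int
suitIndex = conIndex . unM1 . from
```

With this numbering, the report's `y` (second constructor) gets index 1 and `x` gets index 0, regardless of how the sum is nested.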
I've done some porting of attoparsec's Parser to Get: https://github.com/SX91/cereal/tree/buffer-pos. Please take a look. Performance is almost the same, though, at least with the included benchmarks.
Overlaps with #62.
PS: I've taken the Buffer module from the attoparsec package and modified it a bit. I probably need to add some copyright notices.
Recently I've been hacking on the cereal codebase to build a takeTill, but I don't quite get why runGetChunk and data More = Complete | Incomplete (Maybe Int) exist. It seems runGetPartial does just fine without being given any estimate of the remaining input, while the More field forces me to calculate a remaining-length estimate for every chunk. So the question is: what's the use case of runGetChunk?
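For comparison with an internal implementation, a takeTill can be assembled from exported combinators only (`isEmpty`, `lookAhead`, `getWord8`, `skip`). This sketch is hypothetical, works a byte at a time and is therefore slow, and with runGetLazy it inherits the `isEmpty` chunk-boundary behaviour reported elsewhere on this page.

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, getWord8, isEmpty, lookAhead, runGet, skip)
import Data.Word (Word8)

-- Collect bytes until the predicate holds or the input ends. The matching
-- byte is left unconsumed, as in attoparsec's takeTill.
takeTill :: (Word8 -> Bool) -> Get B.ByteString
takeTill p = go []
  where
    go acc = do
      done <- isEmpty
      if done
        then pure (finish acc)
        else do
          w <- lookAhead getWord8
          if p w
            then pure (finish acc)
            else skip 1 >> go (w : acc)
    finish = B.pack . reverse

-- Example: read "hi" up to the NUL byte, then the NUL itself.
exampleTakeTill :: Either String (B.ByteString, Word8)
exampleTakeTill = runGet ((,) <$> takeTill (== 0) <*> getWord8) (B.pack [104, 105, 0, 33])
```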
Thanks for cereal; I'm using it in a set of binary representation libraries with alternate typeclasses, and it's super handy.
I've found myself needing a dual to Put.putByteString for getters, where there's no length prefix, so all you can do is consume the whole remaining bytestring and return it. Data.Serialize.Get doesn't export some of the underlying types or helpers needed for writing this, so users aren't able to define it themselves (efficiently, anyway).
I've forked cereal and written something that appears to work, but I don't know how the buffer and other parts function.
getByteStringEOF :: Get B.ByteString
getByteStringEOF =
  Get (\s0 b0 m0 w _ k -> k B.empty Nothing m0 w (mergeBuf s0 b0))
  where
    mergeBuf s0 = \case
      Nothing -> s0
      Just b -> b <> s0
-- adapted from an existing function,
-- but it doesn't consume the input; it just returns it
get :: Get B.ByteString
get = Get (\s0 b0 m0 w _ k -> k s0 b0 m0 w s0)
Would it be useful for Data.Serialize.Get to export something like this? It wouldn't be used in the typeclass or anywhere else, but it's convenient to piggyback on cereal's efficient internal parser rather than write my own.
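Until something like this is exported, a stop-gap is possible with the public API alone, sketched below as the hypothetical `getRemainingBytes`: ask `remaining` for the byte count and take that many with `getBytes`. With `runGet` the input is a single strict chunk, so this returns everything left; with incremental parsing, `remaining` may only cover the current chunk, so it is not a full replacement for the internal version above.

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, getBytes, getWord8, remaining, runGet)

-- Take exactly as many bytes as `remaining` reports.
getRemainingBytes :: Get B.ByteString
getRemainingBytes = remaining >>= getBytes

-- Example: one header byte followed by an unprefixed payload.
exampleRest :: Either String (Int, B.ByteString)
exampleRest = runGet ((,) <$> (fromIntegral <$> getWord8) <*> getRemainingBytes) (B.pack [1, 2, 3])
```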
There are many new and faster implementations of the Builder type that Put uses. It would be really nice to rewrite Put in terms of one of them.
This commit seems to have removed the code that defines the "GENERICS" macro, but the code still contains #ifdefs on it.
(Edit: forgot to link to the commit: b18e6af)
I am not sure how any code relying on this machinery works at the moment; as far as I can see, it cannot.
The fix should be either to define GENERICS somewhere else or to remove the #ifdefs altogether.
It would be useful if the Fail constructor included the unconsumed input at the time of failure. This would allow me to create better failure messages, which in turn would help with debugging parsers.
$ cabal install -w ghc-9.4.0.20220523 cereal-0.5.8.2
...
Building library for cereal-0.5.8.2..
[1 of 4] Compiling Data.Serialize.Get ( src/Data/Serialize/Get.hs, dist/build/Data/Serialize/Get.o, dist/build/Data/Serialize/Get.dyn_o )
src/Data/Serialize/Get.hs:744:36: error:
• Couldn't match expected type ‘Word64#’ with actual type ‘Word#’
• In the first argument of ‘W64#’, namely
‘(w `uncheckedShiftL#` i)’
In the expression: W64# (w `uncheckedShiftL#` i)
In an equation for ‘shiftl_w64’:
shiftl_w64 (W64# w) (I# i) = W64# (w `uncheckedShiftL#` i)
|
744 | shiftl_w64 (W64# w) (I# i) = W64# (w `uncheckedShiftL#` i)
| ^^^^^^^^^^^^^^^
Error location: cereal/src/Data/Serialize/Get.hs, line 744 in b2ee49b
Reprise of:
In my role as Hackage trustee, I revised cereal-0.5.8.2 with the upper bound base < 4.17 to exclude compilation with GHC 9.4:
A release (e.g. of 0.5.8.3) is warranted to comply with GHC 9.4.
cereal triggers a space leak when encoding lists. The source of the problem is encodeListOf, which computes the length of the list and thereby brings the entire list into memory.
encodeListOf :: (a -> Builder) -> [a] -> Builder
encodeListOf f = -- allow inlining with just a single argument
  \xs -> execPut (putWord64be (fromIntegral $ length xs)) `mappend`
    foldMap f xs
The way I get around this in my own code is a chunked list representation. The following code writes to disk directly just because I'm not as familiar with cereal's internals, but I'm reasonably sure you could translate it into a pure equivalent:
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad
import qualified Data.ByteString.Char8 as B
import Data.List.Split
import Data.Serialize
import Foreign
import System.IO
list = [1..10000000] :: [Int]
main = do
  -- B.writeFile "test.dat" $ encode list
  withFile "test.dat" WriteMode $ \h -> encodeList 100 encodeS list h
  -- xs <- withFile "test.dat" ReadMode $ \h -> decodeList decodeS h
  -- print $ length (xs :: [Int])
-- Write the raw Storable bytes of a value to the handle
encodeS :: forall a . (Storable a) => a -> Handle -> IO ()
encodeS x hdl = with x $ \ptr -> hPutBuf hdl ptr (sizeOf (undefined :: a))
-- Read the raw Storable bytes of a value back from the handle
decodeS :: forall a . (Storable a) => Handle -> IO a
decodeS hdl = alloca $ \ptr -> do
  hGetBuf hdl ptr (sizeOf (undefined :: a))
  peek ptr
-- Tag every chunk: Nothing for full chunks, Just len for the last one
markLast [] = (Just 0, []):[]
markLast (l:[]) = (Just $ length l, l):[]
markLast (l:ls) = (Nothing , l):markLast ls
encodeList :: Int -> (a -> Handle -> IO ()) -> [a] -> Handle -> IO ()
encodeList n e as hdl = do
  encodeS n hdl
  forM_ (markLast $ chunk n as) $ \(m, as') -> do
    case m of
      Nothing -> encodeS False hdl
      Just len -> do
        encodeS True hdl
        encodeS len hdl
    forM_ as' $ \a -> e a hdl
decodeList :: (Handle -> IO a) -> Handle -> IO [a]
decodeList d hdl = do
  n <- decodeS hdl
  let loop = do
        last <- decodeS hdl
        if last
          then do
            len <- decodeS hdl
            replicateM len (d hdl)
          else do
            as <- replicateM n (d hdl)
            fmap (as ++) loop
  loop
The above implementation runs in constant space (and much faster because of direct IO, but I'll create a separate ticket for that).
I wanted to ask if you could add something like the above chunked implementation that runs in constant space to Data.Serialize.Get and Data.Serialize.Put (e.g. "getChunkedListOf" or "putChunkedListOf").
Also, optionally, you could presumably pick a fast default value of n (I found roughly 100 worked best on my machine) and replace the default instance for encoding lists with that one, although if you choose not to (e.g. for binary compatibility with previous cereal versions), that would be just fine with me and I would be happy to just use the named versions instead.
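A pure variant of the chunked scheme can be written against cereal's Put and Get directly. The sketch below is hypothetical (the names follow the suggestion above, but nothing like them is assumed to exist in cereal): each chunk is a Word64 count followed by its elements, and a zero count ends the list, instead of the True/False markers used in the Handle-based code. Only one chunk's length is computed at a time, so the encoder never needs the length of the whole list; whether memory actually stays bounded depends on consuming the output lazily (e.g. via runPutLazy), and decoding with Get still materialises the full list. The format is not compatible with cereal's default list encoding.

```haskell
import Control.Monad (replicateM)
import Data.Serialize

-- Encode in chunks of at most n elements, each prefixed by its count;
-- a count of 0 terminates the list.
putChunkedListOf :: Int -> (a -> Put) -> [a] -> Put
putChunkedListOf n putElem = go
  where
    size = max 1 n -- guard against non-positive chunk sizes
    go xs = case splitAt size xs of
      ([], _) -> putWord64be 0
      (ys, rest) -> do
        putWord64be (fromIntegral (length ys))
        mapM_ putElem ys
        go rest

-- Decode chunks until the zero terminator.
getChunkedListOf :: Get a -> Get [a]
getChunkedListOf getElem = go []
  where
    go acc = do
      k <- getWord64be
      if k == 0
        then pure (concat (reverse acc))
        else do
          ys <- replicateM (fromIntegral k) getElem
          go (ys : acc)
```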
testIsEmpty :: IO ()
testIsEmpty = do
  assertEqual "is empty (works)" (Right True) (runGet isEmpty mempty)
  assertEqual "is empty many (don't work)" (Right [True]) (runGet (many isEmpty) mempty)
The "is empty many" case results in this:
see if empty is empty: unit: internal error: Unable to commit 1048576 bytes of memory
(GHC version 9.0.2 for x86_64_unknown_linux)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
Aborted (core dumped)
I'm on cereal-0.5.8.2
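Independent of the crash, and assuming Get uses the default `many` from Alternative (which keeps re-running its argument until it fails), `many isEmpty` cannot terminate: `isEmpty` never fails and never consumes input, so the expected `Right [True]` would not be produced even without the memory error. If the goal is to repeat a parser until the input runs out, a small hypothetical helper that checks `isEmpty` before each item does terminate:

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, getWord8, isEmpty, runGet)

-- Run p repeatedly until the input is exhausted.
untilEmpty :: Get a -> Get [a]
untilEmpty p = do
  done <- isEmpty
  if done
    then pure []
    else (:) <$> p <*> untilEmpty p

-- Example: read every remaining byte.
exampleUntilEmpty :: Either String [Int]
exampleUntilEmpty = runGet (untilEmpty (fromIntegral <$> getWord8)) (B.pack [1, 2, 3])
```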
Failure case:
import qualified Data.Serialize as C
import qualified Data.ByteString.Lazy as B
import Control.Applicative ((<|>))
C.runGetLazy (C.getWord8 <|> return 0) B.empty
Result: Left "Failed reading: Internal error: unexpected end of input"
Hi, if it's no trouble, would you mind cutting a new release? The latest release's tests don't pass because 00ea19f is unreleased.
It's not a huge deal and I can work around it; it's just that default builds using Nix fail because the latest test suite doesn't pass, which seems like a needless issue for people to run into.
After updating the docs for a recent issue, it seems that much of the documentation no longer makes sense. It would be nice to update it.
Currently, it takes 25 bytes to store a 64-bit Double.
λ Data.ByteString.length $ Data.Serialize.encode (1.75 :: Double)
25
Right now, the implementation uses GHC.Float.decodeFloat, which splits a typical 64-bit Double into an Integer (typically 17 bytes at the relevant size) and an Int (8 bytes) before serializing them. This leads to a 3.125x increase in size (25 bytes instead of 8) if you're storing, say, a large list or array of Doubles. For a 32-bit Float, the footprint is 13 bytes, a 3.25x increase.
I'm not aware of the history behind the decisions that were made. Is there a reason why Double and Float are stored (when their Serialize instances are used) as an (Integer, Int) pair rather than as raw binary? Is there a safety-related, corner-case reason for not defaulting to the more efficient alternative?
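This doesn't settle the history question, but a minimal sketch shows both the size difference and one corner case where the two encodings behave differently. The `RawDouble` wrapper is hypothetical and uses the IEEE754 helpers cereal already ships; `viaDecodeFloat` mimics a (mantissa, exponent) round trip. Since the mantissa of -0.0 is 0 and `encodeFloat 0 e` is +0.0, that round trip loses the sign of zero, whereas the raw 8 bytes keep it.

```haskell
import qualified Data.ByteString as B
import Data.Serialize
import Data.Serialize.IEEE754 (getFloat64be, putFloat64be)

-- Serialise a Double as its raw 8-byte big-endian IEEE 754 form.
newtype RawDouble = RawDouble {unRawDouble :: Double}
  deriving (Show)

instance Serialize RawDouble where
  put = putFloat64be . unRawDouble
  get = RawDouble <$> getFloat64be

-- What a (mantissa, exponent) encoding preserves on the way back.
viaDecodeFloat :: Double -> Double
viaDecodeFloat = uncurry encodeFloat . decodeFloat

-- Size of the raw encoding, for comparison with the 25 bytes above.
rawSize :: Double -> Int
rawSize = B.length . encode . RawDouble
```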
While lookAhead does not consume the input, it does update bytesRead.
test = do
  x <- bytesRead
  _ <- lookAhead getInt64be
  y <- bytesRead
  return (x, y)
> runGet test (encode (2 :: Int)) -- result: Right (0, 8)
I suspect this is not the desired behaviour, and that the above example should return (0, 0), to reflect that no bytes of the input have been consumed. If it is the desired behaviour, it should probably be documented.
This is a separate issue from the previous one I just posted, but uses the same example code.
I find that a naive serialization straight to disk is approximately 5x faster on my machine than translating to a ByteString and then writing the ByteString to disk. The measurement uses the same sample code as the previous ticket (the chunked encodeList/decodeList program).
Using cereal, I get:
time ./cereal
real 0m2.327s
user 0m2.080s
sys 0m0.250s
Using the straight-to-disk naive implementation, I get:
time ./cereal
real 0m0.443s
user 0m0.410s
sys 0m0.030s
I just wanted to ask if there is any possibility of supporting direct IO encoding to get this speed increase, or if there is another package that already provides this functionality.
I can understand the importance of translating to the intermediate ByteString first, since there are many uses of serialization other than disk IO or even Handles. However, the performance gains from specializing to Handle-based encoding seem to warrant some package providing this functionality.
This is not really a critical feature for me since I've already written all this stuff up for myself, but it seems like something useful for the Haskell community to have. If you choose not to implement it, then that's okay and I will just use it internally in my code until I think it's polished enough to release as a package.
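Part of the gap may come from building one large strict ByteString before writing. A minimal sketch, with hypothetical `putInts` and `hPutInts` names, hands `runPutLazy`'s lazy output to `Data.ByteString.Lazy.hPut`, which writes it chunk by chunk. The list is written without a length prefix, so this is not the same format as cereal's list instance, and whether it closes the 5x gap measured above is untested.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as LB
import Data.Serialize (Put, putWord64be, runPutLazy)
import System.IO (Handle)

-- Each Int as 8 big-endian bytes, with no length prefix.
putInts :: [Int] -> Put
putInts = mapM_ (putWord64be . fromIntegral)

-- Write the lazily produced encoding to the handle.
hPutInts :: Handle -> [Int] -> IO ()
hPutInts h = LB.hPut h . runPutLazy . putInts

-- Strict view of the encoding, handy for checking the layout.
encodedInts :: [Int] -> B.ByteString
encodedInts = LB.toStrict . runPutLazy . putInts
```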
I tried building cereal with the alpha release of ghc-8.8.1, and the current version results in the following error:
Configuring cereal-0.5.8.0...
Preprocessing library for cereal-0.5.8.0..
Building library for cereal-0.5.8.0..
[1 of 4] Compiling Data.Serialize.Get ( src/Data/Serialize/Get.hs, .stack-work/dist/x86_64-linux/Cabal-2.5.0.0/build/Data/Serialize/Get.o )
/tmp/stack7255/cereal-0.5.8.0/src/Data/Serialize/Get.hs:229:5: error:
‘fail’ is not a (visible) method of class ‘Monad’
|
229 | fail = Fail.fail
| ^^^^
/tmp/stack7255/cereal-0.5.8.0/src/Data/Serialize/Get.hs:230:16: error:
The INLINE pragma for ‘fail’ lacks an accompanying binding
(The INLINE pragma must be given where ‘fail’ is declared)
|
230 | {-# INLINE fail #-}
| ^^^^
It appears that the code in the repository has been updated to be compatible with base-4.13.0.0. Can a new version of cereal be uploaded to Hackage to help others migrate their code to base-4.13.0.0?
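The error is the MonadFail change: from base-4.13, `fail` is no longer a method of `Monad`, so an instance must define it in `MonadFail` instead. A minimal sketch of the required instance layout, on a hypothetical toy parser `P` rather than cereal's Get:

```haskell
import qualified Control.Monad.Fail as Fail

-- A toy parser over String, used only to show the instance layout.
newtype P a = P {runP :: String -> Either String (a, String)}

instance Functor P where
  fmap f (P p) = P $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

instance Applicative P where
  pure a = P $ \s -> Right (a, s)
  P pf <*> P pa = P $ \s -> case pf s of
    Left e -> Left e
    Right (f, rest) -> fmap (\(a, rest') -> (f a, rest')) (pa rest)

instance Monad P where
  P p >>= k = P $ \s -> case p s of
    Left e -> Left e
    Right (a, rest) -> runP (k a) rest
  -- no `fail` here: defining it in Monad is what breaks on base-4.13

instance Fail.MonadFail P where
  fail msg = P $ \_ -> Left msg

firstChar :: P Char
firstChar = P $ \s -> case s of
  c : cs -> Right (c, cs)
  [] -> Left "end of input"

-- A refutable pattern in do-notation now requires MonadFail P.
digitValue :: P Int
digitValue = do
  c <- firstChar
  Just d <- pure (lookup c (zip ['0' .. '9'] [0 ..]))
  pure d
```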
The following code involves two examples.
The first one, a0, is a broken associativity example for lazy parsing.
The second one, d0, is a broken distributivity example for lazy parsing.
a0' is a strict version of a0, and d0' is a strict version of d0.
The strict versions are not broken.
import Control.Applicative
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LB
import Data.Serialize.Get
-- Compare results, ignoring the error message strings
testGetLazyEq :: Eq a => Get a -> Get a -> LB.ByteString -> (Bool, Either String a, Either String a)
testGetLazyEq x0 y0 in' = (toMaybe x == toMaybe y, x, y)
  where
    toMaybe = either (const Nothing) Just
    x = runGetLazy x0 in'
    y = runGetLazy y0 in'
testGetEq :: Eq a => Get a -> Get a -> BS.ByteString -> (Bool, Either String a, Either String a)
testGetEq x0 y0 in' = (toMaybe x == toMaybe y, x, y)
  where
    toMaybe = either (const Nothing) Just
    x = runGet x0 in'
    y = runGet y0 in'
-- Test associative law with 2-byte lazy input
associative2 :: Eq a => Get a -> Get a -> Get a -> (Bool, Either String a, Either String a)
associative2 a b c =
  testGetLazyEq
    (a <|> (b <|> c))
    ((a <|> b) <|> c)
    $ LB.fromChunks $ map BS.pack [[0], [1]]
-- Test associative law with 2-byte strict input
associative2' :: Eq a => Get a -> Get a -> Get a -> (Bool, Either String a, Either String a)
associative2' a b c =
  testGetEq
    (a <|> (b <|> c))
    ((a <|> b) <|> c)
    $ BS.concat $ map BS.pack [[0], [1]]
-- Test associative law with
-- a = skip 6
-- b = lookAhead getWord64be *> pure ()
-- c = getWord16le *> pure ()
a0 :: IO ()
a0 =
  print $
    associative2
      (skip 6)
      (lookAhead getWord64be *> pure ())
      (getWord16le *> pure ())
-- Test distributive law with 3-byte lazy input
distributive3 :: Eq b => Get a -> Get b -> Get b -> (Bool, Either String b, Either String b)
distributive3 a b c =
  testGetLazyEq
    (a *> (b <|> c))
    (a *> b <|> a *> c)
    $ LB.fromChunks $ map BS.pack [[0], [1], [2]]
-- Test distributive law with 3-byte strict input
distributive3' :: Eq b => Get a -> Get b -> Get b -> (Bool, Either String b, Either String b)
distributive3' a b c =
  testGetEq
    (a *> (b <|> c))
    (a *> b <|> a *> c)
    $ BS.concat $ map BS.pack [[0], [1], [2]]
-- Test distributive law with
-- a = getWord16le
-- b = lookAhead getWord16le *> pure ()
-- c = lookAhead getWord8 *> pure ()
d0 :: IO ()
d0 =
  print $
    distributive3
      getWord16le
      (lookAhead getWord16le *> pure ())
      (lookAhead getWord8 *> pure ())
main :: IO ()
main = do
  a0
  d0
a0' :: IO ()
a0' =
  print $
    associative2'
      (skip 6)
      (lookAhead getWord64be *> pure ())
      (getWord16le *> pure ())
d0' :: IO ()
d0' =
  print $
    distributive3'
      getWord16le
      (lookAhead getWord16le *> pure ())
      (lookAhead getWord8 *> pure ())
The result output is as follows. Both results are expected to be True.
(False,Right (),Left "too few bytes\nFrom:\tdemandInput\n\n")
(False,Right (),Left "too few bytes\nFrom:\tdemandInput\n\n")
Only the lazy parsers have this issue; the strict parsers return consistent results.
It'd be great if there were a Serialize instance for Data.Text.
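Until such an instance exists, one workaround is to store Text as its UTF-8 bytes and reuse cereal's existing ByteString instance. The sketch below is an assumption, not cereal API: `putText` and `getText` are hypothetical names, and validating with `decodeUtf8'` makes malformed input fail inside `Get` rather than throw later.

```haskell
import qualified Data.ByteString as BS
import Data.Serialize (Serialize (..))
import Data.Serialize.Get (Get, runGet)
import Data.Serialize.Put (Putter, runPut)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode as UTF-8, then use the ByteString instance
-- (which writes a length prefix followed by the raw bytes).
putText :: Putter T.Text
putText = put . TE.encodeUtf8

-- Hypothetical helper: read the ByteString back and validate it as UTF-8,
-- turning a decoding error into a Get failure.
getText :: Get T.Text
getText = do
  bs <- get :: Get BS.ByteString
  either (fail . show) return (TE.decodeUtf8' bs)
```

An actual `instance Serialize T.Text` could use these two functions as its `put` and `get`.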
Should uncheckedSkip's documentation be "Skip ahead up to n bytes..."? It seems it will not skip enough bytes if the current chunk is not long enough. I also suggest making the unchecked.. versions' documentation more explicit about this behavior:
-- | Skip ahead up to @n@ bytes, stopping at the end of the current chunk. No error if there aren't enough bytes.
uncheckedSkip :: Int -> Get ()
uncheckedSkip n = do
  s <- get
  put (B.drop n s)
-- | Get up to the next @n@ bytes as a ByteString, stopping at the end of the current chunk, without consuming them.
uncheckedLookAhead :: Int -> Get B.ByteString
uncheckedLookAhead n = do
  s <- get
  return (B.take n s)
Does this make sense to you?
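For comparison, a "skip up to n bytes" whose bound is explicit rather than chunk-dependent can be sketched with `remaining` and `skip`. This is a hypothetical helper (`skipUpTo`), and it assumes strict input via `runGet`, where the whole input is one chunk and `remaining` is the true number of unread bytes.

```haskell
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, getWord8, remaining, runGet, skip)
import Data.Word (Word8)

-- Hypothetical helper: skip at most n bytes and never fail when fewer remain.
skipUpTo :: Int -> Get ()
skipUpTo n = do
  r <- remaining
  skip (min n r)

-- Usage: skip two bytes of a three-byte input, then read the last one.
demoSkip :: Either String Word8
demoSkip = runGet (skipUpTo 2 >> getWord8) (BS.pack [1, 2, 3])
```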
When I build cereal-0.5.7.0 with ghc844, I get a compiler error:
src/Data/Serialize.hs:290:9: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for ‘findNr’: Patterns not matched: _ _
|
290 | findNr lo hi
| ^^^^^^^^^^^^...
[1 of 4] Compiling Data.Serialize.Get ( src/Data/Serialize/Get.hs, dist/build/Data/Serialize/Get.p_o )
[2 of 4] Compiling Data.Serialize.Put ( src/Data/Serialize/Put.hs, dist/build/Data/Serialize/Put.p_o )
[3 of 4] Compiling Data.Serialize.IEEE754 ( src/Data/Serialize/IEEE754.hs, dist/build/Data/Serialize/IEEE754.p_o )
[4 of 4] Compiling Data.Serialize ( src/Data/Serialize.hs, dist/build/Data/Serialize.p_o )
src/Data/Serialize.hs:290:9: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for ‘findNr’: Patterns not matched: _ _
|
290 | findNr lo hi
| ^^^^^^^^^^^^...
Preprocessing test suite 'test-cereal' for cereal-0.5.7.0..
Building test suite 'test-cereal' for cereal-0.5.7.0..
[1 of 3] Compiling GetTests ( tests/GetTests.hs, dist/build/test-cereal/test-cereal-tmp/GetTests.o )
[2 of 3] Compiling RoundTrip ( tests/RoundTrip.hs, dist/build/test-cereal/test-cereal-tmp/RoundTrip.o )
tests/RoundTrip.hs:33:12: error:
• The constructor ‘Success’ should have 6 arguments, but has been given 3
• In the pattern: Success _ _ _
In an equation for ‘RoundTrip.isSuccess’:
RoundTrip.isSuccess (Success _ _ _) = True
|
33 | isSuccess (Success _ _ _) = True
| ^^^^^^^^^^^^^
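The error comes from matching QuickCheck's `Success` constructor with a fixed number of wildcards. One arity-independent fix for `tests/RoundTrip.hs` is the empty record pattern `Success{}`, which matches the constructor whatever fields it has. The sketch below names the function `isSuccess'` (hypothetical) to avoid clashing with anything QuickCheck itself exports.

```haskell
import Test.QuickCheck (Args (..), Result (..), quickCheckWithResult, stdArgs)

-- Matches Success regardless of how many fields the installed
-- QuickCheck version gives the constructor.
isSuccess' :: Result -> Bool
isSuccess' Success{} = True
isSuccess' _ = False

-- Usage: run a trivial property quietly and classify the result.
checkTrivial :: IO Bool
checkTrivial =
  isSuccess' <$> quickCheckWithResult stdArgs { chatty = False } (\x -> (x :: Int) == x)
```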
Or am I missing a reason why this wouldn't work?
For benefits, compare mgsloan/store#101.
When I attempt to decode a bytestring that contains invalid characters, the program crashes with the following error:
Prelude.chr: bad argument: 3077277
Would it make sense for decode to catch this error and return Left "Invalid character set" instead?
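The crash happens because `chr` is called on a value above the largest Unicode code point (0x10FFFF). A range check before `chr` lets the failure surface as a `Left`. The sketch below is illustrative only: it assumes a simplified fixed-width format (one big-endian Word32 per code point), not cereal's own UTF-8-style Char encoding, and `getCharChecked`/`decodeChar` are hypothetical names.

```haskell
import qualified Data.ByteString as BS
import Data.Char (chr)
import Data.Serialize.Get (Get, getWord32be, runGet)

-- Hypothetical getter: validate the code point inside Get instead of
-- letting chr throw an exception outside of decoding.
getCharChecked :: Get Char
getCharChecked = do
  w <- getWord32be
  if w > 0x10FFFF
    then fail ("invalid code point: " ++ show w)
    else return (chr (fromIntegral w))

decodeChar :: BS.ByteString -> Either String Char
decodeChar = runGet getCharChecked
```

With this approach, the value 3077277 from the report (0x2EF49D) is rejected with `Left` rather than crashing the program.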
I found myself wanting this instance while trying to serialise a trees-that-grow-style structure, where one constructor has an argument of type Void.
If this is likely to be accepted, I'll cook up a PR.
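Because Void has no values, such an instance is essentially forced: the putter can only be `absurd`, and the getter can only fail. A minimal sketch as standalone functions (hypothetical names `putVoid`/`getVoid`, which avoids an orphan instance) follows; an instance would use the same two bodies for `put` and `get`.

```haskell
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, runGet)
import Data.Serialize.Put (Putter)
import Data.Void (Void, absurd)

-- Hypothetical putter: can never actually be called with a value.
putVoid :: Putter Void
putVoid = absurd

-- Hypothetical getter: there is nothing valid to decode, so always fail.
getVoid :: Get Void
getVoid = fail "Void has no values to decode"

decodeVoid :: BS.ByteString -> Either String Void
decodeVoid = runGet getVoid
```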
Hello,
I'm working on a module for Erlang's external term format using cereal, and it looks like some protocol constraints are impossible to express with Haskell's type system. For example:
data ErlType = ErlBinary ByteString -- BINARY_EXT
According to the spec, BINARY_EXT's length is encoded as a Word32, but there's no way to enforce this at the type level. For cases like this, it would be nice to have some kind of error reporting in the Put monad. What do you think?
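A workaround that needs no change to Put is to validate before building the Put action and return `Either String Put`. The sketch below assumes the external term format's BINARY_EXT layout (tag 109, a 4-byte big-endian length, then the data); `putBinaryExt` is a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import Data.Serialize.Put (Put, putByteString, putWord32be, putWord8, runPut)
import Data.Word (Word32)

-- Hypothetical helper: check the Word32 length constraint up front,
-- since Put itself has no way to report an error.
putBinaryExt :: BS.ByteString -> Either String Put
putBinaryExt bs
  | toInteger len > toInteger (maxBound :: Word32) =
      Left "BINARY_EXT: payload longer than 2^32 - 1 bytes"
  | otherwise = Right $ do
      putWord8 109                    -- BINARY_EXT tag
      putWord32be (fromIntegral len)  -- 4-byte big-endian length
      putByteString bs                -- raw payload
  where
    len = BS.length bs

-- Usage: serialize, propagating the validation error if any.
encodeBinaryExt :: BS.ByteString -> Either String BS.ByteString
encodeBinaryExt = fmap runPut . putBinaryExt
```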
Is this intentionally absent to prevent the additional dependency?
In Data.Serialize.Put, there are a number of functions for putting Word, Word64, etc., but the equivalent functions for Ints are missing. Would you take a PR adding these?
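Such functions can be sketched on top of the existing Word putters, because `fromIntegral` between same-width signed and unsigned types preserves the two's-complement bit pattern. The names below are hypothetical and deliberately distinct from anything cereal may already export.

```haskell
import qualified Data.ByteString as BS
import Data.Int (Int32, Int64)
import Data.Serialize.Get (Get, getWord32be, getWord64be, runGet)
import Data.Serialize.Put (Putter, putWord32be, putWord64be, runPut)

-- Hypothetical Int putters/getters that reinterpret via the Word versions.
putInt32beViaWord :: Putter Int32
putInt32beViaWord = putWord32be . fromIntegral

getInt32beViaWord :: Get Int32
getInt32beViaWord = fromIntegral <$> getWord32be

putInt64beViaWord :: Putter Int64
putInt64beViaWord = putWord64be . fromIntegral

getInt64beViaWord :: Get Int64
getInt64beViaWord = fromIntegral <$> getWord64be

-- Usage helpers.
encodeInt32 :: Int32 -> BS.ByteString
encodeInt32 = runPut . putInt32beViaWord

roundTripInt64 :: Int64 -> Either String Int64
roundTripInt64 = runGet getInt64beViaWord . runPut . putInt64beViaWord
```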
Attempting to deserialize a Ratio whose denominator is 0 leads to a runtime error instead of a deserialization failure. Since the Ratio type disallows a denominator of 0, deserialization should probably fail instead.
Actual behaviour:
Prelude Data.Serialize Data.Ratio Data.Word> decode "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL" :: Either String (Ratio Word64)
Right *** Exception: Ratio has zero denominator
Expected behaviour:
Prelude Data.Serialize Data.Ratio Data.Word> decode "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL" :: Either String (Ratio Word64)
Left "ratio has zero denominator"
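A checked getter can test the denominator before calling `(%)`, so a zero denominator becomes a `Get` failure. This is a sketch with hypothetical names; it reads numerator then denominator with the element type's own Serialize instance, which matches the 16-byte layout in the Word64 example above.

```haskell
import qualified Data.ByteString as BS
import Data.Ratio (Ratio, (%))
import Data.Serialize (Serialize (..))
import Data.Serialize.Get (Get, runGet)
import Data.Word (Word64)

-- Hypothetical checked getter: never calls (%) with a zero denominator.
getRatioChecked :: (Integral a, Serialize a) => Get (Ratio a)
getRatioChecked = do
  n <- get
  d <- get
  if d == 0
    then fail "ratio has zero denominator"
    else return (n % d)

decodeRatioW64 :: BS.ByteString -> Either String (Ratio Word64)
decodeRatioW64 = runGet getRatioChecked
```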
I get a strange error on fail, even in the most trivial cases:
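{-# LANGUAGE DeriveGeneric, OverloadedStrings #-}
import Data.ByteString (ByteString)
import Data.Serialize (Serialize (..), decode)
import GHC.Generics (Generic)
data Test = Test { someVal :: ByteString } deriving (Show, Generic)
instance Serialize Test where
  get = fail "test"
testtest :: Either String Test
testtest = decode "B6034fooobaarbazbla"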
Output: Left "Failed reading: test\nEmpty call stack\n"
It would be helpful for me to have a bytesRead function, like the one in binary, that produces a count of the number of bytes that have been consumed from input. This would be useful for deserializing formats that refer to previous byte offsets in the input, such as DNS messages that contain compressed domain names.
A complementary bytesWritten function would also be helpful for writing such formats. I have not been able to find a bytesWritten feature in any serialization library that I have looked at.
Also, I want to say thank you for publishing such a useful library!
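Independently of any built-in counter, the idea can be sketched by layering a byte count over `Get` with a state transformer, where every primitive records how many bytes it consumed. This is an illustration under assumptions: `CGet`, `counted`, `offset` and friends are hypothetical names, the transformers package is assumed, and the count is only correct if every read goes through the counting wrappers.

```haskell
import qualified Control.Monad.Trans.Class as T
import qualified Control.Monad.Trans.State.Strict as St
import qualified Data.ByteString as BS
import Data.Serialize.Get (Get, getBytes, getWord16be, getWord8, runGet)
import Data.Word (Word16, Word8)

-- Hypothetical counting parser: the state holds the bytes consumed so far.
type CGet = St.StateT Int Get

-- Run a Get that consumes exactly n bytes and add n to the counter.
counted :: Int -> Get a -> CGet a
counted n g = T.lift g <* St.modify' (+ n)

cWord8 :: CGet Word8
cWord8 = counted 1 getWord8

cWord16be :: CGet Word16
cWord16be = counted 2 getWord16be

cBytes :: Int -> CGet BS.ByteString
cBytes n = counted n (getBytes n)

-- Current offset from the start of the input (e.g. for DNS name pointers).
offset :: CGet Int
offset = St.get

runCounted :: CGet a -> BS.ByteString -> Either String a
runCounted p = runGet (St.evalStateT p 0)
```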
Ideally, these are trivial:
{-# LANGUAGE GeneralizedNewtypeDeriving, StandaloneDeriving #-}
deriving instance Serialize a => Serialize (Const a b)
deriving instance Serialize a => Serialize (Identity a)
But I have no idea what complications the need for legacy support might introduce; base >= 4.4 dates from a long, long time ago!
Data.HashMap.Map is an instance of Data, so this should be easy.
There is no README, so I am not sure where to ask for help (please redirect me if this is not the right place).
I have these long datatypes (that I hope to generate from /include/uapi/linux/inet_diag.h)
data IDiagExtension = DiagTcpInfo {
    tcpi_state :: Word8,
    tcpi_ca_state :: Word8,
    tcpi_retransmits :: Word8,
    tcpi_probes :: Word8,
    tcpi_backoff :: Word8,
    tcpi_options :: Word8,
    tcpi_wscales :: Word8,
    -- tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4 :: Word8,
    tcpi_rto :: Word32,
    tcpi_ato :: Word32,
    tcpi_snd_mss :: Word32,
    tcpi_rcv_mss :: Word32,
    tcpi_unacked :: Word32,
    tcpi_sacked :: Word32,
    tcpi_lost :: Word32,
    tcpi_retrans :: Word32,
    tcpi_fackets :: Word32,
    -- Times
    tcpi_last_data_sent :: Word32,
    -- Not remembered, sorry
    tcpi_last_ack_sent :: Word32,
    tcpi_last_data_recv :: Word32,
    tcpi_last_ack_recv :: Word32,
    -- Metrics
    tcpi_pmtu :: Word32,
    tcpi_rcv_ssthresh :: Word32,
    tcpi_rtt :: Word32,
    tcpi_rttvar :: Word32,
    tcpi_snd_ssthresh :: Word32,
    tcpi_snd_cwnd :: Word32,
    tcpi_advmss :: Word32,
    tcpi_reordering :: Word32,
    tcpi_rcv_rtt :: Word32,
    tcpi_rcv_space :: Word32,
    tcpi_total_retrans :: Word32
  } | Meminfo {
    idiag_rmem :: Word32
  , idiag_wmem :: Word32
  , idiag_fmem :: Word32
  , idiag_tmem :: Word32
  } | TcpVegasInfo {
    tcpInfoVegasEnabled :: Word32
  , tcpInfoRttCount :: Word32
  , tcpInfoRtt :: Word32
  , tcpInfoMinrtt :: Word32
  } | CongInfo String deriving (Show, Generic, Serialize)
that "derive" Serialize, in the hope that I don't have to write the serialization/deserialization code manually (I send and receive messages over netlink-hs, which uses cereal; see Ongy/netlink-hs#9).
In my deserialization code, the key (e.g., InetDiagCong) indicates which constructor to use (e.g., CongInfo):
loadExtension :: Int -> ByteString -> Maybe IDiagExtension
loadExtension key value =
  case toEnum key of
    InetDiagCong -> case decode value of
      Right x -> Just x
      Left err -> error $ "DiagCong error " ++ err
    -- _ -> case runGet (getGet 42) value of
    _ -> case decode value of
      Right x -> Just x
      -- Left err -> error $ "fourre-tout error " ++ err
      Left _ -> Nothing
but it generates a few errors such as
daemon: DiagCong error too few bytes
From: demandInput
CallStack (from HasCallStack):
error, called at IDiag.hs:449:37 in main:IDiag
or
"Unknown encoding for constructor"
or
Error too few bytes
Though I've made huge progress in learning Haskell, I am still a beginner, so I might be missing something obvious. The code is here: https://github.com/teto/netlink_pm/blob/c5d2254f4c87503b31e8b1f00859203bd86ad2f8/hs/IDiag.hs#L413. When calling decode, can cereal select the appropriate IDiagExtension constructor?
The "Unknown encoding for constructor" error happens in the InetDiagCong case, where I expect decode to use the CongInfo constructor. The string contains only ASCII (Word8) characters; any idea how to fix that?
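A likely explanation is that a Generic-derived instance for a multi-constructor type expects a constructor tag before the fields (hence "Unknown encoding for constructor"), writes Word32 fields big-endian, and length-prefixes a String, while the kernel attribute carries none of that. The sketch below picks the parser from the attribute key instead. It is a simplified stand-in with hypothetical names (`Ext`, `ExtKey`, `decodeExt`) and assumes little-endian host byte order and a NUL-terminated congestion-control name.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Serialize.Get (Get, getBytes, getWord32le, remaining, runGet)
import Data.Word (Word32)

-- Simplified stand-in for two of the IDiagExtension constructors.
data Ext
  = Meminfo Word32 Word32 Word32 Word32
  | CongInfo String
  deriving (Eq, Show)

-- Hypothetical attribute keys; the real code uses toEnum on the netlink type.
data ExtKey = KeyMeminfo | KeyCong

-- Four host-order (assumed little-endian) Word32 fields, no constructor tag.
getMeminfo :: Get Ext
getMeminfo = Meminfo <$> getWord32le <*> getWord32le <*> getWord32le <*> getWord32le

-- Take the rest of the payload and drop the trailing NUL (assumption).
getCongInfo :: Get Ext
getCongInfo = do
  n <- remaining
  bytes <- getBytes n
  return (CongInfo (BC.unpack (BC.takeWhile (/= '\0') bytes)))

-- The key, not the payload, decides which constructor to build.
decodeExt :: ExtKey -> BS.ByteString -> Either String Ext
decodeExt KeyMeminfo = runGet getMeminfo
decodeExt KeyCong = runGet getCongInfo
```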
I can open a PR if that's ok