heremaps / flatdata
Write-once, read-many, minimal overhead binary structured file format.
License: Apache License 2.0
I think it might make sense to test multiple platforms in CI, in order to be able to support them officially and make sure it builds correctly.
I suppose testing on Ubuntu LTS 16.04 and CentOS 7 would already be enough to begin with.
Flatdata vectors often contain a sentinel at the end, and it is easy to forget to exclude it when iterating.
One idea would be to add an annotation @sentinel that the generator could use to create two getters for the archive: get_xyz() const and get_xyz_with_sentinel() const.
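The two-getter idea can be sketched in Rust as follows. This is only an illustration under assumptions: `Archive`, `xyz` and `xyz_with_sentinel` are hypothetical names, and a plain `Vec<u32>` stands in for a flatdata vector.

```rust
// Hypothetical archive; a Vec<u32> stands in for a flatdata vector resource.
struct Archive {
    xyz: Vec<u32>,
}

impl Archive {
    /// Returns all elements, including the trailing sentinel.
    fn xyz_with_sentinel(&self) -> &[u32] {
        &self.xyz
    }

    /// Returns the elements without the trailing sentinel.
    fn xyz(&self) -> &[u32] {
        // saturating_sub keeps an empty vector from underflowing.
        let len = self.xyz.len().saturating_sub(1);
        &self.xyz[..len]
    }
}

fn main() {
    let archive = Archive { xyz: vec![1, 2, 3, u32::MAX] };
    assert_eq!(archive.xyz().len(), 3);
    assert_eq!(archive.xyz_with_sentinel().len(), 4);
}
```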
When trying to implicitly bind a vector and a multivector, the generator fails. Example:
@bound_implicitly( characters: vertices, character_data )
archive Graph {
vertices : vector< Character >;
character_data: multivector< 32, Nickname, Description >;
}
Error:
12:41:29 - CRITICAL - app:67 - Error reading schema: Referring to incorrect type. Expected <class 'generator.tree.nodes.resources.vector.Vector'>, actual <class 'generator.tree.nodes.resources.multivector.Multivector'>
Is this expected? As far as I understand, it should be possible to bind both. In my example, I am referencing the elements of vertices and character_data with the same index.
A small sidenote: probably, the error should also refer to the line in flatdata's source, instead of the line in the source code of the generator.
When trying to open an arbitrary archive with subarchives, flatdata will create an empty directory structure. The issue comes from FileResourceStorage::directory() creating things by default. It probably should not do that.
The Rust implementation of flatdata::ResourceStorage is not thread safe and is missing a std::marker::Send implementation.
There are usages of RefCell and rc::Rc which probably need to be assessed for thread safety.
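To illustrate why Rc and RefCell stand in the way of Send, here is a minimal sketch. `SharedStorage` is a hypothetical stand-in, not flatdata's real ResourceStorage. It swaps Rc<RefCell<..>> for Arc<Mutex<..>> and uses a helper that only compiles for Send types.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a storage type; not flatdata's real ResourceStorage.
struct SharedStorage {
    resources: Arc<Mutex<Vec<String>>>,
}

// Compile-time check: this function can only be instantiated for Send types.
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<SharedStorage>();
    // The single-threaded equivalent is rejected by the compiler:
    // assert_send::<std::rc::Rc<std::cell::RefCell<Vec<String>>>>();

    let storage = SharedStorage { resources: Arc::new(Mutex::new(Vec::new())) };
    let resources = Arc::clone(&storage.resources);
    // Moving the clone into another thread requires the contents to be Send.
    let handle = thread::spawn(move || {
        resources.lock().unwrap().push("vertices".to_string());
    });
    handle.join().unwrap();
    assert_eq!(storage.resources.lock().unwrap().len(), 1);
}
```

Whether a mutex is acceptable on the read path is a separate question; the sketch only shows the type-level difference.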
Schema definitions are not interesting to read and are hidden in the docs anyway. They should be moved further down in the file.
flatdata supports storing single structs in an archive (usually headers). Currently those are generated/written in C++ by creating a flatdata::Vector< T > v( 1 );. Adding a class that represents this concept more closely would be more intuitive/readable.
Proposal:
template< typename T > flatdata::Struct{ ... } for this concept
I came across this bug while generating the Rust code from a flatdata schema that was already the dumped schema of a flatdata archive that was previously built.
Here's a wrongly generated macro:
define_variadic_struct!(RelationMembers, RelationMembersItemBuilder,
IndexType32,
0 => (.osm.NodeMember, add_.osm._nodemember),
1 => (.osm.WayMember, add_.osm._waymember),
2 => (.osm.RelationMember, add_.osm._relationmember));
when you should get:
define_variadic_struct!(RelationMembers, RefRelationMembers, BuilderRelationMembers,
IndexType32,
0 => (NodeMember, add_node_member),
1 => (WayMember, add_way_member),
2 => (RelationMember, add_relation_member));
E.g.:
// /*
// * This is a comment about Bar
// */
type Bar struct {
descriptor flatdata.MemoryDescriptor
position int
}
FYI: @artemnikitin
In C++ we have type shortcuts, e.g.: Archive::EdgesType. In Rust this is more complicated, as only traits can have associated types.
This is needed since right now, the docker image is stored under my account, and so cannot be controlled by other flatdata maintainers.
Add support for multiple output files, especially to the Python generator, to output a proper package with the respective namespaces. Adapt CMake for easier integration into C++.
Since we now have 4 languages, and each has its own schemas for tests, I was thinking we should maybe create a central place for test cases:
That would make it easy to test compliance for new language implementations, and reduce redundancy of the tests.
Reproducible:
namespace test_structures {
struct A {
x : u32;
}
} // namespace test_structures
namespace test_structures2 {
struct A {
x: u32;
}
} // namespace test_structures2
Fixing this is most likely not backwards compatible, so we should prioritize this?
@boxdot I can easily fix this, but I need some guidance on what would be the most natural way for Rust to handle this? A sub-module in the generated file (users could re-export to get rid of it if they do not like it)?
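One possible shape of the sub-module idea, as a sketch only: each flatdata namespace maps to a Rust module, and users can re-export a type if they do not want the extra path segment. The plain structs below are simplified placeholders, since real generated structs are views over packed bytes, and the alias `A2` is a made-up name.

```rust
// Hypothetical layout of generated code: one Rust module per flatdata namespace.
pub mod test_structures {
    #[derive(Debug, Default, PartialEq)]
    pub struct A {
        pub x: u32,
    }
}

pub mod test_structures2 {
    #[derive(Debug, Default, PartialEq)]
    pub struct A {
        pub x: u32,
    }
}

// Optional re-export under a name of the user's choice.
pub use test_structures2::A as A2;

fn main() {
    // Both structs are called A, but they no longer collide.
    let a = test_structures::A { x: 1 };
    let b = A2 { x: 2 };
    assert_eq!(a.x + b.x, 3);
}
```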
Right now, a schema file in flatdata is a single file, which cannot be split into separate parts. Especially since we support optional subarchives, it makes sense to be able to serialize a subarchive standalone, and at the same time to include it in a bigger archive.
Example:
Instead of
pub const INFINITE: u32 = 4294967295;
generate
pub const INFINITE: u32 = 4_294_967_295;
This would remove a clippy warning in Rust.
Another possibility would be to use #![allow(clippy::all)]. In the end, this is generated code and is not meant to be very readable.
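A generator could produce the grouped literal with a small helper like the one below. This is a sketch in Rust for illustration only; the function name `format_with_underscores` is hypothetical, and the actual generator is not written in Rust.

```rust
/// Formats an unsigned integer with `_` between groups of three decimal digits.
fn format_with_underscores(value: u64) -> String {
    let digits = value.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, c) in digits.chars().enumerate() {
        // Insert a separator before every group of three digits, counted from the right.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(c);
    }
    out
}

fn main() {
    // Prints: pub const INFINITE: u32 = 4_294_967_295;
    println!("pub const INFINITE: u32 = {};", format_with_underscores(4294967295));
}
```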
Unlike C/C++ Rust enums are exhaustive: All possible values need to be explicitly mentioned.
That means that casting from the underlying type (e.g. 9 bit integer) to the enumeration can cause UB if the value stored in flatdata does not exist in the enumeration (e.g. corrupted file, etc).
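A checked conversion avoids the undefined behavior at the cost of a match per read. The sketch below is one possible approach, not what flatdata generates; the enum `RoadClass` and its values are invented for illustration.

```rust
use std::convert::TryFrom;

// Hypothetical enum; the names and values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
enum RoadClass {
    Highway = 0,
    Street = 1,
    Path = 2,
}

impl TryFrom<u16> for RoadClass {
    // The unknown raw value is handed back to the caller.
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(RoadClass::Highway),
            1 => Ok(RoadClass::Street),
            2 => Ok(RoadClass::Path),
            other => Err(other),
        }
    }
}

fn main() {
    assert_eq!(RoadClass::try_from(1), Ok(RoadClass::Street));
    // A value from a corrupted file becomes an error instead of UB.
    assert_eq!(RoadClass::try_from(300), Err(300));
}
```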
Rust doesn't currently support C-like enums. This means that our options are a bit limited. Some ideas:
Unknown, and check in reader (speed impact?)
raw reading function might be needed for fast access
Add autodoc-, doxygen-generated and manually written docs. Publish on GitHub.io
C++ and Rust support enumerations, but Python readers do not yet.
Right now, we have generated reader tests (in Python) for the C++ byte reader. The problem is that they always try to read 1 from the beginning of some data segment, never deeper in the data. In particular, they test that we read correctly with varying bit_size but not with varying offset. Note that when reading 1, the bit_size is not really significant.
Generated writer tests are missing completely.
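A round-trip test that varies both offset and bit_size could look like the sketch below. It is written in Rust for illustration and is not flatdata's reader or writer: `read_bits` and `write_bits` are hypothetical helpers, and the layout (least significant bit first within little-endian bytes) is an assumption.

```rust
/// Reads `num_bits` (at most 64) starting at bit `offset`, least significant bit first.
fn read_bits(data: &[u8], offset: usize, num_bits: usize) -> u64 {
    assert!(num_bits <= 64);
    let mut value = 0u64;
    for i in 0..num_bits {
        let pos = offset + i;
        let bit = (data[pos / 8] >> (pos % 8)) & 1;
        value |= u64::from(bit) << i;
    }
    value
}

/// Writes the lowest `num_bits` of `value` starting at bit `offset`,
/// leaving all other bits of `data` unchanged.
fn write_bits(data: &mut [u8], offset: usize, num_bits: usize, value: u64) {
    assert!(num_bits <= 64);
    for i in 0..num_bits {
        let pos = offset + i;
        let bit = ((value >> i) & 1) as u8;
        data[pos / 8] = (data[pos / 8] & !(1 << (pos % 8))) | (bit << (pos % 8));
    }
}

fn main() {
    for offset in 0..16 {
        for num_bits in 1..=32 {
            // A pattern with mixed bits, truncated to the tested width.
            let value = 0xDEAD_BEEF_u64 & ((1u64 << num_bits) - 1);
            // Pre-fill with ones so stale bits would show up in the result.
            let mut buffer = [0xFFu8; 8];
            write_bits(&mut buffer, offset, num_bits, value);
            assert_eq!(read_bits(&buffer, offset, num_bits), value);
        }
    }
}
```

Using a value with mixed bits instead of 1 makes the bit_size significant, and the non-zero offsets cover reads deeper in the data.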
Flatdata supports statically sized bitsets as packed members (e.g. bool:1). We do not support a use case, though, in which the user needs to store a large set of bits (e.g. one for each entity) in one blob (except manually through raw_data).
Should we consider adding something like a bitset to archives?
Based on issue in #31
In Go:
Raw string literals are character sequences between back quotes, as in `foo`.
Within the quotes, any character may appear except back quote.
https://golang.org/ref/spec#String_literals
Probably, there are more such cases.
Technically, we have this already. See https://github.com/heremaps/flatdata/pull/21/files/4d14db178ea36e9a96fa945996b9d7a0fe156760#diff-03983e4479e8a13251972767b3d281c1:
flatdata::Vector< SomeStruct > data( 1 );
auto obj = data[ 0 ];
// set fields of obj
builder.set_some_struct( obj );
As discussed, it would make sense to add support for that, so that we don't have to use a workaround with Vector.
It would be nice to have built-in support for enum in flatdata. We could also consider supporting a special variant of enums: bitfield enums.
Code contains things like:
In addition to #96 I was thinking that we might want to simplify the generator tests: Each is testing input schemas vs some generated output, all in code.
Would it make sense to instead use the test schemas from #96 and just store a bunch of expectations as files?
E.g.
test-case tests/schemas/signed_enums.flatdata
expectation tests/generator/rust/generated/signed_enums.rs
Similar to the test it could be that we do not require a 1:1 match, just a "contains" test.
The python test runner would then just enumerate all test schemas, and run against the expectations.
Would that make sense? Is that something you normally do when testing python code, or is the current way preferable @boxdot @imagovrn ?
Often we have optional struct fields implemented through special constants, e.g:
const u32 INVALID_XYZ = 0xffff;
struct A {
// INVALID_XYZ in case it's empty
x : u32;
}
How about supporting that in the generated code properly: exposing an Option< T > instead of T?
Schema suggestion:
const u32 INVALID_XYZ = 0xffff;
struct A {
@null_value( INVALID_XYZ )
x : u32;
}
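On the Rust side, the generated accessors for such an annotated field might look roughly like this. It is a sketch under assumptions: the struct is a simplified placeholder with a plain `raw_x` field (real flatdata structs are views over packed bytes), and the accessor names are hypothetical.

```rust
const INVALID_XYZ: u32 = 0xffff;

// Simplified stand-in for a generated struct.
struct A {
    raw_x: u32,
}

impl A {
    /// Returns `None` when the stored value equals the null value.
    fn x(&self) -> Option<u32> {
        if self.raw_x == INVALID_XYZ {
            None
        } else {
            Some(self.raw_x)
        }
    }

    /// Stores the null value for `None`.
    fn set_x(&mut self, value: Option<u32>) {
        self.raw_x = value.unwrap_or(INVALID_XYZ);
    }
}

fn main() {
    let mut a = A { raw_x: 7 };
    assert_eq!(a.x(), Some(7));
    a.set_x(None);
    assert_eq!(a.x(), None);
}
```

Note that with this scheme `set_x(Some(INVALID_XYZ))` reads back as `None`, so a real implementation would have to decide whether to reject that value.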
Currently flatdata is truly "flat", but we could benefit from supporting nesting structures.
Pros:
Cons:
Typical use cases:
Setup
namespace n{
const i8 FOO = 0;
}
namespace m{
const i8 FOO = 1;
}
Currently results in
const (
...
FOO int8 = 0
FOO int8 = 1
)
Similarly for other constructs. This needs to be fixed if namespaces are to be properly supported by the Go backend.
FYI: @artemnikitin
Benchmark different Flatdata implementations, perhaps also against similar libraries?
It just prints out the struct definition.
Behavior right now: when ExternalVector is destructed, we have a debug assert that the user has closed it. The same holds for MultiVector.
When a user who holds multiple resources (that need to be closed) encounters an error and wants to return early, the resources are destructed, which terminates the program in debug mode. This is obviously not ideal. We should rather try to close in the destructor. The problem is that closing may fail, and we have two possible strategies for what to do then.
I tend to use the second approach. Our contract specifies that the user should close resources explicitly. Otherwise, we just try to do it for her, and when we can't do it, we don't want to kill the whole program. This also follows the practice in https://doc.rust-lang.org/std/io/struct.BufReader.html.
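The behavior described here (try to close in the destructor, and do not kill the program if that fails) can be sketched in Rust as below. `Resource` is a hypothetical type, not flatdata's ExternalVector, and the `fail_on_close` flag exists only to simulate a failing close.

```rust
use std::io;

// Hypothetical resource that must be closed.
struct Resource {
    closed: bool,
    fail_on_close: bool,
}

impl Resource {
    fn new(fail_on_close: bool) -> Self {
        Resource { closed: false, fail_on_close }
    }

    /// Explicit close: the caller gets to see the error.
    fn close(&mut self) -> io::Result<()> {
        self.closed = true;
        if self.fail_on_close {
            Err(io::Error::new(io::ErrorKind::Other, "flush failed"))
        } else {
            Ok(())
        }
    }
}

impl Drop for Resource {
    fn drop(&mut self) {
        if !self.closed {
            // Best effort: try to close and ignore the error instead of aborting.
            let _ = self.close();
        }
    }
}

fn early_return() -> Result<(), String> {
    let _resource = Resource::new(true);
    // _resource is dropped on this early return without panicking.
    Err("something went wrong".to_string())
}

fn main() {
    assert!(early_return().is_err());
}
```

Users who care about the error still call close() explicitly; the destructor is only a fallback.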
At the moment, to import flatdata in rust, we need to write:
#[macro_use]
extern crate flatdata;
flatdata_intersperse macro.
Often flatdata data needs to be sorted. By not exposing sort methods on Vector, the user has to sort another structure first before serializing. This needs a special wrapper since the Vector classes just expose proxies / proxy iterators.
This is distinct from external memory sort on ExternalVector: All the data is in memory.
FYI: @boxdot
The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection/debugging. It was noticed that it performs quite slowly on large datasets. It would be useful to have an implementation with performance not too far from the C++ one. In order to achieve that, we could do the following:
dask).
Rust has the problem that all of flatdata's references are strongly typed: they use fixed-size integer types, not std::size_t/usize.
This means Rust code is often littered with:
g.strings().substring(vertices.at(data.to_ref() as usize).name_ref() as usize)?;
g.edges().slice(vertex.edges().start as usize .. vertex.edges().end as usize);
I propose adding the following:
at method for u8, u16, u32, u64
slice method for Range<u8, u16, u32, u64>
substring method for u8, u16, u32, u64
This means flatdata will not support architectures that have a smaller address space... but since flatdata uses memory-mapped files it cannot do so anyway.
Thoughts @boxdot ?
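One way to accept all of these index types without casts at the call site is a generic bound on TryInto<usize>. The sketch below uses a hypothetical `View` wrapper over a slice rather than flatdata's actual types, and it panics when an index does not fit into usize.

```rust
use std::convert::TryInto;
use std::fmt::Debug;
use std::ops::Range;

// Hypothetical wrapper over a slice; not flatdata's real view type.
struct View<'a, T> {
    data: &'a [T],
}

// Converts any supported integer index to usize, panicking if it does not fit.
fn to_usize<I>(index: I) -> usize
where
    I: TryInto<usize>,
    I::Error: Debug,
{
    index.try_into().expect("index does not fit into usize")
}

impl<'a, T> View<'a, T> {
    fn at<I>(&self, index: I) -> &T
    where
        I: TryInto<usize>,
        I::Error: Debug,
    {
        &self.data[to_usize(index)]
    }

    fn slice<I>(&self, range: Range<I>) -> &[T]
    where
        I: TryInto<usize>,
        I::Error: Debug,
    {
        &self.data[to_usize(range.start)..to_usize(range.end)]
    }
}

fn main() {
    let values = [10, 20, 30, 40];
    let view = View { data: &values };
    // No `as usize` needed for u8, u32 or u64 indices.
    assert_eq!(*view.at(2u8), 30);
    assert_eq!(view.slice(1u32..3u32).len(), 2);
    assert_eq!(*view.at(3u64), 40);
}
```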
ExternalVector does not provide indexed access to its data since it may not be written to disk yet. However, after the vector is closed, its data is fully written and constant from that point in time. Often, one wants to access the data after it was closed. So, it would make sense on closing to return an ArrayView to its data for convenience.
A common pattern used in flatdata is to use a fixed value to indicate a missing field, e.g.
const u32 INVALID_INDEX = 0xFFFFFFFF;
struct MyStruct {
@const( INVALID_INDEX )
index : u32;
}
I propose making this a use case that we should support properly:
@optional( <const_name> ) annotation
Option<T>/boost::optional/whatever getters/setters
field_name_raw()?)
0.2 still has multiple issues (lifetimes, etc), so it would be nice to update to the latest version.
Also: Can we set it up that it is either done:
Right now, make test is not working if make was not executed before. This is due to the limitation of downloading the external project and needs to be fixed.
I would also like to use the opportunity to start a discussion about changing the unit testing framework. Our unit tests are fairly simple and do not require mocking, so we have no use for GMock. On the other hand, GTest needs to be downloaded and built. That is OK for external dependencies that are really needed for the library, but it is cumbersome for unit testing, which should just work out of the box.
I propose to either:
The latter is a header-only, single-file unit testing library which has worked well for me many times.
C++ and Rust support enumerations, but Go readers do not yet.
The link to https://github.com/heremaps/flatdata/blob/master/examples/karenina.json in examples/README.md is broken.