ziglibs / diffz
Implementation of go-diff's diffmatchpatch in Zig
License: MIT License
The merge conversation in #23 has demonstrated that switching to bring-your-own-allocator memory management will require some changes in testing strategy.
If I'm understanding some context correctly, ZLS is using `DiffMatchPatch`, and surely doing so with an arena. So for benchmarking and consistency reasons, it would be good to be able to run the tests using an arena as well.
Furthermore, it would be good to ensure that an `OutOfMemory` incident doesn't leak memory or improperly double-free it. `std.testing` has `FailingAllocator` for that kind of check, so that's a minimum of three allocators which it would be good to run the test suite against.
Last, it would be good to set up the tests so that they're also benchmarks, using `std.time.Timer` to collect data. That could be conditionally reported from a separate build step, and is harmless to run when the information it provides isn't necessary. This could include running the tests many more times, in order to get useful amounts of timing data.
My sketch of a solution here is pretty simple: change each of the test blocks into a pair: a function, which takes an allocator and performs the tests, and a test block, which calls that function with an allocator. The functions should initialize a Timer and return its results, that's probably the right level of granularity but we should discuss that.
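A minimal sketch of the function-plus-test pairing described above, assuming a Zig release where `std.time.Timer` and `std.heap.ArenaAllocator` have their 0.12 to 0.14-era shapes. `runConcatCase` and its body are hypothetical stand-ins for a real diffz test; only the shape of the pair is the point.

```zig
const std = @import("std");

// Hypothetical test body: takes any allocator, does its checks,
// and returns the elapsed time in nanoseconds.
fn runConcatCase(allocator: std.mem.Allocator) !u64 {
    var timer = try std.time.Timer.start();
    const joined = try std.mem.concat(allocator, u8, &.{ "diff", "z" });
    defer allocator.free(joined);
    try std.testing.expectEqualStrings("diffz", joined);
    return timer.read();
}

test "concat case, testing allocator" {
    // std.testing.allocator reports leaks when the test ends.
    _ = try runConcatCase(std.testing.allocator);
}

test "concat case, arena" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit();
    _ = try runConcatCase(arena.allocator());
}
```

The test blocks discard the timing here; a separate benchmark step could call the same function in a loop and aggregate the returned values instead.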
How things are structured from there is less clear to me. I haven't used a failing allocator before, but it seems pretty simple: run a `for (0..) |allowed_allocations|` loop, which initializes a `FailingAllocator` to permit that many allocations, and `break` when we no longer catch `OutOfMemory` errors.
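The loop described above might look roughly like this. It is a sketch under assumptions: `buildGreeting` is a hypothetical function under test, and the `.{ .fail_index = ... }` form of `FailingAllocator.init` is the one found in Zig 0.12 and later (earlier releases took the index as a plain argument). It uses a `while` loop because, as far as I know, an open-ended `for (0..)` range is only accepted alongside a bounded operand.

```zig
const std = @import("std");

// Hypothetical function under test: two allocations, both freed on every path.
fn buildGreeting(allocator: std.mem.Allocator) !void {
    const name = try allocator.dupe(u8, "diffz");
    defer allocator.free(name);
    const greeting = try std.mem.concat(allocator, u8, &.{ "hello, ", name });
    defer allocator.free(greeting);
    try std.testing.expectEqualStrings("hello, diffz", greeting);
}

test "OutOfMemory at each allocation index" {
    var fail_index: usize = 0;
    while (true) : (fail_index += 1) {
        // Allow `fail_index` allocations to succeed, then fail the next one.
        var failing = std.testing.FailingAllocator.init(
            std.testing.allocator,
            .{ .fail_index = fail_index },
        );
        if (buildGreeting(failing.allocator())) |_| {
            break; // ran to completion: every failure point has been covered
        } else |err| switch (err) {
            error.OutOfMemory => {}, // expected; try the next index
            else => return err,
        }
    }
}
```

Because the backing allocator is `std.testing.allocator`, anything leaked on an error path is reported when the test finishes. `std.testing` also ships `checkAllAllocationFailures`, which packages a similar loop and may be worth a look before hand-rolling one.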
Whether the tests should be run on both the `std.testing.allocator` and an arena, every time, is less clear to me. Currently, on an M1, a test run is absolutely dominated by build time, finishing in a few hundred milliseconds when `has_side_effects = true` is used to allow the tests to rerun without any build changes. So some changes which bump the test running time up to a second or so wouldn't really move the needle in the usual workflow where tests are run after a build.
But it isn't obvious to me that double testing with `std.testing.allocator` and an arena is an important thing to do every time. Another option is to add the arena as a build flag, which would comptime switch from `std.testing.allocator` to `ArenaAllocator`. I also don't know what happens when you make a `FailingAllocator` backed by an `ArenaAllocator`, but the design seems pretty composable.
So that's my sketch of a plan here, let me know what you think.
I was looking for a diffing library to use on a Zig project, and `diffz` looked like the best candidate. Even with just the diffing portion of diff-match-patch, and a lack of recent commits (it happens), I figured hey, community-supported library, it's got the part I actually need, and maybe I could take some time to continue the port.
I was disappointed to see that the library uses an arena as though it were a garbage collector. In a ziglib, functions are expected to take a generic `Allocator` and behave accordingly, by freeing any memory they don't return. If the user wants to use an `ArenaAllocator`, that's fine: `free` is a no-op unless the allocation happens to be the last one performed, so one ends up with the performance benefits, while maintaining the flexibility which is a core competency of the language.
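To make the convention concrete, here is a small sketch that is not taken from diffz. `joinUpper` is a hypothetical function that frees its temporary buffer and hands ownership of the result to the caller, so the same code runs under a leak-checking allocator and under an arena.

```zig
const std = @import("std");

// Hypothetical example: the caller owns the returned slice,
// and the intermediate buffer is freed before returning.
fn joinUpper(allocator: std.mem.Allocator, a: []const u8, b: []const u8) ![]u8 {
    const tmp = try std.mem.concat(allocator, u8, &.{ a, b });
    defer allocator.free(tmp);
    return std.ascii.allocUpperString(allocator, tmp);
}

test "joinUpper with the leak-checking allocator" {
    const out = try joinUpper(std.testing.allocator, "diff", "z");
    defer std.testing.allocator.free(out);
    try std.testing.expectEqualStrings("DIFFZ", out);
}

test "joinUpper with an arena" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena.deinit(); // releases everything at once
    const out = try joinUpper(arena.allocator(), "diff", "z");
    try std.testing.expectEqualStrings("DIFFZ", out);
}
```

With the arena, the `free` inside `joinUpper` costs next to nothing, while the first test would flag the temporary buffer as a leak if that `free` were missing.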
I assume this unfortunate state of affairs came to be because this version of diff-match-patch is a port from a garbage collected language, C♯ presumably. It's just a pity, because Zig provides such excellent tools for finding memory leaks, use after free, and double free, and the library has a robust test suite. Proper memory management could have been added during the port, and then this would be a real Zig library, not just a promising sketch of one.
Anyway, now there's an issue to track this, in case anyone wants to close it...
Tracking issue for completing the port of the library.