Comments (4)
Hi @thkrebs Sorry for the excessive delay in replying.
There isn't a built-in way currently. For the file-backed storage, a compressed filesystem will work.
There seem to be several related use cases.
- Reduce the space needed to store patches over the long term, without losing the history, by storing them compressed. This keeps the replayable history.
- Truncate the log at some point in time and reset the system to have that as the start. Previous history is lost, but if the use case is a highly available system, then keeping the state changes forever is less useful when there is a full backup of the database.
- Archive the tail of the log: take patches from some point in time backwards and put them in a compressed archive which can be moved elsewhere, leaving only the necessary patches in the patch system and a redirection in case the full patch history is ever needed.
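As a rough illustration of the third case, here is a minimal Python sketch that moves old patch files into a compressed archive. It is not part of rdf-delta: the function name `archive_tail`, the one-file-per-patch layout, and the `<version>.rdfp` naming are all assumptions made for the example, and it does not touch any log metadata or set up the redirection.

```python
import tarfile
from pathlib import Path


def archive_tail(patch_dir: Path, archive_path: Path, keep_from: int) -> list[int]:
    """Move patches with version < keep_from into a gzip-compressed tar archive.

    Assumes (hypothetically) one file per patch, named "<version>.rdfp".
    Returns the archived version numbers in ascending order.
    """
    old = sorted(
        (int(p.stem), p)
        for p in patch_dir.glob("*.rdfp")
        if p.stem.isdigit() and int(p.stem) < keep_from
    )
    if not old:
        return []
    with tarfile.open(archive_path, "w:gz") as tar:
        for _, path in old:
            tar.add(path, arcname=path.name)
    # Delete the originals only after the archive has been written and closed.
    for _, path in old:
        path.unlink()
    return [version for version, _ in old]
```

The patch server would need to be stopped, or otherwise made aware of the change, before files are moved like this.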
Does one of those cases cover it for you or is there another case?
from rdf-delta.
Thanks for the comprehensive answer. I think use case 2 is the appropriate one for me. That was my line of thinking anyhow: creating a backup which can be used as a "starting point". Could you please elaborate on what exactly you mean by "reset the system"? How would I implement truncation?
Each client (e.g. instance of a replicated database) knows the version number where it got to.
As the code stands today, there is checking going on when the patch server(s) start up to verify the log. Some patch storages chase from latest to earliest (each patch log entry points to the previous one, so they form a singly linked list) and the earliest has no previous.
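To make the startup check concrete, here is a minimal Python sketch of walking such a chain from latest to earliest. It is an illustration only, not the rdf-delta code: the function `verify_chain` and the dictionary representation of the log (patch id mapped to previous patch id, `None` for the earliest) are assumptions for the example.

```python
from typing import Optional


def verify_chain(entries: dict[str, Optional[str]], latest: str) -> list[str]:
    """Walk from `latest` back to the earliest entry via the previous links.

    `entries` maps a patch id to the id of its previous patch (None for the
    earliest). Returns the ids in earliest-to-latest order, or raises
    ValueError if the chain is broken, cyclic, or does not cover every entry.
    """
    chain: list[str] = []
    visited: set[str] = set()
    current: Optional[str] = latest
    while current is not None:
        if current not in entries:
            raise ValueError(f"missing log entry: {current}")
        if current in visited:
            raise ValueError(f"cycle at log entry: {current}")
        visited.add(current)
        chain.append(current)
        current = entries[current]
    if len(chain) != len(entries):
        raise ValueError("log contains entries not reachable from the latest")
    chain.reverse()
    return chain
```

With a check of this shape, simply deleting the oldest entries leaves the new oldest entry pointing at a patch that no longer exists, so the walk fails.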
Just truncating the log means changing the earliest entry, and at present the earliest entry is version 0. The starting version is not arbitrary. It could be, but a change (and testing!) is needed to support that. At the moment, a truncate is also going to need the client information updated.
A general facility is to have "loglets" - segments of log entries that act to organise the overall log. Then loglets can be offline (archived, deleted).
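A minimal Python sketch of the loglet idea follows, purely as an illustration. Nothing here exists in rdf-delta: the `Loglet` class, the fixed segment size, and the `online` flag are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Loglet:
    first: int           # first version in this segment (inclusive)
    last: int            # last version in this segment (inclusive)
    online: bool = True  # False once the segment is archived or deleted


def split_into_loglets(min_version: int, max_version: int, size: int) -> list[Loglet]:
    """Cover versions min_version..max_version with segments of at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Loglet(start, min(start + size - 1, max_version))
        for start in range(min_version, max_version + 1, size)
    ]


def take_offline(loglets: list[Loglet], before_version: int) -> int:
    """Mark every loglet lying entirely before `before_version` as offline.

    Returns the number of loglets newly taken offline.
    """
    count = 0
    for loglet in loglets:
        if loglet.online and loglet.last < before_version:
            loglet.online = False
            count += 1
    return count
```

Working in whole segments means a client that is behind only needs the segments covering its missing versions, and anything older can be moved or dropped as a unit.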
Hi @afs,
after operating rdf-delta for a while now, I would like to be more specific about the use case that drove me to open this issue.
We have now operated rdf-delta/Fuseki for several months. The number of patches is now larger than 500,000. This is due to a large number of updates caused by harvesting data from other data portals.
Obviously, the ever-growing number of patches raises the question of how best to deal with it. Is there an approach you would consider best practice?