Comments (4)
I only have vague recollections about this code (it was Duarte who wrote it six years ago), but this may have been deliberate: a (premature?) optimization so that if there's nothing interesting to gossip, we don't gossip, and when the other nodes want to know our backlog, they see they don't have any new information and assume (even without this gossip) that the value is zero. I remember (again, vaguely) that we record the time when the backlog information was received (via gossip or piggy-backed on a response), and after a short time we "forget" this value and it becomes zero anyway.
So although I share your puzzlement about why there is this weird maxint corner-case, I'm afraid I don't understand what the actual problem is that you are reporting here. Is the problem that we only "forget" (zero) the outdated gossiped value after a long time (?) and not quickly enough? Don't we get the zero value almost immediately in responses? Did you see an actual problem with this in a real test case, and can you say something about what you did and what you saw?
from scylladb.
The incorrect values that we had because of this issue were interfering with my testing of #18334.
This issue doesn't have any big consequences in the current codebase because we only use the backlogs for calculating delays for MV flow control, so the worst that could happen is using too long a delay.
However, what we have is also just incorrect. Because of the ULLONG_MAX condition, we simply cannot reduce the tracked view update backlog size to 0 through gossip. Often it will be quickly updated through a response, but it doesn't have to be - if the requests were sent in one big burst, the backlog returned in their responses may have grown considerably, and if it managed to drop to 0 before the gossip timer finished, we will be stuck with the bigger value.
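The burst scenario described above can be replayed with a toy model. This is only a sketch and not ScyllaDB code: `tracked_backlog`, `replay_burst` and the `drop_zero` flag are hypothetical names, and `drop_zero` merely stands in for whatever guard causes a gossiped 0 to be ignored.

```cpp
#include <cstdint>

// Toy model of the backlog value one node tracks for a peer.
// Hypothetical type for illustration; not the ScyllaDB implementation.
struct tracked_backlog {
    std::uint64_t current = 0;

    // Backlog piggy-backed on a write response: always applied.
    void on_response(std::uint64_t value) { current = value; }

    // Backlog received through gossip. `drop_zero` is an assumed stand-in
    // for a guard that discards a gossiped value of 0.
    void on_gossip(std::uint64_t value, bool drop_zero) {
        if (drop_zero && value == 0) {
            return;  // update ignored, the previously tracked value stays
        }
        current = value;
    }
};

// Replays the burst: the responses report a grown backlog, then the peer
// drains to 0 and only gossip is left to report it.
inline std::uint64_t replay_burst(bool drop_zero) {
    tracked_backlog b;
    b.on_response(1000);        // last response of the burst (arbitrary size)
    b.on_gossip(0, drop_zero);  // peer is idle again and gossips 0
    return b.current;
}
```

With the guard in place the tracked value stays at the size reported by the last response; without it the gossiped 0 is applied.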
from scylladb.
I had a vague recollection that we had some sort of expiration time on backlogs, so that if we remember some old value from a minute ago and never got a new value, we ignore the old value and default to 0. But now that I tried to look for this code, I couldn't find it, so I guess it doesn't exist, and you're right - gossip should be allowed to send the value 0.
One consequence of this will be that a database that has no materialized views at all will still gossip these zero values all the time. I guess this is negligible, but maybe this was the reason why this "optimization" was added originally?
from scylladb.
Actually, this issue doesn't really concern sending the value, just receiving it (the problematic if is in view_update_backlog_broker::on_change()) - we can get the 0 value through gossip but we ignore it.
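As a sketch of how a receiver could accept a genuine 0 while still rejecting malformed input: std::strtoull returns 0 when nothing was converted and ULLONG_MAX (with errno set to ERANGE) on overflow, so the return value alone cannot tell a real 0 from a parse failure. The helper below, `parse_backlog_field`, is hypothetical, and the assumption that a field arrives as plain decimal text is mine; it is not the code in on_change().

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Parses one unsigned decimal field. Returns std::nullopt for malformed
// input and the parsed number (including a genuine 0) otherwise.
// Hypothetical helper for illustration only.
inline std::optional<unsigned long long> parse_backlog_field(const std::string& text) {
    // strtoull would skip leading whitespace and accept a sign, so require
    // the field to start with a digit.
    if (text.empty() || !std::isdigit(static_cast<unsigned char>(text[0]))) {
        return std::nullopt;
    }
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    const unsigned long long value = std::strtoull(begin, &end, 10);
    if (errno == ERANGE) {
        return std::nullopt;  // overflow: strtoull returned ULLONG_MAX
    }
    if (*end != '\0') {
        return std::nullopt;  // trailing characters after the number
    }
    return value;
}
```

The point of the design is that validity is decided from the end pointer and errno rather than from the returned number, so 0 does not need to be treated as a sentinel.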
We have a problem with sending backlog values through gossip as well but it's slightly different: #18461
from scylladb.