world-federation-of-advertisers / cross-media-measurement
License: Apache License 2.0
world-federation-of-advertisers/cross-media-measurement-api#29 switched the v2alpha API resources to use resource names instead of the non-standard resource key messages. Some v2alpha services (e.g. related to Exchange resources) have already been implemented. These need to be updated in order to pull in the newer version of the API repo.
For now, having handwritten functions for parsing/assembling each resource name should be sufficient. In the future, we may want to investigate code generation based on the resource annotations in the protobuf so that the name patterns can live in one place.
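A handwritten parse/assemble pair for one resource name might look roughly like the sketch below. The `measurementConsumers/{id}` pattern, the function names, and the sample ID `abc123` are assumptions for illustration only; the authoritative patterns are the resource annotations in the API protobufs.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; the real patterns are defined by the
# resource annotations in the API repo.
_MEASUREMENT_CONSUMER_PATTERN = re.compile(r"measurementConsumers/([^/]+)")


def to_measurement_consumer_name(external_id: str) -> str:
    """Assembles a resource name from an external ID."""
    if not external_id or "/" in external_id:
        raise ValueError(f"invalid ID: {external_id!r}")
    return f"measurementConsumers/{external_id}"


def parse_measurement_consumer_name(name: str) -> Optional[str]:
    """Returns the external ID, or None if the name does not match."""
    match = _MEASUREMENT_CONSUMER_PATTERN.fullmatch(name)
    return match.group(1) if match else None
```

Returning `None` on a mismatch lets the caller decide which error to report, which keeps the parser free of any service-specific status handling.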
Currently, we use a fixed set of ElGamal Keys for all computations, which is not secure.
We need to generate a new set of keys for each computation.
When a user revokes a certificate, the measurement coordinator could cancel or fail all active measurements using that certificate.
This allows the MPC workers to skip checking the certificate's revocationState when consuming it.
This includes redefining the protocol buffer messages and services. There will be separate issues tracking updating the Kingdom database schema and updating the internal service implementations.
This blocks implementation of the v2alpha public API services, since they depend on the internal service definitions.
Think about the auto-fail case for ExchangeStepAttempt. It could be as simple as this: if a client has started working on an ExchangeStep and has not called finishExchangeStepAttempt (or has only called it with ACTIVE status) within X minutes (or hours, etc.), automatically fail the attempt.
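The timeout rule could be sketched as a pure predicate, as below. The `AttemptState` enum, the `should_auto_fail` name, and the idea of tracking a last-update time per attempt are assumptions made for illustration, not part of the existing service.

```python
from datetime import datetime, timedelta
from enum import Enum


class AttemptState(Enum):
    ACTIVE = "ACTIVE"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def should_auto_fail(
    state: AttemptState,
    last_update: datetime,
    now: datetime,
    timeout: timedelta,
) -> bool:
    """True if an attempt is still ACTIVE and has been idle longer than timeout.

    last_update is assumed to be the time of the most recent call that touched
    the attempt (its start, or a finish call that left it ACTIVE).
    """
    return state is AttemptState.ACTIVE and now - last_update > timeout
```

A periodic job could evaluate this predicate over active attempts and fail the ones for which it returns true.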
Implement the services (re)defined as part of issue #52. Unlike the existing service implementations, we can drop the database interfaces as they add little value. Instead, we'll have separate service implementations for each set of cloud products.
Since the only supported database right now is Google Cloud Spanner, this issue only covers the Spanner implementation of the services.
In #389 we added new columns to the `EventGroups` table. Some of these should be `NOT NULL`, but that would require updates to the internal service, which was out of scope for that single PR. Once the internal service has been updated to populate these columns, they should be made `NOT NULL`:
UpdateTime
EventGroupDetails
EventGroupDetailsJson
The Requisition Service provides EDPs with instructions about what sketches they should generate.
This service will need substantial refactoring to accommodate consent signaling.
When SimpleReport create measurement is run with deltas = 0.0, get measurement returns null
We're currently using https://github.com/grpc-ecosystem/grpc-health-probe/ for our health probes. K8s v1.23 introduced a built-in gRPC probe type.
Note that this probe type doesn't support TLS, so we'll need to run our health service on a separate non-TLS port. This would actually significantly simplify things, as we won't need to deal with health probe certificates and hostname overrides.
Use /DataProviders, /ModelProviders, /RecurringExchanges, and /Exchanges services to add more test cases within the ExchangeStepsServiceTest.
The `metadata` field of `EventGroup` is optional, and the corresponding `measurement_consumer_public_key` is only required when the `metadata` field is set.
Reported by @mariolamassaavedra .
The test fails inside of our build container, but passes on GitHub Actions (Ubuntu) as well as my Linux machine (Debian). Here's a sample failure log:
java.lang.IllegalStateException: Process [/usr/local/google/home/sanjayvas/hg/cmms/bazel-container-output/sandbox/processwrapper-sandbox/1411/execroot/wfa_measurement_system/_tmp/d560f597e4ce2ee36ef8c9f75d0f431a/embedded-pg/PG-a026f08d3ce9851e748092d9c1e5585c/bin/initdb, -A, trust, -U, postgres, -D, /usr/local/google/home/sanjayvas/hg/cmms/bazel-container-output/sandbox/processwrapper-sandbox/1411/execroot/wfa_measurement_system/_tmp/d560f597e4ce2ee36ef8c9f75d0f431a/epg4783638036069027391, -E, UTF-8] failed
at com.opentable.db.postgres.embedded.EmbeddedPostgres.system(EmbeddedPostgres.java:602)
at com.opentable.db.postgres.embedded.EmbeddedPostgres.initdb(EmbeddedPostgres.java:221)
at com.opentable.db.postgres.embedded.EmbeddedPostgres.<init>(EmbeddedPostgres.java:142)
at com.opentable.db.postgres.embedded.EmbeddedPostgres$Builder.start(EmbeddedPostgres.java:554)
at com.opentable.db.postgres.junit.SingleInstancePostgresRule.pg(SingleInstancePostgresRule.java:46)
at com.opentable.db.postgres.junit.SingleInstancePostgresRule.before(SingleInstancePostgresRule.java:39)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at com.google.testing.junit.runner.internal.junit4.CancellableRequestFactory$CancellableRunner.run(CancellableRequestFactory.java:108)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at com.google.testing.junit.runner.junit4.JUnit4Runner.run(JUnit4Runner.java:116)
at com.google.testing.junit.runner.BazelTestRunner.runTestsInSuite(BazelTestRunner.java:159)
at com.google.testing.junit.runner.BazelTestRunner.main(BazelTestRunner.java:85)
It looks like some `initdb` command from embedded Postgres is failing.
Hi. How do I obtain a CLA as per the contributing guidelines? It's not clear if I need one before I submit a PR.
Thanks!
Currently, we use `timestamp` as the pageToken in various public APIs. This won't work, since multiple rows can have the same commit timestamp.
We need to redesign our pagination strategy. Here are some basic requirements.
The redesign also impacts the Kingdom internal StreamXXX methods, since they can no longer use the `create_after` filter.
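One common way around duplicate timestamps is keyset pagination with a tie-breaker: the token encodes both the timestamp and a unique row ID, and the next page starts strictly after that pair. The sketch below runs over an in-memory list; the row shape, the JSON/base64 token format, and the function names are assumptions for illustration, not the chosen design.

```python
import base64
import json
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (commit timestamp in microseconds, unique row ID)


def encode_page_token(last_row: Row) -> str:
    """Encodes the last row of a page as an opaque token."""
    raw = json.dumps({"ts": last_row[0], "id": last_row[1]}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_page_token(token: str) -> Row:
    data = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return (data["ts"], data["id"])


def list_page(
    rows: List[Row], page_size: int, page_token: Optional[str] = None
) -> Tuple[List[Row], str]:
    """Returns one page ordered by (timestamp, ID) plus the next page token."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    ordered = sorted(rows)
    if page_token:
        after = decode_page_token(page_token)
        # Tuple comparison breaks timestamp ties by row ID.
        ordered = [row for row in ordered if row > after]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    return page, (encode_page_token(page[-1]) if has_more else "")
```

Because the comparison is on the pair rather than on the timestamp alone, rows that share a commit timestamp are neither skipped nor repeated across pages.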
It appears to be happening during SETUP_PHASE. I can't find any matching error in Kingdom's system API server, so I'm guessing the UNKNOWN is coming from a computation control server? The computation eventually appears to fail with "Failing computation due to too many failed attempts.".
Stack trace:
java.util.concurrent.CancellationException: Collection of responses completed exceptionally
at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22)
at kotlinx.coroutines.JobKt__JobKt.cancel(Job.kt:596)
at kotlinx.coroutines.JobKt.cancel(Unknown Source)
at io.grpc.kotlin.HelpersKt.cancelAndJoin(Helpers.kt:47)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invokeSuspend(ClientCalls.kt:325)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invoke(ClientCalls.kt)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invoke(ClientCalls.kt)
at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:89)
at kotlinx.coroutines.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:166)
at kotlinx.coroutines.BuildersKt.withContext(Unknown Source)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1.invokeSuspend(ClientCalls.kt:324)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
at org.wfanet.measurement.duchy.deploy.common.daemon.mill.liquidlegionsv2.LiquidLegionsV2MillDaemon.run(LiquidLegionsV2MillDaemon.kt:132)
at org.wfanet.measurement.duchy.deploy.gcloud.daemon.mill.liquidlegionsv2.GcsLiquidLegionsV2MillDaemon.run(GcsLiquidLegionsV2MillDaemon.kt:34)
at picocli.CommandLine.executeUserObject(CommandLine.java:1919)
at picocli.CommandLine.access$1100(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
at picocli.CommandLine.execute(CommandLine.java:2058)
at org.wfanet.measurement.common.CommandLinesKt.commandLineMain(CommandLines.kt:40)
at org.wfanet.measurement.duchy.deploy.gcloud.daemon.mill.liquidlegionsv2.GcsLiquidLegionsV2MillDaemonKt.main(GcsLiquidLegionsV2MillDaemon.kt:38)
Caused by: io.grpc.StatusException: UNKNOWN
at io.grpc.Status.asException(Status.java:550)
at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:296)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closedInternal(ClientCallImpl.java:751)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closed(ClientCallImpl.java:687)
at io.grpc.internal.RetriableStream$Sublistener$2.run(RetriableStream.java:853)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.internal.RetriableStream$Sublistener.closed(RetriableStream.java:848)
at io.grpc.internal.ForwardingClientStreamListener.closed(ForwardingClientStreamListener.java:34)
at io.grpc.internal.InternalSubchannel$CallTracingTransport$1$1.closed(InternalSubchannel.java:693)
at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:459)
at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:221)
at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:442)
at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:278)
at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31)
at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:233)
at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191)
at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:200)
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:445)
at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:401)
at io.grpc.internal.AbstractClientStream$TransportState.inboundTrailersReceived(AbstractClientStream.java:384)
at io.grpc.internal.Http2ClientStreamTransportState.transportTrailersReceived(Http2ClientStreamTransportState.java:183)
at io.grpc.netty.NettyClientStream$TransportState.transportHeadersReceived(NettyClientStream.java:341)
at io.grpc.netty.NettyClientHandler.onHeadersRead(NettyClientHandler.java:372)
at io.grpc.netty.NettyClientHandler.access$1200(NettyClientHandler.java:91)
at io.grpc.netty.NettyClientHandler$FrameListener.onHeadersRead(NettyClientHandler.java:940)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:337)
at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:56)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:476)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:484)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378)
at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
For example, deleting a Report is blocked due to foreign key references. Those referencing rows would need to be deleted manually first.
We'd need to add a schema update that re-creates the constraints with `ON DELETE CASCADE`.
If a Measurement transitions to the FAILED state without all Duchies having filled their RequisitionParams, required fields in DuchyValue may not be set.
The intent is for such Requisitions to not be visible in the public API. The current implementation filters by internal Measurement state, which misses this case.
Update WORKSPACE to replace
"com.nhaarman.mockitokotlin2:mockito-kotlin": "2.2.0" with
"org.mockito.kotlin:mockito-kotlin": "3.1.0"
The common JVM code has been copied from this repository to the https://github.com/world-federation-of-advertisers/common-jvm repository. The next step is to import that repository and drop the copies in this one. This includes imports and some of the Starlark definitions in the top-level `build` package. Some Bazel target visibilities may need to be adjusted.
Note that the common-jvm repo has some build definitions and possible imports that aren't actually used by that repo and don't belong there. For example, Bazel rules for CUE aren't common JVM dependencies. These should not be imported from common-jvm, and should eventually be deleted from there.
In theory, the mill's memory usage should reset to its baseline between processing two computations. However, during extensive performance testing, I found that the "non-evictable memory" usage of the mill keeps increasing as it processes more and more computations. It is unclear where this "non-evictable memory" comes from. Also, the JVM maximum memory is set to 4GB, but the "non-evictable memory" usage can be far more than that, e.g. 10+GB.
We don't think there is a memory leak in either the Kotlin or C++ code, but we need a deeper investigation into this issue at the appropriate time.
We currently use `ktlint` as both a linter and formatter. Unfortunately, `ktlint` does not produce deterministic output (given two code samples that differ only by whitespace, it produces different output). `ktfmt` provides this. It's based on google-java-format, and therefore follows principles such as the Rectangle Rule.
In order to continue to use ktlint as a linter, there are a couple of ktlint rules that will need to be disabled in `.editorconfig` to avoid corner cases where it conflicts with ktfmt.
Parent not found -> FAILED_PRECONDITION and furthermore have PreconditionFailure set in the error details (see https://github.com/googleapis/api-common-protos/blob/37d5125da5c90f2124d15908a54a32ed3f470bc2/google/rpc/error_details.proto#L140)
self not found -> NOT_FOUND
some resource other than parent or self not found -> INVALID_ARGUMENT
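The mapping above could be written down as a small lookup, sketched here. The enum and function names are hypothetical, and the status codes are modelled as a local enum rather than a real gRPC type; attaching the PreconditionFailure error detail for the parent case is not shown.

```python
from enum import Enum


class StatusCode(Enum):
    FAILED_PRECONDITION = "FAILED_PRECONDITION"
    NOT_FOUND = "NOT_FOUND"
    INVALID_ARGUMENT = "INVALID_ARGUMENT"


class MissingResource(Enum):
    PARENT = "parent"
    SELF = "self"
    OTHER = "other"


# Hypothetical lookup mirroring the three rules listed above.
_STATUS_FOR_MISSING = {
    MissingResource.PARENT: StatusCode.FAILED_PRECONDITION,
    MissingResource.SELF: StatusCode.NOT_FOUND,
    MissingResource.OTHER: StatusCode.INVALID_ARGUMENT,
}


def status_for_missing(kind: MissingResource) -> StatusCode:
    """Returns the status code to use when the given resource is not found."""
    return _STATUS_FOR_MISSING[kind]
```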
For example, when an EDP tries to fulfill a requisition that is already marked FULFILLED locally at the duchy, the duchy should confirm the state of the requisition with the kingdom.
Implement the v2alpha public API Accounts service, as well as the relevant methods that involve modifying resource ownership.
The initial version for Milestone 1B need not depend on OpenID Connect. Instead, another identity type (e.g. simple username/password) can be added for creating admin accounts for testing, with the Accounts service not being exposed outside the cluster.
It looks like the Herald can get stuck reprocessing the same computation change forever. We should detect such conditions and mark the computation as permanently failed.
See #87 which updated the system API for this purpose.
The `--filter` option is not required, but an exception is thrown when it's not set. It's incorrectly defined as a `lateinit` property, which implies it will always eventually be set.
Stack trace:
kotlin.UninitializedPropertyAccessException: lateinit property celFilter has not been initialized
at org.wfanet.measurement.reporting.service.api.v1alpha.tools.ListEventGroups.run(Reporting.kt:433)
at picocli.CommandLine.executeUserObject(CommandLine.java:1919)
at picocli.CommandLine.access$1100(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
at picocli.CommandLine.execute(CommandLine.java:2058)
at org.wfanet.measurement.common.CommandLinesKt.commandLineMain(CommandLines.kt:40)
at org.wfanet.measurement.reporting.service.api.v1alpha.tools.ReportingKt.main(Reporting.kt:497)
This was spotted in the logs of the Spanner computations server within the "worker2" Duchy on halo-cmm-dev.
WARNING: Stage attempt with primary key (-1008988325, liquid_legions_sketch_aggregation_v2: EXECUTION_PHASE_TWO, 6) did not have an ending reason set when ending computation, setting it to 'CANCELLED'.
Querying the ComputationStageAttempts table for that ComputationId results in the following:
ComputationId | ComputationStage | Attempt | BeginTime | EndTime | Details | DetailsJSON
---|---|---|---|---|---|---
-1008988325 | 1 | 1 | 2022-08-30T23:42:09.652017Z | 2022-08-30T23:42:11.454366Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 2 | 1 | 2022-08-30T23:42:11.454366Z | 2022-08-30T23:45:40.688327Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 3 | 1 | 2022-08-30T23:45:41.156663Z | 2022-08-30T23:45:42.171272Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 4 | 1 | 2022-08-30T23:45:42.171272Z | 2022-08-30T23:45:50.158201Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 6 | 1 | 2022-08-30T23:45:50.753601Z | 2022-08-30T23:45:55.385577Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 7 | 1 | 2022-08-30T23:45:55.385577Z | 2022-08-30T23:46:27.052964Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 8 | 1 | 2022-08-30T23:46:27.477582Z | 2022-08-30T23:47:03.349957Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 9 | 1 | 2022-08-30T23:47:03.349957Z | 2022-08-30T23:48:52.269688Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 10 | 1 | 2022-08-30T23:48:53.053464Z | 2022-08-30T23:55:26.668691Z | CAM= | {"reasonEnded":"LOCK_OVERWRITTEN"}
-1008988325 | 10 | 2 | 2022-08-30T23:55:26.668691Z | 2022-08-31T00:05:04.430031Z | CAQ= | {"reasonEnded":"CANCELLED"}
-1008988325 | 10 | 3 | 2022-08-30T23:56:57.478655Z | 2022-08-31T00:05:04.430031Z | CAQ= | {"reasonEnded":"CANCELLED"}
-1008988325 | 10 | 4 | 2022-08-30T23:57:12.992429Z | 2022-08-31T00:05:04.430031Z | CAQ= | {"reasonEnded":"CANCELLED"}
-1008988325 | 10 | 5 | 2022-08-30T23:57:41.693306Z | 2022-08-31T00:05:04.430031Z | CAQ= | {"reasonEnded":"CANCELLED"}
-1008988325 | 10 | 6 | 2022-08-30T23:58:20.300635Z | 2022-08-31T00:05:04.430031Z | CAQ= | {"reasonEnded":"CANCELLED"}
-1008988325 | 10 | 7 | 2022-08-31T00:00:21.112744Z | 2022-08-31T00:00:27.892296Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 11 | 1 | 2022-08-31T00:00:27.892296Z | 2022-08-31T00:04:17.380198Z | CAE= | {"reasonEnded":"SUCCEEDED"}
-1008988325 | 12 | 1 | 2022-08-31T00:04:28.110317Z | 2022-08-31T00:05:04.430031Z | CAE= | {"reasonEnded":"SUCCEEDED"}
Querying the Computations table gave the following for ComputationDetailsJson (prettified using `jq`):
{
"blobsStoragePrefix": "computation-blob-storage/Ih4-Qg7iAys",
"endingState": "SUCCEEDED",
"kingdomComputation": {
"publicApiVersion": "v2alpha",
"measurementSpec": "CuoBCAES5QEIrfyPlQUS3AEKzwEKPXR5cGUuZ29vZ2xlYXBpcy5jb20vZ29vZ2xlLmNyeXB0by50aW5rLkVjaWVzQWVhZEhrZGZQdWJsaWNLZXkSiwESRAoECAIQAxI6EjgKMHR5cGUuZ29vZ2xlYXBpcy5jb20vZ29vZ2xlLmNyeXB0by50aW5rLkFlc0djbUtleRICEBAYARgBGiEA2VOaFS7y+xGQvC2VydELNZValk7ASE8EXHny7b0tOX4iIC1a/MlmcWmzimHAUyPjtZfuRLqtlS6qUdkLj8/eb15bGAMQARit/I+VBSABEiAK2jUhaG3CB4sA8IS1t582X7x2aWCdOEr4+lqjqBkDJRIgVi/udYPVVK3ocMFYIggJTC21hMdelowv9jyFiLV79LoSINTgsqFF0GJjrqzzpp+8kdOHgijNPPYLNqdoY9uMZCO3EiDUJg7UQQgGifrpLCazfG5sQTFCv1ckAGxY+3ThDCvjDhIgLjlj+A1Nd/4SfKninOX/2pv4mJkf3/x0lcwcAxIXu+kSILCgiiq0Lb1tNvVCFl5rAMzMYfqzOb8AmbUBN9NAttPfGgUVAACAPyIoChIJmpmZmZmZuT8Rje21oPfGsD4SEgmamZmZmZm5PxGN7bWg98awPg==",
"measurementPublicKey": {
"format": "TINK_KEYSET",
"data": "CK38j5UFEtwBCs8BCj10eXBlLmdvb2dsZWFwaXMuY29tL2dvb2dsZS5jcnlwdG8udGluay5FY2llc0FlYWRIa2RmUHVibGljS2V5EosBEkQKBAgCEAMSOhI4CjB0eXBlLmdvb2dsZWFwaXMuY29tL2dvb2dsZS5jcnlwdG8udGluay5BZXNHY21LZXkSAhAQGAEYARohANlTmhUu8vsRkLwtlcnRCzWVWpZOwEhPBFx58u29LTl+IiAtWvzJZnFps4phwFMj47WX7kS6rZUuqlHZC4/P3m9eWxgDEAEYrfyPlQUgAQ=="
}
},
"liquidLegionsV2": {
"role": "NON_AGGREGATOR",
"parameters": {
"maximumFrequency": 10,
"liquidLegionsSketch": {
"decayRate": 12,
"size": "100000"
},
"noise": {
"reachNoiseConfig": {
"blindHistogramNoise": {
"epsilon": 1,
"delta": 1
},
"noiseForPublisherNoise": {
"epsilon": 1,
"delta": 1
},
"globalReachDpNoise": {
"epsilon": 0.1,
"delta": 1e-06
}
},
"frequencyNoiseConfig": {
"epsilon": 0.1,
"delta": 1e-06
}
},
"ellipticCurveId": 415
},
"participant": [
{
"duchyId": "worker2",
"publicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "Axql5u/MneZ+Me25qE06TKoKDNF3Get5XxtdcIHivzF1"
},
"elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQMapebvzJ3mfjHtuahNOkyqCgzRdxnreV8bXXCB4r8xdQ==",
"elGamalPublicKeySignature": "MEUCIDORmvsjh07/tUI4HrGmONRElUpRThijtBO837moHe8hAiEA8b0mAhLuhkqhYbjmkbJvOr2rBaYSmnzzT78fHQEgSqE=",
"duchyCertificateDer": "MIIB5TCCAYugAwIBAgIUD78MTB4sT79XxdvNwvBr/qHA5OowCgYIKoZIzj0EAwIwLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApXb3JrZXIyIENBMB4XDTIyMDcyOTE4MTMzOVoXDTMyMDcyNjE4MTMzOVowKTEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRAwDgYDVQQDDAdXb3JrZXIyMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAENwAOHu7k/lpUwdQe0y904QkD0JIvpfty8Sjo9IXVLjq53wqO7bSgmDIt2QYebVThciOd4UKC8doi7xFQXciimKOBjTCBijAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFK9pSQh8B6Q598dGOpiTvvUl16O7MB0GA1UdDgQWBBQPQhRN905ulZIoJnHJNyscGulbXTALBgNVHQ8EBAMCBeAwMAYDVR0RBCkwJ4IaKi53b3JrZXIyLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNIADBFAiEA15agdGryz+9x7uouhDw/Zf2C/xnOUo9OVbQ53EDvdK8CIGpl7DEzpIYItGAGPFBltA5gEepfpL88OrAdZHF8bod3"
},
{
"duchyId": "worker1",
"publicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "Az6TDc5eiEUB6K7GH1OEqObinexiepQzy//kVxgz31zj"
},
"elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQM+kw3OXohFAeiuxh9ThKjm4p3sYnqUM8v/5FcYM99c4w==",
"elGamalPublicKeySignature": "MEYCIQDzMqDqV1bBhpvZ2sRsozrkpK3bVPtdcwq15/DYaAmLeAIhAIa2zmT9xTl0Gs/0wj/sELa62cu/9ONcvH/ykNugZ6Kd",
"duchyCertificateDer": "MIIB5TCCAYugAwIBAgIUPYEx9H1BSNg7BAxMpgKP8f6akPAwCgYIKoZIzj0EAwIwLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApXb3JrZXIxIENBMB4XDTIyMDcyOTE4MTUwNFoXDTMyMDcyNjE4MTUwNFowKTEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRAwDgYDVQQDDAdXb3JrZXIxMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEe5z8YtYYtKimvGRnOwpnxZQS2WLepJ/MW9M5AXp79Wg0f2ppOXvXPncuw6ECJDbI034avjfOJLXUF5Zx8SltU6OBjTCBijAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFBBZC7JmE03G1c74SRXJ11Y1eaLpMB0GA1UdDgQWBBRphWYwUOESM+daAXNc+yy4XV3oOjALBgNVHQ8EBAMCBeAwMAYDVR0RBCkwJ4IaKi53b3JrZXIxLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNIADBFAiEAnem13Lo79Nc36ur5U37JoGTULVbZmWwPjklUn+sZPyMCIFe8PWzymm1NJbG/42iMTBs1d9e2Eg/TVM1OB9L1sXtV"
},
{
"duchyId": "aggregator",
"publicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "A20YcYJacmlE1rWcZWiC7HTi8AUIZzSS1jXBY295j3q4"
},
"elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQNtGHGCWnJpRNa1nGVogux04vAFCGc0ktY1wWNveY96uA==",
"elGamalPublicKeySignature": "MEUCIQC7IaNLHAFOcrEzPQhlfoQTDxWzgKzw0Z08duSPAZ4/+wIgVfTemiYP62WPp4LaLeNCxsd2acma5J6UoeKalqF/+Dc=",
"duchyCertificateDer": "MIIB7TCCAZSgAwIBAgIUMWePkRD78fxf7XR/FJRI+99c8kYwCgYIKoZIzj0EAwIwLzEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRYwFAYDVQQDDA1BZ2dyZWdhdG9yIENBMB4XDTIyMDcyOTE4MTgzN1oXDTMyMDcyNjE4MTgzN1owLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApBZ2dyZWdhdG9yMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE4Pt+OVbT1ne1+jHHKKMEwYwAA8oHf2wCsl7x8G5LoSQqvVR8V8JKJUNOA5i4z5jBWHGt/1TAZ8A7iYROuZZiFaOBkDCBjTAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFEetV5EszT17lLVF0r4f7V52atw5MB0GA1UdDgQWBBRYH+y724vCS2A9KdGUevcPWIwtrTALBgNVHQ8EBAMCBeAwMwYDVR0RBCwwKoIdKi5hZ2dyZWdhdG9yLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNHADBEAiA44QCvuMHdmYMq5ROUmPA5XnxWKtbX3bl+Wb8mMCL3mQIgQJGcGkMPOqy274MNK8DTlNLqyVNO0IDdQVSgpaE8WZQ="
}
],
"combinedPublicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "A8leFaNR9+Vw/NUfBkGksb4di3Wb0pTeGP1Zg2uVgN+B"
},
"partiallyCombinedPublicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "AlyNNsjQuLnaDAEx62NGjOKKdmyd7VnSd4ewrcJ/0DcA"
},
"localElgamalKey": {
"secretKey": "osvxlSP1mb8apAAH991QndvgVm/ZbjPnfJGQ53bOm30=",
"publicKey": {
"generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
"element": "Axql5u/MneZ+Me25qE06TKoKDNF3Get5XxtdcIHivzF1"
}
}
}
}
Report from @mariolamassaavedra:
We are hitting Halo's Dev using listEventGroups and getting the following error:
Measurement Consumer Name in filter invalid.
This is due to the Reporting server calling the Kingdom's listEventGroups method with the filter field value as filter: ex87HOXdfPY (sample ID) instead of filter: measurementConsumers/ex87HOXdfPY, as per the Kingdom's API definition. We have deployed our own version of the code in our environment and the call works correctly.
Retries for start/update were added in f02986e to deal with a potential race condition. This was prior to the codebase being open-sourced, so the following is copied from the Google-internal issue b/168551392:
The Herald has a race condition:
- the mill in the last unconfirmed duchy checks the local requisition and finds all are available
- the mill confirms to the kingdom
- the mill updates local computation to WAIT_TO_START
- the kingdom updates the computation to RUNNING and sends out the status change to all connected heralds
- the herald tries to update the local computation to TO_ADD_NOISE
Unfortunately, this retry is unconditional on any exception. `syncStatuses` also wraps the higher-level `processSystemComputationChange` in retries, but those check the gRPC status to determine whether the operation is retriable. This would likely be insufficient for the start/update case, as it's a state mismatch.
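To make the contrast concrete, a retry wrapper that takes an explicit predicate might look like the sketch below. `RpcError`, `RETRIABLE_CODES`, and `call_with_retries` are hypothetical names, the set of retriable codes is an assumption, and backoff between attempts is omitted for brevity. A start/update caller could pass a predicate that also recognises the state-mismatch case instead of retrying on every exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RpcError(Exception):
    """Hypothetical stand-in for an RPC failure carrying a status code name."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


# Assumption for illustration: which status codes count as transient.
RETRIABLE_CODES = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED"})


def is_retriable_rpc_error(error: Exception) -> bool:
    return isinstance(error, RpcError) and error.code in RETRIABLE_CODES


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    is_retriable: Callable[[Exception], bool] = is_retriable_rpc_error,
) -> T:
    """Runs operation, retrying only when is_retriable accepts the error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be positive")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            if attempt == max_attempts or not is_retriable(error):
                raise
    raise AssertionError("unreachable")
```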
Digging into an error that I sometimes (but not always) see popping up in the integration test. It shows up as a `TRANSIENT` error in the LLv2 Mill. The source appears to be `GcpSpannerComputationsDatabaseTransactor`. Adding more logging gives the following:
wfa.measurement.internal.duchy.Computations RecordOutputBlobPath
WARNING: [DefaultDispatcher-worker-2 @coroutine#3852] gRPC error: UNKNOWN
java.lang.IllegalStateException: Failed to update because of editVersion mismatch.
Token's editVersion: 1662062844075 (2022-09-01T20:07:24.075Z)
Computations table's UpdateTime: 1662062846260 (2022-09-01T20:07:26.260354Z)
Difference: PT2.185354S
at org.wfanet.measurement.duchy.deploy.gcloud.spanner.computation.GcpSpannerComputationsDatabaseTransactor$runIfTokenFromLastUpdate$2.invokeSuspend(GcpSpannerComputationsDatabaseTransactor.kt:690)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
When we're failing the RPC due to an internal exception, we should be including the cause in the StatusRuntimeException (e.g. using Status.withCause). The causal information isn't transmitted to the client, but can be useful for debugging (e.g. it can be propagated by the in-process channel we use for integration testing).
Our integer ID fields utilize the proto3 default value of 0 as equivalent to not being specified. Therefore, we must ensure that we never consider 0 to be a valid ID.
We could ensure that IdGenerator impls never return 0, or make sure internal service impls never insert a row with a 0 ID (including adding tests for this).
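The first option could be sketched as follows. This is only an illustration: the IdGenerator interface shown here is a hypothetical stand-in with a single generateId method, and NonZeroIdGenerator and nextCandidate are invented names, not the project's actual API.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for the project's IdGenerator interface.
interface IdGenerator {
  fun generateId(): Long
}

// Draws candidates until one is non-zero, so 0 is never returned as an ID.
// `nextCandidate` is an assumed injection point to make the behavior testable.
class NonZeroIdGenerator(
  private val nextCandidate: () -> Long = { Random.nextLong() }
) : IdGenerator {
  override fun generateId(): Long =
    generateSequence { nextCandidate() }.first { it != 0L }
}
```

Injecting the candidate source makes it straightforward to add a test that feeds in 0 and asserts that it is skipped.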
If an MC creates an event filter that omits a PBM field (for example, one with no age restriction), the PBM should understand this and charge the matching rows for all ages. This is not done now.
The execute method of SpannerWriter has a misleading clock parameter, which is used to construct a SpannerWriter.TransactionScope. It neither sets the timestamp bound of the transaction nor has any impact on commit timestamps. Instead, it appears to only be used to populate the timestamps in ExchangeStepAttemptDetails.
This parameter should be dropped. It can be passed to a subclass constructor in cases where it's really needed.
Excerpt from build log:
INFO: From KotlinCompile @com_github_grpc_grpc_kotlin//compiler/src/main/java/io/grpc/kotlin/generator/protoc:protoc { kt: 18, java: 0, srcjars: 0 } for k8 [for host]:
warning: runtime JAR files in the classpath should have the same version. These files were found in the classpath:
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk7.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk8.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-common/1.4.30/kotlin-stdlib-common-1.4.30.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib/1.4.30/kotlin-stdlib-1.4.30.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.61/kotlin-reflect-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-script-runtime.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-reflect.jar (version 1.4)
warning: consider providing an explicit dependency on kotlin-reflect 1.4 to prevent strange errors
warning: some runtime JAR files in the classpath have an incompatible version. Consider removing them from the classpath
INFO: From KotlinCompile @com_github_grpc_grpc_kotlin//compiler/src/main/java/io/grpc/kotlin/generator:generator { kt: 6, java: 0, srcjars: 0 } for k8 [for host]:
warning: runtime JAR files in the classpath should have the same version. These files were found in the classpath:
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk7.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk8.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-common/1.4.30/kotlin-stdlib-common-1.4.30.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib/1.4.30/kotlin-stdlib-1.4.30.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.61/kotlin-reflect-1.3.61.jar (version 1.3)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-script-runtime.jar (version 1.4)
/usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-reflect.jar (version 1.4)
warning: consider providing an explicit dependency on kotlin-reflect 1.4 to prevent strange errors
warning: some runtime JAR files in the classpath have an incompatible version. Consider removing them from the classpath
Currently, the aggregator is randomly chosen for each computation from all workers. In production, we would like a fixed aggregator worker.
There is now Kotlin-specific protobuf code generation as of protocolbuffers/protobuf#8272. We need not migrate everything to the new DSL builder syntax at once, but we should use Bazel rules that generate Kotlin protobuf code. We'll need to figure out if there are any pending issues related to how this interacts with grpc-kotlin.
See https://developers.google.com/protocol-buffers/docs/reference/kotlin-generated.
With the v2alpha design, the kingdom will only contain a list of grpc services. No need to have those daemons anymore.
This may indicate a memory leak, as the Kingdom data server doesn't handle large payloads. This persisted even after increasing container memory to 512MiB and increasing the heap percentage to 40.
Sample traces:
com.google.cloud.spanner.SpannerException: UNKNOWN: Java heap space
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:291)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:297)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:61)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:181)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:110)
at com.google.cloud.spanner.SpannerExceptionFactory.asSpannerException(SpannerExceptionFactory.java:100)
at com.google.cloud.spanner.AbstractResultSet$GrpcResultSet.next(AbstractResultSet.java:137)
at com.google.cloud.spanner.ForwardingResultSet.next(ForwardingResultSet.java:54)
at com.google.cloud.spanner.SessionPool$AutoClosingReadContext$1.internalNext(SessionPool.java:273)
at com.google.cloud.spanner.SessionPool$AutoClosingReadContext$1.next(SessionPool.java:253)
at com.google.cloud.spanner.AsyncResultSetImpl$ProduceRowsCallable.call(AsyncResultSetImpl.java:340)
at com.google.cloud.spanner.AsyncResultSetImpl$ProduceRowsCallable.call(AsyncResultSetImpl.java:334)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at com.google.protobuf.UnknownFieldSet$Builder.<init>(UnknownFieldSet.java:308)
at com.google.protobuf.UnknownFieldSet$Builder.create(UnknownFieldSet.java:311)
at com.google.protobuf.UnknownFieldSet$Builder.access$000(UnknownFieldSet.java:304)
at com.google.protobuf.UnknownFieldSet.newBuilder(UnknownFieldSet.java:71)
at com.google.protobuf.Value.<init>(Value.java:50)
at com.google.protobuf.Value.<init>(Value.java:17)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
at com.google.protobuf.ListValue.<init>(ListValue.java:64)
at com.google.protobuf.ListValue.<init>(ListValue.java:14)
at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)
at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:849)
at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
at com.google.protobuf.Value.<init>(Value.java:101)
at com.google.protobuf.Value.<init>(Value.java:17)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
at com.google.protobuf.ListValue.<init>(ListValue.java:64)
at com.google.protobuf.ListValue.<init>(ListValue.java:14)
at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)
at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:849)
at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
at com.google.protobuf.Value.<init>(Value.java:101)
at com.google.protobuf.Value.<init>(Value.java:17)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
at com.google.protobuf.ListValue.<init>(ListValue.java:64)
at com.google.protobuf.ListValue.<init>(ListValue.java:14)
at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)
Currently, the herald just processes kingdom updates one by one. If it fails to process a computation, it retries. It is not possible to skip a computation that fails indefinitely.
We need to allow the herald to permanently fail a computation locally and update the kingdom if necessary.
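One way to decide when to stop retrying is to count failed attempts per computation. The sketch below is an assumption about how that could look: AttemptTracker, recordFailure, and the use of a string computation ID are all hypothetical names, and the actual failing of the computation (locally and at the kingdom) is left out.

```kotlin
// Hypothetical helper: tracks failed processing attempts per computation.
class AttemptTracker(private val maxAttempts: Int) {
  private val attempts = mutableMapOf<String, Int>()

  /**
   * Records one failed attempt for [computationId].
   * Returns true once the computation has failed [maxAttempts] times,
   * meaning the caller should fail it permanently instead of retrying.
   */
  fun recordFailure(computationId: String): Boolean {
    val count = (attempts[computationId] ?: 0) + 1
    attempts[computationId] = count
    return count >= maxAttempts
  }

  /** Forgets a computation, e.g. after it was processed or failed for good. */
  fun clear(computationId: String) {
    attempts.remove(computationId)
  }
}
```

The count here is kept in memory only, so it resets on restart; persisting it would be a separate design decision.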
The system is currently using Kotlin language version 1.3. More recent versions of rules_kotlin support language version 1.4.
See Kotlin release details for the appropriate matching version of kotlinx.coroutines.
rules_kotlin that supports language version 1.4.
kotlinx.coroutines).
Currently, the tests hardcode the string "DAILY|WEEKLY|MONTHLY" for cronSchedule. We need to figure out the best format for this field.
This covers redefining the resources and services. Updating the implementation for these services will be tracked separately.
SpannerReader has functionality for reading by external ID that assumes that the row in question has exactly one ID component in its external key. This is not true for child tables, as the external ID is only unique for a given parent. That is to say, keys made up of external IDs have the same parent-child relationship as primary keys with multiple components.
Most likely, SpannerReader should just be dropped. It has some other ugliness, such as:
readExternalId, which doesn't make sense with how Kotlin handles nullability.
A withBuilder method that returns the same SpannerReader instance after mutation, as opposed to following the common pattern of withX methods returning a copy.
I tried to follow this instruction, and updated the ServerPod to:
#ServerPod: #Pod & {
  _ports: [{containerPort: 8080}]
  spec: containers: [{
    readinessProbe: {
      exec: command: [
        "/app/grpc_health_probe/file/grpc-health-probe",
        "--addr=:8080",
        "--tls=true",
        "--tls-ca-cert=/var/run/secrets/files/all_root_certs.pem",
        "--tls-client-cert=/var/run/secrets/files/aggregator_server.pem",
        "--tls-client-key=/var/run/secrets/files/aggregator_server.key",
        "--connect-timeout=10s",
        "--rpc-timeout=10s",
      ]
      periodSeconds: 60
      timeoutSeconds: 10
    }}]
}
The kingdom servers reach READY, but the duchy servers do not.
NAME                                              READY   STATUS    RESTARTS   AGE
aggregator-async-computation-control-server-pod   0/1     Running   0          74s
aggregator-computation-control-server-pod         0/1     Running   0          74s
aggregator-requisition-fulfillment-server-pod     0/1     Running   0          73s
aggregator-spanner-computations-server-pod        0/1     Running   0          73s
gcp-kingdom-data-server-pod                       1/1     Running   0          75s
system-api-server-pod                             1/1     Running   0          75s
worker-1-async-computation-control-server-pod     0/1     Running   0          72s
worker-1-computation-control-server-pod           0/1     Running   0          72s
worker-1-requisition-fulfillment-server-pod       0/1     Running   0          72s
worker-1-spanner-computations-server-pod          0/1     Running   0          72s
worker-2-async-computation-control-server-pod     0/1     Running   0          71s
worker-2-computation-control-server-pod           0/1     Running   0          71s
worker-2-requisition-fulfillment-server-pod       0/1     Running   0          71s
worker-2-spanner-computations-server-pod          0/1     Running   0          71s
Error info:
Readiness probe errored: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 10s exceeded: context deadline exceeded
I have no idea why the kingdom and duchy have different results.
Temporarily removed readinessProbe from all pods to unblock the correctnessTest. Will revisit later.
Rather than having custom retry logic, we should use what is built-in to gRPC itself. See https://github.com/grpc/grpc/blob/master/doc/service_config.md
As of Java 10 (and backported to later versions of Java 8), the UseContainerSupport option is on by default. This means that we can set the max heap size to be a fraction of the container memory limit rather than having to manually ensure that the JVM max heap size is sufficiently smaller than the container memory limit.
We likely want -XX:InitialRAMPercentage to be the same value, for the same reasons that it's recommended to have -Xms be the same value as -Xmx.