cross-media-measurement's Issues

Update implemented v2alpha API services for resource name parsing.

world-federation-of-advertisers/cross-media-measurement-api#29 switched the v2alpha API resources to use resource names instead of the non-standard resource key messages. Some v2alpha services (e.g. related to Exchange resources) have already been implemented. These need to be updated in order to pull in the newer version of the API repo.

For now, having handwritten functions for parsing/assembling each resource name should be sufficient. In the future, we may want to investigate code generation based on the resource annotations in the protobuf so that the name patterns can live in one place.
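
A handwritten parse/assemble pair for one resource could look roughly like the sketch below. The `ExchangeKey` class and the `recurringExchanges/{recurring_exchange}/exchanges/{exchange}` pattern are assumptions for illustration only; the real patterns come from the resource annotations in the API protos.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical name pattern; the authoritative pattern lives in the proto
# resource annotation for the resource.
_EXCHANGE_NAME = re.compile(
    r"recurringExchanges/(?P<recurring_exchange>[^/]+)/exchanges/(?P<exchange>[^/]+)"
)


@dataclass(frozen=True)
class ExchangeKey:
    """Hypothetical key type holding the ID segments of an Exchange name."""

    recurring_exchange_id: str
    exchange_id: str

    def to_name(self) -> str:
        """Assembles the resource name from its ID segments."""
        return (
            f"recurringExchanges/{self.recurring_exchange_id}"
            f"/exchanges/{self.exchange_id}"
        )

    @staticmethod
    def from_name(name: str) -> Optional["ExchangeKey"]:
        """Parses a resource name, returning None if it does not match."""
        match = _EXCHANGE_NAME.fullmatch(name)
        if match is None:
            return None
        return ExchangeKey(match.group("recurring_exchange"), match.group("exchange"))
```

Keeping the pattern and both directions in one small type is what would make a later switch to generated code a local change.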

Redefine Kingdom internal services to support v2alpha public API.

This includes redefining the protocol buffer messages and services. There will be separate issues tracking updating the Kingdom database schema and updating the internal service implementations.

This blocks implementation of the v2alpha public API services, since they depend on the internal service definitions.

Auto fail for /ExchangeStepAttempts.finishExchangeStepAttempt

Consider an auto-fail case for ExchangeStepAttempt. It could be as simple as this: if a client has started working on an ExchangeStep and X minutes (or hours, etc.) pass without it calling finishExchangeStepAttempt (or it has only called finishExchangeStepAttempt with ACTIVE status), automatically fail the attempt.
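
One possible shape for this check is a periodic sweep over attempts, sketched below. The `Attempt` type, the `State` values other than ACTIVE, and the `expire_stale_attempts` helper are all hypothetical names; the timeout value is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Iterable, List


class State(Enum):
    ACTIVE = 1
    SUCCEEDED = 2
    FAILED = 3


@dataclass
class Attempt:
    """Hypothetical in-memory stand-in for an ExchangeStepAttempt row."""

    start_time: datetime
    state: State = State.ACTIVE


def expire_stale_attempts(
    attempts: Iterable[Attempt], now: datetime, timeout: timedelta
) -> List[Attempt]:
    """Marks ACTIVE attempts that started at least `timeout` ago as FAILED.

    Returns the attempts that were failed by this sweep.
    """
    expired = []
    for attempt in attempts:
        if attempt.state is State.ACTIVE and now - attempt.start_time >= timeout:
            attempt.state = State.FAILED
            expired.append(attempt)
    return expired
```

Attempts that already reached a terminal state are left untouched, so running the sweep repeatedly is harmless.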

Implement internal Kingdom measurement services for v2alpha public API.

Implement the services (re)defined as part of issue #52. Unlike the existing service implementations, we can drop the database interfaces as they add little value. Instead, we'll have separate service implementations for each set of cloud products.

Since the only supported database right now is Google Cloud Spanner, this issue only covers the Spanner implementation of the services.

https://rally1.rallydev.com/#/604612930531d/portfolioitemstreegrid?detail=%2Fuserstory%2F606023411415&fdp=true&fdp=true?fdp=true

Mark certain EventGroups Spanner columns as NOT NULL

In #389 we added new columns to the EventGroups table. Some of these should be NOT NULL, but that would require updates to the internal service which was out of scope for that single PR. Once the internal service has been updated to populate these columns, they should be made NOT NULL:

  • UpdateTime
  • EventGroupDetails
  • EventGroupDetailsJson

Implement Requisitions service

The Requisition Service provides EDPs with instructions about what sketches they should generate.
This service will need substantial refactoring to accommodate consent signaling.

AwsKingdomSchemaTest fails on some environments

The test fails inside of our build container, but passes on GitHub Actions (Ubuntu) as well as my Linux machine (Debian). Here's a sample failure log:

java.lang.IllegalStateException: Process [/usr/local/google/home/sanjayvas/hg/cmms/bazel-container-output/sandbox/processwrapper-sandbox/1411/execroot/wfa_measurement_system/_tmp/d560f597e4ce2ee36ef8c9f75d0f431a/embedded-pg/PG-a026f08d3ce9851e748092d9c1e5585c/bin/initdb, -A, trust, -U, postgres, -D, /usr/local/google/home/sanjayvas/hg/cmms/bazel-container-output/sandbox/processwrapper-sandbox/1411/execroot/wfa_measurement_system/_tmp/d560f597e4ce2ee36ef8c9f75d0f431a/epg4783638036069027391, -E, UTF-8] failed

	at com.opentable.db.postgres.embedded.EmbeddedPostgres.system(EmbeddedPostgres.java:602)
	at com.opentable.db.postgres.embedded.EmbeddedPostgres.initdb(EmbeddedPostgres.java:221)
	at com.opentable.db.postgres.embedded.EmbeddedPostgres.<init>(EmbeddedPostgres.java:142)
	at com.opentable.db.postgres.embedded.EmbeddedPostgres$Builder.start(EmbeddedPostgres.java:554)
	at com.opentable.db.postgres.junit.SingleInstancePostgresRule.pg(SingleInstancePostgresRule.java:46)
	at com.opentable.db.postgres.junit.SingleInstancePostgresRule.before(SingleInstancePostgresRule.java:39)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at com.google.testing.junit.runner.internal.junit4.CancellableRequestFactory$CancellableRunner.run(CancellableRequestFactory.java:108)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at com.google.testing.junit.runner.junit4.JUnit4Runner.run(JUnit4Runner.java:116)
	at com.google.testing.junit.runner.BazelTestRunner.runTestsInSuite(BazelTestRunner.java:159)
	at com.google.testing.junit.runner.BazelTestRunner.main(BazelTestRunner.java:85)

It looks like some `initdb` command from embedded Postgres is failing.

How to obtain a CLA?

Hi. How do I obtain a CLA as per the contributing guidelines? It's not clear if I need one before I submit a PR.

Thanks!

Redesign public API pagination

Currently, we use a timestamp as the pageToken in various public APIs. This won't work, since multiple rows can have the same commit timestamp.

We need to redesign our pagination strategy. Here are some basic requirements.

  1. The pageToken should contain the original request parameters, so that the user can get the next page using a request that has only pageToken set.
  2. The pageToken should point precisely to the start of the next page. For example, it can contain the ID of the last element in the previous page.
  3. The pageToken should be encoded, or even encrypted if it contains potentially sensitive information.

The redesign also impacts the Kingdom internal StreamXXX methods, since they can no longer use the create_after filter.
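
The first two requirements could be met with a token like the one sketched below. Everything here is an illustrative assumption: the `state` filter, the row shape, and the helper names are hypothetical, and the token is only base64-encoded JSON, not encrypted, so the third requirement would need more work if the parameters are sensitive.

```python
import base64
import json


def encode_page_token(request_params: dict, last_id: int) -> str:
    """Packs the original request parameters and the last returned ID."""
    payload = json.dumps(
        {"params": request_params, "last_id": last_id}, sort_keys=True
    )
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_page_token(token: str):
    """Inverse of encode_page_token."""
    payload = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return payload["params"], payload["last_id"]


def list_rows(rows, state=None, page_size=2, page_token=""):
    """Returns (page, next_page_token) for rows ordered by their unique ID.

    When page_token is set, the filter and page size are taken from the
    token, so a request carrying only the token is enough.
    """
    last_id = None
    if page_token:
        params, last_id = decode_page_token(page_token)
        state, page_size = params["state"], params["page_size"]
    matching = sorted(
        (r for r in rows if state is None or r["state"] == state),
        key=lambda r: r["id"],
    )
    if last_id is not None:
        # Resume strictly after the last element of the previous page.
        matching = [r for r in matching if r["id"] > last_id]
    page = matching[:page_size]
    next_token = ""
    if len(matching) > page_size:
        next_token = encode_page_token(
            {"state": state, "page_size": page_size}, page[-1]["id"]
        )
    return page, next_token
```

Ordering by a unique ID rather than a commit timestamp is what makes the "start of the next page" unambiguous when several rows share a timestamp.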

Mill looping on "Collection of responses completed exceptionally"

It appears to be happening during SETUP_PHASE. I can't find any matching error in Kingdom's system API server, so I'm guessing the UNKNOWN is coming from a computation control server? The computation eventually appears to fail with "Failing computation due to too many failed attempts.".

Stack trace:

java.util.concurrent.CancellationException: Collection of responses completed exceptionally
	at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22)
	at kotlinx.coroutines.JobKt__JobKt.cancel(Job.kt:596)
	at kotlinx.coroutines.JobKt.cancel(Unknown Source)
	at io.grpc.kotlin.HelpersKt.cancelAndJoin(Helpers.kt:47)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invokeSuspend(ClientCalls.kt:325)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invoke(ClientCalls.kt)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$2.invoke(ClientCalls.kt)
	at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:89)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:166)
	at kotlinx.coroutines.BuildersKt.withContext(Unknown Source)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1.invokeSuspend(ClientCalls.kt:324)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at org.wfanet.measurement.duchy.deploy.common.daemon.mill.liquidlegionsv2.LiquidLegionsV2MillDaemon.run(LiquidLegionsV2MillDaemon.kt:132)
	at org.wfanet.measurement.duchy.deploy.gcloud.daemon.mill.liquidlegionsv2.GcsLiquidLegionsV2MillDaemon.run(GcsLiquidLegionsV2MillDaemon.kt:34)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1919)
	at picocli.CommandLine.access$1100(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
	at picocli.CommandLine.execute(CommandLine.java:2058)
	at org.wfanet.measurement.common.CommandLinesKt.commandLineMain(CommandLines.kt:40)
	at org.wfanet.measurement.duchy.deploy.gcloud.daemon.mill.liquidlegionsv2.GcsLiquidLegionsV2MillDaemonKt.main(GcsLiquidLegionsV2MillDaemon.kt:38)
Caused by: io.grpc.StatusException: UNKNOWN
	at io.grpc.Status.asException(Status.java:550)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:296)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closedInternal(ClientCallImpl.java:751)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closed(ClientCallImpl.java:687)
	at io.grpc.internal.RetriableStream$Sublistener$2.run(RetriableStream.java:853)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
	at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
	at io.grpc.internal.RetriableStream$Sublistener.closed(RetriableStream.java:848)
	at io.grpc.internal.ForwardingClientStreamListener.closed(ForwardingClientStreamListener.java:34)
	at io.grpc.internal.InternalSubchannel$CallTracingTransport$1$1.closed(InternalSubchannel.java:693)
	at io.grpc.internal.AbstractClientStream$TransportState.closeListener(AbstractClientStream.java:459)
	at io.grpc.internal.AbstractClientStream$TransportState.access$400(AbstractClientStream.java:221)
	at io.grpc.internal.AbstractClientStream$TransportState$1.run(AbstractClientStream.java:442)
	at io.grpc.internal.AbstractClientStream$TransportState.deframerClosed(AbstractClientStream.java:278)
	at io.grpc.internal.Http2ClientStreamTransportState.deframerClosed(Http2ClientStreamTransportState.java:31)
	at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:233)
	at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191)
	at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:200)
	at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:445)
	at io.grpc.internal.AbstractClientStream$TransportState.transportReportStatus(AbstractClientStream.java:401)
	at io.grpc.internal.AbstractClientStream$TransportState.inboundTrailersReceived(AbstractClientStream.java:384)
	at io.grpc.internal.Http2ClientStreamTransportState.transportTrailersReceived(Http2ClientStreamTransportState.java:183)
	at io.grpc.netty.NettyClientStream$TransportState.transportHeadersReceived(NettyClientStream.java:341)
	at io.grpc.netty.NettyClientHandler.onHeadersRead(NettyClientHandler.java:372)
	at io.grpc.netty.NettyClientHandler.access$1200(NettyClientHandler.java:91)
	at io.grpc.netty.NettyClientHandler$FrameListener.onHeadersRead(NettyClientHandler.java:940)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:337)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:56)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:476)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:484)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173)
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

Required fields in v2alpha Requisition.DuchyValue may not be set

If a Measurement transitions to the FAILED state without all Duchies having filled their RequisitionParams, required fields in DuchyValue may not be set.

The intent is for such Requisitions to not be visible in the public API. The current implementation filters by internal Measurement state, which misses this case.

Migrate to common-jvm repository import.

The common JVM code has been copied from this repository to the https://github.com/world-federation-of-advertisers/common-jvm repository. The next step is to import that repository and drop the copies in this one. This includes imports and some of the Starlark definitions in the top-level build package. Some Bazel target visibilities may need to be adjusted.

Note that the common-jvm repo has some build definitions and possible imports that aren't actually used by that repo and don't belong there. For example, Bazel rules for CUE aren't common JVM dependencies. These should not be imported from common-jvm, and should eventually be deleted from there.

The mill has a memory issue

In theory, the mill's memory usage should reset to its baseline level between processing two computations. However, during extensive performance testing, I found that the mill's "non-evictable memory" usage keeps increasing as it processes more and more computations. It is unclear where this "non-evictable memory" comes from. Also, the JVM maximum memory is set to 4GB, but the "non-evictable memory" usage can be far more than that, e.g. 10+GB.

We don't think there is a memory leak in either the Kotlin or C++ code, but we need a deeper investigation of this issue at the appropriate time.

Use ktfmt to format Kotlin code

We currently use ktlint as both a linter and formatter. Unfortunately, ktlint does not produce deterministic output (given two code samples that differ only by whitespace, it produces different output). ktfmt does. It's based on google-java-format, and therefore follows principles such as the Rectangle Rule.

In order to continue to use ktlint as a linter, there are a couple of ktlint rules that will need to be disabled in .editorconfig to avoid corner cases where it conflicts with ktfmt.

Duchy should early reject fulfillment request for requisitions at unexpected states.

For example, when an EDP wants to fulfill a requisition that is already locally marked FULFILLED at the duchy, the duchy should confirm the state of the requisition at the kingdom.

  • If it is UNFULFILLED, the duchy should refulfill the requisition, i.e., discarding previously stored data and storing the new data.
  • If it is any other state, the EDP's request should be rejected with an appropriate reason.
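
The decision described above can be sketched as a small function. This is only an illustration: the `Action` enum, the `decide` helper, and the `fetch_kingdom_state` callback are hypothetical names, and state values are treated as plain strings.

```python
from enum import Enum
from typing import Callable, Tuple


class Action(Enum):
    ACCEPT = "accept"        # normal first-time fulfillment
    REFULFILL = "refulfill"  # discard stored data, store the new data
    REJECT = "reject"        # refuse the EDP's request


def decide(
    locally_fulfilled: bool, fetch_kingdom_state: Callable[[], str]
) -> Tuple[Action, str]:
    """Decides how the duchy handles a fulfillment request.

    fetch_kingdom_state is a hypothetical callback returning the
    requisition state recorded at the kingdom. It is only called when the
    requisition is already marked FULFILLED locally.
    """
    if not locally_fulfilled:
        return Action.ACCEPT, ""
    kingdom_state = fetch_kingdom_state()
    if kingdom_state == "UNFULFILLED":
        return Action.REFULFILL, ""
    return Action.REJECT, f"Requisition is in state {kingdom_state} at the kingdom"
```

Passing the kingdom lookup as a callback keeps the extra round trip off the common path where the requisition has not been fulfilled locally.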

Implement Accounts service and resource ownership

Implement the v2alpha public API Accounts service, as well as the relevant methods that involve modifying resource ownership.

The initial version for Milestone 1B need not depend on OpenID Connect. Instead, another identity type (e.g. simple username/password) can be added for creating admin accounts for testing, with the Accounts service not being exposed outside the cluster.

Reporting CLI: Error when filter isn't specified to `event-groups list` command

The --filter option is not required, but an exception is thrown when it's not set. It's incorrectly defined as a lateinit property, which implies it will always eventually be set.

Stack trace:

kotlin.UninitializedPropertyAccessException: lateinit property celFilter has not been initialized
        at org.wfanet.measurement.reporting.service.api.v1alpha.tools.ListEventGroups.run(Reporting.kt:433)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1919)
        at picocli.CommandLine.access$1100(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2159)
        at picocli.CommandLine.execute(CommandLine.java:2058)
        at org.wfanet.measurement.common.CommandLinesKt.commandLineMain(CommandLines.kt:40)
        at org.wfanet.measurement.reporting.service.api.v1alpha.tools.ReportingKt.main(Reporting.kt:497)

No ending reason set for computation

This was spotted in the logs of the Spanner computations server within the "worker2" Duchy on halo-cmm-dev.

WARNING: Stage attempt with primary key (-1008988325, liquid_legions_sketch_aggregation_v2: EXECUTION_PHASE_TWO, 6) did not have an ending reason set when ending computation, setting it to 'CANCELLED'.

Querying the ComputationStageAttempts table for that ComputationId results in the following:

| ComputationId | ComputationStage | Attempt | BeginTime                   | EndTime                     | Details | DetailsJSON                        |
|---------------|------------------|---------|-----------------------------|-----------------------------|---------|------------------------------------|
| -1008988325   | 1                | 1       | 2022-08-30T23:42:09.652017Z | 2022-08-30T23:42:11.454366Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 2                | 1       | 2022-08-30T23:42:11.454366Z | 2022-08-30T23:45:40.688327Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 3                | 1       | 2022-08-30T23:45:41.156663Z | 2022-08-30T23:45:42.171272Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 4                | 1       | 2022-08-30T23:45:42.171272Z | 2022-08-30T23:45:50.158201Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 6                | 1       | 2022-08-30T23:45:50.753601Z | 2022-08-30T23:45:55.385577Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 7                | 1       | 2022-08-30T23:45:55.385577Z | 2022-08-30T23:46:27.052964Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 8                | 1       | 2022-08-30T23:46:27.477582Z | 2022-08-30T23:47:03.349957Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 9                | 1       | 2022-08-30T23:47:03.349957Z | 2022-08-30T23:48:52.269688Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 10               | 1       | 2022-08-30T23:48:53.053464Z | 2022-08-30T23:55:26.668691Z | CAM=    | {"reasonEnded":"LOCK_OVERWRITTEN"} |
| -1008988325   | 10               | 2       | 2022-08-30T23:55:26.668691Z | 2022-08-31T00:05:04.430031Z | CAQ=    | {"reasonEnded":"CANCELLED"}        |
| -1008988325   | 10               | 3       | 2022-08-30T23:56:57.478655Z | 2022-08-31T00:05:04.430031Z | CAQ=    | {"reasonEnded":"CANCELLED"}        |
| -1008988325   | 10               | 4       | 2022-08-30T23:57:12.992429Z | 2022-08-31T00:05:04.430031Z | CAQ=    | {"reasonEnded":"CANCELLED"}        |
| -1008988325   | 10               | 5       | 2022-08-30T23:57:41.693306Z | 2022-08-31T00:05:04.430031Z | CAQ=    | {"reasonEnded":"CANCELLED"}        |
| -1008988325   | 10               | 6       | 2022-08-30T23:58:20.300635Z | 2022-08-31T00:05:04.430031Z | CAQ=    | {"reasonEnded":"CANCELLED"}        |
| -1008988325   | 10               | 7       | 2022-08-31T00:00:21.112744Z | 2022-08-31T00:00:27.892296Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 11               | 1       | 2022-08-31T00:00:27.892296Z | 2022-08-31T00:04:17.380198Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |
| -1008988325   | 12               | 1       | 2022-08-31T00:04:28.110317Z | 2022-08-31T00:05:04.430031Z | CAE=    | {"reasonEnded":"SUCCEEDED"}        |

Querying the Computations table gave the following for ComputationDetailsJson (prettified using jq):

{
  "blobsStoragePrefix": "computation-blob-storage/Ih4-Qg7iAys",
  "endingState": "SUCCEEDED",
  "kingdomComputation": {
    "publicApiVersion": "v2alpha",
    "measurementSpec": "CuoBCAES5QEIrfyPlQUS3AEKzwEKPXR5cGUuZ29vZ2xlYXBpcy5jb20vZ29vZ2xlLmNyeXB0by50aW5rLkVjaWVzQWVhZEhrZGZQdWJsaWNLZXkSiwESRAoECAIQAxI6EjgKMHR5cGUuZ29vZ2xlYXBpcy5jb20vZ29vZ2xlLmNyeXB0by50aW5rLkFlc0djbUtleRICEBAYARgBGiEA2VOaFS7y+xGQvC2VydELNZValk7ASE8EXHny7b0tOX4iIC1a/MlmcWmzimHAUyPjtZfuRLqtlS6qUdkLj8/eb15bGAMQARit/I+VBSABEiAK2jUhaG3CB4sA8IS1t582X7x2aWCdOEr4+lqjqBkDJRIgVi/udYPVVK3ocMFYIggJTC21hMdelowv9jyFiLV79LoSINTgsqFF0GJjrqzzpp+8kdOHgijNPPYLNqdoY9uMZCO3EiDUJg7UQQgGifrpLCazfG5sQTFCv1ckAGxY+3ThDCvjDhIgLjlj+A1Nd/4SfKninOX/2pv4mJkf3/x0lcwcAxIXu+kSILCgiiq0Lb1tNvVCFl5rAMzMYfqzOb8AmbUBN9NAttPfGgUVAACAPyIoChIJmpmZmZmZuT8Rje21oPfGsD4SEgmamZmZmZm5PxGN7bWg98awPg==",
    "measurementPublicKey": {
      "format": "TINK_KEYSET",
      "data": "CK38j5UFEtwBCs8BCj10eXBlLmdvb2dsZWFwaXMuY29tL2dvb2dsZS5jcnlwdG8udGluay5FY2llc0FlYWRIa2RmUHVibGljS2V5EosBEkQKBAgCEAMSOhI4CjB0eXBlLmdvb2dsZWFwaXMuY29tL2dvb2dsZS5jcnlwdG8udGluay5BZXNHY21LZXkSAhAQGAEYARohANlTmhUu8vsRkLwtlcnRCzWVWpZOwEhPBFx58u29LTl+IiAtWvzJZnFps4phwFMj47WX7kS6rZUuqlHZC4/P3m9eWxgDEAEYrfyPlQUgAQ=="
    }
  },
  "liquidLegionsV2": {
    "role": "NON_AGGREGATOR",
    "parameters": {
      "maximumFrequency": 10,
      "liquidLegionsSketch": {
        "decayRate": 12,
        "size": "100000"
      },
      "noise": {
        "reachNoiseConfig": {
          "blindHistogramNoise": {
            "epsilon": 1,
            "delta": 1
          },
          "noiseForPublisherNoise": {
            "epsilon": 1,
            "delta": 1
          },
          "globalReachDpNoise": {
            "epsilon": 0.1,
            "delta": 1e-06
          }
        },
        "frequencyNoiseConfig": {
          "epsilon": 0.1,
          "delta": 1e-06
        }
      },
      "ellipticCurveId": 415
    },
    "participant": [
      {
        "duchyId": "worker2",
        "publicKey": {
          "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
          "element": "Axql5u/MneZ+Me25qE06TKoKDNF3Get5XxtdcIHivzF1"
        },
        "elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQMapebvzJ3mfjHtuahNOkyqCgzRdxnreV8bXXCB4r8xdQ==",
        "elGamalPublicKeySignature": "MEUCIDORmvsjh07/tUI4HrGmONRElUpRThijtBO837moHe8hAiEA8b0mAhLuhkqhYbjmkbJvOr2rBaYSmnzzT78fHQEgSqE=",
        "duchyCertificateDer": "MIIB5TCCAYugAwIBAgIUD78MTB4sT79XxdvNwvBr/qHA5OowCgYIKoZIzj0EAwIwLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApXb3JrZXIyIENBMB4XDTIyMDcyOTE4MTMzOVoXDTMyMDcyNjE4MTMzOVowKTEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRAwDgYDVQQDDAdXb3JrZXIyMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAENwAOHu7k/lpUwdQe0y904QkD0JIvpfty8Sjo9IXVLjq53wqO7bSgmDIt2QYebVThciOd4UKC8doi7xFQXciimKOBjTCBijAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFK9pSQh8B6Q598dGOpiTvvUl16O7MB0GA1UdDgQWBBQPQhRN905ulZIoJnHJNyscGulbXTALBgNVHQ8EBAMCBeAwMAYDVR0RBCkwJ4IaKi53b3JrZXIyLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNIADBFAiEA15agdGryz+9x7uouhDw/Zf2C/xnOUo9OVbQ53EDvdK8CIGpl7DEzpIYItGAGPFBltA5gEepfpL88OrAdZHF8bod3"
      },
      {
        "duchyId": "worker1",
        "publicKey": {
          "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
          "element": "Az6TDc5eiEUB6K7GH1OEqObinexiepQzy//kVxgz31zj"
        },
        "elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQM+kw3OXohFAeiuxh9ThKjm4p3sYnqUM8v/5FcYM99c4w==",
        "elGamalPublicKeySignature": "MEYCIQDzMqDqV1bBhpvZ2sRsozrkpK3bVPtdcwq15/DYaAmLeAIhAIa2zmT9xTl0Gs/0wj/sELa62cu/9ONcvH/ykNugZ6Kd",
        "duchyCertificateDer": "MIIB5TCCAYugAwIBAgIUPYEx9H1BSNg7BAxMpgKP8f6akPAwCgYIKoZIzj0EAwIwLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApXb3JrZXIxIENBMB4XDTIyMDcyOTE4MTUwNFoXDTMyMDcyNjE4MTUwNFowKTEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRAwDgYDVQQDDAdXb3JrZXIxMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEe5z8YtYYtKimvGRnOwpnxZQS2WLepJ/MW9M5AXp79Wg0f2ppOXvXPncuw6ECJDbI034avjfOJLXUF5Zx8SltU6OBjTCBijAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFBBZC7JmE03G1c74SRXJ11Y1eaLpMB0GA1UdDgQWBBRphWYwUOESM+daAXNc+yy4XV3oOjALBgNVHQ8EBAMCBeAwMAYDVR0RBCkwJ4IaKi53b3JrZXIxLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNIADBFAiEAnem13Lo79Nc36ur5U37JoGTULVbZmWwPjklUn+sZPyMCIFe8PWzymm1NJbG/42iMTBs1d9e2Eg/TVM1OB9L1sXtV"
      },
      {
        "duchyId": "aggregator",
        "publicKey": {
          "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
          "element": "A20YcYJacmlE1rWcZWiC7HTi8AUIZzSS1jXBY295j3q4"
        },
        "elGamalPublicKey": "EiEDaxfR8uEsQkf4vOblY6RA8ncDfYEt6zOg9KE5RdiYwpYaIQNtGHGCWnJpRNa1nGVogux04vAFCGc0ktY1wWNveY96uA==",
        "elGamalPublicKeySignature": "MEUCIQC7IaNLHAFOcrEzPQhlfoQTDxWzgKzw0Z08duSPAZ4/+wIgVfTemiYP62WPp4LaLeNCxsd2acma5J6UoeKalqF/+Dc=",
        "duchyCertificateDer": "MIIB7TCCAZSgAwIBAgIUMWePkRD78fxf7XR/FJRI+99c8kYwCgYIKoZIzj0EAwIwLzEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRYwFAYDVQQDDA1BZ2dyZWdhdG9yIENBMB4XDTIyMDcyOTE4MTgzN1oXDTMyMDcyNjE4MTgzN1owLDEVMBMGA1UECgwMSGFsbyBDTU0gRGV2MRMwEQYDVQQDDApBZ2dyZWdhdG9yMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE4Pt+OVbT1ne1+jHHKKMEwYwAA8oHf2wCsl7x8G5LoSQqvVR8V8JKJUNOA5i4z5jBWHGt/1TAZ8A7iYROuZZiFaOBkDCBjTAJBgNVHRMEAjAAMB8GA1UdIwQYMBaAFEetV5EszT17lLVF0r4f7V52atw5MB0GA1UdDgQWBBRYH+y724vCS2A9KdGUevcPWIwtrTALBgNVHQ8EBAMCBeAwMwYDVR0RBCwwKoIdKi5hZ2dyZWdhdG9yLmRldi5oYWxvLWNtbS5vcmeCCWxvY2FsaG9zdDAKBggqhkjOPQQDAgNHADBEAiA44QCvuMHdmYMq5ROUmPA5XnxWKtbX3bl+Wb8mMCL3mQIgQJGcGkMPOqy274MNK8DTlNLqyVNO0IDdQVSgpaE8WZQ="
      }
    ],
    "combinedPublicKey": {
      "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
      "element": "A8leFaNR9+Vw/NUfBkGksb4di3Wb0pTeGP1Zg2uVgN+B"
    },
    "partiallyCombinedPublicKey": {
      "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
      "element": "AlyNNsjQuLnaDAEx62NGjOKKdmyd7VnSd4ewrcJ/0DcA"
    },
    "localElgamalKey": {
      "secretKey": "osvxlSP1mb8apAAH991QndvgVm/ZbjPnfJGQ53bOm30=",
      "publicKey": {
        "generator": "A2sX0fLhLEJH+Lzm5WOkQPJ3A32BLeszoPShOUXYmMKW",
        "element": "Axql5u/MneZ+Me25qE06TKoKDNF3Get5XxtdcIHivzF1"
      }
    }
  }
}

Reporting Server: Incorrect format used for MC filtering

Report from @mariolamassaavedra:

We are hitting Halo's Dev using listEventGroups and getting the following error:

Measurement Consumer Name in filter invalid.

This is because the Reporting server calls the kingdom's listEventGroups method with the filter field set to the bare ID, filter: ex87HOXdfPY (sample id), instead of the resource name, filter: measurementConsumers/ex87HOXdfPY, as the kingdom's API definition requires. We have deployed our own version of the code in our environment and the call works correctly.

Herald retries computation start/update unconditionally.

Retries for start/update were added in f02986e to deal with a potential race condition. This was prior to the codebase being open-sourced, so the following is copied from the Google-internal issue b/168551392:

The Herald has a race condition:

  1. the mill in the last unconfirmed duchy checks the local requisition and finds all are available
  2. the mill confirms to the kingdom
  3. the mill updates local computation to WAIT_TO_START
  4. the kingdom updates the computation to RUNNING and sends out the status change to all connected heralds
  5. the herald tries to update the local computation to TO_ADD_NOISE

Unfortunately, this retry is unconditional on any exception. syncStatuses also wraps the higher-level processSystemComputationChange in retries, but those check the gRPC status to determine if the operation is retriable. This would likely be insufficient for the start/update case as it's a state mismatch.
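
One way to make the retry conditional is to have the caller supply a predicate that decides which exceptions are worth retrying, as in the sketch below. The `retry` helper and its parameters are hypothetical and not taken from the codebase; deciding which exceptions actually indicate the state-mismatch race is the open question in this issue.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    is_retriable: Callable[[Exception], bool],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Runs operation, retrying only when is_retriable(exception) is true.

    Any other exception, or the exception from the final attempt, is
    re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as e:
            if attempt == max_attempts or not is_retriable(e):
                raise
            sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

With this shape, the start/update path could pass a predicate that recognizes only the expected state mismatch, while unrelated failures surface immediately instead of being retried.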

editVersion mismatch in RecordOutputBlobPath

Digging into an error that I sometimes (but not always) see popping up in the integration test. It shows up as a TRANSIENT error in the LLv2 Mill. The source appears to be GcpSpannerComputationsDatabaseTransactor. Adding more logging gives the following:

wfa.measurement.internal.duchy.Computations RecordOutputBlobPath
WARNING: [DefaultDispatcher-worker-2 @coroutine#3852] gRPC error: UNKNOWN
java.lang.IllegalStateException: Failed to update because of editVersion mismatch.
  Token's editVersion: 1662062844075 (2022-09-01T20:07:24.075Z)
  Computations table's UpdateTime: 1662062846260 (2022-09-01T20:07:26.260354Z)
  Difference: PT2.185354S
	at org.wfanet.measurement.duchy.deploy.gcloud.spanner.computation.GcpSpannerComputationsDatabaseTransactor$runIfTokenFromLastUpdate$2.invokeSuspend(GcpSpannerComputationsDatabaseTransactor.kt:690)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Ensure that 0 is never used as valid integer ID

Our integer ID fields utilize the proto3 default value of 0 as equivalent to not being specified. Therefore, we must ensure that we never consider 0 to be a valid ID.

We could ensure that IdGenerator impls never return 0, or make sure internal service impls never insert a row with a 0 ID (including adding tests for this).
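Both options could look roughly like the sketch below. The `IdGenerator` interface here is a simplified, hypothetical stand-in (the real interface may have different method names), and `NonZeroIdGenerator` and `requireValidId` are illustrative names only.

```kotlin
/** Hypothetical, simplified stand-in for the project's IdGenerator interface. */
interface IdGenerator {
  fun generateId(): Long
}

/** Wraps [delegate], drawing again whenever it produces the proto3 default value 0. */
class NonZeroIdGenerator(private val delegate: IdGenerator) : IdGenerator {
  override fun generateId(): Long {
    while (true) {
      val id = delegate.generateId()
      if (id != 0L) return id
    }
  }
}

/** Guard that a service implementation could call before inserting a row. */
fun requireValidId(id: Long): Long {
  require(id != 0L) { "0 is reserved to mean 'ID not specified'" }
  return id
}
```

The wrapper covers the generator side, while the guard gives service implementations a single place to test against.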

Handle PBM fields that are not populated.

If an MC creates an event filter that omits a PBM field (for example, one with no age restriction), PBM should interpret this as matching every value of that field and charge the matching rows for all ages. This is not currently done.

SpannerWriter.execute takes in a misleading clock parameter.

The execute method of SpannerWriter has a misleading clock parameter, which is used to construct a SpannerWriter.TransactionScope. It neither sets the timestamp bound of the transaction nor has any impact on commit timestamps. Instead, it appears to be used only to populate the timestamps in ExchangeStepAttemptDetails.

This parameter should be dropped. It can be passed to the subclass constructor in cases where it's really needed.
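A rough sketch of the proposed shape, using simplified hypothetical classes (`Writer` and `FinishAttemptWriter` are stand-ins, not the actual SpannerWriter API):

```kotlin
import java.time.Clock
import java.time.Instant

/** Hypothetical, simplified stand-in for SpannerWriter: execute() takes no clock. */
abstract class Writer<T> {
  protected abstract fun runTransaction(): T

  fun execute(): T = runTransaction()
}

/** A writer that really needs wall-clock time receives the clock at construction. */
class FinishAttemptWriter(private val clock: Clock) : Writer<Instant>() {
  // The clock is only used to populate a timestamp in the written details.
  override fun runTransaction(): Instant = clock.instant()
}
```

Tests can then inject a fixed clock through the constructor, and writers that don't need the time never see a clock at all.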

KotlinCompile action warns of runtime JARs in classpath with incompatible versions

Excerpt from build log:

INFO: From KotlinCompile @com_github_grpc_grpc_kotlin//compiler/src/main/java/io/grpc/kotlin/generator/protoc:protoc { kt: 18, java: 0, srcjars: 0 } for k8 [for host]:
warning: runtime JAR files in the classpath should have the same version. These files were found in the classpath:
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk7.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk8.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-common/1.4.30/kotlin-stdlib-common-1.4.30.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib/1.4.30/kotlin-stdlib-1.4.30.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.61/kotlin-reflect-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-script-runtime.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-reflect.jar (version 1.4)
warning: consider providing an explicit dependency on kotlin-reflect 1.4 to prevent strange errors
warning: some runtime JAR files in the classpath have an incompatible version. Consider removing them from the classpath
INFO: From KotlinCompile @com_github_grpc_grpc_kotlin//compiler/src/main/java/io/grpc/kotlin/generator:generator { kt: 6, java: 0, srcjars: 0 } for k8 [for host]:
warning: runtime JAR files in the classpath should have the same version. These files were found in the classpath:
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk7.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-stdlib-jdk8.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib-common/1.4.30/kotlin-stdlib-common-1.4.30.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-stdlib/1.4.30/kotlin-stdlib-1.4.30.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/maven/v1/https/repo.maven.apache.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.61/kotlin-reflect-1.3.61.jar (version 1.3)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-script-runtime.jar (version 1.4)
    /usr/local/google/home/sanjayvas/.cache/bazel/_bazel_sanjayvas/b471b3a2d1215b70045d7f5bfa478e3e/execroot/wfa_measurement_system/external/com_github_jetbrains_kotlin/lib/kotlin-reflect.jar (version 1.4)
warning: consider providing an explicit dependency on kotlin-reflect 1.4 to prevent strange errors
warning: some runtime JAR files in the classpath have an incompatible version. Consider removing them from the classpath

Remove all kingdom daemons.

With the v2alpha design, the Kingdom will only contain a list of gRPC services, so those daemons are no longer needed.

Kingdom data server running out of heap space

This may indicate a memory leak, as the Kingdom data server doesn't handle large payloads. This persisted even after increasing container memory to 512MiB and increasing the heap percentage to 40.

Sample traces:

com.google.cloud.spanner.SpannerException: UNKNOWN: Java heap space
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:291)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:297)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:61)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:181)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:110)
	at com.google.cloud.spanner.SpannerExceptionFactory.asSpannerException(SpannerExceptionFactory.java:100)
	at com.google.cloud.spanner.AbstractResultSet$GrpcResultSet.next(AbstractResultSet.java:137)
	at com.google.cloud.spanner.ForwardingResultSet.next(ForwardingResultSet.java:54)
	at com.google.cloud.spanner.SessionPool$AutoClosingReadContext$1.internalNext(SessionPool.java:273)
	at com.google.cloud.spanner.SessionPool$AutoClosingReadContext$1.next(SessionPool.java:253)
	at com.google.cloud.spanner.AsyncResultSetImpl$ProduceRowsCallable.call(AsyncResultSetImpl.java:340)
	at com.google.cloud.spanner.AsyncResultSetImpl$ProduceRowsCallable.call(AsyncResultSetImpl.java:334)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
	at com.google.protobuf.UnknownFieldSet$Builder.<init>(UnknownFieldSet.java:308)
	at com.google.protobuf.UnknownFieldSet$Builder.create(UnknownFieldSet.java:311)
	at com.google.protobuf.UnknownFieldSet$Builder.access$000(UnknownFieldSet.java:304)
	at com.google.protobuf.UnknownFieldSet.newBuilder(UnknownFieldSet.java:71)
	at com.google.protobuf.Value.<init>(Value.java:50)
	at com.google.protobuf.Value.<init>(Value.java:17)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
	at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
	at com.google.protobuf.ListValue.<init>(ListValue.java:64)
	at com.google.protobuf.ListValue.<init>(ListValue.java:14)
	at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)
	at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:849)
	at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
	at com.google.protobuf.Value.<init>(Value.java:101)
	at com.google.protobuf.Value.<init>(Value.java:17)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
	at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
	at com.google.protobuf.ListValue.<init>(ListValue.java:64)
	at com.google.protobuf.ListValue.<init>(ListValue.java:14)
	at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)
	at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:849)
	at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
	at com.google.protobuf.Value.<init>(Value.java:101)
	at com.google.protobuf.Value.<init>(Value.java:17)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1635)
	at com.google.protobuf.Value$1.parsePartialFrom(Value.java:1629)
	at com.google.protobuf.CodedInputStream$ArrayDecoder.readMessage(CodedInputStream.java:889)
	at com.google.protobuf.ListValue.<init>(ListValue.java:64)
	at com.google.protobuf.ListValue.<init>(ListValue.java:14)
	at com.google.protobuf.ListValue$1.parsePartialFrom(ListValue.java:855)

Herald should be able to fail a computation if necessary.

Currently, the herald just processes Kingdom updates one by one. If it fails to process a computation, it retries. There is no way to skip a computation that fails indefinitely.

We need to allow the herald to permanently fail a computation locally and, if necessary, report the failure to the Kingdom.
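As a minimal sketch of one possible approach, the herald could count failed attempts per computation and give up after a limit. Everything here is hypothetical: `AttemptTracker`, `Outcome`, and the `onGiveUp` callback (which stands in for failing the computation locally and notifying the Kingdom) are not existing names in the codebase.

```kotlin
/** Outcome of handling one computation update. */
enum class Outcome { PROCESSED, WILL_RETRY, FAILED_PERMANENTLY }

/** Hypothetical tracker: counts failures per computation and gives up after [maxAttempts]. */
class AttemptTracker(private val maxAttempts: Int) {
  private val failures = mutableMapOf<String, Int>()

  fun handle(
    computationId: String,
    process: (String) -> Unit,
    onGiveUp: (String) -> Unit
  ): Outcome {
    try {
      process(computationId)
      failures.remove(computationId)
      return Outcome.PROCESSED
    } catch (e: Exception) {
      val count = (failures[computationId] ?: 0) + 1
      if (count < maxAttempts) {
        failures[computationId] = count
        return Outcome.WILL_RETRY
      }
      // Limit reached: stop retrying and let the caller fail the computation.
      failures.remove(computationId)
      onGiveUp(computationId)
      return Outcome.FAILED_PERMANENTLY
    }
  }
}
```

A computation that keeps failing then no longer blocks the ones queued behind it.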

Switch Kotlin language version to 1.4

The system is currently using Kotlin language version 1.3. More recent versions of rules_kotlin support language version 1.4.

See Kotlin release details for the appropriate matching version of kotlinx.coroutines.

  • Update to release of rules_kotlin that supports language version 1.4.
  • Switch to language version 1.4 with appropriate library versions (e.g. kotlinx.coroutines).
  • Remove unnecessary annotations for standard/extension library types that are no longer in preview/experimental status.

SpannerReader doesn't handle external keys with multiple components.

SpannerReader has functionality for reading by external ID that assumes that the row in question has exactly one ID component in its external key. This is not true for child tables, as the external ID is only unique for a given parent. That is to say that keys made up of external IDs have the same parent-child relationship as primary keys with multiple components.

Most likely, SpannerReader should just be dropped. It has some other ugliness, such as:

  • Having an unused non-null version of readExternalId which doesn't make sense with how Kotlin handles nullability.
  • Having a withBuilder method that returns the same SpannerReader instance after mutation, as opposed to following the common pattern of withX methods returning a copy.
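For illustration only, a replacement could model a child row's external key explicitly and keep query objects immutable. `ChildExternalKey` and `ReadQuery` below are hypothetical names, and the table name in the usage note is just an example.

```kotlin
/** Hypothetical external key for a child row: unique only together with its parent's ID. */
data class ChildExternalKey(val externalParentId: Long, val externalChildId: Long)

/** Hypothetical immutable query description; withX methods return modified copies. */
data class ReadQuery(val table: String, val limit: Int? = null) {
  // Returns a new instance and leaves the receiver unchanged.
  fun withLimit(newLimit: Int): ReadQuery = copy(limit = newLimit)
}
```

Calling `ReadQuery("ExchangeSteps").withLimit(10)` leaves the original query untouched, which is the behavior callers usually expect from a withX method.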

readinessProbe is not working with mTLS configured.

I tried to follow these instructions and updated the ServerPod to:

#ServerPod: #Pod & {
	_ports: [{containerPort: 8080}]
	spec: containers: [{
		readinessProbe: {
			exec: command: [
				"/app/grpc_health_probe/file/grpc-health-probe",
				"--addr=:8080",
				"--tls=true",
				"--tls-ca-cert=/var/run/secrets/files/all_root_certs.pem",
				"--tls-client-cert=/var/run/secrets/files/aggregator_server.pem",
				"--tls-client-key=/var/run/secrets/files/aggregator_server.key",
				"--connect-timeout=10s",
				"--rpc-timeout=10s",
			]
			periodSeconds: 60
			timeoutSeconds: 10
		}}]
}

The Kingdom servers become READY, but the Duchy servers do not.

NAME                                              READY   STATUS      RESTARTS   AGE
aggregator-async-computation-control-server-pod   0/1     Running     0          74s
aggregator-computation-control-server-pod         0/1     Running     0          74s
aggregator-requisition-fulfillment-server-pod     0/1     Running     0          73s
aggregator-spanner-computations-server-pod        0/1     Running     0          73s
gcp-kingdom-data-server-pod                       1/1     Running     0          75s
system-api-server-pod                             1/1     Running     0          75s
worker-1-async-computation-control-server-pod     0/1     Running     0          72s
worker-1-computation-control-server-pod           0/1     Running     0          72s
worker-1-requisition-fulfillment-server-pod       0/1     Running     0          72s
worker-1-spanner-computations-server-pod          0/1     Running     0          72s
worker-2-async-computation-control-server-pod     0/1     Running     0          71s
worker-2-computation-control-server-pod           0/1     Running     0          71s
worker-2-requisition-fulfillment-server-pod       0/1     Running     0          71s
worker-2-spanner-computations-server-pod          0/1     Running     0          71s

Error info:

Readiness probe errored: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 10s exceeded: context deadline exceeded

I have no idea why the Kingdom and Duchy servers behave differently.

I've temporarily removed readinessProbe from all pods to unblock the correctnessTest, and will revisit this later.

Try `-XX:MaxRAMPercentage` option for JVM in K8s

As of Java 10 (and backported to later versions of Java 8), the UseContainerSupport option is on by default. This means that we can set the max heap size to be a fraction of the container memory limit rather than having to manually ensure that the JVM max heap size is sufficiently smaller than the container memory limit.

We likely want -XX:InitialRAMPercentage to be the same value, for the same reasons that it's recommended to have -Xms be the same value as -Xmx.
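To check what the options actually do inside a container, a small helper can report the heap limit the JVM ended up with. This is just a sketch and `maxHeapMiB` is a hypothetical name. `Runtime.maxMemory()` is the standard JDK call that reports the maximum heap the JVM will attempt to use.

```kotlin
/** Returns the JVM's effective maximum heap size in MiB, as reported by the runtime. */
fun maxHeapMiB(): Long = Runtime.getRuntime().maxMemory() / (1024L * 1024L)
```

Logging this value at server startup, with and without `-XX:MaxRAMPercentage` set, should show whether the container memory limit is being picked up.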
