spacemeshos / post
Spacemesh POST protocol implementation
License: MIT License
There is a big demand for postcli. We should include postcli as a binary so users don't have to compile it by hand.
When, say, one is renting GPUs for initialisation or is not using storage on the same device, it is very inconvenient to have to hope that the process never restarts.
We need a way to specify --startFrom xyz, where xyz is file,labels.
The current implementation is limited by Golang possibilities #99
We have done a PoC version in Rust.
We need to follow up with a full Rust implementation of the algorithm and then integrate it into the Golang codebase.
Besides the implementation, we need to make sure that we remember all the needed improvements, that is (but not limited to):
To guarantee that proof is persisted we need to:
Otherwise, if the node crashes proof might be lost (depends if kernel was able to sync it or not in the background).
Support building a stand-alone system process. In this mode, the process provides a GRPC interface for creating post init jobs which should include the single provider and multiple providers job. The CI should build this stand-alone process for all supported platforms / architectures (currently 3 targets).
Node ID and commitment ATX ID are formatted in base64 both in postdata_metadata.json and in post-rs CLI initializer. However, postcli accepts them in hex. They are also formatted in hex when logged in go-spacemesh. It is inconsistent and causes extra work to convert.
We should format ATXes the same way everywhere (either hex or base64; this is still to be decided).
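Until the formats are unified, the conversion between the two representations can be sketched as below. This is only an illustration: the helper names are made up for this example, and it assumes the base64 values use the standard padded alphabet (which is what Go's encoding/json produces for []byte fields).

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// base64ToHex converts an ID from standard padded base64 to lowercase hex.
// Hypothetical helper, not part of postcli or post-rs.
func base64ToHex(s string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", fmt.Errorf("decode base64: %w", err)
	}
	return hex.EncodeToString(raw), nil
}

// hexToBase64 converts an ID from hex to standard padded base64.
// Hypothetical helper, not part of postcli or post-rs.
func hexToBase64(s string) (string, error) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		return "", fmt.Errorf("decode hex: %w", err)
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	// Node ID taken from the logs in this document (hex form).
	nodeID := "02682d8d0eefae9596c65a9d9c4fac73576a6ecaaae236e167b4e7329512c7b9"

	b64, err := hexToBase64(nodeID)
	if err != nil {
		panic(err)
	}
	back, err := base64ToHex(b64)
	if err != nil {
		panic(err)
	}
	fmt.Println(b64)
	fmt.Println(back == nodeID)
}
```

Whichever format is chosen, a round trip like this should be lossless, since both encodings represent the same raw bytes.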
We need to make using multiple (100s of thousands) nonces in proving harder. Therefore we need to add POW to the initial phase of AES.
quoting Iddo:
the scrypt+blake3 hash for k2pow will require that either all miners use GPU
for proof generation, or lousy k1,k2 params and large ATX size for no good reason.
The scrypt+sha3 that we use for labels is the worst thing that we can do because
it has two extra disadvantages (efficient adversarial method for label computation
will also break k2pow for free, and sha3 is inferior in general and we shouldn't use
it in software ever).
We should just link to standard randomx repo (both rust crate and golang package
are available), it's easy to do in 5 minutes because it's totally unrelated to the GPU
code, it's probably even easier than having two separate scrypt+sha3 functions
and definitely easier than scrypt+blake3.
Proof generation and verification should be able to be cancelled for multiple reasons:
VerifyVRF, Verify and Generate can be passed a context that, when cancelled, aborts verification / generation of the proof.
Generate is passed a context with a timeout at the end of the cycle gap.
Verify and VerifyVRF are passed the "App-Context" that is canceled when the node is shut down.
VerifyVRF: a passed context.Context that is checked before calling the oracle is sufficient.
Verify: a passed context.Context that is checked before calling the underlying Rust code is sufficient as well.
Generate already receives a context.Context that is not evaluated at the moment. A simple goroutine that signals the Rust code to stop when needed should be enough:

    // returns a wrapper for the object from Rust;
    // the FFI it wraps still needs to be defined
    generator := postrs.Generate()
    defer generator.Stop()

    done := make(chan struct{})
    var eg errgroup.Group
    eg.Go(func() error {
        select {
        case <-done:
            // generation finished on its own
            return nil
        case <-ctx.Done():
            // context cancelled: tell the Rust code to abort
            return generator.Stop()
        }
    })

    proof, err := generator.Start()
    close(done)
    // wait for the watcher so an error from Stop is not lost
    if stopErr := eg.Wait(); err == nil {
        err = stopErr
    }
    return proof, err
> 2023-06-14T12:27:43.666+0100 INFO 02682.post proving: generated proof {"node_id": "02682d8d0eefae9596c65a9d9c4fac73576a6ecaaae236e167b4e7329512c7b9", "module": "post"}
2023-06-14T12:27:43.666+0100 DEBUG 02682.post proving: generated proof {"node_id": "02682d8d0eefae9596c65a9d9c4fac73576a6ecaaae236e167b4e7329512c7b9", "module": "post", "Nonce": 8, "Indices": "de76780728c2a9103883264000533cf1a0497ef9e2bd4fe5466fd12b0e912dd71cf693ec3fa0caff833896b621f12388464276b9bdc450c77d0a301a7f1945785236f105cd7742a90d0949c8a813ba4a455b74bb75e0a8f560ea116bbb54a7442cfe4f1117e7acce9ab45b994047c1d295b3a2e598411c6cecf65c581f56af33e931276dbe50cedd68127bd2c1f0b3b96300d574b78ae9ac0f", "K2PoW": 5764607523034254532}
2023-06-14T12:55:10.148+0100 DEBUG 02682.post Initializing labels 1329758894..1329758895... {"node_id": "02682d8d0eefae9596c65a9d9c4fac73576a6ecaaae236e167b4e7329512c7b9", "module": "post", "module": "post::initialize", "file": "src\\initialize.rs", "line": 108}
2023-06-14T12:55:10.149+0100 DEBUG 02682.post Initializing labels 2288144977..2288144978... {"node_id": "02682d8d0eefae9596c65a9d9c4fac73576a6ecaaae236e167b4e7329512c7b9", "module": "post", "module": "post::initialize", "file": "src\\initialize.rs", "line": 108}
This is what is logged during nipost. The "Initializing labels" messages should be removed because they are confusing.
Testing on the Windows image fails with errors:
==2832==ERROR: ThreadSanitizer failed to allocate 0x000000999000 (10063872) bytes at 0x200db79ba0000 (error code: 87)
A possible solution is to add these rows before running the unit tests:
- name: Install mingw 10.2.0
  if: matrix.os == 'windows-latest'
  run: choco install mingw --version 10.2.0 --allow-downgrade
c:/programdata/chocolatey/lib/mingw/tools/install/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.2.0/../../../../x86_64-w64-mingw32/lib/../lib/libmsvcrt.a(/2203): duplicate symbol reference: _unlock_file in both libgcc(.text) and libgcc(.data)
libgcc(.text): relocation target atexit not defined
The package structure of PoST is quite confusing and could be improved for the next major release. I propose the following layout for clearer naming of types and generally reducing the number of individual packages a user of the library has to interact with:
post: The root package containing the module definition, the initializer, all types and functions associated with it, and everything that is currently in config and shared and doesn't fit in one of the following packages.
post/internal: Internal functionality not to be used directly by users of this library. This includes the bridges to the C and Rust code and possibly other internal types and functions not intended to be available outside of the module.
post/internal/oracle: The oracle should be an internal package since its use outside of generating and verifying proofs is limited, and users of the library should call those functions rather than use the oracle directly.
post/proof: Contains the Generate and Verify functions, as well as the Proof and Metadata types from shared.
With this structure it is unlikely that a user of the library has to import more than one package at a time, while at the moment proving and verifying always also require importing config and shared, both of which collide with packages of the same name in other modules. post.Config is more meaningful than config.Config, and proving.Generate / verifying.Verify are less clear than proof.Generate / proof.Verify.
There is also a clear dependency chain between the packages of this structure:
post/proof -> post -> post/internal
The former packages only import the latter and not vice versa.
The goal is to replace the existing gpu-post integration with the newly written OpenCL initializer.
The gpu-post integration is disabled, not removed (in case it is still needed). It is OK if switching between OpenCL and gpu-post requires a rebuild of post.
gpu-post
Multiple miners in 0.2 devnet are leaking goroutines:
goroutine profile: total 598
323 @ 0x43c745 0x407d0a 0x407ab5 0xbe1d87 0x472161
# 0xbe1d86 github.com/spacemeshos/post/proving.(*Prover).tryNonces.func1+0x66 /go/pkg/mod/github.com/spacemeshos/[email protected]/proving/proving.go:263
30 @ 0x43c745 0x44c6cf 0xe222f4 0x472161
# 0xe222f3 github.com/spacemeshos/go-spacemesh/p2p/net.(*MsgConnection).sendListener+0xf3 /go/src/github.com/spacemeshos/go-spacemesh/p2p/net/msgcon.go:157
23 @ 0x43c745 0x44c6cf 0xe2ce05 0xe23bea 0x472161
# 0xe2ce04 github.com/spacemeshos/go-spacemesh/p2p/net.(*udpConnWrapper).Read+0xc4 /go/src/github.com/spacemeshos/go-spacemesh/p2p/net/udp.go:402
# 0xe23be9 github.com/spacemeshos/go-spacemesh/p2p/net.(*MsgConnection).beginEventProcessing+0x89 /go/src/github.com/spacemeshos/go-spacemesh/p2p/net/msgcon.go:265
Stack frames are accurate for current head of develop (5cf91a7)
We wish to be able to run the "initialize" step of post from a CLI.
The CLI should be able to receive 3 arguments, as follows:
Example:
./post-init --space=104876 --filesize=262144
This will create init files of size 262144 bytes each in the default home dir
Return value: exit code 0 on success.
Add a feature to verify existing post data. The user specifies the post params (including the smesher id) and the method should run tests to verify that the post is valid for the smesher. The motivation is to enable users to verify that a post they created is valid without having to wait for the node to use it and log errors in case it is invalid.
For more information please check the logs here:
Sentry issue on Windows: SMAPP-14S
Sentry issue on Linux: SMAPP-15D
2023-01-06T10:51:05.417-0800 WARN d898e.hare missed hare window, skipping layer {"node_id": "d898e00cca6e1549a760df69947bc63c544313f30f8800e1ed0f48e5f9c7d160", "module": "hare", "layer_id": 5530, "name": "hare"}
2023-01-06T10:51:05.417-0800 INFO 00000.defaultLogger starting new grpc server with 7 registered service(s)
2023-01-06T10:51:05.417-0800 INFO app started
2023-01-06T10:51:05.417-0800 INFO 00000.defaultLogger starting new grpc server on :9092
2023-01-06T10:51:06.256-0800 INFO 00000.defaultLogger GRPC MeshService.CurrentLayer
2023-01-06T10:51:06.261-0800 INFO 00000.defaultLogger GRPC GlobalStateService.AccountDataQuery
2023-01-06T10:51:06.268-0800 INFO 00000.defaultLogger GRPC NodeService.Status
2023-01-06T10:51:06.270-0800 INFO 00000.defaultLogger GRPC SmesherService.SmesherID
2023-01-06T10:51:06.297-0800 INFO 00000.defaultLogger GRPC NodeService.Echo
2023-01-06T10:51:06.299-0800 INFO 00000.defaultLogger GRPC NodeService.Status
2023-01-06T10:51:06.301-0800 INFO 00000.defaultLogger GRPC NodeService.StatusStream
2023-01-06T10:51:06.301-0800 INFO 00000.defaultLogger GRPC NodeService.ErrorStream
2023-01-06T10:51:06.304-0800 INFO 00000.defaultLogger GRPC SmesherService.PostSetupStatus
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xa0 pc=0x107b105]
goroutine 272 [running]:
github.com/spacemeshos/post/initialization.(*Initializer).NumLabelsWritten(...)
/home/runner/go/pkg/mod/github.com/spacemeshos/[email protected]/initialization/initialization.go:302
github.com/spacemeshos/go-spacemesh/activation.(*PostSetupManager).Status(0xc0003e2480)
/home/runner/work/go-spacemesh/go-spacemesh/activation/post.go:114 +0xa5
github.com/spacemeshos/go-spacemesh/api/grpcserver.SmesherService.PostSetupStatus({{0x1b7b210, 0xc0003e2480}, {0x1b7f400, 0xc001258d20}, 0x3b9aca00}, {0xc0013f2510, 0xf}, 0x1c)
/home/runner/work/go-spacemesh/go-spacemesh/api/grpcserver/smesher_service.go:173 +0x59
github.com/spacemeshos/api/release/go/spacemesh/v1._SmesherService_PostSetupStatus_Handler.func1({0x1b7a138, 0xc000472c90}, {0x157a960?, 0xc000472c00})
/home/runner/go/pkg/mod/github.com/spacemeshos/api/release/[email protected]/spacemesh/v1/smesher.pb.go:646 +0x78
github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1({0x1b7a138, 0xc000472c60}, {0x157a960, 0xc000472c00}, 0xc000946680, 0xc000111758)
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/logging/zap/server_interceptors.go:31 +0x115
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x1b7a138?, 0xc000472c60?}, {0x157a960?, 0xc000472c00?})
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware/tags.UnaryServerInterceptor.func1({0x1b7a138?, 0xc000472bd0?}, {0x157a960, 0xc000472c00}, 0xc000946680, 0xc0009466a0)
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/tags/interceptors.go:23 +0xa6
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x1b7a138?, 0xc000472bd0?}, {0x157a960?, 0xc000472c00?})
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x1b7a138, 0xc000472bd0}, {0x157a960, 0xc000472c00}, 0xc001569a20?, 0x14e7420?)
/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xbe
github.com/spacemeshos/api/release/go/spacemesh/v1._SmesherService_PostSetupStatus_Handler({0x167b2a0?, 0xc001431530}, {0x1b7a138, 0xc000472bd0}, 0xc00027b030, 0xc001431050)
/home/runner/go/pkg/mod/github.com/spacemeshos/api/release/[email protected]/spacemesh/v1/smesher.pb.go:648 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0017a63c0, {0x1b846a0, 0xc000007d40}, 0xc000f17200, 0xc0014315c0, 0x252c3b8, 0x0)
/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1340 +0xd23
google.golang.org/grpc.(*Server).handleStream(0xc0017a63c0, {0x1b846a0, 0xc000007d40}, 0xc000f17200, 0x0)
/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1713 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:963 +0x28a
2023-01-06T10:59:55.009-0800 INFO 00000.defaultLogger App version: v0.2.20-beta.0. Git: 8c5a399 - 8c5a3991491b3663973766942d1b7e06fc194af8 . Go Version: go1.19.4. OS: linux-amd64
2023-01-06T10:59:55.009-0800 INFO 00000.defaultLogger Welcome to Spacemesh. Spacemesh full node is starting...
The postcli can be used to initialize only a subset of data using the -fromFile and -toFile flags (for example when re-initializing corrupted/missing data on a different machine or splitting initialization between machines).
The current behavior is to continue looking for a VRF nonce (which is very likely to happen if initializing only a subset). It should be possible to disable looking for a VRF nonce past the requested size (either automatically when the -fromFile or -toFile flag is used, or via a flag like -disableEagerVrfSearch or something like that).
Environment: MacBook Pro M1.
Consider the following logs:
3476:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: file #28 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23477:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: starting to write file #29; target number of labels: 625000, start position: 18125000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23478:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: file #29 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23479:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: starting to write file #30; target number of labels: 625000, start position: 18750000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23480:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: file #30 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23481:2023-03-29T15:01:06.595+0200 INFO 7b957.app.7b957.post initialization: starting to write file #31; target number of labels: 625000, start position: 19375000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
[cut cut cut all was the same]
23600:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: file #90 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23601:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: starting to write file #91; target number of labels: 625000, start position: 56875000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23602:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: file #91 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23603:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: starting to write file #92; target number of labels: 625000, start position: 57500000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23604:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: file #92 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23605:2023-03-29T15:01:06.605+0200 INFO 7b957.app.7b957.post initialization: starting to write file #93; target number of labels: 625000, start position: 58125000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23606:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #93 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23607:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #94; target number of labels: 625000, start position: 58750000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23608:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #94 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23609:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #95; target number of labels: 625000, start position: 59375000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23610:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #95 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23611:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #96; target number of labels: 625000, start position: 60000000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23612:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #96 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23613:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #97; target number of labels: 625000, start position: 60625000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23614:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #97 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23615:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #98; target number of labels: 625000, start position: 61250000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23616:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #98 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23617:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #99; target number of labels: 625000, start position: 61875000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23618:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: file #99 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23619:2023-03-29T15:01:06.606+0200 INFO 7b957.app.7b957.post initialization: starting to write file #100; target number of labels: 414560, start position: 62500000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23620:2023-03-29T15:01:06.607+0200 INFO 7b957.app.7b957.post initialization: file #100 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
23621:2023-03-29T15:01:06.607+0200 INFO 7b957.app.7b957.post post setup completed {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "commitment_atx": "0657ad4e89", "data_dir": "../pos_data/0", "num_units": "3", "labels_per_unit": "20971520", "provider": "1", "name": "post"}
23622:2023-03-29T15:01:06.607+0200 ERROR 7b957.app.7b957.atxBuilder Failed to generate proof: %!w(*fmt.wrapError=&{post execution: generate proof: not completed 0x14000c556c0}) {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "atxBuilder"}
When I restart the node, it starts to initialize again with the following log:
46428:2023-03-29T15:11:29.205+0200 INFO 7b957.app.7b957.post initialization: file #17 already initialized; number of labels: 625000, start position: 10625000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
46429:2023-03-29T15:11:29.205+0200 INFO 7b957.app.7b957.post initialization: file #18 already initialized; number of labels: 625000, start position: 11250000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
46430:2023-03-29T15:11:29.205+0200 INFO 7b957.app.7b957.post initialization: file #19 already initialized; number of labels: 625000, start position: 11875000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
46431:2023-03-29T15:11:29.205+0200 INFO 7b957.app.7b957.post initialization: file #20 already initialized; number of labels: 625000, start position: 12500000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
46432:2023-03-29T15:11:29.205+0200 INFO 7b957.app.7b957.post initialization: continuing to write file #21; current number of labels: 573440, target number of labels: 625000, start position: 13125000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
The network config is:
"post-labels-per-unit": 20971520,
"post-max-numunits": 5000
}
The node config for smeshing is:
"smeshing-opts-maxfilesize": 10000000,
"smeshing-opts-numunits": 3,
"smeshing-opts-provider": 1,
"smeshing-opts-throttle": false
This is what happens before the first log:
2023-03-29T15:01:06.593+0200 INFO 7b957.app.7b957.post initialization: file #21 completed; number of labels written: 573440 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
2023-03-29T15:01:06.593+0200 INFO 7b957.app.7b957.post initialization: starting to write file #22; target number of labels: 625000, start position: 13750000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
2023-03-29T15:01:06.593+0200 INFO 7b957.app.7b957.post initialization: file #22 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
2023-03-29T15:01:06.593+0200 INFO 7b957.app.7b957.post initialization: starting to write file #23; target number of labels: 625000, start position: 14375000 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
2023-03-29T15:01:06.594+0200 INFO 7b957.app.7b957.post initialization: file #23 completed; number of labels written: 0 {"node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "app", "node_id": "7b957421cc71909520121ea38720e56c2ee45d4faa336e0f9502f65caf62d790", "module": "post"}
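For reference, the file numbers, start positions and label counts in these logs follow directly from the configuration. The sketch below recomputes them; it assumes 16 bytes per label, which is not stated in the issue but is the ratio between smeshing-opts-maxfilesize (10000000) and the logged "target number of labels" (625000). The type and function names are made up for this example.

```go
package main

import "fmt"

// labelSize is an assumption inferred from the logs: 10000000 / 625000 = 16.
const labelSize = 16

// fileLayout describes one postdata file: its index, the position of its
// first label and the number of labels it should hold.
type fileLayout struct {
	Index     int
	Start     uint64
	NumLabels uint64
}

// layout splits numUnits*labelsPerUnit labels into files of at most
// maxFileSize bytes each.
func layout(numUnits, labelsPerUnit, maxFileSize uint64) []fileLayout {
	total := numUnits * labelsPerUnit
	perFile := maxFileSize / labelSize
	if perFile == 0 {
		return nil // file size too small to hold a single label
	}
	var files []fileLayout
	for start, i := uint64(0), 0; start < total; start, i = start+perFile, i+1 {
		n := perFile
		if total-start < n {
			n = total - start // the last file is shorter
		}
		files = append(files, fileLayout{Index: i, Start: start, NumLabels: n})
	}
	return files
}

func main() {
	// Values from the network and node config quoted above.
	files := layout(3, 20971520, 10000000)
	last := files[len(files)-1]
	fmt.Println("files:", len(files))
	fmt.Println("file #21 start:", files[21].Start)
	fmt.Println("last file:", last.Index, last.Start, last.NumLabels)
}
```

With these inputs the computed layout agrees with the logs: file #21 starts at position 13125000, and the last file is #100, starting at 62500000 with a target of 414560 labels. So the targets are consistent; the problem is that files #22 to #100 report 0 labels written.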
Add support for using more than 1 provider for creating a post init job.
The motivation for this are users with multiple gpus on their systems. For example, a laptop with 2 gpus or a desktop with 2 discrete gpu cards and a supported gpu on the motherboard.
We'd like to enable these users to create a post init in less time and fully utilize their system gpu compute resources.
Add API method where user can provide an array of distinct system providers for creating a post job. The implementation should utilize all providers in parallel to create the post init data. Note that the gpu-post lib already supports executions on multiple providers in parallel.
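One way such an API could distribute the work is sketched below. This is only an illustration under simplifying assumptions: initOnProvider is a placeholder for the real per-provider computation, the names are invented for this example, and the label range is split evenly even though real providers differ in speed.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a half-open range [From, To) of label indices.
type span struct{ From, To uint64 }

// split divides [0, total) into n contiguous spans of near-equal size.
func split(total uint64, n int) []span {
	if n <= 0 {
		return nil
	}
	spans := make([]span, 0, n)
	chunk, rem := total/uint64(n), total%uint64(n)
	var from uint64
	for i := 0; i < n; i++ {
		size := chunk
		if uint64(i) < rem {
			size++ // spread the remainder over the first spans
		}
		spans = append(spans, span{From: from, To: from + size})
		from += size
	}
	return spans
}

// initOnProvider is a placeholder for computing the labels of one span on
// one provider. It only reports how many labels it would have written.
func initOnProvider(providerID uint32, s span) uint64 {
	return s.To - s.From
}

// initParallel runs one goroutine per provider and sums the labels written.
func initParallel(providers []uint32, total uint64) uint64 {
	spans := split(total, len(providers))
	written := make([]uint64, len(providers))
	var wg sync.WaitGroup
	for i, id := range providers {
		wg.Add(1)
		go func(i int, id uint32) {
			defer wg.Done()
			written[i] = initOnProvider(id, spans[i]) // each goroutine owns one slot
		}(i, id)
	}
	wg.Wait()
	var sum uint64
	for _, w := range written {
		sum += w
	}
	return sum
}

func main() {
	fmt.Println(split(10, 3))
	fmt.Println(initParallel([]uint32{0, 1, 2}, 62914560))
}
```

A static even split leaves fast GPUs idle while slow ones finish; handing out smaller spans from a shared queue would balance better, at the cost of more coordination.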
Hello, I read in the Chia paper that they only need one challenge, why do you use ~100 challenges on the labels to prove?
@noamnelke let me know if you want me to work on it.
We could catch if initialized labels are invalid very early on when they are created and before writing to a file. This would slow down initialization (how much depends on the % of labels checked), but give a higher confidence that the generated POS is valid and allow catching problems early.
It's essential that the problem of invalid labels is communicated to users in an understandable way so that they can take action themselves. We should not just panic with the message "labels are invalid" because users will be confused.
First of all, measuring proof generation performance is not easy as the time to find the proof is random. Changing a single bit of input data might make it do much more work.
I ran some experiments and I was able to come up with input data (8MB POS file) for which it takes 95s to generate a proof on my machine (i7-12700H with 14 cores). Below is the top from pprof:
      flat  flat%   sum%        cum   cum%
    40.17s 33.09% 33.09%     40.17s 33.09%  runtime.futex
     8.96s  7.38% 40.48%     36.37s 29.96%  runtime.selectgo
     6.13s  5.05% 45.53%      9.83s  8.10%  runtime.lock2
     6.02s  4.96% 50.49%      6.18s  5.09%  runtime.unlock2
     5.68s  4.68% 55.17%      5.68s  4.68%  github.com/minio/sha256-simd.blockSha
     3.66s  3.02% 58.18%      3.66s  3.02%  runtime.procyield
     2.79s  2.30% 60.48%      7.80s  6.43%  runtime.mallocgc
     2.43s  2.00% 62.48%      2.43s  2.00%  runtime.memmove
     2.25s  1.85% 64.34%      2.76s  2.27%  runtime.casgstatus
     2.15s  1.77% 66.11%     11.38s  9.38%  runtime.sellock
The proof generation procedure spends most of the time synchronizing the channel that is used to pass input data to workers label by label (8B at a time).
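A common way to reduce this kind of overhead is to pass labels to the workers in batches, so that one channel operation covers many labels. The sketch below illustrates the idea only; it is not the prover's code, the names are invented, and the worker body just counts labels where a real worker would hash them.

```go
package main

import (
	"fmt"
	"sync"
)

// labelSize matches the "8B at a time" mentioned above.
const labelSize = 8

// produce sends data to the workers in batches of batchLabels labels.
// The batches are sub-slices of data, so nothing is copied.
func produce(data []byte, batchLabels int, out chan<- []byte) {
	step := batchLabels * labelSize
	for off := 0; off < len(data); off += step {
		end := off + step
		if end > len(data) {
			end = len(data)
		}
		out <- data[off:end]
	}
	close(out)
}

// countLabels runs the workers and reports how many labels were seen and
// how many channel sends were needed to deliver them.
func countLabels(data []byte, batchLabels, workers int) (labels, sends int) {
	ch := make(chan []byte, workers)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range ch {
				n := len(batch) / labelSize // a real worker would process each label here
				mu.Lock()
				labels += n
				sends++
				mu.Unlock()
			}
		}()
	}
	produce(data, batchLabels, ch)
	wg.Wait()
	return labels, sends
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of dummy label data
	for _, batch := range []int{1, 1024} {
		labels, sends := countLabels(data, batch, 4)
		fmt.Printf("batch=%d labels=%d channel sends=%d\n", batch, labels, sends)
	}
}
```

With a batch of 1024 labels the number of channel operations drops by the same factor, which is where the time in runtime.futex and runtime.selectgo comes from. The right batch size would have to be found by benchmarking.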
There are claims in the community that after initialization postdata_metadata.json was corrupted or empty.
Some say that it happened because of running out of disk space, but some users do not have any problems with disk space. So it seems that if PoS initialization fails for some reason, it breaks everything.
That means:
postdata_metadata.json is corrupted/empty, while there is no postdata_N.bin: it can just recreate everything.
Manage the gpu contractors, build, review and benchmark their code.
Currently setting up the dev env on one of our Windows boxes and trying to build and benchmark.
TestInisialize, TestInitialize_NumUnits_Decrease and TestInitialize_MutlipleFiles fail.
The failures are specific to the Windows file system. The fix requires slightly rewriting the file operations.
Based on the discussion in these two comments:
At the moment the Initializer type is overly complex in its usage. It is designed to be re-used when it doesn't need to be, and it over-uses channels and goroutines. The type can be simplified in the following manner:
Initializer by making it non-reusable; besides tests the type isn't used in this manner anyway.
Reset(): it is only used in tests and the directory used by Initializer is known to the caller. In tests the caller can delete the directory itself. In production this shouldn't be done anyway; switching identities should instead use a different datadir for PoST.
SessionNumLabelsWrittenChan(): there is no need for this method when SessionNumLabelsWritten() returns the same information in a synchronous way. Callers interested in updates should rely on the values returned by the second method and the status from Started(), Completed(), isInitializing() or even a new single method that combines the purpose of all 3.
Initialize() should take a context as parameter, and instead of calling Stop() callers should just cancel the context they passed in.
Refactorings of types using Initializer in go-spacemesh:
PostSetupManager:
  StatusChan() should just call SessionNumLabelsWritten() and return the status directly instead of via a channel.
SmesherService:
  PostSetupStatusStream() should, instead of relying on receiving values in regular intervals via the channel, call the (new) synchronous method in PostSetupManager in regular intervals.
Add a new command to postcli to verify POS data at a given location. The idea is to cross-check a randomly picked subset of labels in the POS data against labels generated with the CPU implementation.
Line 81 in eee6c3c
Is not called as part of the initialization flow.
Not sure if it's possible but it would be good if we could calculate it dynamically based on:
We need to create some "bench" so no one commits too much storage. We need to make sure that:
2023/05/12 09:33:41 DEBUG initialization: file #10 current position: 9437184, remaining: 57671680
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
2023/05/12 09:33:46 DEBUG initialization: file #10 current position: 10485760, remaining: 56623104
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
2023/05/12 09:33:50 DEBUG initialization: file #10 current position: 11534336, remaining: 55574528
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
2023/05/12 09:33:54 DEBUG initialization: file #10 current position: 12582912, remaining: 54525952
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
2023/05/12 09:33:59 DEBUG initialization: file #10 current position: 13631488, remaining: 53477376
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
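It should not work like that: the log shows the provider being selected and its buffers being allocated again before every batch. That causes a significant performance drop (more than 50%).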
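The intended behaviour, setting up the provider once and reusing it for every batch, can be sketched as follows. This is a hypothetical Go illustration only: `scrypter`, `newScrypter`, `initBatch` and `initialize` are invented names, and a plain byte slice stands in for the device lookup buffer.

```go
package main

import "fmt"

// scrypter is a hypothetical stand-in for a GPU provider handle whose
// construction is expensive (device setup plus a large lookup buffer).
type scrypter struct {
	lookup []byte
}

// newScrypter allocates the lookup buffer and counts how often setup runs.
func newScrypter(lookupBytes int, setups *int) *scrypter {
	*setups++
	return &scrypter{lookup: make([]byte, lookupBytes)}
}

// initBatch stands in for computing one batch of labels with the buffers
// that are already allocated; it returns the next start position.
func (s *scrypter) initBatch(start, count uint64) uint64 {
	return start + count
}

// initialize creates the scrypter once and reuses it for every batch.
func initialize(totalLabels, batchSize uint64, lookupBytes int) (batches, setups int) {
	s := newScrypter(lookupBytes, &setups)
	for pos := uint64(0); pos < totalLabels; {
		n := batchSize
		if totalLabels-pos < n {
			n = totalLabels - pos
		}
		pos = s.initBatch(pos, n)
		batches++
	}
	return batches, setups
}

func main() {
	batches, setups := initialize(10_000, 992, 1<<20)
	fmt.Printf("batches=%d setups=%d\n", batches, setups)
}
```

Here 10,000 labels are processed in 11 batches while the setup cost is paid once.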
linux/arm64
and darwin/arm64
The work oracle is implemented to return the index of the first hash that is below the given difficulty threshold for PoW when calculating labels.
Instead it should return the index of the hash with the lowest value between StartPosition and EndPosition. Additionally, when calculating leaves in batches the Initializer should continue to look for indices where the resulting hash is lower than the one already found.
The reason for this is that instead of finding the first nonce that satisfies the given difficulty, this finds the "best" (lowest) nonce. If a smesher decides to increase their PoST storage in the future this gives them a higher chance of being able to re-use the nonce instead of being required to search for a new one. Additionally, if the lowest found nonce doesn't satisfy the difficulty for the larger PoST they can be sure no index in the PoST already calculated does.
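The proposed behaviour, keeping the lowest value across batches instead of stopping at the first hit, could be sketched like this. It is a minimal illustration under assumptions: `hashAt` is a hypothetical stand-in for the value the oracle compares against the difficulty, `best` and `scanBatch` are invented names, and ranges are half-open for simplicity.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashAt is a hypothetical stand-in for the value that the work oracle
// compares against the difficulty threshold at a given index.
func hashAt(index uint64) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], index)
	sum := sha256.Sum256(buf[:])
	return binary.BigEndian.Uint64(sum[:8])
}

// best holds the lowest value found so far and the index it was found at.
type best struct {
	index uint64
	value uint64
	found bool
}

// scanBatch scans the half-open range [start, end). The threshold starts at
// the difficulty (or at the best value from earlier batches) and is tightened
// on every hit, so the scan never stops at the first match.
func scanBatch(b *best, difficulty, start, end uint64) {
	threshold := difficulty
	if b.found {
		threshold = b.value
	}
	for i := start; i < end; i++ {
		if v := hashAt(i); v < threshold {
			*b = best{index: i, value: v, found: true}
			threshold = v
		}
	}
}

func main() {
	const difficulty = uint64(1) << 60
	var b best
	// Four successive batches of 1000 indices each.
	for start := uint64(0); start < 4000; start += 1000 {
		scanBatch(&b, difficulty, start, start+1000)
	}
	fmt.Printf("found=%v index=%d value=%#x\n", b.found, b.index, b.value)
}
```

Because `best` ends up holding the minimum over everything scanned, comparing `b.value` against a stricter difficulty for a larger PoST immediately tells whether any already-computed index can satisfy it.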
- gpu-post can also just return all nonces that satisfy the difficulty threshold and post can use the CPU to find the lowest one among those found. Since we are only expecting ~8 nonces during initialization, the impact of this should be negligible.
- In gpu-post the comparison from the calculated hash to D (the difficulty threshold) is here. This needs to be changed such that D is updated to the value found and the loop isn't aborted.
- In gpu-post, in addition to the index of the found nonce, its value should be returned as well so it can be used as the new difficulty threshold.
- In post a found nonce should not stop the oracle from looking for one, but instead here the difficulty should be updated with the found value and the Initializer should look for better nonces in successive batches.
The PoST repo is missing a coverage job that uploads code coverage to codecov.io. As a reference the job from poet can be used:
and
https://github.com/spacemeshos/poet/blob/develop/.github/codecov.yml
postcli is configured with zap's very opinionated Production preset. It is configured with a Sampler limiting the rate of logging to 100 logs/s. This causes a lot of confusion among users: the postcli output looks as if it were broken.
When you initialize with postcli you'll get this error:
2023-07-10T22:21:18.166Z INFO 00000.defaultLogger Looking for identity file at `/spacemesh/data/post_data/key.bin`
2023-07-10T22:21:18.166Z FATAL 00000.defaultLogger could not retrieve identity: decoding private key: encoding/hex: invalid byte: U+00E5 'å' {"name": ""}
when trying to use the same key in go-sm.
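The message in the log above has the form that Go's encoding/hex package produces when its input contains a byte that is not a hex digit. One possible reading, offered as an assumption and not as a diagnosis, is that the key file held raw bytes where hex text was expected. The sketch below reproduces the message; `decodeKey` is a hypothetical helper, not the function used by go-spacemesh.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeKey is a hypothetical helper that treats the file content as hex
// text and wraps any decoding error, similar in spirit to the log message.
func decodeKey(content []byte) ([]byte, error) {
	key, err := hex.DecodeString(string(content))
	if err != nil {
		return nil, fmt.Errorf("decoding private key: %w", err)
	}
	return key, nil
}

func main() {
	// Two raw bytes, as they might appear in a binary (non-hex) key file.
	raw := []byte{0xE5, 0x01}
	_, err := decodeKey(raw)
	fmt.Println(err)

	// The same bytes written as hex text decode without error.
	key, err := decodeKey([]byte(hex.EncodeToString(raw)))
	fmt.Println(key, err)
}
```

The first call fails on the byte 0xE5, which encoding/hex reports as U+00E5; the second call succeeds because the content is valid hex text.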
It would be really great if we could provide a cross-platform CLI app that verifies post.
Not sure what's the best way to implement this. Maybe we add a smesher API method to verify post and provide it via smrepl (formerly CLIWallet)?
The motivation is for users to be able to quickly verify existing post data.
@moshababo - what do you think?
Copied from the Epic which was open in go-spacemesh about this task:
We want to implement a stand-alone desktop GPU post generator process that can be started from other local processes such as a full node and provide a simple post init and progress api to local clients.
command:
~/smesh/post/build$ ./postcli -commitmentAtxId=c230c51669d1fcd35860131e438e234726b2bd5f9adbbd91bd88a718e7e98ecb -provider=1 -id=804ce71657a91d935936700c03c0fa8ffb15a6dc0d1dd56b76c90299dd591334
output:
2023/01/24 14:18:59 INFO initialization: datadir: /home/alchemist/post/data, number of units: 99000001, max file size: 4294967296000, number of labels per unit: 4096, number of bits per label: 4096
2023/01/24 14:18:59 INFO initialization: files layout: number of files: 1, number of labels per file: 4294967296000, last file number of labels: 405504004096
2023/01/24 14:18:59 INFO initialization: continuing to write file #0; current number of labels: 4194304, target number of labels: 405504004096, start position: 0
2023/01/24 14:18:59 DEBUG initialization: file #0 current position: 4194304, remaining: 405499809792
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
2023/01/24 14:19:01 DEBUG initialization: file #0 current position: 4210688, remaining: 405499793408
2023/01/24 14:19:02 DEBUG initialization: file #0 current position: 4227072, remaining: 405499777024
2023/01/24 14:19:03 DEBUG initialization: file #0 current position: 4243456, remaining: 405499760640
2023/01/24 14:19:04 DEBUG initialization: file #0 current position: 4259840, remaining: 405499744256
2023/01/24 14:19:05 DEBUG initialization: file #0 current position: 4276224, remaining: 405499727872
2023/01/24 14:19:06 DEBUG initialization: file #0 current position: 4292608, remaining: 405499711488
providers:
([]gpu.ComputeProvider) (len=2 cap=2) {
(gpu.ComputeProvider) {
ID: (uint) 1,
Model: (string) (len=32) "llvmpipe (LLVM 12.0.0, 256 bits)",
ComputeAPI: (gpu.ComputeAPIClass) Vulkan
},
(gpu.ComputeProvider) {
ID: (uint) 2,
Model: (string) (len=3) "CPU",
ComputeAPI: (gpu.ComputeAPIClass) CPU
}
}
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A4000 Off | 00000000:03:00.0 Off | Off |
| 41% 46C P8 10W / 140W | 7MiB / 16117MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A4500 Off | 00000000:82:00.0 Off | Off |
| 30% 35C P8 23W / 200W | 3MiB / 20186MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 56135 G ./postcli 3MiB |
+-----------------------------------------------------------------------------+
To summarize, I can tell from htop, provider 1 seems to be using a multicore approach, but I still hardly see any utilization across either GPU.
We must fix the pipeline so the needed libraries are included in the artifacts. Currently the Linux build does not include libpost.so.
2023-06-16T06:13:53.748+1000 DEBUG 5c0d8.post Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce GTX 980 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 314}
2023-06-16T06:13:53.748+1000 DEBUG 5c0d8.post device memory: 4036 MB, max_mem_alloc_size: 1009 MB, max_compute_units: 16, max_wg_size: 1024 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 134}
2023-06-16T06:13:53.824+1000 DEBUG 5c0d8.post preferred_wg_size_multiple: 32, kernel_wg_size: 256 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 168}
2023-06-16T06:13:53.824+1000 DEBUG 5c0d8.post Using: global_work_size: 2016, local_work_size: 32 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 181}
2023-06-16T06:13:53.824+1000 DEBUG 5c0d8.post Allocating buffer for input: 32 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 185}
2023-06-16T06:13:53.824+1000 DEBUG 5c0d8.post Allocating buffer for output: 64512 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 193}
2023-06-16T06:13:53.824+1000 DEBUG 5c0d8.post Allocating buffer for lookup: 1056964608 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 201}
Also this
2023-06-16T17:40:17.305+1000 DEBUG 5c0d8.post Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce GTX 960 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 314}
2023-06-16T17:40:17.305+1000 DEBUG 5c0d8.post device memory: 1996 MB, max_mem_alloc_size: 499 MB, max_compute_units: 8, max_wg_size: 1024 {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 134}
2023-06-19T06:03:59.717+1000 DEBUG 5c0d8.post Allocating buffer for input: 32 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 185}
2023-06-19T06:03:59.717+1000 DEBUG 5c0d8.post Allocating buffer for output: 31744 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 193}
2023-06-19T06:03:59.717+1000 DEBUG 5c0d8.post Allocating buffer for lookup: 520093696 bytes {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 201}
2023-06-19T06:03:59.735+1000 DEBUG 5c0d8.post initializing 1 -> 993 (992 labels, GWS: 992) {"node_id": "5c0d8fa190b0ec32a285fd2f663670939ac7a1367b5ab747550cb9403e19fa07", "module": "post", "module": "scrypt_ocl", "file": "scrypt-ocl/src/lib.rs", "line": 253}
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: OclError(OclCore(Api(
################################ OPENCL ERROR ###############################
Error executing function: clEnqueueNDRangeKernel("scrypt")
Status error code: CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)
Please visit the following url for more information:
https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html#errors
#############################################################################
)))', ffi/src/initialization.rs:135:10
When generating a PoST proof a node can fail to read data from disk, when some or all of the PoST data was deleted, or it could generate a proof based on corrupted / altered PoST data.
NumUnit
value.
NumUnits
value.
NumUnits
value update the post metadata such that the next time an ATX is published it uses the lower value.
After genesis
As discussed here:
We have multiple occurrences of "initial Post indices included in challenge does not equal to the initial Post indices included in the atx" in the logs from different users.
This needs to be debugged and categorized correctly. The user claims that he DID NOT reinitialize or change the initialization.
Third_PC_with_4070_withoutRewards(1).zip
spacemesh-log-7f8f332c.txt.2(1).zip
The Initializer has the option to be passed a logger with WithLogger as parameter. At the moment that logger is assumed to implement the interface shared.Logger. This interface unfortunately doesn't allow for a structured logger like zap and has the additional drawback that we need custom implementations during testing like here:
post/initialization/initialization_test.go
Lines 28 to 35 in f3ec621
We should remove the interface and instead have a direct dependency on zap for logging in post. This would allow us to more easily test logging with zaptest and enable structured logging in post.