puzpuzpuz / xsync
Concurrent data structures for Go
License: MIT License
This function causes a compile error when building for arm32 (CGO_ENABLED=0 GOOS=linux GOARCH=arm go build test.go):
Lines 18 to 24 in d5f0f9d
The following error is shown:
#0 7.331 /go/pkg/mod/github.com/puzpuzpuz/[email protected]/util.go:20:24: 0xff51afd7ed558ccd (untyped int constant 18397679294719823053) overflows uintptr
#0 7.331 /go/pkg/mod/github.com/puzpuzpuz/[email protected]/util.go:21:24: 0xc4ceb9fe1a85ec53 (untyped int constant 14181476777654086739) overflows uintptr
This can likely be fixed by casting the argument to uint64 before performing the operations.
The current xsync.Map supports string keys and interface{} values. This is sufficient for many use cases, but not all of them, which is why specialized map data structures could be added to the library.
IMO the most interesting one is CounterMap, which would hold int64s as both keys and values and support Inc/Dec operations. Such a map could be used in certain niche use cases. It should also outperform any synchronized built-in map, as well as xsync.Map.
Another option might be a map that supports []byte slices as keys. This is not supported by the built-in maps (both map and sync.Map) since slices are not comparable.
I'll be collecting feedback to understand the demand for specialized map collections, so leave comments on this issue if you need them.
Thank you for your library, articles, and talks. Could you please help me?
I have a logger that writes a lot of messages in parallel to stdout. The problem is that messages are written simultaneously and get interleaved. So I had to add a mutex and lock before printing:
l.mu.Lock()
fmt.Fprintf(os.Stdout, format, v...)
l.mu.Unlock()
I wish to avoid the locking because I need latency to be as low as possible. But I'm fine with some pauses, and I don't care much about the order of messages.
On my server I have 24 CPUs, and each has its own cache. I have an idea to make a per-CPU list of byte slices and then periodically gather all of them and dump them to a log.
Can this work in practice? I feel like I'm reinventing some existing structure. Could you please recommend an optimal way to do that?
I see that state striping is something that could probably help me.
I see that the library has a Counter that uses the same principle.
I also asked the question on SO https://stackoverflow.com/questions/74954360/cpu-cache-friendly-byte-slice
I have an application that makes use of a lot of different sync.Maps. It mostly fits the recommended use case for sync.Map ("append-only maps" with a lot more reads than writes), but the benchmarks for xsync.Map do look better across the board, so I was thinking of maybe just swapping them all out. I see your blog post mentions "When it comes to reads which are the strongest point of sync.Map, both data structures are on par.", though in the repo README you say "Due to this design, in all considered scenarios Map outperforms sync.Map."
Just wondering if you can foresee any reason not to do this. The blog post mentions potential memory concerns, but I think that probably won't be an issue for me.
Hi,
Your project and hard work are very much appreciated.
My use case needs the map to be ranged as fast as possible (maybe millions of times every second, and this behavior will continue for the entire service duration). However, the following line
Line 319 in f620963
always causes a memory allocation (producing unnecessary garbage and thus GC cycles).
This memory allocation can go away if a fixed-size array is used instead.
Thank you.
See golang/go#47643
Hi.
I hope you are having a great new year so far.
In the article at https://gopheradvent.com/calendar/2021/journey-to-a-faster-concurrent-map, you say xsync.Map "has some drawbacks". Which drawbacks do you refer to?
Thanks in advance.
Thank you for releasing 2.0.0!!! This library is awesome :-)
Unfortunately, I can't install 2.0.0 right now:
$ go get github.com/puzpuzpuz/xsync
go: added github.com/puzpuzpuz/xsync v1.5.2
$ go get github.com/puzpuzpuz/[email protected]
go: github.com/puzpuzpuz/[email protected]: invalid version: module contains a go.mod file, so module path must match major version ("github.com/puzpuzpuz/xsync/v2")
Hey @puzpuzpuz
Thanks for such a cool concurrent data synchronization package!
I really like the example of a hash function for a comparable struct, because there is no magic in it, just math:
Lines 16 to 32 in 051117f
And this comment gave me the idea of getting the built-in hashing function from the map, in a slightly tricky and unsafe magical way:
Lines 32 to 33 in 051117f
Here's what I got: https://goplay.tools/snippet/w91su4PCNMY
func main() {
type t struct{ int }
hash := HashFuncOf[t]()
if hash == nil {
log.Fatalln("hash func is nil")
}
seed := maphash.MakeSeed()
a, b := hash.Sum64(seed, t{1}), hash.Sum64(seed, t{1})
fmt.Println(a == b)
// Output:
// true
}
This is not safe because future versions of Go may change the internal layout of types.
On the other hand, it might be OK if you put this idea in a separate package like github.com/puzpuzpuz/xsync/unsafehasher
and warn the user about the potential problem in the documentation or in an init function, like this:
// This init function will prevent the application from starting
// if the internals of Go types change in such a way that
// this package causes a panic.
func init() {
type t struct{ bool; int; string; float64 }
HashFuncOf[t]().Sum64(maphash.MakeSeed(), t{})
}
Also, the trick has a downside: the built-in hash function requires passing the value by pointer, which can negatively affect performance, but that can probably be mitigated using sync.Pool.
What do you think about supporting this way of creating hash functions?
Also, it seems that this comment about string keys is superfluous here; it was apparently accidentally copied from the implementation of Map:
Lines 31 to 34 in 051117f
Hello! I've been trying out the xsync.MapOf on one of my projects and I ran into a strange issue when nesting the maps. Am I doing something wrong, or did I just discover a bug? Here is a runnable example (using latest xsync version):
package main
import (
"github.com/puzpuzpuz/xsync"
)
type tmp struct {
Hello string
}
func main() {
const stringKey = "test"
const intKey = 123
outerStringMap := xsync.NewMapOf[*xsync.MapOf[uint32, *tmp]]()
innerIntMap, loaded := outerStringMap.LoadOrCompute(stringKey, func() *xsync.MapOf[uint32, *tmp] {
return xsync.NewIntegerMapOf[uint32, *tmp]()
})
if loaded {
panic("expected nothing")
}
innerIntMap.Store(intKey, &tmp{Hello: "world"})
innerIntMap, loaded = outerStringMap.Load(stringKey)
if !loaded {
panic("expected existing map")
}
hello, loaded := innerIntMap.Load(intKey)
if !loaded {
panic("expected existing value")
}
if hello.Hello != "world" {
panic("unexpected value")
}
}
grongor@grongor-nb:~/xsync-test$ go run .
panic: expected existing value
goroutine 1 [running]:
main.main()
/home/grongor/xsync-test/main.go:33 +0x112
exit status 2
See golang/go#47643
Just like in sync.Map, the Store operation in Map allocates an intermediate interface{} struct for each value (see #1 for more details). Hence, each value pointer stored in map buckets references the following chain: *interface{} -> interface{} -> value. If we could specialize the Map to a concrete value type (say, with the upcoming generics language feature), we could get rid of the intermediate interface{} struct, and the chain could be simplified to the following: *value -> value. There are multiple ways to achieve this, and this issue describes one of them.
The idea is to replace atomic snapshots (for which we need *interface{} pointers) with epoch-based reader-writer communication.
Currently the bucket layout looks like the following:
| bucket mutex | keys array | values array | pointer to next bucket |
| 8 bytes | 24 bytes (3 pointers) | 24 bytes (3 pointers) | 8 bytes |
|<- one cache line (64 bytes) ->|
The epoch-based design would need to change it to this:
| bucket mutex | keys array | values array | epoch |
| 8 bytes | 24 bytes (3 pointers) | 24 bytes (3 pointers) | 8 bytes |
|<- one cache line (64 bytes) ->|
Notice that the linked list (chain of buckets) is now gone and replaced with the epoch counter (uint64). Each epoch counter is per-bucket and assumes two phases: an even value means the bucket is stable, while an odd value means a write is in progress (the usual seqlock convention).
Note: 64B assumes only 3 entries per bucket, so the table will be able to hold only 3*number_of_buckets entries. That could be improved by using 128B buckets, which is enough to fit 7 entries. However, that would have a slight impact on write performance.
Writers should do the following when updating a bucket: lock the bucket, increment the epoch (making it odd), apply the modification, increment the epoch again (making it even), and unlock the bucket.
Readers should execute the following seqlock-style logic when reading from a bucket: read the epoch; if it is odd, retry; otherwise read the bucket contents, re-read the epoch, and retry if it has changed.
Just like with atomic snapshots, the above design assumes that readers do not block each other, and writers allow readers to scan the table concurrently (although readers might need to do a few spins until their epoch check succeeds).
ATM Map.Range iterates over the map by acquiring bucket locks sequentially and copying their contents in an intermediate slice. Locking could be avoided via reading each bucket's contents with atomic snapshots as in the Get operation.
I was trying to benchmark the map implementation against sync.Map with concurrent load, but could not because of a panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x56b27a]
goroutine 73 [running]:
github.com/puzpuzpuz/xsync.(*Map).doStore(0xc00009e6c0, {0xc0000b4028, 0x7}, {0x59bb40, 0xc0000d6040}, 0x0)
/home/runner/go/pkg/mod/github.com/puzpuzpuz/[email protected]/map.go:190 +0x21a
github.com/puzpuzpuz/xsync.(*Map).Store(...)
/home/runner/go/pkg/mod/github.com/puzpuzpuz/[email protected]/map.go:163
github.com/veartutop/cachex_test.xSyncMap.make({0x0, 0x0, 0x4800}, 0x0, 0xf4240)
/home/runner/work/cache/cache/benchmark_test.go:330 +0x12a
github.com/veartutop/cachex_test.Benchmark_concurrent.func1(0xc0000cb200)
Reproducer: https://github.com/vearutop/cache/pull/6/checks?check_run_id=3389323336
https://github.com/vearutop/cache/blob/xsync/benchmark_test.go#L31
https://github.com/vearutop/cache/blob/xsync/benchmark_test.go#L312-L364
The benchmark stores 1e6 entries into the map and then reads "random" keys from multiple goroutines with occasional 10% writes.
The same benchmark passes for sync.Map; please help me check whether I'm misusing this library.
The compiler is free to make new(struct{}) == new(struct{}), and indeed does so. This means Map can get confused and return nil if a user tries to store new(struct{}).
I suggest using the address of a global int, for example.
var nilVal int // use &nilVal
A COWList data structure might be helpful in certain use cases, like lists of listeners.
hi,
followed over here from the golang issue.
I was testing the library as a drop-in replacement for our existing sync.Map usages and a generic sync.MapOf[K, V] wrapper that we use, but a big issue is that the zero value isn't valid, which would require a lot of incompatible refactoring in our code. It worked great when I did initialize the maps.
Ideally, something like this would not panic:
func TestMap_ZeroValueValid(t *testing.T) {
EnableAssertions()
m := new(Map)
v := 42
m.Store("foo", v)
m.Store("foo", v)
DisableAssertions()
}
I have a naive solution that uses a sync.Once; happy to open a PR if this is something you would consider changing.
All I did was add a sync.Once:
type Map struct {
totalGrowths int64
totalShrinks int64
resizing int64 // resize in progress flag; updated atomically
resizeMu sync.Mutex // only used along with resizeCond
resizeCond sync.Cond // used to wake up resize waiters (concurrent modifications)
table unsafe.Pointer // *mapTable
initialized sync.Once
}
Adding an init function:
func (m *Map) init(sizeHint int) {
m.initialized.Do(func() {
m.resizeCond = *sync.NewCond(&m.resizeMu)
var table *mapTable
if sizeHint <= minMapTableCap {
table = newMapTable(minMapTableLen)
} else {
tableLen := nextPowOf2(uint32(sizeHint / entriesPerMapBucket))
table = newMapTable(int(tableLen))
}
atomic.StorePointer(&m.table, unsafe.Pointer(table))
})
}
Changing the NewMapPresized function:
func NewMapPresized(sizeHint int) *Map {
m := &Map{}
m.init(sizeHint)
return m
}
Then I added init to these entrypoints:
func (m *Map) Load(key string) (value interface{}, ok bool) {
m.init(minMapTableCap)
...
}
func (m *Map) Clear() {
m.init(minMapTableCap)
...
}
func (m *Map) Size() int {
m.init(minMapTableCap)
...
}
func (m *Map) Range(f func(key string, value interface{}) bool) {
m.init(minMapTableCap)
...
}
func (m *Map) doCompute(
key string,
valueFn func(oldValue interface{}, loaded bool) (interface{}, bool),
loadIfExists, computeOnly bool,
) (interface{}, bool) {
m.init(minMapTableCap)
...
}
and the same thing for the generic.
Since the performance difference caused by map growth is noticeable, it would be great if we could initialize the map with a size hint to try to avoid growths on add.
The fact that we spread keys across buckets makes this non-trivial, but maybe it would be enough to assume normally distributed keys and just grow the buckets proportionally to the initial hint?
I've only skimmed the code, but I could try a PR if there's no obvious blocker I missed and nobody else beats me to it!
Now that type parameters are coming to Go, it might be an optimization opportunity to reduce usages of interface{} with generics.
FNV-1a, which is used in Map, has mediocre performance. A faster non-cryptographic hash function with good enough hash code distribution might be a good replacement.
Hey, I know v1 is a bit out of date; I just thought you might be interested in a bug I found on the latest v1 tag (v1.5.2) that was fixed somewhere in v2.
https://github.com/puzpuzpuz/xsync/blob/v1.5.2/mapof.go#L237
value := valueFn()
var wv interface{} = valueFn()
During insertion of a new value, valueFn is called twice instead of value being reused for wv. That's OK for LoadOrStore, but it causes issues in LoadOrCompute, which may end up creating the new object twice and returning a different object from the one that was just inserted.
Not sure what the position on v1 maintenance is, but since upgrading to v2 is a breaking change, it might be worth releasing the fix as v1.5.3, WDYT?
Since 64-bit pointers in Go are 8-byte aligned, we could use the free 2x3 bits to store a (MSB?) part of the key's hash code, turning the key and value pointers into tagged pointers. This should improve search performance, since we would no longer need to check equality of every scanned key, but only of those whose tag matches.