colgreen / redzen Goto Github PK

View Code? Open in Web Editor NEW

92.0 5.0 16.0 1.32 MB

General purpose C# code library.

License: Other

C# 99.95% Smalltalk 0.05%

random prng gaussian ziggurat timsort numerics io xoshiro wyrand

redzen's Issues

Improved method for Random.Next(max)

See https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/#random
dotnet/runtime#79790

The nuget package's license is MIT, but the license of TimSort is GPL

So, if I include the package in the commercial project, not providing the source code, it seems to cause a very serious problem.
I don't care if TimSort is included, I really want this package to be used in commercial projects.

Of cause I can copy the source codes to my project, then delete the TimSort files. But that will make it difficult to sync to the newest version.

ISampler<T> performance tuning.

Some ideas to consider:

Add an ISampler.Sample(out T value) method. This allows storing the sample result directly into an existing memory location, instead of returning a value that is subsequently moved to the mem location. This affects all PRNGs and 'numeric' samplers, i.e., e.g. ZigguratSampler.

In ZigguratSampler, we can avoid multiplying by the 'sign' double, with the following technique:

 ulong signBit =  (u & 0x400UL) << 53;
 SetSignBit(ref x, ref signBit);

 private static void SetSignBit(ref double x, ref ulong signBit)
 {
     Unsafe.As<double,ulong>(ref x) |= signBit;
 }

Initial performance tests showed a slowdown using .NET5, and a speedup using .NET6 (preview 3); but with .NET 6 being slower than .NET 5 to start with! Hence, I am leaving this open for now, and will revisit when a later .NET6 preview is out (or perhaps the final release).

Add an .editorconfig file to root of project.

See: Create portable, custom editor settings with EditorConfig

EnumerableUtils.RangeRandomOrder performance improvement

A small performance improvement to the above: There's no need to store the returned value back into the array. That value will not be used again:

                   arr[swapIdx] = arr[i];
 -                 arr[i] = tmp;
   
 -                 yield return arr[i];
 +                 yield return tmp;
               }

Move Random number generators into their own package (Redzen.Random?)

Redzen originated as a collection of classes I had written for use elsewhere, and I wanted a central place where I could develop and maintain them. Going forward I think it makes sense to split the random number generator classes into a distinct project/package that is focused on one goal, and the remaining code can remain in the original 'Redzen' package for now - much of that other code is loose ends that don't belong together, but that is a separate problem for another time.

Backport fix to 9.x branch: Base64EncodingOutputStream.Write() - incorrect offset bound check.

Backport this fix to the 9.x branch, and publish a new package.

fba37e4#diff-0a0a9ad8e2e6a3fc322ec68611a09c6b8e6139fb15b6ea7d56c9719b20442374

Implement a lightweight List class that exposes the inner array

Currently if you use List you cannot access the internal/wrapped array, and therefore cannot get a Span on the list items. This in turn means there is a need to support IList method overloads in some cases, e.g.:

SearchUtils.BinarySearch<T,V>(Span span, V value, Func<T,V,int> compareFn)
SearchUtils.BinarySearch<T,V>(IList list, V value, Func<T,V,int> compareFn)

and

SortUtils.IsSortedAscending(Span span)
SortUtils.IsSortedAscending(IList list)

This is quirky because an Array is both a Span and an IList, and thus the caller has to cast to choose one or the other, and the IList overload simple tests for an array and defers to the Span implementation.

The proposal is to remove the IList overloads o fthe above methods, thus providing only Span based methods for those operations. this will result in a functionality gap for some, but for my own code I intend to define a new List type that exposes its internal array, which can then be accessed (as a span) directly.

For those that are stuck with List and want to use the above methods (and similar), then I simply say that Redzen won't support that. Implementing those methods (e.g. binary search) over an IList is arguably the wrong abstraction layer for high performing code. This is Sort is defined directly on List, as opposed to being an extension method on IList.

Consider implementing wyrand PRNG.

See https://rurban.github.io/dieharder/QUALITY.html

rng function	ints/sec	doubles/sec
wyrand_g210_a_Y1.out:0	319345	283551
xoshiro128+_g213_a.out:0	289544	257579
xoroshiro64ss_g214_a.out:0	261519	267279
xoshiro128++_g211_a.out:0	260260	246457
xoroshiro64s_g215_a_Y1.out:0	254123	272197
lxm_g232_a_Y1.out:0	215959	205161
efiix64_g231_a.out	173178	171191
threefry2x64_g235_a_Y1.out:0	94407	87967

Also note the the current default PRNG fails some PractRand tests:

https://github.com/lemire/testingRNG

This 'just' may be the known characteristic of all LCGs of having low quality randomness in the low order bits.

Possible implementation here:

https://github.com/cocowalla/wyhash-dotnet

ZigguratGaussian - always returns mean

No matter which type of distribution of the ZigguratGaussian function used - it always returns mean
I tried to use older releases - the same thing.
Are you sure, this will work on Xamarin - IOS/Android/ other devices?

Xoshiro256PlusRandom.cs use xoshiro256** too ?

https://github.com/colgreen/Redzen/blob/master/Redzen/Random/Xoshiro256PlusRandom.cs

Why NextBytes () uses both xoshiro256+ and xoshiro256** ?

ulong result = RotateLeft(s1 * 5, 7) * 9;

RandomSourceBase.NextDoubleNonZero() can produce zero

This:

return ((NextULongInner() >> 11) & 0x1f_ffff_ffff_fffe) * INCR_DOUBLE;

Should be:

return (((NextULongInner() >> 11) & 0x1f_ffff_ffff_fffe)+1) * INCR_DOUBLE;

Otherwise a zero will be returned when NextULongInner() returns zero, which is approximately every 2^64 calls (approx. 1.8*10^19) if the underlying PRNG is capable of generating zero, which all of the redzen PRNGs are. This makes the error an exceedingly rare event, but it is possible and thus a definite bug. E.g. if this method were called a billion times a second, then we can expect to see a zero approximately once every 31 years!

MathSpanUtils: Support a wider set of types.

E.g. We have MathSpanUtils.Sum(Span), MathFSpanUtils.Sum(Span), and would be good to do the same for Int32 and other native types. E.g. see the Clamp() overloads here:

https://github.com/dotnet/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/Math.cs

Consider merging MathSpanUtils and MathFSpanUtils, and using partial classes to split the code into multiple files. Perhaps put all of the like method overloads in one file (i.e. all Clamp method in one file).

Also, consider renaming the class to MathSpan or SpanMath.

Use overlapping SIMD vectors

Inspired by dotnet/runtime#83488

The relates to where there are loops using Vector, and there are tail elements that don't completely fill a vector. Currently those loops all revert to scalar operations to handle the tail elements, but in some places it will be possible to use a Vector over the tail elements such that some elements are calculated twice (i.e. the tail vector overlaps the last vector slice from the loop).

It seem there are multiple examples of this pattern in the .NET runtime repo; see https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/#zeroing

Review how Vector<double>.Count is used, e.g. MathSpan helper methods

Sometimes we do:

if(Vector.IsHardwareAccelerated && (s.Length >= Vector<double>.Count << 1))

I.e. we compare the span length with 2 * vector_length, so that the vector loop isn't entered unless there will be at least two loops.

(NB. that same logic can be achieved with (s.Length > Vector<double>.Count)

Questions:

Is this pattern actually better performing? It introduces the exra shift op in all scenarions (minor point?)
I seem to recall there are places where it makes sense to do a minim of two loops. What are those places, and are they properly commented, and let's also document the places that use '<< 1' where it's just done as a micro-op.

Separately we also do this:

int width = Vector<double>.Count;

And from then on use the local 'width' variable. At time of writing, changing the code to use Vector.Count has substantially different x86 codegen (according to sharplab swebsite), perhaps to avoid switching between vector and scalar CPU instructions(?). It might be worth fine tuning the performance in .NET 7 (i.e. using up-to-date code gen, as codegen is changing all the time).

Make use of SkipLocalsInit

Local field init (zeroing) can be optionally omitted. See:

C# 9 - Improving performance using the SkipLocalsInit attribute

It appears from the benchmarks in that link, that this is most impactful where a stackalloc of a reasonable size is performed. There are also risky scenarios if using 'Unsafe' access to fields, hence possible/probably this should be done on a per method basis, rather than a wider scope (class or module/project level). That said, the dotnet runtime repo seems to use this csproj element '', albeit conditionally per project:

  <PropertyGroup>
    <SkipLocalsInit Condition="'$(SkipLocalsInit)' == '' and '$(MSBuildProjectExtension)' == '.csproj' and '$(IsNETCoreAppSrc)' == 'true' and ($([MSBuild]::IsTargetFrameworkCompatible('$(TargetFramework)', '$(NetCoreAppCurrent)')))">true</SkipLocalsInit>
  </PropertyGroup>

Otherwise, possible targets here are the Random and Distribution classes.

TimSort classes: Support sorting of Span<T> instead of T[]

TimSort (and other sorting classes) will currently sort the items of an Array only. Consider changing this to handling Span instead.

Issue could also be extended to cover wider 'spanification', e.g. SortUnstable() accepts a List, because we rely on List.Sort(), but that ought to accept a Span and use DefaultComparer, or a supplied Comparer. .NET has new support methods for sorting items of a Span. See https://docs.microsoft.com/en-us/dotnet/api/system.memoryextensions.sort?view=net-5.0

Functionally separate distribution representation from sampling

The various distribution classes all currently wrap an inner IRandomSource in order to provide methods for sampling from the distribution(s). A cleaner pattern is to have distribution classes that represent the distribution only, and distribution sampler classes (and/or utility methods) that sample from them.

In addition, the distribution classes in the Redzen.Random namespace should probably be moved in the Redzen.Numerics namespace.

Review/revisit Ziggurat Algorithm maths and floating point arithmetic correctness.

There are a few things I want to do with the Ziggurat Gaussian sampling code:

Consider switching to this faster variant:
"A modified ziggurat algorithm for generating exponentially- and normally-distributed pseudorandom numbers"
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4812161/
Consider switching from 128 to 256 segments.
My gut feel is that 256 will be faster in synthetic benchmarks, but overall we are doubling our reliance of RAM and the CPU caches for a small reduction in invocations of the slow path. It would be good to quantify what that slow path reduction is, bearing in mind this would be for the new improved variant from point #1 above.
Re-derive the magic constants in the source code to provide confidence they are correct. We'll probably have to do this anyway for the 128/256 segment change.

Add new PRNG xoshiro256++

From:
http://prng.di.unimi.it/

Paper with details:
http://vigna.di.unimi.it/ftp/papers/ScrambledLinear.pdf

Define EnableNETAnalyzers in csproj files

See: EnableNETAnalyzers

I think these can replace the current stylecop analyzers, i.e., this section in the csproj:

  <ItemGroup>
    <PackageReference Include="StyleCop.Analyzers" Version="1.1.118">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
  </ItemGroup>

colgreen / redzen Goto Github PK

redzen's Issues

Recommend Projects

Recommend Topics

Recommend Org