dzhang314 / MultiFloats.jl
Fast, SIMD-accelerated extended-precision arithmetic for Julia
License: MIT License
I've tried MultiFloats.jl on the GPU, but I'm getting a loss of precision compared to the CPU:
using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
A * B - Array(CuArray(A) * CuArray(B))
Gives me
100×100 Matrix{MultiFloat{Float64, 8}}:
-1.23827e-98 9.35263e-99 -8.83181e-99 … -4.70324e-99 -1.3348e-98
-1.98421e-99 8.20389e-99 1.67043e-98 1.45499e-98 2.32225e-98
-2.77264e-99 -3.30951e-99 1.32426e-98 -1.09181e-98 7.84157e-100
1.92544e-98 6.35776e-99 -8.85547e-99 1.29435e-98 -4.89252e-99
-5.52038e-99 5.35901e-99 -3.705e-98 1.53947e-99 7.38954e-99
-2.16904e-98 1.64505e-98 -1.16536e-98 … -3.19036e-98 7.5397e-99
6.72487e-98 6.07349e-99 -2.87359e-98 ...
but eps(Float64x8) is 5.9091063153828709e-126.
What explains this? The order of iteration?
This is really a question rather than an issue. I hope that's OK.
I would like to convert between various extended-precision types. Are these conversions for MultiFloats correct and idiomatic?
using DoubleFloats
using MultiFloats
using Quadmath
@inline MultiFloat{Float64,2}(x::DoubleFloats.Double64) = Float64x2(x.hi) + x.lo
@inline function MultiFloat{Float64,2}(x::Quadmath.Float128)
hi = Float64(x)
lo = Float64(x - hi)
return Float64x2(hi) + lo
end
@inline Quadmath.Float128(x::Float64x2) = Quadmath.Float128(x._limbs[1]) + x._limbs[2]
function test(x)
d = Double64(x)
m = Float64x2(x)
q = Float128(x)
b = BigFloat(x)
return d, m, q, b
end
test(Double64(1.0))
test(Float64x2(2.0))
test(Float128(3.0))
test(BigFloat(4.0))
If a and b are MultiFloats, then a^b should return a proper result. For the moment I fall back to exp(b*log(a)).
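A minimal stopgap sketch of that fallback, assuming exp and log are available for MultiFloat (they may need the BigFloat detour discussed further down):

using MultiFloats
# Hedged sketch, not the package's method: delegate ^ to exp/log.
# Only mathematically valid for a > 0, and slightly less accurate
# than a dedicated power implementation would be.
Base.:^(a::MultiFloat{T,N}, b::MultiFloat{T,N}) where {T,N} = exp(b * log(a))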
julia> Int(Float64x4(3.0))
ERROR: MethodError: no method matching Int64(::MultiFloat{Float64, 4})
Closest candidates are:
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
(::Type{T})(::BigInt) where T<:Union{Int128, Int16, Int32, Int64, Int8} at gmp.jl:356
...
Stacktrace:
[1] top-level scope
@ REPL[37]:1
julia> Integer(Float64x4(3.0))
ERROR: MethodError: no method matching Integer(::MultiFloat{Float64, 4})
Closest candidates are:
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
(::Type{T})(::Base.TwicePrecision) where T<:Number at twiceprecision.jl:243
...
Stacktrace:
[1] top-level scope
@ REPL[38]:1
It would be nice for these to work, to make it easy to switch between Float64 and MultiFloats.
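A hedged sketch of such a conversion, not the package's API: an exactly integer-valued MultiFloat has integral limbs, so summing them converts exactly (a fractional value raises InexactError, and the result must fit in Int64):

using MultiFloats
# Hypothetical helper: sum the limbs, which are individually integral
# whenever the MultiFloat holds an exact integer value.
Base.Int64(x::MultiFloat{Float64,N}) where {N} =
    sum(Int64(limb) for limb in x._limbs)
Int64(Float64x4(3.0))  # 3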
Here's a case where MultiFloats' "clean" mode is worse than "standard" mode. To make matters worse, "clean" mode is worse than Float64 in this case:
julia> using MultiFloats
julia> MultiFloats.use_clean_multifloat_arithmetic(20)
julia> const F = MultiFloat{Float64, n} where {n}
Float64x (alias for MultiFloat{Float64})
julia> f(x) = isone(x*inv(x))
f (generic function with 1 method)
julia> f(3.0^100)
true
julia> f(F{13}(3.0^100))
false
julia> MultiFloats.use_standard_multifloat_arithmetic(20)
julia> f(F{13}(3.0^100))
true
julia> MultiFloats.use_clean_multifloat_arithmetic(20)
julia> f(F{13}(3.0^100))
false
So there seems to be a bug somewhere in "clean" mode.
In the Feature Comparison table, there is a warning sign for DoubleFloats on dynamic memory allocation. Could you please elaborate a little bit? Are you referring to the conversion from a Double64 to a (Float64, Float64) pair? Does this explain why MultiFloats is faster than DoubleFloats (assuming both use the same number of operations)?
BTW, thanks for making such a performant library with precise code!
julia> pi * Float64x4(2)
ERROR: MethodError: Cannot `convert` an object of type Irrational{:π} to an object of type NTuple{4, Float64}
Perhaps there might be a way to convert these to BigFloat and then to MultiFloats:
julia> 2Float64x4(big(pi))
6.283185307179586476925286766559005768394338798750211641949889184604
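A hedged sketch of that pathway, not the package's API: intercept AbstractIrrational before the limb-tuple constructor sees it, and go through BigFloat with some headroom (64*(N+1) bits is a guess, not a tuned constant):

using MultiFloats
function MultiFloat{Float64,N}(x::AbstractIrrational) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do
        MultiFloat{Float64,N}(BigFloat(x))
    end
end
pi * Float64x4(2)  # should now promote through the BigFloat detour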
julia> using MultiFloats
julia> MultiFloats.use_standard_multifloat_arithmetic(20)
julia> g(n::Integer) = rand(MultiFloat{Float64, n})
g (generic function with 1 method)
julia> x = g(20)
-2.2914735159347884e+302
julia> 1 < x
false
julia> x < -1
true
julia> Float64(x)
0.6714350546186427
So it seems there are multiple issues here:
- rand produces a number of huge magnitude
- rand produces a negative number
- Float64(x) (producing a positive number) is not consistent with the < operator

Doing some optimisations, I encountered the lack of a rounding method for MultiFloats:
ERROR: MethodError: no method matching round(::MultiFloat{Float64, 4}, ::RoundingMode{:Up})
Closest candidates are:
round(::Real, ::RoundingMode; digits, sigdigits, base) at floatfuncs.jl:129
round(::Complex, ::RoundingMode) at complex.jl:1038
round(::Complex, ::RoundingMode, ::RoundingMode; kwargs...) at complex.jl:1038
Same thing happened for trunc. Setting:
import Base.round, Base.trunc
Base.round(x::MultiFloat{Float64, 4}, r::RoundingMode{:Up}) = MultiFloat{Float64, 4}(round(Float64(x), r))
Base.trunc(::Type{Int64}, x::MultiFloat{Float64, 4}) = trunc(Int64, Float64(x))
fixed the problem for my script, but I'm not sure it's the right thing to do (in particular, going through Float64 discards the extra precision).
This is hard to do in a fashion that guarantees the MultiFloat -> String -> MultiFloat round-trip conversion exactly reconstructs the original value, due to the variable-sized gaps between the components of a MultiFloat. We may ultimately want two conversion pathways: one that guarantees round-trip reconstructibility, and one that produces a nice readable result.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
Current codegen is unfortunately not great; for instance, a trivial loop like this:
julia> function trivial_sum(xs::Vector{Float64x4})
t = zero(Float64x4)
@simd for i in 1:length(xs)
t += @inbounds xs[i]
end
t
end
generates the following inner loop:
L48:
vmovsd -24(%rdx), %xmm5 # xmm5 = mem[0],zero
vmovsd -16(%rdx), %xmm6 # xmm6 = mem[0],zero
vmovsd -8(%rdx), %xmm7 # xmm7 = mem[0],zero
vaddsd %xmm5, %xmm4, %xmm9
vsubsd %xmm4, %xmm9, %xmm0
vsubsd %xmm0, %xmm9, %xmm3
vsubsd %xmm3, %xmm4, %xmm3
vsubsd %xmm0, %xmm5, %xmm0
vaddsd %xmm3, %xmm0, %xmm0
vaddsd %xmm6, %xmm2, %xmm3
vsubsd %xmm2, %xmm3, %xmm4
vsubsd %xmm4, %xmm3, %xmm5
vsubsd %xmm5, %xmm2, %xmm2
vsubsd %xmm4, %xmm6, %xmm4
vaddsd %xmm2, %xmm4, %xmm2
vaddsd %xmm7, %xmm1, %xmm4
vsubsd %xmm1, %xmm4, %xmm5
vsubsd %xmm5, %xmm4, %xmm6
vsubsd %xmm6, %xmm1, %xmm1
vsubsd %xmm5, %xmm7, %xmm5
vaddsd %xmm1, %xmm5, %xmm1
vaddsd (%rdx), %xmm8, %xmm5
vaddsd %xmm1, %xmm5, %xmm1
vaddsd %xmm0, %xmm3, %xmm5
vsubsd %xmm3, %xmm5, %xmm6
vsubsd %xmm6, %xmm5, %xmm7
vsubsd %xmm7, %xmm3, %xmm3
vsubsd %xmm6, %xmm0, %xmm0
vaddsd %xmm3, %xmm0, %xmm0
vaddsd %xmm2, %xmm4, %xmm3
vsubsd %xmm4, %xmm3, %xmm6
vsubsd %xmm6, %xmm3, %xmm7
vsubsd %xmm7, %xmm4, %xmm4
vsubsd %xmm6, %xmm2, %xmm2
vaddsd %xmm4, %xmm2, %xmm2
vaddsd %xmm0, %xmm3, %xmm4
vsubsd %xmm0, %xmm4, %xmm6
vsubsd %xmm6, %xmm4, %xmm7
vsubsd %xmm7, %xmm0, %xmm0
vsubsd %xmm6, %xmm3, %xmm3
vaddsd %xmm0, %xmm3, %xmm0
vaddsd %xmm0, %xmm2, %xmm0
vaddsd %xmm0, %xmm1, %xmm0
vaddsd %xmm0, %xmm4, %xmm1
vsubsd %xmm4, %xmm1, %xmm2
vsubsd %xmm2, %xmm0, %xmm0
vaddsd %xmm1, %xmm5, %xmm2
vsubsd %xmm5, %xmm2, %xmm3
vsubsd %xmm3, %xmm1, %xmm1
vaddsd %xmm2, %xmm9, %xmm4
vsubsd %xmm9, %xmm4, %xmm3
vsubsd %xmm3, %xmm2, %xmm3
vaddsd %xmm3, %xmm1, %xmm2
vsubsd %xmm3, %xmm2, %xmm3
vsubsd %xmm3, %xmm1, %xmm3
vaddsd %xmm3, %xmm0, %xmm1
vsubsd %xmm3, %xmm1, %xmm3
vsubsd %xmm3, %xmm0, %xmm8
addq $32, %rdx
decq %rcx
jne L48
By relaxing the type restrictions in MultiFloats.jl a bit, and patching VectorizationBase.jl to add another Vec type that doesn't allow certain fast-math ops, it should be possible to get a 4x speedup on AVX2 and potentially 8x on AVX512 (well, that's the upper bound :p).
A minimal set of changes to allow the use of VectorizationBase is here: haampie@0e83c05
The idea is to work with a type MultiFloat{Vec{M,T},N} and run the basic *, /, +, and - operations on 2, 4, or 8 numbers simultaneously.
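Roughly, a sketch of that layout (VecMultiFloat here is a hypothetical stand-in for the relaxed MultiFloat{Vec{M,T},N}). The two_sum kernel is branch-free, which is why it vectorizes, but it must not be reassociated, hence the need for a Vec type without fast-math:

using VectorizationBase: Vec
# Hypothetical container: limb i holds limb i of M independent numbers.
struct VecMultiFloat{M,T,N}
    limbs::NTuple{N,Vec{M,T}}
end
# The error-free two_sum at the heart of MultiFloat addition, computed
# limb-wise over M lanes at once:
@inline function two_sum(a::Vec{M,T}, b::Vec{M,T}) where {M,T}
    s = a + b
    v = s - a
    return s, (a - (s - v)) + (b - v)
end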
I ran into a hole in the mesh of methods. Conversion from Complex is ambiguous, since there is a constructor that creates a MultiFloat from limbs. Defining:
MultiFloat{T, N}(z::Complex) where {T, N} = MultiFloat{T, N}(BigFloat(z))
filled the hole.
These all fail with MethodError: no method matching round(::MultiFloat{Float64, 2}, ::RoundingMode{:Down}):
map(Float64x2, 0:5)
Float64x2(0):Float64x2(5)
Float64x2(0):Float64x2(1):Float64x2(5)
The workaround is to use generators:
julia> (Float64x2(i) for i in 0:3)
Base.Generator{UnitRange{Int64}, var"#3#4"}(var"#3#4"(), 0:3)
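Another hedged stopgap, not the package's API: implement round by reconstructing the value as a BigFloat instead of rounding a lossy Float64 approximation; slow, but exact even beyond Float64 range:

using MultiFloats
function Base.round(x::MultiFloat{Float64,N}, r::RoundingMode) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do  # precision guess with headroom
        MultiFloat{Float64,N}(round(BigFloat(x), r))
    end
end
map(Float64x2, 0:5)  # with round defined, the failing examples should work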
See the second argument here, and note that it can also be negative or zero: https://docs.julialang.org/en/v1/base/numbers/#Base.nextfloat
A stopgap implementation is obvious: just call nextfloat/prevfloat repeatedly.
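A hedged sketch of that stopgap, assuming the single-step nextfloat/prevfloat already work for MultiFloat; O(|n|), so only sensible for small n:

using MultiFloats
function Base.nextfloat(x::MultiFloat{Float64,N}, n::Integer) where {N}
    step = n >= 0 ? nextfloat : prevfloat  # negative n steps downward
    for _ in 1:abs(n)
        x = step(x)
    end
    return x
end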
I have the following stacktrace:
ERROR: ^ not defined for MultiFloat{Float64, 4}
Stacktrace:
[1] error(::String, ::String, ::Type)
@ Base .\error.jl:42
[2] no_op_err(name::String, T::Type)
@ Base .\promotion.jl:395
[3] ^(x::MultiFloat{Float64, 4}, y::MultiFloat{Float64, 4})
It is probably a good idea to use exp2 as the fundamental exponential function (it has fewer roundings) and then just implement exp and exp10 with an extra multiplication by log2(e) and log2(10), respectively. A similar strategy can be used for logarithms.
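For illustration, a hedged sketch of that strategy, assuming an accurate exp2 for Float64x4 exists; my_exp/my_exp10 and the constants are hypothetical, computed through BigFloat (whose 256-bit default exceeds Float64x4's roughly 212 bits):

using MultiFloats
const LOG2_E  = Float64x4(log2(BigFloat(ℯ)))   # log2(e) to full precision
const LOG2_10 = Float64x4(log2(BigFloat(10)))  # log2(10) to full precision
my_exp(x::Float64x4)   = exp2(x * LOG2_E)   # exp(x) = 2^(x*log2(e))
my_exp10(x::Float64x4) = exp2(x * LOG2_10)  # 10^x   = 2^(x*log2(10))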
After running the QR algorithm for a bunch of iterations, in 99% of cases I'm hitting numbers like these, which lose precision when multiplied:
julia> x = Float64x4((-1.4426767353575964e-39, -4.620737599002311e-57, 1.1532128077549776e-66, -9.883310571778616e-83))
julia> y = big(x)
julia> x * x - y * y
-1.329884630995506956660201769041660088331658728714499508848494315565488248944506e-132
julia> abs(x) * abs(x) - y * y
-5.683388719909582042824734830998315327342882525094367086266805748619453533154653e-144
Relative errors:
julia> (x * x - y * y) / (y * y)
-6.389632939012188407781360510046084336131978165576939912207721601547110752468559e-55
julia> (abs(x) * abs(x) - y * y) / (y * y)
-2.730670535139620858770944035756030902664721599555555738618279743956037242156906e-66
I think they're not normalized.
Example. Input matrix:
julia> H1
6×6 Matrix{MultiFloat{Float64, 4}}:
1.98795 -0.354736 0.0797913 0.148136 -0.164454 0.369633
-0.19463 0.598573 0.429203 -0.486762 0.239162 0.00142155
0.0 0.0290679 0.556247 -0.562008 -0.385439 -0.214523
0.0 0.0 0.511856 1.12336 -0.602872 -0.206262
0.0 0.0 0.0 -1.49082e-25 1.38356 -0.937583
0.0 0.0 0.0 0.0 0.022335 1.20556
After applying a double shift of the QR algorithm and moving the "bulge" to the last two rows, it ends up looking like this:
julia> H2
6×6 Matrix{MultiFloat{Float64, 4}}:
1.9865 0.358077 0.115383 0.13357 -0.182391 0.361471
0.200765 0.554469 0.581365 -0.204676 -0.249824 -0.0259713
0.0 0.0246946 1.0663 0.734426 -0.377182 -0.11517
0.0 0.0 -0.359419 0.658852 -0.582584 -0.315445
0.0 0.0 0.0 -1.83218e-50 1.43778 -0.923614
0.0 0.0 0.0 -3.00836e-51 0.0363045 1.15134
Here accuracy is still fine:
julia> H * Q - Q * H2 |> norm
5.414815616357242382449125468599721608592507512793183811901540346944e-64
julia> Q'Q-I |> norm
4.416391581658617777192079003247210327007965703744734256761335112015e-64
But the last Givens rotation, which zeros out the -3.0e-51 value, is completely inaccurate:
julia> H3
6×6 Matrix{MultiFloat{Float64, 4}}:
1.9865 0.358077 0.115383 0.13357 -0.121414 0.386247
0.200765 0.554469 0.581365 -0.204676 -0.250731 0.0148499
0.0 0.0246946 1.0663 0.734426 -0.390859 -0.052535
0.0 0.0 -0.359419 0.658852 -0.625997 -0.216883
0.0 0.0 0.0 -1.85672e-50 1.28839 -0.946116
0.0 0.0 0.0 0.0 0.0138023 1.30073
julia> Q'Q-I |> norm
1.1366313071582258344782136697615383422393886966249503201836254869903e-47
julia> H * Q - Q * H3 |> norm
1.656359278104309870083428882117651148392957207448309578999648077939e-47
That's probably because the numbers from which the rotation is computed are not normalized:
julia> H2 .+ 0.0 .=== H2
6×6 BitMatrix:
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 0 1 1
1 1 1 0 1 1
If I normalize them "by hand", it looks like accuracy is restored:
julia> c, s, = LinearAlgebra.givensAlgorithm(a, b)
(0.9867864895853341289209073557253047327730824456644072969143861256782, 0.16202599782705631811014484204531232306332276631781213554170381102448, -1.85671644730184222985716651733286367867540289938089541635541279028e-50)
julia> c*c+s*s-1
8.037197050005110683132676344698446e-48
julia> c, s, = LinearAlgebra.givensAlgorithm(a + 0.0, b + 0.0)
(0.9867864895853341289209073557253047327730824456604417981828460524574, 0.16202599782705631811014484204531232306332276631716101810582393526185, -1.856716447301842229857166517332863678675402899388356814331887958311e-50)
julia> c*c+s*s-1
-1.1543976842476927e-64
But... I can't write my algorithm like that.
How can I avoid loss of accuracy?
Hi,
Is your claim here that ArbFloat (@JeffreySarnoff, https://github.com/JeffreySarnoff/ArbNumerics.jl) is slower than BigFloat (and also slower than your type) wrong? Is it only about QR? I understood ArbFloat to usually be faster, even at the same precision, though not always.
Of course it is faster than BigFloat at BigFloat's default precision, but that's a somewhat unfair comparison. I guess your 106-bit case is the best case, and at higher precision ArbFloat wins. Many people default to BigFloat after Float64 simply because they don't know better, which is why I intend to get that package documented, and now likely your package too.
Since you define all the basic operators, could you define all the remaining functions, or is that just not done yet? That's a pro of ArbNumerics, along with its interval option.
I'm wondering whether ArbNumerics should add your package as a dependency. Could it use your type unless the user asks for some inconvenient/large precision, and then fall back to its own implementation? And then your type could get the transcendental functions that way?
repr(x) should output Float64xN((x1, x2, ..., xN)). That way you can actually copy and paste them.
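A hedged sketch of that output, using the _limbs field seen elsewhere in this thread; note that overriding show replaces the package's existing display everywhere:

using MultiFloats
Base.show(io::IO, x::MultiFloat{Float64,N}) where {N} =
    print(io, "Float64x", N, "(", x._limbs, ")")
repr(Float64x2(0.1))  # "Float64x2((0.1, 0.0))" — pastes back losslessly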
Just a little note on performance:
f(v::Vector{<:Real}, binop::F, c::Real) where {F <: Function} =
    map!(e -> binop(e, c), v, v)
@benchmark f(v, *, Float64x4(3)) setup=(v = rand(Float64x4, 2^14)) # median time: 260.556 μs
@benchmark f(v, *, 3.0) setup=(v = rand(Float64x4, 2^14)) # median time: 169.787 μs
@benchmark f(v, *, 0x3) setup=(v = rand(Float64x4, 2^14)) # median time: 250.470 μs
@benchmark f(v, *, Float32(3)) setup=(v = rand(Float64x4, 2^14)) # median time: 249.599 μs
Comparing the first and second benchmarks, it seems that multiplication with Float64 is special-cased, leading to very good performance. The third and fourth benchmarks are disappointing, though; I guess the UInt8 and Float32 are first converted to Float64x4 before the multiplication? Perhaps converting to Float64 instead of to Float64x4 would lead to better performance? Or maybe generating special cases for multiplication with small integers or with Float32, as sketched below?
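A hedged sketch of that special-casing, not the package's API: scalars that embed exactly in Float64 can reuse the fast MultiFloat-times-Float64 path instead of being widened to Float64x4 first:

using MultiFloats
# Float32 and UInt8 convert to Float64 exactly, so this is lossless.
Base.:*(x::MultiFloat{Float64,N}, y::Union{Float32,UInt8}) where {N} =
    x * Float64(y)
Base.:*(y::Union{Float32,UInt8}, x::MultiFloat{Float64,N}) where {N} =
    x * Float64(y)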
Because of this bit of code
_sincos(x::AbstractFloat) = sincos(x)
_sincos(x) = (sin(x), cos(x))
sincos(x) = _sincos(float(x))
in julia/base/special/trig.jl, calling sincos with a MultiFloat causes a stack overflow.
Minimal working example:
using MultiFloats
sincos(Float64x2(4))
ERROR: StackOverflowError:
Stacktrace:
[1] _sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:203
[2] sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:206
--- the last 2 lines are repeated 39990 more times ---
[79983] _sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:203
I'm trying to use conv from DSP.jl and GenericFFT.jl; is there a workaround for this? I don't have a firm grasp on how to change the method used by conv in a scenario like this.
P.S. Why would one write code like the above? It seems designed to cause a stack overflow whenever calling float does nothing.
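One hedged workaround, not the package's API: give sincos a concrete method so Base's fallback cannot recurse, routing sin and cos through BigFloat (slow, but enough to unblock conv):

using MultiFloats
function Base.sincos(x::MultiFloat{Float64,N}) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do  # precision guess with headroom
        b = BigFloat(x)
        (MultiFloat{Float64,N}(sin(b)), MultiFloat{Float64,N}(cos(b)))
    end
end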
MultiFloats.jl currently has a stopgap implementation of Base.hypot(x, y) to make MultiFloat usable with the eigenvalue algorithms in GenericLinearAlgebra.jl. This implementation does not correctly avoid undue overflow/underflow, but it should be possible to write an implementation that does.
The convention for dividing by zero is not uniform across all levels of precision:
one(Float64) / zero(Float64) # Inf
one(Float64x1) / zero(Float64x1) # Inf
one(Float64x2) / zero(Float64x2) # NaN
Is it possible to have Float64x2, etc. follow the same convention as Float64? I have parts of my code in Tulip.jl that rely on this convention.
I think this should cover all cases:

| x  | y  | x/y  |
|----|----|------|
| >0 | +0 | +Inf |
| >0 | -0 | -Inf |
| ±0 | ±0 | NaN  |
| <0 | +0 | -Inf |
| <0 | -0 | +Inf |
hypot is just not correctly implemented: it loses all digits of precision for small as well as large values.
using MultiFloats
small = 1e-300
@assert hypot(small, small) / sqrt(2) ≈ 1e-300
xsmall = Float64x2(small)
@assert hypot(xsmall, xsmall) / sqrt(2) ≈ 1e-300
One should probably have some really robust precision audit as part of the unit tests.
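For reference, a hedged sketch of an overflow/underflow-safe version (safe_hypot is a hypothetical name, not the package's stopgap): factor out the larger magnitude before squaring, so nothing is squared outside the representable range:

using MultiFloats
function safe_hypot(x::T, y::T) where {T<:MultiFloat}
    ax, ay = minmax(abs(x), abs(y))  # ax <= ay
    iszero(ay) && return ay          # hypot(0, 0) == 0
    r = ax / ay                      # r <= 1, so r*r cannot underflow to 0
    return ay * sqrt(one(T) + r * r)
end
safe_hypot(Float64x2(1e-300), Float64x2(1e-300))  # ≈ 1.414e-300, no underflow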
julia> significand(Float64x2(123.4))
ERROR: MethodError: no method matching significand(::MultiFloat{Float64, 2})
Closest candidates are:
significand(::T) where T<:Union{Float16, Float32, Float64} at math.jl:896
significand(::DoubleFloat{T}) where T<:Union{Float16, Float32, Float64} at ~/.julia/packages/DoubleFloats/h3HrU/src/math/prearith/prearith.jl:68
significand(::BigFloat) at mpfr.jl:873
...
julia> significand(123.4)
1.928125
Repeatedly adding zero is a pretty awful way to renormalize a MultiFloat.
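For comparison, a hedged sketch of one explicit renormalization pass (a "TwoSum" sweep, not MultiFloats' internal routine); badly overlapping limbs may need more than one pass:

# Error-free transformation: s + e == a + b exactly.
@inline function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    return s, (a - (s - v)) + (b - v)
end
function renormalize(l::NTuple{4,Float64})
    a, b, c, d = l
    c, d = two_sum(c, d)  # sweep down: push error terms toward the tail
    b, c = two_sum(b, c)
    a, b = two_sum(a, b)
    b, c = two_sum(b, c)  # second sweep tightens the remaining overlap
    c, d = two_sum(c, d)
    return (a, b, c, d)
end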
Hey,
In my application, I only require exponentials and logarithms on top of what is already implemented. How hard would it be to add those two functions? Do you know how it should be done? I would be happy to contribute.
Edit: Could a temporary fix be introduced by converting to BigFloat and back? I know this is highly sub-optimal, but it would allow code to at least run. Since exp/log are not the main part of my workflow, it might still provide a huge performance gain over BigFloat for me.
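A hedged version of that temporary fix: route exp and log through BigFloat at generous precision (64*(N+1) bits is a guess with headroom); correct but slow:

using MultiFloats
for f in (:exp, :log)
    @eval function Base.$f(x::MultiFloat{Float64,N}) where {N}
        setprecision(BigFloat, 64 * (N + 1)) do
            MultiFloat{Float64,N}($f(BigFloat(x)))
        end
    end
end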
JuliaMath/DoubleFloats.jl#136 gives DoubleFloats an exp (and exp2, exp10) implementation that is around 4x faster than what MultiFloats is using. If you wanted this version to work for all levels of precision, it would take some extra engineering, but IMO it would be worth it.
There seems to be something wrong with how Inf is treated when higher precision is used:
julia> Float64x1(-Inf) - 1
-Inf
julia> Float64x2(-Inf) - 1 # Same happens for Float64x3, etc
NaN
and
julia> Float64x1(Inf) + 1
Inf
julia> Float64x2(Inf) + 1 # Same happens for Float64x3, etc
NaN
The table at the bottom of this package's README has a warning sign in the DoubleFloats column for "avoids dynamic memory allocation". Why is this? I've never noticed such allocations.
using MultiFloats
using Printf
@printf("%f", Float64x2(3))
ERROR: StackOverflowError:
Stacktrace:
[1] fix_dec(::MultiFloat{Float64,2}, ::Int64, ::Array{UInt8,1}) at .\printf.jl:987 (repeats 80000 times)
Probably similar to the corresponding issue with DoubleFloats.