dzhang314 / MultiFloats.jl
Fast, SIMD-accelerated extended-precision arithmetic for Julia
License: MIT License
I've tried MultiFloats.jl on the GPU, but I'm getting a loss of precision compared to the CPU:
using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
A * B - Array(CuArray(A) * CuArray(B))
Gives me
100×100 Matrix{MultiFloat{Float64, 8}}:
-1.23827e-98 9.35263e-99 -8.83181e-99 … -4.70324e-99 -1.3348e-98
-1.98421e-99 8.20389e-99 1.67043e-98 1.45499e-98 2.32225e-98
-2.77264e-99 -3.30951e-99 1.32426e-98 -1.09181e-98 7.84157e-100
1.92544e-98 6.35776e-99 -8.85547e-99 1.29435e-98 -4.89252e-99
-5.52038e-99 5.35901e-99 -3.705e-98 1.53947e-99 7.38954e-99
-2.16904e-98 1.64505e-98 -1.16536e-98 … -3.19036e-98 7.5397e-99
6.72487e-98 6.07349e-99 -2.87359e-98 ...
but eps(Float64x8) is 5.9091063153828709e-126.
What explains this? The order of iteration?
This is really a question rather than an issue. I hope that's OK.
I would like to convert between various extended-precision types. Are these conversions for MultiFloats correct and idiomatic?
using DoubleFloats
using MultiFloats
using Quadmath
@inline MultiFloat{Float64,2}(x::DoubleFloats.Double64) = Float64x2(x.hi) + x.lo
@inline function MultiFloat{Float64,2}(x::Quadmath.Float128)
hi = Float64(x)
lo = Float64(x - hi)
return Float64x2(hi) + lo
end
@inline Quadmath.Float128(x::Float64x2) = Quadmath.Float128(x._limbs[1]) + x._limbs[2]
function test(x)
d = Double64(x)
m = Float64x2(x)
q = Float128(x)
b = BigFloat(x)
return d, m, q, b
end
test(Double64(1.0))
test(Float64x2(2.0))
test(Float128(3.0))
test(BigFloat(4.0))
If a and b are MultiFloats, then a^b should return a proper result. For the moment I fall back to exp(b*log(a)).
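A minimal stopgap sketch of that fallback, assuming exp and log are available for MultiFloat (they may need the BigFloat detour discussed further down):

using MultiFloats
# Hedged sketch, not the package's method: delegate ^ to exp/log.
# Only mathematically valid for a > 0, and slightly less accurate
# than a dedicated power implementation would be.
Base.:^(a::MultiFloat{T,N}, b::MultiFloat{T,N}) where {T,N} = exp(b * log(a))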
julia> Int(Float64x4(3.0))
ERROR: MethodError: no method matching Int64(::MultiFloat{Float64, 4})
Closest candidates are:
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
(::Type{T})(::BigInt) where T<:Union{Int128, Int16, Int32, Int64, Int8} at gmp.jl:356
...
Stacktrace:
[1] top-level scope
@ REPL[37]:1
julia> Integer(Float64x4(3.0))
ERROR: MethodError: no method matching Integer(::MultiFloat{Float64, 4})
Closest candidates are:
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
(::Type{T})(::Base.TwicePrecision) where T<:Number at twiceprecision.jl:243
...
Stacktrace:
[1] top-level scope
@ REPL[38]:1
It would be nice for these to work, to make it easy to switch between Float64 and MultiFloats.
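A hedged sketch of such a conversion, not the package's API: an exactly integer-valued MultiFloat has integral limbs, so summing them converts exactly (a fractional value raises InexactError, and the result must fit in Int64):

using MultiFloats
# Hypothetical helper: sum the limbs, which are individually integral
# whenever the MultiFloat holds an exact integer value.
Base.Int64(x::MultiFloat{Float64,N}) where {N} =
    sum(Int64(limb) for limb in x._limbs)
Int64(Float64x4(3.0))  # 3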
Here's a case where MultiFloats' "clean" mode is worse than "standard" mode. To make matters worse, "clean" mode is worse than Float64 in this case:
julia> using MultiFloats
julia> MultiFloats.use_clean_multifloat_arithmetic(20)
julia> const F = MultiFloat{Float64, n} where {n}
Float64x (alias for MultiFloat{Float64})
julia> f(x) = isone(x*inv(x))
f (generic function with 1 method)
julia> f(3.0^100)
true
julia> f(F{13}(3.0^100))
false
julia> MultiFloats.use_standard_multifloat_arithmetic(20)
julia> f(F{13}(3.0^100))
true
julia> MultiFloats.use_clean_multifloat_arithmetic(20)
julia> f(F{13}(3.0^100))
false
So there seems to be a bug somewhere in "clean" mode.
In the Feature Comparison table, there is a warning sign for DoubleFloats on dynamic memory allocation. Could you please elaborate a little bit? Are you referring to the conversion from a Double64 to a (Float64, Float64) pair? Does this explain why MultiFloats is faster than DoubleFloats (assuming both use the same number of operations)?
BTW, thanks for making such a performant library with precise code!
julia> pi * Float64x4(2)
ERROR: MethodError: Cannot `convert` an object of type Irrational{:π} to an object of type NTuple{4, Float64}
Perhaps there might be a way to convert these to BigFloat and then to MultiFloats:
julia> 2Float64x4(big(pi))
6.283185307179586476925286766559005768394338798750211641949889184604
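A hedged sketch of that pathway, not the package's API: intercept AbstractIrrational before the limb-tuple constructor sees it, and go through BigFloat with some headroom (64*(N+1) bits is a guess, not a tuned constant):

using MultiFloats
function MultiFloat{Float64,N}(x::AbstractIrrational) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do
        MultiFloat{Float64,N}(BigFloat(x))
    end
end
pi * Float64x4(2)  # should now promote through the BigFloat detour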
julia> using MultiFloats
julia> MultiFloats.use_standard_multifloat_arithmetic(20)
julia> g(n::Integer) = rand(MultiFloat{Float64, n})
g (generic function with 1 method)
julia> x = g(20)
-2.2914735159347884e+302
julia> 1 < x
false
julia> x < -1
true
julia> Float64(x)
0.6714350546186427
So it seems there are multiple issues here:
- rand produces a number of huge magnitude
- rand produces a negative number
- Float64(x) (producing a positive number) is not consistent with the < operator

Doing some optimisations, I encountered the lack of a rounding method for MultiFloats:
ERROR: MethodError: no method matching round(::MultiFloat{Float64, 4}, ::RoundingMode{:Up})
Closest candidates are:
round(::Real, ::RoundingMode; digits, sigdigits, base) at floatfuncs.jl:129
round(::Complex, ::RoundingMode) at complex.jl:1038
round(::Complex, ::RoundingMode, ::RoundingMode; kwargs...) at complex.jl:1038
Same thing happened for trunc. Setting:
import Base.round, Base.trunc
Base.round(x::MultiFloat{Float64, 4}, r::RoundingMode{:Up}) = MultiFloat{Float64, 4}(round(Float64(x), r))
Base.trunc(::Type{Int64}, x::MultiFloat{Float64, 4}) = trunc(Int64, Float64(x))
fixed the problem for my script, but I'm not sure it's the right thing to do (in particular, going through Float64 discards the extra precision).
This is hard to do in a fashion that guarantees the MultiFloat -> String -> MultiFloat round-trip conversion exactly reconstructs the original value, due to the variable-sized gaps between the components of a MultiFloat. We may ultimately want two conversion pathways: one that guarantees round-trip reconstructibility, and one that produces a nice readable result.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
Current codegen is unfortunately not great; for instance, a trivial loop like this:
julia> function trivial_sum(xs::Vector{Float64x4})
t = zero(Float64x4)
@simd for i in 1:length(xs)
t += @inbounds xs[i]
end
t
end
generates the following inner loop:
L48:
vmovsd -24(%rdx), %xmm5 # xmm5 = mem[0],zero
vmovsd -16(%rdx), %xmm6 # xmm6 = mem[0],zero
vmovsd -8(%rdx), %xmm7 # xmm7 = mem[0],zero
vaddsd %xmm5, %xmm4, %xmm9
vsubsd %xmm4, %xmm9, %xmm0
vsubsd %xmm0, %xmm9, %xmm3
vsubsd %xmm3, %xmm4, %xmm3
vsubsd %xmm0, %xmm5, %xmm0
vaddsd %xmm3, %xmm0, %xmm0
vaddsd %xmm6, %xmm2, %xmm3
vsubsd %xmm2, %xmm3, %xmm4
vsubsd %xmm4, %xmm3, %xmm5
vsubsd %xmm5, %xmm2, %xmm2
vsubsd %xmm4, %xmm6, %xmm4
vaddsd %xmm2, %xmm4, %xmm2
vaddsd %xmm7, %xmm1, %xmm4
vsubsd %xmm1, %xmm4, %xmm5
vsubsd %xmm5, %xmm4, %xmm6
vsubsd %xmm6, %xmm1, %xmm1
vsubsd %xmm5, %xmm7, %xmm5
vaddsd %xmm1, %xmm5, %xmm1
vaddsd (%rdx), %xmm8, %xmm5
vaddsd %xmm1, %xmm5, %xmm1
vaddsd %xmm0, %xmm3, %xmm5
vsubsd %xmm3, %xmm5, %xmm6
vsubsd %xmm6, %xmm5, %xmm7
vsubsd %xmm7, %xmm3, %xmm3
vsubsd %xmm6, %xmm0, %xmm0
vaddsd %xmm3, %xmm0, %xmm0
vaddsd %xmm2, %xmm4, %xmm3
vsubsd %xmm4, %xmm3, %xmm6
vsubsd %xmm6, %xmm3, %xmm7
vsubsd %xmm7, %xmm4, %xmm4
vsubsd %xmm6, %xmm2, %xmm2
vaddsd %xmm4, %xmm2, %xmm2
vaddsd %xmm0, %xmm3, %xmm4
vsubsd %xmm0, %xmm4, %xmm6
vsubsd %xmm6, %xmm4, %xmm7
vsubsd %xmm7, %xmm0, %xmm0
vsubsd %xmm6, %xmm3, %xmm3
vaddsd %xmm0, %xmm3, %xmm0
vaddsd %xmm0, %xmm2, %xmm0
vaddsd %xmm0, %xmm1, %xmm0
vaddsd %xmm0, %xmm4, %xmm1
vsubsd %xmm4, %xmm1, %xmm2
vsubsd %xmm2, %xmm0, %xmm0
vaddsd %xmm1, %xmm5, %xmm2
vsubsd %xmm5, %xmm2, %xmm3
vsubsd %xmm3, %xmm1, %xmm1
vaddsd %xmm2, %xmm9, %xmm4
vsubsd %xmm9, %xmm4, %xmm3
vsubsd %xmm3, %xmm2, %xmm3
vaddsd %xmm3, %xmm1, %xmm2
vsubsd %xmm3, %xmm2, %xmm3
vsubsd %xmm3, %xmm1, %xmm3
vaddsd %xmm3, %xmm0, %xmm1
vsubsd %xmm3, %xmm1, %xmm3
vsubsd %xmm3, %xmm0, %xmm8
addq $32, %rdx
decq %rcx
jne L48
By relaxing the type restrictions in MultiFloats.jl a bit, and patching VectorizationBase.jl to add another Vec type that doesn't allow certain fast-math ops, it should be possible to get a 4x speedup on AVX2 and potentially 8x on AVX512 (well, that's the upper bound :p).
A minimal set of changes to allow the use of VectorizationBase is here: haampie@0e83c05
The idea is to work with a type MultiFloat{Vec{M,T},N} and run the basic *, /, +, and - operations on 2, 4, or 8 numbers simultaneously.
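Roughly, a sketch of that layout (VecMultiFloat here is a hypothetical stand-in for the relaxed MultiFloat{Vec{M,T},N}). The two_sum kernel is branch-free, which is why it vectorizes, but it must not be reassociated, hence the need for a Vec type without fast-math:

using VectorizationBase: Vec
# Hypothetical container: limb i holds limb i of M independent numbers.
struct VecMultiFloat{M,T,N}
    limbs::NTuple{N,Vec{M,T}}
end
# The error-free two_sum at the heart of MultiFloat addition, computed
# limb-wise over M lanes at once:
@inline function two_sum(a::Vec{M,T}, b::Vec{M,T}) where {M,T}
    s = a + b
    v = s - a
    return s, (a - (s - v)) + (b - v)
end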
I ran into a hole in the mesh of methods. Conversion from Complex is ambiguous, since there is a constructor that creates a MultiFloat from limbs. Defining:
MultiFloat{T, N}(z::Complex) where {T, N} = MultiFloat{T, N}(BigFloat(z))
filled the hole.
These all fail with MethodError: no method matching round(::MultiFloat{Float64, 2}, ::RoundingMode{:Down}):
map(Float64x2, 0:5)
Float64x2(0):Float64x2(5)
Float64x2(0):Float64x2(1):Float64x2(5)
The workaround is to use generators:
julia> (Float64x2(i) for i in 0:3)
Base.Generator{UnitRange{Int64}, var"#3#4"}(var"#3#4"(), 0:3)
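Another hedged stopgap, not the package's API: implement round by reconstructing the value as a BigFloat instead of rounding a lossy Float64 approximation; slow, but exact even beyond Float64 range:

using MultiFloats
function Base.round(x::MultiFloat{Float64,N}, r::RoundingMode) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do  # precision guess with headroom
        MultiFloat{Float64,N}(round(BigFloat(x), r))
    end
end
map(Float64x2, 0:5)  # with round defined, the failing examples should work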
See the second argument here, and note that it can also be negative or zero: https://docs.julialang.org/en/v1/base/numbers/#Base.nextfloat
A stopgap implementation is obvious: just call nextfloat/prevfloat repeatedly.
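A hedged sketch of that stopgap, assuming the single-step nextfloat/prevfloat already work for MultiFloat; O(|n|), so only sensible for small n:

using MultiFloats
function Base.nextfloat(x::MultiFloat{Float64,N}, n::Integer) where {N}
    step = n >= 0 ? nextfloat : prevfloat  # negative n steps downward
    for _ in 1:abs(n)
        x = step(x)
    end
    return x
end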
I have the following stacktrace:
ERROR: ^ not defined for MultiFloat{Float64, 4}
Stacktrace:
[1] error(::String, ::String, ::Type)
@ Base .\error.jl:42
[2] no_op_err(name::String, T::Type)
@ Base .\promotion.jl:395
[3] ^(x::MultiFloat{Float64, 4}, y::MultiFloat{Float64, 4})
It is probably a good idea to use exp2 as the fundamental exponential function (it has fewer roundings) and then just implement exp and exp10 with an extra multiplication by log2(e) and log2(10), respectively. A similar strategy can be used for logarithms.
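For illustration, a hedged sketch of that strategy, assuming an accurate exp2 for Float64x4 exists; my_exp/my_exp10 and the constants are hypothetical, computed through BigFloat (whose 256-bit default exceeds Float64x4's roughly 212 bits):

using MultiFloats
const LOG2_E  = Float64x4(log2(BigFloat(ℯ)))   # log2(e) to full precision
const LOG2_10 = Float64x4(log2(BigFloat(10)))  # log2(10) to full precision
my_exp(x::Float64x4)   = exp2(x * LOG2_E)   # exp(x) = 2^(x*log2(e))
my_exp10(x::Float64x4) = exp2(x * LOG2_10)  # 10^x   = 2^(x*log2(10))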
After running the QR algorithm for a bunch of iterations, in 99% of cases I'm hitting numbers like these, which lose precision when multiplied:
julia> x = Float64x4((-1.4426767353575964e-39, -4.620737599002311e-57, 1.1532128077549776e-66, -9.883310571778616e-83))
julia> y = big(x)
julia> x * x - y * y
-1.329884630995506956660201769041660088331658728714499508848494315565488248944506e-132
julia> abs(x) * abs(x) - y * y
-5.683388719909582042824734830998315327342882525094367086266805748619453533154653e-144
Relative errors:
julia> (x * x - y * y) / (y * y)
-6.389632939012188407781360510046084336131978165576939912207721601547110752468559e-55
julia> (abs(x) * abs(x) - y * y) / (y * y)
-2.730670535139620858770944035756030902664721599555555738618279743956037242156906e-66
I think they're not normalized.
Example. Input matrix:
julia> H1
6×6 Matrix{MultiFloat{Float64, 4}}:
1.98795 -0.354736 0.0797913 0.148136 -0.164454 0.369633
-0.19463 0.598573 0.429203 -0.486762 0.239162 0.00142155
0.0 0.0290679 0.556247 -0.562008 -0.385439 -0.214523
0.0 0.0 0.511856 1.12336 -0.602872 -0.206262
0.0 0.0 0.0 -1.49082e-25 1.38356 -0.937583
0.0 0.0 0.0 0.0 0.022335 1.20556
After applying a double shift of the QR algorithm and moving the "bulge" to the last two rows, it ends up looking like this:
julia> H2
6×6 Matrix{MultiFloat{Float64, 4}}:
1.9865 0.358077 0.115383 0.13357 -0.182391 0.361471
0.200765 0.554469 0.581365 -0.204676 -0.249824 -0.0259713
0.0 0.0246946 1.0663 0.734426 -0.377182 -0.11517
0.0 0.0 -0.359419 0.658852 -0.582584 -0.315445
0.0 0.0 0.0 -1.83218e-50 1.43778 -0.923614
0.0 0.0 0.0 -3.00836e-51 0.0363045 1.15134
Here accuracy is still fine:
julia> H * Q - Q * H2 |> norm
5.414815616357242382449125468599721608592507512793183811901540346944e-64
julia> Q'Q-I |> norm
4.416391581658617777192079003247210327007965703744734256761335112015e-64
But the last Givens rotation, which zeros out the -3.0e-51 value, is completely inaccurate:
julia> H3
6×6 Matrix{MultiFloat{Float64, 4}}:
1.9865 0.358077 0.115383 0.13357 -0.121414 0.386247
0.200765 0.554469 0.581365 -0.204676 -0.250731 0.0148499
0.0 0.0246946 1.0663 0.734426 -0.390859 -0.052535
0.0 0.0 -0.359419 0.658852 -0.625997 -0.216883
0.0 0.0 0.0 -1.85672e-50 1.28839 -0.946116
0.0 0.0 0.0 0.0 0.0138023 1.30073
julia> Q'Q-I |> norm
1.1366313071582258344782136697615383422393886966249503201836254869903e-47
julia> H * Q - Q * H3 |> norm
1.656359278104309870083428882117651148392957207448309578999648077939e-47
That's probably because the numbers from which the rotation is computed are not normalized:
julia> H2 .+ 0.0 .=== H2
6×6 BitMatrix:
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 0 1 1
1 1 1 0 1 1
If I normalize them "by hand", it looks like accuracy is restored:
julia> c, s, = LinearAlgebra.givensAlgorithm(a, b)
(0.9867864895853341289209073557253047327730824456644072969143861256782, 0.16202599782705631811014484204531232306332276631781213554170381102448, -1.85671644730184222985716651733286367867540289938089541635541279028e-50)
julia> c*c+s*s-1
8.037197050005110683132676344698446e-48
julia> c, s, = LinearAlgebra.givensAlgorithm(a + 0.0, b + 0.0)
(0.9867864895853341289209073557253047327730824456604417981828460524574, 0.16202599782705631811014484204531232306332276631716101810582393526185, -1.856716447301842229857166517332863678675402899388356814331887958311e-50)
julia> c*c+s*s-1
-1.1543976842476927e-64
But... I can't write my algorithm like that.
How can I avoid loss of accuracy?
Hi,
Is your claim here that ArbFloat (@JeffreySarnoff, https://github.com/JeffreySarnoff/ArbNumerics.jl) is slower than BigFloat (and also slower than your type) wrong? Is it only about QR? I understood ArbFloat to usually be faster, even at the same precision, though not always.
Of course it is faster than BigFloat at BigFloat's default precision, but that's a somewhat unfair comparison. I guess your 106-bit case is the best case, and at higher precision ArbFloat wins. Many people default to BigFloat after Float64 simply because they don't know better, which is why I intend to get that package documented, and now likely your package too.
Since you define all the basic operators, could you define all the remaining functions, or is that just not done yet? That's a pro of ArbNumerics, along with its interval option.
I'm wondering whether ArbNumerics should add your package as a dependency. Could it use your type unless the user asks for some inconvenient/large precision, and then fall back to its own implementation? And then your type could get the transcendental functions that way?
repr(x) should output Float64xN((x1, x2, ..., xN)). That way you can actually copy and paste them.
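A hedged sketch of that output, using the _limbs field seen elsewhere in this thread; note that overriding show replaces the package's existing display everywhere:

using MultiFloats
Base.show(io::IO, x::MultiFloat{Float64,N}) where {N} =
    print(io, "Float64x", N, "(", x._limbs, ")")
repr(Float64x2(0.1))  # "Float64x2((0.1, 0.0))" — pastes back losslessly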
Just a little note on performance:
f(v::Vector{<:Real}, binop::F, c::Real) where {F <: Function} =
    map!(e -> binop(e, c), v, v)
@benchmark f(v, *, Float64x4(3)) setup=(v = rand(Float64x4, 2^14)) # median time: 260.556 μs
@benchmark f(v, *, 3.0) setup=(v = rand(Float64x4, 2^14)) # median time: 169.787 μs
@benchmark f(v, *, 0x3) setup=(v = rand(Float64x4, 2^14)) # median time: 250.470 μs
@benchmark f(v, *, Float32(3)) setup=(v = rand(Float64x4, 2^14)) # median time: 249.599 μs
Comparing the first and second benchmarks, it seems that multiplication with Float64 is special-cased, leading to very good performance. The third and fourth benchmarks are disappointing, though; I guess the UInt8 and Float32 are first converted to Float64x4 before the multiplication? Perhaps converting to Float64 instead of to Float64x4 would lead to better performance? Or maybe generating special cases for multiplication with small integers or with Float32, as sketched below?
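A hedged sketch of that special-casing, not the package's API: scalars that embed exactly in Float64 can reuse the fast MultiFloat-times-Float64 path instead of being widened to Float64x4 first:

using MultiFloats
# Float32 and UInt8 convert to Float64 exactly, so this is lossless.
Base.:*(x::MultiFloat{Float64,N}, y::Union{Float32,UInt8}) where {N} =
    x * Float64(y)
Base.:*(y::Union{Float32,UInt8}, x::MultiFloat{Float64,N}) where {N} =
    x * Float64(y)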
Because of this bit of code
_sincos(x::AbstractFloat) = sincos(x)
_sincos(x) = (sin(x), cos(x))
sincos(x) = _sincos(float(x))
in julia/base/special/trig.jl, calling sincos with a MultiFloat causes a stack overflow.
Minimal working example:
using MultiFloats
sincos(Float64x2(4))
ERROR: StackOverflowError:
Stacktrace:
[1] _sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:203
[2] sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:206
--- the last 2 lines are repeated 39990 more times ---
[79983] _sincos(x::MultiFloat{Float64, 2})
@ Base.Math .\special\trig.jl:203
I'm trying to use conv from DSP.jl and GenericFFT.jl; is there a workaround for this? I don't have a firm grasp on how to change the method used by conv in a scenario like this.
P.S. Why would one write code like the above? It seems designed to cause a stack overflow whenever calling float does nothing.
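One hedged workaround, not the package's API: give sincos a concrete method so Base's fallback cannot recurse, routing sin and cos through BigFloat (slow, but enough to unblock conv):

using MultiFloats
function Base.sincos(x::MultiFloat{Float64,N}) where {N}
    setprecision(BigFloat, 64 * (N + 1)) do  # precision guess with headroom
        b = BigFloat(x)
        (MultiFloat{Float64,N}(sin(b)), MultiFloat{Float64,N}(cos(b)))
    end
end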
MultiFloats.jl currently has a stopgap implementation of Base.hypot(x, y) to make MultiFloat usable with the eigenvalue algorithms in GenericLinearAlgebra.jl. This implementation does not correctly avoid undue overflow/underflow, but it should be possible to write an implementation that does.
The convention for dividing by zero is not uniform across all levels of precision:
one(Float64) / zero(Float64) # Inf
one(Float64x1) / zero(Float64x1) # Inf
one(Float64x2) / zero(Float64x2) # NaN
Is it possible to have Float64x2, etc. follow the same convention as Float64? I have parts of my code in Tulip.jl that rely on this convention.
I think this should cover all cases:

| x  | y  | x/y  |
|----|----|------|
| >0 | +0 | +Inf |
| >0 | -0 | -Inf |
| ±0 | ±0 | NaN  |
| <0 | +0 | -Inf |
| <0 | -0 | +Inf |
hypot is just not correctly implemented: it loses all digits of precision for small as well as large values.
using MultiFloats
small = 1e-300
@assert hypot(small, small) / sqrt(2) ≈ 1e-300
xsmall = Float64x2(small)
@assert hypot(xsmall, xsmall) / sqrt(2) ≈ 1e-300
One should probably have some really robust precision audit as part of the unit tests.
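For reference, a hedged sketch of an overflow/underflow-safe version (safe_hypot is a hypothetical name, not the package's stopgap): factor out the larger magnitude before squaring, so nothing is squared outside the representable range:

using MultiFloats
function safe_hypot(x::T, y::T) where {T<:MultiFloat}
    ax, ay = minmax(abs(x), abs(y))  # ax <= ay
    iszero(ay) && return ay          # hypot(0, 0) == 0
    r = ax / ay                      # r <= 1, so r*r cannot underflow to 0
    return ay * sqrt(one(T) + r * r)
end
safe_hypot(Float64x2(1e-300), Float64x2(1e-300))  # ≈ 1.414e-300, no underflow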
julia> significand(Float64x2(123.4))
ERROR: MethodError: no method matching significand(::MultiFloat{Float64, 2})
Closest candidates are:
significand(::T) where T<:Union{Float16, Float32, Float64} at math.jl:896
significand(::DoubleFloat{T}) where T<:Union{Float16, Float32, Float64} at ~/.julia/packages/DoubleFloats/h3HrU/src/math/prearith/prearith.jl:68
significand(::BigFloat) at mpfr.jl:873
...
julia> significand(123.4)
1.928125
Repeatedly adding zero is a pretty awful way to renormalize a MultiFloat.
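For comparison, a hedged sketch of one explicit renormalization pass (a "TwoSum" sweep, not MultiFloats' internal routine); badly overlapping limbs may need more than one pass:

# Error-free transformation: s + e == a + b exactly.
@inline function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    return s, (a - (s - v)) + (b - v)
end
function renormalize(l::NTuple{4,Float64})
    a, b, c, d = l
    c, d = two_sum(c, d)  # sweep down: push error terms toward the tail
    b, c = two_sum(b, c)
    a, b = two_sum(a, b)
    b, c = two_sum(b, c)  # second sweep tightens the remaining overlap
    c, d = two_sum(c, d)
    return (a, b, c, d)
end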
Hey,
In my application, I only require exponentials and logarithms on top of what is already implemented. How hard would it be to add those two functions? Do you know how it should be done? I would be happy to contribute.
Edit: Could a temporary fix be introduced by converting to BigFloat and back? I know this is highly sub-optimal, but it would allow code to at least run. Since exp/log are not the main part of my workflow, it might still provide a huge performance gain over BigFloat for me.
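A hedged version of that temporary fix: route exp and log through BigFloat at generous precision (64*(N+1) bits is a guess with headroom); correct but slow:

using MultiFloats
for f in (:exp, :log)
    @eval function Base.$f(x::MultiFloat{Float64,N}) where {N}
        setprecision(BigFloat, 64 * (N + 1)) do
            MultiFloat{Float64,N}($f(BigFloat(x)))
        end
    end
end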
JuliaMath/DoubleFloats.jl#136 gives DoubleFloats an exp (and exp2, exp10) implementation that is around 4x faster than what MultiFloats is using. If you wanted this version to work for all levels of precision, it would take some extra engineering, but IMO it would be worth it.
There seems to be something wrong with how Inf is treated when higher precision is used:
julia> Float64x1(-Inf) - 1
-Inf
julia> Float64x2(-Inf) - 1 # Same happens for Float64x3, etc
NaN
and
julia> Float64x1(Inf) + 1
Inf
julia> Float64x2(Inf) + 1 # Same happens for Float64x3, etc
NaN
The table at the bottom of this package's README has a warning sign in the DoubleFloats column for "avoids dynamic memory allocation". Why is this? I've never noticed such allocations.
using MultiFloats
using Printf
@printf("%f", Float64x2(3))
ERROR: StackOverflowError:
Stacktrace:
[1] fix_dec(::MultiFloat{Float64,2}, ::Int64, ::Array{UInt8,1}) at .\printf.jl:987 (repeats 80000 times)
Probably similar to the corresponding issue with DoubleFloats.