Git Product home page Git Product logo

Comments (28)

Tortar avatar Tortar commented on May 18, 2024 2

Indeed, it would be really okay if @default_eltype was just public API so that I could just use it as a fallback with confidence when eltype returns Any which can be also mentioned in the docstring:

infer_eltype(itr) = (T = eltype(itr)) === Any ? @default_eltype(itr) : T

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024 2

We might want to special case Union{} because sometimes type inference can infer that the itr is empty, in which case it tells us Union{} which is true and precise and not usually what folks are looking for

julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
           sum([1]) == 0
       end
Int64

julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
           sum(1) == 0
       end
Union{}

from julia.

nsajko avatar nsajko commented on May 18, 2024 1

xref #54207

from julia.

Tortar avatar Tortar commented on May 18, 2024 1

Another possibility is to create a new function as above and export it or even something maybe a bit stronger as

infer_eltype(itr) = ifelse((T2 = Base.@default_eltype(itr)) <: (T1 = eltype(itr)), T2, T1)

with all the needed disclaimers in the docstring

from julia.

Tortar avatar Tortar commented on May 18, 2024 1

If we are going the route where we export a new function, which to me seems preferable, I (maybe mis)understood what @LilithHafner has suggested to be implemented as for example

function infer_eltype(itr)
    T1, T2 = eltype(itr), Base.@default_eltype(itr)
    ifelse(T2 !== Union{} && T2 <: T1, T2, T1)
end

from julia.

aplavin avatar aplavin commented on May 18, 2024 1

when it is possible to prove that an iterator is empty, a really good type inference algorithm will produce Union{}, which is the narrowest type.<...> We happen to be running into this problem now as type inference is good enough to make these emptiness proofs.

This is great – inference is improving!

In the past some folks (including Base) have at times relied on compiler limitations by expecting type inference not to return Union{} here.

And this seems to be the actual issue.
It's just really weird to make the inferred type artificially wider, as if the inference was less precise.

from julia.

Tortar avatar Tortar commented on May 18, 2024 1

Which raises the question: when would someone use this infer_eltype function?

I would say at least whoever is using Base.@default_eltype currently, actually probably more because it is not even exported now

from julia.

Tortar avatar Tortar commented on May 18, 2024 1

Seems like around 20 packages some of which very popular are using it: https://juliahub.com/ui/Search?q=@default_eltype&type=code

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024 1

IMO a pull request adding infer_eltype (with appropriate docs) should be accepted.

from julia.

vtjnash avatar vtjnash commented on May 18, 2024 1

It was a recent performance improvement for collect, which is designed not to return random Union types normally (only having trouble with the empty case). The exact PR for it should be findable by git blame if interested in the full picture

from julia.

nsajko avatar nsajko commented on May 18, 2024

As far as I remember, Base.@default_eltype relies on type inference. Type inference works on a best-effort basis, so it's good for optimizations in some cases, but mustn't be relied on for correctness. So the change you're proposing wouldn't be correct.

Apart from that, the run time cost of type inference wouldn't be acceptable for eltype.

from julia.

Tortar avatar Tortar commented on May 18, 2024

Consider that though collect(::Base.Generator) relies on Base.@default_eltype to infer the type for the array. So it would be logical in my opinion to use it for eltype.

Also I see very good performance using it on some micro-benchmarks:

julia> using BenchmarkTools

julia> iter = (x for x in 1:10)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:10)

julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.061 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.197 ns ± 0.056 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂        █▆       ▄▃                 ▁         ▃▂       ▁ ▁
  █▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█ █
  1.18 ns     Histogram: log(frequency) by time     1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.843 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.202 ns ± 0.085 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                 
  ▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
  1.18 ns        Histogram: frequency by time       1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> iter = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)

julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  22.753 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.192 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.203 ns ±  0.218 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                  
  ▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
  1.18 ns        Histogram: frequency by time        1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.182 ns  6.622 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.193 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.209 ns ± 0.073 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           █                                                 
  ▂▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▇▄▁▁▁▁▁▁▁▁▅ ▂
  1.18 ns        Histogram: frequency by time       1.24 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

from julia.

Tortar avatar Tortar commented on May 18, 2024

A thing I don't understand is why in Base.@default_eltype (https://github.com/JuliaLang/julia/blob/master/base/array.jl#L744) it is used promote_typejoin_union(T), I would prefer to infer a Union of concrete types than an abstract type e.g.

julia> s = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#5#6"}(var"#5#6"(), 1:10)

julia> T = Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(s)})
Union{Float64, Int64}

julia> Base.promote_typejoin_union(T)
Real

Why can't we use Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(s)}) at least when the Union is not too big? Notice that this affects also collect:

julia> collect(s)
10-element Vector{Real}:
...

from julia.

vtjnash avatar vtjnash commented on May 18, 2024

collect does not return a Union, as it gets the types from the real values and not from inference

from julia.

Tortar avatar Tortar commented on May 18, 2024

right, now I see, it widens the type. Nonetheless I guess with an infer_eltype function which returns also Unions one can probably take advantage of this with collect(infer_eltype(itr), itr)

from julia.

Tortar avatar Tortar commented on May 18, 2024

@vtjnash this is a bit derailing the conversation, but I'd like to ask you: is it a bad idea to return a Union (of not too many types)? e.g. I was able to return a Vector{Union{Int, Float64}} in the above case by changing https://github.com/JuliaLang/julia/blob/master/base/array.jl#L822 in new = similar(dest, Union{T, typeof(el)}).

from julia.

vtjnash avatar vtjnash commented on May 18, 2024

Yes. That will never be the type of the object when it actually contains data, so it slows down the case of when there is data to benefit the case where there is not data

from julia.

vtjnash avatar vtjnash commented on May 18, 2024

You typically want the static and dynamic types to nearly coincide, for optimal results

from julia.

aplavin avatar aplavin commented on May 18, 2024

We might want to special case Union{} because sometimes type inference can infer that the itr is empty, in which case it tells us Union{} which is true and precise and not usually what folks are looking for

It would be really unfortunate, especially if this avoidance of Union{} is motivated solely by current compiler limitations (and not language design concerns).

Do you suggest turning Union{} into Any, or something else? There's just no other type that would retain the nice invariant that the (inferred) eltype of concatenation of A + B is a supertype of A and B eltypes.

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024

It would be really unfortunate, especially if this avoidance of Union{} is motivated solely by current compiler limitations (and not language design concerns).

Inference is tasked with finding the narrowest type it can come up with that is a supertype of all types of instances that may actually occur. Coming up with a supertype of all types that could actually occur is a correctness thing: it has to get that right every time or we have compiler bugs. Picking a narrow type is on a best-effort basis. Returns(Any) is a perfectly valid type inference algorithm for Julia, just not a particularly performant one. Consequently a, when it is possible to prove that an iterator is empty, a really good type inference algorithm will produce Union{}, which is the narrowest type. Union{} is not a supertype of any concrete type, but nevertheless it is (vacuously) a supertype of all types of instances that may actually occur in the (empty) iterator. This is all at the language level. We happen to be running into this problem now as type inference is good enough to make these emptiness proofs. In the past some folks (including Base) have at times relied on compiler limitations by expecting type inference not to return Union{} here.

Do you suggest turning Union{} into Any, or something else? There's just no other type that would retain the nice invariant that the (inferred) eltype of concatenation of A + B is a supertype of A and B eltypes.

We cannot guarantee that in any case so long as we rely on inference because inference may decide to provide an arbitrarily wide eltype for A and/or B. Even turning Union{} into Any would not help. For example if A infers to Union{} and B infers to Int, it is quite plausible for their concatenation to infer to Int, which is not a supertype of Any.

from julia.

Tortar avatar Tortar commented on May 18, 2024

if we want to return the narrowest type possible would then be fine to use (a macro version of) Core.Compiler.return_type instead of Base.@default_eltype(itr) so that to infer Union types in some cases instead of abstract ones?

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024

And this seems to be the actual issue.
It's just really weird to make the inferred type artificially wider, as if the inference was less precise.

Which raises the question: when would someone use this infer_eltype function?

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024

Yeah, for example

function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = Base.@default_eltype(xs)
    (isconcretetype(T) || T === Union{}) || return Base.grow_to!(DisjointSet{T}(), xs)
    return DisjointSet{T}(xs)
end

to

function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = infer_eltype(xs)
    isconcretetype(T) || return Base.grow_to!(DisjointSet{T}(), xs
    return DisjointSet{T}(xs)
end

has one semantic change: when inference can infer that the iterator is empty you get at DisjointSet{eltype(iter)}() instead of a DisjointSet{Union{}}().
(source)

from julia.

Tortar avatar Tortar commented on May 18, 2024

I could try to make a PR but I'm still unsure about a thing, consider:

julia> struct A end

julia> struct B end

julia> itr = (x < 5 ? A() : B() for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)

julia> Base.@default_eltype(itr)
Any

julia> Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(itr)})
Union{A, B}

I would really prefer the second one, so I think a possible infer_eltype function should not use Base.@default_eltype but a narrower version of it (without promote_typejoin_union at the end), is that okay?

from julia.

Tortar avatar Tortar commented on May 18, 2024

ok, maybe we could actually have two versions, with a tight/narrow = false argument so that the user can choose

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024

Sure, IIUC that promote_typejoin_union dates back to when the compiler was bad at handling Unions performantly, so it would be reasonable to remove. No need for an option, the user can always call promote_typejoin_union on the result. It's also possible to continue this discussion once you open a PR (most PRs have a bit of discussion and revision before merging).

from julia.

LilithHafner avatar LilithHafner commented on May 18, 2024

The PR that changed @default_eltype was #42046.

collect has been returning abstract types instead of unions since at least 1.0.

from julia.

vtjnash avatar vtjnash commented on May 18, 2024

Yes, looks like the main explanation of that PR is in #30485 though

from julia.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.