juliastring / strs.jl
String support package for Julia
License: Other
As in the description. The reasons that it fails are different.
Currently `join` in Base uses `sprint`. We should have a special method for `UniStr` for it.
Similarly for `*`, which uses `string`.
As follows:
julia> using Strs
julia> using Compat
WARNING: Method definition in(Any) in module Strs overwritten in module Compat
WARNING: Method definition ==(Any) in module Strs overwritten in module Compat
WARNING: Method definition contains(AbstractString, Base.Regex) in module Strs overwritten in module Compat
WARNING: Method definition isequal(Any) in module Strs overwritten in module Compat
Do you have a clear idea how you want to implement caching of hashes?
I see two options (both have pros and cons, which we could discuss, but maybe there is a third way):
- `cached` and `hash`
- a mutable container (there are several options for this; probably the simplest is a `mutable struct`)
Currently `Str(substring)` falls back to a `convert` method, which fails.
Additionally, the currently recommended design in Julia (AFAIK) is to define constructors and make `convert` methods call them, not the other way around (but this is probably not so crucial).
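The constructor-first design mentioned above can be sketched as follows. This is only an illustration: `MyStr` is a hypothetical type that stores raw code units, not the actual `Str` type from Strs.jl.

```julia
# Hypothetical string-like type, used only to illustrate the pattern.
struct MyStr
    data::Vector{UInt8}
end

# The constructor does the real work and accepts any AbstractString,
# including SubString.
MyStr(s::AbstractString) = MyStr(Vector{UInt8}(codeunits(String(s))))

# convert simply forwards to the constructor, not the other way around.
Base.convert(::Type{MyStr}, s::AbstractString) = MyStr(s)

# push! on a Vector{MyStr} calls convert, which calls the constructor.
v = MyStr[]
push!(v, SubString("hello", 1, 3))
```

With this layout, adding a constructor for a new input type automatically makes implicit conversion (for example when storing into a typed array) work as well.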
Currently `hash` for custom strings falls back to `hash` of `String`, which is painfully slow.
This is crucial to fix if we want dictionaries or sets to work fast with the `Str` family.
Of course, we should ensure that we return the same hash values (which might be hard; I have not thought about it in detail).
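As a rough illustration of the mutable-container approach to hash caching, here is a minimal sketch. `HashCachedString` and its field names are hypothetical and not part of Strs.jl; the type wraps a plain `String` instead of subtyping `AbstractString`. Note that this sketch does not return the same value as `hash` of the corresponding `String`, so it does not address the compatibility concern above; it only shows the expensive hash being computed once.

```julia
# Hypothetical wrapper type; not the Strs.jl implementation.
mutable struct HashCachedString
    data::String
    cached::Bool
    hashval::UInt
    HashCachedString(s::AbstractString) = new(String(s), false, zero(UInt))
end

function Base.hash(s::HashCachedString, h::UInt)
    if !s.cached
        # Expensive step: hash the contents once with a fixed seed.
        s.hashval = hash(s.data, zero(UInt))
        s.cached = true
    end
    # Cheap step: mix the cached value with the caller's seed.
    return hash(s.hashval, h)
end

# Equality must agree with hash so Dict and Set behave correctly.
Base.:(==)(a::HashCachedString, b::HashCachedString) = a.data == b.data
```

The trade-off is that the struct has to be mutable (or hold a mutable field), which is one of the cons worth discussing.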
Current master fails to load on Julia 0.6.2 due to a problem in line 20 of src/compat.jl (missing `Base.SamplerType`).
Currently on 0.7 and 0.6.2 the following fail to print:
UniStr(string(Char(0xb5)))
and
UniStr(string(Char(0x00010000)))
(for different reasons)
The reason is that the `contains(::Str, ::Str)` method is not defined.
Here is an example of problematic behavior:
julia> x = Str("12")
"12"
julia> c
1
julia> hash(c)
ERROR: UndefVarError: hash_uint64 not defined
Stacktrace:
[1] hash(::Strs.ASCIIChr) at .\hashing.jl:5
as it calls `hash_uint64`, which is not imported or qualified with `Base.`
(and the method is called with two arguments which is invalid in ).
Whatever we do, I think that `hash` should return the same hash as for the corresponding `Char`.
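A minimal sketch of a character type whose hash matches that of the corresponding `Char` is shown below. `CodePointLike` is a hypothetical stand-in, not one of the Strs.jl character types, and it delegates to Base's `Char` hashing rather than calling `hash_uint64` directly.

```julia
# Hypothetical character type holding a Unicode code point.
struct CodePointLike
    v::UInt32
end

# Delegate to Char so both types hash identically for any seed.
Base.hash(c::CodePointLike, h::UInt) = hash(Char(c.v), h)

# Equality has to be consistent with hash for Dict and Set lookups.
Base.:(==)(a::CodePointLike, b::Char) = Char(a.v) == b
Base.:(==)(a::Char, b::CodePointLike) = b == a

d = Dict{Any,Int}('a' => 1)
found = d[CodePointLike(0x61)]   # looks up the entry stored under 'a'
```

Delegating costs one conversion per call; a specialised method could avoid that as long as it produces the same value.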
python f-string debug mode:
hello = 5
print(f"{hello=}")
hello=5
Currently, direct conversions between `Char` and the concrete `Strs.CodePoint` types seem not to be supported in either direction (you have to go through `UInt32` in the middle). Is this intentional?
I understand why `Str` needs internal fields like `cache` etc. (100% support), but I do not understand why their types have to be in the signature rather than fixed. What is the value of this flexibility?
In short, why is the signature `struct Str{T} <: AbstractString` not enough for this type?
I like the `UniStr` type. I have small performance issues with it.
First is benchmarking:
`Union{UniStr, Missing}` and `Union{UniStr, Nothing}`
The second is broadcasting and mapping. The compiler does not properly detect the required Union-type:
julia> UniStr.(["a","ą","โ"])
3-element Array{Strs.Str{T,Void,Void,Void} where T,1}:
"a"
"ą"
"โ"
julia> map(UniStr, ["a","ą","โ"])
3-element Array{Strs.Str{T,Void,Void,Void} where T,1}:
"a"
"ą"
"โ"
julia> UniStr[UniStr(s) for s in ["a","ą","โ"]]
3-element Array{UniStr,1}:
"a"
"ą"
"โ"
Any thoughts on fixing it? (Maybe some promotion rules should be added.) Additionally, such promotion rules should take `Missing` into account (but maybe this will be handled automatically on 0.7).
Interestingly, `Set` works correctly:
julia> Set{UniStr}(UniStr[UniStr(s) for s in ["a","ą","โ"]])
Set(Union{ASCIIStr, _LatinStr, _UCS2Str, _UTF32Str}["ą", "a", "โ"])
julia> Set{UniStr}(UniStr.(["a","ą","โ"]))
Set(Union{ASCIIStr, _LatinStr, _UCS2Str, _UTF32Str}["ą", "a", "โ"])
(although the type signature is lost)
Here is the problem:
julia> x = ["1", "โ"]
2-element Array{String,1}:
"1"
"โ"
julia> Str.(x)
ERROR: InexactError()
Stacktrace:
[1] & at .\promotion.jl:286 [inlined]
[2] _str_encode(::_UCS2Str, ::Int64, ::UInt64) at D:\DEV\Julia\Strs.jl\src\encode.jl:90
[3] convert(::Type{Strs.Str{T,Void,Void,Void} where T}, ::_UCS2Str) at D:\DEV\Julia\Strs.jl\src\encode.jl:103
[4] setindex!(::Array{Strs.Str{T,Void,Void,Void} where T,1}, ::_UCS2Str, ::Int64) at .\array.jl:583
[5] setindex! at .\multidimensional.jl:300 [inlined]
[6] macro expansion at .\broadcast.jl:243 [inlined]
[7] _broadcast!(::Type{Strs.Str}, ::Array{ASCIIStr,1}, ::Tuple{Tuple{Bool}}, ::Tuple{Tuple{Int64}}, ::Tuple{Array{String,1}}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}, ::CartesianIndex{1}, ::Int64) at .\broadcast.jl:219
[8] broadcast_t(::Type{T} where T, ::Type{Any}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::Array{String,1}) at .\broadcast.jl:265
[9] broadcast_c at .\broadcast.jl:321 [inlined]
[10] broadcast(::Type{T} where T, ::Array{String,1}) at .\broadcast.jl:455
The issue is that we are mixing `ASCIIStr` (which is inferred from the first element of the array by `broadcast`) with the following `_UCS2Str`.
Example:
julia> vcat(Str("1"), Str("ą"))
ERROR: TypeError: setindex!: in typeassert, expected String, got Strs.Str{Strs.CSE{CharSet{UniPlus},Encoding{UTF8}()},Void,Void,Void}
Stacktrace:
[1] setindex!(::Array{String,1}, ::_UCS2Str, ::UnitRange{Int64}) at .\array.jl:591
[2] _cat(::Array{String,1}, ::Tuple{Int64}, ::Tuple{Bool}, ::ASCIIStr, ::Vararg{Any,N} where N) at .\abstractarray.jl:1225
[3] cat_t(::Type{T} where T, ::Type{T} where T, ::ASCIIStr, ::Vararg{Any,N} where N) at .\abstractarray.jl:1208
[4] vcat(::ASCIIStr, ::_UCS2Str) at .\abstractarray.jl:1260
The reason is a missing promotion rule:
julia> promote_rule(typeof(Str("1")), typeof(Str("ą")))
String
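To show what such a promotion rule could look like, here is a minimal sketch with two hypothetical types, `NarrowStr` and `WideStr`. They are plain structs rather than `AbstractString` subtypes, and they are not the Strs.jl types; the point is only how `promote_rule` and `convert` cooperate when differently encoded values are collected into one array.

```julia
# Hypothetical stand-ins for a narrow and a wide string encoding.
struct NarrowStr
    data::Vector{UInt8}
end

struct WideStr
    data::Vector{UInt16}
end

# Widening conversion: every 8-bit code unit fits into 16 bits.
Base.convert(::Type{WideStr}, s::NarrowStr) = WideStr(Vector{UInt16}(s.data))

# Mixing the two types promotes to the wider one.
Base.promote_rule(::Type{NarrowStr}, ::Type{WideStr}) = WideStr

# The array literal promotes its elements to a common type.
v = [NarrowStr([0x31]), WideStr([0x0105])]
```

Here `v` ends up as a `Vector{WideStr}`, with the first element converted on insertion instead of raising a type error.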
Hello!
This seems like a great collection of utilities to fill the gap in the standard library, but I can't figure out how you intended for it to be used? Is `Strs.jl` the top level package to add, or is `StrAPI.jl` the one to use? Is there any module level documentation? Poking around on juliahub I wasn't seeing anything in the Strs or StrAPI packages. Are there any examples anywhere?
Also, the website link just redirects to the Github org page: http://juliastring.org/.
Lastly, I see this roadmap from 2018: #97, with no updates. Is this set of packages still under active development or in maintenance only mode? (In other words, do you want big PRs that change APIs, or should these be forked?)
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your `TagBot.yml` to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment `TagBot fix` on this issue.
I'll open a PR within a few hours, please be patient!
Currently test/bench.jl relies on a concrete user machine configuration, e.g.:
const userdir = "/Users/scott/"
It would be good to make it runnable without modifying sources in the future.
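One possible way to remove the hard-coded path is to read the directory from the environment and fall back to the current user's home directory. This is a sketch only: the environment variable name `STRS_BENCH_DIR` and the `samples` subdirectory are hypothetical, not something test/bench.jl defines.

```julia
# Hypothetical environment variable; falls back to the user's home directory.
const userdir = get(ENV, "STRS_BENCH_DIR", homedir())

# joinpath avoids relying on a trailing slash in userdir.
# The "samples" subdirectory is a placeholder name.
const samplesdir = joinpath(userdir, "samples")
```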
Where is the documentation? Why not put a link to it in the README?
A small issue when mixing `UniStr` and `String`:
julia> y = [Strs.UniStr("ąęł")]
1-element Array{_UCS2Str,1}:
"ąęł"
julia> x = ["ael"]
1-element Array{String,1}:
"ael"
julia> [x y]
ERROR: TypeError: in copyto!, in typeassert, expected String, got UTF8Str
Stacktrace:
[1] setindex! at .\array.jl:688 [inlined]
[2] copyto!(::Array{String,2}, ::Int64, ::Array{_UCS2Str,1}, ::Int64, ::Int64) at .\abstractarray.jl:729
[3] typed_hcat(::Type{String}, ::Array{String,1}, ::Array{_UCS2Str,1}) at .\abstractarray.jl:1162
[4] hcat(::Array{String,1}, ::Array{_UCS2Str,1}) at \sparsevector.jl:1046
[5] top-level scope
`ASCIIStr` and `_UCS2Str` - this is in general possible to get it) they compare using `===` as `false`. This is probably what we want (though normally identical strings compare as `true` under Julia 0.7 even if they have a different memory location), but I just want to make sure that this is intended.
If you run the benchmark at https://github.com/bkamins/JuliaStrBenchmark you will see that `join` fails to run. The reason is that lines 411 and 413 in io.jl have a typo (`d` instead of `delim`).
Additionally, if you fix it, at least on my machine they still fail because the `wmemcpy` function is not found (but maybe I do not have a proper version of Julia, as it is a fast-moving target).
PS. @ScottPJones Apart from this, again on my machine, Strs.jl is sometimes slower than strings from Base (I have noted in the test file where I find which case). This is an initial implementation of the benchmark; I have stopped here because of the `join` bug.
Working with regexes on `UniStr` will be slow, as currently everything has to be converted to `String` to work (the regex as well as the string in which we look for it).
This will probably be a problem when dynamically constructing `UniStr`:
julia> @code_warntype Strs.UniStr("1")
Variables:
str::String
Body:
begin
# meta: location D:\DEV\Julia\Strs.jl\src\encode.jl convert 82
Core.SSAValue(1) = $(Expr(:invoke, MethodInstance for _str(::String), :(Main.Strs._str), :(str)))::Any
# meta: pop location
return Core.SSAValue(1)
end::Any