Comments (7)
The limitation is based on `Int32` being the type to represent the size of a collection (`String` in this case), so it can only have `Int32::MAX` elements.
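A minimal sketch of how a script could detect this case up front, assuming only that `File.size` reports the size as an `Int64`. The helper name `exceeds_int32_size?` is hypothetical and not part of the standard library.

```crystal
# Hypothetical helper (not a standard library method): reports whether the
# file at `path` is larger than an Int32-sized collection can represent.
def exceeds_int32_size?(path : String) : Bool
  # File.size returns an Int64, so the comparison itself cannot overflow.
  File.size(path) > Int32::MAX
end
```

A script might call this before `File.read` and fall back to another strategy when it returns `true`.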
Now I'm wondering if it really makes sense to actually have such a large `String` instance. It might be quite inconvenient because char indices are non-linear. So that specific detail might be up for debate.
But other collections, particularly `Slice`, should certainly be able to hold more than 2GB.
@koute Do you have a real use case where you need this or is it just a theoretical discussion?
As a workaround, you can open the file and read it in chunks, performing operations on each individual chunk of data.
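The chunked workaround could look roughly like the sketch below. The helper names `each_chunk` and `count_newlines` and the 64 KiB default chunk size are assumptions made for illustration; only `File.open`, `IO#read(Bytes)` and `Bytes` come from the standard library.

```crystal
# Hypothetical helper: yields successive chunks of the file at `path`.
# The yielded slice reuses one buffer, so copy it (`chunk.dup`) if it has to
# outlive the block.
def each_chunk(path : String, chunk_size : Int32 = 64 * 1024, & : Bytes ->) : Nil
  buffer = Bytes.new(chunk_size)
  File.open(path) do |file|
    # IO#read returns the number of bytes read, and 0 at end of file.
    while (count = file.read(buffer)) > 0
      yield buffer[0, count]
    end
  end
end

# Example operation on each chunk: count newline bytes. The running total is
# an Int64, so it is not bound by the Int32 collection size limit.
def count_newlines(path : String) : Int64
  total = 0_i64
  each_chunk(path) do |chunk|
    total += chunk.count(0x0a_u8)
  end
  total
end
```

Because no single buffer ever holds more than `chunk_size` bytes, this approach works for files larger than 2GB. The cost is that any operation spanning a chunk boundary (for example, matching a multi-byte pattern) has to carry state between chunks.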
Related: #8111 (comment), https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2
from crystal.
Also related is #4011. It was originally about getting no error when accessing an out-of-range array index (which is fixed). Now it's only about improving the error message to point out the reason for overflow. I suppose this would be an improvement for this use case as well.
> @koute Do you have a real use case where you need this or is it just a theoretical discussion?
Yes, this is a real use case.
I was looking to switch from Ruby to Crystal when writing my scripts to speed them up, and this was the very first issue I've encountered, which really surprised me (sure, I expected Crystal to not have automatic bigint promotion like Ruby has, but I didn't expect it to use 32-bit integers for something like this, which seems baffling to me considering how tiny 2GB is nowadays).
I use `File.read` in Ruby on big files all the time to process them, especially in my quick & dirty scripts. I know this can be worked around by e.g. reading the file in chunks, and if this was not a quick & dirty script I would certainly do that, but for quick one-off scripts I just want to minimize the friction while writing them.
Not sure about Ruby's situation, but contiguous allocations in that size range will perform poorly with the Boehm GC, especially on Windows (contrast with #14395), so I don't think the standard library is going to accommodate them in the near future.
Memory-mapped I/O is a notable exception where you could have huge contiguous memory ranges without any allocation. To create a read-only view on Windows:
lib LibC
  PAGE_READONLY = 0x02
  FILE_MAP_READ = 0x0004

  fun CreateFileMappingA(hFile : HANDLE, lpFileMappingAttributes : SECURITY_ATTRIBUTES*, flProtect : DWORD, dwMaximumSizeHigh : DWORD, dwMaximumSizeLow : DWORD, lpName : LPSTR) : HANDLE
  fun MapViewOfFile(hFileMappingObject : HANDLE, dwDesiredAccess : DWORD, dwFileOffsetHigh : DWORD, dwFileOffsetLow : DWORD, dwNumberOfBytesToMap : SizeT) : Void*
  fun UnmapViewOfFile(lpBaseAddress : Void*) : BOOL
end

File.open(...) do |file|
  handle = Crystal::System::FileDescriptor.windows_handle(file.fd)
  size = file.size
  mapping = LibC.CreateFileMappingA(handle, nil, LibC::PAGE_READONLY,
    LibC::DWORD.new!(size >> 32), LibC::DWORD.new!(size), nil)
  view = LibC.MapViewOfFile(mapping, LibC::FILE_MAP_READ, 0, 0, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.UnmapViewOfFile(view)
  LibC.CloseHandle(mapping)
end
or on Unix-like systems:
File.open(...) do |file|
  size = file.size
  view = LibC.mmap(nil, size, LibC::PROT_READ, LibC::MAP_PRIVATE, file.fd, 0).as(UInt8*)

  # this should be okay even if `size > Int32::MAX`
  # bytes = Bytes.new(view, size, read_only: true)
  # io = IO::Memory.new(bytes, writeable: false)

  LibC.munmap(view, size)
end
If `Slice` does support 64-bit sizes, then `bytes` could probably act as a drop-in replacement for `File.read` or, more precisely, `File.open(..., &.getb_to_end)`. `io` would also work as long as you don't need any `IO::FileDescriptor`-specific functionality.
@HertzDevil Nice! I've been wondering about mmap for this use case, and this looks exciting.
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/1
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/built-in-support-for-mmap/6772/2