Comments (8)
No, it just means that there is much more work involved to make this work. SzArEx_Extract and probably other parts of 7zDec.c need to be rewritten to support decompressing directly to a buffer and keeping the decompression state and dictionary instead of decompressing everything into a private buffer and allowing access to that.
The question is: is it worth it? It might actually be a better idea to rebuild the parsing code from scratch in C99 and only use the low-level parts of the 7z SDK. That way we have better control over what is happening, and it is easier to implement full support for 7z archives and not only the limited subset the SDK provides.
from unarr.
Oh shoot, after looking into it a bit more, it seems that not even the LZMA SDK supports archive entries whose size exceeds 4294967295 bytes when compiled for 32-bit targets (at least not the C version of the 7z SDK).
from unarr.
The 7z SDK decompresses all data into memory before returning it. This will of course fail on systems that can't address that much memory.
from unarr.
The problem with size_t and huge file entries is that size_t can only describe as much memory as your system can address. This means that even if you changed the uncompress function to always accept a 64-bit size, this would fail on a 32-bit system for the simple reason that a buffer of that size is too big to address. There is a simple way around this, though: use a smaller buffer and call uncompress repeatedly to decompress the data in chunks that you write to disk.
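The chunked approach can be sketched roughly like this. It is only an illustration, not unarr code: `read_fn` is a hypothetical callback standing in for the archive's uncompress call, and `extract_in_chunks`, `pattern_reader` and `CHUNK_SIZE` are names made up for this example. The total size is carried as `uint64_t`, so only the chunk buffer has to fit into `size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback: fills `buffer` with the next `count` bytes of the
   entry and returns false on error. Stands in for the real uncompress call. */
typedef bool (*read_fn)(void *ctx, void *buffer, size_t count);

enum { CHUNK_SIZE = 64 * 1024 };

/* Writes `total` bytes to `out` while never holding more than CHUNK_SIZE
   bytes of decompressed data in memory. */
static bool extract_in_chunks(read_fn read, void *ctx, uint64_t total, FILE *out)
{
    static unsigned char buffer[CHUNK_SIZE];
    uint64_t remaining = total;

    assert(read != NULL && out != NULL);

    while (remaining > 0) {
        /* The chunk size always fits into size_t, even if `total` does not. */
        size_t count = remaining < CHUNK_SIZE ? (size_t)remaining : (size_t)CHUNK_SIZE;
        if (!read(ctx, buffer, count))
            return false;
        if (fwrite(buffer, 1, count, out) != count)
            return false;
        remaining -= count;
    }
    return true;
}

/* Demo source for the sketch: produces a repeating byte pattern and counts
   how many bytes have been requested so far (ctx points to a uint64_t). */
static bool pattern_reader(void *ctx, void *buffer, size_t count)
{
    uint64_t *produced = ctx;
    unsigned char *dst = buffer;
    for (size_t i = 0; i < count; i++)
        dst[i] = (unsigned char)((*produced + i) & 0xFF);
    *produced += count;
    return true;
}
```

A caller would pass the entry size as a 64-bit value together with an open output file, for example `extract_in_chunks(pattern_reader, &produced, total, out)` with `produced` initialised to 0.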
This still leaves the problem with the entry size. size_t in this context is mainly used because it indicates a size. If the data type prevents us from getting the true size, this is a bug and it should be fixed. The issue I see is that changing the return type will require digging deeper into the respective archive implementations, and we need to take care to change size_t to a 64-bit type only in the correct contexts.
I also need to consider how to handle the (minor) API breakage this might incur for 32 bit users.
from unarr.
The 7z SDK has this piece of code in the SzArEx_Extract function:

    if (*tempBuf == NULL || *blockIndex != folderIndex)
    {
      UInt64 unpackSizeSpec = SzAr_GetFolderUnpackSize(&p->db, folderIndex);
      /*
      UInt64 unpackSizeSpec =
          p->UnpackPositions[p->FolderToFile[(size_t)folderIndex + 1]] -
          p->UnpackPositions[p->FolderToFile[folderIndex]];
      */
      size_t unpackSize = (size_t)unpackSizeSpec;
      if (unpackSize != unpackSizeSpec)
        return SZ_ERROR_MEM;

This is clearly designed to fail on 32-bit systems. I'm guessing they wanted to fail early here because the implementation also has problems deeper down...
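To see what that cast-and-compare check does, here is a small standalone sketch. It is not SDK code: `fits_in_32bit_size` is a hypothetical helper, and `uint32_t` stands in for a 32-bit `size_t` so that the effect is visible on a 64-bit machine as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the SDK's check.
   uint32_t plays the role of size_t on a 32-bit build. */
static bool fits_in_32bit_size(uint64_t unpackSizeSpec)
{
    uint32_t unpackSize = (uint32_t)unpackSizeSpec; /* high 32 bits are dropped */
    return unpackSize == unpackSizeSpec;            /* false if anything was lost */
}

/* A few checks documenting the behaviour around the 4 GiB boundary. */
static void demo_checks(void)
{
    assert(fits_in_32bit_size(0));
    assert(fits_in_32bit_size(4294967295ULL));  /* largest value that survives */
    assert(!fits_in_32bit_size(4294967296ULL)); /* truncates to 0, so rejected */
}
```

In the SDK snippet the rejected case is where `SZ_ERROR_MEM` is returned, before any allocation is attempted.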
You are right that addressing such a large piece of memory is impossible. The way I work around that is by using memory mapping: I map the largest possible free chunk of memory, unpack that much data into the chunk, unmap it, map the next chunk, unpack, and so on.
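A rough sketch of that map, unpack, unmap loop using POSIX `mmap` is shown below. It makes several assumptions that are not from this thread: `unpack_fn` is a hypothetical callback standing in for the decompressor, the window size is fixed and a multiple of the page size (rather than "the largest possible free chunk"), and `fd` refers to an output file opened for reading and writing.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64 /* ask for a 64-bit off_t on 32-bit builds */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: writes the next `count` unpacked bytes to `dst`. */
typedef bool (*unpack_fn)(void *ctx, void *dst, size_t count);

/* Unpacks `total` bytes into the file behind `fd`, mapping at most `window`
   bytes of it into the address space at any time. */
static bool unpack_via_windows(int fd, uint64_t total, size_t window,
                               unpack_fn unpack, void *ctx)
{
    long page = sysconf(_SC_PAGESIZE);
    /* mmap offsets must be page-aligned, so the window must be as well. */
    assert(page > 0 && window > 0 && window % (size_t)page == 0);

    /* Grow the file first so every mapped window is backed by file data. */
    if (ftruncate(fd, (off_t)total) != 0)
        return false;

    for (uint64_t offset = 0; offset < total; offset += window) {
        uint64_t left = total - offset;
        size_t count = left < window ? (size_t)left : window;

        void *view = mmap(NULL, count, PROT_READ | PROT_WRITE, MAP_SHARED,
                          fd, (off_t)offset);
        if (view == MAP_FAILED)
            return false;

        bool ok = unpack(ctx, view, count);
        if (munmap(view, count) != 0 || !ok)
            return false;
    }
    return true;
}

/* Demo callback for the sketch: fills the window with a byte pattern and
   counts the bytes produced so far (ctx points to a uint64_t). */
static bool fill_pattern(void *ctx, void *dst, size_t count)
{
    uint64_t *produced = ctx;
    unsigned char *out = dst;
    for (size_t i = 0; i < count; i++)
        out[i] = (unsigned char)((*produced + i) & 0xFF);
    *produced += count;
    return true;
}
```

This only helps if the decompressor can produce its output piecewise; a decoder that insists on a single contiguous output buffer would still need changes.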
from unarr.
Don't bother too much with the 7z SDK. This memory limitation is the main reason I have marked 7z support as experimental. The underlying decompression code should be able to handle large files just fine, but the C code for archive handling insists on decompressing huge blocks into a memory cache instead of returning the files as they are decompressed. Fixing this would require a partial rewrite of the SDK.
from unarr.
I see. Does this mean that extraction of large files (>4GB) from 7z archives will never be supported on 32-bit systems?
from unarr.
Hey, just a quick heads up. I recently rechecked this issue to see if I could include a fix in the upcoming unarr release, but the problem goes deeper than just the 7z SDK and the API signature. The bad pattern of using size_t for file sizes is present in a lot of the internal code and structures. Working on this without proper unit tests to catch regressions is asking for trouble.
I will check if I can improve the situation in the next development cycle. By then I should have a proper test system set up.
from unarr.