I am using libzip5 through PHP's zip PECL extension, to create a zip file collecting h

Allow adding files being appended to by another process.,about nih-at/libzip

Comments (21)

dillof commented on June 12, 2024 1

We will fix the bug so zip_source_file doesn't return more data than it reports as its size (i.e. the file size at the time the source was created). Also, we'll add another special value to the length parameter (-2, but we'll add a symbolic name for better readability) that instructs zip_source_file to ignore the file length (i.e. not report it) and read until EOF is reached.

I really don't appreciate your tone of voice and your implications that our point of view is invalid and violating the spirit of Unix that is older than all of us. Like most thing in engineering, this is a question of priorities and tradeoffs, and we obviously judge the factors and their implications on libzip as a whole differently in this case.

In your specific case, it is clear what you want to happen, and I understand that. However, libzip is used in a lot of different situations with a lot of different data sources, and any assumption about what inconsistencies mean will be wrong for some of them. That is why we treat any inconsistency as a potential for data loss. zip_close doesn't know where the data is coming from, and while additional data might be okay when reading from a file, it might indicate data corruption e.g. when read from the network or a compressed stream. Adding a special case to the already very complicated logic of zip_close has its own cost and probably breaks the abstraction sources provide.

Your arguments are quite absolute and confrontational. They don't seem to leave room for other situations or priorities. That makes your bug report much harder to deal with than it has to be. Quite frankly, if it weren't for improving libzip, I would have left it unanswered. libzip is my hobby, and I don't need this kind of stress.

from libzip.

remicollet commented on June 12, 2024 1

If you were using libzip directly (without the intervening PHP layer), you could open the file yourself and pass the FILE pointer to zip_source_fliep(). If libzip opened all files when you added them, you could easily run out of file descriptors on system with a low limit (libzip is also used in embedded systems).

FYI: New zip extension will have a flag (OPEN_FILE_NOW) to use zip_source_filep instead of zip_source_file

from libzip.

dillof commented on June 12, 2024

Where do you get the file data from?

The error means that the source added to the zip archive reports a file size that does not correspond to the amount of data it provides.

from libzip.

shimile commented on June 12, 2024

Where do you get the file data from?

The error means that the source added to the zip archive reports a file size that does not correspond to the amount of data it provides.

So, in other words, libzip is now unable to compress a log file that is being appended to while the zip is being created?

If so, surely there is a flag to revert to the behavior of (probably?) all other archivers in the universe?

from libzip.

dillof commented on June 12, 2024

I could really do without the attitude. libzip is a hobby project we put a lot of time and effort into, both developing and supporting its users.

Ensuring data integrity is a high priority for us. We had another project where files were read incompletely due to a software bug, and this now catches that and makes sure files are not silently truncated.

Your use case was caught as a side effect of it. I'm sure we'll find a solution.

from libzip.

shimile commented on June 12, 2024

Sorry about that.

I do appreciate data integrity, but failing to produce the zip at all is also kind of a data integrity problem. It's actually not my first similar issue with libzip: I also had a hard time trying to add directories that had files removed from them inflight (due to log rotation that by murphy law always happens at the time I do logs collection into ZIP). Apparently 'add file' doesn't really fopen() the file (?), just later on 'close', if the file is removed after add and before close - the zip is also gone with a cryptic 'file not found' (which?). Since I found no solution to this, I had to orchestrate a somewhat crazy solution - each file I want to add to the ZIP, I first create a hardlink to it in an ancestor directory so it can NEVER be removed, and then after zip close I remove all the hardlinks, with the hope I don't have system crash in the middle that leaves orphaned links to old log files.

Really, failure to opening a file should produce a warning and maybe even return an error, but why kill the archive? Likewise for size change. That's what tar does, it outputs something like "file changed as we read it", something like that.

And there should be a flag for people who want to be strict and fail on every issue.

from libzip.

dillof commented on June 12, 2024

The archive isn't killed, it just isn't closed. You can make modifications and try again.

You're right that we should provide more detailed error information so you can recover. In your case, remove the file from the archive and close it again.

If you were using libzip directly (without the intervening PHP layer), you could open the file yourself and pass the FILE pointer to zip_source_fliep(). If libzip opened all files when you added them, you could easily run out of file descriptors on system with a low limit (libzip is also used in embedded systems).

It is hard to find a solution that works well in all cases, and operating on changing files is definitely an edge case, so the defaults may not be best for you. If we find solutions that are generally useful and fit into the design of libzip, we're more than happy to accommodate you.

from libzip.

shimile commented on June 12, 2024

I am sure it is used in embedded systems too, maybe even for the same purpose I am using it (among other things) - to compress log files :)

This can also be a flag that you can enable, I don't think that having a flag that makes the lib easy to use in more use cases (or I even dare say: "out of the box") should conflict with the design of any software. Look at the user comments in https://www.php.net/manual/en/ziparchive.close.php - surely I am not the first to try to do such things and fail...

So, to sum it up, do I understand correctly that at least for the time being, this new behavior is not going to be reverted? That would mean I now have to fork libzip, remove the new code and then start to manually track upstream for changes...

from libzip.

0-wiz-0 commented on June 12, 2024

You don't have to fork libzip.
If you have a source that provides a size but can't be trusted, layer another source on it that passes through all commands and just acts on ZIP_SOURCE_STAT to invalidate the size member.

from libzip.

shimile commented on June 12, 2024

You don't have to fork libzip. If you have a source that provides a size but can't be trusted, layer another source on it that passes through all commands and just acts on ZIP_SOURCE_STAT to invalidate the size member.

I don't think I can do that, as I do not interface with libzip directly. So, that would require me to change and recompile the wrapper; and if I'm already changing and recompiling something, it is a cleaner solution to simply remove the newly introduced bug (from my POV), than re-factoring another code to avoid hitting the new bug...

By the way, I wouldn't call this "provides a size but can't be trusted". I trust the size reported very much - I did not see an evidence that the filesystem is broken. It's just that the new so-called feature assumes that a size of a file cannot change once you opened it. It may be true in Windows(TM), which I think blocks other readers to a file while it is open for a write by another process, but I think[*] it has never been true on *nix, since its' inception, ~50 years ago. As such, code that runs on *nix shouldn't assume that (or at least, should provide an option to not assume that), even if in some other project there was some other bug.

[*] it is older than me, so I don't know for sure, but I don't see why it shouldn't, it's the Windows behavior that doesn't make sense in that regard

from libzip.

0-wiz-0 commented on June 12, 2024

Perhaps the easiest change is to pass a length to zip_source_file()(not -1) when you call it - the length of the file at the time you add it. That will probably fix the problem you're seeing.

from libzip.

shimile commented on June 12, 2024

I actually tried that, as it was suggested by an author in the php wrapper (pierrejoye/php_zip#34 (comment)), however, that did not seem to have any effect. Maybe I missed something, maybe the changing size is not even the issue! who knows... without the error message saying which file caused the failure, I can't even try and isolate it. All I know, is that when I downgrade back, everything starts to work 100%.

Possibly, the code just checks file size before and after the file is opened, and fails if it changed. The amount of bytes asked to be read wouldn't effect the result, if that is the case. Mind you, even if the whole size was automagically calculated by libzip - it should have worked - I am pretty sure my files cannot actually truncate while in motion. So even specifying the real file size wouldn't ever cause a case where you try to read a byte beyond the filesize max value.

from libzip.

dillof commented on June 12, 2024

I looked at the code, and zip_source_file should ignore data that is added to the file after the source has been created. Obviously, it has a bug, I'll look into it.

zip_source_file() is not equivalent to fopen(). Rather, it creates a source that represents the file as it exists at that time.

This has nothing to do with Unix vs. Windows (we're also Unix developers), but with data integrity. I know you don't seem to care if data is lost, but we do. The reason we added this change was that we actually lost data while adding it to a zip archive, it was not just some hypothetical scenario. And not losing data is one of the main design goals of libzip, so this is non-negotiable.

from libzip.

shimile commented on June 12, 2024

I do care for data integrity just like you. I just don't understand how adding more data (or even reading until the filesize at the time of open! so no data is added), means that data is lost. If you would check if the file was truncated, i.e. became smaller than it was comparing to the time it was added, then I would agree more with you (but would still permit this with a flag for whomever may be interested). But we're NOT talking about truncating now.

from libzip.

shimile commented on June 12, 2024

Your arguments are quite absolute and confrontational. They don't seem to leave room for other situations or priorities. That makes your bug report much harder to deal with than it has to be. Quite frankly, if it weren't for improving libzip, I would have left it unanswered. libzip is my hobby, and I don't need this kind of stress.

I think that by mentioning the word 'flag' about 5 times throughout this bug report, implies that I do see other situations, and think about others too; I'm all for choice! That is the point I have been trying to make all along. The change as it has been done so far, leaves no choice.

I understand and appreciate that libzip is your hobby - and I know how it is, I have my own side-projects as well! However it is still open source (thanks for that!) and free to use even commercially... that means that people rely on your code - of course you don't owe us nothing - but if you're like me, and I think you are, I'm sure you'ld want your code to be used by as many people as possible.

Again I'll repeat: I DO NOT think your point of view is invalid! I see your use case too! My only claim is that your use case is definitely not the only use case, which is why I gave decades-old archivers (such as tar[*]) as example.

[*] Case in point: When tar encounters a file that changed 'while it is being read', it emits: 'error: file changed as we read it' - and sets a non-zero exit code, to indicate there was an issue in backup. However, some people may be OK with that (i.e. if they back up logs) - and that's why tar has a flag: --ignore-failed-read, to make this a non-error. And what I was asking in my very first comment, was something similar... because zipping live logs is, I think, also a quite common use-case. True, tar-gzing or tar-xzing them is probably MORE common, but some people have to generate archives better-suited for Windows people, and libzip, being quite powerful, will sometimes be used for that, too.

from libzip.

dillof commented on June 12, 2024

Again I'll repeat: I DO NOT think your point of view is invalid! I see your use case too! My only claim is that your use case is definitely not the only use case, which is why I gave decades-old archivers (such as tar[*]) as example.

Well, you certainly seemed to imply it (emphasis added):

If so, surely there is a flag to revert to the behavior of (probably?) all other archivers in the universe?

the new so-called feature

makes the lib easy to use in more use cases (or I even dare say: "out of the box") should conflict with the design of any software

You're clearly angry about this change (which I get) and are letting this come through quite clearly, which I really don't appreciate. While I don't agree with all of them, I have no problem with the factual points you raise. My problem is with how your words come across: aggressive and hostile.

I'm not implying that's how you meant them, but it is how they make me feel when reading them. If you did not mean them that way, please take a moment to step back, read them again, and see how they might seem that way, so you might avoid this issue in the future.

from libzip.

shimile commented on June 12, 2024

Sorry that this how you received it, it wasn't my intention.

Finally, may I suggest that this new behavior would be documented in the NEWS file (which I checked before opening the bug). It would have saved me debugging time, and may save debugging time for others in the future.

from libzip.

dillof commented on June 12, 2024

Finally, may I suggest that this new behavior would be documented in the NEWS file (which I checked before opening the bug). It would have saved me debugging time, and may save debugging time for others in the future.

Good point. We didn't foresee this impact of the change, otherwise we would have mentioned it. We'll add an entry in the next release, together with the new special length value.

from libzip.

dillof commented on June 12, 2024

I've now implemented the special value ZIP_LENGTH_UNCHECKED (-2) for length. If it is passed to zip_source_file(), the file is treated as if its size is unknown. Please let me know if this solves your issue.

However, if the file grows between creating the source and reading the file, the additional data should be ignored and no error should be raised. I don't quite understand why you ran into this issue, unless a file was truncated.

from libzip.

remicollet commented on June 12, 2024

However, if the file grows between creating the source and reading the file, the additional data should be ignored and no error should be raised. I don't quite understand why you ran into this issue, unless a file was truncated.

Notice that I was unable to reproduce this issue, see test in pierrejoye/php_zip#34 (comment) so I suspect something else...

from libzip.

remicollet commented on June 12, 2024

For your information, an easy way to reproduce is adding files from /proc on Linux

I've added a new test in the pecl extension for this to detect any regression
pierrejoye/php_zip@78cdb65

And indeed, without the LENGTH_UNCHECKED option it fails with "Unexpected length of data"

(perhaps may be also test here, but I'm not familiar with your test system)

from libzip.

Allow adding files being appended to by another process. about libzip HOT 21 CLOSED

Comments (21)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent