
Comments (56)

rejetto avatar rejetto commented on July 4, 2024 1

some overhead, yes, but not necessarily a problem.
i already programmed it but i'm still undecided how to "bundle" it. It would help to have more insight on the need.
I'm surprised you are interested in detecting corruption during an http upload: as far as i know, it's extremely unlikely to happen, and HFS is not subject to interrupted uploads since it sets the final filename only after the end.
What do you think about it?

from hfs.

made1990 avatar made1990 commented on July 4, 2024 1

I can confirm, with alpha-6 the code for method 1 is working :)
File written correctly, md5 returned correctly.

I'll do some further testing tomorrow (different file sizes, etc. )

Thx for the great work so far. much appreciated.


rejetto avatar rejetto commented on July 4, 2024

no such feature yet, but i guess i'll make a plugin soon.
to design its features, i'd need you to think if you just need API or also GUI.
What do you use these checksums for?


rejetto avatar rejetto commented on July 4, 2024

for the api part, I did some research, and I could add this header for PUT, POST, GET, HEAD
Digest: md5=...

one could use HEAD to get the md5 without downloading.
Does it sound good?
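
A minimal sketch of how a client might read such a header, assuming the value has the `md5=...` shape proposed above. The function name `parseDigestHeader` is hypothetical and not part of HFS.

```javascript
// Split a Digest-style header value such as "md5=abc123" into its parts.
// Only the first "=" is treated as the separator, so a value that itself
// ends with "=" padding is kept whole.
function parseDigestHeader(value) {
    if (typeof value !== 'string') return null
    const i = value.indexOf('=')
    if (i < 1) return null // no algorithm name before "="
    return {
        algorithm: value.slice(0, i).trim().toLowerCase(),
        digest: value.slice(i + 1).trim(),
    }
}

// Possible usage with a HEAD request (not executed here):
//   const res = await fetch(fileUrl, { method: 'HEAD' })
//   const parsed = parseDigestHeader(res.headers.get('digest'))
```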


made1990 avatar made1990 commented on July 4, 2024

> for the api part, I did some research, and I could add this header for PUT, POST, GET, HEAD Digest: md5=...
>
> one could use HEAD to get the md5 without downloading. Does it sound good?

That sounds perfect.
I only need it via API (PUT, GET), not for GUI
I would need it to verify if uploaded files are 100% identical against original file.
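
The check described here could look roughly like this on the client side. It is only a sketch: `md5Hex` and `uploadMatches` are hypothetical helper names, and the server value is assumed to be a hex string.

```javascript
const { createHash } = require('node:crypto')

// MD5 of a Buffer or string, as lowercase hex.
function md5Hex(data) {
    return createHash('md5').update(data).digest('hex')
}

// Compare the MD5 of the original data with the value reported by the server.
// The comparison ignores case and surrounding whitespace.
function uploadMatches(originalData, serverMd5) {
    return md5Hex(originalData) === String(serverMd5).trim().toLowerCase()
}
```

For large files it would be better to hash the original as a stream rather than load it into memory.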


rejetto avatar rejetto commented on July 4, 2024

would you say you need md5 only for the files you upload?
because i'm realizing that to always provide md5 for files for which it was not calculated before seems needlessly heavy.
I would also offer it in case you append ?get=md5 to any file's url.


made1990 avatar made1990 commented on July 4, 2024

> would you say you need md5 only for the files you upload? because i'm realizing that to always provide md5 for files for which it was not calculated before seems needlessly heavy. I would also offer it in case you append ?get=md5 to any file's url.

Both cases would be great, but md5 only for newly uploaded files would be enough if that is easier.


rejetto avatar rejetto commented on July 4, 2024

are you willing to use the value just as the upload finishes, or also later?


made1990 avatar made1990 commented on July 4, 2024

> are you willing to use the value just as the upload finishes, or also later?

As the upload finishes is enough.
I guess otherwise there would be some overhead to save the information somewhere, am I right?


made1990 avatar made1990 commented on July 4, 2024

I agree that hfs and the http protocol itself bring some functionality to prevent faulty uploads or file corruption. Still, the network can be interrupted or similar.
I am using HFS in a semi-professional environment and my users ask for a way to ensure integrity of the files that are uploaded so the idea came up if the md5 can be returned after upload finished to compare with original md5.


rejetto avatar rejetto commented on July 4, 2024

i'm willing to offer the functionality, but as i told you, an interrupted upload in HFS will have the word $upload in the name, so you cannot be mistaken
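
As a rough illustration of that point, a script scanning the upload folder could flag unfinished uploads by name. The exact naming scheme is an assumption here; the sketch only checks for the `$upload` marker mentioned above, and `isIncompleteUpload` is a hypothetical helper.

```javascript
// True if a file name carries the "$upload" marker of an unfinished upload.
// Assumption: the marker appears verbatim somewhere in the name.
function isIncompleteUpload(fileName) {
    return String(fileName).includes('$upload')
}

// Keep only the names that look like finished uploads.
function finishedUploads(fileNames) {
    return fileNames.filter(name => !isIncompleteUpload(name))
}
```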


rejetto avatar rejetto commented on July 4, 2024

while i still have to decide how to introduce md5 in HFS,
i made it possible for a simple script to do it.
The script uses new things that i'm about to publish, to read the incoming stream, so that you don't need to re-read the file from the disk after the upload is finished, especially good if the file is big.
I also wrote another script that does the re-reading instead,
and published all in the documentation, as an example
https://github.com/rejetto/hfs/wiki/Middlewares#calculate-md5-on-uploads

If you are willing to test it, I can give you a preview version, but I need to know if you will run hfs with npx or what operating system.
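
The idea behind hashing the incoming stream can be sketched as follows: the hasher is updated chunk by chunk as data arrives, so the result is the same as hashing the whole file afterwards, without a second read from disk. This is not the wiki script itself, and `md5OfChunks` is a hypothetical name.

```javascript
const { createHash } = require('node:crypto')

// Feed chunks to the hasher one at a time, as a stream's 'data' handler would,
// and return the final digest as lowercase hex.
function md5OfChunks(chunks) {
    const hasher = createHash('md5')
    for (const chunk of chunks) hasher.update(chunk)
    return hasher.digest('hex')
}

// The chunk boundaries do not affect the result:
const whole = md5OfChunks([Buffer.from('hello')])
const split = md5OfChunks([Buffer.from('hel'), Buffer.from('lo')])
console.log(whole === split) // true
```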


made1990 avatar made1990 commented on July 4, 2024

Sounds good, a test version would be great.
I am running the npm version on Windows (it runs as a service on windows)


rejetto avatar rejetto commented on July 4, 2024

i decided to publish the version in the meantime.
the version you need is 0.53.0-alpha5.
with npm or npx you need to specify hfs@beta instead of just hfs, to get it.

so, you used these instructions to set up your service?


made1990 avatar made1990 commented on July 4, 2024

> so, you used these instructions to set up your service?

Correct


rejetto avatar rejetto commented on July 4, 2024

i'd like to find an "npx" way of making a service on windows, similarly to linux, so to make update easier (just by restarting the service).
and don't forget to give me feedback on the md5.


made1990 avatar made1990 commented on July 4, 2024

Yeah, the update process for the windows service version of HFS is a bit inconvenient, but still doable.

Sure, will do testing of the md5 thing next week when I am back at the system :)
Just to make sure, the code you documented under https://github.com/rejetto/hfs/wiki/Middlewares#calculate-md5-on-uploads needs to be added to the server code part of Options in the Admin GUI, right? And that should do the trick?


rejetto avatar rejetto commented on July 4, 2024


made1990 avatar made1990 commented on July 4, 2024

Do I need to add something to my PUT command to get the md5 in return?


rejetto avatar rejetto commented on July 4, 2024

nope


made1990 avatar made1990 commented on July 4, 2024

Hm. It simply gives me an empty bracket as return: {}
My command is:
curl -X PUT https://my-url.com/myfolder/file1.txt -H "Authorization: Basic XXXXX" -d "Content of file"


rejetto avatar rejetto commented on July 4, 2024

you are looking at the body, while the md5 is in a header
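
By default curl prints only the body; with `-i` it also prints the response headers. A small sketch of pulling one header out of such raw output follows. The header name `X-MD5` is the one the script in this thread sets; `headerFromRawResponse` is a hypothetical helper.

```javascript
// Extract one header value from raw HTTP response text, as printed by `curl -i`.
// Header names are matched case-insensitively; returns undefined if absent.
function headerFromRawResponse(raw, name) {
    const head = raw.split(/\r?\n\r?\n/)[0] // headers end at the first blank line
    const wanted = name.toLowerCase()
    for (const line of head.split(/\r?\n/).slice(1)) { // skip the status line
        const i = line.indexOf(':')
        if (i > 0 && line.slice(0, i).trim().toLowerCase() === wanted)
            return line.slice(i + 1).trim()
    }
    return undefined
}

// Made-up response, for illustration only.
const sample = 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n'
    + 'X-MD5: 5d41402abc4b2a76b9719d911017c592\r\n\r\n{}'
console.log(headerFromRawResponse(sample, 'x-md5'))
```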


made1990 avatar made1990 commented on July 4, 2024

When I use Method 1: calculate by reading file after it has been written the file is written correctly, but an error is returned:
curl: (56) Failure when receiving data from the peer

When I use Method 2: processing incoming stream then the file is not even written correctly. It remains in the status with the hfs$upload prefix.


rejetto avatar rejetto commented on July 4, 2024

what hfs version are you using?


made1990 avatar made1990 commented on July 4, 2024

0.53.0_alpha5


rejetto avatar rejetto commented on July 4, 2024

ok let me check


rejetto avatar rejetto commented on July 4, 2024

i just tried with your command, and got this using alpha5 and method 1.
i'm not sure what's different on your side.
image


rejetto avatar rejetto commented on July 4, 2024

do you get the same error WITHOUT the server code?


made1990 avatar made1990 commented on July 4, 2024

Without the server code, everything is working normally.


made1990 avatar made1990 commented on July 4, 2024

image


rejetto avatar rejetto commented on July 4, 2024
  1. does it break only the upload and the rest is working?
  2. did you copy the script without any change?
  3. are you accessing hfs directly or through a proxy?
  4. i see port 443. Does that happen with simple http?
  5. is there anything interesting in hfs console?
  6. does the request appear in the log?


made1990 avatar made1990 commented on July 4, 2024
  1. GET is working normally
  2. yes, no changes to the script
  3. no proxy
  4. error message with http is a bit different: Empty reply from server
  5. yes .. some errors
    image
    => But I am getting the same messages on the console without the server code
  6. neither in access nor error log


rejetto avatar rejetto commented on July 4, 2024

I take it we are doing all these tests with "method 1". It may be confusing to mix results.

  1. i'm realizing i don't have a fallback mechanism for metadata on FAT volumes. All my tests on Windows were done on NTFS. Do you confirm that is a FAT file system? Anyway, this is not a fatal problem, and I will take care of it asap.
    Back to the main topic, there are no extra errors caused by the server code, and yet the request is abnormally interrupted. Weird.

Please tell me about the system you are running on: what Windows version, what about the drive, are you in a virtual environment, anything peculiar you can think of.

I'm going to make a test on a Windows machine now.


rejetto avatar rejetto commented on July 4, 2024

My test of method 1 on Windows 11 was successful. The file was written, and I got the 200 reply with "{}" in the body and the X-MD5 header. I'm not sure whether to be glad or sad.

You can run hfs with the "--dev" parameter. That will add a lot more info in the console. See if there's anything printed with the request. It's worth a shot.

And... I'd rather do it myself but I don't have access to your server. What I'd do is to gradually remove lines from the middleware block until the problem disappears, and then I'd know that the last line I removed is related to the problem.
So first I would remove these lines, and test

            return new Promise(res => {
                f.once('end', () => {
                    ctx.set({ 'X-MD5': hasher.digest('hex') })
                    res()
                })
            })

And then remove this, and test again.

            const hasher = createHash('md5')
            f = createReadStream(f)
            f.on('data', x => hasher.update(x))

I expect one of these blocks to be the problem.


made1990 avatar made1990 commented on July 4, 2024

Yes - I am trying method 1

It's an NTFS filesystem on Windows Server 2016. It's a physical server, but in fact it's a virtual filesystem. An application is running on Windows which virtualizes an NTFS filesystem (CBFS) that HFS is writing to.

If I remove the last block of the code, that already solves the issue, but of course then the md5 is not returned.

            return new Promise(res => {
                f.once('end', () => {
                    ctx.set({ 'X-MD5': hasher.digest('hex') })
                    res()
                })
            })

It is still strange, because the file is successfully written, I can see it on the filesystem and can open it.


rejetto avatar rejetto commented on July 4, 2024

The code you removed is not needed for the upload, just for the md5, so it's not strange that once the problem is removed the upload still works. Your feedback was helpful anyway.

your cbfs is not supporting ntfs' "alternate streams" feature.
that's preventing HFS from saving the information about who uploaded the file, no big deal.

it is possible that your cbfs is doing something funny with the md5 code too, as it may explain the difference between my Windows and yours. So that I try to read the file and i fail, for some reason.
I guess that we are getting an error, but that's not handled by the code above.
See what happens with this variation

            return new Promise((resolve, reject) => {
                f.once('end', () => {
                    ctx.set({ 'X-MD5': hasher.digest('hex') })
                    resolve()
                }).on('error', reject)
            })

here i'm both printing the error and ensuring to continue serving the request.
In case of error you won't get md5, but the request will work AND we can try to better understand the error.
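
The same pattern as a standalone sketch, outside HFS: a promise that resolves with the md5 when the read stream ends and rejects if the stream emits an error such as ENOENT. `md5OfFile` is a hypothetical helper, not part of the wiki script.

```javascript
const { createHash } = require('node:crypto')
const { createReadStream } = require('node:fs')

// Resolve with the file's MD5 (hex). Reject if the file cannot be read,
// instead of leaving the promise pending forever.
function md5OfFile(path) {
    return new Promise((resolve, reject) => {
        const hasher = createHash('md5')
        createReadStream(path)
            .on('data', chunk => hasher.update(chunk))
            .once('end', () => resolve(hasher.digest('hex')))
            .once('error', reject)
    })
}

// Possible usage: log the failure and carry on without a checksum.
//   md5OfFile(somePath).then(console.log, err => console.error('md5 failed:', err.code))
```

Without the error handler, a failed read never fires 'end', so the promise stays pending and the request it belongs to cannot complete.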


made1990 avatar made1990 commented on July 4, 2024
            return new Promise((resolve, reject) => {
                f.once('end', () => {
                    ctx.set({ 'X-MD5': hasher.digest('hex') })
                    resolve()
                }).on('error', reject)
            })

HTTP Code 200 returned and file written successfully, but without md5 return.
Console output: error middleware plugin ENOENT: no such file or directory


rejetto avatar rejetto commented on July 4, 2024

thanks for your feedback!
ok, i think i've got what's going on here.
timings are different and while on my system the file has already its final name, it's still with temporary name on yours.
i will now see how to solve this.


rejetto avatar rejetto commented on July 4, 2024

ok, it's not a problem in the script.
it's a bug in HFS, calling the middleware too early, but only in some occasions.
I just made the fix, and it would be wonderful if you could confirm that it's effective for you, before i publish it.
I made my tests both on mac and windows.
this is the binary 0.53-alpha5.5 hfs-windows.zip
or if you are running with npm/npx, you need to npm -g update hfs@exp


rejetto avatar rejetto commented on July 4, 2024

i changed my mind and published. It's alpha6 and you get it as hfs@beta
https://github.com/rejetto/hfs/releases/tag/v0.53.0-alpha6
it's actually the same as 5.5, just renamed.
Still, your feedback is welcome.


rejetto avatar rejetto commented on July 4, 2024

cool! i'm glad we have a better tool now


made1990 avatar made1990 commented on July 4, 2024

Is there some file size limitation when uploading via API ?
I uploaded a file with 500MB but it is cut after 250MB and then of course returns the wrong md5. If uploading via GUI the file is uploaded completely.
Does not matter if with or without the middleware code.

I haven't tried with the stable HFS version, just with alpha-6 for now.


rejetto avatar rejetto commented on July 4, 2024


made1990 avatar made1990 commented on July 4, 2024

hm, there is no proxy in between. It's the same subnet.
Funny... when uploading a 4GB file it's also cut at almost half: the upload finished after around 2GB. HTTP return code 200


rejetto avatar rejetto commented on July 4, 2024


made1990 avatar made1990 commented on July 4, 2024

simple curl


rejetto avatar rejetto commented on July 4, 2024

then i'm going to upload a 500+MB file with curl and see what happens


rejetto avatar rejetto commented on July 4, 2024
image

just uploaded 1gb, completely written and md5.
version alpha6.
then i made same test on a remote (not localhost) server over https, with credentials. Completed again.

I don't know what's different on your side. Ensure you use curl like curl -T file url/
Consider providing a video of what you are doing, because I may see a clue you are not telling.


rejetto avatar rejetto commented on July 4, 2024

also, consider that uploading via API is not really an alternative way, it's the only way.
What the frontend does in Chrome is to call the same API that you are calling, and you said it is working fine in that case.
You can see the api being called pressing F12 and then using the "network" tab.
Just to clarify things.


made1990 avatar made1990 commented on July 4, 2024

oh wow with -T option of curl it works. File uploaded completely, md5 returned. Takes slightly longer than without the code but that makes sense of course
Before, I used -d @file to upload the file, and it seems that behaves differently


rejetto avatar rejetto commented on July 4, 2024

ok, i studied curl's manual, and -d is for POST instead of PUT, or better, for "form" shaped data. It's not the plain file, it has a different structure.
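
One documented difference, per curl's manual for `--data`: when the data is read from a file with `@`, carriage returns and newlines are stripped before sending, whereas `-T` (and `--data-binary`) send the bytes unchanged. The sketch below only models that stripping to show why the server-side md5 can differ from the original; it is not a full description of what `-d` does, and `asSentByCurlData` is a hypothetical name.

```javascript
const { createHash } = require('node:crypto')

const md5Hex = data => createHash('md5').update(data).digest('hex')

// Rough model of `curl -d @file`: drop every CR (0x0d) and LF (0x0a) byte.
function asSentByCurlData(buffer) {
    return Buffer.from(buffer.filter(byte => byte !== 0x0d && byte !== 0x0a))
}

const original = Buffer.from('line1\nline2\r\n')
const sent = asSentByCurlData(original)
console.log(sent.length < original.length)     // true: bytes were removed
console.log(md5Hex(sent) === md5Hex(original)) // false: checksums no longer match
```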


rejetto avatar rejetto commented on July 4, 2024

https://github.com/rejetto/hfs/wiki/Upload#from-command-line


made1990 avatar made1990 commented on July 4, 2024

Thx for checking.

-d works fine if you put text content as input, e.g. -d "this is my input"
instead of uploading an already existing file


made1990 avatar made1990 commented on July 4, 2024

So after some additional testing I would say it works great :) and it would be great to have this feature in a stable hfs version at some point.
One more slight request for that ;) Is it possible to let the user who uploads the file decide if the md5 checksum creation is triggered or not. E.g. with some information put in the HEADER when using the PUT command?


rejetto avatar rejetto commented on July 4, 2024

easy, after the "middleware" line you can add something like this

if (!ctx.get('x-request-md5')) return

or if you prefer to append in the URL ?get=md5

if (ctx.query.get !== 'md5') return

hopefully the stable version will arrive in 1 or 2 weeks.
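
The two checks above could also be combined into one opt-in test. This is a sketch assuming a Koa-style `ctx`, as the snippets suggest, where `ctx.get(name)` returns a request header (empty string if absent) and `ctx.query` holds the parsed query string. `wantsMd5` is a hypothetical helper.

```javascript
// True if the client asked for a checksum, either with an x-request-md5
// header or with ?get=md5 in the URL.
function wantsMd5(ctx) {
    return Boolean(ctx.get('x-request-md5')) || ctx.query.get === 'md5'
}

// In the middleware this would replace the two separate checks:
//   if (!wantsMd5(ctx)) return
```

A client would then opt in with, for example, `curl -T file -H "x-request-md5: 1" url/` or by appending `?get=md5` to the URL.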


rejetto avatar rejetto commented on July 4, 2024

i updated the wiki to use a more standard header for the md5

ctx.set({ 'Content-Digest': `md5=:${hasher.digest('hex')}:` })

it's functionally the same, but adhering to standards is often a good thing
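
One caveat for clients that parse this header strictly: in RFC 9530 the part between the colons is a Structured Fields byte sequence, which is base64 rather than hex. A sketch of producing and reading the base64 form, in case strict compliance matters; both function names are hypothetical.

```javascript
const { createHash } = require('node:crypto')

// Build a Content-Digest value with the digest base64-encoded between colons.
function contentDigestMd5(data) {
    const b64 = createHash('md5').update(data).digest('base64')
    return `md5=:${b64}:`
}

// Read the md5 entry back from a Content-Digest value and return it as hex,
// or null if there is no md5 entry.
function md5HexFromContentDigest(value) {
    const m = /(?:^|,)\s*md5=:([A-Za-z0-9+/=]+):/.exec(value)
    return m ? Buffer.from(m[1], 'base64').toString('hex') : null
}

console.log(contentDigestMd5('hello')) // md5=:XUFAKrxLKna5cZ2REBfFkg==:
```

In the middleware, the equivalent change would be `hasher.digest('base64')` instead of `hasher.digest('hex')`.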
