
Comments (11)

mscdex commented on July 21, 2024

@bug249286 Clients should send non-latin1 header parameter values using the extended parameter format (for example filename*) defined by RFC 5987. If they don't, the values are assumed to be encoded as latin1.

You can safely convert the filename to UTF-8 yourself, since decoding as latin1 preserves the individual bytes. For example: Buffer.from(filename, 'latin1').toString('utf8'), or use TextDecoder; Node supports both.
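
A minimal sketch of that conversion as a reusable function. The helper name latin1ToUtf8 is hypothetical and not part of busboy. It assumes the client really sent raw UTF-8 bytes in filename, and it falls back to the original string when the bytes are not valid UTF-8 or when the string cannot have come from a latin1 decode.

```javascript
// Hypothetical helper, not part of busboy: reinterpret a latin1-decoded
// header value as UTF-8.
function latin1ToUtf8(name) {
  // A code unit above 0xFF cannot come from a latin1 decode, so the value
  // was already decoded some other way; leave it alone.
  for (let i = 0; i < name.length; i++) {
    if (name.charCodeAt(i) > 0xff) return name;
  }
  // Recover the original bytes: latin1 maps each byte to one code unit.
  const bytes = Buffer.from(name, 'latin1');
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return name; // not valid UTF-8, keep the latin1 interpretation
  }
}
```

In a busboy 'file' handler this would be applied to info.filename before the name is used. Plain ASCII names pass through unchanged, because ASCII is encoded identically in latin1 and UTF-8.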

from busboy.

bug249286 commented on July 21, 2024

Parsing fails if filename contains UTF-8 characters

Content-Disposition: form-data; name="file"; filename="ทดสอบภาษาไทย.xlsx"
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

const bb = busboy({ headers: req.headers, defCharset: 'utf8' });

info {
filename: 'à¸\x97à¸\x94สอà¸\x9Aภาษาà¹\x84à¸\x97ย.xlsx',
encoding: '7bit',
mimeType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
}

The filename is parsed incorrectly.
How can I fix this?

Thank you.


CleyFaye commented on July 21, 2024

I agree that it is a valid solution (and I am using it already), but my question was more about how we are supposed to do this if we wanted to do it "properly". My use cases are limited to common browsers (and react-native to some extent), and no client sends UTF-8 data the way described in RFC 5987 (using extended parameters).

This, in addition to the comment in RFC 7578 about not using extended parameters for filename in form data, leaves me confused as to what the proper way to handle this is. In that sense, starting to add filename* to form data on the client side would seem to go in the opposite direction from everyone else (in addition to being tedious).

Anyway, I only now see that this is quite an old issue, so it's probably not the right place to discuss this. I only found out about it recently through an update of the multer library. Sorry for the noise.


mscdex commented on July 21, 2024

I'm confused, both of your curl statements are the same. Where is the utf-8 filename?


avidenie commented on July 21, 2024

Yeah, sorry about that, I've updated the report.


mscdex commented on July 21, 2024

Ok, this should be fixed in master now. Can you give it a try?


avidenie commented on July 21, 2024

It seems to be working fine now, thank you very much.


bug249286 commented on July 21, 2024

@mscdex Thank you very much.


CleyFaye commented on July 21, 2024

I am confused about how we are supposed to send UTF-8 (or other) strings. While RFC 5987 does mention extended parameters, RFC 7578 discourages their use, and actual browsers (including Chrome, Edge and Firefox) do not send the extra filename*, instead putting the UTF-8 name in filename.

As it is, we can do the aforementioned conversion by hand outside of busboy (or in my case multer), but is that really something that falls outside the scope of this library, seeing that the "supported" method of using filename* is not used much in the wild?
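
For reference, a sketch of what the extended parameter form discussed here looks like when built by hand on the client. The helper names encodeExtValue and buildContentDisposition are hypothetical. The value takes the RFC 5987 shape UTF-8''<percent-encoded bytes>. As noted in this thread, RFC 7578 discourages this form for multipart/form-data and browsers do not send it, so this only applies to a custom client you control.

```javascript
// Hypothetical helper: build an RFC 5987 style extended value
// (charset, empty language tag, percent-encoded UTF-8 bytes).
function encodeExtValue(value) {
  // encodeURIComponent percent-encodes the UTF-8 bytes but leaves ' ( ) *
  // untouched; those are not allowed unescaped in an ext-value.
  const escaped = encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return "UTF-8''" + escaped;
}

// Hypothetical helper: a part header carrying only the extended parameter.
function buildContentDisposition(fieldName, filename) {
  return `form-data; name="${fieldName}"; filename*=${encodeExtValue(filename)}`;
}
```

The percent-encoded part decodes back to the original name with decodeURIComponent, which makes the round trip easy to check.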


mscdex commented on July 21, 2024

@CleyFaye Writing a library like this that works for everyone everywhere is basically impossible. IMO it's safer to err on the side of history for compatibility purposes until all clients overwhelmingly assume UTF-8 values for filenames. The workaround I provided earlier in this issue is a valid solution if you are in control of the client and want to assume UTF-8 (or any other charset for that matter).


mscdex commented on July 21, 2024

@CleyFaye If you're using HTML forms, I would say either set the page's encoding to utf-8 or set the form's accept-charset attribute to utf-8 (it defaults to the page encoding if not set). If nothing else, another potential solution that would work for HTML and non-HTML would be to send the encoding as the first field in the form.
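
A rough sketch of the "encoding as the first field" idea. Everything here is an assumption for illustration: the factory createFilenameDecoder is hypothetical, and the field name _charset_ is borrowed from the hidden-input convention in HTML forms. It relies on the charset field arriving before any file part, and it leaves filenames untouched until a charset has been seen.

```javascript
// Hypothetical helper, not part of busboy: remembers the charset announced
// by an early form field and uses it to re-decode later filenames.
function createFilenameDecoder(charsetField = '_charset_') {
  let charset = null; // unknown until the charset field is seen

  return {
    // Call for every text field, e.g. from a busboy 'field' handler.
    onField(name, value) {
      if (name === charsetField && value) charset = value;
    },
    // Call for every filename, e.g. info.filename in a 'file' handler.
    decode(filename) {
      if (charset === null) return filename; // keep the latin1 default
      for (let i = 0; i < filename.length; i++) {
        // Not the product of a latin1 decode; nothing to recover.
        if (filename.charCodeAt(i) > 0xff) return filename;
      }
      const bytes = Buffer.from(filename, 'latin1');
      try {
        return new TextDecoder(charset, { fatal: true }).decode(bytes);
      } catch {
        // Unknown charset label or invalid byte sequence.
        return filename;
      }
    },
  };
}
```

With busboy this would be wired up by calling onField(name, value) from the 'field' event and decode(info.filename) from the 'file' event. If a file part arrives before the charset field, its name is simply left as parsed.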

