Comments (6)

TechnikEmpire commented on June 21, 2024

http://stackoverflow.com/questions/3257263/how-do-i-get-stl-stdstring-to-work-with-unicode-on-windows

http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring

Have you run into a specific issue where you actually see/note a failure given the current implementation, or are you just assuming that there will be an issue?

Since a std::string is basically a sloppy wrapper around raw bytes, and the library only uses std::strings internally to compare against other std::strings, you're essentially doing byte-to-byte comparisons, so I don't see where anything would go wrong, regardless of platform. Again, though, if you have a specific bug and a reproducible error, please do share.

CrimsonVex commented on June 21, 2024

I'm using the .NET WebRequest library, and the HTML responses are of type System::String.

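To make these usable by gumbo-query, they must be converted into a std::string. Below is my function for doing so:

    #include <msclr/marshal_cppstd.h>

    std::string SystemToStdString(String^ s)
    {
        msclr::interop::marshal_context context;
        // marshal_as<std::string> narrows via the ANSI code page, so characters
        // that code page cannot represent come out as '?'.
        return context.marshal_as<std::string>(s);
    }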

Now, when I print the HTML output of an HttpRequest made in .NET as a System::String, the Unicode characters are there; however, after converting that HTML output to a std::string, all Unicode characters become '?'.

EDIT: Perhaps there's a bug in my conversion function - I'll look into it and get back to you.

I've also come across this, which is why I assumed (perhaps wrongly) that gumbo-query doesn't support Unicode, at least for its .text() function:
http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring

TechnikEmpire commented on June 21, 2024

To be honest, I could be wrong too; I'm no expert when it comes to character encoding. I've never had an issue that forced me to learn, so I haven't bothered. I do know the .NET String object is a proper string object that does concern itself with encoding (whereas the C++ std::string is really just a wrapper around an array of bytes), so I can definitely see why you're not getting a 1:1 conversion. If possible, try marshaling a raw byte array from the .NET side to a C-style array of char on the C++ side, and then construct a std::string around that array of bytes; just look up the appropriate std::string constructor. If that won't work, maybe investigate using the encoding functions in System.Text.Encoding on the .NET side to convert to a more appropriately encoded string before marshaling over to the native side.

https://msdn.microsoft.com/en-us/library/kdcak6ye%28v=vs.110%29.aspx

CrimsonVex commented on June 21, 2024

I've spent the afternoon trying all sorts of things, especially converting between System::String and std::string. It seems as though std::string simply can't handle Unicode characters reliably, especially on Windows, and the universal solution I've found almost everywhere is to use std::wstring instead.

I only need std::wstring for the .text() function of any given CSelection, but I'm not too sure where to start in modifying the gumbo-query library to achieve this, as I noticed that gumbo-parser seems to use std::string.

I'll keep trying.

TechnikEmpire commented on June 21, 2024

If you're already using .NET, don't even bother with gumbo-query. That's my 2 cents. There's an excellent library called CsQuery that I was using in the C# version of my code before I did a full port to C++: https://github.com/jamietre/CsQuery - It uses the Validator.nu HTML parsing engine, which is what is used in Gecko/Firefox, and has full-blown selector support. It's basically a port of the entire jQuery lib to C#. It's available from NuGet. Does that solve your problem?

CrimsonVex commented on June 21, 2024

I'm too loyal - it took me ages to get gumbo-query working, so I'm with it for life. Considering I'm using C++ .NET, I may as well stick with gumbo.
I've fixed the encoding problem using the System::String to std::string function from:

http://blog.nuclex-games.com/mono-dotnet/cxx-cli-string-marshaling/

And the std::string to System::String function: gcnew String((signed char*)s.c_str(), 0, (int)s.length(), Encoding::UTF8)

Thank you for your assistance :)

I see no need to switch to CsQuery - gumbo-query does everything I need very well.
