Comments (6)
http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring
Have you run into a specific issue where you actually see/note a failure given the current implementation, or are you just assuming that there will be an issue?
Since a std::string is basically a sloppy wrapper around raw bytes, and the std::strings are used internally by the lib only to compare against other std::strings, you're basically doing byte-for-byte comparisons, so I don't see where anything would go wrong, regardless of platform. Again, though, if you have a specific bug and a reproducible error, please do share.
from gumbo-query.
I'm using the .NET WebRequest library, and the HTML responses are of type System::String.
To make these usable by gumbo-query, they must be converted to std::string. Below is my function for doing so:
std::string SystemToStdString(String^ s)
{
    msclr::interop::marshal_context context;
    return context.marshal_as<std::string>(s);
}
Now, when I print the HTML output of an HttpRequest made in .NET as a System::String, the Unicode characters are there. However, after converting that HTML output to a std::string, all Unicode characters become '?'.
EDIT: Perhaps there's a bug in my conversion function - I'll look into it and get back to you.
I've also come across this, which is why I assumed (perhaps wrongly) that gumbo-query doesn't support Unicode, at least in its .text() function:
http://stackoverflow.com/questions/402283/stdwstring-vs-stdstring
from gumbo-query.
tbh I could be wrong too; I'm no expert when it comes to character encoding. I've never had an issue where I've needed to learn, so I haven't bothered to. I do know the .NET string object is a proper string object that does concern itself with encoding (where the C++ string is really just a wrapper around an array of bytes), so I can definitely see why you're not going to get a 1:1 conversion. If possible, try marshaling a raw byte array from the .NET side to a C-style array of char on the C++ side and then construct a std::string around that array of bytes. Just look up the appropriate std::string constructor. If that won't work, maybe investigate using some of the encoding functions in System.Text.Encoding on the .NET side to convert to a more appropriately encoded string before marshaling over to the native side.
https://msdn.microsoft.com/en-us/library/kdcak6ye%28v=vs.110%29.aspx
from gumbo-query.
I've spent the afternoon trying all sorts of things, especially converting between System::String and std::string. It seems as though std::string simply can't handle Unicode characters reliably, especially on Windows, and the universal solution I've found almost everywhere is to use std::wstring instead.
I only need std::wstring for the .text() function of any given CSelection, but I'm not too sure where to start in modifying the gumbo-query library to achieve this, as I noticed that gumbo-parser seems to use std::string.
I'll keep trying.
from gumbo-query.
If you're already using .NET, don't even bother with gumbo-query. That's my 2 cents. There's an excellent library called CsQuery that I was using in the C# version of my code before I did a full port to C++: https://github.com/jamietre/CsQuery. It uses the Validator.Nu HTML parsing engine, which is what is used in Gecko/Firefox, and has full-blown selector support. It's basically a port of the entire jQuery lib to C#. It's available from NuGet. Does that solve your problem?
from gumbo-query.
I'm too loyal - it took me ages to get gumbo-query working, so I'm with it for life. Considering I'm using C++ .NET I may as well stick with gumbo.
I've fixed the encoding problem, using the System::String to std::string function from:
http://blog.nuclex-games.com/mono-dotnet/cxx-cli-string-marshaling/
And for std::string to System::String: (gcnew String(s.c_str(), 0, s.length(), Encoding::UTF8))
Thank you for your assistance :)
I see no need to switch to CsQuery - gumbo-query does everything I need very well.
from gumbo-query.