Comments (7)
This isn't possible because of the nature of gumbo itself: all of the node data exposed to you is managed internally by gumbo. If you modify it at all, you risk bugs or even a crash, because you're tampering with memory owned by another object. Gumbo's contract is very clear: if you want to own something, you need to copy it.
from gumbo-query.
I assume, then, that a suitable option is to perform replacements on the original string given to CDocument and re-parse it? (I suppose that's not so bad.)
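A minimal sketch of that replace-and-re-parse idea, using plain std::string. The helper name replaceAll is hypothetical (not part of gumbo or gumbo-query); the edited string would then be parsed again, for example by handing it to a fresh CDocument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` in `html` with
// `to` and returns how many replacements were made. The result is meant to
// be re-parsed afterwards; the old parse tree no longer matches it.
std::size_t replaceAll(std::string& html, const std::string& from, const std::string& to)
{
    assert(!from.empty() && "an empty pattern would match everywhere");
    std::size_t count = 0;
    std::size_t pos = html.find(from);
    while (pos != std::string::npos)
    {
        html.replace(pos, from.size(), to);
        ++count;
        // Resume after the inserted text so `to` itself is never re-scanned.
        pos = html.find(from, pos + to.size());
    }
    return count;
}
```

Blind text replacement also hits identical text elsewhere in the page, so the pattern needs to be specific enough to match only the intended spot.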
I made heavy modifications to gumbo-query just to be able to perform the simplest node modifications at a good speed. These included a Get() method that exposes the underlying gumbo_node of a CDocument/CNode. I then wrote several helper functions; the most important one generates a unique node ID string, like so:
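#include <string>
#include <gumbo.h>

// Builds an ID from the node's index within its parent, followed by the
// index of each ancestor up to the root. The "/" separator keeps distinct
// paths distinct: without it, indices (1, 12) and (11, 2) both give "112".
std::string SerializeUtil::getUniqueNodeId(GumboNode* node)
{
    std::string nodeId = std::to_string(node->index_within_parent);
    GumboNode* parent = node->parent;
    while (parent != nullptr)
    {
        nodeId.append("/");
        nodeId.append(std::to_string(parent->index_within_parent));
        parent = parent->parent;
    }
    return nodeId;
}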
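To check the ID scheme without linking against gumbo, here is a variant written as a template over any node type with `parent` and `index_within_parent` members (GumboNode has both). `uniqueNodeId` and `TestNode` are hypothetical names used only for illustration. With plain concatenation, a node at index 1 under a parent at index 12 and a node at index 11 under a parent at index 2 both yield "1120" (root index 0); joining the indices with a separator keeps them apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical generic form of the ID helper: leaf index first, then each
// ancestor's index, joined with '/'. Works for GumboNode or a stand-in type.
template <typename Node>
std::string uniqueNodeId(const Node* node)
{
    assert(node != nullptr);
    std::string id = std::to_string(node->index_within_parent);
    for (const Node* p = node->parent; p != nullptr; p = p->parent)
    {
        id += '/';  // separator: (1, 12) and (11, 2) no longer collide
        id += std::to_string(p->index_within_parent);
    }
    return id;
}

// Hypothetical minimal node mirroring the two GumboNode fields used above.
struct TestNode
{
    TestNode* parent;
    std::size_t index_within_parent;
};
```

Note that such an ID is only stable within one parse: after re-parsing modified HTML, the indices can shift.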
Using this unique node ID, I could then keep track of the nodes I wanted to manipulate by storing them in a simple std::unordered_map<std::string, int> object. The int value represents which manipulation should be applied to the node while it is being rendered (for example, remove or modify). Then I heavily modified https://github.com/google/gumbo-parser/blob/master/examples/serialize.cc to take an optional pointer to such a map, so that while it renders the GumboOutput back to an HTML string, it can perform those modifications by checking the unique ID of each node, as it begins rendering it, against the provided unordered_map.
So yeah, not too bad, but there is a lot involved in making these modifications. For me this approach was necessary because I modify HTML in real time as users browse, so speed was of the utmost importance.
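A self-contained sketch of that render-time approach, under loud assumptions: `HtmlNode`, `NodeOp`, and `render` are hypothetical names, not part of gumbo, gumbo-query, or the modified serialize.cc, and the toy tree stands in for GumboOutput (no attributes, no escaping, no void elements). Each child's ID is its index followed by its parent's ID, the same leaf-to-root order as getUniqueNodeId, joined with "/"; the int stored in the map selects the operation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical operation codes stored as the int values of the map.
enum NodeOp : int { kRemove = 1, kClear = 2 };

// Hypothetical stand-in for a parsed node (real code would walk GumboNode).
struct HtmlNode
{
    std::string tag;                // empty for a text node
    std::string text;               // used only by text nodes
    std::vector<HtmlNode> children;
};

// Serializes `node` into `out`. `id` is the node's path-based ID, and `ops`
// (optional) maps IDs to the manipulation to apply while rendering.
void render(const HtmlNode& node, const std::string& id,
            const std::unordered_map<std::string, int>* ops, std::string& out)
{
    int op = 0;
    if (ops != nullptr)
    {
        auto it = ops->find(id);
        if (it != ops->end())
            op = it->second;
    }
    if (op == kRemove)
        return;                     // drop the node and its whole subtree

    if (node.tag.empty())           // text node
    {
        assert(node.children.empty());
        out += node.text;
        return;
    }

    out += "<" + node.tag + ">";
    if (op != kClear)               // kClear keeps the tags, drops the contents
    {
        for (std::size_t i = 0; i < node.children.size(); ++i)
            render(node.children[i], std::to_string(i) + "/" + id, ops, out);
    }
    out += "</" + node.tag + ">";
}
```

Because the map is consulted only while serializing, the parsed tree itself is never touched, which stays within gumbo's ownership contract.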
In my case speed isn't an issue. I'm making some POST requests, analysing the responses and then making subsequent POST requests. I haven't tried it yet, but I assume my simple idea of using the Replace function on my System::Strings should work (that particular Replace function is quite fast), as I probably need to replace a couple of
Look at the code behind the text() methods and similar in gumbo-query. They are just convenience functions that copy data out of the parsed HTML, which resides in, and is owned exclusively by, the GumboOutput. So if you change the text you get back from node.text(), that has absolutely no effect on the document you parsed. gumbo-parser and gumbo-query only give you read-only access for traversing parsed HTML. Maybe I'm not understanding your use case, and the only text you need is what gets copied to you when you call text() on your node. But I want to make it clear: if you expect to take an HTML response, replace the text() of one element, and end up with the whole response including your modification, that simply isn't possible out of the box.
I'm thinking more along the lines of a global string variable. Every time I make a new request that responds with a piece of HTML, I merge it into the global string by replacing the current CNode.text() with the HTML piece, then pass the global string to CDocument to be analysed again before making further requests.
I think this feature can be implemented with CNode::startPos and CNode::endPos: you can replace the data between startPos and endPos with whatever you want.
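A minimal sketch of that idea, assuming startPos/endPos are byte offsets into the original HTML string and that the span between them is what should be replaced. `SpanEdit` and `applyEdits` are hypothetical helpers, not gumbo-query API. When several nodes are edited, applying the edits from the highest offset down keeps the lower offsets valid, since each splice only shifts text after it; the sketch also swaps a pair that arrives in the opposite order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edit record: replace source[start, end) with `replacement`.
// start/end are assumed to come from something like CNode::startPos()/endPos().
struct SpanEdit
{
    std::size_t start;
    std::size_t end;
    std::string replacement;
};

// Applies non-overlapping edits to a copy of `source`, highest offset first,
// so each splice leaves the offsets of the remaining (earlier) edits intact.
std::string applyEdits(std::string source, std::vector<SpanEdit> edits)
{
    for (SpanEdit& e : edits)
        if (e.start > e.end)
            std::swap(e.start, e.end);  // tolerate positions given in reverse

    std::sort(edits.begin(), edits.end(),
              [](const SpanEdit& a, const SpanEdit& b) { return a.start > b.start; });

    for (const SpanEdit& e : edits)
    {
        assert(e.end <= source.size());
        source.replace(e.start, e.end - e.start, e.replacement);
    }
    return source;
}
```

After splicing, the string still has to be re-parsed before any further queries, since the old offsets no longer describe it.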
Related Issues (20)
- CNode::startPos and CNode::endPos are reversed
- Crash when select string include '(' char.
- Brew formula is wrong
- nth-of-type always select nothing
- add more examples
- Getting the OuterHTML
- Getting nodes with a specific class
- nth-child(odd) skips first node
- CObject destructor can throw, but shouldn't
- Brew unable to install and unable to make
- Unable to install Gumbo Query due to it unable to find Gumbo parser shared library
- Problems while encoding russian symbols
- Document.h includes gumbo.h which is missing
- Where is <gumbo.h> located at in the src folder? i have my own cmake workspace and just need the source code
- Trimming strings for advanced datasets
- Error: No available formula with the name "gumbo-query"
- Static library not found
- not support css3 selector
- Conan package
- crashed when syntax error