
Comments (7)

TechnikEmpire avatar TechnikEmpire commented on June 21, 2024

This isn't possible because of the nature of gumbo itself. All of the node data exposed to you is managed internally by gumbo. If you modify it at all, you're likely to hit bugs or even crash your program, because you're tampering with memory owned by another object. Gumbo's contract is very clear: if you want to own something, you need to copy it.

from gumbo-query.

CrimsonVex avatar CrimsonVex commented on June 21, 2024

I'd assume then a suitable option is to perform replacements on the original string given to CDocument and re-parse it? (I suppose that's not so bad)


TechnikEmpire avatar TechnikEmpire commented on June 21, 2024

I made heavy modifications to gumbo-query just to be able to perform the simplest modifications of nodes at a good speed. These modifications included providing a Get() method to expose the underlying GumboNode of CDocument/CNode. I then wrote several helper functions, the most important of which generates a unique node ID string. Like so:

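#include <string>
#include "gumbo.h"

// Builds an ID from the node's index within its parent, followed by the
// index of each ancestor up to the root.
std::string SerializeUtil::getUniqueNodeId(GumboNode* node)
{
    std::string nodeId = std::to_string(node->index_within_parent);

    // The '.' separator keeps index paths such as (1, 12) and (11, 2) distinct;
    // without it both would serialize to "112".
    for (GumboNode* parent = node->parent; parent != nullptr; parent = parent->parent)
    {
        nodeId.append(".");
        nodeId.append(std::to_string(parent->index_within_parent));
    }

    return nodeId;
}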

Using this unique node ID, I could then keep a map of the nodes I wanted to manipulate by storing them in a simple std::unordered_map<std::string, int> object. The int value represents the manipulation you want performed on the node while it is being rendered: remove, modify, and so on. Then I heavily modified https://github.com/google/gumbo-parser/blob/master/examples/serialize.cc to take an optional pointer to such a map, so that while it renders the GumboOutput back to an HTML string, it can apply the modifications (by checking the unique ID of each node against the provided unordered_map as it begins to render that node).
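The render-time lookup described above could be sketched roughly as follows. This is not the modified serialize.cc from the post: SketchNode is a hypothetical stand-in for a parsed node, the NodeAction codes are invented for illustration (the post only says the map value is an int), and render() is a simplified serializer that ignores attributes and void elements.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical action codes; the original post does not name them.
enum NodeAction : int { kRemove = 1, kReplaceText = 2 };

// Hypothetical stand-in for a parsed node. A real implementation would
// walk GumboNode and compute the ID with getUniqueNodeId().
struct SketchNode {
    std::string id;    // unique node ID
    std::string tag;   // empty for text nodes
    std::string text;  // used only by text nodes
    std::vector<SketchNode> children;
};

// Serializes the tree, checking `actions` as each node begins to render.
std::string render(const SketchNode& node,
                   const std::unordered_map<std::string, int>& actions,
                   const std::string& replacement = "")
{
    const auto it = actions.find(node.id);
    if (it != actions.end() && it->second == kRemove) {
        return "";  // skip the node and its whole subtree
    }
    if (node.tag.empty()) {
        const bool replace = it != actions.end() && it->second == kReplaceText;
        return replace ? replacement : node.text;
    }
    std::string out = "<" + node.tag + ">";
    for (const SketchNode& child : node.children) {
        out += render(child, actions, replacement);
    }
    out += "</" + node.tag + ">";
    return out;
}
```

Because the parsed tree is never touched, the only per-node cost is one hash lookup while rendering.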

So yeah, not too bad, but there is a lot involved in doing these modifications. For me, this approach was necessary because I'm modifying HTML in real time as users browse, so speed was of the utmost importance.


CrimsonVex avatar CrimsonVex commented on June 21, 2024

In my case speed isn't an issue. I'm making some POST requests, analysing the response and then making subsequent POST requests. I haven't tried it yet, but I'm assuming my simple idea of using the Replace function on my System::Strings should work (that particular replace function is quite fast), as I probably need to replace a couple of tags, each containing a few thousand or so characters, after each POST. It's not optimal but it might be okay. Thanks for clarifying that for me though.


TechnikEmpire avatar TechnikEmpire commented on June 21, 2024

Look at the code behind the text() methods and such in gumbo-query. They are just convenience functions that copy data from the parsed HTML, which resides exclusively in, and is owned by, GumboOutput. So if you change the text that you get back from node.text(), this will have absolutely no effect on the actual document that you parsed. gumbo-parser and gumbo-query only give you read-only access for traversing parsed HTML. Maybe I'm not understanding your use case; maybe the only text you need is what gets copied to you when you call text() on your node. But I want to make it clear that if you're expecting to get an HTML response, replace the text() of one element and end up with the whole response including your modifications, this simply isn't possible out of the box.


CrimsonVex avatar CrimsonVex commented on June 21, 2024

I'm thinking more along the lines of having a global string variable. Every time I make a new request that responds with pieces of HTML, I merge them into the global string by replacing the current CNode.text() with the HTML piece, and pass this global string to CDocument to be analysed again before making further requests.
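A minimal sketch of that merge step, assuming the global document is held in a plain std::string rather than a System::String. replaceFirst is a hypothetical helper, not part of gumbo-query. One caveat: text() returns a copy assembled from the parsed text nodes, so it may not match the raw markup byte for byte (for example where the source contains entities or nested tags), and the search can then fail.

```cpp
#include <string>

// Replaces the first occurrence of `oldFragment` in `html` with `newFragment`.
// Returns false, leaving `html` untouched, when `oldFragment` is empty or
// does not occur in `html`.
bool replaceFirst(std::string& html,
                  const std::string& oldFragment,
                  const std::string& newFragment)
{
    if (oldFragment.empty()) {
        return false;
    }
    const std::size_t pos = html.find(oldFragment);
    if (pos == std::string::npos) {
        return false;
    }
    html.replace(pos, oldFragment.size(), newFragment);
    return true;
}
```

After a successful replacement the updated string would be handed back to CDocument for a fresh parse, since the previously parsed tree still reflects the old text.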


lazytiger avatar lazytiger commented on June 21, 2024

I think this feature can be implemented with CNode::startPos and CNode::endPos.
You can replace the data from startPos to endPos in the original string with whatever you want.
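The offset-based replacement might look like the sketch below. It assumes that the start and end positions are byte offsets into the original input string and that the end offset is exclusive; both assumptions should be checked against gumbo-query before relying on them. spliceRange is a hypothetical helper name.

```cpp
#include <stdexcept>
#include <string>

// Returns a copy of `html` with the half-open byte range [start, end)
// replaced by `replacement`. Assumes `end` is exclusive.
std::string spliceRange(const std::string& html,
                        std::size_t start,
                        std::size_t end,
                        const std::string& replacement)
{
    if (start > end || end > html.size()) {
        throw std::out_of_range("invalid splice range");
    }
    std::string out;
    out.reserve(html.size() - (end - start) + replacement.size());
    out.append(html, 0, start);                // text before the node
    out.append(replacement);                   // new content
    out.append(html, end, std::string::npos);  // text after the node
    return out;
}
```

When several nodes are replaced in one pass, applying the splices from the highest offset to the lowest keeps the earlier offsets valid. The spliced string then needs to be re-parsed before further queries.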

