
Comments (4)

rajaramsrinivas commented on August 28, 2024

I said the inserts could be harder since the query says "INSERT x RETURNING post id". I'm not sure we use the returned post id anywhere; the RETURNING clause may just be an artifact of how SQLAlchemy works, with the value never actually used. I will go through the code and confirm. If we don't use it, batching the inserts should be straightforward too.
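
For reference, this is roughly what the two shapes look like in SQLAlchemy Core. A sketch only, with a placeholder DSN and a hypothetical `posts` table; on Postgres the ORM adds RETURNING itself to populate primary keys, which is likely where the clause comes from:

```python
# Sketch only: placeholder DSN and a hypothetical "posts" table.
from sqlalchemy import (Column, Integer, MetaData, Table, Text,
                        create_engine, insert)

engine = create_engine("postgresql+psycopg2://user:pass@localhost/gobo")
metadata = MetaData()
posts = Table("posts", metadata,
              Column("id", Integer, primary_key=True),
              Column("content", Text))

with engine.begin() as conn:
    # Emits: INSERT INTO posts (content) VALUES (...) RETURNING posts.id
    post_id = conn.execute(
        insert(posts).returning(posts.c.id), {"content": "hello"}
    ).scalar_one()

    # If nothing ever reads the id back, the RETURNING clause can be
    # dropped and the rows sent as one executemany batch instead:
    conn.execute(insert(posts), [{"content": "a"}, {"content": "b"}])
```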


rajaramsrinivas commented on August 28, 2024

One other solution I can think of is a delayed write to the DB. We are currently doing a db.commit() for every post, and I think that is far too many hits on the DB. We could write the entire batch of posts in one shot, since Postgres tends to perform better with one large operation than with many small ones.
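
A minimal sketch of the difference, assuming a Flask-SQLAlchemy `db` handle and a hypothetical `Post` model (all names are placeholders):

```python
# Hypothetical names throughout (db, Post, fetched_posts).
# Current pattern: one transaction and one commit per post.
for item in fetched_posts:
    db.session.add(Post(content=item["content"]))
    db.session.commit()   # N commits -> N round trips and N fsyncs

# Delayed-write variant: stage the whole batch, commit once.
for item in fetched_posts:
    db.session.add(Post(content=item["content"]))
db.session.commit()       # one transaction for the entire batch
```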


rajaramsrinivas commented on August 28, 2024

I turned on query logging in Postgres to see whether Flask performs any optimizations if we simply skip the db.commit(). It turns out it doesn't.
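
(For anyone reproducing this from the app side rather than via server logs, SQLAlchemy can also echo every statement it emits; the DSN below is a placeholder:)

```python
from sqlalchemy import create_engine

# echo=True logs every INSERT/UPDATE/COMMIT the engine emits to stdout;
# with Flask-SQLAlchemy the equivalent is app.config["SQLALCHEMY_ECHO"] = True.
engine = create_engine("postgresql+psycopg2://user:pass@localhost/gobo",
                       echo=True)
```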

The Pastebin links below contain the INSERT and UPDATE Postgres query logs. We seem to be hitting the DB for every single insert and update individually. That's at least two inserts (one for the post and one for the association) and four updates per post. This, in my opinion, can severely affect performance and limit Gobo's ability to scale.

https://pastebin.com/EAnsWZfa
https://pastebin.com/njvhSS2z

The updates are the easiest to fix. We need to maintain them as dicts, batch them up, and write them to the DB once per batch.
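
Something like this, assuming a hypothetical Post model and filter functions; bulk_update_mappings is a standard SQLAlchemy session method that turns the dicts into one executemany UPDATE:

```python
# Hypothetical names (db, Post, batch, compute_*). Each dict must carry
# the primary key plus the columns to update.
pending = []
for post in batch:
    pending.append({
        "id": post.id,
        "toxicity_score": compute_toxicity(post),
        "is_news": compute_is_news(post),
    })

db.session.bulk_update_mappings(Post, pending)  # one batched UPDATE
db.session.commit()                             # one commit per batch
```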

The inserts might be a little more difficult. We could write to a temp file and use COPY FROM, perhaps. I am trying to identify time gaps in the logs to confirm my hypothesis.
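
A sketch of the COPY route using psycopg2 (which SQLAlchemy already sits on for Postgres); the table, columns, and batch are hypothetical, and an in-memory buffer stands in for the temp file:

```python
import csv
import io

import psycopg2

# Hypothetical fetched_posts batch and posts(content, author) table.
buf = io.StringIO()
writer = csv.writer(buf)
for item in fetched_posts:
    writer.writerow([item["content"], item["author"]])
buf.seek(0)

conn = psycopg2.connect("dbname=gobo")  # placeholder DSN
with conn, conn.cursor() as cur:
    # One COPY statement loads the whole batch instead of N INSERTs.
    cur.copy_expert(
        "COPY posts (content, author) FROM STDIN WITH (FORMAT csv)", buf)
```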


rahulbot commented on August 28, 2024

Agreed: fixing the updates seems easy. I think you're suggesting that each filter called in analyze_post should just return its computed result, and then analyze_post can do one update call.

Why are the inserts harder? All _add_post does is an insert, then it queues up a task for each filter to run on the post. It is only called from for loops over Facebook-sourced and Twitter-sourced content. Could we just change those two call sites to chunk the posts and have _add_post accept a batch of 20 or so to insert more efficiently (sketched below)? More importantly, if we did, would that actually reduce the DB load?
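
To make the question concrete, here is roughly what that chunking might look like; every name is hypothetical:

```python
# Hypothetical names throughout (db, Post, queue_filter_tasks, ...).
def chunks(items, size=20):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def _add_posts_batch(posts):
    # One executemany INSERT and one commit per chunk, instead of an
    # INSERT + commit per post. Note: bulk_save_objects does not
    # populate post.id by default; pass return_defaults=True if the
    # queued filter tasks need the ids.
    db.session.bulk_save_objects(posts)
    db.session.commit()
    for post in posts:
        queue_filter_tasks(post)

for chunk in chunks(facebook_posts):  # and likewise for twitter_posts
    _add_posts_batch([Post(content=p["content"]) for p in chunk])
```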

