
Comments (3)

aaronsteers commented on May 31, 2024

This will be merged shortly:

Also, after some additional testing, I can report back that while the first stream in the 'select all' scenario syncs at only around 2-3 records per second, the issues and pull requests streams are about 10x faster. 😌

from pyairbyte.

aaronsteers commented on May 31, 2024

I've found that performance is much faster if we filter for just the streams we care about. For instance, selecting just issues and pull_requests gives about 10x the performance. Still not fast, but not a bug-level defect at that speed.

For tests and benchmarking, I'm going to start using airbytehq/quickstarts rather than airbytehq/airbyte.

Regarding DX:

Auto-selecting all streams unless the user requests otherwise is probably not scalable as a developer experience, and it sets users up for a frustrating time. Other similar libraries, such as those in LangChain, require users to pick a single stream.

I'm going to suggest we fail if users have not requested any specific streams. The failure message will list what streams are available - so it's easy to remedy the omission. We can also add a "select_all_streams()" method so that if that's what the user wants, they can still quickly achieve it.

In the GitHub example, the recommended added step would be:

# Create the source as before
source_github = get_source(...)

# Add this step to pick the streams we want:
source_github.set_streams(["issues", "pull_requests"])

# Now we sync as usual
read_result = ...
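
To make the proposed behavior concrete, here is a minimal sketch of the "fail unless streams are selected" idea. It is not the PyAirbyte implementation: the SketchSource class, the NoStreamsSelectedError exception, and the read() return value are hypothetical stand-ins made up for illustration. Only the method names set_streams and select_all_streams come from the discussion above.

```python
class NoStreamsSelectedError(Exception):
    """Raised when a read is attempted before any streams are selected."""


class SketchSource:
    """Hypothetical stand-in for a source (assumption, not the real class)."""

    def __init__(self, available_streams):
        self._available = list(available_streams)
        self._selected = []

    def set_streams(self, streams):
        # Reject names the source does not offer, and say what is available.
        unknown = [s for s in streams if s not in self._available]
        if unknown:
            raise ValueError(
                f"Unknown streams: {unknown}. "
                f"Available streams: {self._available}"
            )
        self._selected = list(streams)

    def select_all_streams(self):
        # Explicit opt-in for users who really do want everything.
        self._selected = list(self._available)

    def read(self):
        # Fail fast, and list the available streams so the fix is obvious.
        if not self._selected:
            raise NoStreamsSelectedError(
                "No streams selected. Call set_streams([...]) or "
                "select_all_streams(). "
                f"Available streams: {self._available}"
            )
        # A real read would sync records; the sketch just reports the selection.
        return list(self._selected)
```

With this shape, forgetting the selection step produces an error that names the available streams, while syncing everything remains a one-line explicit choice.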


aaronsteers commented on May 31, 2024

Confirmed today that our performance is back in an acceptable range when using the default DuckDB cache strategy. There are still some slow streams, but this is mitigated by now requiring users to call either select_streams() or select_all_streams() before reading.

Closing as resolved.

