cksac / dataloader-rs
Rust implementation of Facebook's DataLoader using async-await.
License: Apache License 2.0
This issue is to request a new release (0.7) with the dependency updated to tokio 0.2 and the matching futures version (or std futures).
Not exactly sure what is wrong here, probably something with futures having changed.
/U/d/De/dataloader-rs [master@0248d52e50a2d]
$ rustup override set nightly
info: using existing install for 'nightly-x86_64-apple-darwin'
info: override toolchain for '/Users/davidpdrsn/Desktop/dataloader-rs' set to 'nightly-x86_64-apple-darwin'
nightly-x86_64-apple-darwin unchanged - rustc 1.40.0-nightly (e413dc36a 2019-10-14)
/U/d/De/dataloader-rs [master@0248d52e50a2d]
$ cargo build
Compiling futures-timer v0.2.1
error[E0277]: the trait bound `ext::TimeoutStream<S>: futures_core::stream::Stream` is not satisfied
--> /Users/davidpdrsn/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-timer-0.2.1/src/ext.rs:170:9
|
170 | impl<S> TryStream for TimeoutStream<S>
| ^^^^^^^^^ the trait `futures_core::stream::Stream` is not implemented for `ext::TimeoutStream<S>`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0277`.
error: could not compile `futures-timer`.
To learn more, run the command again with --verbose.
Hello,
I'm wondering why the cached loader has to handle errors; I don't see any particular treatment of the results in this crate. Wouldn't it be simpler if the loader could handle whatever value users want it to handle?
Loader could be declared as follows:
pub struct Loader<K, V, F, C = HashMap<K, V>>
where
K: Eq + Hash + Clone,
V: Clone,
F: BatchFn<K, V>,
C: Cache<Key = K, Val = V>,
This would still allow anyone to put a (cloneable) Result as the output value. In particular, I'd like to use the loader to return a Vec<Result<_, _>> for each key given to the BatchFn. What do you think about this design change?
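A minimal, std-only sketch of the proposed design (the BatchFn trait below is a stand-in, not the crate's actual definition): the loader side is generic over any cloneable V, so callers can pick V = Result<…, …> themselves and keep error handling entirely in user code.

```rust
use std::collections::HashMap;

// Stand-in for the crate's BatchFn trait: V is any value the user chooses,
// including a Result, so the loader itself stays error-agnostic.
trait BatchFn<K, V> {
    fn load(&self, keys: &[K]) -> HashMap<K, V>;
}

struct UserBatcher;

// Here V = Result<String, String>: the batch function decides per key
// whether to report a value or an error.
impl BatchFn<i32, Result<String, String>> for UserBatcher {
    fn load(&self, keys: &[i32]) -> HashMap<i32, Result<String, String>> {
        keys.iter()
            .map(|&k| {
                let v = if k > 0 {
                    Ok(format!("user-{}", k))
                } else {
                    Err("invalid id".into())
                };
                (k, v)
            })
            .collect()
    }
}

fn main() {
    let batcher = UserBatcher;
    let results = batcher.load(&[1, -1]);
    assert_eq!(results[&1], Ok("user-1".to_string()));
    assert!(results[&-1].is_err());
    println!("ok");
}
```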
Hi there! Thank you for this crate!
I'd like to use it extensively, but is it still maintained? The latest change was made on Aug 4, 2018, and master depends on tokio-core, which was deprecated long ago.
How about upgrading to the latest tokio and refactoring with the std Futures coming in the upcoming Rust 1.36 release? Would you mind if I elaborated on it?
I have returned to my abandoned GraphQL project :) I'm trying to solve the N+1 problem using dataloader-rs along with juniper and actix-web. I have two simple entities: Article and Author. One article has many authors. Below I will show the most important parts of my program.
Actix handler:
use crate::graphql::context::Context;
use crate::graphql::schema::{create_schema, Schema};
use actix_web::{web, Error, HttpResponse};
use juniper::http::GraphQLRequest;
use std::sync::Arc;
async fn graphql(
ctx: web::Data<Context>,
schema: web::Data<Arc<Schema>>,
req: web::Json<GraphQLRequest>,
) -> Result<HttpResponse, Error> {
let response = req.execute(&schema, &ctx).await;
let json = serde_json::to_string(&response)?;
Ok(HttpResponse::Ok()
.content_type("application/json")
.body(json))
}
Authors dataloader:
use crate::environment::db::Database;
use crate::models::author::Author;
use dataloader::BatchFn;
use std::collections::HashMap;
use tokio_postgres::Row;
pub struct AuthorsLoader {
pub db: Database,
}
#[async_trait::async_trait]
impl BatchFn<i32, Author> for AuthorsLoader {
async fn load(&self, keys: &[i32]) -> HashMap<i32, Author> {
let client = self.db.get_client().await.unwrap();
let map: HashMap<i32, Author> = client
.query(
"SELECT * FROM articles.authors WHERE id = ANY($1)",
&[&keys.to_vec()],
)
.await
.unwrap()
.into_iter()
.map(|r: Row| (r.get("id"), Author::from(r)))
.collect();
map
}
}
Article type resolver:
use crate::{graphql::context::Context, models::{author::Author, article::Article}};
use itertools::Itertools;
#[juniper::graphql_object(Context = Context)]
impl Article {
fn id(&self) -> &i32 {
&self.id
}
fn title(&self) -> &str {
&self.title
}
async fn authors(&self, ctx: &Context) -> Vec<Author> {
let a = ctx.authors_loader
.load_many(self.author_ids.clone())
.await
.values()
.cloned()
.collect_vec();
a
}
}
Deps:
[dependencies]
log = "0.4"
chrono = "0.4"
actix-web = "2.0"
actix-rt = "1.0"
env_logger = "0.7"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
juniper = { git = "https://github.com/graphql-rust/juniper" }
dotenv = "0.15"
tokio = "0.2"
tokio-postgres = "0.5"
dataloader = "0.12"
futures = "0.3"
async-trait = "0.1"
itertools = "0.9"
The panic occurs when the server tries to load the authors through the load_many method:
thread 'actix-rt:worker:0' panicked at 'found key 0 in load result', <::std::macros::panic macros>:5:6
What am I doing wrong?
0.5.1 from crates.io or 0.6.0-dev? When will the new release land on crates.io?
I want to load only the requested fields from the database. For example, take this GraphQL type:
type Person {
id
name
birthday
}
With the following request:
person {
id
name
}
Rather than doing SELECT * FROM people, I want to only get the id and name fields; not birthday.
It seems that in the official Node-based module they recommend using separate cache and load keys. This means BatchFn might use (PersonID, Vec<String>) as its ID, but Loader would use PersonID. Multiple load keys map to one cache key, so it'd be up to the BatchFn implementation to dedup the person/fields combinations.
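A std-only sketch of that dedup step (all names here are hypothetical, not part of this crate): load keys of the form (PersonID, Vec<String>) are merged per person, so one cache entry per PersonID can back several field-specific requests.

```rust
use std::collections::{BTreeSet, HashMap};

type PersonId = i32;

// Merge (PersonID, fields) load keys into one field set per person,
// so the batch function can issue a single query per cache key.
fn dedup_load_keys(keys: &[(PersonId, Vec<String>)]) -> HashMap<PersonId, BTreeSet<String>> {
    let mut merged: HashMap<PersonId, BTreeSet<String>> = HashMap::new();
    for (id, fields) in keys {
        merged.entry(*id).or_default().extend(fields.iter().cloned());
    }
    merged
}

fn main() {
    let keys = vec![
        (1, vec!["id".to_string(), "name".to_string()]),
        (1, vec!["name".to_string(), "birthday".to_string()]),
        (2, vec!["id".to_string()]),
    ];
    let merged = dedup_load_keys(&keys);
    assert_eq!(merged.len(), 2); // two cache keys for three load keys
    assert_eq!(merged[&1].len(), 3); // id, name, birthday merged for person 1
    println!("ok");
}
```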
I am using async_graphql and dataloader and it works fine. But when I want to limit the output using the last argument, I do not know how to do this with the dataloader.
authors {
name
books(last: 30)
}
Getting books is called like this in the graphql object (without the dataloader):
pub async fn books(
&self,
ctx: &Context<'_>,
last: i32,
) -> Result<Vec<Books>, AppError> {
ctx.data_unchecked::<BooksRepository>()
.get_for_id(self.id, last)
.await
}
When using dataloader I do not know how to make it work.
pub async fn books(
&self,
ctx: &Context<'_>,
last: i32,
) -> Result<Vec<Books>, AppError> {
let loader = ctx.data_unchecked::<BooksLoader>();
loader.load(self.id).await
}
Maybe I just overlooked something or am simply not having my day, but I do not know how to solve it. Thanks for any suggestions.
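One common workaround (a sketch under assumptions, not this crate's API: BooksKey and load_books are invented names) is to make the argument part of the loader key, so batching and caching group by the (author_id, last) pair instead of the id alone.

```rust
use std::collections::HashMap;

// Hypothetical composite key: the `last` argument travels with the id,
// so every distinct (id, last) pair is its own batch/cache entry.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BooksKey {
    author_id: i32,
    last: i32,
}

// Stand-in batch function; the body is a placeholder for something like
// `repository.get_for_id(key.author_id, key.last)`.
fn load_books(keys: &[BooksKey]) -> HashMap<BooksKey, Vec<String>> {
    keys.iter()
        .map(|&k| {
            let books: Vec<String> = (0..k.last)
                .map(|i| format!("book-{}-{}", k.author_id, i))
                .collect();
            (k, books)
        })
        .collect()
}

fn main() {
    let key = BooksKey { author_id: 7, last: 3 };
    let result = load_books(&[key]);
    assert_eq!(result[&key].len(), 3); // at most `last` books per author
    println!("ok");
}
```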
I've noticed that sometimes batching works and sometimes it doesn't... I'm using a very simple schema at the moment; sending the same request multiple times can give different results in how the loader is executed:
query foo {
orders {
id
user { id }
}
}
I'm using a loader on the orders.user field:
#[juniper::graphql_object(Context = Context)]
impl Order {
fn id(&self) -> juniper::ID {
self.id.to_string().into()
}
async fn user(&self, ctx: &Context) -> types::User {
ctx.loaders.user.load(self.user.clone()).await.unwrap()
}
}
and the UserLoader is basically a copy-paste of the example in the juniper docs.
The db contains 2 orders, both having the same user field. Here are some logs of the same request sent multiple times:
DEBUG hyper::proto::h1::io > read 643 bytes
DEBUG hyper::proto::h1::io > parsed 14 headers
DEBUG hyper::proto::h1::conn > incoming body is content-length (109 bytes)
DEBUG hyper::proto::h1::conn > incoming body completed
DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
INFO GraphQL > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 9.229761ms
DEBUG hyper::proto::h1::io > flushed 372 bytes
DEBUG hyper::proto::h1::io > read 643 bytes
DEBUG hyper::proto::h1::io > parsed 14 headers
DEBUG hyper::proto::h1::conn > incoming body is content-length (109 bytes)
DEBUG hyper::proto::h1::conn > incoming body completed
DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c)]
DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c)]
INFO GraphQL > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 12.157163ms
DEBUG hyper::proto::h1::io > flushed 372 bytes
DEBUG hyper::proto::h1::io > read 643 bytes
DEBUG hyper::proto::h1::io > parsed 14 headers
DEBUG hyper::proto::h1::conn > incoming body is content-length (109 bytes)
DEBUG hyper::proto::h1::conn > incoming body completed
DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
INFO GraphQL > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 12.120952ms
DEBUG hyper::proto::h1::io > flushed 372 bytes
DEBUG hyper::proto::h1::io > read 643 bytes
DEBUG hyper::proto::h1::io > parsed 14 headers
DEBUG hyper::proto::h1::conn > incoming body is content-length (109 bytes)
DEBUG hyper::proto::h1::conn > incoming body completed
DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
INFO GraphQL > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 10.009887ms
DEBUG hyper::proto::h1::io > flushed 372 bytes
We can see that sometimes load batch is called with 2 ids, and sometimes called twice with the same id.
More info: I tried both the threaded_scheduler and the basic_scheduler, with the same result. I can provide more code if necessary, or maybe even a simple repo to reproduce the issue.
This test fails by panicking when I add requesters like below:
let load_fn = LoadFnForEmptyTest;
let loader = Loader::new(load_fn.clone()).with_max_batch_size(4);
let l1 = loader.clone();
let h1 = thread::spawn(move || {
let r1 = l1.try_load(1337);
let r2 = l1.try_load(1338);
let (f1, f2) = block_on(futures::future::join(r1, r2));
assert!(f1.is_err());
assert!(f2.is_err());
});
let _ = h1.join().unwrap();
This is because this line expects the value to be present in state.complete, while early-returning here may cause values to be discarded (this is why the test passes when there is only one requester). A possible fix would be to move error handling after batch completion, along with a failed state which would hold <requestId, key> (without introducing this state, we would have no way to attach the key to the error message, as we do now).
That's pretty much what the reference implementation does: https://github.com/graphql/dataloader/tree/90353d8d34063f92c7c6300d66d0e9ce0a8d51c4#batch-function.
Indexing into a HashMap is notoriously slower than indexing into a Vec. On top of that, Rust, by default, uses a computationally expensive hashing algorithm.
But yeah, that'd be a breaking change for end users.
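To illustrate the hashing-cost point: std's HashMap defaults to SipHash (DoS-resistant but comparatively slow), and a cheaper hasher can be swapped in via BuildHasherDefault. The FNV-1a hasher below is a self-contained sketch; crates like fnv or fxhash provide production-ready versions.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Minimal FNV-1a hasher: much cheaper than the default SipHash,
// but with no protection against hash-flooding attacks.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

fn main() {
    // Same HashMap API, different (faster) hashing algorithm.
    let mut map: HashMap<i32, &str, BuildHasherDefault<Fnv1a>> = HashMap::default();
    map.insert(1, "one");
    assert_eq!(map.get(&1), Some(&"one"));
    println!("ok");
}
```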
Before Rust 1.39, I successfully used this library. But with the advent of async/await, I was completely confused. Below is an example of my broken code:
#[juniper::object(Context = Context)]
impl Article {
fn id(&self) -> i32 {
self.id
}
fn title(&self) -> &str {
self.title.as_str()
}
async fn authors (&self, context: &Context) -> FieldResult<Vec<Author>> {
let authors = context.authors_loader().load_many(self.author_ids.clone());
Ok(authors.await)
}
}
I'm trying to load the authors of the article with the dataloader, but I got confused about the return types and how to get the authors from the dataloader. The compiler gives me an error:
error[E0728]: `await` is only allowed inside `async` functions and blocks
--> src/schema.rs:52:12
|
40 | #[juniper::object(Context = Context)]
| ------------------------------------- this is not `async`
...
52 | Ok(authors.await)
| ^^^^^^^^^^^^^ only allowed inside `async` functions and blocks
error[E0308]: mismatched types
--> src/schema.rs:52:12
|
52 | Ok(authors.await)
| ^^^^^^^^^^^^^ expected struct `std::vec::Vec`, found enum `std::result::Result`
|
= note: expected type `std::vec::Vec<_>`
found type `std::result::Result<std::vec::Vec<_>, dataloader::LoadError<()>>`
error: aborting due to 2 previous errors
cargo check --examples
error[E0433]: failed to resolve: could not find `document_join_macro` in `futures_util`
--> /Users/takahashiatsuki/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.3.1/src/lib.rs:504:15
|
504 | futures_util::document_join_macro! {
| ^^^^^^^^^^^^^^^^^^^ could not find `document_join_macro` in `futures_util`
error[E0433]: failed to resolve: could not find `document_select_macro` in `futures_util`
--> /Users/takahashiatsuki/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.3.1/src/lib.rs:528:15
|
528 | futures_util::document_select_macro! {
| ^^^^^^^^^^^^^^^^^^^^^ could not find `document_select_macro` in `futures_util`
error: aborting due to 2 previous errors
It seems to be related to graphql-rust/juniper#659. So I changed the juniper branch from async-await to master; the above error disappeared, but another bunch of compile errors arose. I couldn't solve these errors because I'm not familiar with juniper.
I hope that someone will fix this. Thanks.
Would it be possible to include an LRU-based cache implementation in order to better manage memory consumption? Something like https://github.com/maidsafe/lru_time_cache would be very useful.
On a side note, is there any particular reason the yield_count property is hard-coded to 10? Thanks!
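For reference, a minimal std-only LRU cache sketch (the LruCache name and methods below are hypothetical; a real integration would implement this crate's cache interface instead): recently used keys move to the back of the order queue, and the front entry is evicted once capacity is exceeded.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

// Tiny LRU cache: front of `order` = least recently used.
// O(n) touch keeps the sketch short; real crates use a linked list.
struct LruCache<K: Hash + Eq + Clone, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    fn insert(&mut self, key: K, value: V) {
        // Evict the least recently used entry when a *new* key overflows capacity.
        if self.map.insert(key.clone(), value).is_none() && self.map.len() > self.capacity {
            if let Some(lru) = self.order.pop_front() {
                self.map.remove(&lru);
            }
        }
        self.touch(&key);
    }

    fn touch(&mut self, key: &K) {
        self.order.retain(|k| k != key);
        self.order.push_back(key.clone());
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.insert(1, "a");
    cache.insert(2, "b");
    cache.get(&1); // 1 is now most recently used
    cache.insert(3, "c"); // evicts 2, the least recently used
    assert!(cache.get(&2).is_none());
    assert!(cache.get(&1).is_some());
    assert!(cache.get(&3).is_some());
    println!("ok");
}
```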
E.g. when a search result returns not just the keys but the data itself, it'd be good to populate the cache from the response. That way, any new request for already-known values can be omitted.
As I see it, prime can be used to add values to the cache, but it performs a lock on each addition. A prime_many could accept an iterator of (K, V) pairs and hold the lock for the entire update.
Or is there any other option to fill the loader with known values?
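A sketch of the proposed prime_many (everything here is hypothetical stand-in code, not the crate's actual internals): with the cache behind a Mutex, the bulk variant locks once for the whole batch instead of once per pair.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in for the loader's shared cache.
struct PrimedCache {
    inner: Mutex<HashMap<i32, String>>,
}

impl PrimedCache {
    // Existing style: one lock acquisition per (key, value) pair.
    fn prime(&self, key: i32, val: String) {
        self.inner.lock().unwrap().insert(key, val);
    }

    // Proposed bulk variant: a single lock held for the entire update.
    fn prime_many<I: IntoIterator<Item = (i32, String)>>(&self, pairs: I) {
        let mut cache = self.inner.lock().unwrap();
        for (key, val) in pairs {
            cache.insert(key, val);
        }
    }
}

fn main() {
    let cache = PrimedCache { inner: Mutex::new(HashMap::new()) };
    cache.prime(1, "one".into());
    cache.prime_many(vec![(2, "two".into()), (3, "three".into())]);
    assert_eq!(cache.inner.lock().unwrap().len(), 3);
    println!("ok");
}
```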
Shouldn't .load() return Result<HashMap<_, _>> rather than just the hashmap itself? This forces me to panic on a failed db request...
Would you mind publishing the latest version with #33 to crates.io?
Hello! Thanks for this crate!
I'm evaluating a stack for a web app (including postgres, actix, and juniper) and I'm concerned about potentially very bad performance on the backend <-> database connection. It seems like more than caching, I'd need the ability to specialize specific types of queries to be able to optimize bottlenecks (i.e. the most common / expensive queries should be done in a single SQL query).
It seems dataloaders are a possible fix for this but I don't have a ton of context on GraphQL or dataloaders so I'm still trying to figure out what exactly they do and if they'll solve the problem I have or if I need something else or have to drop GraphQL.
It's a bit difficult for me without documentation of this crate. I've looked through the code and it seems well written and pretty small overall, so I think I'd be able to add docs myself.
Would you be interested in accepting pull requests to fully document this crate (for docs.rs)?
Additionally, would you be interested in enabling a lint in CI (or directly in the Rust code) to deny missing docs?
If I have to understand everything myself anyways, I might as well document everything so that it'll be easier for future people to use this. But before I commit to spending that time, I want to check with you to see if you'd be willing to accept PRs for documentation.