
Comments (8)

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

David Webb commented

Results of the investigation are as follows.

Pagination support in Spring Data requires interfaces that expose attributes such as "Total Pages" and "Total Records"; the total record count is divided by the page size to get the total number of pages.
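To make that dependency concrete, here is a minimal sketch of the arithmetic. `PageMath` and `totalPages` are illustrative names, not part of the Spring Data API; the point is only that the page count cannot be computed without first knowing the total record count.

```java
final class PageMath {

    // Total pages = ceil(totalElements / pageSize), using integer arithmetic.
    // The totalElements argument is the value a full count query would have to supply.
    static long totalPages(long totalElements, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalElements < 0) {
            throw new IllegalArgumentException("totalElements must not be negative");
        }
        return (totalElements + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        // 101 records at 10 per page need 11 pages.
        System.out.println(totalPages(101, 10));
    }
}
```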

Getting the total record count means running the query below. While it is a valid CQL query in C*, it will certainly bring down an entire C* cluster. The more nodes and the larger the dataset in the CF, the harder it will fail.

select count(*) from column_family;

While the CQL engineers are okay with this, the SDC team is not okay with adding a feature that will knowingly negatively impact HA Production systems.

I will leave this open for discussion by the voters. I am open to a good solution, but after thinking about this, and working with large C* clusters for the last 2 years, I do not see a solution that will work.

I believe that in the Big Data realm, the users of this technology must accept that there is a trade-off from the RDBMS realm, and some functions (Pagination) are not reasonable or possible.

Feedback welcome.... :)

from spring-data-cassandra.

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Jamal Fanaian commented

Hi there! I'm new to this project, but I have been thinking about how pagination could be efficiently implemented in SDC and have some thoughts to share.

I agree that the standard SD Page/Slice interfaces will not work with Cassandra. They assume offset-based pagination, which Cassandra does not support. I think you hinted at this in another ticket related to pagination, but exposing a continuation/range-based interface would solve this problem. Instead of an offset, this interface would expose a continuation value and a page size. We could then use the continuation value in a range query to fetch the results that come after it. In CQL, the query could look something like this:

SELECT ... FROM table WHERE token(pk) > token(continuationValue) LIMIT pageSize;

In addition, such an interface could be used with other SD adapters as well. Even when working with an RDBMS, offset-based pagination is not efficient for large data sets. So, an implementation in SD would make more sense.
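The idea behind the range query above can be sketched in plain Java. This is an in-memory stand-in, not the proof-of-concept code: a sorted map plays the role of the table, its `Long` key plays the role of `token(pk)`, and `ContinuationPaging`, `Chunk`, and `fetch` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ContinuationPaging {

    // One chunk of results plus the key to resume from (null if the chunk is empty).
    static final class Chunk {
        final List<String> values;
        final Long continuation;

        Chunk(List<String> values, Long continuation) {
            this.values = values;
            this.continuation = continuation;
        }
    }

    // Returns up to pageSize values whose key is strictly greater than the
    // continuation, mirroring "WHERE token(pk) > token(continuationValue) LIMIT pageSize".
    // A null continuation means "start from the beginning".
    static Chunk fetch(NavigableMap<Long, String> table, Long continuation, int pageSize) {
        NavigableMap<Long, String> rest =
                continuation == null ? table : table.tailMap(continuation, false);
        List<String> values = new ArrayList<>();
        Long last = null;
        for (Map.Entry<Long, String> entry : rest.entrySet()) {
            if (values.size() == pageSize) {
                break;
            }
            values.add(entry.getValue());
            last = entry.getKey();
        }
        return new Chunk(values, last);
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> table = new TreeMap<>();
        for (long key = 1; key <= 5; key++) {
            table.put(key, "row-" + key);
        }
        // Walk the whole table two rows at a time, carrying the continuation forward.
        Long continuation = null;
        Chunk chunk;
        do {
            chunk = fetch(table, continuation, 2);
            System.out.println(chunk.values);
            continuation = chunk.continuation;
        } while (!chunk.values.isEmpty());
    }
}
```

No offset appears anywhere: each call only needs the last key seen, so the cost of a page does not grow with how deep into the data set the caller is.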

I took on the task of implementing a proof-of-concept that I could share here (forked in GH, links below). The implementation is not complete or clean, and assumes a couple of things that I'd like some feedback on. But, it works!

A couple of assumptions I have made:

  • Currently, it assumes that you're always querying the continuation against the ID. This makes sense for SDC, but in an RDBMS you could query with an indexed column. Is it necessary to support this?
  • It currently extends Slice, which also exposes offset (currently returns 0). Should this be its own type?
  • The name Continuation made sense to me, although it is quite verbose. I'm definitely open to suggestions on how this should be named.
  • The response object has not been defined. This would require determining how to serialize the continuation value in a way that makes sense. In previous projects, I exposed links with next/previous cursor of a base64url encoded value. Would such an implementation make sense here?
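As a rough illustration of the last point, a base64url cursor can be produced with the JDK alone. This is a sketch under the assumption that the continuation value is already available as a string; `CursorCodec` and its methods are hypothetical names, not part of the proof-of-concept.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CursorCodec {

    // Encodes a continuation value as an unpadded base64url string,
    // which is safe to place in a URL query parameter.
    static String encode(String continuationValue) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(continuationValue.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(); throws IllegalArgumentException on malformed input.
    static String decode(String cursor) {
        return new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cursor = encode("user:42");
        System.out.println(cursor + " -> " + decode(cursor));
    }
}
```

Encoding keeps the cursor opaque to clients, but it is not tamper-proof; a real implementation would likely need to validate the decoded value before using it in a query.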

I have made changes to both spring-data-commons and spring-data-cassandra to add support for this:

And I have a working example project that shows the usage as well:

One caveat is that the existing PagingAndSorting repository would not support this. Currently, I'm only using it by defining a custom method in my Repository, but it would make sense to implement a ContinuationRepository.

If this implementation makes sense, I would love to finish this up and submit a PR so any feedback on that would be appreciated.

Thanks!

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Mark Paluch commented

Hi Jamal Fanaian,

That's awesome. I see two options for paging: stateful and stateless. Stateful paging is the approach you took. Users would reuse the cursor returned by the Cassandra query (Continuable in your example) to obtain the next page.
Stateful paging works only if we know the last value that was retrieved from the SELECT, so it's not feasible for streaming queries.

Stateless paging is the other approach, but it would be more expensive than stateful paging. Stateless paging could work with the existing Pageable objects and perform the skip of records by just iterating over the returned elements without converting those rows into entities.
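A minimal sketch of that skip-by-iterating idea, in plain Java rather than against the Cassandra driver: the `Iterator<R>` stands in for the rows returned by the query, and `StatelessPaging.page` and its parameters are hypothetical names. Only the rows that belong to the requested page reach the converter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class StatelessPaging {

    // Skips `offset` raw rows, then converts at most `pageSize` rows into entities.
    static <R, E> List<E> page(Iterator<R> rows, long offset, int pageSize,
                               Function<R, E> converter) {
        // Skip phase: rows are consumed but never converted.
        for (long skipped = 0; skipped < offset && rows.hasNext(); skipped++) {
            rows.next();
        }
        // Page phase: only these rows pay the conversion cost.
        List<E> page = new ArrayList<>();
        while (page.size() < pageSize && rows.hasNext()) {
            page.add(converter.apply(rows.next()));
        }
        return page;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9).iterator();
        // Offset 4 with page size 3 corresponds to rows 4, 5 and 6.
        System.out.println(page(rows, 4, 3, row -> "entity-" + row));
    }
}
```

The skipped rows still have to be read from the result set, which is why this approach gets more expensive the further into the data a page lies.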

I'd leave the decision up to the user with a clear statement of how it works and what to expect from the API. We use Pageable inside of Spring Data REST so it would be a nice feature to expose Spring Data Cassandra Repositories with paging support.

Does this make sense?

Any thoughts Oliver Drotbohm, John Blum?

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Jamal Fanaian commented

Hi Mark Paluch,

Thank you for the feedback! I can see the appeal of supporting Pageable through stateless pagination. It is definitely convenient if you can control the size of your result set. But I'm afraid a lot of users may adopt it naively, without understanding the costs, since it's the standard for many of the other Spring Data adapters. If the consensus is to support this approach, then I'd be happy to implement it with my current change set. I do have one question about this, though. If using Pageable, are you also expecting to return Page? What is your plan for Page#getTotalElements() and Page#getTotalPages()? Or should SDC only support returning Slice?

In regards to stateful pagination, my plan is to add Continuable support to Spring Data REST. In my previous comment I mentioned that I would want to return a serialized and encoded version of the continuation value. The idea was that these would be generated when building the response and provided under the _links key, similarly to how Pageable is handled. My current thought was to do something such as:

http://api.example.com/v1/path?next=foo
http://api.example.com/v1/path?previous=bar

And, when those values were present, provide a ContinuationRequest that can be used in a request endpoint. I wanted to provide an example of where I was headed to get some feedback before I continued further with this implementation.

Thanks for the feedback so far! I'm looking forward to hearing more, and hope that this can lead to something that could eventually be used :)

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

John Blum commented

PR #114 reviewed, polished, and merged to master for the Spring Data Kay GA release.

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Łukasz Gosiewski commented

Hi all.

Has anyone thought about a convenient way of combining this solution with serialization/deserialization of the returned Pageable? This is needed to expose it as a REST method for fetching. The native PagingState can be easily serialized with its .toString() or .toBytes() methods, but it looks like CassandraPageRequest contains some additional pieces of information and can't be serialized as easily.

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Mark Paluch commented

Łukasz Gosiewski, care to file a new ticket in Spring Data REST?

spring-projects-issues avatar spring-projects-issues commented on May 5, 2024

Mark Paluch commented

Update to my previous comment: care to file a new ticket in Spring Data Cassandra instead, as we need to provide a serialization mechanism in the first place?
