Comments (13)

paulcwarren commented on May 23, 2024

If the high-throughput requirement is on the content metadata, not on the content itself, then it might be entirely possible to set this up already, as it is all essentially just Spring Data. It doesn't sound like there would be any Spring Content involved during these "batch operations". I would be very interested in an experiment to prove this out because it sounds like this could be a key differentiator. I am going to add something to our backlog to experiment with this and we'll try to get to it soon-ish. Thanks for clarifying that.

I will move ahead with the existing implementation then and we can, of course, circle back to more sophisticated "content storage" strategies in support of high-throughput use cases if and when those surface and become a requirement for someone.

Sounds reasonable?

from spring-content.

kdavisk6 commented on May 23, 2024

Absolutely. If it would help, I can provide some use cases based on my experience that we can consider during the experiment.

paulcwarren commented on May 23, 2024

Hi @snoop244. Thanks for the compliments. Very much appreciated. So you are right. Versioning is a missing feature right now. It would be possible to implement your own, but your issue is extremely well timed as we were toying with priorities and trying to decide between Versioning and Retention. We like to be customer-driven so you have tipped the scales in favor of Versioning.

To that end, we would be happy to collaborate on the design and/or implementation of this feature. That way you potentially don't have to maintain a customization and others would benefit/help to improve it.

Let me know. If you would like to at least review the design, I can post it on this issue.

To answer your specific questions.

Also, it looks like the id is generated by Spring Content, so I assume we couldn't use that for versioning (1.0, 1.1...,). Let me know if I have that wrong.

Yes, the ID is generated by Spring Content but it is possible to set it ahead of time and SC will honor the given value. So it would be possible to generate a UUID and suffix a version label. That said, (and I need to think about it some more) my inclination right now would be to take a Documentum-like approach where the version was a separate field from the object ID. Most likely annotated with @Version.

turn off the searchability/indexing for non-current versions

Yes, either that or additional filtering to the fulltext resultset to filter out non-current versions. Either way, the fulltext feature needs to be changed.

Obviously the store would get bloated over time and that would need to be addressed by some other process.

It would, although Documentum never bothered too much about this. But a separate task could be used to purge non-current versions according to some criteria.

Am I missing anything major here?

I don't think so. The big decision is whether to version the metadata and the content, or just the content. Since I consider the metadata an extension of the content, my preference is the former, but this does mean that only AssociativeStore/ContentStore can support a new Versionable interface. Store wouldn't be able to (unlike the other two, it isn't associated with an Entity) but that probably makes sense anyway.

Anyways, let me know if you want to collaborate and I'll write up something more concrete. Either way we'll pick Versioning up as the next set of stories we work on.

snoop244 commented on May 23, 2024

Happy to collaborate and help where I can. I'm not much of a developer but I can certainly help with feature definition and prioritization. I do work with some very good developers, mind you, so if we can contribute on that level we'd be happy to do it.

In the U.S. I gather retention is a big deal because of Sarbanes-Oxley. Not so much up in Canada - though it is becoming more important. I definitely vote for versioning.

Let me know how I can help. I've got deep experience with Alfresco, CMIS (Jackrabbit), Sharepoint, HP/Opentext Extreme - all of which were heavy and poor fits in the end. They all tried to do too much imho. I favour your approach that keeps it simple and leverages established, mature backends - particularly cloud-native backends.

On an unrelated note:
I noticed a roadmap item for "CMIS" in one of your docs. Brings back memories. We went deep on CMIS in one of our applications and the payback was not there. We did like that it brought in the renditions concept, which was powerful, but you already have that in your framework.

Again, let me know how I/we can help.

paulcwarren commented on May 23, 2024

Hi Stephen,

Sorry, it has been a few weeks. Although a little slow, we have been making progress on this. What we've done so far is available for you to have a look at on wip branch issue-43 in the main spring-content repo and in spring-content-examples, our integration/examples repo. I've had an initial stab at a Java API. I thought we should iterate on this before starting to expose it at the REST API layer.

As a proof of concept, I have implemented the Java API in the SC Filesystem module and the implementation is specific to the ContentStore store interface atm. There are 2 locking modes; optimistic and pessimistic. Versioning is then built on top of the pessimistic locking mode.

Optimistic Locking

Spring Data has always had an optimistic locking scheme based on an @Version annotation. This is intended to allow clients to update entities while preventing accidental overwrites, and is usually used in web and web service applications. If you are familiar with Documentum, it's the equivalent of dm_sysobject's i_vstamp. I presume Alfresco has a similar concept. Therefore, as a first step, it seemed natural to us to have Spring Content support this too.

The way this works is that, given a JPA entity with an @Version annotation, when a content operation is attempted on that entity, the content store will first attempt to take out a transactional database lock on the entity. The lock can only be obtained if the given entity is the latest (i.e. has an @Version value that matches the one in the DB), proving that the client did, in fact, update the latest entity and/or content. At the end of the transaction, modifying content operations will also increment the entity's @Version, thus forcing other clients to refetch the entity and content and therefore ensuring they work with the latest.

You can see this behavior in this block of tests.
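To illustrate the check described above, here is a minimal, self-contained sketch in plain Java. It is not the Spring Content implementation: `VersionedRow` and `OptimisticStore` are hypothetical names, and an in-memory map stands in for both the database and the content store. It only shows the idea that a write is accepted when the caller's copy carries the current version, after which the version is incremented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JPA entity carrying an @Version field.
class VersionedRow {
    final long id;
    long version;      // plays the role of the @Version attribute
    String content;

    VersionedRow(long id, long version, String content) {
        this.id = id;
        this.version = version;
        this.content = content;
    }
}

// Hypothetical in-memory "database plus content store"; not the Spring Content API.
class OptimisticStore {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    VersionedRow create(long id, String content) {
        rows.put(id, new VersionedRow(id, 0, content));
        return fetch(id);
    }

    // Returns a detached copy, as a client would receive it.
    VersionedRow fetch(long id) {
        VersionedRow row = rows.get(id);
        return new VersionedRow(row.id, row.version, row.content);
    }

    // Accepts the write only if the caller's copy carries the current version,
    // then bumps the version so that other, now stale, copies are rejected.
    synchronized VersionedRow setContent(VersionedRow clientCopy, String newContent) {
        VersionedRow row = rows.get(clientCopy.id);
        if (row.version != clientCopy.version) {
            throw new IllegalStateException("stale entity: expected version "
                    + row.version + " but got " + clientCopy.version);
        }
        row.content = newContent;
        row.version++;
        return fetch(row.id);
    }
}
```

If two clients fetch the same row and both call `setContent`, the first call succeeds and the second fails with an `IllegalStateException` until that client refetches.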

Pessimistic Locking

Spring Data has never had a pessimistic locking mechanism to match the ones found in a CMS. JPA does have the concept of "extended transactions" where a transaction and therefore a lock can extend beyond a single "request" but these are not particularly cloud-native IMO. Extended transactions require that the entity manager be stored in the HttpSession that, in 12 factor apps, would, in turn, require something like Spring Session to share the session across all instances of the app. Whilst I thought this would probably work, it would add a hard dependency on Spring Session and therefore on a backing service like redis cache. Plus this would appear, to us, to be based on out-dated development patterns; i.e. using the HttpSession!

Instead, as a first cut, we decided to follow 12-Factor principles and push the lock state into the database. We created two new projects called Spring Versions Commons and Spring Versions JPA. Like Spring Data Commons and Spring Content Commons, the new Spring Versions Commons defines a Versioning API. Spring Versions JPA implements that versioning API with JPA. It's a super simple implementation atm. A locking service backed by a lock table. Each row in the table represents a "lock" on an entity and therefore on any associated content. As you would expect once a lock is obtained, no other client can obtain a lock and therefore cannot modify the entity (or its content). This would be exposed to the developer through a LockingAndVersioningRepository Spring Data Repository customization that looks something like the following:

@Configuration
@EnableLockingAndVersioning
public class AppConfiguration {
}

public interface SomeSpringDataRepository extends PagingAndSortingRepository<VersionedDocument, Long>, LockingAndVersioningRepository<VersionedDocument, Long> {
}

An entity would be updated to include these version-specific annotations:

@Entity
public class VersionedDocument {
  // all the usual annotated fields like @Id, @ContentId and @Version

  @LockOwner
  private String lockOwner;

  @AncestralRootId
  private Long ancestralRootId;

  @AncestorId
  private Long ancestorId;
}
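To make the lock table idea concrete, here is a minimal sketch in plain Java. It is an assumption-laden illustration, not the Spring Versions JPA code: `LockTable` is a hypothetical name, a concurrent map stands in for the database table (one entry per locked entity), and the rule that an unlocked entity may be modified by anyone is my assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the lock table: one entry per locked entity id.
class LockTable {
    private final Map<Long, String> ownerByEntityId = new ConcurrentHashMap<>();

    // Succeeds only if no row exists for the entity, or the caller already owns it.
    boolean lock(long entityId, String principal) {
        String owner = ownerByEntityId.putIfAbsent(entityId, principal);
        return owner == null || owner.equals(principal);
    }

    // Only the lock owner may remove the row.
    boolean unlock(long entityId, String principal) {
        return ownerByEntityId.remove(entityId, principal);
    }

    // Assumption: a modifying operation is allowed when the entity is
    // unlocked or when it is locked by the caller.
    boolean mayModify(long entityId, String principal) {
        String owner = ownerByEntityId.get(entityId);
        return owner == null || owner.equals(principal);
    }
}
```

Keeping the lock state in a shared table rather than in an HTTP session is what lets any instance of the application answer `mayModify` for any request.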

The API for the LockingAndVersioningRepository interface is as follows:

public interface LockingAndVersioningRepository<T, ID> {

    /**
     * Locks the entity and returns it with its @Version and @LockOwner attributes updated.
     * Returns null if the lock cannot be obtained.
     *
     * @param entity the entity to lock
     * @return the locked entity, or null
     */
    <S extends T> S lock(S entity);

    /**
     * Unlocks the entity and returns it with its @Version and @LockOwner attributes updated,
     * otherwise returns null.
     *
     * @param entity the entity to unlock
     * @return the unlocked entity, or null
     */
    <S extends T> S unlock(S entity);

    /**
     * Overridden implementation of save that enforces locking semantics
     *
     * @param entity
     * @param <S>
     * @return
     */
    <S extends T> S save(S entity);

    /**
     * Creates a new version based on the entity.  This new version becomes the latest version.   
     * 
     * @param entity the entity to base the new version on
     * @return the new version
     */
    <S extends T> S version(S entity);

    /**
     * Returns the latest version of all entities.  When using LockingAndVersioningRepository this method would
     * usually be preferred over CrudRepository's findAll that would find all versions of all entities.
     *
     * @return list of latest version entities
     */
    <S extends T> List<S> findAllVersionsLatest();

    /**
     * Returns a list of all versions of the given entity.
     *
     * @param entity
     * @return list of entity versions
     */
    <S extends T> List<S> findAllVersions(@Param("entity") S entity);
}

So, a client would use lock to create a pessimistic lock on an entity, update attributes and content using the regular Spring Data and Spring Content APIs, use version to create a new version of the entity based on the edits made since locking and, finally, use unlock to remove the lock.

You can see this behavior in this block of tests.
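The lock, edit, version, unlock flow described above might look like the following plain-Java sketch. Everything here is hypothetical: `VDoc` and `InMemoryVersioningRepo` are illustrative names, the methods take an explicit principal (the real API shown above does not), and the choices that a first version is its own ancestral root and that the lock moves to the new version are my assumptions, not confirmed behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity mirroring the annotated fields shown earlier.
class VDoc {
    Long id;
    String title;
    String lockOwner;        // @LockOwner
    Long ancestorId;         // @AncestorId: the version this one was created from
    Long ancestralRootId;    // @AncestralRootId: the first version in the set

    VDoc copy() {
        VDoc c = new VDoc();
        c.id = id;
        c.title = title;
        c.lockOwner = lockOwner;
        c.ancestorId = ancestorId;
        c.ancestralRootId = ancestralRootId;
        return c;
    }
}

// Hypothetical in-memory repository; a sketch of the semantics only.
class InMemoryVersioningRepo {
    private final Map<Long, VDoc> rows = new LinkedHashMap<>();
    private long nextId = 1;

    // Stores a first version; it is its own ancestral root (assumption).
    VDoc create(String title) {
        VDoc d = new VDoc();
        d.id = nextId++;
        d.title = title;
        d.ancestralRootId = d.id;
        rows.put(d.id, d);
        return d.copy();
    }

    // Returns the locked entity, or null if someone else holds the lock.
    VDoc lock(VDoc entity, String principal) {
        VDoc row = rows.get(entity.id);
        if (row.lockOwner != null && !row.lockOwner.equals(principal)) return null;
        row.lockOwner = principal;
        return row.copy();
    }

    VDoc unlock(VDoc entity, String principal) {
        VDoc row = rows.get(entity.id);
        if (!principal.equals(row.lockOwner)) return null;
        row.lockOwner = null;
        return row.copy();
    }

    // Creates a new row from the caller's edited copy; the previous row is kept
    // unchanged and the lock moves to the new version (assumption).
    VDoc version(VDoc edited, String principal) {
        VDoc row = rows.get(edited.id);
        if (!principal.equals(row.lockOwner)) {
            throw new IllegalStateException("not locked by " + principal);
        }
        VDoc next = edited.copy();
        next.id = nextId++;
        next.ancestorId = row.id;
        next.ancestralRootId = row.ancestralRootId;
        next.lockOwner = principal;
        row.lockOwner = null;
        rows.put(next.id, next);
        return next.copy();
    }

    // All versions sharing the same ancestral root, oldest first.
    List<VDoc> findAllVersions(VDoc entity) {
        Long root = rows.get(entity.id).ancestralRootId;
        List<VDoc> out = new ArrayList<>();
        for (VDoc r : rows.values()) {
            if (r.ancestralRootId.equals(root)) out.add(r.copy());
        }
        return out;
    }
}
```

A client would call `create`, then `lock`, edit the returned copy, pass it to `version`, and finally `unlock` the new version; `findAllVersions` then returns both rows of the version set.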

Thoughts on the REST API

Optimistic Locking

The main concern with optimistic locking is how to detect, over the wire, that the client is working with the latest version of the entity. For a variety of reasons we have always wanted to support multipart/mixed requests that would allow our controllers to accept a combined JPA entity/content payload. This would, again, be very handy for optimistic locking as we could force the client to send us their copy of the entity and reject it if it isn't the latest. However, multipart/mixed requests don't appear to be supported by any of the current client technologies (Angular, React, Vue, etc.). In lieu of this, there are several options available when handling modifying operations (PUT, POST, DELETE):

  • take a version request parameter; i.e. PUT /mystore/12345?version=nn and trust that the client sends us the actual @Version value from their copy of the entity. Upon receiving a modifying request our controllers would then fetch the entity (identified in the URI) and reject invalid requests such as when the given version doesn't match.
  • include a version attribute in the multipart/form-data payload
  • include the entire serialized entity in a model attribute in the multipart/form-data payload
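As a rough illustration of the first option, the sketch below shows the check a controller might perform for PUT /mystore/12345?version=nn. It is plain Java rather than Spring MVC code; `VersionParamCheck` is a hypothetical name, the map stands in for the database, and the specific status codes (404, 400, 409) are my assumptions, since the text above only says that invalid requests are rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler logic for PUT /mystore/{id}?version=nn; not Spring MVC code.
class VersionParamCheck {
    // entity id -> current @Version value, standing in for the database
    private final Map<Long, Long> currentVersions = new HashMap<>();

    void register(long id, long version) {
        currentVersions.put(id, version);
    }

    // Returns an HTTP status (assumed mapping): 404 unknown entity,
    // 400 missing or malformed version, 409 stale version,
    // 200 accepted (the stored version is then incremented).
    int handlePut(long id, String versionParam) {
        Long current = currentVersions.get(id);
        if (current == null) return 404;
        long supplied;
        try {
            supplied = Long.parseLong(versionParam);
        } catch (NumberFormatException e) {
            return 400;
        }
        if (supplied != current) return 409;
        currentVersions.put(id, current + 1);
        return 200;
    }
}
```

The same comparison would apply to the other two options; only the place the version is read from (a form field or a serialized entity) changes.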

Pessimistic Locking

As for performing LOCK/UNLOCK over REST, we would look to follow the usual Spring Data/Spring Content model of "exporting" the Java API into the URI space of the JPA Entity. So LockingAndVersioningRepository.lock might be exported to the entity's locks collection; i.e. POST /mystore/12345/locks?version=nn, or alternatively to a logical lock property; i.e. PATCH /mystore/12345/lock?version=nn, or something similar. We think we just need to iron out the most RESTy way of doing this. The same concerns about the freshness of the client's copy of the entity exist for the LOCK operation too (as it is also a modifying operation).

OK. I'll stop there and let you guys digest this. All feedback, good and bad welcome. Thanks in advance!

Questions For You:

  • I apologize if you mentioned this already but what module(s) would you expect to support versioning? Guessing S3?

Questions in General:

  • Spring Content also supports content collections and multiple contents (content properties). For these content models should we lock the entity, or the property?
  • Currently, LockingAndVersioningRepository.lock returns null when the lock cannot be obtained. Should it throw an exception instead?

snoop244 commented on May 23, 2024

paulcwarren commented on May 23, 2024

I am actually on vacation myself and back next week so that works for me.

kdavisk6 commented on May 23, 2024

With regards to Optimistic Concurrency, one technique we could use is Snapshot Isolation. This technique is typically used with Multi-Version Concurrency Control, but I don't think we need to go that far.

Reads, when using Snapshot Isolation, do not require locks and operate on the most recent version in the store at the time of the request.

When a write transaction is started, a copy of the current version is taken as a snapshot and saved with the transaction metadata. At the end of the transaction, the changes are applied to the snapshot and committed.

This approach removes the need to pass the version to the client and to lock on writes, allowing for multiple updates to the same entity concurrently. Since each update is performed on a snapshot, we can rely on Spring Data's Optimistic Concurrency support to catch conflicting updates. In those cases, we can return a 409 conflict on OptimisticConcurrencyExceptions. Throughput using Snapshot Isolation can be extremely high in situations where there is high volume but low contention, like Web Content use cases.
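A minimal sketch of the idea, under stated assumptions: the server reads the current row itself, applies the caller's change to that snapshot, and commits only if nothing else has committed in the meantime. This is plain Java with hypothetical names (`MetaRow`, `SnapshotUpdater`), and an atomic compare-and-set stands in for the database's conflict detection; it is not real snapshot isolation and not Spring Data code.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical immutable, versioned metadata row.
final class MetaRow {
    final long version;
    final String title;

    MetaRow(long version, String title) {
        this.version = version;
        this.title = title;
    }
}

// The caller never supplies a version: the read happens on the server.
class SnapshotUpdater {
    private final AtomicReference<MetaRow> row =
            new AtomicReference<>(new MetaRow(0, "untitled"));

    // Returns 200 on success, 409 if a concurrent update committed first.
    int update(UnaryOperator<String> change) {
        MetaRow snapshot = row.get();  // server-side read of the current row
        MetaRow next = new MetaRow(snapshot.version + 1, change.apply(snapshot.title));
        return row.compareAndSet(snapshot, next) ? 200 : 409;
    }

    MetaRow current() {
        return row.get();
    }
}
```

With low contention almost every call returns 200 without the client ever reading the row first; a caller that receives 409 can simply retry.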

I'll see if I can dig up some use cases I have from some previous work I've done on the subject.

paulcwarren commented on May 23, 2024

Hi @kdavisk6. Apologies for the extremely tardy response times.

If I understand it, SNAPSHOT isolation is like a more efficient READ_COMMITTED/REPEATABLE_READ, which certainly sounds useful for some high-throughput use cases.

If I also understand it correctly, this is not something Spring Data and/or @Transactional support OOTB other than through isolation=Isolation.DEFAULT, which uses whatever the database is set up to do.

Is the no-@Version-passing, non-locking behavior that you describe just for entity metadata operations? Or do you see this model working for content operations also? Bearing in mind that content is not usually stored in the DB, perhaps you are thinking that we can detect "content" conflicts by leaning on the database and its inherent ability to detect conflicts in entity content-related metadata fields like @ContentId (or perhaps an @ContentVersion).

By way of an example, I'm guessing something like:

"Txn A starts and a snapshot of the DB is taken with @ContentId=123. At the same time, Txn B starts and its snapshot is taken also with @ContentId=123. Txn A's client updates the entity's content that forces an update of @ContentId=456. Txn B's client also updates the entity's content and metadata to @ContentId=789. Txn A commits successfully so @ContentId=456 is persisted to the main DB. Txn B commits and the database detects a conflict because it's expecting @ContentId=123 but it is, in fact, @ContentId=456 and this causes a rollback."

(Or perhaps it's even simpler than this and JPA detects a conflict of @Version numbers?)

Either way, with this optimistic model content is going to be written to storage before we really know if it is good or not. So we would need to be careful to remove old content and content associated with an unsuccessful commit. In this case, that means content '123', which got successfully overwritten by content '456', and content '789', which was not committed but was written to the store nonetheless. Both need to be removed to keep the content store clean.
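The clean-up in the Txn A / Txn B example can be sketched as follows. This is a plain-Java illustration with hypothetical names (`ContentCleanupSketch` and its methods); a set stands in for the content store and a single field for the entity's committed @ContentId. It assumes the conflict is detected by comparing the content id the transaction started from with the one currently committed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical content store plus a single entity row holding the current content id.
class ContentCleanupSketch {
    final Set<String> storedContent = new HashSet<>();
    String committedContentId;

    ContentCleanupSketch(String initialContentId) {
        storedContent.add(initialContentId);
        committedContentId = initialContentId;
    }

    // Content is written to storage before the commit outcome is known.
    void writeContent(String contentId) {
        storedContent.add(contentId);
    }

    // Commits only if the row still holds the content id the transaction started from.
    // On success the superseded content is deleted; on conflict the new content is deleted.
    boolean commit(String expectedContentId, String newContentId) {
        if (!committedContentId.equals(expectedContentId)) {
            storedContent.remove(newContentId);       // rolled back: drop the orphan
            return false;
        }
        storedContent.remove(committedContentId);     // drop the superseded content
        committedContentId = newContentId;
        return true;
    }
}
```

Running the scenario (both transactions start from '123', A writes '456', B writes '789', A commits, then B commits) leaves only '456' in the store.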

Is this the sort of thing that you had in mind?

kdavisk6 commented on May 23, 2024

@paulcwarren

Not really, I was looking for a way to absolve the clients from having to "read then write" in order to make changes, due to the version parameter requirement. I am considering scenarios where batch updates need to be performed on a particular set of content metadata. Using Optimistic concurrency, a read must be performed to get the latest version before a write. Using Snapshot Isolation with Optimistic Concurrency, starting the transaction performs a read on the server. The changes are then applied to the snapshot taken, which includes the @Version value, then saved. The read still happens, but is the responsibility of the repository, not the caller. When performing batch updates, those reads may not add value or be needed.

I'll admit, I had not considered the Content itself; I was focused primarily on the Metadata. I don't think this type of isolation is appropriate for Content. If the expectation is to update content and metadata in the same call, then we will not be able to use this technique.

Based on your feedback, my suggestion is to move forward with your original proposal, sticking with the models supported by Spring Data and consider these more exotic isolation modes when they come up.

Thoughts?

paulcwarren commented on May 23, 2024

@snoop244 @kdavisk6. Apologies this took soooooo long but with release 0.5.0 we believe we have something consumable and, before we go any further, would welcome your feedback. The capability comes in two parts. Firstly, we extended JPA's existing @Version optimistic locking semantics to any content associated with Entities. You might not be too interested in this particular feature but it seemed like a logical first step for us and will mitigate the lost update problem for "content" for those use cases that don't require full locking and versioning. Secondly, we added the standard pessimistic locking (from the user perspective) semantics along with the ability to version an Entity and create a "version set". More information can be found here.

Please let me know if what is there today will satisfy your requirements, or not.

From our perspective we are planning at least the following additional capabilities (when requested by someone):

  • Extend locking and versioning support to the JPA and Mongo modules
  • Extend locking and versioning semantics to content - so, for example, when an entity is locked - its content is also locked (S3 might place some sort of retention policy for example). Likewise, when an Entity is versioned, its content is also versioned.

Thoughts and feedback welcome.
Many thanks
_Paul

kdavisk6 commented on May 23, 2024

Paul

Thank you for all this work. I’ll carve out some time between now and the new year to spend some quality time with this.

paulcwarren commented on May 23, 2024

@snoop244 @kdavisk6 I am going to close this issue as we have now had basic support for versioning (of metadata) for a version or two. I know it isn't complete. We should capture version events and also version the content if and when the backing storage supports that, e.g. S3. Spring Content has always been community-driven so I propose that we do that work under different issues specific to the storage. How does that sound?

Also, @kdavisk6, I was pondering your requirement for high throughput on the metadata and I was wondering if you had ever considered Spring Data Redis? We could also create a Spring Content Redis module, if required, but anyway I thought that I would mention it.

Anyways folks. Feel free to re-open this issue if you think we aren't done yet. Love to hear from you on progress etc so also feel free to send me an email, or whatever is easiest.
