Comments (14)

ckhsponge commented on June 20, 2024

@thomaswitt I like your motivation! I am also a fan of OpenSearch.

If I understand correctly, using a GSI with all attributes projected could waste space, but it does keep the items distributed as desired since it maintains a complete copy.

I do think you could accomplish what you need with an extension on top of Dynamoid, e.g. shared before-create actions and custom finders. Reworking the innards of Dynamoid to handle that would probably be possible as well, although it would be more involved.

from dynamoid.

bholzer commented on June 20, 2024

I want to use Dynamo the way it's designed and intended to be used, and I would love a Ruby library that makes these patterns easier to use. Unfortunately, Dynamoid is not the answer today. @thomaswitt please let me know if you make any efforts on your own adapter/gem. I would be happy to contribute.

andrykonchin commented on June 20, 2024

Hi,

In short - Dynamoid doesn't explicitly support anything related to Single-Table design. Why? I suppose because its goal is to implement the ActiveRecord pattern.

It seems to me that the classic ActiveRecord pattern contradicts the ideas of Single-Table design, with its kind of schemaless/multi-schema items. But I can easily imagine that the Single-Table design approach could be implemented on top of an ORM like Dynamoid, or on top of any other DynamoDB client.

That is, I am not against supporting Single-Table design in Dynamoid. But I see benefits in separating new features from the existing ActiveRecord-like approach. How strong should this separation be? I don't know right now. It probably depends on how natural specific features look from the point of view of the ActiveRecord approach.

Regarding the proposed feature with a range prefix: I am not familiar with the concrete patterns of Single-Table design and don't know whether such range prefixes are a common/well-known pattern. Could you please point to resources that describe such patterns?

thomaswitt commented on June 20, 2024

I totally second this idea of @bmalinconico. Using a prefix in range keys to differentiate between types of data is THE access pattern in DynamoDB, especially when the prefix is combined with a date, like "FUEL_PRICE#2022-09-19" … I'd really appreciate it if Dynamoid supported this out of the box.

The advantages are obvious, especially in terms of pricing/capacity. Having lots of small tables, each with its own throttle settings, instead of one big table hinders application performance and simply costs money unnecessarily…
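A minimal sketch of such a prefixed sort key, in plain Ruby without Dynamoid. The helper name prefixed_range_key and the :station example are hypothetical and only serve as illustration:

```ruby
require 'date'

# Hypothetical helper (not part of Dynamoid): joins an upper-cased type
# prefix and a value with '#'. Dates are written as ISO 8601 so that the
# keys sort chronologically as plain strings.
def prefixed_range_key(type, value)
  value = value.iso8601 if value.respond_to?(:iso8601)
  "#{type.to_s.upcase}##{value}"
end

keys = [
  prefixed_range_key(:fuel_price, Date.new(2022, 9, 19)),
  prefixed_range_key(:fuel_price, Date.new(2022, 9, 18)),
  prefixed_range_key(:station, 'main-street')
]

# Sorting groups the keys by type first, then by date within a type.
puts keys.sort
```

Because the type comes first and the date is in ISO 8601 form, a begins_with or between condition on the range key can select one type of item, or a date range within that type.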

thomaswitt commented on June 20, 2024

"Regarding the proposed feature with a range prefix. I am not familiar with the concrete patterns of the Single-Table design and don't know whether such range prefixes is a common/well known pattern. Could you please point at resources that describe such patterns?"

@andrykonchin - Here's the official DynamoDB doc: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

andrykonchin commented on June 20, 2024

@thomaswitt Thank you for the link. I think I now understand what you are talking about.

@bmalinconico Yep, it seems to be a common approach to have a synthetic structured sort key. And it would be useful to support some predefined schemas.

On the other hand, options such as prefix_on_persistance and fixed_value can be emulated with handwritten before_save hooks. So it would be a tiny enhancement.
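As a rough sketch of what such a handwritten hook could do: the mixin below is hypothetical, and the Struct only stands in for a model so the example runs without Dynamoid or a table. In a real Dynamoid model the method would presumably be registered with before_save :apply_range_prefix.

```ruby
# Hypothetical mixin (not part of Dynamoid). Assumes the range key
# attribute is called :metadata, as proposed elsewhere in this thread.
module RangePrefix
  def apply_range_prefix
    prefix = "#{self.class.name.upcase}#"
    # Idempotent: a value that already carries the prefix is left alone,
    # so running the hook on every save does not stack prefixes.
    self.metadata = "#{prefix}#{metadata}" unless metadata.to_s.start_with?(prefix)
  end
end

# Stand-in for a model class.
Comment = Struct.new(:id, :metadata) do
  include RangePrefix
end

comment = Comment.new('excellent-post-1234', '2022-03-12T00:22:33.144Z')
comment.apply_range_prefix
comment.apply_range_prefix # simulates a second save
puts comment.metadata
```

The idempotency check is one possible answer to whether the prefix is applied on creation or on every update: the hook can run each time without changing an already prefixed key.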

thomaswitt commented on June 20, 2024

@andrykonchin Yes, basically you could already write this with the current version, but I'd say convention over configuration is a very old tried-and-true mantra of Rails.

Apart from that, the way @bmalinconico described it is the way DynamoDB was originally intended to be used. I have run several huge DynamoDB-based applications with tons of data and gazillions of rows. It's the only way to keep scaling, and basically every AWS engineer at an AWS summit will agree.

Dynamoid's current default, which is designed around multiple tables, has a lot of drawbacks when you put it into production. That's why I think the project should offer more defaults along the lines of @bmalinconico's idea.

The way STI is currently implemented with the Type field is basically more of a band-aid. It should be a prefix in the range key. That would also make lots of other things easy.

When I started using Dynamoid I ran, for example, into this problem: #501. It would be much easier if in that example Company and Report had the same primary key; you could then filter via the range key for what you really want to get, either the company data or the report data.

I started using Dynamoid because it was very convenient and (against my better knowledge) used the multi-table approach in the beginning. I later had to painfully rewrite the whole app to use STI in a single table, with all model classes inheriting from a base class which defines the table. There is still a lot of code in my application like Model.where(id: id_to_search_for, 'metadata.begins_with': 'REPORT#'), etc. With smart range keys like COMPANY# and COMPANY#REPORT you can easily get a company and all its reports with a single query.

All in all - just my two cents - I'd really appreciate it if the single-table approach were promoted more, as at least one of two default approaches, and easily supported within the software without having to write hooks, etc.
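To illustrate the COMPANY# / COMPANY#REPORT idea, here is a small in-memory sketch. It does not talk to DynamoDB. The item data, the report suffixes and the query helper are all made up for the example; the helper only mimics a key-condition query with an exact hash key and begins_with on the range key.

```ruby
# In-memory stand-in for a single table; :id is the hash key and
# :metadata the range key.
ITEMS = [
  { id: 'acme',  metadata: 'COMPANY#',               name: 'Acme Inc.' },
  { id: 'acme',  metadata: 'COMPANY#REPORT#2022-Q1', title: 'Q1 report' },
  { id: 'acme',  metadata: 'COMPANY#REPORT#2022-Q2', title: 'Q2 report' },
  { id: 'other', metadata: 'COMPANY#',               name: 'Other Ltd.' }
].freeze

# Mimics a query: exact match on the hash key, begins_with on the range
# key, results ordered by range key.
def query(items, id:, begins_with:)
  items.select { |item| item[:id] == id && item[:metadata].start_with?(begins_with) }
       .sort_by { |item| item[:metadata] }
end

company_and_reports = query(ITEMS, id: 'acme', begins_with: 'COMPANY#')
reports_only        = query(ITEMS, id: 'acme', begins_with: 'COMPANY#REPORT')

puts company_and_reports.map { |item| item[:metadata] }
puts reports_only.size
```

One prefix returns the company item together with all of its reports, while the longer prefix narrows the same partition down to the reports only.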

andrykonchin commented on June 20, 2024

Could you elaborate a bit more on how the prefix is supposed to work? What value should it be added to? At what moment: on creation only, or on every model update?

thomaswitt commented on June 20, 2024

@andrykonchin I would need to think longer about a full design, but here is how I would approach it if you configure Dynamoid that way (let's say via config.single_table_design = true):

  • In this mode Dynamoid requires a range key to be present. I would go for defaults like key: :id and range: :metadata, as this is the default used by AWS, e.g. in the NoSQL Workbench modeler. You can of course still override them if you want other key and range key names.
  • :id is generated by Dynamoid as a UUID when not specified, but can be overridden (id: 'Berlin').
  • :metadata is predefined as created_at and will be automatically expanded to "#{type.upcase}##{created_at}". So the prefix_on_persistance is by default the model name (type), but can be overridden in the way @bmalinconico proposed (or alternatively fixed, as he proposed as well), like range :metadata, prefix_on_persistance: "CUSTOMPREFIX#". Also, when defining your object, there should be a reserved keyword like range_id which is set to created_at by default, but which you can override in case you want a range key like "USER#<customData_from_range_id>". You could then define your own range ID, like an email or whatever, or change the range key from created_at to updated_at, or even have it dynamically expanded and chained like 'COMMENT#[email protected]#2022-03-12T00:22:33.144Z' when you set the range key to "{user_id}#{created_at}", etc.
  • The table definition (name, capacity_mode) should be defined in a base class like DynamoidBase, and all models should inherit from this base class by convention (Employee < DynamoidBase).
  • When I do a where search, I can either just look for the id and get multiple results or, with a helper function, look directly for key plus range key, like Comment.find('excellent-post-1234', '[email protected]'), which would expand to Comment.where(id: 'excellent-post-1234', 'metadata': 'COMMENT#[email protected]').
  • I would also potentially include intuitive helpers for begins_with, gt, etc. when looking for range keys with time series data, for example (not yet a well-thought-out API, just an idea):
    • Document.find('docset1234', '2022-03-12T00:22:33.144Z') -> Document.where(id: 'docset1234').where(metadata: '2022-03-12T00:22:33.144Z')
    • Document.find('docset1234', '2022-03-*') -> Document.where(id: 'docset1234').where('metadata.begins_with': '2022-03-')
    • Address.find('Berlin', '10115', '10178') -> Address.where(id: 'Berlin').where('metadata.between': [10115, 10178])
    • Address.find('Berlin', '>10115') -> Address.where(id: 'Berlin').where('metadata.gt': 10115)
    • Address.find('Berlin', '>=10115') -> Address.where(id: 'Berlin').where('metadata.gte': 10115)
    • Address.find('Berlin', '<10115') -> Address.where(id: 'Berlin').where('metadata.lt': 10115)
    • Address.find('Berlin', '<=10115') -> Address.where(id: 'Berlin').where('metadata.lte': 10115)
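
The shorthand in the last bullets could be translated roughly as follows. This is only a sketch of the idea: range_conditions is a hypothetical helper, not an existing Dynamoid method; it assumes the range key is called :metadata, leaves all values as strings, and leaves out the type prefix from the earlier bullets.

```ruby
# Hypothetical translator from the finder shorthand to Dynamoid-style
# condition hashes on the range key. No type coercion is attempted.
def range_conditions(*args, range: :metadata)
  # Two values are read as the bounds of a `between` condition.
  return { "#{range}.between": args } if args.size == 2

  arg = args.first.to_s
  case arg
  # The two-character operators must be checked before '>' and '<'.
  when /\A>=(.+)\z/ then { "#{range}.gte": Regexp.last_match(1) }
  when /\A<=(.+)\z/ then { "#{range}.lte": Regexp.last_match(1) }
  when /\A>(.+)\z/  then { "#{range}.gt": Regexp.last_match(1) }
  when /\A<(.+)\z/  then { "#{range}.lt": Regexp.last_match(1) }
  # A trailing '*' is read as a prefix match.
  when /\A(.*)\*\z/ then { "#{range}.begins_with": Regexp.last_match(1) }
  else { range => arg }
  end
end

p range_conditions('2022-03-*')
p range_conditions('>=10115')
p range_conditions('10115', '10178')
```

The resulting hash could then be passed on, for instance as Document.where(id: 'docset1234').where(range_conditions('2022-03-*')).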

thomaswitt commented on June 20, 2024

@andrykonchin Hey Andrii, just checking in on whether you have had time to think about those suggestions …

thomaswitt commented on June 20, 2024

@andrykonchin Just a little ping. Have you given those ideas some thought?

ckhsponge commented on June 20, 2024

I use STI with Dynamoid. My approach is to not have a range key and to use shared GSI columns with redundant data. GSI columns are set with before actions and can include values from multiple columns as needed. Much of this logic can be abstracted into a parent class, so using it in the models isn't excessive. I have 5 GSIs that are string-string and 5 that are string-number. If I needed pizzas created by user 1 sorted by timestamp, I would use a GSI, e.g. Pizza#User#1,2024-02-01. To further filter to store 2, you could have a GSI with Pizza#Store#2#User#1,2024-02-01.

HK Type Name Code GSI_HK1 GSI_RK1
a#Pizza#pizza Pizza Pizza 2024-02-01
a#PizzaTopping#onions PizzaTopping Onions onions PizzaTopping onions
a#PizzaTopping#pepperoni PizzaTopping Pepperoni pepperoni PizzaTopping pepperoni
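
A small sketch of how such composite GSI hash keys could be assembled, for example from a before action in a shared parent class. The helper gsi_hash_key and its argument style are assumptions made for this illustration and are not taken from the setup described above.

```ruby
# Hypothetical helper: builds a GSI hash key from a type name and
# attribute-name/value pairs, joined with '#'. The order of the keyword
# arguments determines the order of the segments.
def gsi_hash_key(type, **scopes)
  segments = scopes.flat_map { |name, value| [name.to_s.capitalize, value] }
  ([type] + segments).join('#')
end

puts gsi_hash_key('Pizza', user: 1)
puts gsi_hash_key('Pizza', store: 2, user: 1)
```

The timestamp (2024-02-01 in the examples) would go into the matching GSI range key column, so that the items under one hash key come back sorted by time.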

If you don't like the redundant data and want to keep using range keys I suppose you could set the range key with a before action. You'd need to create new or override existing finders, however.

thomaswitt commented on June 20, 2024

@ckhsponge I understand your approach, but the range key was invented for a reason (also in terms of data distribution). The range key comes in very handy, especially as the idea in DynamoDB is that you don't delete by default but rather insert new data to have a built-in history. In that sense Dynamoid is written in a way that tries to emulate ActiveRecord, but does not embrace the ideas of DynamoDB.

Unfortunately @andrykonchin doesn't seem to be open to or interested in building the other way I described above, which is closer to how DynamoDB wants it, so I am considering writing my own lightweight gem/adapter to embrace these concepts.

Especially as it makes sense to combine this with OpenSearch/Elasticsearch, which is currently also a PITA, as most gems (like Searchkick) won't work out of the box. DynamoDB + OpenSearch is a very powerful combination, and it deserves to be supported for Rails out of the box the way it's meant to be.

thomaswitt commented on June 20, 2024

@bholzer I agree. Would you be open to throwing some ideas together and outlining what should be in scope for a Ruby lib?
