Comments (32)

richardkiene avatar richardkiene commented on August 10, 2024

> It would probably be good to address metering and billing in this proposal. I believe that today, we are able to bill people just once for an object -- even if it has been snaplinked multiple times, such as when creating a versioning system as you describe in the RFD.

@jclulow For the purpose of this RFD, I think it's best that billing be out of scope. Billing will most certainly be undergoing some big changes in the near future. Thanks!

from rfd.

jclulow avatar jclulow commented on August 10, 2024

It would probably be good to address metering and billing in this proposal. I believe that today, we are able to bill people just once for an object -- even if it has been snaplinked multiple times, such as when creating a versioning system as you describe in the RFD.

When determining storage usage for billing purposes, I believe we deduplicate physically stored objects based on their object ID today, which has the effect of billing people only once for each concrete object stored. If we were to generate a new object ID for each snaplink, I think we'd need another way to correlate objects which use the same disk storage for metering purposes.
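
To make the metering concern concrete, here is a minimal sketch of deduplicating by object ID. It is not Manta's actual metering code: the `MetadataRecord` layout, the `billable_bytes` function, and the paths and IDs are all hypothetical, chosen only to show why a fresh object ID per snaplink would defeat this kind of deduplication.

```python
from typing import Iterable, NamedTuple


class MetadataRecord(NamedTuple):
    """Hypothetical, simplified metadata row: one per path."""
    path: str
    object_id: str
    size_bytes: int


def billable_bytes(records: Iterable[MetadataRecord]) -> int:
    """Sum storage, counting each distinct object_id only once."""
    seen = {}
    for rec in records:
        seen.setdefault(rec.object_id, rec.size_bytes)
    return sum(seen.values())


# Today: a snaplink shares the object ID of its source.
shared = [
    MetadataRecord("/acct/public/v1", "obj-a", 100),
    MetadataRecord("/acct/public/v2", "obj-a", 100),  # snaplink of v1
]

# Proposed: the link gets its own object ID, so dedup no longer sees a match.
split = [
    MetadataRecord("/acct/public/v1", "obj-a", 100),
    MetadataRecord("/acct/public/v2", "obj-b", 100),
]

print(billable_bytes(shared))  # 100
print(billable_bytes(split))   # 200
```

Under the second layout, some other key (for example a recorded link origin) would be needed to correlate the two rows.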

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@jclulow Thanks. I've added an Open Questions section with a note about this.

jclulow avatar jclulow commented on August 10, 2024

> @jclulow For the purpose of this RFD, I think it's best that billing be out of scope. Billing will most certainly be undergoing some big changes in the near future. Thanks!

You're welcome!

To be clear, I'm merely suggesting that it be addressed in the document. Either the proposed solution will maintain the property that snaplinked objects are able to be associated and thus not double-billed, or it won't.

Moving it out of scope is fine, but it'd be best to explicitly record the decision to rule out deduplicated billing in the future if that's the case.

rjloura avatar rjloura commented on August 10, 2024

When we want to rebalance an object, each "link" to that object will be handled independently without need for the rebalancer to be concerned with the relationship. If you create a link to an object, then repair/rebalance that new object, or want to increase the number of copies of that object, you can do so independently of the original object.

I was thinking about the cases where you may separate the hardlink from the original during a rebalance. During a rebalance there is no guarantee that all objects from a given source shark are going to end up on the same destination shark. However, since snaplinks differ from Unix hardlinks in that writes to either "path" are not mirrored in the objects as they are in hardlinks, this will not be an issue during a rebalance where the hardlink is separated from its source.

If we do execute a rebalance and the link is separated from its source we could inadvertently increase the object's redundancy. I don't see any negative impacts there aside from the aforementioned billing concern.

Overall looks good from a rebalance perspective, thanks!

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@rjloura thanks for your comments. I had thought about this (that during rebalance the objects might go elsewhere) but that seemed like it wasn't a problem since they'd have different objectIds and would no longer be required to be colocated. Thanks for confirming that!

askfongjojo avatar askfongjojo commented on August 10, 2024

What will we do with etag in the metadata? Currently the etags are identical for the snaplinked objects. There may be some usage of this information for the user to tell if these objects have the same contents. The user application can/should probably use content-md5 for this purpose but it's hard to tell if we'll break customer use cases with the change proposed here. It may be best that we keep the etag the same while the objectIds differ? (This leads to another question - is there any dependency on etag being the same as the objectId in manta? ... Correction: nvm this, the objectId and etag for an object are never the same.)

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@askfongjojo I'd assume that we can copy the ETag to the new object. If either of the two new objects is modified, it'll get a new ETag, which seems fine since they're now independent objects. I'll add a note about this in the RFD and someone can correct me if I'm misunderstanding something here.

Thanks for pointing this out!

UPDATE: After some discussion with Kelly, we might have an issue here with ETags being different for buckets. I'll do some more looking into this and include notes about that in my update to the RFD.

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@askfongjojo Actually... I looked into this and when you do a snaplink you get a different ETag.

I uploaded a file, then did a snaplink. Here are the two links in the moray database:

moray=# select objectid,_etag From manta where dirname = '/96c4ecc0-89aa-4e15-958f-3f50d5e2a68b/public';
               objectid               |  _etag
--------------------------------------+----------
 09241c22-8da6-c142-d515-e1bb0aab703f | 52D3803A
 09241c22-8da6-c142-d515-e1bb0aab703f | A7100205
(2 rows)

moray=#

And talking to Kelly, with buckets we're using the objectId for the ETag. So this will actually work the same way there. We'll get a new objectId and therefore a new ETag if we want to implement SnapLinks there. I will add a section about this to the RFD now.

UPDATE: ha! Now I see that with minfo you don't get the _etag that's stored in the database. That ETag is the objectId. That's unfortunate.

jclulow avatar jclulow commented on August 10, 2024

> UPDATE: ha! Now I see that with minfo you don't get the _etag that's stored in the database. That ETag is the objectId. That's unfortunate.

Right, the Moray record etag is a separate thing. It would change if you updated the attributes on the object; e.g., via mchattr or possibly even mchmod.

I don't think it's unfortunate per se. The user-visible ETag is intentionally related to the backing store object, and is one way a user can tell that two objects refer not just to the same contents but the same instance of uploading those contents -- even across snaplinks.

I think it'd be a breaking change not to preserve this behaviour, in the same vein as dropping deduplicated metering support.

askfongjojo avatar askfongjojo commented on August 10, 2024

Yes, I came to the conclusion from the minfo output. I should have made it clear in my original comment.

joshwilsdon avatar joshwilsdon commented on August 10, 2024

I added a section to the document about etags.

In case it was missed, I had also previously added a section about the billing concerns.

rmustacc avatar rmustacc commented on August 10, 2024

joshwilsdon avatar joshwilsdon commented on August 10, 2024

Thanks for your thoughts @rmustacc !

I will update the RFD to make notes about these points but I have a few questions for clarification:

> It feels hard to imagine we'll never want to know which objects were linked to one another.

What use-cases are you thinking of here?

We do have a "createdFrom" on existing snaplinks. But that includes the original manta path rather than an objectId which seems less than ideal. I'll make a note about this.

> However, changing the etag for existing snaplinks because they've been split into new objects does feel kind of problematic. It may be that the set of those impacted is such that they can deal with that, but it would break anyone relying on the traditional HTTP etag semantics.

Do you have any specific use-cases in mind that rely on the etags of SnapLinks currently?

rmustacc avatar rmustacc commented on August 10, 2024

joshwilsdon avatar joshwilsdon commented on August 10, 2024

> Do those help clarify things?

I'm not sure. You're suggesting the following sequence of operations?

  • mput an object ~~/public/object1
  • mln that object to ~~/public/object2
  • read the etag from ~~/public/object1
  • use the etag from ~~/public/object1 to do a conditional put on ~~/public/object2?

Which currently would work (and eliminate the snaplink) because object1 and object2 have the same etag. Whereas with the proposed changes, they'd have different etags, so if you get one and use that etag as a condition on the other, it won't work. You'd have to get the object you're going to put instead.

Is that correct?
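
The sequence above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `PreconditionFailed`, and the `links_share_id` switch are hypothetical names, the user-visible ETag is assumed to equal the object ID (as noted earlier in the thread), and none of this is Manta client or server code.

```python
import uuid


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response."""


class ToyStore:
    """Hypothetical in-memory model: the user-visible ETag is the object ID."""

    def __init__(self, links_share_id: bool):
        self.links_share_id = links_share_id
        self.objects = {}  # path -> (object_id, data)

    def etag(self, path):
        return self.objects[path][0]

    def put(self, path, data, if_match=None):
        # A conditional put fails when the supplied ETag is not current.
        if if_match is not None and self.etag(path) != if_match:
            raise PreconditionFailed(path)
        self.objects[path] = (str(uuid.uuid4()), data)

    def link(self, src, dst):
        object_id, data = self.objects[src]
        if not self.links_share_id:
            object_id = str(uuid.uuid4())  # proposed: the link is a new object
        self.objects[dst] = (object_id, data)


def cross_link_put_succeeds(links_share_id: bool) -> bool:
    store = ToyStore(links_share_id)
    store.put("/public/object1", b"v1")               # mput
    store.link("/public/object1", "/public/object2")  # mln
    etag1 = store.etag("/public/object1")             # read etag of object1
    try:
        store.put("/public/object2", b"v2", if_match=etag1)
        return True
    except PreconditionFailed:
        return False


print(cross_link_put_succeeds(True))   # True: links share the object ID
print(cross_link_put_succeeds(False))  # False: the link has its own ID
```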

rmustacc avatar rmustacc commented on August 10, 2024

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@rmustacc ahhh. I see now. Thanks!

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@rmustacc I've attempted to distill the points you've made and include them in the RFD:

please let me know if any of these misrepresent your points (and how) so I can fix them.

Thanks!

rmustacc avatar rmustacc commented on August 10, 2024

joshwilsdon avatar joshwilsdon commented on August 10, 2024

> FWIW, I wasn't trying to advocate for having that in the public API.

Ah, I didn't mean to make it sound that way either. I was intending to say this should be a property of the objects in moray / the metadata tier. Not that they should be exposed to customers. I'll try to make this clearer.

UPDATE: I've made a change here now. Hopefully that helps.

rmustacc avatar rmustacc commented on August 10, 2024

richardkiene avatar richardkiene commented on August 10, 2024

> No, I'm not trying to talk about cross-object stuff at all. I was trying
> to highlight the following:
>
>   • mput ~~/public/object1
>   • mln to ~~/public/object2
>   • read ~~/public/object2
>   • perform upgrade
>   • read ~~/public/object2
>
> If my understanding of what's written down at the moment is correct, the
> etag of object2 will change across the upgrade because the association
> is being broken and it's being made as a new object with a new uuid. In
> this case I'm not trying to refer to object1 at all.

@rmustacc do you see this as a blocking issue or just an issue that should be called out if we choose to head this direction?

Personally I think the only downsides are proxy and/or client cache misses (or effectively that), and a small amount of confusion for humans who have their snaplinked objects migrated.

rmustacc avatar rmustacc commented on August 10, 2024

joshwilsdon avatar joshwilsdon commented on August 10, 2024

> That said, it is a breaking change to the public API for an end user.

ETags and their specific behavior here are not (afaict) documented as being part of the public API, so maybe that changes the calculus a bit here?

richardkiene avatar richardkiene commented on August 10, 2024

> On 6/13/19 9:59, Richard Kiene wrote:
> > No, I'm not trying to talk about cross-object stuff at all. I was trying
> > to highlight the following:
> >
> > * mput ~~/public/object1
> > * mln to ~~/public/object2
> > * read ~~/public/object2
> > * perform upgrade
> > * read ~~/public/object2
> >
> > If my understanding of what's written down at the moment is correct, the
> > etag of object2 will change across the upgrade because the association
> > is being broken and it's being made as a new object with a new uuid. In
> > this case I'm not trying to refer to object1 at all.
>
> @rmustacc do you see this as a blocking issue or just an issue that should be called out if we choose to head this direction?
>
> Personally I think the only down sides are proxy and/or client cache misses (or effectively that), and a small amount of confusion for humans that have their snap linked objects migrated.
>
> Honestly, I don't know if these should be blocking or not. I can easily go down both sides of this particular point. If the etag of any existing object is changed without user action being taken, that violates the intent of an etag. If anyone had built a database and included what they expected the etag to be for doing later conditional puts, we'd be breaking that. That said, as you point out, the scope of impact is also probably pretty limited.
>
> I originally raised this as it wasn't clear if it was intentional or known that the etag would be changing for existing objects across the upgrade procedure. Same with the other bits. The intent was to discuss it and figure it out.
>
> That said, it is a breaking change to the public API for an end user. So it probably needs to be discussed and treated like that. We've traditionally avoided doing that in Manta.

Cool, that's helpful. Thanks @rmustacc !

jclulow avatar jclulow commented on August 10, 2024

> > That said, it is a breaking change to the public API for an end user.
>
> ETags and their specific behavior here are not (afaict) documented as being part of the public API, so maybe that changes the calculus a bit here?

Etags are mentioned several times in the public documentation; i.e., at https://apidocs.joyent.com/manta/api.html -- though our documentation could always stand to improve!

In particular, though it doesn't directly mention the word Etag, note that the phrase under PutObject, "The service is able to provide test/set semantics for you if you use HTTP conditional request semantics", refers to the part of the HTTP specification (conditional requests) which does deal with Etags. They're also listed as metadata you get back from ListDirectory, and the Etag header is present in the example responses.

In general, over the lifetime of the Manta API, we've tried very hard not to make breaking changes to the user-visible functionality of the API. The emphasis has been on interface stability, even in the face of perceived opportunities created by, at times, underdeveloped documentation.

I can't actually think of a time when we've made a change to the observable behaviour like this. If we're going to do it now, we'll want to figure out what it means for client software, for the documentation, communication of breakage, the roll-out plan, etc.

qdzlug avatar qdzlug commented on August 10, 2024

Wanted to step back and re-state the problem to see if I fully understand it.

A customer using snaplinks with the current snaplink logic is upgraded to use the proposed snaplink v2 logic.

If this customer is storing the ETag associated with that snaplink and using it as part of their application/process, this will be broken by the upgrade to snaplinks v2, because the ETag would change.

If I'm correct, to get into this situation requires that a user:

  1. Is using snaplinks.
  2. Is also using the ETag in some way.
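
As a rough sketch of that situation: the snippet below models an upgrade that gives every path sharing an object ID with an earlier path a fresh ID, then checks which externally recorded ETags no longer match. The `split_snaplinks` function, the paths, and the IDs are hypothetical, and which of the linked paths keeps the original ID is an assumption made for illustration.

```python
import uuid


def split_snaplinks(objects: dict) -> dict:
    """Hypothetical upgrade step: every path that shares an object ID with an
    earlier path is given a fresh ID, so no two paths share one afterwards."""
    seen = set()
    upgraded = {}
    for path, object_id in objects.items():
        if object_id in seen:
            object_id = str(uuid.uuid4())
        seen.add(object_id)
        upgraded[path] = object_id
    return upgraded


before = {
    "/public/object1": "id-1",
    "/public/object2": "id-1",  # snaplink of object1
    "/public/other": "id-2",    # never linked
}

# ETags the customer recorded in their own system before the upgrade.
recorded_etags = dict(before)

after = split_snaplinks(before)
stale = sorted(p for p, etag in recorded_etags.items() if after[p] != etag)
print(stale)  # ['/public/object2']
```

Only paths that were snaplinks see a changed ETag in this model; objects that were never linked are unaffected.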

I understand the desire to keep the public API stable and realize that any potential breaking change like this is going to be difficult to manage and implement. However, I think it's a fair tradeoff for the benefits that we would gain from RFD-171 being implemented, especially given the use case of our anchor tenant.

joshwilsdon avatar joshwilsdon commented on August 10, 2024

@qdzlug That matches my understanding of the discussion here at least. Though to be more specific on your point 2 ("Is also using the ETag in some way"), it's actually only a very specific use, which I believe is only hypothetical at this point (someone feel free to correct me if they know of an existing user doing this), where the user has stored the ETags for their existing SnapLinked objects in an external system for later use.

My understanding of the general use case of doing a conditional put involves:

  1. get an object (must be a SnapLink to be relevant here)
  2. do something with it
  3. put the object and use the ETag to ensure that the object didn't change while you were doing something, and on ETag failure (because something else changed the object in the meantime) retry from 1

This use-case should not be broken as far as I can tell. At most it will experience a larger number of retries than usual if steps 1 and 3 fall on different sides of the upgrade.
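
The get / modify / conditional-put loop can be sketched as follows. This is a toy model under stated assumptions, not Manta client code: `ToyStore`, `reassign_etag` (standing in for the upgrade giving an existing object a new ID), and `update_with_retry` are hypothetical names.

```python
import uuid


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response."""


class ToyStore:
    """Hypothetical in-memory store keyed by path."""

    def __init__(self):
        self.objects = {}  # path -> (etag, data)

    def get(self, path):
        return self.objects[path]

    def put(self, path, data, if_match=None):
        if if_match is not None and self.objects[path][0] != if_match:
            raise PreconditionFailed(path)
        self.objects[path] = (str(uuid.uuid4()), data)

    def reassign_etag(self, path):
        """Simulates the upgrade giving an existing object a new ID."""
        _, data = self.objects[path]
        self.objects[path] = (str(uuid.uuid4()), data)


def update_with_retry(store, path, transform, between_get_and_put=None):
    """Run the three steps, retrying from step 1 on an ETag mismatch."""
    attempts = 0
    while True:
        attempts += 1
        etag, data = store.get(path)   # step 1: get the object
        new_data = transform(data)     # step 2: do something with it
        if between_get_and_put and attempts == 1:
            between_get_and_put()      # inject a change on the first pass only
        try:
            store.put(path, new_data, if_match=etag)  # step 3: conditional put
            return attempts
        except PreconditionFailed:
            continue


store = ToyStore()
store.put("/public/object2", b"hello")
attempts = update_with_retry(
    store, "/public/object2", bytes.upper,
    between_get_and_put=lambda: store.reassign_etag("/public/object2"),
)
print(attempts, store.get("/public/object2")[1])  # 2 b'HELLO'
```

In this model the upgrade landing between steps 1 and 3 costs one extra pass through the loop, after which the put succeeds.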

I'm also not clear on what use-cases require making SnapLinks and doing a PUT over the link target making it no longer a SnapLink. Does anyone have examples of things that do this now?

qdzlug avatar qdzlug commented on August 10, 2024

Closing this issue out; let's pick up discussions regarding 171 on the manta-dev list, please.

Jay

bcantrill avatar bcantrill commented on August 10, 2024

As long as the RFD is in anything less than a published state (and even then, honestly), the right place for the discussion is this issue. We can agree that discussion has reasonably converged (or has diverged in a way that is static), but simply closing out the discussion here by closing this issue very much runs counter to the intent of an RFD.

trentm avatar trentm commented on August 10, 2024

RFD 171 is abandoned.
