RFD 171 Discussion!!! 🎉,about tritondatacenter/rfd

Comments (32)

richardkiene commented on August 10, 2024

> It would probably be good to address metering and billing in this proposal. I believe that today, we are able to bill people just once for an object -- even if it has been snaplinked multiple times, such as when creating a versioning system as you describe in the RFD.

@jclulow For the purpose of this RFD, I think it's best that billing be out of scope. Billing will most certainly be undergoing some big changes in the near future. Thanks!

from rfd.

jclulow commented on August 10, 2024

It would probably be good to address metering and billing in this proposal. I believe that today, we are able to bill people just once for an object -- even if it has been snaplinked multiple times, such as when creating a versioning system as you describe in the RFD.

When determining storage usage for billing purposes, I believe we deduplicate physically stored objects based on their object ID today, which has the effect of billing people only once for each concrete object stored. If we were to generate a new object ID for each snaplink, I think we'd need another way to correlate objects which use the same disk storage for metering purposes.
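To make the metering concern concrete, here is a minimal sketch of deduplicating usage by object ID. It is an illustration only, not how Manta metering is actually implemented: the record layout and the field names (`owner`, `path`, `objectId`, `size`) are assumptions made up for this example.

```python
from collections import defaultdict


def billable_bytes(records):
    """Sum stored bytes per owner, counting each objectId only once.

    `records` is a hypothetical list of metadata rows; the field names
    are assumptions for illustration, not the real metadata schema.
    """
    seen = set()
    totals = defaultdict(int)
    for rec in records:
        key = (rec["owner"], rec["objectId"])
        if key in seen:
            continue  # another path pointing at an object already counted
        seen.add(key)
        totals[rec["owner"]] += rec["size"]
    return dict(totals)


# Today: a snaplink shares the objectId of its source.
records = [
    {"owner": "alice", "path": "/alice/public/v1", "objectId": "obj-1", "size": 100},
    {"owner": "alice", "path": "/alice/public/v2", "objectId": "obj-1", "size": 100},
    {"owner": "alice", "path": "/alice/public/other", "objectId": "obj-2", "size": 50},
]
shared = billable_bytes(records)

# Proposed: every link gets its own objectId, so the same logic counts it twice.
split = [dict(rec, objectId=f"new-{i}") for i, rec in enumerate(records)]
unshared = billable_bytes(split)
```

With shared object IDs the snaplinked object is counted once; once every link has its own ID, the same deduplication no longer correlates the two paths, which is why some other correlation key would be needed.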


joshwilsdon commented on August 10, 2024

@jclulow Thanks. I've added an Open Questions section with a note about this.


jclulow commented on August 10, 2024

> @jclulow For the purpose of this RFD, I think it's best that billing be out of scope. Billing will most certainly be undergoing some big changes in the near future. Thanks!

You're welcome!

To be clear, I'm merely suggesting that it be addressed in the document. Either the proposed solution will maintain the property that snaplinked objects are able to be associated and thus not double-billed, or it won't.

Moving it out of scope is fine, but it'd be best to explicitly record the decision to rule out deduplicated billing in the future if that's the case.


rjloura commented on August 10, 2024

When we want to rebalance an object, each "link" to that object will be handled independently without need for the rebalancer to be concerned with the relationship. If you create a link to an object, then repair/rebalance that new object, or want to increase the number of copies of that object, you can do so independently of the original object.

I was thinking about the cases where you may separate the hardlink from the original during a rebalance. During a rebalance there is no guarantee that all objects from a given source shark are going to end up on the same destination shark. However, since snaplinks differ from Unix hardlinks in that writes to either "path" are not mirrored in the objects as they are in hardlinks, this will not be an issue during a rebalance where the hardlink is separated from its source.

If we do execute a rebalance and the link is separated from its source we could inadvertently increase the object's redundancy. I don't see any negative impacts there aside from the aforementioned billing concern.

Overall looks good from a rebalance perspective, thanks!


joshwilsdon commented on August 10, 2024

@rjloura thanks for your comments. I had thought about this (that during rebalance the objects might go elsewhere) but that seemed like it wasn't a problem since they'd have different objectIds and would no longer be required to be colocated. Thanks for confirming that!


askfongjojo commented on August 10, 2024

What will we do with etag in the metadata? Currently the etags are identical for the snaplinked objects. There may be some usage of this information for the user to tell if these objects have the same contents. The user application can/should probably use content-md5 for this purpose but it's hard to tell if we'll break customer use cases with the change proposed here. It may be best that we keep the etag the same while the objectIds differ? (This leads to another question - is there any dependency on etag being the same as the objectId in manta? ... Correction: nvm this, the objectId and etag for an object are never the same.)


joshwilsdon commented on August 10, 2024

@askfongjojo I'd assume that we can copy the ETag to the new object. If either of the two new objects are modified, they'll get a new ETag which seems fine since they're now independent objects. I'll add a note about this in the RFD and someone can correct me if I'm misunderstanding something here.

Thanks for pointing this out!

UPDATE: After some discussion with Kelly, we might have an issue here with ETags being different for buckets. I'll do some more looking into this and include notes about that in my update to the RFD.


joshwilsdon commented on August 10, 2024

@askfongjojo Actually... I looked into this and when you do a snaplink you get a different ETag.

I uploaded a file, then did a snaplink. Here are the two links in the moray database:

moray=# select objectid,_etag From manta where dirname = '/96c4ecc0-89aa-4e15-958f-3f50d5e2a68b/public';
               objectid               |  _etag
--------------------------------------+----------
 09241c22-8da6-c142-d515-e1bb0aab703f | 52D3803A
 09241c22-8da6-c142-d515-e1bb0aab703f | A7100205
(2 rows)

moray=#

And talking to Kelly, with buckets we're using the objectId for the ETag. So this will actually work the same way there. We'll get a new objectId and therefore a new ETag if we want to implement SnapLinks there. I will add a section about this to the RFD now.

UPDATE: ha! Now I see that with minfo you don't get the _etag that's stored in the database. That ETag is the objectId. That's unfortunate.


jclulow commented on August 10, 2024

> UPDATE: ha! Now I see that with minfo you don't get the _etag that's stored in the database. That ETag is the objectId. That's unfortunate.

Right, the Moray record etag is a separate thing. It would change if you updated the attributes on the object; e.g., via mchattr or possibly even mchmod.

I don't think it's unfortunate per se. The user-visible ETag is intentionally related to the backing store object, and is one way a user can tell that two objects refer not just to the same contents but the same instance of uploading those contents -- even across snaplinks.

I think it'd be a breaking change not to preserve this behaviour, in the same vein as dropping deduplicated metering support.


askfongjojo commented on August 10, 2024

Yes, I came to the conclusion from the minfo output. I should have made it clear in my original comment.


joshwilsdon commented on August 10, 2024

I added a section to the document about etags.

In case it was missed, I had also previously added a section about the billing concerns.


rmustacc commented on August 10, 2024

Thanks for putting this together @jwilsdon. Using hardlinks on the storage systems and a separate object ID in the metadata tier is an interesting idea. I have a couple of additional thoughts that might be worth thinking through. Some of these may sound like a variant on the wandering link problem, which I think we may have in a slightly different form.

1) While @rjloura points out some of the effects of separating the hardlinks during a rebalance operation, there is one that we haven't talked about so far. The physical storage used when splitting a hardlink will (assuming no changes to durability) double, as we've taken two logical objects which were backed by the same storage on disk and now duplicated that storage. When that storage is small, that's not a big deal. However, if this were used for larger objects, the available storage could start to get chipped away, leaving operators with much less available storage than they expected. This makes me think that always treating the two as independent objects may cause usage spikes that we may not want, especially as it can be unpredictable when that occurs. While unlikely, in a relatively full system without sufficient excess capacity, it may not be possible to actually rebalance large objects which are hardlinks that are being split up. I know this could happen all the same without hardlinks, but it seems like the kind of thing we'd want to make clear to operators, as getting new hardware always has a long lead time.

2) It feels hard to imagine we'll never want to know which objects were linked to one another. So even if that's not in metadata in its current form, I'm not sure whether or not we want to throw that information away forever. Maybe there's some compact way we can record that? If we need to manually find out what things are hardlinks, there is no fast way of doing that on the file system. The file system doesn't usually keep track of what names are shared as hard links. This means that to determine what's hard linked, we'll need to iterate over every file and construct a mapping from inode to the set of paths that represent it. While it's true that this is spread out across multiple shards today, making this query on the 7200 RPM disks of the storage tier and going through every object on some of our larger systems is going to be painful in the same way that it is today, especially as we'll have to manually perform the join, and the capacity and number of objects per compute node is only increasing with every generation of hardware. I think we should probably think carefully about whether or not we'll ever need this in the future. My fear, perhaps unfounded, is that we're foreclosing on ever being able to use this information now. Given case 1, it seems like knowing that we're splitting or being able to keep things around here could possibly be important. Maybe it's worth talking through, if we did need this information, how we might do it in a way that's practical and scalable, or explicitly saying that we're proposing to foreclose on it and its use here?

3) What happens in the case of 'full storage nodes'? Today, a storage node being full technically doesn't impact the snaplink or its creation. Something we'll want to think through is what happens when the storage node you're on is considered full from a storage capacity perspective. One option is that we could say, OK, well, let's just ignore this and create the link anyway, as the amount of additional space used won't be very much, and if we're taking a 97% limit on a large pool with many TiB of data, then the cost of this is noise. Of course, it's non-zero, so if we end up in a situation where we start doing lots and lots of snaplinks, we may hit a scalability point where we're further pushing the pool beyond where we want, and if we push it very, very hard, we could run out of space to create an additional hardlink. It's probably the case that this cost is less than that of a link in the metadata tier and perhaps unlikely to occur in practice, but it does represent a point where we can hit a scaling limit that isn't horizontally scalable.

4) Regarding how to upgrade into this world, I'd probably think about treating this as a deferred action that an operator should explicitly opt into and not have it done without their knowledge. If we could arrange it without requiring downtime for specific operations, that'd be useful. However, changing the etag for existing snaplinks because they've been split into new objects does feel kind of problematic. It may be that the set of those impacted is such that they can deal with that, but it would break anyone relying on the traditional HTTP etag semantics. On the flip side, one could argue that all objects have, in some way, changed, but from the perspective of someone using the system today they don't know or care per se; it'd just look like my etag got changed out from under me when I took no action, and that'd leave me very confused. For what it's worth, I consider the upgrade case a different one than what was brought up.

Hopefully all this is useful.
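As a rough illustration of point 2, the sketch below shows what "iterate over every file and construct a mapping from inode to paths" looks like on a POSIX file system. It is not a tool from the Manta codebase; the function name `find_hardlink_groups` and the file names are made up for this example, and it assumes a file system that supports hard links.

```python
import os
import tempfile
from collections import defaultdict


def find_hardlink_groups(root):
    """Walk every file under `root` and group paths by (device, inode).

    There is no index to consult: the only way to learn which names
    share an inode is to stat every file and do the join ourselves.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:  # only files that have more than one name
                by_inode[(st.st_dev, st.st_ino)].append(path)
    return [sorted(paths) for paths in by_inode.values() if len(paths) > 1]


with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "object1")
    b = os.path.join(root, "object2")
    c = os.path.join(root, "unrelated")
    with open(a, "w") as f:
        f.write("data")
    os.link(a, b)  # second name for the same inode
    with open(c, "w") as f:
        f.write("data")  # same contents, different inode

    groups = find_hardlink_groups(root)
    group_names = [[os.path.basename(p) for p in g] for g in groups]
```

The cost is one `stat` per file plus an in-memory join, which is the part that gets painful as the number of objects per storage node grows.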


joshwilsdon commented on August 10, 2024

Thanks for your thoughts @rmustacc !

I will update the RFD to make notes about these points but I have a few questions for clarification:

> It feels hard to imagine we'll never want to know which objects were linked to one another.

What use-cases are you thinking of here?

We do have a "createdFrom" on existing snaplinks. But that includes the original manta path rather than an objectId which seems less than ideal. I'll make a note about this.

> However, changing the etag for existing snaplinks because they've been split into new objects does feel kind of problematic. It may be that the set of those impacted is such that they can deal with that, but it would break anyone relying on the traditional HTTP etag semantics.

Do you have any specific use-cases in mind that rely on the etags of SnapLinks currently?


rmustacc commented on August 10, 2024

On 6/12/19 14:52, Josh Wilsdon wrote:

> Thanks for your thoughts @rmustacc !
>
> I will update the RFD to make notes about these points but I have a few questions for clarification:

OK. Feel free to also reply to them here before you update the RFD. I imagine there's a lot of different ways we can approach these.

> > It feels hard to imagine we'll never want to know which objects were linked to one another.
>
> What use-cases are you thinking of here?
>
> We do have a "createdFrom" on existing snaplinks. But that includes the original manta path rather than an objectId which seems less than ideal. I'll make a note about this.

I was referring to cases that I described earlier in the discussion. Mostly about us caring about it for internal purposes so we can properly deal with internal issues around rebalancing, etc.

> > However, changing the etag for existing snaplinks because they've been split into new objects does feel kind of problematic. It may be that the set of those impacted is such that they can deal with that, but it would break anyone relying on the traditional HTTP etag semantics.
>
> Do you have any specific use-cases in mind that rely on the etags of SnapLinks currently?

Any conditional PUT to the snaplink-based path would no longer necessarily be correct, and the same goes for anyone else who relies on the etag. Once I put an object or mln it, when I do a GET, there's a normal, valid etag that I can then use for conditional PUTs, CDNs, etc. For a Manta consumer, whether the object was put with a PUT, MPU, or snaplink doesn't really matter when they're doing a GET; it's something that has an etag and, in general, if the underlying object hasn't changed, the etag shouldn't change per HTTP semantics.

Do those help clarify things?


joshwilsdon commented on August 10, 2024

> Do those help clarify things?

I'm not sure. You're suggesting the following sequence of operations?

* mput an object ~~/public/object1
* mln that object to ~~/public/object2
* read the etag from ~~/public/object1
* use the etag from ~~/public/object1 to do a conditional put on ~~/public/object2?

Which currently would work (and eliminate the symlink) because object1 and object2 have the same etag. Whereas in the proposed changes, they'd have different etags so if you get one and use that etag as a condition to the other, it won't work. You'd have to get the object you're going to put instead.

Is that correct?


rmustacc commented on August 10, 2024

On 6/12/19 15:15, Josh Wilsdon wrote:

> > Do those help clarify things?
>
> I'm not sure. You're suggesting the following sequence of operations?
>
> * mput an object ~~/public/object1
> * mln that object to ~~/public/object2
> * read the etag from ~~/public/object1
> * use the etag from ~~/public/object1 to do a conditional put on ~~/public/object2?
>
> Which currently would work (and eliminate the symlink) because object1 and object2 have the same etag. Whereas in the proposed changes, they'd have different etags so if you get one and use that etag as a condition to the other, it won't work. You'd have to get the object you're going to put instead.

No, I'm not trying to talk about cross-object stuff at all. I was trying to highlight the following:

* mput ~~/public/object1
* mln to ~~/public/object2
* read ~~/public/object2
* perform upgrade
* read ~~/public/object2

If my understanding of what's written down at the moment is correct, the etag of object2 will change across the upgrade because the association is being broken and it's being made as a new object with a new uuid. In this case I'm not trying to refer to object1 at all.
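The sequence above can be sketched in a few lines. This is a toy model, not Manta code: it assumes, as discussed earlier in the thread, that the user-visible ETag is the objectId, and `upgrade_split_snaplinks` is a hypothetical stand-in for whatever the upgrade would do. Which of the linked paths keeps the original ID is also an assumption here.

```python
import uuid


def user_visible_etag(record):
    # Assumption taken from the thread: the ETag a client sees is the objectId.
    return record["objectId"]


def upgrade_split_snaplinks(namespace):
    """Hypothetical upgrade step: give every extra link its own objectId.

    The first path seen for an objectId keeps it; later paths that share
    it are assigned a fresh uuid. That choice is made up for this sketch.
    """
    seen = set()
    upgraded = {}
    for path, rec in namespace.items():
        if rec["objectId"] in seen:
            rec = dict(rec, objectId=str(uuid.uuid4()))
        seen.add(rec["objectId"])
        upgraded[path] = rec
    return upgraded


shared_id = str(uuid.uuid4())
namespace = {
    "~~/public/object1": {"objectId": shared_id},  # mput
    "~~/public/object2": {"objectId": shared_id},  # mln
}

etag_before = user_visible_etag(namespace["~~/public/object2"])  # read
namespace_after = upgrade_split_snaplinks(namespace)             # perform upgrade
etag_after = user_visible_etag(namespace_after["~~/public/object2"])  # read again
```

In this model the client never touched object2, yet the ETag it reads differs before and after the upgrade, which is the behaviour being questioned.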


joshwilsdon commented on August 10, 2024

@rmustacc ahhh. I see now. Thanks!


joshwilsdon commented on August 10, 2024

@rmustacc I've attempted to distill the points you've made and include them in the RFD:

I added a Capacity Questions section, which includes:
- Concerns about additional capacity being used due to rebalancing of "split" SnapLinks
- Problems when trying to create SnapLinks on full storage zones
I added a section about the additional concerns you had about SnapLinks and ETags across upgrade
I added a section about Identifying SnapLinks

Please let me know if any of these misrepresent your points (and how) so I can fix them.

Thanks!


rmustacc commented on August 10, 2024

On 6/12/19 16:50, Josh Wilsdon wrote:

> * I added a section about [Identifying SnapLinks](https://github.com/joyent/rfd/blob/master/rfd/0171/README.md#identifying-links)

FWIW, I wasn't trying to advocate for having that in the public API. I was trying to point out that there might be cases where we will want to know what things are actually hardlinked for various internal reasons as part of the implementation and having a mapping of what's hardlinked together on a given system may be important the same way that having to solve the 'walking links' problem was important for GC. It's hard to predict the future and know how important it is or not, the main thing is that we should probably figure out whether or not we want to foreclose on having any internal information about these.


joshwilsdon commented on August 10, 2024

> FWIW, I wasn't trying to advocate for having that in the public API.

Ah, I didn't mean to make it sound that way either. I was intending to say this should be a property of the objects in moray / the metadata tier. Not that they should be exposed to customers. I'll try to make this clearer.

UPDATE: I've made a change here now. Hopefully that helps.


rmustacc commented on August 10, 2024

On 6/12/19 17:25, Josh Wilsdon wrote:

> > FWIW, I wasn't trying to advocate for having that in the public API.
>
> Ah, I didn't mean to make it sound that way either. I was intending to say this should be a property of the objects in moray / the metadata tier. Not that they should be exposed to customers. I'll try to make this clearer.

I don't even know if they should live there either, for what it's worth. I was hoping you or others would chime in on the discussion thread and share their thoughts about keeping that info around or not. I just think it's something we should discuss as to whether it's worth keeping or not and if we want to keep that information somewhere, where should we and how should we so we can still preserve the other properties of the system that were problems as you pointed out in the RFD.


richardkiene commented on August 10, 2024

> No, I'm not trying to talk about cross-object stuff at all. I was trying to highlight the following:
>
> * mput ~~/public/object1
> * mln to ~~/public/object2
> * read ~~/public/object2
> * perform upgrade
> * read ~~/public/object2
>
> If my understanding of what's written down at the moment is correct, the etag of object2 will change across the upgrade because the association is being broken and it's being made as a new object with a new uuid. In this case I'm not trying to refer to object1 at all.

@rmustacc do you see this as a blocking issue or just an issue that should be called out if we choose to head this direction?

Personally I think the only down sides are proxy and/or client cache misses (or effectively that), and a small amount of confusion for humans that have their snap linked objects migrated.


rmustacc commented on August 10, 2024

On 6/13/19 9:59, Richard Kiene wrote:

> > No, I'm not trying to talk about cross-object stuff at all. I was trying to highlight the following:
> >
> > * mput ~~/public/object1
> > * mln to ~~/public/object2
> > * read ~~/public/object2
> > * perform upgrade
> > * read ~~/public/object2
> >
> > If my understanding of what's written down at the moment is correct, the etag of object2 will change across the upgrade because the association is being broken and it's being made as a new object with a new uuid. In this case I'm not trying to refer to object1 at all.
>
> @rmustacc do you see this as a blocking issue or just an issue that should be called out if we choose to head this direction?
>
> Personally I think the only down sides are proxy and/or client cache misses (or effectively that), and a small amount of confusion for humans that have their snap linked objects migrated.

Honestly, I don't know if these should be blocking or not. I can easily go down both sides of this particular point. If the etag of any existing object is changed without user action being taken, that violates the intent of an etag. If anyone had built a database and included what they expected the etag to be for doing later conditional puts, we'd be breaking that. That said, as you point out, the scope of impact is also probably pretty limited. I originally raised this as it wasn't clear if it was intentional or known that the etag would be changing for existing objects across the upgrade procedure. Same with the other bits. The intent was to discuss it and figure it out. That said, it is a breaking change to the public API for an end user. So it probably needs to be discussed and treated like that. We've traditionally avoided doing that in Manta.


joshwilsdon commented on August 10, 2024

> That said, it is a breaking change to the public API for an end user.

ETags and their specific behavior here are not (afaict) documented as being part of the public API, so maybe that changes the calculus a bit here?


richardkiene commented on August 10, 2024

> On 6/13/19 9:59, Richard Kiene wrote:
>
> > > No, I'm not trying to talk about cross-object stuff at all. I was trying to highlight the following:
> > >
> > > * mput ~~/public/object1
> > > * mln to ~~/public/object2
> > > * read ~~/public/object2
> > > * perform upgrade
> > > * read ~~/public/object2
> > >
> > > If my understanding of what's written down at the moment is correct, the etag of object2 will change across the upgrade because the association is being broken and it's being made as a new object with a new uuid. In this case I'm not trying to refer to object1 at all.
> >
> > @rmustacc do you see this as a blocking issue or just an issue that should be called out if we choose to head this direction?
> >
> > Personally I think the only down sides are proxy and/or client cache misses (or effectively that), and a small amount of confusion for humans that have their snap linked objects migrated.
>
> Honestly, I don't know if these should be blocking or not. I can easily go down both sides of this particular point. If the etag of any existing object is changed without user action being taken, that violates the intent of an etag. If anyone had built a database and included what they expected the etag to be for doing later conditional puts, we'd be breaking that. That said, as you point out, the scope of impact is also probably pretty limited. I originally raised this as it wasn't clear if it was intentional or known that the etag would be changing for existing objects across the upgrade procedure. Same with the other bits. The intent was to discuss it and figure it out. That said, it is a breaking change to the public API for an end user. So it probably needs to be discussed and treated like that. We've traditionally avoided doing that in Manta.

Cool, that's helpful. Thanks @rmustacc !


jclulow commented on August 10, 2024

> > That said, it is a breaking change to the public API for an end user.
>
> ETags and their specific behavior here are not (afaict) documented as being part of the public API, so maybe that changes the calculus a bit here?

Etags are mentioned several times in the public documentation; i.e., at https://apidocs.joyent.com/manta/api.html -- though our documentation could always stand to improve!

In particular, though it doesn't directly mention the word Etag, note that the phrase under PutObject, "The service is able to provide test/set semantics for you if you use HTTP conditional request semantics", refers to the part of the HTTP specification (conditional requests) which does deal with Etags. They're also listed as metadata you get back from ListDirectory, and the Etag header is present in the example responses.

In general, over the lifetime of the Manta API, we've tried very hard not to make breaking changes to the user-visible functionality of the API. The emphasis has been on interface stability, even in the face of perceived opportunities created by, at times, underdeveloped documentation.

I can't actually think of a time when we've made a change to the observable behaviour like this. If we're going to do it now, we'll want to figure out what it means for client software, for the documentation, communication of breakage, the roll-out plan, etc.


qdzlug commented on August 10, 2024

Wanted to step back and re-state the problem to see if I fully understand it.

A customer using snaplinks with the current snaplink logic is upgraded to use the proposed snaplink v2 logic.

If this customer is storing the ETag associated with that snaplink and using it as part of their application/process, this will be broken by the upgrade to snaplinks v2, because the ETag would change.

If I'm correct, to get into this situation requires that a user:

1. Is using snaplinks.
2. Is also using the ETag in some way.

I understand the desire to keep the public API stable and realize that any potential breaking change like this is going to be difficult to manage and implement. However, I think it's a fair tradeoff for the benefits that we would gain from RFD-171 being implemented, especially given the use case of our anchor tenant.


joshwilsdon commented on August 10, 2024

@qdzlug That matches my understanding of the discussion here at least. Though, to be more specific on your point 2 ("Is also using the ETag in some way"), it's actually only a very specific use, which I believe is only hypothetical at this point (someone feel free to correct me if they know of an existing user doing this), where the user has stored the ETags for their existing SnapLinked objects in an external system for later use.

My understanding of the general use case of doing a conditional put involves:

1. get an object (must be a SnapLink to be relevant here)
2. do something with it
3. put the object and use the ETag to ensure that the object didn't change while you were doing something, and on ETag failure (because something else changed the object in the meantime) retry from 1

This use-case should not be broken as far as I can tell. At most it will experience a larger number of retries than usual if steps 1 and 3 fall on different sides of the upgrade.
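The get / modify / conditional-put loop described above can be sketched as follows. This is an in-memory toy, not the Manta client API: `FakeStore`, `PreconditionFailed`, `reassign_etag`, and `update_with_retry` are all hypothetical names, and `if_match` only mimics the idea of the HTTP `If-Match` precondition.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the supplied ETag no longer matches the stored one."""


class FakeStore:
    """In-memory stand-in for an object store with If-Match semantics."""

    def __init__(self):
        self._objects = {}

    def get(self, path):
        body, etag = self._objects[path]
        return body, etag

    def put(self, path, body, if_match=None):
        current = self._objects.get(path)
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(path)
        etag = str(uuid.uuid4())
        self._objects[path] = (body, etag)
        return etag

    def reassign_etag(self, path):
        # Simulates the upgrade: same contents, new ETag.
        body, _old = self._objects[path]
        self._objects[path] = (body, str(uuid.uuid4()))


def update_with_retry(store, path, transform, max_attempts=5, between=None):
    for attempt in range(1, max_attempts + 1):
        body, etag = store.get(path)      # step 1: get the object
        new_body = transform(body)        # step 2: do something with it
        if between is not None:
            between(attempt)              # hook to inject the upgrade mid-flight
        try:
            store.put(path, new_body, if_match=etag)  # step 3: conditional put
            return attempt
        except PreconditionFailed:
            continue                      # ETag changed underneath us: retry from 1
    raise RuntimeError("gave up after %d attempts" % max_attempts)


store = FakeStore()
store.put("~~/public/object2", "v1")


def upgrade_once(attempt):
    if attempt == 1:
        store.reassign_etag("~~/public/object2")


attempts = update_with_retry(store, "~~/public/object2", str.upper, between=upgrade_once)
final_body, _etag = store.get("~~/public/object2")
```

In this sketch the upgrade lands between steps 1 and 3 of the first attempt, so the loop takes one extra retry and then succeeds. A client that instead stored the old ETag externally and never re-reads the object would not recover this way.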

I'm also not clear on what use-cases require making SnapLinks and doing a PUT over the link target making it no longer a SnapLink. Does anyone have examples of things that do this now?


qdzlug commented on August 10, 2024

Closing this issue out; let's pick up discussions regarding 171 on the manta-dev list, please.

Jay


bcantrill commented on August 10, 2024

As long as the RFD is in anything less than a published state (and even then, honestly), the right place for the discussion is this issue. We can agree that discussion has reasonably converged (or has diverged in a way that is static), but simply closing out the discussion here by closing this issue very much runs counter to the intent of an RFD.


trentm commented on August 10, 2024

RFD 171 is abandoned.
