uazu / qcell

Statically-checked alternatives to RefCell and RwLock

License: Apache License 2.0

Rust 98.07% Perl 1.06% Shell 0.87%

qcell's Introduction

Statically-checked alternatives to RefCell or RwLock

Cell types that give compilation errors instead of panicking at runtime as RefCell does, or that exchange the fine-grained locking of RwLock for coarser-grained locking of a separate owner object.

Documentation

See the crate documentation.

License

This project is licensed under either the Apache License version 2 or the MIT license, at your option. (See LICENSE-APACHE and LICENSE-MIT).

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

qcell's Issues

Be aware of pointer provenance

There are new APIs proposed in 95228 which may eventually lead to the deprecation of ptr as usize casts.

qcell uses these casts in many places, namely in functions that return a QCellOwnerID, as well as to check the uniqueness of cells in rw2/rw3 calls.

Since they're only used for comparisons, I don't think qcell really needs to worry about retaining provenance information (i.e. the change would be to replace ptr as usize with ptr.addr()).

I figured I'd open this issue to keep an eye on strict_provenance and prepare to update qcell if/when those APIs become stable.
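
As a rough illustration of the comparison-only use described above, here is a minimal sketch of an address helper. The names `addr_of_ref` and `same_address` are hypothetical and are not part of qcell; the sketch uses plain casts so that it also builds on toolchains without the strict-provenance methods.

```rust
/// Returns the address of a reference as a plain integer.
/// Hypothetical helper for illustration, not qcell's API.
pub fn addr_of_ref<T: ?Sized>(r: &T) -> usize {
    // Cast to a thin pointer first: a fat pointer (slice, trait object)
    // cannot be cast to `usize` directly. Where the strict-provenance
    // API is available, the final cast could be written as `.addr()`.
    r as *const T as *const () as usize
}

/// True if both references start at the same address.
/// Only the integer addresses are compared, so no provenance is needed.
pub fn same_address<T: ?Sized, U: ?Sized>(a: &T, b: &U) -> bool {
    addr_of_ref(a) == addr_of_ref(b)
}
```

Since the result is only ever compared and never turned back into a pointer, losing provenance here should be harmless.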

`rwn` to borrow `N` cells

Hello,

Could there be an API like the one in std? The signature would look like:

pub fn rwn<'a, T: ?Sized, const N: usize>(&'a mut self, qc: [&'a QCell<T>; N]) -> [&'a mut T; N];
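
To make the request concrete, here is a minimal sketch of how such a call might be checked, written against a stand-in owner and cell rather than qcell's real types. `MiniOwner`, `MiniCell` and the global counter are assumptions made for the example; counter overflow is deliberately ignored here.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in ID source for the sketch (overflow is not handled here).
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

pub struct MiniOwner {
    id: u64,
}

pub struct MiniCell<T: ?Sized> {
    owner: u64,
    value: UnsafeCell<T>,
}

impl MiniOwner {
    pub fn new() -> Self {
        MiniOwner { id: NEXT_ID.fetch_add(1, Ordering::Relaxed) }
    }

    pub fn cell<T>(&self, value: T) -> MiniCell<T> {
        MiniCell { owner: self.id, value: UnsafeCell::new(value) }
    }

    /// Mutably borrows `N` cells at once, panicking on a foreign owner
    /// or on the same cell appearing twice (roughly N^2/2 comparisons).
    pub fn rwn<'a, T: ?Sized, const N: usize>(
        &'a mut self,
        cells: [&'a MiniCell<T>; N],
    ) -> [&'a mut T; N] {
        for (i, cell) in cells.iter().enumerate() {
            assert_eq!(cell.owner, self.id, "cell belongs to a different owner");
            for earlier in &cells[..i] {
                let a = *cell as *const MiniCell<T> as *const ();
                let b = *earlier as *const MiniCell<T> as *const ();
                assert!(a != b, "cannot borrow the same cell twice");
            }
        }
        // SAFETY: the owner is borrowed mutably for 'a, every cell belongs
        // to it, and all cells were just checked to be distinct.
        cells.map(|cell| unsafe { &mut *cell.value.get() })
    }
}
```

A call would then look like `let [x, y, z] = owner.rwn([&a, &b, &c]);`.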

Evaluate GhostCell, maybe add it

See the crate docs for the links to ghost_cell.rs. This uses lifetimes. It's not clear from the usage there how invasive the lifetime annotations would be in the code. It does offer one big advantage over TCell: no need for singletons. Apart from that the behaviour should be the same as TCell. Whether it is worth it depends on how complicated and confusing the lifetimes might be to the user.

Also, need to check licenses before copying any code from that repository.

Access nested cells with the same owner

My usecase is in a programming language, for type-inferring. I have type variables in Rc<TCell<>>, such that multiple places can share their inferred type automatically. The problem is that a type can contain another type variable (that might not be known yet), so I have some situations where I would effectively need to get the innermost value (mutably) out of a Rc<TCell<Rc<TCell<_>>>>. I have some workarounds for this, but they are all limiting in some way.

I think the nested access pattern should be possible for all the cells, since the owner is borrowed mutably and so no other accesses can happen simultaneously. Is my reasoning wrong? If not, it would be great if we could get a function like

fn TCellOwner::rw_nested<T, U>(cell: &TCell<T>, get_nested: impl FnOnce(&mut T) -> &TCell<U>) -> &mut U;

On a related note, we could also introduce a lower-level, UnsafeCell-like unsafe API to get pointers without having the owner.
This would allow one to implement the above function oneself, by using the fact that the owner is borrowed mutably to fulfill the safety conditions of, say,

/// Subject to similar conditions to UnsafeCell::get
fn TCell::ptr(&self) -> *mut T;

(This should then also exist for all the cells)

Safer (checked) `QCellOwnerSeq::new`

std's Rc, Arc, and ThreadId have essentially the same pathological edge case as QCellOwnerSeq: incrementing the reference count (owner id sequence) in a loop without ever decreasing it (i.e. by forgetting the cloned Rc/Arc, or always for the monotonic ThreadId and QCellOwnerSeq counts) could overflow the counter and lead to UB. std declares this as pathological unsupported behavior (when would you ever legitimately need even 2³¹ different owning references or threads, let alone 2⁶³? [1]) and aborts if this overflow would happen [2].

At least on 64-bit targets (where exhausting the ID space is fundamentally impractical [3]), it'd be nice to have QCellOwnerSeq::new be made safe. I'd recommend doing a cmpxchg loop (fetch_update) like ThreadId, since creating new owners doesn't seem like it'd ever be a contended operation.

Doing this would entail one of:

making QCellOwnerSeq::new safe (in theory not API breaking, but can cause downstream unused_unsafe warnings and makes the non-panicking path now potentially panicking); or
changing QCellOwnerSeq::new's unsafe requirements (from "don't misuse colliding IDs" to "don't exhaust the ID space") and making a separate constructor [4] that does checked sequence based owner IDs.

Since qcell doesn't expose a way to get the numeric value of a QCellOwnerID (and I think this is a good decision), there's no way to implement this downstream, even partially.

[1] For some context, with 128 threads all cooperatively incrementing the same shared counter at a rate of 6 GHz, it will still take 139 days at that rate to increment the counter 2⁶³ times. It's effectively never going to happen accidentally.
[2] Rc uses cell.set(cell.get() + 1) and aborts when overflow happens. Arc uses atom.fetch_add(1, Relaxed) and aborts at isize::MAX, relying on the fact that having isize::MAX threads cloning Arc concurrently is impossible (at least two bytes of address space are consumed per thread). ThreadId uses a compare_exchange loop, presumably because creating a new thread will never be performance critical like cloning Arc can be.
[3] Counting to 2⁶³ is never going to happen in a reasonable timeframe (see previous footnote), but counting to 2³² is entirely practical. OTOH, 32-bit targets with 64-bit atomic support could still use 64-bit owner IDs, zero-extending the address-based owner IDs.
[4] In both cases QCellOwner could also switch to using checked sequence-based IDs to avoid the alloc requirement, but do note that this would result in the ID space being consumed without reuse by QCellOwner as well as QCellOwnerSeq, which it isn't currently.
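
The checked allocation suggested in this issue could look roughly like the sketch below. It uses `AtomicU64::fetch_update` with `checked_add`; the function name `next_owner_id` and the choice to return `Option` (rather than panic or abort) are assumptions for illustration, not qcell's actual behaviour.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hands out the current counter value and advances the counter,
/// or returns `None` once the ID space is exhausted.
/// Hypothetical helper; a real constructor might panic or abort instead.
pub fn next_owner_id(counter: &AtomicU64) -> Option<u64> {
    counter
        // `fetch_update` is a compare-exchange loop: the closure returns
        // `None` on overflow, which leaves the counter unchanged.
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| n.checked_add(1))
        .ok()
}
```

Because a failed update leaves the counter at its maximum, every later call keeps failing instead of wrapping around to an ID that is already in use.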

Switch to compiletest_rs for compilation failure checks

Using compile_fail doctests is convenient, but the code can fail to compile for other reasons than the intended one. So the only way to be sure is to strip out all the compile_fail markers, and recheck all the failures one by one. So using the compiletest_rs crate might make things easier in the long run.

Consider adding `rw` and `ro` calls to cells, for convenience

Default is not implemented for LCell

We feel like LCell could just impl Default?

(Also TCell and TLCell?)
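
For reference, a Default impl on a lifetime-branded cell needs no owner at all. The sketch below uses a stand-in type `BrandCell` (an assumption, not qcell's LCell) whose layout is guessed from the description of LCell as a lifetime-keyed cell.

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

/// Stand-in for a lifetime-branded cell; `'id` is kept invariant.
pub struct BrandCell<'id, T: ?Sized> {
    _brand: PhantomData<fn(&'id ()) -> &'id ()>,
    value: UnsafeCell<T>,
}

impl<'id, T> BrandCell<'id, T> {
    pub fn new(value: T) -> Self {
        BrandCell { _brand: PhantomData, value: UnsafeCell::new(value) }
    }

    pub fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

// Creating the cell needs no owner, so Default can simply forward to T.
impl<'id, T: Default> Default for BrandCell<'id, T> {
    fn default() -> Self {
        Self::new(T::default())
    }
}
```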

Translations of T

Some possible additions to the API (contributed by pythonesque):

I also proved that the translations LCell<T> -> T, &LCell<[T]> -> &[LCell<T>], and &mut T <-> &mut LCell<T>, are sound for all T, which I don't think your code has yet.
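
std's Cell already offers analogous translations, which may help picture what is being proposed. The sketch below uses only `Cell::from_mut` and `Cell::as_slice_of_cells` from the standard library; it shows the std analogue and is not an LCell implementation.

```rust
use std::cell::Cell;

/// Doubles every element through shared `Cell` views of a `&mut` slice.
pub fn double_all(data: &mut [i32]) {
    // &mut [T] -> &Cell<[T]>
    let whole: &Cell<[i32]> = Cell::from_mut(data);
    // &Cell<[T]> -> &[Cell<T>]
    let items: &[Cell<i32>] = whole.as_slice_of_cells();
    for item in items {
        item.set(item.get() * 2);
    }
}
```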

Check whether rw2/rw3 are sound

It was pointed out that maybe it's possible to get access to both a structure and a member of that structure at the same time using rw2, which would mean two &mut to the same memory region. Try to create an example which reproduces this, then see if there's any way to save the functionality.

A new architecture

I was just thinking about how to restructure qcell after #9, and I came up with a way to generalize QCell, TCell, TLCell, and LCell into a single generic type!

The core interface would be:

use core::cell::UnsafeCell;

pub unsafe trait ValueCellOwner: Sized {
    type Proxy;

    fn validate_proxy(&self, proxy: &Self::Proxy) -> bool;

    fn cell<T>(&self, value: T) -> ValueCell<Self, T>;

    fn owns<T: ?Sized>(&self, cell: &ValueCell<Self, T>) -> bool {
        self.validate_proxy(cell.owner_proxy())
    }

    fn ro<'a, T: ?Sized>(&'a self, cell: &'a ValueCell<Self, T>) -> &'a T {
        assert!(self.owns(cell), "You cannot borrow from a `ValueCell` using a different owner!");
        unsafe { &*cell.as_ptr() }
    }

    fn rw<'a, T: ?Sized>(&'a mut self, cell: &'a ValueCell<Self, T>) -> &'a mut T {
        assert!(self.owns(cell), "You cannot borrow from a `ValueCell` using a different owner!");
        unsafe { &mut *cell.as_ptr() }
    }

    fn rw2<'a, T: ?Sized, U: ?Sized>(
        &'a mut self,
        c1: &'a ValueCell<Self, T>,
        c2: &'a ValueCell<Self, U>,
    ) -> (&'a mut T, &'a mut U) {
        assert!(self.owns(c1), "You cannot borrow from a `ValueCell` using a different owner!");
        assert!(self.owns(c2), "You cannot borrow from a `ValueCell` using a different owner!");
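        // Cast to a thin pointer first: with `T: ?Sized` the cell pointer may be fat.
        assert_ne!(c1 as *const _ as *const () as usize, c2 as *const _ as *const () as usize, "You cannot uniquely borrow the same cell multiple times");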
        unsafe { (&mut *c1.as_ptr(), &mut *c2.as_ptr()) }
    }
}

pub struct ValueCell<Owner: ValueCellOwner, T: ?Sized> {
    owner: Owner::Proxy,
    value: UnsafeCell<T>
}

impl<Owner, T> ValueCell<Owner, T>
where
    Owner: ValueCellOwner
{
    pub fn from_proxy(owner: Owner::Proxy, value: T) -> Self {
        Self { owner, value: UnsafeCell::new(value) }
    }
}

impl<Owner, T: ?Sized> ValueCell<Owner, T>
where
    Owner: ValueCellOwner
{
    pub const fn as_ptr(&self) -> *mut T {
         self.value.get()
    }
    
    pub const fn owner_proxy(&self) -> &Owner::Proxy {
         &self.owner
    }
}

All of the current types could be modeled like so,

type QCell<T> = ValueCell<RuntimeOwner, T>;

struct RuntimeOwner {
    id: u32
}

struct RuntimeProxy(u32);

unsafe impl ValueCellOwner for RuntimeOwner {
    type Proxy = RuntimeProxy;

    fn validate_proxy(&self, proxy: &Self::Proxy) -> bool {
        self.id == proxy.0
    }

    fn cell<T>(&self, value: T) -> ValueCell<Self, T> {
        ValueCell::from_proxy(RuntimeProxy(self.id), value)
    }
}

qcell + selfref?

would it be possible to have a cell type which is similar to LCell, but is based around the selfref crate?

(maybe with generativity? we're not familiar with that crate)

this is related to #30 but more general

specifically we want a thread-safe lock-free &mut SelfRefCellEnvironment which opens a reusable LCell-like environment.

`TCell` is unsound due to covariant `Q` parameter of `TCellOwner` (and the same applies to `TLCell`)

use qcell::{TCell, TCellOwner};

type T1 = fn(&());
type T2 = fn(&'static ());

// T1 subtype of T2, both 'static

// TCellOwner covariant
fn _demo(x: TCellOwner<T1>) -> TCellOwner<T2> {
    x
}

// and that's obviously bad

fn main() {
    let first_owner = TCellOwner::<T2>::new();
    let mut second_owner = TCellOwner::<T1>::new() as TCellOwner<T2>;

    let mut x = TCell::<T2, _>::new(vec!["Hello World!".to_owned()]);
    let reference = &first_owner.ro(&x)[0];
    second_owner.rw(&x).clear();

    println!("{}", reference); // prints garbage bytes (or similar output)
}
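
One common way to rule out this kind of coercion is to make the marker parameter invariant. The sketch below shows only the variance marker on a stand-in type `InvariantOwner` (an assumption for illustration); it does not enforce the singleton property and is not presented as the fix qcell adopted.

```rust
use std::marker::PhantomData;

/// Stand-in owner whose marker type `Q` is invariant, because `Q`
/// appears in both argument and return position of a fn pointer.
pub struct InvariantOwner<Q: 'static> {
    _marker: PhantomData<fn(Q) -> Q>,
}

impl<Q: 'static> InvariantOwner<Q> {
    pub fn new() -> Self {
        InvariantOwner { _marker: PhantomData }
    }
}

// With an invariant `Q`, a coercion like the one in `_demo` above
// is expected to be rejected by the compiler:
//
// fn coerce(x: InvariantOwner<fn(&())>) -> InvariantOwner<fn(&'static ())> {
//     x // error: mismatched types
// }
```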

Interested in adding a `generativity` based `LCell`?

For one of my projects I am using a fork of LCell with the generativity crate to avoid the requirement for closures. Is this something you would be interested in having contributed back to this crate? If so I will put together a PR.

Consider adding generic cell-owner and cell types, to handle alternative key-management approaches

If someone wants to do their own key management (e.g. u64 key, panic on exhaustion instead of free list, etc), then have some types that they can plug their own 'key' type into, and that does the cell borrowing handling for them. These would typically be wrapped in type aliases to make them more ergonomic.

It might be necessary to have two sets of these, for each of TLCell-like and TCell-like, i.e. with or without Sync access to the cells.

Debugging ergonomics?

Heya, thanks for making this crate.

I'm currently porting a project over from RefCell, and I'm finding that I really miss being able to derive Debug on things that now contain QCells. Debugging anything in a QCell is requiring me to go in with a debugger every time, when I often just want to dump out what's in it. I typically can't use get_mut or into_inner in any of these places.

Do you have any suggestions to make this more ergonomic?
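
One possible workaround, sketched here on a stand-in type, is a Debug impl that prints a placeholder instead of the contents, so that containing structs can still derive Debug. `OpaqueCell` and `Player` are hypothetical names; whether qcell's own cells provide something like this is not stated in this thread.

```rust
use std::cell::UnsafeCell;
use std::fmt;

/// Stand-in cell whose contents cannot be read without an owner.
pub struct OpaqueCell<T: ?Sized> {
    value: UnsafeCell<T>,
}

impl<T> OpaqueCell<T> {
    pub fn new(value: T) -> Self {
        OpaqueCell { value: UnsafeCell::new(value) }
    }

    pub fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

// Print a placeholder: reading the value here would bypass the owner check.
impl<T: ?Sized> fmt::Debug for OpaqueCell<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("OpaqueCell { .. }")
    }
}

/// Example of a struct that can now derive Debug.
#[derive(Debug)]
pub struct Player {
    pub name: String,
    pub score: OpaqueCell<u32>,
}
```

This does not show the value itself; dumping the contents would still need access to the owner at the point of printing.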

Retrieving the wrapped value from `QCell` and friends

Cell, RefCell, and Mutex all have an into_inner method which allows you to extract the wrapped value, but QCell (and TCell and LCell) doesn't seem to have such a method.

Cell constructor that does not require owner reference

This would be possible with TCell and LCell, but not with QCell. This would be useful when new cells need to be created whilst the owner is borrowed.

Features are additive

So opt-out features such as no-thread-local are not the correct approach. If one crate depending on qcell opts out, but another one doesn't, it will be forced to opt out anyway as things are at the moment (because cargo builds just one instance of the crate with the sum of all features enabled). This will mean things not working correctly, i.e. probably it will panic immediately on running.

So probably it is best to split out global and thread-local-based TCell as two different types instead of using features.

IntrusiveQCellOwner?

We don't suppose it'd be possible to have something along these lines?

struct IntrusiveQCellOwner<T: ?Sized> {
  inner: Arc<QCell<T>>,
}

// impl<T: ?Sized> !Clone for IntrusiveQCellOwner<T>;

impl<T> IntrusiveQCellOwner<T> {
  fn new(value: T) -> Self {
    Self {
      // The cell's owner ID is the address of its own Arc allocation.
      inner: Arc::new_cyclic(|weak| QCell {
        owner: QCellOwnerID(weak.as_ptr() as *const () as usize),
        value: UnsafeCell::new(value),
      }),
    }
  }
}

impl<T: ?Sized> IntrusiveQCellOwner<T> {
  fn id(&self) -> QCellOwnerID {
    self.inner.owner
  }

  // the usual stuff, rw etc.

  fn clone_inner(&self) -> Arc<QCell<T>> {
    self.inner.clone()
  }

  // in fact, might as well just use Deref and DerefMut for these.
  //fn mut_inner(&mut self) -> &mut T {
    // ...
  //}
  //fn get_inner(&self) -> &T {
    // ...
  //}
}

The main use-case would be the same as #16 except without a separate "owner".

pub struct ConfigManager {
    resources: Vec<Resource>,
}

pub struct AddConfigSource<'a, T: DataSourceBase + Send + Sync + 'static> {
    resource: &'a mut Resource,
    source: Arc<QCell<T>>,
}

struct Resource {
    // actual resource
    base: IntrusiveQCellOwner<dyn DataSourceBase + Send + Sync>,
    // views of the actual resource
    title: Option<Arc<QCell<dyn DataSource<InstanceTitle> + Send + Sync>>>,
    url: Option<Arc<QCell<dyn DataSource<InstanceBaseUrl> + Send + Sync>>>,
    repolist: Option<Arc<QCell<dyn DataSource<RepoListUrl> + Send + Sync>>>,
}

impl ConfigManager {
    pub fn add_source<T>(&mut self, source: T) -> AddConfigSource<'_, T>
    where
        T: DataSourceBase + Send + Sync + 'static,
    {
        let base = IntrusiveQCellOwner::new(source);
        let arc = base.clone_inner();
        self.resources.push(Resource::new(base));
        AddConfigSource {
            resource: self.resources.last_mut().unwrap(),
            source: arc,
        }
    }
}

impl<'a, T: DataSourceBase + Send + Sync + 'static> AddConfigSource<'a, T> {
    pub fn for_title(self) -> Self where T: DataSource<InstanceTitle> {
        let arc = &self.source;
        self.resource.title.get_or_insert_with(|| {
            arc.clone()
        });
        self
    }
    pub fn for_base_url(self) -> Self where T: DataSource<InstanceBaseUrl> {
        let arc = &self.source;
        self.resource.url.get_or_insert_with(|| {
            arc.clone()
        });
        self
    }
    pub fn for_repo_lists(self) -> Self where T: DataSource<RepoListUrl> {
        let arc = &self.source;
        self.resource.repolist.get_or_insert_with(|| {
            arc.clone()
        });
        self
    }
}

no_std support

It should be possible to support large parts of the functionality in a no_std environment. So add a default "std" feature, and allow the crate user to disable it if they want no_std.

Supporting TCell without std or exclusion-set

I believe qcell can support TCell without either of those features enabled, via refactoring the TCellOwner into an unsafe trait with most of the behaviour plus the current struct implementing this new trait. Then you can introduce a simple macro for creating new owner types that uses a static AtomicBool for uniqueness. Something like

pub unsafe trait CanOwnTCell {
  fn ro<'a, T: ?Sized>(&'a self, tc: &'a TCell<Self, T>) -> &'a T {
    // Copy implementation over
  }

  // All the other cell-related methods
}

unsafe impl<Q> CanOwnTCell for TCellOwner<Q> {}

macro_rules! make_tcellowner_type {
  ($visibility:vis, $owner:ident, $bool_name:ident) => {
    static $bool_name: AtomicBool = AtomicBool::new(false);

    $visibility struct $owner {
      _phantom: PhantomData<()>
    }

    impl $owner {
      pub fn try_new() -> Option<Self> {
        $bool_name.compare_exchange(false, true, Relaxed, Relaxed).ok().map(|_| Self { _phantom: Default::default() })
      }
    }

    unsafe impl CanOwnTCell for $owner {}

    impl Drop for $owner {
      fn drop(&mut self) {
        $bool_name.store(false, Relaxed);
      }
    }
  }
}

This would be a breaking change, in that now TCell has to specify the full TCellOwner<Q> as the owner type, but if you keep the old TCell<Q, T> as a type alias type TCell<Q, T> = TCellTraited<TCellOwner<Q>, T>; I think you can downgrade it to just a major change.
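
Written out by hand for a single owner type, the uniqueness part of the macro above might look like the following sketch. `DemoOwner` and `DEMO_OWNER_TAKEN` are hypothetical names, and the sketch uses Acquire/Release orderings where the macro uses Relaxed; that is a choice made for this example, not something the proposal specifies.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// One flag per owner type: true while an instance is alive.
static DEMO_OWNER_TAKEN: AtomicBool = AtomicBool::new(false);

pub struct DemoOwner {
    _private: (),
}

impl DemoOwner {
    /// Returns `None` if another `DemoOwner` currently exists.
    pub fn try_new() -> Option<Self> {
        DEMO_OWNER_TAKEN
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| DemoOwner { _private: () })
    }
}

impl Drop for DemoOwner {
    fn drop(&mut self) {
        // Release the slot so that a new owner can be created later.
        DEMO_OWNER_TAKEN.store(false, Ordering::Release);
    }
}
```

Nothing here needs std or an exclusion set, since `AtomicBool` lives in core.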

Cell type that uses address of cell owner as key

Since Rust guarantees no dangling references or use-after-free in safe code, it should be possible to use the address of the cell-owner as the key to access the cells, storing the address as the key in the cell.

If the owner is moved, then it loses access to the cells (which would typically be a bug in the user's code). Also if the owner is dropped and another owner created in the same memory, it will gain access to all the cells previously owned by the old owner. But this doesn't cause soundness problems, because there is still just one owner at any one time. Also access to a cell requires both a pointer to the cell and also the owner's key. So it really doesn't cause any issues that some other code might get logical ownership as it can't get access unless it also has pointers to the cells.
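
The behaviour described above, including the loss of access when the owner moves, can be sketched with stand-in types. `AddrOwner` and `AddrCell` are assumptions for illustration; the owner is given a one-byte field so that two live owners can never share an address.

```rust
use std::cell::UnsafeCell;

/// Owner identified by its own address. Non-zero-sized on purpose.
pub struct AddrOwner {
    _pad: u8,
}

pub struct AddrCell<T: ?Sized> {
    owner_addr: usize,
    value: UnsafeCell<T>,
}

impl AddrOwner {
    pub fn new() -> Self {
        AddrOwner { _pad: 0 }
    }

    fn addr(&self) -> usize {
        self as *const Self as usize
    }

    /// Creates a cell keyed to the owner's current address.
    pub fn cell<T>(&self, value: T) -> AddrCell<T> {
        AddrCell { owner_addr: self.addr(), value: UnsafeCell::new(value) }
    }

    /// Shared access; `None` if the key does not match this owner's address.
    pub fn try_ro<'a, T: ?Sized>(&'a self, cell: &'a AddrCell<T>) -> Option<&'a T> {
        if cell.owner_addr == self.addr() {
            // SAFETY: only the owner at this address can open the cell,
            // and it is borrowed (and so cannot move) for 'a.
            Some(unsafe { &*cell.value.get() })
        } else {
            None
        }
    }

    /// Exclusive access, under the same key check.
    pub fn try_rw<'a, T: ?Sized>(&'a mut self, cell: &'a AddrCell<T>) -> Option<&'a mut T> {
        if cell.owner_addr == self.addr() {
            // SAFETY: as above, with the owner borrowed mutably.
            Some(unsafe { &mut *cell.value.get() })
        } else {
            None
        }
    }
}
```

Moving the owner (for example into a Box) changes its address, so cells created earlier no longer match and the accessors return `None`.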

Seek motivating example for more than 3 simultaneous borrows

Personally, I don't think that this is required. All the cases so far where I've needed simultaneous borrows have been handled by rw2. However if there are any concrete cases where 4+ simultaneous borrows are needed, and it would be inefficient to handle them as a sequence of rw2 or rw3 borrows, it would be good to document them and analyse them. So please add a comment if you have such a requirement. If it would really work out as more efficient to borrow 4+ items at a time (considering the roughly O(N^2) comparisons), there is a draft PR #26 which could be finished off to provide this functionality.

Zero-sized LCellOwner APIs?

Currently, LCellOwner is zero-sized, but &LCellOwner and &mut LCellOwner are not. We wonder if it's possible to have a sound API which borrows the LCellOwner but is zero-sized?
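
The size difference in question can be checked directly. The sketch below uses a stand-in `ZstOwner` (an assumption modelled on a lifetime-branded owner with no fields other than a marker) rather than qcell's LCellOwner.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Stand-in for a lifetime-branded, zero-sized owner.
pub struct ZstOwner<'id> {
    _brand: PhantomData<fn(&'id ()) -> &'id ()>,
}

/// Sizes of the owner, a shared reference to it and a mutable reference to it.
pub fn owner_sizes() -> (usize, usize, usize) {
    (
        size_of::<ZstOwner<'static>>(),
        size_of::<&ZstOwner<'static>>(),
        size_of::<&mut ZstOwner<'static>>(),
    )
}
```

The owner itself takes no space, but each reference to it is still pointer-sized, which is the overhead this issue asks about.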

This is a bit of a micro-optimization but anyway.

T: ?Sized support

We have this:

/// Stores multiple DataSource capable of InstanceTitle, InstanceBaseUrl and
/// RepoListUrl
#[derive(Default)]
pub struct ConfigManager {
    // conceptually the actual objects
    bases: Vec<Arc<RwLock<dyn DataSourceBase + Send + Sync>>>,
    // conceptually just views of the above objects
    titles: Vec<Option<Arc<RwLock<dyn DataSource<InstanceTitle> + Send + Sync>>>>,
    urls: Vec<Option<Arc<RwLock<dyn DataSource<InstanceBaseUrl> + Send + Sync>>>>,
    repolists: Vec<Option<Arc<RwLock<dyn DataSource<RepoListUrl> + Send + Sync>>>>,
    durations: Vec<Duration>,
    // add_source can be called after update.
    valid: usize,
}

It is our understanding that we would be able to do something like this:

/// Stores multiple DataSource capable of InstanceTitle, InstanceBaseUrl and
/// RepoListUrl
#[derive(Default)]
pub struct ConfigManager {
    owner: QCellOwner,
    // conceptually the actual objects
    bases: Vec<Arc<QCell<dyn DataSourceBase + Send + Sync>>>,
    // conceptually just views of the above objects
    titles: Vec<Option<Arc<QCell<dyn DataSource<InstanceTitle> + Send + Sync>>>>,
    urls: Vec<Option<Arc<QCell<dyn DataSource<InstanceBaseUrl> + Send + Sync>>>>,
    repolists: Vec<Option<Arc<QCell<dyn DataSource<RepoListUrl> + Send + Sync>>>>,
    durations: Vec<Duration>,
    // add_source can be called after update.
    valid: usize,
}

And this would be much faster than both RwLock and even hypothetical Arc locking, while still giving us Send+Sync. Is that accurate?

Support moving a cell to a new ownership

TLCell already supports a kind of transfer of ownership, but only between threads, due to there being an owner in each thread.

A QCell-like cell could potentially support transferring ownership, since ownership is determined by the key value stored in the cell. However, that key value is currently immutable. Making it mutable would mean either storing the key in an atomic type, or else making the cell non-Sync and storing the key in a plain Cell.

The proposed address-based-key cell (issue #14) could also support transferring ownership to some other owner's address in a similar way, if there was an ID type that could be used to pass the other owner's address. (We don't need &mut on the second owner in order to transfer ownership to it, just the address.)

The other cells (TCell, TLCell and LCell) can't support transferring ownership because ownership is hardcoded at compile-time and is checked by the compiler, and so cannot be modified at runtime.
