Git Product home page Git Product logo

sorted_set_nif's Introduction

Discord.SortedSet

Hex.pm Version

SortedSet is a fast and efficient data structure that provides certain guarantees and functionality. The core data structure and algorithms are implemented in a Native Implemented Function in the Rust Programming Language, using the Rustler crate.

Installation

Add SortedSet to your dependencies and then install with mix do deps.get, deps.compile

def deps do
  [
    {:sorted_set_nif, "~> 1.0.0"}
  ]
end

Implementation Details

Internally the Elixir terms stored in the SortedSet are converted to Rust equivalents and stored in a Vector of Vectors. The structure is similar to a skip-list, almost every operation on the SortedSet will perform a linear scan through the buckets to find the bucket that owns the term, then a binary search is done within the bucket to complete the operation.

Why not just a Vector of Terms? This approach was explored but when the Vector needs to grow beyond it's capacity, copying Terms over to the new larger Vector proved to be a performance bottle neck. Using a Vector of Vectors, the Bucket pointers can be quickly copied when additional capacity is required.

This strategy provides a reasonable trade off between performance and implementation complexity.

When using a SortedSet, the caller can tune bucket sizes to their use case. A default bucket size of 500 was chosen as it provides good performance for most use cases. See new/2 for details on how to provide custom tuning details.

Guarantees

  1. Terms in the SortedSet will be sorted based on the Elixir sorting rules.
  2. SortedSet is a Set, any item can appear 0 or 1 times in the Set.

Functionality

There is some special functionality that SortedSet provides beyond sorted and uniqueness guarantees.

  1. SortedSet has a defined ordering, unlike a pure mathematical set.
  2. SortedSet can report the index of adding and removing items from the Set due to it's defined ordering property.
  3. SortedSet can provide random access of items and slices due to it's defined ordering property.

Caveats

  1. Due to SortedSet's implementation, some operations that are constant time in sets have different performance characteristic in SortedSet, these are noted on the operations.
  2. SortedSets do not support some types of Elixir Terms, namely reference, pid, port, function, and float. Attempting to store any of these types (or an allowed composite type containing one of the disallowed types) will result in an error, namely, {:error, :unsupported_type}

Documentation

Documentation is hosted on hexdocs.

For a local copy of the documentation, the mix.exs file is already set up for generating documentation, simply run the following commands to generate the documentation from source.

$ mix deps.get
$ mix docs

Running the Tests

There are two test suites available in this library, an ExUnit test suite that tests the correctness of the implementation from a black box point of view. These tests can be run by running mix test in the root of the library.

The rust code also contains tests, these can be run by running cargo test in the native/sorted_set_nif directory.

Running the Benchmarks

Before running any benchmarks it's important to remember that during development the NIF will be built unoptimized. Make sure to rebuild an optimized version of the NIF before running the benchmarks.

There are benchmarks available in the bench folder, these are written with Benchee and can be run with the following command.

$ OPTIMIZE_NIF=true mix run bench/{benchmark}.exs

Adding the OPTIMIZE_NIF=true will force the benchmark to run against the fully optimized NIF.

Basic Usage

SortedSet lives in the Discord namespace to prevent symbol collision, it can be used directly

defmodule ExampleModule do
  def get_example_sorted_set() do
    Discord.SortedSet.new()
    |> Discord.SortedSet.add(1)
    |> Discord.SortedSet.add(:atom),
    |> Discord.SortedSet.add("hi there!")
  end
end

You can always add an alias to make this code less verbose

defmodule ExampleModule do
  alias Discord.SortedSet
  
  def get_example_sorted_set() do
    SortedSet.new()
    |> SortedSet.add(1)
    |> SortedSet.add(:atom),
    |> SortedSet.add("hi there!")
  end
end

Full API Documentation is available, there is also a full test suite with examples of how the library can be used.

sorted_set_nif's People

Contributors

ihumanable avatar iredelmeier avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sorted_set_nif's Issues

Question about the blog post

Hey folks,

Apologies for the partially off-topic issue here, I just wasn't sure where else to reach out.

In the article the author writes

using the guarantees of Rust is guaranteed not to crash the VM or leak memory

I was wondering if someone could expand on this or direct me to more information about it. I'm curious because crashing and leaking memory are both things that the Rust compiler is happy to let you do in the general case. I took a look at the Rustler project and I see that they repeat the claim about crashes but they also have several open and closed issues related to crashes as well. I think I might just be confused about terms or the domain that was intended in the writing.

Thank you! And if this is simply the wrong venue for this question and this issue needs to be closed without answer I of course understand.

Define sort order / compare method

We have a case of a large data set ~100k entries which consist of id / score tuples. ids are static, but the score is changing frequently.

From what I understand, I cannot define a compare function to define the order of elements in your SortedSet, right? Like sort by score or the second item of a tuple? And it is also not possible to fetch an entry by id, right?

Is there a way your package can be used for our case?

Application start failed due with on_load_function_failed

Hey,

First of all, fantastic library and i've just start playing with it for one scenario where we needed sorted sets. Just running a regular distillery release and docker build but this seems to break the application start:

7F19DBDD9FB8:t2:A8:shutdown,H7F19DBDDA028
7F19DBDDA028:t3:A15:failed_to_start_child,AF:kernel_safe_sup,H7F19DBDDA088
7F19DBDDA088:t2:A17:on_load_function_failed,A22:Elixir.Discord.SortedSet.NifBridge

Docker version: elixir:1.7.2-alpine

Are there any dependencies that may be missing or maybe required?

Comparison to BTreeSet

I'm just curious how Vec<Vec<_>> compares to the standard library's BTreeSet - is there some significant advantage to this approach that make it worth building/maintaining a custom data structure?

Enumerable & Collectable

Hello, first of all thanks for this library! I'm curious if you considered implementing protocols for the module, or, e.g. you intentionally kept a low-level API for performance reasons and/or error handling?

As an exercise I implemented below, might be useful to you or anyone that comes looking for slightly higher-level API.

defmodule SortedSet do
  defstruct [:ref]

  defmacrop wrap(result) do
    quote do
      case unquote(result) do
        {:error, reason} -> raise inspect(reason)
        other -> other
      end
    end
  end

  def new() do
    %__MODULE__{ref: wrap(Discord.SortedSet.new())}
  end

  def new(items) do
    %__MODULE__{ref: wrap(Discord.SortedSet.from_enumerable(items))}
  end

  def member?(set, item) do
    case wrap(Discord.SortedSet.find_index(set.ref, item)) do
      index when is_integer(index) -> true
      nil -> false
    end
  end

  def put(%__MODULE__{ref: ref} = set, item) do
    wrap(Discord.SortedSet.add(ref, item))
    set
  end

  def delete(%__MODULE__{ref: ref} = set, item) do
    wrap(Discord.SortedSet.remove(ref, item))
    set
  end

  def to_list(%__MODULE__{ref: ref}) do
    wrap(Discord.SortedSet.to_list(ref))
  end

  def size(%__MODULE__{ref: ref}) do
    wrap(Discord.SortedSet.size(ref))
  end

  defimpl Collectable do
    @impl true
    def into(original) do
      collector_fun = fn
        set, {:cont, elem} -> SortedSet.put(set, elem)
        set, :done -> set
        _set, :halt -> :ok
      end

      {original, collector_fun}
    end
  end

  defimpl Enumerable do
    @impl true
    def count(set) do
      {:ok, SortedSet.size(set)}
    end

    @impl true
    def member?(set, item) do
      {:ok, SortedSet.member?(set, item)}
    end

    @impl true
    def reduce(set, acc, fun) do
      Enumerable.List.reduce(SortedSet.to_list(set), acc, fun)
    end

    @impl true
    def slice(set) do
      {:ok, SortedSet.size(set), &Discord.SortedSet.slice(set.ref, &1, &2)}
    end
  end
end
defmodule SortedSetTest do
  use ExUnit.Case, async: true

  test "collectable" do
    set = for i <- [2, 1, 3, 3], do: i, into: SortedSet.new()
    assert SortedSet.to_list(set) == [1, 2, 3]
  end

  test "enumerable" do
    set = SortedSet.new([2, 1, 3, 3])
    assert 1 in set
    refute 4 in set
    assert Enum.count(set) == 3
    assert Enum.slice(set, 0, 2) == [1, 2]
    assert Enum.map(set, &(&1 * 2)) == [2, 4, 6]
    assert Enum.fetch(set, 1) == {:ok, 2}
  end
end

Unable to compile with "elixir 1.15.7" & "erlang 26.1.2"

OS: Darwin two 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020 arm64
Elixir: elixir 1.15.7
Erlang/OTP: 26.1.2

Command:

git clone https://github.com/discord/sorted_set_nif.git && cd sorted_set_nif
mix deps.get
mix compile

Actual Output:

warning: use Mix.Config is deprecated. Use the Config module instead
  config/config.exs:1

Compiling 3 files (.ex)
Compiling crate sorted_set_nif in debug mode (native/sorted_set_nif)

Compiling on macOS requires special link args in order to compile
correctly.

Rustler is currently working around this issue in the compiler task.
This will be removed in v1.0.0 in favor of a user supplied .cargo/config
file.

To remove this warning, please create /Users/xxx/sorted_set_nif/native/sorted_set_nif/.cargo/config
with the following content:

      [target.x86_64-apple-darwin]
      rustflags = [
          "-C", "link-arg=-undefined",
          "-C", "link-arg=dynamic_lookup",
      ]

See https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html
for more details.


   Compiling libc v0.2.150
   Compiling proc-macro2 v1.0.69
   Compiling proc-macro-hack v0.5.20+deprecated
   Compiling unicode-ident v1.0.12
   Compiling fs_extra v1.3.0
   Compiling syn v1.0.109
   Compiling void v1.0.2
   Compiling rustler_sys v2.1.1
   Compiling unicode-segmentation v1.10.1
   Compiling rustler v0.22.2
   Compiling lazy_static v1.4.0
   Compiling unreachable v1.0.0
   Compiling heck v0.3.3
   Compiling paste-impl v0.1.18
   Compiling quote v1.0.33
   Compiling cc v1.0.83
   Compiling paste v0.1.18
   Compiling jemalloc-sys v0.3.2
   Compiling rustler_codegen v0.22.2
error: failed to run custom build command for `rustler v0.22.2`

Caused by:
  process didn't exit successfully: `/User/xxx/sorted_set_nif/_build/dev/lib/sorted_set_nif/native/sorted_set_nif/debug/build/rustler-f511909f2d5940db/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at /Users/xxx/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rustler-0.22.2/build.rs:37:13:
  Erlang version 2.17 not handled, please file a a bug report.
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
Compiling lib/sorted_set/nif_bridge.ex (it's taking more than 10s)

== Compilation error in file lib/sorted_set/nif_bridge.ex ==
** (RuntimeError) Rust NIF compile error (rustc exit code 101)
    (rustler 0.22.0) lib/rustler/compiler.ex:36: Rustler.Compiler.compile_crate/2
    lib/sorted_set/nif_bridge.ex:13: (module)

Expected Output:

Compilation succeeds.

Thoughts:
It is probably linked to the supported version of Erlang as described here in the Rustler readme. However I don't know how to fix it.

Unable to compile native_sorted_set_nif, file doesn't exist

Nif crate native/sorted_set_nif is not generating.

Upon running mix do deps.get, deps.compile I receive error

Compiling NIF crate :sorted_set (native/sorted_set_nif)...
could not compile dependency :sorted_set_nif, "mix compile" failed. You can recompile this dependency with "mix deps.compile sorted_set_nif", update it with "mix deps.update sorted_set_nif" or clean it with "mix deps.clean sorted_set_nif"
** (ErlangError) Erlang error: :enoent
    (elixir) lib/system.ex:791: System.cmd("cargo", ["rustc", "--release"], [cd: "/mnt/d/PersonalProjects/Phoenix/MusicBrew/deps/sorted_set_nif/native/sorted_set_nif", stderr_to_stdout: true, env: [{"CARGO_TARGET_DIR", "/mnt/d/PersonalProjects/Phoenix/MusicBrew/_build/dev/rustler_crates/sorted_set"}], into: %IO.Stream{device: :standard_io, line_or_bytes: :line, raw: false}])
    lib/mix/tasks/compile.rustler.ex:58: Mix.Tasks.Compile.Rustler.compile_crate/1
    (elixir) lib/enum.ex:1336: Enum."-map/2-lists^map/1-0-"/2
    lib/mix/tasks/compile.rustler.ex:14: Mix.Tasks.Compile.Rustler.run/1
    (mix) lib/mix/task.ex:331: Mix.Task.run_task/3
    (mix) lib/mix/tasks/compile.all.ex:73: Mix.Tasks.Compile.All.run_compiler/2
    (mix) lib/mix/tasks/compile.all.ex:53: Mix.Tasks.Compile.All.do_compile/4
    (mix) lib/mix/tasks/compile.all.ex:24: anonymous fn/1 in Mix.Tasks.Compile.All.run/1

and Elixir LS extension in VSCode provides the error

an exception was raised:
    ** (RuntimeError) Rust NIF compile error (rustc exit code 1)
        lib/mix/tasks/compile.rustler.ex:67: Mix.Tasks.Compile.Rustler.compile_crate/1
        (elixir) lib/enum.ex:1327: Enum."-map/2-lists^map/1-0-"/2
        lib/mix/tasks/compile.rustler.ex:14: Mix.Tasks.Compile.Rustler.run/1
        (mix) lib/mix/task.ex:331: Mix.Task.run_task/3
        (mix) lib/mix/tasks/compile.all.ex:68: Mix.Tasks.Compile.All.run_compiler/2
        (mix) lib/mix/tasks/compile.all.ex:52: Mix.Tasks.Compile.All.do_compile/4
        (mix) lib/mix/tasks/compile.all.ex:23: anonymous fn/1 in Mix.Tasks.Compile.All.run/

Am I missing something?

Is it OK to use a SortedSet concurrently from multiple BEAM processes?

Hi! I did a bunch of work on Rustler a while back, so I was excited to read your blog post and take a look at the code.

It looks exactly how we imagined Rustler would be used in practice. I'm very grateful that you made this an open source library so I could see it. :)


My question is about concurrency and SortedSet.

The NIFs all use try_lock to acquire the mutex. I think that means that a SortedSet will work fine as long as only a single Erlang process uses it. If multiple processes try to use it concurrently, there could be sporadic :lock_fail errors. Are you using it within a single process? Have you seen errors?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.