There's a straight-forward fix: make the constructors defaulte

I've mostly implemented it in the draft PR <a class="issue-link js-issue-link" data-er

Hi, I am UBsan, and I'm currently complaining about this. </blockquot

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

C++ opaque handle implementation is based on undefined behaviour,about webassembly/wasm-c-api

Comments (17)

strega-nil commented on June 23, 2024 5

Hi, I am UBsan, and I'm currently complaining about this.

from wasm-c-api.

nlewycky commented on June 23, 2024 2

There's a straight-forward fix:

make the constructors defaulted and protected
make the destructor protected
remove the overloaded operator delete
add a private destroy() method
change own<T> from unique_ptr<T> to unique_ptr<T, Deleter> where Deleter calls destroy()
make the classes friends with their Deleters so that unique_ptr can delete them but Foo::make()->destroy() doesn't compile (if it did, it would lead to a double-free).

Then implementers can:

create their own implementation classes that public-derive from these classes and add their own additional member variables and methods
define the declared make() method to always create objects of the matching derived type.
define the other methods by static_cast'ing the (this) pointer to the derived type which is always valid since the only way to create one is with the make() method that always creates the derived type anyways
like the other methods destroy() casts to derived type, then calls "delete casted_this;" or calls a derived destroy() which does "delete this;"

Example:
wasm.hh has:

template <typename T>
struct Destroyer {
  void operator()(T *ptr) {
    ptr->destroy();
  }
};

template<class T> using own = std::unique_ptr<T, Destroyer<T>>;

template<class T>
auto make_own(T* x) -> own<T> { return own<T>(x); }

class WASM_API_EXTERN Config {
  friend struct Destroyer<Config>;
  void destroy();

protected:
  Config() = default;
  ~Config();

public:
  static auto make() -> own<Config>;

  // Implementations may provide custom methods for manipulating Configs.
};

and impl-wasm.cc has:

class ImplConfig : public Config {
public:
  ImplConfig() = default;
  ~ImplConfig() {}

  static ImplConfig *from(Config *config) {
    return static_cast<ImplConfig *>(config);
  }

private:
  int impl_state;
};

Config::~Config() {}

auto Config::make() -> own<Config> { return make_own<Config>(new ImplConfig); }

void Config::destroy() { delete ImplConfig::from(this); }

A few comments on this approach:

A unique_ptr<Derived, Deleter<Derived>> doesn't freely cast to a unique_ptr<Base, Deleter<Base>>. I think the easiest fix it to pass the explicit T to make_own when doing a make + downcast combined as shown in my definition of make() above.
With this approach the potential bug of deleting the object of base type is prevented:

void test(Config *config) {
  delete config;
}

dt.cc:52:10: error: calling a protected destructor of class 'Config'
  delete config;
         ^
dt.cc:20:3: note: declared protected here
  ~Config();
  ^

which is necessary if we expect the implementer to extend the size of the object when deriving.

Manual memory management is still possible but obvious in code review.

void ok1() {
  auto x = Config::make();
}

void ok2() {
  auto x = Config::make().release();
  Destroyer<Config>()(x);
}

void error1() {
  Config::make()->destroy();
}

dt.cc:63:19: error: 'destroy' is a private member of 'Config'
  Config::make()->destroy();
                  ^
dt.cc:19:8: note: implicitly declared private here
  void destroy();
       ^

The other main alternative is to use virtual. I assume we're avoiding that?

from wasm-c-api.

nlewycky commented on June 23, 2024 1

I've mostly implemented it in the draft PR #161. There's a pre-existing bug with Shared<> that I'm not sure what approach to take to address and am looking for input. wasm-v8.cc defines an explicit template specialization for Shared<Module>::~Shared(), which is a problem because no specialization was declared and the destructor (or now destroy() in my approach) may be implicitly instantiated. I know this aspect of C++ templates is not well understood so I'll give a primer:

implicit instantiation. The template is defined generally over some template parameters and you use that pattern. This is what happens when you use std::vector<char> for instance. For implicit instantiations, multiple definitions is not a linker error and it is optional whether the compiler chooses to emit the the symbols to the .o file (for example, it may depend on the optimization level whether a function template is inlined into all callers so now there is no need for a symbol -- this is what leads to most cases of mysterious C++ template linking errors).
explicit template instantiation definition. Emits a definition to the .o file for all the symbols in this template. Two definitions may or may not be a linker error, but the definition must be present. For example (template Shared<int>;).
explicit template instantiation declaration. Added in C++11. Informs the compiler that the template is expected to be provided by another .o file, in conjunction with clang's -Wundefined-var-template and -Wundefined-func-template to prevent those mysterious linking errors. (extern template Shared<int>;).
explicit template specialization definition. Replace the body of a template pattern (a whole class or a function or variable) with a different one for a particular set of template arguments. (template<> struct Shared<int> { /* ... */ };)
explicit template specialization declaration. Declare that there exists an explicit specialization for this particular set of template arguments. (template<> struct Shared<int>;)

(not discussed: partial templates or template templates)

It's incorrect to call an explicit specialization when expecting an instantiation per https://eel.is/c++draft/temp.expl.spec#7.sentence-1 . So this combination:
wasm.hh:

class WASM_API_EXTERN Shared {
public:
  Shared() = delete;
  ~Shared();
  void operator delete(void*);
};

wasm-v8.cc:

template<>
Shared<Module>::~Shared() {
  stats.free(Stats::MODULE, this, Stats::SHARED);
  impl(this)->~vec();
}

is a non-starter. (If it helps, you can imagine that an explicit specialization has a different type signature, so it could be mangled differently, though both popular ABIs chose to mangle them the same which is why it still links.) Any code that might trigger implicit instantiation of the declaration of d'tor/destroy() must know that this is defined with an explicit specialization.

How would we do that? We could forbid C++ implementations from using an explicit specialization. We could tell users that it's normal to have to include a second vendor-specific header file even though the entire interface defined in wasm.hh is portable at the source level. We could declare all the explicit specializations in wasm.hh and require all implementations to define for each specialization? We could replace Shared<T> with a non-template?

from wasm-c-api.

jakobkummerow commented on June 23, 2024

I believe that this pattern is fine, because the objects are not accessed through a pointer of the respective API type: the implementations of all functions cast the opaque API pointer back to its real type first. (I could be wrong, but that's my reading of the spec and the code. FWIW, UBSan is currently not complaining.)

from wasm-c-api.

peter-b commented on June 23, 2024

UBSan is currently not complaining.

Absence of evidence is not evidence of absence.

See also class.mfct.non-static/2:

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

from wasm-c-api.

rossberg commented on June 23, 2024

Do folks have any suggestion how to achieve implementation independence here without using such a cast or making all methods static? (Though, practically speaking, V8 itself relies on patterns like this so fundamentally that I doubt the ones here ever matter.)

from wasm-c-api.

jakobkummerow commented on June 23, 2024

Hi, I am UBsan, and I'm currently complaining about this.

Well played :-)

Absence of evidence is not evidence of absence.

I know, and in particular UBSan only provides partial coverage. Hence "FWIW".

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

TIL. Well, for whatever that's worth, this is a common pattern for libraries, so it's unlikely that compilers will be able to afford to break it, but it's good to know that this actually is UB. In that case, I would support moving to a different design, however I don't have a good suggestion for what might be such a design.

from wasm-c-api.

TartanLlama commented on June 23, 2024

@rossberg do you have a link to where this is done in V8? Would be interested in having a look!

I'll have another look to tomorrow and see if I can think of another possibility which maintains the performance characteristics while being safer standards-wise and bit easier to work with, unless someone beats me to it 😄.

from wasm-c-api.

rossberg commented on June 23, 2024

@TartanLlama, just look at the implementation of the factory class. Pretty much every allocation function there goes through some of various general mechanisms for allocating on the GC'ed heap and then casts the result to one of the gazillion object classes. With everything going on in a real-world GC, I have a hard time imagining a sane alternative to this approach. And there probably are many far worse uses of what's technically UB all over the place.

Would be a fun challenge to write a production VM in C++ without any UB. I would make any bet that that's outright impossible. :)

from wasm-c-api.

jakobkummerow commented on June 23, 2024

@rossberg : V8's heap objects actually don't use this pattern any more :-)

But V8's API still uses it, and AFAIK it's a fairly common pattern for libraries to have public types that are essentially opaque pointers that get reinterpret_casted to their internal equivalents. The benefits are that (1) the internal classes' implementations are hidden from the public API, and (2) you can use idiomatic member function calls on the external types. Unless I'm missing something, using inheritance would break (1) for non-trivial class hierarchies (maybe unless one uses multiple inheritance?), while switching to static functions would obviously break (2).

Maybe something could be built using composition with a forward-declared pointer to the internal object, roughly:

//// public.h

namespace internal {
class Internal;
}
class Public {
 public:
  void foo();
 private:
  internal::Internal* impl_;
};

//// internal.cc

namespace internal {
class Internal {
 public:
  foo() { /* real implementation */ }
 private:
  int field_;
};
}  // namespace internal

Public::foo() { impl_->foo(); }

But I haven't really thought through the implications of that yet, e.g. regarding possible overhead, construction/destruction/lifetime management issues, or any non-obvious corner cases.
Also, having to forward-declare a bunch of internal classes might be fine for a project-specific API like V8's, but seems unfortunate for an implementation-independent API like the wasm-c-api. So I'm not arguing in favor of this particular approach, just putting the idea out there.

Would be a fun challenge to write a production VM in C++ without any UB. I would make any bet that that's outright impossible.

My understanding is that strictly speaking it's even impossible to implement hello-world in C++ without relying on UB, because one has to assume that any program requires a non-zero amount of stack space, which might not be available in some environments, and behavior on stack overflow is undefined.

from wasm-c-api.

nlewycky commented on June 23, 2024

This comment serves as something of a footnote. As far as I know, the existing header file can be implemented without UB in C++17. You just have to construct an object with only a single deleted constructor.

No problem.

class WASM_API_EXTERN WasmerConfig : public Config {
public:
  WasmerConfig() : Config{} {}

  static WasmerConfig *from(void *base) {
    return reinterpret_cast<WasmerConfig *>(base);
  }

private:
  // TODO: custom config state
};

Config::~Config() {}

void Config::operator delete(void *ptr) { delete WasmerConfig::from(ptr); }

auto Config::make() -> own<Config> { return make_own(new WasmerConfig); }

// TODO: add custom config calls

Note that the base object initialization used Config{} and not Config() because uniform initialization syntax treats Config as an aggregate and we bypass the deleted constructor by directly initializing its elements (all zero of them).

This is no longer possible in C++20 because classes with constructors (including deleted constructors) are not considered aggregates for the purposes of uniform initialization.

from wasm-c-api.

rossberg commented on June 23, 2024

@nlewycky, thanks for that! The only downside I see is a further increase of the boilerplate in the header file's class definitions, but this is C++, so who cares.

I assume an empty deleter struct will not increase the size of the unique_ptr object? (The current C implementation on top of the C++ API does some really verboten reinterpret casts between unique_ptr<T>* and T** to avoid copying vectors. =:-} )

from wasm-c-api.

nlewycky commented on June 23, 2024

@rossberg Strictly speaking there's no guarantee about the size or representation of a unique_ptr in any case, but I'd be surprised if you find any production-quality C++ standard library where a unique_ptr with an empty deleter is not simply the pointer. If you were happy with reinterpret_cast before, you should be just as happy afterwards.

from wasm-c-api.

rossberg commented on June 23, 2024

Can I interest you in creating a PR for making this change? (including adapting the wasm-v8.cc prototype?) :)

from wasm-c-api.

rossberg commented on June 23, 2024

Thanks a lot for the help, I'll have a look at your PR soon.

As for Shared, IIRC my intention was that the template specialisation for each sharable type (currently, only Module) be declared in wasm.hh, since the implementation is always going to be type-specific. I just forgot doing that, and the toolchain did not complain. Do you foresee a problem with that path? (The reason for using a template is so that there is a natural, uniform way for writing shared types.)

from wasm-c-api.

nlewycky commented on June 23, 2024

I think there might be an issue with that, but we can defer it to another issue if there is. Making a specialization of a struct means that you replace the entire innards of the struct.

template<typename T> class Shared { int x; };
template<> class Shared<char> { using x = char; };

There's no need for the specialization's contents to have any relationship with the template pattern, so the API shown in wasm.hh isn't necessarily the API that the specialization will have. Offhand I'm not sure if there's any special-case rules that come into play if the specialization is declared but not defined and a template pattern is provided, etc.

from wasm-c-api.

rossberg commented on June 23, 2024

Ah, I may have been using inaccurate terminology above. I meant that the class template specialisation for Shared<Module> is defined in the header, but its methods are merely declared.

My understanding is that a class template specialization that is declared but not defined yields an incomplete type, so is only useful as a forward declaration.

from wasm-c-api.

C++ opaque handle implementation is based on undefined behaviour about wasm-c-api HOT 17 CLOSED

Comments (17)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent