
Comments (8)

mhong commented on May 5, 2024

Based on the discussion yesterday, one option is to introduce a compiler debug mode, where we send each intermediate tensor value back to the host. The tensor program will look like:

let x = foo(...)
sendToHost(x)
let y = bar(x, ...)
sendToHost(y)
...

TFGraphLowering threads the effectful ops together via control dependencies (sendToHost is an effectful op, implemented via the TF mechanisms FifoEnqueue, send, or outfeed, depending on the TF device), and makes sure they run before we return from the function. So in this example, sendToHost(x) will run before sendToHost(y), which will in turn run before the function returns.

sendToHost(x) and let y = bar(x, ...) can run in parallel, as expected. This lets TF execution avoid blocking on the sends, while the host should still receive the tensors fairly quickly due to the scheduling and bounded queuing in the TF implementation. For example, when x and y are both sent out of a TPU via outfeed: outfeed is a bounded-size queue, so if the TPU runs ahead and keeps producing new values into the outfeed queue, TPU execution will eventually be blocked by the consumer of that queue.

As such, we probably do not need to introduce another "barrier" that forces sendToHost(x) to run before let y = bar(x, ...), but if needed we can introduce such an explicit programming construct.
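The bounded-queue backpressure described above can be sketched in plain Swift. This is an illustrative analogy, not the TF outfeed implementation, and all names are hypothetical: once the queue holds capacity elements, the producer blocks until the consumer drains it.

```swift
import Foundation
import Dispatch

// Minimal bounded FIFO queue: `enqueue` blocks when full, `dequeue` when empty.
final class BoundedQueue<T> {
    private var items: [T] = []
    private let lock = DispatchSemaphore(value: 1)   // protects `items`
    private let slots: DispatchSemaphore             // free slots remaining
    private let values = DispatchSemaphore(value: 0) // items available

    init(capacity: Int) {
        slots = DispatchSemaphore(value: capacity)
    }

    func enqueue(_ item: T) {
        slots.wait()          // blocks the producer when the queue is full
        lock.wait()
        items.append(item)
        lock.signal()
        values.signal()
    }

    func dequeue() -> T {
        values.wait()         // blocks the consumer when the queue is empty
        lock.wait()
        let item = items.removeFirst()
        lock.signal()
        slots.signal()
        return item
    }
}

// The "device" produces 4 values into a capacity-2 queue: it runs ahead by at
// most 2 items, then blocks until the "host" consumer catches up.
let queue = BoundedQueue<Int>(capacity: 2)
let producer = Thread {
    for i in 0..<4 { queue.enqueue(i) }
}
producer.start()
var received: [Int] = []
for _ in 0..<4 { received.append(queue.dequeue()) }
```

The same shape explains why a TPU producing into outfeed cannot run unboundedly far ahead of the host consumer.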

from swift.

rxwei commented on May 5, 2024

Even in the eager execution mode, we're really waiting for each individual asynchronous operation to complete.

Yes, but that feels like an implementation detail to me. The actual behavior and the user expectation in eager execution are still sync-per-statement.

But to keep the program synced when necessary we would need some form of wait/callback so we know operations have completed.

Agreed. That's something we can start with. When too many users complain about debuggability, we can then evaluate whether debug mode should expose the same behavior as eager execution.


mhong commented on May 5, 2024

I looked at a somewhat simplified test case:

import TensorFlow

var x = Tensor<Float>(1.0)
x += x
print("interlude")
x += x
let _ = x.array

Confirmed that while the host is running the tensor program in a separate thread, it executes print("interlude") right away. Some options I can think of are:

  1. Introduce a special form of print/debug that the compiler understands and will schedule properly w.r.t. tensor statements. One example is to recommend that users use a TF op for printing (https://github.com/tensorflow/tensorflow/blob/9054c9b2ac303cbd1538166d0821f389cbc75894/tensorflow/core/ops/logging_ops.cc#L30). We could also teach the compiler to special-case the Swift print() function, though that seems brittle.

  2. Introduce a special compilation mode (or use the debug mode), where the compiler turns each tensor statement into its own TF graph. This leads to TF-eager-style execution, and provides "per-statement debugging" support as in lldb (https://groups.google.com/a/tensorflow.org/d/msg/swift/ZAtPx-R4Dc4/23FLmTXQAwAJ).
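The out-of-order behavior in the test case above can be reproduced without TensorFlow at all. Here is a plain-Swift analogy (all names hypothetical) in which "tensor" statements run on a separate serial queue standing in for the TF runtime thread, so nothing orders the host-side print after the preceding op:

```swift
import Foundation
import Dispatch

let device = DispatchQueue(label: "device")   // stand-in for the TF runtime thread
let logLock = NSLock()
var log: [String] = []
func record(_ s: String) { logLock.lock(); log.append(s); logLock.unlock() }

var x: Float = 1.0
device.async { record("x += x"); x += x }     // op request: dispatched, not awaited
record("interlude")                           // host print: may run before the op
device.async { record("x += x (2)"); x += x }
device.sync { }                               // like x.array: sync with the device
record("done")                                // both ops have run by now; x == 4.0
// The position of "interlude" in `log` is nondeterministic: that is exactly
// the debugging surprise being discussed.
```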

BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. E.g., here's the end of bb0:

  %30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
  cond_br %32, bb1, bb2                           // id: %33

This may be because the code is at top-level. Full SIL code as input to TFPartition pass is below.

---- INPUT FUNCTION main ----------
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  alloc_global @$S4test1x10TensorFlow0B0VySfGvp   // id: %2
  %3 = float_literal $Builtin.FPIEEE32, 0x3F800000 // 1 // user: %4
  %4 = builtin "__tfop_tfc.scalarToTensor,$in"(%3 : $Builtin.FPIEEE32) : $TensorHandle<Float> // users: %12, %11, %10, %9, %9, %8, %7, %6, %5
  strong_retain %4 : $TensorHandle<Float>         // id: %5
  strong_retain %4 : $TensorHandle<Float>         // id: %6
  strong_retain %4 : $TensorHandle<Float>         // id: %7
  strong_retain %4 : $TensorHandle<Float>         // id: %8
  %9 = builtin "__tfop_Add,$in,$in"(%4 : $TensorHandle<Float>, %4 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %57, %56, %55, %53, %53, %52, %51, %50, %49
  strong_release %4 : $TensorHandle<Float>        // id: %10
  strong_release %4 : $TensorHandle<Float>        // id: %11
  strong_release %4 : $TensorHandle<Float>        // id: %12
  %13 = integer_literal $Builtin.Word, 1          // user: %16
  %14 = integer_literal $Builtin.Int64, 1         // user: %15
  %15 = struct $Int (%14 : $Builtin.Int64)        // user: %20
  %16 = alloc_ref [tail_elems $Any * %13 : $Builtin.Word] $_ContiguousArrayStorage<Any> // user: %17
  %17 = upcast %16 : $_ContiguousArrayStorage<Any> to $_ContiguousArrayStorageBase // users: %25, %26, %24, %35, %22
  %18 = integer_literal $Builtin.Int64, 2         // user: %19
  %19 = struct $UInt (%18 : $Builtin.Int64)       // user: %20
  %20 = struct $_SwiftArrayBodyStorage (%15 : $Int, %19 : $UInt) // user: %21
  %21 = struct $_ArrayBody (%20 : $_SwiftArrayBodyStorage) // user: %23
  %22 = ref_element_addr %17 : $_ContiguousArrayStorageBase, #_ContiguousArrayStorageBase.countAndCapacity // user: %23
  store %21 to %22 : $*_ArrayBody                 // id: %23
  %24 = ref_tail_addr %17 : $_ContiguousArrayStorageBase, $Any // user: %27
  strong_retain %17 : $_ContiguousArrayStorageBase // id: %25
  strong_release %17 : $_ContiguousArrayStorageBase // id: %26
  %27 = init_existential_addr %24 : $*Any, $String // user: %42
  %28 = integer_literal $Builtin.Int8, 2          // users: %65, %107, %101, %97
  %29 = metatype $@thick UInt8.Type               // users: %88, %31, %30
  %30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
  cond_br %32, bb1, bb2                           // id: %33

bb1:                                              // Preds: bb0 bb2
  %34 = global_addr @$S4test1x10TensorFlow0B0VySfGvp : $*Tensor<Float> // user: %85
  %35 = struct $_ContiguousArrayBuffer<Any> (%17 : $_ContiguousArrayStorageBase) // user: %36
  %36 = struct $Array<Any> (%35 : $_ContiguousArrayBuffer<Any>) // user: %48
  %37 = string_literal utf8 "interlude"           // user: %40
  %38 = integer_literal $Builtin.Int64, 9         // user: %40
  // function_ref specialized _StringGuts.init<A>(_:)
  %39 = function_ref @$Ss11_StringGutsVyABs010_UnmanagedA0VyxGcs17FixedWidthIntegerRzs08UnsignedF0RzlufCs5UInt8V_Tgq5Tf4xd_n : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %40
  %40 = apply %39(%37, %38) : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %41
  %41 = struct $String (%40 : $_StringGuts)       // user: %42
  store %41 to %27 : $*String                     // id: %42
  // function_ref default argument 1 of print(_:separator:terminator:)
  %43 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA0_ : $@convention(thin) () -> @owned String // user: %44
  %44 = apply %43() : $@convention(thin) () -> @owned String // user: %48
  // function_ref default argument 2 of print(_:separator:terminator:)
  %45 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA1_ : $@convention(thin) () -> @owned String // user: %46
  %46 = apply %45() : $@convention(thin) () -> @owned String // user: %48
  // function_ref print(_:separator:terminator:)
  %47 = function_ref @$Ss5print_9separator10terminatoryypd_S2StF : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> () // user: %48
  %48 = apply %47(%36, %44, %46) : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> ()
  strong_retain %9 : $TensorHandle<Float>         // id: %49
  strong_retain %9 : $TensorHandle<Float>         // id: %50
  strong_retain %9 : $TensorHandle<Float>         // id: %51
  strong_retain %9 : $TensorHandle<Float>         // id: %52
  %53 = builtin "__tfop_Add,$in,$in"(%9 : $TensorHandle<Float>, %9 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %71, %76, %73, %58, %54
  %54 = struct $Tensor<Float> (%53 : $TensorHandle<Float>) // user: %85
  strong_release %9 : $TensorHandle<Float>        // id: %55
  strong_release %9 : $TensorHandle<Float>        // id: %56
  strong_release %9 : $TensorHandle<Float>        // id: %57
  strong_retain %53 : $TensorHandle<Float>        // id: %58
  // function_ref implicit closure #1 in Tensor.array.getter
  %59 = function_ref @$S10TensorFlow0A0V5arrayAA11ShapedArrayVyxGvgSSyXKfu_ : $@convention(thin) () -> @owned String // user: %60
  %60 = convert_function %59 : $@convention(thin) () -> @owned String to $@convention(thin) @noescape () -> @owned String // user: %61
  %61 = thin_to_thick_function %60 : $@convention(thin) @noescape () -> @owned String to $@noescape @callee_guaranteed () -> @owned String // user: %69
  %62 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/TensorFlow/Tensor.swift" // user: %64
  %63 = integer_literal $Builtin.Word, 93         // user: %65
  %64 = builtin "ptrtoint_Word"(%62 : $Builtin.RawPointer) : $Builtin.Word // user: %65
  %65 = struct $StaticString (%64 : $Builtin.Word, %63 : $Builtin.Word, %28 : $Builtin.Int8) // user: %69
  %66 = integer_literal $Builtin.Int64, 765       // user: %67
  %67 = struct $UInt (%66 : $Builtin.Int64)       // user: %69
  // function_ref debugLog(_:file:line:)
  %68 = function_ref @$S10TensorFlow8debugLog_4file4lineySSyXK_s12StaticStringVSutF : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> () // user: %69
  %69 = apply %68(%61, %65, %67) : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> ()
  // function_ref __tf_receive
  %70 = function_ref @__tf_receive : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // user: %71
  %71 = apply %70<Float>(%53) : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // users: %75, %74
  // function_ref TensorHandle.makeHostCopy()
  %72 = function_ref @$S10TensorFlow0A6HandleC12makeHostCopyAA11ShapedArrayVyxGyF : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // user: %74
  strong_retain %53 : $TensorHandle<Float>        // id: %73
  %74 = apply %72<Float>(%71) : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // users: %79, %77
  strong_release %71 : $TensorHandle<Float>       // id: %75
  strong_release %53 : $TensorHandle<Float>       // id: %76
  %77 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.buffer // user: %78
  strong_release %77 : $TensorBuffer<Float>       // id: %78
  %79 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.shape // user: %80
  %80 = struct_extract %79 : $Array<Int>, #Array._buffer // user: %81
  %81 = struct_extract %80 : $_ContiguousArrayBuffer<Int>, #_ContiguousArrayBuffer._storage // user: %82
  strong_release %81 : $_ContiguousArrayStorageBase // id: %82
  %83 = integer_literal $Builtin.Int32, 0         // user: %84
  %84 = struct $Int32 (%83 : $Builtin.Int32)      // user: %86
  store %54 to %34 : $*Tensor<Float>              // id: %85
  return %84 : $Int32                             // id: %86

bb2:                                              // Preds: bb0
  %87 = integer_literal $Builtin.Int1, -1         // user: %92
  %88 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %91
  %89 = metatype $@thick UInt16.Type              // user: %90
  %90 = init_existential_metatype %89 : $@thick UInt16.Type, $@thick Any.Type // user: %91
  %91 = builtin "is_same_metatype"(%88 : $@thick Any.Type, %90 : $@thick Any.Type) : $Builtin.Int1 // user: %92
  %92 = builtin "int_expect_Int1"(%91 : $Builtin.Int1, %87 : $Builtin.Int1) : $Builtin.Int1 // user: %93
  cond_br %92, bb1, bb3                           // id: %93

bb3:                                              // Preds: bb2
  %94 = string_literal utf8 ""                    // user: %96
  %95 = integer_literal $Builtin.Word, 0          // user: %97
  %96 = builtin "ptrtoint_Word"(%94 : $Builtin.RawPointer) : $Builtin.Word // user: %97
  %97 = struct $StaticString (%96 : $Builtin.Word, %95 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %98 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/core/UnmanagedString.swift" // user: %100
  %99 = integer_literal $Builtin.Word, 96         // user: %101
  %100 = builtin "ptrtoint_Word"(%98 : $Builtin.RawPointer) : $Builtin.Word // user: %101
  %101 = struct $StaticString (%100 : $Builtin.Word, %99 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %102 = integer_literal $Builtin.Int64, 73       // user: %103
  %103 = struct $UInt (%102 : $Builtin.Int64)     // user: %111
  %104 = string_literal utf8 "Fatal error"        // user: %106
  %105 = integer_literal $Builtin.Word, 11        // user: %107
  %106 = builtin "ptrtoint_Word"(%104 : $Builtin.RawPointer) : $Builtin.Word // user: %107
  %107 = struct $StaticString (%106 : $Builtin.Word, %105 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %108 = integer_literal $Builtin.Int32, 0        // user: %109
  %109 = struct $UInt32 (%108 : $Builtin.Int32)   // user: %111
  // function_ref _fatalErrorMessage(_:_:file:line:flags:)
  %110 = function_ref @$Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never // user: %111
  %111 = apply %110(%107, %97, %101, %103, %109) : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never
  unreachable                                     // id: %112
} // end sil function 'main'


rxwei commented on May 5, 2024

BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. e.g. Here's the end of bb0:

Right, top-level code plus print (which then does an Any coercion) will emit all kinds of weirdness.


rxwei commented on May 5, 2024

The solution depends on how we converge on the technical direction of the programming model:

  1. Sync implicitly on side-effecting ops in debug mode and insert a trivial send/receive control dependency into the graph. The benefit is that users get the exact same behavior as eager execution. Since the semantics of Swift do not prevent code motion around a side-effecting operation, in release mode the compiler will still generate code with today's fully async behavior.

  2. Treat Tensor and graph-extractable sub-programs as async, and tell users "get used to it if you don't like it". The benefit is that we have a single execution model that always has the same async behavior. The big downside is that users will almost certainly be surprised during debugging because they are not (and probably should not) aware of how a graph is formed (especially in the "print done in between two loops" example shown on the top).

  3. Allow either implicit sync or implicit async, while allowing the user to opt into the opposite behavior through an API call with a trailing closure.

I'd personally prefer the implicit sync with opt-in async, because no matter what the semantics or compiler execution order guarantees are, users of eager-style ML frameworks will assume that each line is executed after the previous line.
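For option 3, the opt-in surface might look like a scoped call with a trailing closure. A minimal sketch, with a hypothetical withAsyncExecution API (not a real TensorFlow API); in a real implementation the runtime would consult the mode when deciding whether to await each op:

```swift
// Hypothetical execution-mode switch: the default is per-statement sync,
// and a scoped call opts a region into async dispatch.
enum ExecutionMode { case sync, async }
var currentMode = ExecutionMode.sync

func withAsyncExecution<R>(_ body: () throws -> R) rethrows -> R {
    let saved = currentMode
    currentMode = .async          // ops issued inside `body` may be deferred
    defer { currentMode = saved } // restore implicit-sync on scope exit
    return try body()
}

// Usage: async is an explicit, scoped opt-in.
let modeInside = withAsyncExecution { currentMode }
let modeAfter = currentMode
```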


rxwei commented on May 5, 2024

IIRC, the current behavior is that sends do not block the rest of the tensor execution. This is ideal, and there probably won't be a need for them to block device computation.

If there's a side-effecting operation without a data dependency (like a print("hi")), we'll need to decide whether it should execute after the last tensor statement preceding it. Like the current sends, it shouldn't block the rest of the tensor computation.


abrarrcas commented on May 5, 2024

I think there comes a time when abstractions and hiding what's happening under the hood are not a good idea. We should view operations on Tensors as asynchronous, and use data dependencies or barriers to synchronise as necessary. It's not that we're treating them as asynchronous and telling people to get used to it. They are asynchronous, and we should be aware of that (even if we don't need to worry about it in most cases). Even in eager execution mode, we're really waiting for each individual asynchronous operation to complete (assuming we're using a GPU/TPU/XPU and not just inlining the code on the CPU). But to keep the program synced when necessary, we would need some form of wait/callback so we know operations have completed.

import TensorFlow

var f : Float = 1.0
var t : Tensor<Float> = 1.0
f += f // addition
t += t // request to perform addition
t.wait()
print("All TF operations complete")
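The wait() idea above can be sketched with a DispatchGroup: every issued op enters the group, and wait() blocks the host until all pending ops have completed. PendingTensor here is a hypothetical stand-in, not the real Tensor type:

```swift
import Foundation
import Dispatch

// Hypothetical handle whose ops run asynchronously on a serial "device"
// queue; `wait()` is the explicit host-side synchronization point.
final class PendingTensor {
    private(set) var value: Float
    private let group = DispatchGroup()
    private let device = DispatchQueue(label: "device")

    init(_ value: Float) { self.value = value }

    func add(_ other: Float) {
        // Request the op and return immediately; the group tracks completion.
        device.async(group: group) { self.value += other }
    }

    func wait() { group.wait() }  // block until all requested ops finish
}

let t = PendingTensor(1.0)
t.add(1.0)   // request to perform addition; host continues immediately
t.add(2.0)
t.wait()     // both additions have now run; t.value == 4.0
```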


rxwei commented on May 5, 2024

Closed. Further discussions will happen on the mailing list.

