
Comments (8)

mhong commented on May 5, 2024

Based on the discussion yesterday, one option is to introduce a compiler debug mode, where we send each intermediate tensor value back to the host. The tensor program will look like:

let x = foo(...)
sendToHost(x)
let y = bar(x, ...)
sendToHost(y)
...

TFGraphLowering threads the effectful ops together via control dependencies (sendToHost is an effectful op, implemented via the TF mechanisms FifoEnqueue, send, or outfeed, depending on the TF device), and makes sure they run before we return from the function. So in this example, sendToHost(x) will run before sendToHost(y), which will in turn run before the function returns.

sendToHost(x) and let y = bar(x, ...) can run in parallel, as expected. This lets TF execution avoid blocking on the sends, while the host should still receive the tensors fairly quickly due to the scheduling and bounded queuing in the TF implementation. For example, when x and y are both sent out of a TPU via outfeed: outfeed is a bounded-size queue, so if the TPU runs ahead and keeps producing new values into the outfeed queue, TPU execution will eventually be blocked by the consumer of that queue.

As such, we probably do not need to introduce another "barrier" that forces sendToHost(x) to run before let y = bar(x, ...), but if needed we can introduce such an explicit programming construct.
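The bounded-queue backpressure described above can be sketched in plain Swift. This is an illustrative analogy, not the TF outfeed implementation, and all names are hypothetical: once the queue holds capacity elements, the producer blocks until the consumer drains it.

```swift
import Foundation
import Dispatch

// Minimal bounded FIFO queue: `enqueue` blocks when full, `dequeue` when empty.
final class BoundedQueue<T> {
    private var items: [T] = []
    private let lock = DispatchSemaphore(value: 1)   // protects `items`
    private let slots: DispatchSemaphore             // free slots remaining
    private let values = DispatchSemaphore(value: 0) // items available

    init(capacity: Int) {
        slots = DispatchSemaphore(value: capacity)
    }

    func enqueue(_ item: T) {
        slots.wait()          // blocks the producer when the queue is full
        lock.wait()
        items.append(item)
        lock.signal()
        values.signal()
    }

    func dequeue() -> T {
        values.wait()         // blocks the consumer when the queue is empty
        lock.wait()
        let item = items.removeFirst()
        lock.signal()
        slots.signal()
        return item
    }
}

// The "device" produces 4 values into a capacity-2 queue: it runs ahead by at
// most 2 items, then blocks until the "host" consumer catches up.
let queue = BoundedQueue<Int>(capacity: 2)
let producer = Thread {
    for i in 0..<4 { queue.enqueue(i) }
}
producer.start()
var received: [Int] = []
for _ in 0..<4 { received.append(queue.dequeue()) }
```

The same shape explains why a TPU producing into outfeed cannot run unboundedly far ahead of the host consumer.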

from swift.

rxwei commented on May 5, 2024

Even in the eager execution mode, we're really waiting for each individual asynchronous operation to complete.

Yes, but that feels like an implementation detail to me. The actual behavior and the user expectation in eager execution are still sync-per-statement.

But to keep the program synced when necessary we would need some form of wait/callback so we know operations have completed.

Agreed. That's something we can start with. When too many users complain about debuggability, we can then evaluate whether debug mode should expose the same behavior as eager execution.


mhong commented on May 5, 2024

I looked at a somewhat simplified test case:

import TensorFlow

var x = Tensor<Float>(1.0)
x += x
print("interlude")
x += x
let _ = x.array

Confirmed that while the host is running the tensor program in a separate thread, it executes print("interlude") right away. Some options I can think of are:

  1. Introduce a special form of print/debug that the compiler understands and will schedule properly w.r.t. tensor statements. One example is to recommend that users use a TF op for printing (https://github.com/tensorflow/tensorflow/blob/9054c9b2ac303cbd1538166d0821f389cbc75894/tensorflow/core/ops/logging_ops.cc#L30). We could also teach the compiler to special-case the Swift print() function, though that seems brittle.

  2. Introduce a special compilation mode (or use the debug mode), where the compiler turns each tensor statement into its own TF graph. This leads to TF-eager-style execution, and provides "per-statement debugging" support as in lldb (https://groups.google.com/a/tensorflow.org/d/msg/swift/ZAtPx-R4Dc4/23FLmTXQAwAJ).
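The out-of-order behavior in the test case above can be reproduced without TensorFlow at all. Here is a plain-Swift analogy (all names hypothetical) in which "tensor" statements run on a separate serial queue standing in for the TF runtime thread, so nothing orders the host-side print after the preceding op:

```swift
import Foundation
import Dispatch

let device = DispatchQueue(label: "device")   // stand-in for the TF runtime thread
let logLock = NSLock()
var log: [String] = []
func record(_ s: String) { logLock.lock(); log.append(s); logLock.unlock() }

var x: Float = 1.0
device.async { record("x += x"); x += x }     // op request: dispatched, not awaited
record("interlude")                           // host print: may run before the op
device.async { record("x += x (2)"); x += x }
device.sync { }                               // like x.array: sync with the device
record("done")                                // both ops have run by now; x == 4.0
// The position of "interlude" in `log` is nondeterministic: that is exactly
// the debugging surprise being discussed.
```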

BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. E.g., here's the end of bb0:

  %30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
  cond_br %32, bb1, bb2                           // id: %33

This may be because the code is at top-level. Full SIL code as input to TFPartition pass is below.

---- INPUT FUNCTION main ----------
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  alloc_global @$S4test1x10TensorFlow0B0VySfGvp   // id: %2
  %3 = float_literal $Builtin.FPIEEE32, 0x3F800000 // 1 // user: %4
  %4 = builtin "__tfop_tfc.scalarToTensor,$in"(%3 : $Builtin.FPIEEE32) : $TensorHandle<Float> // users: %12, %11, %10, %9, %9, %8, %7, %6, %5
  strong_retain %4 : $TensorHandle<Float>         // id: %5
  strong_retain %4 : $TensorHandle<Float>         // id: %6
  strong_retain %4 : $TensorHandle<Float>         // id: %7
  strong_retain %4 : $TensorHandle<Float>         // id: %8
  %9 = builtin "__tfop_Add,$in,$in"(%4 : $TensorHandle<Float>, %4 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %57, %56, %55, %53, %53, %52, %51, %50, %49
  strong_release %4 : $TensorHandle<Float>        // id: %10
  strong_release %4 : $TensorHandle<Float>        // id: %11
  strong_release %4 : $TensorHandle<Float>        // id: %12
  %13 = integer_literal $Builtin.Word, 1          // user: %16
  %14 = integer_literal $Builtin.Int64, 1         // user: %15
  %15 = struct $Int (%14 : $Builtin.Int64)        // user: %20
  %16 = alloc_ref [tail_elems $Any * %13 : $Builtin.Word] $_ContiguousArrayStorage<Any> // user: %17
  %17 = upcast %16 : $_ContiguousArrayStorage<Any> to $_ContiguousArrayStorageBase // users: %25, %26, %24, %35, %22
  %18 = integer_literal $Builtin.Int64, 2         // user: %19
  %19 = struct $UInt (%18 : $Builtin.Int64)       // user: %20
  %20 = struct $_SwiftArrayBodyStorage (%15 : $Int, %19 : $UInt) // user: %21
  %21 = struct $_ArrayBody (%20 : $_SwiftArrayBodyStorage) // user: %23
  %22 = ref_element_addr %17 : $_ContiguousArrayStorageBase, #_ContiguousArrayStorageBase.countAndCapacity // user: %23
  store %21 to %22 : $*_ArrayBody                 // id: %23
  %24 = ref_tail_addr %17 : $_ContiguousArrayStorageBase, $Any // user: %27
  strong_retain %17 : $_ContiguousArrayStorageBase // id: %25
  strong_release %17 : $_ContiguousArrayStorageBase // id: %26
  %27 = init_existential_addr %24 : $*Any, $String // user: %42
  %28 = integer_literal $Builtin.Int8, 2          // users: %65, %107, %101, %97
  %29 = metatype $@thick UInt8.Type               // users: %88, %31, %30
  %30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
  %32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
  cond_br %32, bb1, bb2                           // id: %33

bb1:                                              // Preds: bb0 bb2
  %34 = global_addr @$S4test1x10TensorFlow0B0VySfGvp : $*Tensor<Float> // user: %85
  %35 = struct $_ContiguousArrayBuffer<Any> (%17 : $_ContiguousArrayStorageBase) // user: %36
  %36 = struct $Array<Any> (%35 : $_ContiguousArrayBuffer<Any>) // user: %48
  %37 = string_literal utf8 "interlude"           // user: %40
  %38 = integer_literal $Builtin.Int64, 9         // user: %40
  // function_ref specialized _StringGuts.init<A>(_:)
  %39 = function_ref @$Ss11_StringGutsVyABs010_UnmanagedA0VyxGcs17FixedWidthIntegerRzs08UnsignedF0RzlufCs5UInt8V_Tgq5Tf4xd_n : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %40
  %40 = apply %39(%37, %38) : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %41
  %41 = struct $String (%40 : $_StringGuts)       // user: %42
  store %41 to %27 : $*String                     // id: %42
  // function_ref default argument 1 of print(_:separator:terminator:)
  %43 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA0_ : $@convention(thin) () -> @owned String // user: %44
  %44 = apply %43() : $@convention(thin) () -> @owned String // user: %48
  // function_ref default argument 2 of print(_:separator:terminator:)
  %45 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA1_ : $@convention(thin) () -> @owned String // user: %46
  %46 = apply %45() : $@convention(thin) () -> @owned String // user: %48
  // function_ref print(_:separator:terminator:)
  %47 = function_ref @$Ss5print_9separator10terminatoryypd_S2StF : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> () // user: %48
  %48 = apply %47(%36, %44, %46) : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> ()
  strong_retain %9 : $TensorHandle<Float>         // id: %49
  strong_retain %9 : $TensorHandle<Float>         // id: %50
  strong_retain %9 : $TensorHandle<Float>         // id: %51
  strong_retain %9 : $TensorHandle<Float>         // id: %52
  %53 = builtin "__tfop_Add,$in,$in"(%9 : $TensorHandle<Float>, %9 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %71, %76, %73, %58, %54
  %54 = struct $Tensor<Float> (%53 : $TensorHandle<Float>) // user: %85
  strong_release %9 : $TensorHandle<Float>        // id: %55
  strong_release %9 : $TensorHandle<Float>        // id: %56
  strong_release %9 : $TensorHandle<Float>        // id: %57
  strong_retain %53 : $TensorHandle<Float>        // id: %58
  // function_ref implicit closure #1 in Tensor.array.getter
  %59 = function_ref @$S10TensorFlow0A0V5arrayAA11ShapedArrayVyxGvgSSyXKfu_ : $@convention(thin) () -> @owned String // user: %60
  %60 = convert_function %59 : $@convention(thin) () -> @owned String to $@convention(thin) @noescape () -> @owned String // user: %61
  %61 = thin_to_thick_function %60 : $@convention(thin) @noescape () -> @owned String to $@noescape @callee_guaranteed () -> @owned String // user: %69
  %62 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/TensorFlow/Tensor.swift" // user: %64
  %63 = integer_literal $Builtin.Word, 93         // user: %65
  %64 = builtin "ptrtoint_Word"(%62 : $Builtin.RawPointer) : $Builtin.Word // user: %65
  %65 = struct $StaticString (%64 : $Builtin.Word, %63 : $Builtin.Word, %28 : $Builtin.Int8) // user: %69
  %66 = integer_literal $Builtin.Int64, 765       // user: %67
  %67 = struct $UInt (%66 : $Builtin.Int64)       // user: %69
  // function_ref debugLog(_:file:line:)
  %68 = function_ref @$S10TensorFlow8debugLog_4file4lineySSyXK_s12StaticStringVSutF : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> () // user: %69
  %69 = apply %68(%61, %65, %67) : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> ()
  // function_ref __tf_receive
  %70 = function_ref @__tf_receive : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // user: %71
  %71 = apply %70<Float>(%53) : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // users: %75, %74
  // function_ref TensorHandle.makeHostCopy()
  %72 = function_ref @$S10TensorFlow0A6HandleC12makeHostCopyAA11ShapedArrayVyxGyF : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // user: %74
  strong_retain %53 : $TensorHandle<Float>        // id: %73
  %74 = apply %72<Float>(%71) : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // users: %79, %77
  strong_release %71 : $TensorHandle<Float>       // id: %75
  strong_release %53 : $TensorHandle<Float>       // id: %76
  %77 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.buffer // user: %78
  strong_release %77 : $TensorBuffer<Float>       // id: %78
  %79 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.shape // user: %80
  %80 = struct_extract %79 : $Array<Int>, #Array._buffer // user: %81
  %81 = struct_extract %80 : $_ContiguousArrayBuffer<Int>, #_ContiguousArrayBuffer._storage // user: %82
  strong_release %81 : $_ContiguousArrayStorageBase // id: %82
  %83 = integer_literal $Builtin.Int32, 0         // user: %84
  %84 = struct $Int32 (%83 : $Builtin.Int32)      // user: %86
  store %54 to %34 : $*Tensor<Float>              // id: %85
  return %84 : $Int32                             // id: %86

bb2:                                              // Preds: bb0
  %87 = integer_literal $Builtin.Int1, -1         // user: %92
  %88 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %91
  %89 = metatype $@thick UInt16.Type              // user: %90
  %90 = init_existential_metatype %89 : $@thick UInt16.Type, $@thick Any.Type // user: %91
  %91 = builtin "is_same_metatype"(%88 : $@thick Any.Type, %90 : $@thick Any.Type) : $Builtin.Int1 // user: %92
  %92 = builtin "int_expect_Int1"(%91 : $Builtin.Int1, %87 : $Builtin.Int1) : $Builtin.Int1 // user: %93
  cond_br %92, bb1, bb3                           // id: %93

bb3:                                              // Preds: bb2
  %94 = string_literal utf8 ""                    // user: %96
  %95 = integer_literal $Builtin.Word, 0          // user: %97
  %96 = builtin "ptrtoint_Word"(%94 : $Builtin.RawPointer) : $Builtin.Word // user: %97
  %97 = struct $StaticString (%96 : $Builtin.Word, %95 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %98 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/core/UnmanagedString.swift" // user: %100
  %99 = integer_literal $Builtin.Word, 96         // user: %101
  %100 = builtin "ptrtoint_Word"(%98 : $Builtin.RawPointer) : $Builtin.Word // user: %101
  %101 = struct $StaticString (%100 : $Builtin.Word, %99 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %102 = integer_literal $Builtin.Int64, 73       // user: %103
  %103 = struct $UInt (%102 : $Builtin.Int64)     // user: %111
  %104 = string_literal utf8 "Fatal error"        // user: %106
  %105 = integer_literal $Builtin.Word, 11        // user: %107
  %106 = builtin "ptrtoint_Word"(%104 : $Builtin.RawPointer) : $Builtin.Word // user: %107
  %107 = struct $StaticString (%106 : $Builtin.Word, %105 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
  %108 = integer_literal $Builtin.Int32, 0        // user: %109
  %109 = struct $UInt32 (%108 : $Builtin.Int32)   // user: %111
  // function_ref _fatalErrorMessage(_:_:file:line:flags:)
  %110 = function_ref @$Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never // user: %111
  %111 = apply %110(%107, %97, %101, %103, %109) : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never
  unreachable                                     // id: %112
} // end sil function 'main'


rxwei commented on May 5, 2024

BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. e.g. Here's the end of bb0:

Right, top-level code plus print (which then does an Any coercion) will emit all kinds of weirdness.


rxwei commented on May 5, 2024

The solution depends on how we converge on the technical direction of the programming model:

  1. Sync implicitly on side-effecting ops in debug mode and insert a trivial send/receive control dependency into the graph. The benefit is that users get the exact same behavior as eager execution. Since the semantics of Swift do not prevent code motion around a side-effecting operation, in release mode the compiler will still generate code with today's fully async behavior.

  2. Treat Tensor and graph-extractable sub-programs as async, and tell users "get used to it if you don't like it". The benefit is that we have a single execution model that always has the same async behavior. The big downside is that users will almost certainly be surprised during debugging because they are not (and probably should not) aware of how a graph is formed (especially in the "print done in between two loops" example shown on the top).

  3. Allow either implicit sync or implicit async, while allowing the user to opt into the opposite behavior through an API call with a trailing closure.

I'd personally prefer the implicit sync with opt-in async, because no matter what the semantics or compiler execution order guarantees are, users of eager-style ML frameworks will assume that each line is executed after the previous line.
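For option 3, the opt-in surface might look like a scoped call with a trailing closure. A minimal sketch, with a hypothetical withAsyncExecution API (not a real TensorFlow API); in a real implementation the runtime would consult the mode when deciding whether to await each op:

```swift
// Hypothetical execution-mode switch: the default is per-statement sync,
// and a scoped call opts a region into async dispatch.
enum ExecutionMode { case sync, async }
var currentMode = ExecutionMode.sync

func withAsyncExecution<R>(_ body: () throws -> R) rethrows -> R {
    let saved = currentMode
    currentMode = .async          // ops issued inside `body` may be deferred
    defer { currentMode = saved } // restore implicit-sync on scope exit
    return try body()
}

// Usage: async is an explicit, scoped opt-in.
let modeInside = withAsyncExecution { currentMode }
let modeAfter = currentMode
```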


rxwei commented on May 5, 2024

IIRC, the current behavior is that sends do not block the rest of the tensor execution. This is ideal, and there probably won't be a need for them to block device computation.

If there's a side-effecting operation without a data dependency (like a print("hi")), we'll need to decide whether it should execute after the last tensor statement preceding it. Like the current sends, it shouldn't block the rest of the tensor computation.


abrarrcas commented on May 5, 2024

I think there comes a time when abstractions and hiding what's happening under the hood are not a good idea. We should view operations on Tensors as asynchronous, and use data dependencies or barriers to synchronise as necessary. It's not that we're treating them as asynchronous and telling people to get used to it. They are asynchronous, and we should be aware of that (even if we don't need to worry about it in most cases). Even in eager execution mode, we're really waiting for each individual asynchronous operation to complete (assuming we're using a GPU/TPU/XPU and not just inlining the code on the CPU). But to keep the program synced when necessary, we would need some form of wait/callback so we know operations have completed.

import TensorFlow

var f : Float = 1.0
var t : Tensor<Float> = 1.0
f += f // addition
t += t // request to perform addition
t.wait()
print("All TF operations complete")
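The wait() idea above can be sketched with a DispatchGroup: every issued op enters the group, and wait() blocks the host until all pending ops have completed. PendingTensor here is a hypothetical stand-in, not the real Tensor type:

```swift
import Foundation
import Dispatch

// Hypothetical handle whose ops run asynchronously on a serial "device"
// queue; `wait()` is the explicit host-side synchronization point.
final class PendingTensor {
    private(set) var value: Float
    private let group = DispatchGroup()
    private let device = DispatchQueue(label: "device")

    init(_ value: Float) { self.value = value }

    func add(_ other: Float) {
        // Request the op and return immediately; the group tracks completion.
        device.async(group: group) { self.value += other }
    }

    func wait() { group.wait() }  // block until all requested ops finish
}

let t = PendingTensor(1.0)
t.add(1.0)   // request to perform addition; host continues immediately
t.add(2.0)
t.wait()     // both additions have now run; t.value == 4.0
```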


rxwei commented on May 5, 2024

Closed. Further discussions will happen on the mailing list.

