Comments (8)
Based on the discussion yesterday, one option is to introduce a compiler debug mode, where we send each intermediate tensor value back to the host. The tensor program will look like:
```swift
let x = foo(...)
sendToHost(x)
let y = bar(x, ...)
sendToHost(y)
...
```
TFGraphLowering threads the effectful ops via control dependencies (`sendToHost` is an effectful op, implemented via the TF mechanisms FifoEnqueue / send / outfeed, depending on the TF device) and makes sure they run before we return from the function. So in this example, `sendToHost(x)` will run before `sendToHost(y)`, which will in turn run before the function returns. `sendToHost(x)` and `let y = bar(x, ...)` can run in parallel, as expected. This keeps TF execution from blocking on the sends, yet the host should still receive the tensors fairly quickly, thanks to the scheduling and bounded queuing in the TF implementation. For example, when `x` and `y` are both sent out of the TPU via outfeed: outfeed is a bounded-size queue, so if the TPU runs ahead and keeps producing new values into it, TPU execution will eventually be blocked by the consumer of the outfeed queue.

As such, we probably do not need to introduce another "barrier" that forces `sendToHost(x)` to run before `let y = bar(x, ...)`, but if needed we can introduce such an explicit programming construct.
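The backpressure behavior described above can be sketched with a toy model (Python here purely for illustration; the queue size and thread roles are made-up stand-ins for the TPU outfeed and its host-side consumer):

```python
import queue
import threading
import time

# A bounded queue stands in for the TPU outfeed: once it is full, the
# producer (the device) blocks until the consumer (the host) drains a slot.
outfeed = queue.Queue(maxsize=2)
events = []

def device():
    for step in range(4):
        events.append(f"produce {step}")
        outfeed.put(step)  # blocks whenever 2 values are already waiting

def host():
    for _ in range(4):
        time.sleep(0.05)  # a deliberately slow consumer
        events.append(f"consume {outfeed.get()}")

producer = threading.Thread(target=device)
consumer = threading.Thread(target=host)
producer.start(); consumer.start()
producer.join(); consumer.join()
# The device can run ahead of the host only by the queue's capacity.
```

Because the queue is bounded, the producer never gets more than a couple of steps ahead, which is the property that lets the host receive intermediate tensors promptly without an explicit barrier.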
from swift.
> Even in the eager execution mode, we're really waiting for each individual asynchronous operation to complete.

Yes, but that feels like an implementation detail to me. The actual behavior and the user expectation in eager execution are still sync-per-statement.

> But to keep the program synced when necessary we would need some form of wait/callback so we know operations have completed.

Agreed. That's something we can start with. When too many users complain about debuggability, we can then evaluate whether debug mode should expose the same behavior as eager execution.
I looked at a somewhat simplified test case:
```swift
import TensorFlow

var x = Tensor<Float>(1.0)
x += x
print("interlude")
x += x
let _ = x.array
```
I confirmed that while the host is running the tensor program in a separate thread, it does `print("interlude")` right away. Some options I can think of are:
- Introduce a special form of print/debug that the compiler understands and schedules properly with respect to tensor statements. One option is to recommend that users call a TF op for printing (https://github.com/tensorflow/tensorflow/blob/9054c9b2ac303cbd1538166d0821f389cbc75894/tensorflow/core/ops/logging_ops.cc#L30). We could also teach the compiler to special-case the Swift print() function, though that seems brittle.
- Introduce a special compilation mode (or reuse the debug mode), where the compiler turns each tensor statement into its own TF graph. This leads to TF-eager-style execution, and provides "per-statement debugging" support in lldb (https://groups.google.com/a/tensorflow.org/d/msg/swift/ZAtPx-R4Dc4/23FLmTXQAwAJ).
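To make the observed interleaving concrete, here is a small Python model of what the generated program effectively does today (the names are illustrative, not the actual runtime API): the tensor program runs on a background thread, the host's print fires immediately, and only `x.array` blocks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

events = []
pool = ThreadPoolExecutor(max_workers=1)

def tensor_program():
    # Stand-in for the extracted graph containing both `x += x` statements.
    time.sleep(0.1)
    events.append("tensor program done")
    return 4.0  # 1.0 doubled twice

future = pool.submit(tensor_program)   # host launches the graph and moves on
events.append('print("interlude")')    # runs right away, before the graph finishes

result = future.result()               # `x.array` is the synchronization point
events.append(f"x.array = {result}")
pool.shutdown(wait=True)
```

The host-side print always lands first in `events`, which matches the "interlude prints right away" observation above.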
BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. For example, here's the end of bb0:
%30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
%31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
%32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
cond_br %32, bb1, bb2 // id: %33
This may be because the code is at top-level. Full SIL code as input to TFPartition pass is below.
---- INPUT FUNCTION main ----------
// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
alloc_global @$S4test1x10TensorFlow0B0VySfGvp // id: %2
%3 = float_literal $Builtin.FPIEEE32, 0x3F800000 // 1 // user: %4
%4 = builtin "__tfop_tfc.scalarToTensor,$in"(%3 : $Builtin.FPIEEE32) : $TensorHandle<Float> // users: %12, %11, %10, %9, %9, %8, %7, %6, %5
strong_retain %4 : $TensorHandle<Float> // id: %5
strong_retain %4 : $TensorHandle<Float> // id: %6
strong_retain %4 : $TensorHandle<Float> // id: %7
strong_retain %4 : $TensorHandle<Float> // id: %8
%9 = builtin "__tfop_Add,$in,$in"(%4 : $TensorHandle<Float>, %4 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %57, %56, %55, %53, %53, %52, %51, %50, %49
strong_release %4 : $TensorHandle<Float> // id: %10
strong_release %4 : $TensorHandle<Float> // id: %11
strong_release %4 : $TensorHandle<Float> // id: %12
%13 = integer_literal $Builtin.Word, 1 // user: %16
%14 = integer_literal $Builtin.Int64, 1 // user: %15
%15 = struct $Int (%14 : $Builtin.Int64) // user: %20
%16 = alloc_ref [tail_elems $Any * %13 : $Builtin.Word] $_ContiguousArrayStorage<Any> // user: %17
%17 = upcast %16 : $_ContiguousArrayStorage<Any> to $_ContiguousArrayStorageBase // users: %25, %26, %24, %35, %22
%18 = integer_literal $Builtin.Int64, 2 // user: %19
%19 = struct $UInt (%18 : $Builtin.Int64) // user: %20
%20 = struct $_SwiftArrayBodyStorage (%15 : $Int, %19 : $UInt) // user: %21
%21 = struct $_ArrayBody (%20 : $_SwiftArrayBodyStorage) // user: %23
%22 = ref_element_addr %17 : $_ContiguousArrayStorageBase, #_ContiguousArrayStorageBase.countAndCapacity // user: %23
store %21 to %22 : $*_ArrayBody // id: %23
%24 = ref_tail_addr %17 : $_ContiguousArrayStorageBase, $Any // user: %27
strong_retain %17 : $_ContiguousArrayStorageBase // id: %25
strong_release %17 : $_ContiguousArrayStorageBase // id: %26
%27 = init_existential_addr %24 : $*Any, $String // user: %42
%28 = integer_literal $Builtin.Int8, 2 // users: %65, %107, %101, %97
%29 = metatype $@thick UInt8.Type // users: %88, %31, %30
%30 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
%31 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %32
%32 = builtin "is_same_metatype"(%30 : $@thick Any.Type, %31 : $@thick Any.Type) : $Builtin.Int1 // user: %33
cond_br %32, bb1, bb2 // id: %33
bb1: // Preds: bb0 bb2
%34 = global_addr @$S4test1x10TensorFlow0B0VySfGvp : $*Tensor<Float> // user: %85
%35 = struct $_ContiguousArrayBuffer<Any> (%17 : $_ContiguousArrayStorageBase) // user: %36
%36 = struct $Array<Any> (%35 : $_ContiguousArrayBuffer<Any>) // user: %48
%37 = string_literal utf8 "interlude" // user: %40
%38 = integer_literal $Builtin.Int64, 9 // user: %40
// function_ref specialized _StringGuts.init<A>(_:)
%39 = function_ref @$Ss11_StringGutsVyABs010_UnmanagedA0VyxGcs17FixedWidthIntegerRzs08UnsignedF0RzlufCs5UInt8V_Tgq5Tf4xd_n : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %40
%40 = apply %39(%37, %38) : $@convention(thin) (Builtin.RawPointer, Builtin.Int64) -> @owned _StringGuts // user: %41
%41 = struct $String (%40 : $_StringGuts) // user: %42
store %41 to %27 : $*String // id: %42
// function_ref default argument 1 of print(_:separator:terminator:)
%43 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA0_ : $@convention(thin) () -> @owned String // user: %44
%44 = apply %43() : $@convention(thin) () -> @owned String // user: %48
// function_ref default argument 2 of print(_:separator:terminator:)
%45 = function_ref @$Ss5print_9separator10terminatoryypd_S2StFfA1_ : $@convention(thin) () -> @owned String // user: %46
%46 = apply %45() : $@convention(thin) () -> @owned String // user: %48
// function_ref print(_:separator:terminator:)
%47 = function_ref @$Ss5print_9separator10terminatoryypd_S2StF : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> () // user: %48
%48 = apply %47(%36, %44, %46) : $@convention(thin) (@owned Array<Any>, @owned String, @owned String) -> ()
strong_retain %9 : $TensorHandle<Float> // id: %49
strong_retain %9 : $TensorHandle<Float> // id: %50
strong_retain %9 : $TensorHandle<Float> // id: %51
strong_retain %9 : $TensorHandle<Float> // id: %52
%53 = builtin "__tfop_Add,$in,$in"(%9 : $TensorHandle<Float>, %9 : $TensorHandle<Float>) : $TensorHandle<Float> // users: %71, %76, %73, %58, %54
%54 = struct $Tensor<Float> (%53 : $TensorHandle<Float>) // user: %85
strong_release %9 : $TensorHandle<Float> // id: %55
strong_release %9 : $TensorHandle<Float> // id: %56
strong_release %9 : $TensorHandle<Float> // id: %57
strong_retain %53 : $TensorHandle<Float> // id: %58
// function_ref implicit closure #1 in Tensor.array.getter
%59 = function_ref @$S10TensorFlow0A0V5arrayAA11ShapedArrayVyxGvgSSyXKfu_ : $@convention(thin) () -> @owned String // user: %60
%60 = convert_function %59 : $@convention(thin) () -> @owned String to $@convention(thin) @noescape () -> @owned String // user: %61
%61 = thin_to_thick_function %60 : $@convention(thin) @noescape () -> @owned String to $@noescape @callee_guaranteed () -> @owned String // user: %69
%62 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/TensorFlow/Tensor.swift" // user: %64
%63 = integer_literal $Builtin.Word, 93 // user: %65
%64 = builtin "ptrtoint_Word"(%62 : $Builtin.RawPointer) : $Builtin.Word // user: %65
%65 = struct $StaticString (%64 : $Builtin.Word, %63 : $Builtin.Word, %28 : $Builtin.Int8) // user: %69
%66 = integer_literal $Builtin.Int64, 765 // user: %67
%67 = struct $UInt (%66 : $Builtin.Int64) // user: %69
// function_ref debugLog(_:file:line:)
%68 = function_ref @$S10TensorFlow8debugLog_4file4lineySSyXK_s12StaticStringVSutF : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> () // user: %69
%69 = apply %68(%61, %65, %67) : $@convention(thin) (@noescape @callee_guaranteed () -> @owned String, StaticString, UInt) -> ()
// function_ref __tf_receive
%70 = function_ref @__tf_receive : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // user: %71
%71 = apply %70<Float>(%53) : $@convention(thin) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@owned TensorHandle<τ_0_0>) -> @owned TensorHandle<τ_0_0> // users: %75, %74
// function_ref TensorHandle.makeHostCopy()
%72 = function_ref @$S10TensorFlow0A6HandleC12makeHostCopyAA11ShapedArrayVyxGyF : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // user: %74
strong_retain %53 : $TensorHandle<Float> // id: %73
%74 = apply %72<Float>(%71) : $@convention(method) <τ_0_0 where τ_0_0 : AccelerableByTensorFlow> (@guaranteed TensorHandle<τ_0_0>) -> @owned ShapedArray<τ_0_0> // users: %79, %77
strong_release %71 : $TensorHandle<Float> // id: %75
strong_release %53 : $TensorHandle<Float> // id: %76
%77 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.buffer // user: %78
strong_release %77 : $TensorBuffer<Float> // id: %78
%79 = struct_extract %74 : $ShapedArray<Float>, #ShapedArray.shape // user: %80
%80 = struct_extract %79 : $Array<Int>, #Array._buffer // user: %81
%81 = struct_extract %80 : $_ContiguousArrayBuffer<Int>, #_ContiguousArrayBuffer._storage // user: %82
strong_release %81 : $_ContiguousArrayStorageBase // id: %82
%83 = integer_literal $Builtin.Int32, 0 // user: %84
%84 = struct $Int32 (%83 : $Builtin.Int32) // user: %86
store %54 to %34 : $*Tensor<Float> // id: %85
return %84 : $Int32 // id: %86
bb2: // Preds: bb0
%87 = integer_literal $Builtin.Int1, -1 // user: %92
%88 = init_existential_metatype %29 : $@thick UInt8.Type, $@thick Any.Type // user: %91
%89 = metatype $@thick UInt16.Type // user: %90
%90 = init_existential_metatype %89 : $@thick UInt16.Type, $@thick Any.Type // user: %91
%91 = builtin "is_same_metatype"(%88 : $@thick Any.Type, %90 : $@thick Any.Type) : $Builtin.Int1 // user: %92
%92 = builtin "int_expect_Int1"(%91 : $Builtin.Int1, %87 : $Builtin.Int1) : $Builtin.Int1 // user: %93
cond_br %92, bb1, bb3 // id: %93
bb3: // Preds: bb2
%94 = string_literal utf8 "" // user: %96
%95 = integer_literal $Builtin.Word, 0 // user: %97
%96 = builtin "ptrtoint_Word"(%94 : $Builtin.RawPointer) : $Builtin.Word // user: %97
%97 = struct $StaticString (%96 : $Builtin.Word, %95 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
%98 = string_literal utf8 "/usr/local/google/home/hongm/git/oss/swift-source/swift/stdlib/public/core/UnmanagedString.swift" // user: %100
%99 = integer_literal $Builtin.Word, 96 // user: %101
%100 = builtin "ptrtoint_Word"(%98 : $Builtin.RawPointer) : $Builtin.Word // user: %101
%101 = struct $StaticString (%100 : $Builtin.Word, %99 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
%102 = integer_literal $Builtin.Int64, 73 // user: %103
%103 = struct $UInt (%102 : $Builtin.Int64) // user: %111
%104 = string_literal utf8 "Fatal error" // user: %106
%105 = integer_literal $Builtin.Word, 11 // user: %107
%106 = builtin "ptrtoint_Word"(%104 : $Builtin.RawPointer) : $Builtin.Word // user: %107
%107 = struct $StaticString (%106 : $Builtin.Word, %105 : $Builtin.Word, %28 : $Builtin.Int8) // user: %111
%108 = integer_literal $Builtin.Int32, 0 // user: %109
%109 = struct $UInt32 (%108 : $Builtin.Int32) // user: %111
// function_ref _fatalErrorMessage(_:_:file:line:flags:)
%110 = function_ref @$Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never // user: %111
%111 = apply %110(%107, %97, %101, %103, %109) : $@convention(thin) (StaticString, StaticString, StaticString, UInt, UInt32) -> Never
unreachable // id: %112
} // end sil function 'main'
> BTW, the deabstraction pass produces some SIL that's surprising and redundant to me. e.g. Here's the end of bb0:

Right, top-level code plus print (which will then do `Any` coercion) will emit all kinds of weirdness.
The solution depends on how we converge on the technical direction of the programming model:
- Sync implicitly on side-effecting ops in debug mode, and insert a trivial send/receive control dependency into the graph. The benefit is that users get exactly the same behavior as eager execution. Since Swift's semantics do not prevent code motion around a side-effecting operation, in release mode the compiler will still generate code with today's fully async behavior.
- Treat `Tensor` and graph-extractable sub-programs as async, and tell users "get used to it if you don't like it". The benefit is that we have a single execution model that always has the same async behavior. The big downside is that users will almost certainly be surprised during debugging, because they are not (and probably should not be) aware of how a graph is formed (especially in the "print done in between two loops" example shown at the top).
- Allow either implicit sync or implicit async, while letting the user opt in to the opposite behavior through an API call with a trailing closure.
I'd personally prefer implicit sync with opt-in async, because no matter what the semantics or compiler execution-order guarantees are, users of eager-style ML frameworks will assume that each line executes after the previous one.
IIRC, the current behavior is that sends do not block the rest of the tensor execution. This is ideal, and there probably won't be a need for them to block device computation.

If there's a side-effecting operation without a data dependency (like a `print("hi")`), we'll need to decide whether to make it execute after the last tensor statement above the `print("hi")`. As with the current sends, it shouldn't block the rest of the tensor computation.
I think there comes a time when abstraction and hiding what's happening under the hood is not a good idea. We should view operations on Tensors as asynchronous, and use data dependencies or barriers to synchronise as necessary. It's not that we're treating them as asynchronous and telling people to get used to it: they *are* asynchronous, and we should be aware of that (even if we don't need to worry about it in most cases). Even in eager execution mode, we're really waiting for each individual asynchronous operation to complete (assuming we're using a GPU/TPU/XPU and not just inlining the code on the CPU). But to keep the program synced when necessary, we would need some form of wait/callback so we know operations have completed:
```swift
import TensorFlow

var f: Float = 1.0
var t: Tensor<Float> = 1.0

f += f  // addition
t += t  // request to perform addition
t.wait()  // block until pending operations on t have completed
print("All TF operations complete")
```
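The wait() idea above can be sketched in a few lines of Python (`AsyncTensor` and `wait` are the commenter's hypothetical API, not anything that exists): operations only enqueue work on a single ordered worker, and `wait()` blocks until everything enqueued so far has run.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncTensor:
    # One worker => operations complete in the order they were requested.
    _pool = ThreadPoolExecutor(max_workers=1)

    def __init__(self, value):
        self._future = self._pool.submit(lambda: float(value))

    def __iadd__(self, other):
        prev, prev_other = self._future, other._future
        # Chain the addition behind both operands' pending work.
        self._future = self._pool.submit(
            lambda: prev.result() + prev_other.result())
        return self

    def wait(self):
        """Block until every operation requested on this handle is done."""
        return self._future.result()

t = AsyncTensor(1.0)
t += t                  # request to perform addition; returns immediately
assert t.wait() == 2.0  # barrier: all pending ops on t are now complete
print("All TF operations complete")
```

The key design point mirrors the comment: `+=` never blocks the caller, and the explicit barrier is the only place the host waits.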
Closed. Further discussions will happen on the mailing list.