I am currently playing around with Metaflow and having problems using it in combination with TensorFlow. I am trying to define, train and evaluate a model defined with the Keras API in separate steps. The program crashes at the end of the step that defines the model, because Metaflow tries to store the model as an artifact using pickle, which TensorFlow models apparently do not support. The error message is "TypeError: can't pickle _thread._local objects".
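The error itself can be reproduced without TensorFlow. Assuming, as the message suggests, that the model keeps a `threading.local` somewhere inside it, this minimal sketch pickles a hypothetical stand-in class (`HoldsThreadLocal`) with `protocol=2`, the same `pickle.dumps` call that appears in the traceback, and prints the resulting error. The exact wording varies by Python version (newer releases say "cannot pickle '_thread._local' object").

```python
import pickle
import threading


class HoldsThreadLocal:
    """Hypothetical stand-in for an object that keeps a threading.local internally."""

    def __init__(self):
        self.local_state = threading.local()


def try_pickle(obj):
    """Return None if obj can be pickled with protocol 2, else the TypeError message."""
    try:
        pickle.dumps(obj, protocol=2)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(HoldsThreadLocal())
print(error)
```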
I do not think this is necessarily something that can be fixed in Metaflow, since TensorFlow models do not support pickling in general. However, I was hoping that someone knows a way to use TensorFlow models within Metaflow and could share that knowledge.
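One general pattern that might apply, sketched here in plain Python and not verified against TensorFlow or Metaflow: pickle only plain data and rebuild the unpicklable part afterwards. `PicklableWrapper` is a hypothetical class, and its `threading.local` attribute stands in for the model's runtime state. For a Keras model, the analogous plain data would presumably be the architecture from `model.to_json()` and the weight arrays from `model.get_weights()`, rebuilt in a later step with `tf.keras.models.model_from_json()` and `model.set_weights()`; the `compile()` call would have to be repeated there, since it is not part of that data.

```python
import pickle
import threading


class PicklableWrapper:
    """Hypothetical wrapper that pickles only the data needed to rebuild an object."""

    def __init__(self, config, weights):
        self.config = config                # plain data describing the object
        self.weights = weights              # plain data holding its learned state
        self._runtime = threading.local()   # unpicklable runtime state

    def __getstate__(self):
        # Copy the instance dict, leaving out the unpicklable entry.
        state = self.__dict__.copy()
        del state["_runtime"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then recreate the runtime state from scratch.
        self.__dict__.update(state)
        self._runtime = threading.local()


original = PicklableWrapper(config={"units": [4, 1]}, weights=[[0.1, 0.2], [0.3]])
restored = pickle.loads(pickle.dumps(original, protocol=2))
print(restored.config, restored.weights)
```

The design choice is to make the unpicklable state derivable rather than stored, so whatever persists the object only ever sees lists, dicts and strings.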
If it helps, here is some example code and the traceback produced when running it (using TensorFlow 2.0.0):
import tensorflow as tf
from metaflow import FlowSpec, step


class ExampleFlow(FlowSpec):
    """Example of a flow using a tensorflow.keras model"""

    @step
    def start(self):
        """Defines a model."""
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(4, input_shape=(4, ), activation='relu'),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        self.model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=['accuracy']
        )
        self.next(self.end)

    @step
    def end(self):
        """Uses the model defined in the prior step."""
        self.model.summary()


if __name__ == "__main__":
    ExampleFlow()
Metaflow 2.0.0 executing ExampleFlow for user:mfr
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
2019-12-11 19:28:41.009 Workflow starting (run-id 1576088921002498):
2019-12-11 19:28:41.020 [1576088921002498/start/1 (pid 7847)] Task is starting.
2019-12-11 19:28:43.109 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.109000: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-11 19:28:43.136 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.135689: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2096165000 Hz
2019-12-11 19:28:43.138 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.137785: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5638147fe6d0 executing computations on platform Host. Devices:
2019-12-11 19:28:43.263 [1576088921002498/start/1 (pid 7847)] 2019-12-11 19:28:43.137882: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-11 19:28:43.263 [1576088921002498/start/1 (pid 7847)] Internal error
2019-12-11 19:28:43.264 [1576088921002498/start/1 (pid 7847)] Traceback (most recent call last):
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/cli.py", line 853, in main
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] start(auto_envvar_prefix='METAFLOW', obj=state)
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 764, in __call__
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] return self.main(*args, **kwargs)
2019-12-11 19:28:43.265 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 717, in main
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] rv = self.invoke(ctx)
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] return _process_result(sub_ctx.command.invoke(sub_ctx))
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 956, in invoke
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] return ctx.invoke(self.callback, **ctx.params)
2019-12-11 19:28:43.266 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/core.py", line 555, in invoke
2019-12-11 19:28:43.267 [1576088921002498/start/1 (pid 7847)] return callback(*args, **kwargs)
2019-12-11 19:28:43.749 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/click/decorators.py", line 27, in new_func
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] return f(get_current_context().obj, *args, **kwargs)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/cli.py", line 430, in step
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] max_user_code_retries)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/task.py", line 447, in run_step
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] output.persist(self.flow)
2019-12-11 19:28:43.750 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 50, in method
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] return f(self, *args, **kwargs)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 507, in persist
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] sha, size, encoding = self._save_object(obj, var, force_v4)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 431, in _save_object
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] transformable_obj.transform(lambda x: pickle.dumps(x, protocol=2))
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 68, in transform
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] temp = transformer(self._object)
2019-12-11 19:28:43.751 [1576088921002498/start/1 (pid 7847)] File "/home/mfr/anaconda3/envs/data-science/lib/python3.7/site-packages/metaflow/datastore/datastore.py", line 431, in <lambda>
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)] transformable_obj.transform(lambda x: pickle.dumps(x, protocol=2))
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)] TypeError: can't pickle _thread._local objects
2019-12-11 19:28:43.752 [1576088921002498/start/1 (pid 7847)]
2019-12-11 19:28:43.754 [1576088921002498/start/1 (pid 7847)] Task failed.
2019-12-11 19:28:43.754 Workflow failed.
Step failure:
Step start (task-id 1) failed.
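To narrow down which values would break a step before running the whole flow, a small diagnostic can try the same `pickle.dumps(x, protocol=2)` call that the traceback above shows Metaflow using. `find_unpicklable` and `example_state` are hypothetical names; the sketch assumes you collect the values you intend to assign to `self` into a plain dict yourself, since it does not inspect a `FlowSpec` instance.

```python
import pickle
import threading


def find_unpicklable(attributes):
    """Return the names of entries that pickle.dumps(..., protocol=2) rejects."""
    failures = []
    for name, value in attributes.items():
        try:
            pickle.dumps(value, protocol=2)
        except Exception:  # pickling may raise TypeError, AttributeError, PicklingError, ...
            failures.append(name)
    return failures


# Hypothetical step state: plain data pickles fine, a thread-local does not.
example_state = {
    "model_json": '{"class_name": "Sequential"}',
    "weights": [[0.1, 0.2], [0.3]],
    "session_cache": threading.local(),
}
print(find_unpicklable(example_state))
```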