
Comments (7)

zakajd commented on September 13, 2024

Thanks!
I ended up doing something similar, but used tf.random.uniform and implemented stateless random operations similar to tf.image.stateless_random_contrast.
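Roughly, a minimal sketch of that pattern (stateless_random_shift is a made-up illustrative op, not a real TF API; it mirrors tf.image.stateless_random_contrast by drawing its parameter with tf.random.stateless_uniform, the stateless counterpart of tf.random.uniform):

import tensorflow as tf

def stateless_random_shift(image, max_delta, seed):
  # Hypothetical example op, modeled on tf.image.stateless_random_contrast:
  # the parameter comes from a stateless RNG, so the same (image, seed)
  # pair always produces the same output, regardless of parallelism.
  delta = tf.random.stateless_uniform(
      [], seed=seed, minval=-max_delta, maxval=max_delta)
  return tf.image.adjust_brightness(image, delta)

# Same seed gives the same result, safe under num_parallel_calls > 1.
image = tf.zeros([4, 4, 3])
out = stateless_random_shift(image, max_delta=0.1, seed=[7, 42])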


duncanriach commented on September 13, 2024

Hi Jamil (@zakajd),

Thank you for the appreciation. Which method of tf.data.Dataset are you calling with the num_parallel_calls parameter?

Duncan


zakajd commented on September 13, 2024

Currently, the train and val datasets use the following methods:

train_dataset = (
    train_dataset
    .cache()
    .map(read_tfrecord, num_parallel_calls=AUTOTUNE)
    .shuffle(512)
    .map(
        lambda o, e: (resize_and_rescale(o), resize_and_rescale(e)),
        num_parallel_calls=AUTOTUNE
    )
    .batch(cfg.training.batch_size, drop_remainder=True)
    .map(
        lambda o, e: augment(o, e, training=True), num_parallel_calls=AUTOTUNE
    )
    .prefetch(AUTOTUNE)
)
val_dataset = (
    val_dataset
    .cache()
    .map(read_tfrecord, num_parallel_calls=AUTOTUNE)
    .map(
        lambda o, e: (resize_and_rescale(o), resize_and_rescale(e)),
        num_parallel_calls=AUTOTUNE
    )
    .batch(cfg.training.batch_size, drop_remainder=False)
    .map(lambda o, e: augment(o, e, training=False), num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)

When the AUTOTUNE parameter is set to 1, results are reproducible between different runs on GPU and on CPU (though the GPU and CPU results differ slightly from each other). When setting AUTOTUNE=4 or AUTOTUNE=tf.data.AUTOTUNE, each run results in a different model.


duncanriach commented on September 13, 2024

Hi @zakajd, I suspect that my original guidance on this may have come from data-augmentation pipelines that did not use tf.data.Dataset. One way to solve this problem with tf.data.Dataset is to serialize the generation of pseudorandom parameters and then pass those parameters into the computationally expensive data-augmentation stage, which can then be arbitrarily parallelized. Here is some example code:

import tensorflow as tf
import numpy as np

np.set_printoptions(precision=2, floatmode='fixed')

def augment_random(x):
  random = np.float32(np.random.uniform())
  # The addition here represents a computationally-expensive set of operations
  return np.float32(x) + random

def random_param():
  return np.float32(np.random.uniform())

def augment(x, y):
  # The addition here represents a computationally-expensive set of operations
  return tf.add(tf.cast(x, tf.float32), y)

def nondeterministic_pipeline():
  np.random.seed(123)
  dataset = tf.data.Dataset.range(1, 6)
  dataset = dataset.map(
      # From tf.data.Dataset::map documentation: Note that use of
      # tf.numpy_function or tf.py_function in general precludes the possibility
      # of executing user-defined transformations in parallel (because of the
      # Python GIL).
      lambda x: tf.numpy_function(augment_random, inp=[x], Tout=tf.float32),
      num_parallel_calls=5)
  return np.array(list(dataset.as_numpy_iterator()))

def deterministic_pipeline():
  np.random.seed(123)
  dataset = tf.data.Dataset.range(1, 6)
  dataset = dataset.map(
      # From tf.data.Dataset::map documentation: Note that use of
      # tf.numpy_function or tf.py_function in general precludes the possibility
      # of executing user-defined transformations in parallel (because of the
      # Python GIL).
      lambda x: (x, tf.numpy_function(random_param, inp=[], Tout=tf.float32)),
      num_parallel_calls=1)
  dataset = dataset.map(lambda x, y: augment(x, y), num_parallel_calls=5)
  return np.array(list(dataset.as_numpy_iterator()))

result1 = nondeterministic_pipeline()
result2 = nondeterministic_pipeline()
result3 = deterministic_pipeline()
result4 = deterministic_pipeline()

print("\nGenerate random parameters in parallel:")
print("run 1: ",result1)
print("run 2: ",result2)

print("\nGenerate random parameters in series:")
print("run 1: ",result3)
print("run 2: ",result4)

# Generate random parameters in parallel:
# run 1:  [1.55 2.72 3.29 4.70 5.23]
# run 2:  [1.72 2.23 3.55 4.29 5.70]

# Generate random parameters in series:
# run 1:  [1.70 2.29 3.23 4.55 5.72]
# run 2:  [1.70 2.29 3.23 4.55 5.72]


duncanriach commented on September 13, 2024

I have also updated the documentation for deterministic data-loader parallelism to cover this topic.


duncanriach commented on September 13, 2024

I just discovered that the solution I suggested above has been suggested before, in github/tensorflow/tensorflow issue 13932.

Also, I'm now closing this issue.


duncanriach commented on September 13, 2024

The relatively new stateless random image ops, such as tf.image.stateless_sample_distorted_bounding_box, can also be used with this approach: a per-example seed is generated in a single-worker stage and then used with these ops in later, parallelized stages.
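For example, here is a minimal sketch of that pattern, assuming TF 2.4+ for the stateless image ops (make_seed, attach_seed, and augment_stateless are illustrative names, not APIs from this repo):

import tensorflow as tf
import numpy as np

def make_seed():
  # Stateful seed generation: deterministic only when run serially.
  return np.random.randint(0, 2**31 - 1, size=[2], dtype=np.int64)

def attach_seed(image):
  seed = tf.numpy_function(make_seed, inp=[], Tout=tf.int64)
  # tf.numpy_function loses static shape info; stateless ops need a [2] seed.
  seed = tf.ensure_shape(seed, [2])
  return image, seed

def augment_stateless(image, seed):
  # Stateless ops: the same (image, seed) pair always produces the same
  # result, so this stage can be parallelized arbitrarily.
  image = tf.image.stateless_random_flip_left_right(image, seed=seed)
  image = tf.image.stateless_random_brightness(image, max_delta=0.1, seed=seed)
  return image

np.random.seed(123)
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([8, 32, 32, 3]))
# Stage 1: serial, single-worker seed-per-example generation.
dataset = dataset.map(attach_seed, num_parallel_calls=1)
# Stage 2: parallel stateless augmentation.
dataset = dataset.map(augment_stateless, num_parallel_calls=tf.data.AUTOTUNE)

Reusing one seed for both ops keeps the sketch short; in practice you may want to derive a distinct seed per op, e.g. with tf.random.experimental.stateless_split.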

