How could I use batch normalization in TensorFlow?

I would like to use batch normalization in TensorFlow. I found the related C++ source code in core/ops/nn_ops.cc. However, I did not find it documented on tensorflow.org.

Batch normalization has different semantics for MLPs and CNNs (the axes over which the statistics are computed differ), so I am not sure exactly what this op computes.

I did not find a method called MovingMoments either.

Update (July 2016): The easiest way to use batch normalization in TensorFlow is through the higher-level interfaces provided in contrib/layers, tflearn, or slim.
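
For example, with contrib/layers it is a one-liner (a minimal sketch, assuming the contrib API of that era; net and is_training are placeholder names, and the hyperparameter values are illustrative, not a tuned recipe):

    # Inserts a batch-norm layer; is_training selects between batch
    # statistics (training) and the accumulated moving averages (inference).
    net = tf.contrib.layers.batch_norm(net,
                                       decay=0.9,
                                       center=True,   # learn a beta offset
                                       scale=True,    # learn a gamma scale
                                       epsilon=0.001,
                                       is_training=is_training)
    # Note: by default the moving-average update ops are placed in the
    # tf.GraphKeys.UPDATE_OPS collection and must be run with the train op.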

Previous answer, if you want to DIY: The documentation string for this op has improved since the release; see the documentation comment in the master branch instead of the one you found. It clarifies, in particular, that the mean and variance arguments are expected to be the output of tf.nn.moments.
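
At its lowest level the pattern looks like this (a bare sketch; x, beta, gamma, and epsilon are assumed to be defined elsewhere, and the axes are for a 4-D convolutional activation):

    # Compute per-channel batch statistics for a [batch, height, width,
    # depth] tensor, then feed them straight into the normalization op.
    mean, variance = tf.nn.moments(x, [0, 1, 2])
    y = tf.nn.batch_norm_with_global_normalization(
        x, mean, variance, beta, gamma, epsilon,
        scale_after_normalization=True)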

You can see a very simple example of its use in the batch_norm test code. For a more realistic example, I've included below the helper class and usage notes that I scribbled up for my own use (no warranty provided!):

    """A helper class for managing batch normalization state.                   

    This class is designed to simplify adding batch normalization               
    (http://arxiv.org/pdf/1502.03167v3.pdf) to your model by                    
    managing the state variables associated with it.                            

    Important use note:  The function get_assigner() returns                    
    an op that must be executed to save the updated state.                      
    A suggested way to do this is to make execution of the                      
    model optimizer force it, e.g., by:                                         

      update_assignments = tf.group(bn1.get_assigner(),                         
                                    bn2.get_assigner())                         
      with tf.control_dependencies([optimizer]):                                
        optimizer = tf.group(update_assignments)                                

    """

    import tensorflow as tf


    class ConvolutionalBatchNormalizer(object):
      """Helper class that groups the normalization logic and variables.        

      Use:                                                                      
          ewma = tf.train.ExponentialMovingAverage(decay=0.99)                  
          bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True)           
          update_assignments = bn.get_assigner()                                
          x = bn.normalize(y, train=training?)                                  
          (the output x will be batch-normalized).                              
      """

      def __init__(self, depth, epsilon, ewma_trainer, scale_after_norm):
        # Moving mean/variance: updated via the EWMA trainer rather than
        # by gradient descent, hence trainable=False.
        self.mean = tf.Variable(tf.constant(0.0, shape=[depth]),
                                trainable=False)
        self.variance = tf.Variable(tf.constant(1.0, shape=[depth]),
                                    trainable=False)
        # Learned offset (beta) and scale (gamma) parameters.
        self.beta = tf.Variable(tf.constant(0.0, shape=[depth]))
        self.gamma = tf.Variable(tf.constant(1.0, shape=[depth]))
        self.ewma_trainer = ewma_trainer
        self.epsilon = epsilon
        self.scale_after_norm = scale_after_norm

      def get_assigner(self):
        """Returns an EWMA apply op that must be invoked after optimization."""
        return self.ewma_trainer.apply([self.mean, self.variance])

      def normalize(self, x, train=True):
        """Returns a batch-normalized version of x."""
        if train:
          # Training: use the statistics of the current mini-batch, and
          # stash them so that get_assigner()'s EWMA op can fold them
          # into the moving averages.
          mean, variance = tf.nn.moments(x, [0, 1, 2])
          assign_mean = self.mean.assign(mean)
          assign_variance = self.variance.assign(variance)
          with tf.control_dependencies([assign_mean, assign_variance]):
            return tf.nn.batch_norm_with_global_normalization(
                x, mean, variance, self.beta, self.gamma,
                self.epsilon, self.scale_after_norm)
        else:
          # Inference: use the moving averages accumulated during
          # training instead of the batch statistics.
          mean = self.ewma_trainer.average(self.mean)
          variance = self.ewma_trainer.average(self.variance)
          local_beta = tf.identity(self.beta)
          local_gamma = tf.identity(self.gamma)
          return tf.nn.batch_norm_with_global_normalization(
              x, mean, variance, local_beta, local_gamma,
              self.epsilon, self.scale_after_norm)
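
To make the docstring's use note concrete, here is how the pieces might be wired together in a training graph (a sketch under the same assumptions; depth, x, loss, and is_training are placeholders for your own model):

    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True)
    update_assignments = bn.get_assigner()

    y = bn.normalize(x, train=is_training)
    # ... build the rest of the model and a loss on top of y ...

    opt = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    # As in the class docstring: make the training step force the EWMA
    # update, so the moving mean/variance stay current.
    with tf.control_dependencies([opt]):
      train_op = tf.group(update_assignments)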

Note that I called it a ConvolutionalBatchNormalizer because it pins tf.nn.moments to compute statistics across axes 0, 1, and 2, whereas for non-convolutional (fully connected) use you might only want axis 0; see the sketch below.
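
For that fully connected case, the only change needed in normalize() is the axes argument (sketch):

    # 2-D [batch, depth] activations: per-feature statistics over the
    # batch axis only.
    mean, variance = tf.nn.moments(x, [0])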

Feedback appreciated if you use it.

From: stackoverflow.com/q/33949786