How to prevent tensorflow from allocating the totality of a GPU memory?

I work in an environment in which computational resources are shared, i.e., we have a few server machines equipped with a few Nvidia Titan X GPUs each.

For small to moderate size models, the 12GB of the Titan X are usually enough for 2-3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the Titan X, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having several users running things on the GPUs at once.

The problem with TensorFlow is that, by default, it allocates the full amount of available memory on the GPU when it is launched. Even for a small 2-layer Neural Network, I see that the 12 GB of the Titan X are used up.

Is there a way to make TensorFlow only allocate, say, 4GB of GPU memory, if one knows that that amount is enough for a given model?

You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:

    # Assume that you have 12GB of GPU memory and want to allocate ~4GB:
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

The per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that will be used by the process on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set this on a per-GPU basis.