How to get the currently available GPUs in TensorFlow?
I plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have zero, one, or more GPUs, and I want to run my TensorFlow graph on GPUs on as many machines as possible.
I found that when running tf.Session(), TensorFlow gives information about the GPU in log messages like the ones below:
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is: how do I get information about the currently available GPUs from TensorFlow? I can get the loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I might also restrict the GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I am not looking for a way of getting GPU information from the OS kernel.
In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?
There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards-incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
    from tensorflow.python.client import device_lib

    def get_available_gpus():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos if x.device_type == 'GPU']
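Because the function above is undocumented, it may be worth knowing that newer releases (TensorFlow 2.1 and later, as far as I know) expose a documented call, tf.config.list_physical_devices, that serves a similar purpose. The sketch below assumes such a version is installed; the helper name list_physical_gpu_names is made up for illustration.

```python
def list_physical_gpu_names():
    # Hypothetical helper name; assumes TensorFlow 2.1 or newer is installed.
    # The import is local so that loading this module does not import TensorFlow.
    import tensorflow as tf

    # Documented call; each returned PhysicalDevice has .name and .device_type.
    return [d.name for d in tf.config.list_physical_devices('GPU')]
```

As far as I can tell, the names returned this way look like /physical_device:GPU:0 rather than /gpu:0, so they identify hardware and are not necessarily the strings you would pass to tf.device().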
Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small GPU memory fraction, or with allow_growth=True, to prevent all of the memory being allocated. See this question for more details.
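The workaround in the note could be sketched roughly as follows. This assumes the TensorFlow 1.x names tf.ConfigProto and tf.Session (on 2.x the same names should be reachable through tf.compat.v1), and the helper names gpu_names and get_available_gpus_with_growth are made up for illustration. The filtering step is split out so it can be checked without a GPU.

```python
def gpu_names(devices):
    """Return the names of the entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == 'GPU']


def get_available_gpus_with_growth():
    # Hypothetical helper name. Imports are local so that importing this
    # module does not itself initialize TensorFlow.
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    # Assumption: TensorFlow 1.x names. On 2.x, use tf.compat.v1.ConfigProto
    # and tf.compat.v1.Session instead.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory on demand

    # Create the session first so the GPUs are initialized with this config,
    # then list the devices while the session is open.
    with tf.Session(config=config):
        return gpu_names(device_lib.list_local_devices())
```

Whether this fully avoids the up-front allocation may depend on the TensorFlow version, so it is worth confirming with nvidia-smi on your own setup.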