Using TensorBoard with Keras, selecting a specific GPU and its memory, and forcing CPU-only execution

1. Forcing CPU-only execution:

import os
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Note: setting os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" may change the default GPU numbering you would get without it (devices are ordered by PCI bus ID instead of by compute capability).
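To confirm that no GPU is visible, you can list TensorFlow's local devices; a minimal check, assuming the TF 1.x API used throughout this post:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # must be set before TensorFlow initializes

from tensorflow.python.client import device_lib
# With no visible CUDA devices, only CPU entries such as "/device:CPU:0" should appear
print([d.name for d in device_lib.list_local_devices()])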

2. Limiting GPU memory usage, and basic TensorBoard usage

Addendum: setting the GPU memory allocation explicitly, or letting it grow adaptively

When I started with TensorFlow I found that, unlike Theano, it grabs almost all of the GPU's memory by default as soon as a Session is opened. If several people share the same machine and GPU, it becomes first come, first served, and later programs crash. According to the documentation there are two ways to control memory usage:

The first is to preallocate a fixed fraction of the memory:

tf_config = tensorflow.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.5  # allocate 50%
session = tensorflow.Session(config=tf_config)

The other is adaptive: take only as much memory as needed:

tf_config = tensorflow.ConfigProto()
tf_config.gpu_options.allow_growth = True
session = tensorflow.Session(config=tf_config)
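For Keras to actually run on such a session, register it with the TensorFlow backend; a minimal sketch, assuming the standalone Keras and TF 1.x APIs used elsewhere in this post:

import tensorflow as tf
import keras.backend.tensorflow_backend as KTF

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand instead of grabbing it all
KTF.set_session(tf.Session(config=config))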

Keras 2.0 merged a number of contributor suggestions, so driving TensorBoard from Keras to follow the training process is now very convenient.

Straight to an example. (Note: the TensorBoard callback seems to slow training down quite a bit. Alternatively, you can keep the history object returned by model.fit and plot it yourself in a few lines of code; see the sketch after the example.)

 
# coding: utf-8
import os

import numpy as np
import tensorflow as tf
import keras.callbacks
import keras.backend.tensorflow_backend as KTF
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils

######################################
# Set the GPU memory usage by fraction
######################################
def get_session(gpu_fraction=0.3):
    """
    Allocate a specific fraction of GPU memory.
    E.g. with 6GB of GPU memory, gpu_fraction=0.3 allocates ~2GB.
    """
    num_threads = os.environ.get('OMP_NUM_THREADS')
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction)
    if num_threads:
        return tf.Session(config=tf.ConfigProto(
            gpu_options=gpu_options, intra_op_parallelism_threads=int(num_threads)))
    else:
        return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

KTF.set_session(get_session(0.6))  # use 60% of total GPU memory
os.system("nvidia-smi")            # execute the command (a string) in a subshell
input("Press Enter to continue...")

batch_size = 128
nb_classes = 10
nb_epoch = 10
nb_data = 28 * 28
log_filepath = '/tmp/keras_log'

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape: each sample becomes a flat row of 28*28 pixels
print(X_train.shape)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1] * X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1] * X_test.shape[2])

# rescale to [0, 1]
X_train = X_train.astype(np.float32)
X_train /= 255
X_test = X_test.astype(np.float32)
X_test /= 255

# convert class vectors to binary class matrices (one-hot vectors)
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(nb_data,), kernel_initializer='normal', name='dense1'))
model.add(Activation('relu', name='relu1'))
model.add(Dropout(0.2, name='dropout1'))
model.add(Dense(512, kernel_initializer='normal', name='dense2'))
model.add(Activation('relu', name='relu2'))
model.add(Dropout(0.2, name='dropout2'))
model.add(Dense(10, kernel_initializer='normal', name='dense3'))
model.add(Activation('softmax', name='softmax1'))
model.summary()

model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.001),
              metrics=['accuracy'])

# log_dir sets where the logs are stored; write_images saves the weights as
# images in TensorBoard; histogram_freq=1 computes histograms of the weights
# and of each layer's output once per epoch
tb_cb = keras.callbacks.TensorBoard(log_dir=log_filepath, write_images=True,
                                    histogram_freq=1)
cbks = [tb_cb]

history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
                    verbose=1, callbacks=cbks, validation_data=(X_test, Y_test))

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
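As noted above, if you skip the TensorBoard callback you can plot the training curves yourself from the history object; a minimal sketch, assuming matplotlib is installed and reusing model and the data arrays from the example above:

import matplotlib.pyplot as plt

# history.history is a dict of per-epoch lists, e.g. 'loss', 'val_loss', 'acc', 'val_acc'
history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
                    verbose=1, validation_data=(X_test, Y_test))

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()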

You can give each layer an explicit name yourself, or let Keras generate one according to its own convention. The auto-naming rule lives in the Layer class:

 
name = kwargs.get('name')
if not name:
    prefix = self.__class__.__name__
    name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
self.name = name
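In other words, an unnamed layer gets its snake_cased class name plus a per-class counter; a quick illustration:

from keras.layers import Dense

d1 = Dense(32)  # no explicit name given
d2 = Dense(32)
print(d1.name, d2.name)  # e.g. 'dense_1 dense_2'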

As the source of the TensorBoard class in Keras's callbacks module shows, Keras by default sends every layer's weights and biases, plus the distribution and histogram of every layer's output, to TensorBoard, making it convenient to watch how the network behaves from the browser. The implementation:

 
def set_model(self, model):
    self.model = model
    self.sess = K.get_session()
    if self.histogram_freq and self.merged is None:
        for layer in self.model.layers:
            for weight in layer.weights:
                tf.summary.histogram(weight.name, weight)
                if self.write_images:
                    w_img = tf.squeeze(weight)
                    shape = w_img.get_shape()
                    if len(shape) > 1 and shape[0] > shape[1]:
                        w_img = tf.transpose(w_img)
                    if len(shape) == 1:
                        w_img = tf.expand_dims(w_img, 0)
                    w_img = tf.expand_dims(tf.expand_dims(w_img, 0), -1)
                    tf.summary.image(weight.name, w_img)
            if hasattr(layer, 'output'):
                tf.summary.histogram('{}_out'.format(layer.name),
                                     layer.output)
    self.merged = tf.summary.merge_all()

Of course you can also restrict this to particular layers via the TensorBoard callback's parameters:

embeddings_freq: frequency (in epochs) at which selected embedding
    layers will be saved.
embeddings_layer_names: a list of names of layers to keep eye on. If
    None or empty list all the embedding layer will be watched.
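For example, combining these with the histogram options; a hedged sketch (the layer name 'dense1' refers to the model built in the example above):

from keras.callbacks import TensorBoard

tb_cb = TensorBoard(log_dir=log_filepath,
                    histogram_freq=1,                   # weight/output histograms every epoch
                    write_images=True,                  # weights as images
                    embeddings_freq=1,                  # save the selected layers every epoch
                    embeddings_layer_names=['dense1'])  # watch only this layer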

Now run the example from the beginning, and in a terminal launch:

tensorboard --logdir=/tmp/keras_log

Open the address it prints in a browser, and you can freely explore the graph, distributions, histograms, and the loss, acc, and other curves in the scalars tab.

The following is excerpted from:

TensorBoard will automatically include all runs logged within the sub-directories of the specified log_dir; for example, if you logged another run using:

(log_dir = "logs/run_b")

Then the TensorBoard visualization would look like this:

You can use the unique_log_dir function if you want to record every training run in its own directory:

(log_dir = unique_log_dir())

Once again, note that it's not required to record every training run in its own directory; using the default "logs" directory will work just fine, you'll just only be able to visualize the most recent run in TensorBoard.

Note that by default the scalars tab only records the loss on the training and validation sets. If you want to record and display other metrics, add them to metrics in model.compile, for example:

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])  # visualize mae and acc as well

3. Using TensorBoard when training with Keras's train_on_batch

Reposted from ()

import numpy as np
import tensorflow as tf
from keras.callbacks import TensorBoard
from keras.layers import Input, Dense
from keras.models import Model

def write_log(callback, names, logs, batch_no):
    # Write each (name, value) pair as a scalar summary at step batch_no
    for name, value in zip(names, logs):
        summary = tf.Summary()
        summary_value = summary.value.add()
        summary_value.simple_value = value
        summary_value.tag = name
        callback.writer.add_summary(summary, batch_no)
        callback.writer.flush()

net_in = Input(shape=(3,))
net_out = Dense(1)(net_in)
model = Model(net_in, net_out)
model.compile(loss='mse', optimizer='sgd', metrics=['mae'])

log_path = './graph'
callback = TensorBoard(log_path)
callback.set_model(model)

train_names = ['train_loss', 'train_mae']
val_names = ['val_loss', 'val_mae']
for batch_no in range(100):
    X_train, Y_train = np.random.rand(32, 3), np.random.rand(32, 1)
    logs = model.train_on_batch(X_train, Y_train)
    write_log(callback, train_names, logs, batch_no)

    if batch_no % 10 == 0:
        X_val, Y_val = np.random.rand(32, 3), np.random.rand(32, 1)
        # evaluate only; train_on_batch here would also update the weights
        logs = model.test_on_batch(X_val, Y_val)
        write_log(callback, val_names, logs, batch_no // 10)
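After (or during) training, launch TensorBoard on the same directory with tensorboard --logdir=./graph to see the per-batch train_loss and train_mae curves alongside the periodic validation points.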

 
 
4. Recording the loss of every batch with TensorBoard
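The reposted article's code is not reproduced here, but the idea can be sketched with a custom callback that writes a summary in on_batch_end; a minimal sketch, assuming the same TF 1.x summary API as above (the class name is hypothetical):

import tensorflow as tf
from keras.callbacks import Callback

class BatchLossToTensorBoard(Callback):
    # Hypothetical helper: writes the loss of every batch as a TensorBoard scalar
    def __init__(self, log_dir='./batch_logs'):
        super(BatchLossToTensorBoard, self).__init__()
        self.writer = tf.summary.FileWriter(log_dir)
        self.step = 0

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        summary = tf.Summary()
        summary.value.add(tag='batch_loss', simple_value=float(logs.get('loss', 0)))
        self.writer.add_summary(summary, self.step)
        self.writer.flush()
        self.step += 1  # global step across epochs, not just within one epoch

    def on_train_end(self, logs=None):
        self.writer.close()

# usage: model.fit(X_train, Y_train, callbacks=[BatchLossToTensorBoard()])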

Reposted from: https://blog.csdn.net/xiaojiajia007/article/details/72865764
