TensorFlow Introduction

Transfer Learning

Load Data Set

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
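# the dataset name is missing in the source; `name` is a placeholder
# with_info and as_supervised are assumed from how the result is used below
(xTrain, xVal), info = tfds.load(
    name,
    split=['train[:80%]', 'train[80%:]'],
    with_info=True,
    as_supervised=True,
)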

num_examples = info.splits['train'].num_examples
num_classes = info.features['label'].num_classes

Resize Input Images

Different pretrained networks require different input image sizes.

dim = 224

BATCH_SIZE = 32  # assumed value; BATCH_SIZE is not defined in the source

def format_image(image, label):
  # resize to the size the pretrained network expects and scale pixels to [0, 1]
  image = tf.image.resize(image, (dim, dim)) / 255.0
  return image, label

train_batches = xTrain.shuffle(buffer_size=num_examples//4).map(format_image).batch(BATCH_SIZE).prefetch(1)
validation_batches = xVal.cache().map(format_image).batch(BATCH_SIZE).prefetch(1)

Transfer Learning from TensorFlow Hub

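url = ""
# input_shape must match the size produced by format_image
extractor = hub.KerasLayer(url, input_shape=(dim, dim, 3))
# freeze the pretrained weights so they are not updated during training
extractor.trainable = False
model = tf.keras.Sequential([extractor, tf.keras.layers.Dense(num_classes)])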

Save Models

A timestamp is usually included in the file name so that each saved model gets a unique path.

import time
t = time.time()
path = "./model_{}.h5".format(int(t))
model.save(path)  # the .h5 extension selects the HDF5 format

Reload the model.

# custom_objects maps the layer's class name to the class itself
reloaded = tf.keras.models.load_model(path, custom_objects={'KerasLayer': hub.KerasLayer})

Export as SavedModel

t = time.time()
path = "./model_{}".format(int(t))
tf.saved_model.save(model, path)

Reload a SavedModel. Note that the object returned by tf.saved_model.load is not a Keras object.

reload_md = tf.saved_model.load(path)
reload_keras = tf.keras.models.load_model(path, custom_objects={'KerasLayer': hub.KerasLayer})

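To download the model to a local machine, zip the model directory first (the archive name model.zip is an assumption; the source omits it).

!zip -r model.zip {path}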

Time Series


Fixed Partitioning

Split the whole series into training, validation, and test periods in chronological order.
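A minimal sketch of fixed partitioning on a toy series. The helper name fixed_partition and the split points 70 and 85 are assumptions for illustration, not values from these notes.

```python
def fixed_partition(series, train_end, val_end):
    """Split a time-ordered sequence into training, validation, and test periods."""
    # slicing keeps the original order, so later periods never leak into earlier ones
    return series[:train_end], series[train_end:val_end], series[val_end:]

# toy series of 100 time steps (assumed data)
series = list(range(100))
train, val, test = fixed_partition(series, train_end=70, val_end=85)
print(len(train), len(val), len(test))
```

Unlike a random split, the three parts stay contiguous in time.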

Roll-Forward Partitioning

Start with only a small subset as the training set and move it forward step by step (for example, every week or every 10 days) to mimic the real-life process.
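One possible way to sketch roll-forward partitioning in plain Python. The generator name roll_forward_splits, the growing training window, and the sizes used here are assumptions; a sliding window of fixed length would be an equally valid variant.

```python
def roll_forward_splits(series, initial_train, step):
    """Yield (train, validation) pairs; the training period grows by `step` each round."""
    train_end = initial_train
    while train_end + step <= len(series):
        # validate on the `step` points that immediately follow the training period
        yield series[:train_end], series[train_end:train_end + step]
        train_end += step

# toy series of 10 time steps (assumed data)
series = list(range(10))
splits = list(roll_forward_splits(series, initial_train=4, step=2))
for train, val in splits:
    print(train, val)
```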

Time Windows

## drop_remainder discards the last few windows that contain fewer elements
data = tf.data.Dataset.range(10)  # assumed toy input; this line is incomplete in the source
data = data.window(5, shift=1, drop_remainder=True)
data = data.flat_map(lambda win: win.batch(5))
for win in data:
    print(win.numpy())

## use all but the last element as the input and the last element as the label
data = data.map(lambda win: (win[:-1], win[-1:]))
data = data.shuffle(buffer_size=10)
## prefetch allows later elements to be prepared while the current one is being processed
data = data.batch(2).prefetch(1) 
for x, y in data:
    print(x.numpy(), y.numpy())
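To make the windowing easier to follow, here is a plain-Python sketch of what the window, flat_map, and split steps are expected to produce before shuffling and batching. The helper sliding_windows and the toy input of 10 values are assumptions, not part of the tf.data API.

```python
def sliding_windows(values, size, shift=1):
    """Plain-Python analogue of window(size, shift=shift, drop_remainder=True)."""
    # stop early so that every window has exactly `size` elements
    return [values[i:i + size] for i in range(0, len(values) - size + 1, shift)]

windows = sliding_windows(list(range(10)), size=5)
# all but the last element form the input, the last element is the label
pairs = [(w[:-1], w[-1:]) for w in windows]
for x, y in pairs:
    print(x, y)
```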


Tuning the learning rate is tricky for RNNs. If it is too high, the RNN will stop learning; if it is too low, the RNN will converge very slowly.

from tensorflow import keras
lr_schedule = keras.callbacks.LearningRateScheduler(lambda ep: 1e-7 * 10 ** (ep / 20))
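hist = model.fit(train_set, epochs=100, callbacks=[lr_schedule])  # call is truncated in the source; train_set and epochs=100 are assumed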
plt.semilogx(hist.history["lr"], hist.history["loss"])
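The schedule formula can be checked on its own. This small sketch only evaluates the expression from the scheduler; the helper name lr_for_epoch is an assumption.

```python
def lr_for_epoch(epoch):
    """Learning rate at a given epoch: starts at 1e-7 and grows 10x every 20 epochs."""
    return 1e-7 * 10 ** (epoch / 20)

for epoch in range(0, 101, 20):
    print(epoch, lr_for_epoch(epoch))
```

Sweeping the rate over several orders of magnitude is what makes the semilog plot of loss against learning rate useful for picking a value.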

The loss goes up and down unpredictably during training, so using a small patience value for early stopping is not a good idea.

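es = keras.callbacks.EarlyStopping(patience=50)
checkpoint = keras.callbacks.ModelCheckpoint("md.h5", save_best_only=True)
# the start of this call is truncated in the source; train_set and valid_set are assumed names
model.fit(train_set, epochs=500, validation_data=valid_set, callbacks=[es, checkpoint])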

Stateless RNN

At each training iteration, the RNN starts from a zero state and discards its state after making a prediction.

Stateful RNN

The first window is placed at the beginning of the series. The final state vector is preserved for the next training batch, which is located immediately after the previous one.


Advantage:

  • can learn long-term patterns

Disadvantages:

  • the data set has to be prepared differently
  • training can be slow
  • consecutive training batches are highly correlated, so backpropagation may not work well

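def seq_window(series, window_size):
    series = tf.expand_dims(series, axis=-1)
    ds = tf.data.Dataset.from_tensor_slices(series)
    ## shift=window_size places each window right after the previous window's inputs
    ds = ds.window(window_size+1, shift=window_size, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(window_size+1))
    ## inputs are all but the last step; targets are the same window shifted by one
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(1).prefetch(1) ## use batch=1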
model = keras.models.Sequential([
    keras.layers.SimpleRNN(100, return_sequences=True, stateful=True, batch_input_shape=[1,None,1]),
    keras.layers.SimpleRNN(100, return_sequences=True, stateful=True),
    # remaining layers are truncated in the source
])

We need to manually reset the state to zero at the beginning of each epoch.

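class ResetState(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()  # start each epoch from a zero state
reset_ = ResetState()
# the start of this call is truncated in the source; train_set is an assumed name
model.fit(train_set, callbacks=[es, checkpoint, reset_])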


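LSTM

Forget Gate: learns when to forget and when to preserve the cell state

Input Gate: learns when to write new information into the cell state (output near 1) and when to ignore it (output near 0)

Output Gate: learns which part of the cell state to expose as the output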
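How the three gates interact can be sketched as a single LSTM step in NumPy. This is an illustrative sketch, not the Keras implementation: the function name lstm_step, the packed weight layout (forget, input, output, candidate), and the toy shapes are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    gates = x @ W + h @ U + b
    f, i, o, g = np.split(gates, 4, axis=-1)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gate values lie in (0, 1)
    c_new = f * c + i * np.tanh(g)   # forget part of the old state, add gated new information
    h_new = o * np.tanh(c_new)       # the output gate decides what to expose
    return h_new, c_new

units = 2
x = np.zeros((1, 3))
h = np.zeros((1, units))
c = np.ones((1, units))
W = np.zeros((3, 4 * units))
U = np.zeros((units, 4 * units))
b = np.zeros(4 * units)
h_new, c_new = lstm_step(x, h, c, W, U, b)
print(h_new, c_new)
```

With all-zero weights every gate outputs 0.5, so half of the old cell state is kept and nothing new is written.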

model = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True,
                     stateful=True, batch_input_shape=[1,None,1]),
    keras.layers.LSTM(100, return_sequences=True, stateful=True),
    # remaining layers are truncated in the source
])


We can also use a 1D convolutional network for time series prediction.

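model = keras.models.Sequential([
    keras.layers.Conv1D(filters=32, kernel_size=5,
                       strides=1, padding="causal"),  # further arguments may be truncated in the source
    keras.layers.LSTM(32, return_sequences=True),
    # remaining layers are truncated in the source
])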

Small dilation rates let layers learn short-term patterns, while large dilation rates let layers learn long-term patterns.

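model = keras.models.Sequential()
for dilation in [1,2,4,8,16]:
    # loop body is missing in the source; filters=32 and kernel_size=2 are assumed
    model.add(keras.layers.Conv1D(filters=32, kernel_size=2, padding="causal",
                                  dilation_rate=dilation, activation="relu"))
model.add(keras.layers.Conv1D(filters=1, kernel_size=1))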
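To see why larger dilation rates capture longer patterns, the receptive field of a stack of stride-1 causal convolutions can be computed directly. The helper name receptive_field and kernel_size=2 are assumptions for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Number of input time steps that influence one output of stacked stride-1 convolutions."""
    # each layer widens the receptive field by (kernel_size - 1) * dilation
    return 1 + (kernel_size - 1) * sum(dilations)

dilations = [1, 2, 4, 8, 16]
for depth in range(1, len(dilations) + 1):
    print(dilations[:depth], receptive_field(2, dilations[:depth]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly.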



from tensorflow.keras.preprocessing.text import Tokenizer
# maximum number of words to keep, based on word frequency.
# Only the most common `num_words-1` words will be kept.
tok = Tokenizer(num_words=10, oov_token="<OOV>")
tok.fit_on_texts(sentences)  # build the vocabulary; sentences is a list of strings
word_idx = tok.word_index # a dictionary

OOV token

The OOV (out-of-vocabulary) token stands in for words that do not appear in the dictionary.
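The idea can be sketched without Keras. This plain-Python version only mimics the behaviour described here (index 1 for the OOV token, smaller indices for more frequent words); the helper names build_word_index and to_sequence and the toy sentences are assumptions.

```python
from collections import Counter

def build_word_index(sentences, oov_token="<OOV>"):
    """Map each word to an integer; index 1 is reserved for the OOV token."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    words = [oov_token] + [w for w, _ in counts.most_common()]
    return {w: i + 1 for i, w in enumerate(words)}

def to_sequence(sentence, word_index, oov_token="<OOV>"):
    """Replace every word by its index, falling back to the OOV index for unseen words."""
    return [word_index.get(w, word_index[oov_token]) for w in sentence.lower().split()]

word_index = build_word_index(["I love my dog", "I love my cat"])
seq = to_sequence("I love my fish", word_index)
print(word_index)
print(seq)
```

Here "fish" never appeared in the training sentences, so it is encoded with the OOV index instead of being dropped.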

Text to Sequences

Use padding and truncating to make all sequences the same length.

from tensorflow.keras.preprocessing.sequence import pad_sequences
seq = tok.texts_to_sequences(sentences)
# by default, sequences are truncated or padded from the start ('pre')
padded = pad_sequences(seq, maxlen=10, padding='post', truncating='post')
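What 'post' padding and truncating do can be shown with a small plain-Python sketch. The helper pad_post is an assumption and only imitates the behaviour for these two settings.

```python
def pad_post(sequences, maxlen, value=0):
    """Cut sequences longer than maxlen at the end and pad shorter ones at the end."""
    result = []
    for s in sequences:
        kept = list(s[:maxlen])                       # truncating='post'
        result.append(kept + [value] * (maxlen - len(kept)))  # padding='post'
    return result

padded_demo = pad_post([[2, 3, 4], [5, 6, 7, 8, 9, 10]], maxlen=5)
print(padded_demo)
```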

Word Embeddings

Embeddings represent each word as a vector in a high-dimensional space, where words with similar meanings tend to cluster together.


Advantages:

  • easy to compute
  • can be visualized

Disadvantage:

  • fails to consider word order

from tensorflow.keras.layers import Embedding
model = tf.keras.Sequential([
    Embedding(vocab_size, embedding_dim, input_length=max_length),
    # remaining layers are truncated in the source
])

In this model, Flatten() can be replaced by GlobalAveragePooling1D(). Both turn the 2D output of the Embedding layer into a 1D vector that a Dense layer can consume.
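The difference between the two layers is easiest to see in the output shapes. This NumPy sketch imitates what each layer does to one batch of embedded sequences; the sizes are assumed toy values.

```python
import numpy as np

batch, max_length, embedding_dim = 2, 4, 3  # assumed toy sizes
embedded = np.arange(batch * max_length * embedding_dim, dtype=float)
embedded = embedded.reshape(batch, max_length, embedding_dim)

# Flatten: concatenate all word vectors, so the size depends on the sequence length
flattened = embedded.reshape(batch, -1)
# GlobalAveragePooling1D: average over the sequence axis, so the size is embedding_dim
averaged = embedded.mean(axis=1)

print(flattened.shape, averaged.shape)
```

Averaging gives a smaller vector and fewer Dense weights, at the cost of discarding word positions.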



Advantage of subword tokenization:

  • subwords are more likely to appear in the original dataset

Disadvantage:

  • the meaning of a subword may be ambiguous

import tensorflow_datasets as tfds
vocab_size = 1000
# in TFDS 4.x this class lives under tfds.deprecated.text (formerly tfds.features.text)
tokenizer = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(sentences, vocab_size, max_subword_length=5)


The meaning of a word can be affected by the words both before and after it.

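from tensorflow.keras.layers import Bidirectional, LSTM
model = tf.keras.Sequential([
    Bidirectional(LSTM(16, return_sequences=True)),  # return_sequences belongs to the LSTM
])  # remaining layers are truncated in the source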


General Language Understanding Evaluation (GLUE) Benchmark

GLUE is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Gated Recurrent Unit (GRU)

A GRU has a reset gate and an update gate.

It is similar to an LSTM but does not maintain a separate cell state.
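A single GRU step can be sketched in NumPy to show the two gates and the absence of a cell state. This is an illustrative sketch under assumptions: the function name gru_step, the omission of biases, and the convention that the update gate weights the new candidate are choices made here, and real implementations may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step without biases; h is the only state carried between steps."""
    z = sigmoid(x @ Wz + h @ Uz)               # update gate: how much to replace
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate: how much past state to use
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde

input_dim, units = 3, 2
x = np.zeros((1, input_dim))
h = np.ones((1, units))
Wz = Wr = Wh = np.zeros((input_dim, units))
Uz = Ur = Uh = np.zeros((units, units))
h_next = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h_next)
```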

Text Generation

Predict the next word in a sequence.

  • consider memory and output size constraints
  • add/subtract from layer sizes or embedding dimensions
  • use np.random.choice with the predicted probabilities to get more variety in the generated outputs
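The last tip can be sketched as follows. The vocabulary and the probabilities are made-up toy values standing in for a model's softmax output.

```python
import numpy as np

np.random.seed(0)  # only to make the sketch repeatable

vocab = ["the", "cat", "sat", "down"]      # assumed toy vocabulary
probs = np.array([0.1, 0.6, 0.2, 0.1])     # assumed predicted probabilities

# greedy decoding always returns the most likely word
greedy = vocab[int(np.argmax(probs))]

# sampling picks each word in proportion to its probability
sampled = [vocab[np.random.choice(len(vocab), p=probs)] for _ in range(5)]

print(greedy)
print(sampled)
```

Greedy decoding tends to repeat the same phrases, while sampling lets less likely words appear from time to time.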

Author: csy99
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source csy99!