Natural Language Processing with Dante

Lately I’ve been studying natural language processing and its applications. One of my favourite tricks is generating text that imitates someone’s style. For example, here I will show an AI (trained for 100 minutes and just two epochs) that has learned to fake Dante’s Divine Comedy.

import tensorflow as tf
from tensorflow import keras
import numpy as np

filepath = 'dante.txt'

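with open(filepath) as f:
    dantext = f.read()

# Character-level tokenizer: every distinct character gets its own ID
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts([dantext])
# Keras IDs start at 1; subtract 1 so that they run from 0 to max_id - 1
[encoded] = np.array(tokenizer.texts_to_sequences([dantext])) - 1
dataset_size = len(encoded)  # total number of characters
max_id = len(tokenizer.word_index)  # number of distinct characters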

train_size = dataset_size
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])
n_steps = 100
window_length = n_steps + 1  # 100 input characters plus one more for the shifted target
dataset = dataset.window(window_length, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
batch_size = 32
dataset = dataset.shuffle(10000).batch(batch_size)
# Inputs are the first 100 characters; targets are the same window shifted by one
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)


model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id], dropout=0.2, recurrent_dropout=0.2),
    keras.layers.GRU(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation='softmax'))
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
# Save a checkpoint after every epoch, so training can be interrupted early
checkpoint_cb = keras.callbacks.ModelCheckpoint('nldante.h5')
history = model.fit(dataset, epochs=20, callbacks=[checkpoint_cb])
model.save('nldante.h5')

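This program divides the dataset into windows of 101 characters: the first 100 are the input, and the target is the same sequence shifted one character ahead, so at every position the model learns to predict the next character (at the last position, the 101st). Changing the window length affects which patterns the model is able to learn; try it yourself to see how the outputs change. We also group the windows into batches, which makes training faster and the gradient estimates less noisy.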
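To make the windowing concrete, here is a minimal plain-Python sketch of what the tf.data pipeline above does to a short string. The function name make_windows and the tiny n_steps value are only for illustration; they are not part of the training script.

```python
def make_windows(sequence, n_steps):
    """Return (inputs, targets) pairs for every window of n_steps + 1 items."""
    window_length = n_steps + 1
    pairs = []
    # shift=1: a new window starts at every position;
    # windows that would run past the end are dropped (drop_remainder=True)
    for start in range(len(sequence) - window_length + 1):
        window = sequence[start:start + window_length]
        # inputs: all but the last item; targets: the same window shifted by one
        pairs.append((window[:-1], window[1:]))
    return pairs


pairs = make_windows("nel mezzo", n_steps=4)
for inputs, targets in pairs[:3]:
    print(repr(inputs), "->", repr(targets))
```

Each target string is its input string moved one character to the right, which is exactly the relation between X_batch and Y_batch in the real pipeline.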

GRU layers are well suited to processing sequences, and they are among the best choices for this kind of task. You could try LSTM layers instead, but I think GRU layers work better for this purpose.

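Finally, the TimeDistributed layer applies the same Dense output layer to every time step independently, so the model outputs a probability distribution over the characters at each position of the sequence.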
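The idea behind TimeDistributed can be sketched in NumPy, outside Keras. The sizes and the random arrays below are hypothetical stand-ins (hidden plays the role of the GRU outputs for one sequence); the point is only that one shared set of weights is applied to each time step separately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_units, n_classes = 5, 8, 4  # hypothetical toy sizes

hidden = rng.normal(size=(n_steps, n_units))  # stand-in for the GRU outputs
W = rng.normal(size=(n_units, n_classes))     # one weight matrix shared by all steps
b = np.zeros(n_classes)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Apply the same dense layer to each time step separately...
per_step = np.stack([softmax(h @ W + b) for h in hidden])
# ...which matches a single matrix product over all time steps at once
all_at_once = softmax(hidden @ W + b)

print(per_step.shape)                      # (5, 4): one distribution per time step
print(np.allclose(per_step, all_at_once))  # True
```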

Let’s see how to make predictions now.

import tensorflow as tf
from tensorflow import keras
import numpy as np

model = keras.models.load_model('nldante.h5')
filepath = 'dante.txt'
n_chars = 200

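with open(filepath) as f:
    dantext = f.read()

# Refit the tokenizer on the same text so the character IDs match those used in training
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts([dantext])
max_id = len(tokenizer.word_index)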

def preprocess(texts):
    X = np.array(tokenizer.texts_to_sequences(texts)) - 1
    return tf.one_hot(X, max_id)

def next_char(text, temperature=1):
    X_new = preprocess([text])
    # Keep only the probabilities for the character that follows the end of text
    y_proba = model.predict(X_new)[0, -1:, :]
    # Dividing the log-probabilities by the temperature sharpens (< 1) or flattens (> 1) them
    rescaled_logits = tf.math.log(y_proba) / temperature
    # Sample one character ID; add 1 to go back to the tokenizer's 1-based IDs
    char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
    return tokenizer.sequences_to_texts(char_id.numpy())[0]

def complete_text(text, n_chars=50, temperature=1):
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

print(complete_text(' ', n_chars=n_chars, temperature=0.2))

So, the next character is sampled according to the probabilities that the AI outputs. By repeating this process, one can generate whole sentences. The temperature parameter sets how strongly those probabilities drive the choice of the next character: a value close to 0 favours the most probable characters, while a high temperature gives the other characters more of a chance as well.
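The effect of the temperature can be checked on a toy distribution. This is a small NumPy sketch, not part of the script above: the helper apply_temperature and the three probabilities are made up for illustration. It computes the distribution that sampling from log(p) / temperature amounts to.

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Rescale a probability vector: softmax(log(p) / temperature)."""
    logits = np.log(probs) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


probs = np.array([0.5, 0.3, 0.2])  # hypothetical model output for three characters
for t in (0.2, 1.0, 2.0):
    print(t, np.round(apply_temperature(probs, t), 3))
```

With a temperature of 1 the probabilities are unchanged; below 1 the most likely character takes an even larger share, and above 1 the distribution moves towards uniform.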

Here’s a sample output:

 con la figlia da l’altro aspetto
di questo viso che s’accorsa di sua favilla.

non puoi che tu veder le parole,
per che si convenne la mente templante
che s’interna di quel che s’interna vista,
per ch

This really sounds Dantesque, don’t you think?
