Evaluation metrics for generative models

Nikhil Verma
5 min read · Aug 31, 2022

A machine learning algorithm is said to learn task T from experience E if its performance on T, measured by a performance measure P, improves with E (Tom Mitchell's classic definition).

To measure the performance of models, past literature has suggested many quantitative techniques for different tasks (a short code sketch follows the list):

  • Confusion matrix, AUC, precision-recall (PR) curve, F1-score and accuracy for classification tasks
  • MAE, MSE, RMSE, R² and adjusted R² for regression tasks
  • Dunn’s index and silhouette coefficient for clustering tasks
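
As a quick illustration, here is a minimal sketch computing one representative metric from each family above. It assumes scikit-learn and NumPy are installed, and all the data in it is toy data invented purely for demonstration:

```python
import numpy as np
from sklearn.metrics import f1_score, mean_squared_error, silhouette_score

# Classification: F1-score on toy binary labels
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
print("F1:", f1_score(y_true, y_pred))

# Regression: RMSE on toy targets
y = np.array([2.0, 3.5, 4.0])
y_hat = np.array([2.2, 3.0, 4.3])
print("RMSE:", np.sqrt(mean_squared_error(y, y_hat)))

# Clustering: silhouette coefficient on toy 2-D points
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
labels = np.array([0, 0, 1, 1])
print("Silhouette:", silhouette_score(X, labels))
```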

Generative modelling is a different animal from the tasks discussed so far. A generative model tries to learn the probability distribution underlying its experiences and then produce new samples that look as if they came from that same distribution. Or simply, the model tries to Generate by Generalising. In this post, we will learn about some of the evaluation metrics used for generative modelling.
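
To make "Generate by Generalising" concrete, here is a toy sketch of the idea. The 1-D Gaussian below is my own illustrative assumption, not one of the models discussed in this post: we estimate a distribution's parameters from observed data (generalise), then draw fresh samples from the fitted distribution (generate).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Experience" E: observed data drawn from some unknown distribution
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Generalise: estimate the distribution's parameters from the data
mu, sigma = data.mean(), data.std()

# Generate: draw new samples from the learned distribution
generated = rng.normal(loc=mu, scale=sigma, size=10)
print(generated)
```

Deep generative models replace the hand-picked Gaussian with a far more flexible learned distribution, but the generate-by-generalising loop is the same.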

Before moving ahead, let me ask you a question: which of these two sets of generated samples “looks” better?

Sample images taken from recent Google research articles

These are samples generated by two famous state-of-the-art research works from Google, named Imagen (a diffusion-based generative model) and Parti (an autoregressive generative model). Do you think we can definitively say that the images on the left are…

Written by Nikhil Verma

Knowledge shared is knowledge squared | My Portfolio https://lihkinverma.github.io/portfolio/ | My blogs are living documents, updated as I receive comments
