Evaluation metrics for generative models
A machine learning algorithm is said to learn task T from experience E if its performance at T, as measured by a performance measure P, improves with E.
To measure model performance, past literature has suggested many quantitative techniques for different tasks, such as (a quick scikit-learn sketch follows the list):
- Confusion matrix, AUC, precision-recall (PR) curve, F1-score and accuracy for classification tasks
- MAE, MSE, RMSE, R² and adjusted R² for regression tasks
- Dunn's index and silhouette coefficient for clustering tasks
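For instance, a few of these metrics can be computed in a couple of lines with scikit-learn; the labels, predictions and scores below are made up purely for illustration:

```python
from sklearn.metrics import (accuracy_score, f1_score, confusion_matrix,
                             roc_auc_score, mean_absolute_error, r2_score)

# Hypothetical classification labels, hard predictions and scores
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # scores needed for AUC

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))

# Hypothetical regression targets and predictions
r_true = [3.1, 2.4, 5.0, 4.2]
r_pred = [2.9, 2.5, 4.6, 4.4]
print("MAE:", mean_absolute_error(r_true, r_pred))
print("R² :", r2_score(r_true, r_pred))
```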
Generative modelling is a different animal from the tasks discussed so far. It tries to learn the probability distribution underlying its experiences, and then generate new samples that look as if they were drawn from that learned distribution. Or simply, the model tries to Generate by Generalising. In this post we will look at some of the evaluation metrics used for generative modelling.
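To make "generate by generalising" concrete, here is a minimal toy sketch (not any of the models discussed in this post): it fits a Gaussian to training samples by maximum likelihood, then draws fresh samples from the fitted distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Experience": training samples drawn from some unknown distribution
train = rng.normal(loc=5.0, scale=2.0, size=1000)

# "Generalise": estimate the distribution's parameters from the data
# (for a Gaussian, the MLEs are simply the sample mean and std)
mu_hat, sigma_hat = train.mean(), train.std()

# "Generate": draw brand-new samples from the learned distribution
generated = rng.normal(loc=mu_hat, scale=sigma_hat, size=10)
print(f"learned mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print("generated samples:", np.round(generated, 2))
```

Deep generative models do the same thing in spirit; the distribution just lives in a very high-dimensional space (e.g. images) and is parameterised by a neural network instead of two scalars.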
Before moving ahead, let me ask you a question: which of these two sets of generated samples "looks" better?
These are samples generated by two famous SOTA research works from Google: IMAGEN (a diffusion-based generative model) and PARTI (an autoregressive generative model). Do you think we can definitively say that the images on the left are…