About - CIS4496 GAN Project

Project Description

Our project is titled "I am Something of a Painter Myself"; it is based on a GAN (generative adversarial network) competition hosted on Kaggle. A GAN can learn to imitate the distinctive color choices and brushstrokes of renowned artists such as Monet. The goal of the competition is to build a GAN that outputs images in Monet's style that are convincing enough to trick classifiers into believing they are true Monets rather than generated ones. The more closely your images mimic Monet, the lower your MiFID (Memorization-informed Fréchet Inception Distance) score and the higher you move up the leaderboard.

Model

The modeling algorithm we used is a specific type of GAN called CycleGAN. CycleGAN specializes in image-to-image translation, which is how we convert an ordinary image into one resembling our artist's style. It translates between two domains: the source domain contains non-painting images (photos), and the target domain contains paintings by the artist. CycleGAN uses two generators and two discriminators. The first generator maps images from the source domain to the target domain, and the second generator does the opposite, mapping target-domain images back to the source domain. Each discriminator receives real images from its own domain together with images produced by the generator that maps into that domain, and predicts whether each image is real or fake.
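How the two generators and two discriminators fit together can be sketched as a per-batch loss computation. This is a minimal sketch, not our training code: it assumes the objective from the original CycleGAN paper (Zhu et al., 2017), namely least-squares adversarial losses plus an L1 cycle-consistency loss weighted by λ = 10, and it uses toy NumPy functions in place of real convolutional networks. All names (`G`, `F`, `toy_discriminator`, `cyclegan_losses`, and so on) are hypothetical.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error, used for the cycle-consistency term."""
    return float(np.mean(np.abs(a - b)))


def lsgan_gen_loss(d_fake):
    """Least-squares generator loss: push the discriminator's score on fakes toward 1 (real)."""
    return float(np.mean((d_fake - 1.0) ** 2))


def lsgan_disc_loss(d_real, d_fake):
    """Least-squares discriminator loss: real images toward 1, generated images toward 0."""
    return float(0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)))


def cyclegan_losses(photo, monet, G, F, D_monet, D_photo, lam=10.0):
    """Hypothetical helper: the four CycleGAN losses for one batch."""
    fake_monet = G(photo)         # source (photo) -> target (Monet)
    fake_photo = F(monet)         # target (Monet) -> source (photo)
    cycled_photo = F(fake_monet)  # photo -> Monet -> photo
    cycled_monet = G(fake_photo)  # Monet -> photo -> Monet
    cycle = l1(photo, cycled_photo) + l1(monet, cycled_monet)
    return {
        "G": lsgan_gen_loss(D_monet(fake_monet)) + lam * cycle,
        "F": lsgan_gen_loss(D_photo(fake_photo)) + lam * cycle,
        "D_monet": lsgan_disc_loss(D_monet(monet), D_monet(fake_monet)),
        "D_photo": lsgan_disc_loss(D_photo(photo), D_photo(fake_photo)),
        "cycle": cycle,
    }


# Toy stand-ins for the networks (assumptions): F is the exact inverse of G.
def G(x):
    return 2.0 * x + 1.0


def F(y):
    return (y - 1.0) / 2.0


def toy_discriminator(x):
    # One "realness" score in (0, 1) per image in the batch.
    return 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2, 3))))


rng = np.random.default_rng(0)
photo = rng.random((4, 8, 8, 3))  # batch of 4 tiny "photos"
monet = rng.random((4, 8, 8, 3))  # batch of 4 tiny "paintings"
losses = cyclegan_losses(photo, monet, G, F, toy_discriminator, toy_discriminator)
```

Because the toy `F` undoes `G` exactly, the cycle term comes out numerically zero. With real networks, that term is what discourages a generator from producing a Monet-style image that has nothing to do with its input photo.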

Evaluation Metric

"The Frechet Inception Distance score, or FID for short, is a metric that calculates the distance between feature vectors calculated for real and generated images. The score summarizes how similar the two groups are in terms of statistics on computer vision features of the raw images calculated using the inception v3 model used for image classification. Lower scores indicate the two groups of images are more similar, or have more similar statistics, with a perfect score being 0.0 indicating that the two groups of images are identical. The FID score is used to evaluate the quality of images generated by generative adversarial networks, and lower scores have been shown to correlate well with higher quality images." [How to Implement the Frechet Inception Distance (FID) for Evaluating GANs by Jason Brownlee on Machine Learning Mastery]

In FID, we use the Inception v3 network to extract features from a deep layer (in the standard implementation, the 2048-dimensional final average-pooling layer). We then model the distribution of these features with a multivariate Gaussian with mean μ and covariance Σ. The FID between the real images and the generated images is computed as:

FID = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2))

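where μ_r, Σ_r and μ_g, Σ_g are the feature means and covariances of the real and generated images, and Tr is the trace operator, which sums the diagonal elements of a matrix. In other words, FID is the Fréchet distance between two Gaussians fitted to the Inception network's feature representations of the two image sets.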
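The formula above can be sketched in NumPy as follows, assuming the Inception features have already been extracted into arrays of shape (n_images, n_features). The helper names are hypothetical. Many open-source implementations (for example pytorch-fid) call `scipy.linalg.sqrtm` on Σ_r Σ_g directly; this sketch instead uses the equivalent identity Tr((Σ_r Σ_g)^(1/2)) = Tr((Σ_r^(1/2) Σ_g Σ_r^(1/2))^(1/2)), whose inner matrix is symmetric positive semi-definite, so only `numpy.linalg.eigh`/`eigvalsh` are needed.

```python
import numpy as np


def gaussian_stats(features):
    """Fit the mean vector and covariance matrix of an (n_images, n_features) array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def psd_sqrt(mat):
    """Square root of a symmetric PSD matrix via eigendecomposition (tiny negative eigenvalues clipped)."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """FID = ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 (sigma_r sigma_g)^(1/2))."""
    diff = mu_r - mu_g
    # Sigma_r Sigma_g and Sigma_r^(1/2) Sigma_g Sigma_r^(1/2) share eigenvalues, and the
    # latter is symmetric PSD, so the trace of its square root is a sum of real square roots.
    root_r = psd_sqrt(sigma_r)
    middle = root_r @ sigma_g @ root_r
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)))
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)


# Usage with random stand-in "features" (real Inception features would be 2048-dimensional).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
fake = rng.normal(loc=0.5, size=(500, 16))
fid_same = frechet_distance(*gaussian_stats(real), *gaussian_stats(real))     # close to 0.0
fid_shifted = frechet_distance(*gaussian_stats(real), *gaussian_stats(fake))  # clearly positive
```

As a sanity check, when both covariances are diagonal the trace term reduces to a sum of (σ_r,i - σ_g,i)² over the feature dimensions, so in one dimension FID is simply (μ_r - μ_g)² + (σ_r - σ_g)².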

MiFID is a modified FID score that penalizes models whose generated images are too similar to the training set:

MiFID(S_g, S_t) = m_τ(S_g, S_t) · FID(S_g, S_t)

where S_g is the generated set and S_t is the original training set. m_τ is the memorization penalty, which is based on thresholding, at τ, the memorization distance s between the generated set and the training set.
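The memorization penalty can be sketched as below. This is a minimal sketch under our reading of the Kaggle evaluation page linked at the end of this section: s is taken as the minimum cosine distance (1 - |cos|) from each generated image's features to any training image's features, averaged over the generated images, and m_τ = 1/(s + ε) when s < τ, otherwise 1. The function names are hypothetical, τ must be supplied by the caller, and the default ε is a small constant we chose only to avoid division by zero; neither is the competition's actual setting.

```python
import numpy as np


def memorization_distance(gen_feats, train_feats):
    """Average over generated images of the minimum cosine distance (1 - |cos|) to any training image."""
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    # Shape (n_generated, n_training); clipped to [0, 1] to absorb floating-point noise.
    cos_dist = np.clip(1.0 - np.abs(g @ t.T), 0.0, 1.0)
    return float(cos_dist.min(axis=1).mean())


def memorization_penalty(s, tau, eps=1e-6):
    """m_tau: 1 / (s + eps) if s is below the threshold tau, otherwise 1 (eps is an assumed constant)."""
    return 1.0 / (s + eps) if s < tau else 1.0


def mifid(fid, gen_feats, train_feats, tau, eps=1e-6):
    """MiFID = m_tau * FID, with the penalty computed from feature arrays (e.g. Inception features)."""
    s = memorization_distance(gen_feats, train_feats)
    return memorization_penalty(s, tau, eps) * fid
```

Submitting near-copies of training images drives s toward 0 and inflates the score, while images whose features are far from every training image leave FID unchanged.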

If you are interested in learning more about evaluation metrics for GANs, please refer to https://www.kaggle.com/competitions/gan-getting-started/overview/evaluation and https://arxiv.org/abs/2103.09396