Automatic Greyscale to Color Conversion

Tejas Rangole & James Higgins

Motivation

Color plays an important role in memory, emotion, and many other aspects of life. Adding color to an image can change its whole meaning and reveal details that would otherwise be lost when viewing and analyzing it in grayscale. Automatic colorization can also make grayscale images more useful to recognition algorithms by recovering color cues that help identify the textures and materials of objects in a scene. Because color carries so much information, improving existing colorization algorithms and developing more efficient ones can benefit many fields of study. Our goal is to colorize an image automatically, with minimal manual adjustment.

Possible Solutions

Some prior solutions are non-parametric: given an input grayscale image and one or more color reference images as source data, color is transferred onto the grayscale input from analogous regions of the references (Hertzmann et al., 2001). A related approach uses similar images (Gupta et al., 2012): each image is first segmented into superpixels, Gabor and SURF features are extracted for each superpixel, reference superpixels are matched to target superpixels with a cascade feature-matching scheme, and unreliable matches are finally eliminated by image-space voting. The advantage of these two approaches is that they do not require large datasets and are computationally cheaper. However, both are harder to implement than a neural network, and neural-network-based approaches colorize images more accurately. A neural network also requires no human labeling of images and can colorize fully automatically. We implement colorization with a CNN trained to map a grayscale input to a distribution over quantized color values.

Approach

We implemented the image colorization approach proposed by Zhang et al. The idea is to work in the Lab color space, where L is the lightness channel, a is the red-green channel, and b is the blue-yellow channel. We take the grayscale input image, extract its L channel, and feed the L channel into a Convolutional Neural Network (CNN) that predicts the ab channels. Once we have the predicted ab channels, we concatenate them with the L channel to produce the output image.
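As a quick illustration of the color space (a toy sketch, not part of our pipeline), the skimage snippet below splits an RGB image into its L and ab channels and converts it back; L lies in [0, 100] while a and b lie roughly in [-128, 127]:

import numpy as np
from skimage import color

rgb = np.random.rand(4, 4, 3)      # toy RGB image, values in [0, 1]
lab = color.rgb2lab(rgb)           # L in [0, 100], a/b roughly in [-128, 127]
l_channel = lab[:, :, 0]           # lightness: what the network sees
ab_channels = lab[:, :, 1:]        # chroma: what the network predicts
rgb_back = color.lab2rgb(lab)      # round trip back to RGB
assert np.allclose(rgb, rgb_back, atol=1e-4)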

The output of the network is a probability distribution over 313 quantized ab bins. The network is trained with a loss function that includes a class-rebalancing term, which reweights rarer colors so they are better represented in the output image. Once we have a probability distribution for a given grayscale input pixel, we convert it to a single ab value by taking the annealed mean of the distribution, which interpolates between the mean (spatially consistent but desaturated) and the mode (vibrant but potentially inconsistent).

Loss function with class-rebalancing term v(·)
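Reproduced from Zhang et al., the loss is a multinomial cross-entropy over the Q = 313 bins, where Z is the soft-encoded ground-truth distribution, Ẑ is the network's prediction, and v(·) reweights each pixel by the rarity of its ground-truth color:

\mathcal{L}_{cl}(\hat{Z}, Z) = -\sum_{h,w} v(Z_{h,w}) \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}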
Predicted ab probability distribution
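A minimal sketch of the annealed-mean readout for a single pixel (the function name is ours; T = 0.38 is the temperature used in the paper, with T → 0 recovering the mode and T = 1 the mean):

import numpy as np

def annealed_mean(probs, bin_centers, T=0.38):
    # probs: (313,) softmax output over the quantized ab bins for one pixel
    # bin_centers: (313, 2) ab coordinates of the bin centers
    logits = np.log(probs + 1e-8) / T   # sharpen the distribution by temperature T
    annealed = np.exp(logits)
    annealed /= annealed.sum()          # re-normalize to a valid distribution
    return annealed @ bin_centers       # expected ab value under the annealed distribution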

Implementation

We used the pre-trained models built by Zhang et al., available in Richard Zhang's GitHub repository (https://github.com/richzhang/colorization). We implemented the surrounding functionality: extracting the L channel from the input image, passing it as input to the neural network, and concatenating the output with the original L channel to obtain the colored image. Below is some of the code used in the implementation.

from colorizers import *  # models from Richard Zhang's repository

# Initialize the network and tell it to use the GPU
colorizer_eccv16 = eccv16(pretrained=True).eval()
colorizer_eccv16.cuda()

Here we extract the L channel from the input image, resize the image to 256 x 256 to match the network's input size, and convert both L channels to tensors to pass into the network.

import numpy as np
import torch
from PIL import Image
from skimage import color

# img: the input RGB image as an H x W x 3 uint8 array, loaded earlier

# Grab L channel of original image
img_lab = color.rgb2lab(img)
img_l_channel = img_lab[:, :, 0]

# Resize original image and grab L channel of resized image
img_arr = Image.fromarray(img)
resized_img_arr = img_arr.resize((256, 256), 3)  # 3 = bicubic resampling
resized_rgb = np.asarray(resized_img_arr)
resized_lab = color.rgb2lab(resized_rgb)
resized_l_channel = resized_lab[:, :, 0]

# Add batch and channel dimensions: (1, 1, H, W)
original_tensor = torch.Tensor(img_l_channel)[None, None, :, :]
resized_tensor = torch.Tensor(resized_l_channel)[None, None, :, :]

Getting the output from the neural network and concatenating the L channel from the original image with the predicted ab channels:

# Getting output from network (input goes to the GPU, prediction comes back to the CPU)
output_ab = colorizer_eccv16(resized_tensor.cuda()).cpu()

# Upsample the 256 x 256 ab prediction back to the original resolution so it
# matches the original L channel; chroma is low-frequency, so little is lost
result_ab = torch.nn.functional.interpolate(output_ab, size=img_l_channel.shape,
                                            mode='bilinear', align_corners=False)

# Concatenating L and ab channels
result_lab = torch.cat((original_tensor, result_ab), dim=1)

# Converting back to rgb (lab2rgb expects an H x W x 3 array)
output_img = color.lab2rgb(result_lab[0].permute(1, 2, 0).numpy())

Results

We measured the accuracy of the colorizer using per-pixel accuracy: each pixel of the newly colored image was compared to the corresponding pixel of the ground-truth image, with pixel values normalized to the range [0.0, 1.0]. A pixel was counted as correct if it fell within a threshold of 0.05 of the ground truth. We chose the threshold by colorizing our first test image and inspecting how the per-pixel accuracy changed as we varied the threshold; 0.05 gave the most informative measure. Using this metric, we evaluated several different kinds of images, as shown below.
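A minimal sketch of this metric (the function name is ours, and the rule that all three channels must fall within the threshold is our assumption):

import numpy as np

def per_pixel_accuracy(pred, truth, threshold=0.05):
    # pred, truth: H x W x 3 arrays with values normalized to [0.0, 1.0]
    diff = np.abs(pred.astype(np.float64) - truth.astype(np.float64))
    # Assumption: a pixel is correct only if every channel is within the threshold
    correct = (diff <= threshold).all(axis=-1)
    return correct.mean()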

[Result images: colorized outputs compared to ground truth, with per-pixel accuracies]
Discussion

As the results show, the colorizer does well on images whose backgrounds have minimal color variation, and in most cases it works best when coloring animals and people. The colorizer also tends to favor the b channel, which shows up as a yellow tint in many of the output images. One result, the Coca-Cola caps, appeared to be colored well yet had a very low per-pixel accuracy. The paper we followed evaluated its results with three methods, the most prominent being a "colorization Turing test": human participants were shown a colorized image alongside the ground truth and asked to pick the real one, and the percentage of participants who chose the colorized image served as the metric of success. If we were to test the colorizer again, we would prefer a method like this over per-pixel accuracy.

References

Zhang, R., Isola, P., Efros, A.A.: Colorful Image Colorization. In: ECCV (2016)
Hertzmann, A., Jacobs, C.E., Oliver, N., Curless, B., Salesin, D.H.: Image Analogies. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ACM (2001) 327-340

Gupta, R.K., Chia, A.Y., Rajan, D., Ng, E.S., Zhiyong, H.: Image Colorization Using Similar Images. In: Proceedings of the 20th ACM International Conference on Multimedia (2012). https://people.cs.clemson.edu/~jzwang/ustc13/mm2012/p369-gupta.pdf

Links

Project Proposal
Midterm Report
Presentation