Photorealistic Style Transfer for Natural Scene Recolorization

Renán Rojas-Gómez

Champaign, Illinois


A lightweight yet precise photorealistic style transfer technique for indoor and outdoor scene recoloring under different lighting conditions and local shadowing.

Project status: Under Development

Virtual Reality, Artificial Intelligence, Graphics and Media

Groups
Student Developers for AI, DeepLearning, Movidius™ Neural Compute Group

Intel Technologies
DevCloud, Intel Python, Movidius NCS, Intel CPU, OpenVINO

Overview / Usage

1. What is Style Transfer?

Style transfer is a crucial task in computer vision that manipulates an image's appearance, in terms of its color and texture, while preserving its content. It is an image-to-image translation task that receives a content image, describing the objects of interest, and a style image, depicting the colors and textures of interest. Patterns from the style image are then imposed on the content image to generate a stylized scene containing the same objects but with a different visual aspect.

2. Scenario of Interest: Photorealistic Stylization

Example-based style transfer techniques take advantage of pre-trained data-driven networks to decompose an image into style and content, and have recently become very attractive for applications such as artwork reproduction, virtual reality, and interior design. However, despite promising results, few stylization techniques focus on preserving the natural appearance of real scenes, such as indoor scenarios, which limits their use to artistic or entertainment purposes.

Several available photorealistic techniques rely on a two-step framework: a purely deepnet-based stylization stage followed by a constrained optimization problem that locally refines perceptual properties. However, their results show significant distortions and require complex computations. Instead of being confined to the pixel domain, recent methods handle realistic stylization directly in the latent space of the network. Although these show impressive results, the possibility of jointly applying stylization and fine-grained control directly in the latent space remains mostly unexplored.

3. Project Goals

Based on this, we aim to explore the main limitations of current photorealistic style transfer techniques and their drawbacks regarding control over perceptual factors. We then intend to develop a novel stylization method that efficiently preserves the natural look of a real scene by taking advantage of Intel's artificial intelligence platform.

Methodology / Approach

1. State of the Art Photorealistic Stylization Techniques

We base our development on three state-of-the-art stylization methods: the Screened Poisson Equation, the Closed-Form Solution, and Wavelet Corrected Transfer based on Whitening and Coloring Transforms. While the first two methods adopt the two-step approach, the latter proposes an alternative pooling strategy in the deepnet-based feature extraction step to obtain photorealistic results without any post-processing. We briefly describe the three algorithms:

a. Screened Poisson Equation: To lessen the typical distortions generated by the stylization process, such as texturized patterns in originally flat regions and uneven edges and corners, a regularization approach is applied to the stylized image so that its gradient field matches that of the content image. An optimization problem is then built upon a loss function that matches the gradients of the content and output images. This simple post-processing alleviates the bogus components and spill-over effects in the stylized image, increasing the photorealism of the transfer process.
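The gradient-matching regularization above can be sketched as a screened Poisson solve. The following is a minimal NumPy illustration, not the authors' implementation: it assumes periodic image boundaries so the linear system diagonalizes under the 2-D FFT, and the fidelity weight `lam` is an illustrative parameter name.

```python
import numpy as np

def screened_poisson(stylized, content, lam=0.04):
    """Blend the colors of `stylized` with the gradients of `content` by
    solving (lam*I - Laplacian) u = lam*s - Laplacian(c) per channel.
    With periodic boundaries the 5-point Laplacian diagonalizes under the FFT."""
    H, W = stylized.shape[:2]
    ky = 2.0 * np.cos(2.0 * np.pi * np.arange(H) / H) - 2.0
    kx = 2.0 * np.cos(2.0 * np.pi * np.arange(W) / W) - 2.0
    lap = ky[:, None] + kx[None, :]  # Laplacian eigenvalues (all <= 0)
    out = np.empty_like(stylized, dtype=np.float64)
    for ch in range(stylized.shape[2]):
        s_hat = np.fft.fft2(stylized[..., ch])
        c_hat = np.fft.fft2(content[..., ch])
        # Large lam keeps the stylized colors; small lam keeps content gradients.
        u_hat = (lam * s_hat - lap * c_hat) / (lam - lap)
        out[..., ch] = np.real(np.fft.ifft2(u_hat))
    return out
```

In the two limits the behavior is easy to check: as `lam` grows the output converges to the stylized image, and as `lam` shrinks the output keeps the content gradients while retaining the stylized image's mean color.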

b. Closed-Form Solution: Traditional feature extraction methods use an encoding-decoding network based on max-pooling layers to extract high-level representations of the content image, match them with the feature representations of the style image, and map them back to the pixel domain to obtain the stylized image. The Closed-Form Solution method first complements the max-pooling operations by also keeping the indices at which the maximum values appeared. This spatial information is passed to the decoder via skip connections, so that after each upsampling layer neural features are placed back in their original locations. Second, an alternative prior is adopted in which the content image is characterized by an intrinsic manifold structure: a graph-based smoothness constraint is applied over the stylized image to enforce similarity between associated pixels. The method further reduces the noise components; however, it over-penalizes sharp details and highly texturized regions in the scene.
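The index bookkeeping described above can be illustrated with a small NumPy sketch of max pooling that records argmax positions, plus the matching unpooling step that restores values to their original locations. This is a single-channel toy, with hypothetical function names, not the method's actual network code.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k-by-k max pooling over a 2-D map that also records the argmax
    position inside each patch (the spatial info passed to the decoder)."""
    H, W = x.shape
    patches = x[:H - H % k, :W - W % k].reshape(H // k, k, W // k, k)
    patches = patches.transpose(0, 2, 1, 3).reshape(H // k, W // k, k * k)
    idx = patches.argmax(axis=-1)  # flat in-patch position of each maximum
    pooled = np.take_along_axis(patches, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool(pooled, idx, k=2):
    """Place each pooled value back at its recorded in-patch location;
    all other positions stay zero, preserving the spatial layout."""
    Hp, Wp = pooled.shape
    flat = np.zeros((Hp, Wp, k * k), dtype=pooled.dtype)
    np.put_along_axis(flat, idx[..., None], pooled[..., None], axis=-1)
    return flat.reshape(Hp, Wp, k, k).transpose(0, 2, 1, 3).reshape(Hp * k, Wp * k)
```

On a 4x4 map of the values 0..15, pooling yields [[5, 7], [13, 15]], and unpooling restores each maximum to the exact pixel it came from.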

c. Wavelet Corrected Transfer based on Whitening and Coloring Transforms: Most style transfer approaches use max-pooling layers during the feature extraction and reconstruction processes, leading to a loss of spatial consistency. Furthermore, they require the stylization process to be applied in a multi-level fashion: multiple encoding-decoding layers must be independently trained to match content and style feature representations at a specific pooling level. Since stylization at each feature level degrades the original spatial structure, such an approach introduces severe spatial and intensity distortions. To address this limitation, the Wavelet Corrected Transfer method replaces the traditional max-pooling and unpooling operations in the encoding-decoding network with a wavelet-based pooling strategy. Furthermore, instead of the multi-level stylization approach, it employs a single encoding-decoding pass. This significantly decreases the computational complexity of the stylization process and preserves photorealism without any post-processing.
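The key property of the wavelet-based pooling is invertibility: unlike max pooling, the transform loses no spatial information. A minimal NumPy sketch of a single-level 2-D Haar decomposition and its exact inverse (illustrative function names, single channel, even-sized maps assumed):

```python
import numpy as np

def haar_pool(x):
    """Split a 2-D map into four half-resolution Haar bands (LL, LH, HL, HH).
    The low-pass LL band plays the role of the pooled features."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Exact inverse of haar_pool: recombine the four bands into the
    original full-resolution map, so no spatial detail is lost."""
    H, W = ll.shape
    out = np.empty((H * 2, W * 2), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out
```

Because the high-frequency bands are carried to the decoder, the round trip reconstructs the input exactly, which is what lets the method skip post-processing.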

2. Methodology

Based on these techniques, we intend to analyze in detail how style and content information are factorized at multiple levels of a CNN. It is of interest to explore the relation between domain adaptation and restylization techniques to clearly understand why the distribution alignment in the feature space is so representative of the perceptual profile of arbitrary images, and how we can guide the process to preserve the original image structure while obtaining sharp, local control over attributes such as color, texture, and scale. Specifically, the methods in our development include:

a. The evaluation of multiple encoding-decoding feature extraction backbones for the stylization stage (e.g., VGGNet, ResNet) in terms of both accuracy and inference time.

b. The efficient implementation of alternative style transformation techniques and the analysis of their effect on the final output (e.g., the Whitening and Coloring Transform, Adaptive Instance Normalization).

c. The exploration of replacing post-processing steps by enforcing photorealism in the latent space itself, via alternative pooling strategies and attribute disentanglement methods.
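As an illustration of one of the candidate style transformations in item b, here is a minimal NumPy sketch of Adaptive Instance Normalization. The `(C, H, W)` feature layout and the function name are assumptions for the example; in practice the features would come from a pre-trained encoder such as VGGNet.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization: shift and scale each channel of the
    content features so its mean and standard deviation match the style
    features. Inputs are (C, H, W) arrays; eps avoids division by zero."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # Normalize content statistics, then re-color with style statistics.
    return s_std * (content_feat - c_mean) / (c_std + eps) + s_mean
```

After the transform, each output channel carries the style's first- and second-order statistics while keeping the content's spatial arrangement, which is why AdaIN is a common lightweight alternative to the full Whitening and Coloring Transform.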

Then, based on our findings, our final goal is to deliver a novel Fast Photorealistic Style Transfer implementation that takes advantage of the Intel AI platform.
