High-quality YouTube video streaming at a lower (internet) data-rate.
Debapriya Tula
Bengaluru, Karnataka
We want to watch YouTube videos at high resolution, but not at the cost of losing a lot of our data quota for the day. What if we could reduce the number of frames per second YouTube sends and reconstruct the scene? Do go through this paper by Niklaus et al.: https://arxiv.org/abs/1708.01692
Project status: Under Development
Artificial Intelligence, Graphics and Media
Intel Technologies
DevCloud
Overview / Usage
What if we could watch videos on YouTube (for now, only YouTube) at a resolution of 1080p or 1440p without using much data? Say a data rate of 400 kbps could load entire videos, and you could watch them without buffering.
How do we achieve that? Videos are typically encoded at around 30 frames per second (fps). What if we could bring this down to, say, 20 fps and regenerate/interpolate the intermediate frames on the viewer's device? This problem of frame regeneration is studied in computer vision under the name video frame interpolation. A rough estimate of the potential savings is sketched below.
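As a back-of-the-envelope illustration only (real codecs such as VP9 and H.264 exploit inter-frame redundancy, so bitrate does not scale exactly linearly with frame rate), here is the arithmetic behind the savings; the base bitrate below is a hypothetical number, not a measured one:

```python
# Rough estimate of the data saved by dropping from 30 fps to 20 fps.
# Assumes bitrate scales linearly with frame rate, which is only an
# approximation for real inter-frame-compressed video codecs.

def estimated_bitrate_kbps(base_bitrate_kbps: float,
                           original_fps: int,
                           reduced_fps: int) -> float:
    """Linearly scale bitrate with the fraction of frames actually sent."""
    return base_bitrate_kbps * reduced_fps / original_fps

base = 600.0  # hypothetical bitrate (kbps) of a stream at 30 fps
reduced = estimated_bitrate_kbps(base, original_fps=30, reduced_fps=20)
print(f"Estimated bitrate at 20 fps: {reduced:.0f} kbps")  # -> 400 kbps
```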
For now, it's just a concept. I am looking for like-minded people to work with me on this project.
If successfully implemented, this project has the potential to benefit not just YouTube's viewers but also the people who upload videos, and ultimately YouTube itself.
Methodology / Approach
The approach I am using follows the design of the paper "From Here to There: Video Inbetweening Using Direct 3D Convolutions". The model receives three inputs: the starting frame x_s, the ending frame x_e, and a Gaussian noise vector u ∈ R^d. The output of the model is a video x_s, x_1, x_2, ..., x_(T-2), x_e, which varies with the choice of the Gaussian noise vector.
The model consists of three components: an image encoder, a latent representation generator and a video generator. A video discriminator and an image discriminator are added so that the whole model can be trained using adversarial learning to produce realistic video sequences.
It takes a GAN-based approach, where each of the components mentioned above is trained with an adversarial loss function; a minimal sketch of this design follows.
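As an illustration of the three-component generator side only (all module names, layer choices, and sizes below are my own assumptions, not the paper's actual architecture), the wiring could look roughly like this in PyTorch:

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Encodes a single frame into a feature map (layers are illustrative)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, frame):          # frame: (B, 3, H, W)
        return self.net(frame)         # -> (B, C, H/4, W/4)

class LatentGenerator(nn.Module):
    """Fuses start/end encodings and noise u into a latent video volume."""
    def __init__(self, channels=64, noise_dim=128, T=16):
        super().__init__()
        self.T = T
        self.noise_proj = nn.Linear(noise_dim, channels)
        self.net = nn.Conv3d(2 * channels + channels, channels, 3, padding=1)
    def forward(self, enc_s, enc_e, u):
        B, C, H, W = enc_s.shape
        # Broadcast the projected noise over time and space.
        n = self.noise_proj(u).view(B, -1, 1, 1, 1).expand(-1, -1, self.T, H, W)
        pair = torch.cat([enc_s, enc_e], dim=1)           # (B, 2C, H, W)
        pair = pair.unsqueeze(2).expand(-1, -1, self.T, -1, -1)
        return self.net(torch.cat([pair, n], dim=1))      # (B, C, T, H, W)

class VideoGenerator(nn.Module):
    """Decodes the latent volume into T RGB frames via 3D deconvolutions."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(channels, channels, (1, 4, 4),
                               stride=(1, 2, 2), padding=(0, 1, 1)), nn.ReLU(),
            nn.ConvTranspose3d(channels, 3, (1, 4, 4),
                               stride=(1, 2, 2), padding=(0, 1, 1)), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)              # -> (B, 3, T, H, W)

# Toy usage: random tensors stand in for real frames.
enc, gen, dec = ImageEncoder(), LatentGenerator(), VideoGenerator()
x_s, x_e = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
u = torch.randn(2, 128)                           # Gaussian noise vector
video = dec(gen(enc(x_s), enc(x_e), u))           # (2, 3, 16, 64, 64)
```

The image and video discriminators (omitted here) would score individual frames and whole clips respectively, providing the adversarial losses that drive training.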
This method was tried but was found to be unsuitable for our problem statement. We are now following a new approach based on Adaptive Convolutions.
The approach is well outlined in this Medium article: https://medium.com/analytics-vidhya/review-adaconv-video-frame-interpolation-via-adaptive-convolution-video-frame-interpolation-fbce6acaa2a5
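In short, AdaConv folds the usual estimate-motion-then-warp pipeline into one step: a CNN predicts a convolution kernel for every output pixel, and that kernel is applied to the co-located patches of the two input frames. Below is a minimal sketch of the kernel-application step only (the kernel-prediction CNN is omitted, and the function name, kernel size, and softmax normalization are my own assumptions):

```python
import torch
import torch.nn.functional as F

def apply_adaptive_kernels(frame1, frame2, kernels, k=5):
    """
    Synthesize an intermediate frame by applying a per-pixel kernel to
    co-located patches from both input frames (the adaptive-convolution idea).

    frame1, frame2: (B, 3, H, W) neighbouring frames
    kernels:        (B, k*k*2, H, W) per-pixel kernels from a prediction CNN
    """
    B, C, H, W = frame1.shape
    pad = k // 2
    # Extract a k x k patch around every pixel of each frame.
    patches1 = F.unfold(F.pad(frame1, [pad] * 4), k)   # (B, 3*k*k, H*W)
    patches2 = F.unfold(F.pad(frame2, [pad] * 4), k)
    patches1 = patches1.view(B, C, k * k, H, W)
    patches2 = patches2.view(B, C, k * k, H, W)

    k1, k2 = kernels.split(k * k, dim=1)               # (B, k*k, H, W) each
    k1, k2 = k1.unsqueeze(1), k2.unsqueeze(1)          # broadcast over RGB

    # Weighted sum over the patch dimension = per-pixel convolution.
    return (patches1 * k1).sum(dim=2) + (patches2 * k2).sum(dim=2)

# Toy usage with random tensors in place of real frames and kernels.
f1, f2 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
kern = torch.softmax(torch.rand(1, 5 * 5 * 2, 64, 64), dim=1)  # sums to 1
mid = apply_adaptive_kernels(f1, f2, kern)                     # (1, 3, 64, 64)
```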
Technologies Used
- PyTorch.
- Supporting libraries such as NumPy and Matplotlib.
- Intel DevCloud for model training.
Repository
https://github.com/Debapriya-Tula/AdaConv-Pytorch
Other links
- The code adapted to suit our problem and dataset: https://github.com/HyeongminLEE/pytorch-sepconv
- The detailed explanation of the paper "Video Frame Interpolation via Adaptive Convolution"