Towards Robust Human Activity Recognition from RGB Video Stream with Limited Labeled Data

We propose a novel framework that couples skeleton data extracted from RGB video with a deep Bidirectional Long Short-Term Memory (BLSTM) model for activity recognition. To cope with limited training data, we use data augmentation, dynamic frame dropout, and gradient injection.

Project status: Published/In Market

Robotics, Artificial Intelligence

Intel Technologies
Other

Overview / Usage

Human activity recognition from video streams has received considerable attention in recent years. Because it lacks depth information, RGB-video-based activity recognition has generally performed worse than RGB-D-based solutions. On the other hand, acquiring depth, inertial, and similar data is costly and requires special equipment, whereas RGB video streams are available from ordinary cameras. Our goal is therefore to investigate whether similar or even higher accuracy can be achieved with the RGB modality alone. To this end, we propose a novel framework that couples skeleton data extracted from RGB video with a deep Bidirectional Long Short-Term Memory (BLSTM) model for activity recognition. A major challenge in training such a deep network is limited training data, and restricting the input to the RGB stream significantly exacerbates the difficulty. We therefore propose a set of algorithmic techniques to train the model effectively, such as data augmentation, dynamic frame dropout, and gradient injection. Experiments demonstrate that our RGB-only solution surpasses, by a notable margin, state-of-the-art approaches that all exploit RGB-D video streams. This makes our solution widely deployable with ordinary cameras.
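The overview names dynamic frame dropout as one of the training techniques but does not spell out how it works. The sketch below shows one plausible reading, assuming that frames of a skeleton sequence are randomly removed each time a clip is sampled, so the network sees a slightly different temporal sampling on every pass. The function name `drop_frames` and the parameters `drop_prob` and `min_frames` are hypothetical and do not come from the project.

```python
import random


def drop_frames(sequence, drop_prob=0.2, min_frames=1, rng=None):
    """Randomly remove frames from a sequence while preserving frame order.

    Assumption: this is an illustrative interpretation of "dynamic frame
    dropout", not the project's actual implementation.
    """
    if rng is None:
        rng = random.Random()
    # Keep each frame independently with probability (1 - drop_prob).
    kept = [frame for frame in sequence if rng.random() >= drop_prob]
    if len(kept) < min_frames:
        # Too few frames survived: fall back to a random ordered subset.
        count = min(min_frames, len(sequence))
        indices = sorted(rng.sample(range(len(sequence)), count))
        kept = [sequence[i] for i in indices]
    return kept


# Usage: a clip of 30 frames, each represented here by its frame index.
clip = list(range(30))
shortened = drop_frames(clip, drop_prob=0.3, rng=random.Random(0))
```

Because the drop pattern is redrawn on every call, the same labeled clip yields many distinct training sequences, which is one way such a technique could help when labeled data is scarce.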

Methodology / Approach

We use only the RGB modality in our experimental evaluations, whereas state-of-the-art methods use multiple available modalities (depth, inertial, and skeleton data). This essentially reduces our training data to one fourth. Hence, we face one of the key challenges of deep learning: training with limited labeled data. To train the deep network effectively, we explore data augmentation and several algorithmic approaches. Experiments on two popular and challenging benchmarks validate the effectiveness of these techniques, and our RGB-only solution even surpasses state-of-the-art approaches that all exploit RGB-D videos. We believe the proposed RGB-only scheme is more cost-effective than RGB-D-based solutions while remaining highly competitive, and is therefore widely deployable.
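The text does not say which augmentations are applied to the skeleton data. As a rough illustration only, the sketch below assumes 2D joint coordinates per frame (the kind of output a pose estimator such as OpenPose produces) and applies one random rotation and one random scale to a whole clip, about the clip's centroid. The function name `augment_skeleton`, its parameters, and the choice of transforms are assumptions made for this example.

```python
import math
import random


def augment_skeleton(frames, max_angle_deg=10.0, scale_range=(0.9, 1.1), rng=None):
    """Rotate and scale a clip of 2D skeletons about the clip centroid.

    frames: list of frames, each a list of (x, y) joint coordinates.
    Assumption: the transforms here are illustrative, not the project's own.
    """
    if rng is None:
        rng = random.Random()
    # Draw one rotation and one scale for the whole clip so motion stays coherent.
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(scale_range[0], scale_range[1])
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    # Centroid over every joint in every frame.
    points = [joint for frame in frames for joint in frame]
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    augmented = []
    for frame in frames:
        new_frame = []
        for x, y in frame:
            dx, dy = x - cx, y - cy
            new_x = cx + scale * (cos_a * dx - sin_a * dy)
            new_y = cy + scale * (sin_a * dx + cos_a * dy)
            new_frame.append((new_x, new_y))
        augmented.append(new_frame)
    return augmented


# Usage: a toy clip with two frames and two joints per frame.
clip = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
variant = augment_skeleton(clip, rng=random.Random(1))
```

Applying the same transform to every frame in a clip keeps the relative motion between frames intact, so the activity label remains valid for the augmented copy.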

Technologies Used

Keras, TensorFlow, Caffe, Python, OpenPose
