Deep learning in fields such as computer vision and natural language processing has moved toward increasingly large neural networks, both in depth and in parameter count. This trend creates two major challenges for deep learning researchers:

  1. Training these networks takes a long time, even on GPUs.
  2. The memory footprint of these networks is so large that they cannot fit in the DRAM of a typical GPU.

This research project aims to explore and develop algorithms for parallel deep learning. We are working to improve both the time and the memory efficiency of training large neural networks in a distributed setting, and we seek to scale beyond the current state of the art to train even larger architectures. The goal is a robust, user-friendly deep learning framework that makes it easy for end users to train large neural networks in distributed environments.
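To make the distributed-training idea concrete, here is a minimal sketch (not taken from the project itself) of data parallelism, the simplest such scheme: each worker computes a gradient on its own shard of the batch, and averaging the shard gradients (an all-reduce in a real system) recovers the full-batch gradient, keeping all workers in sync after each update. The linear model and shard counts below are illustrative assumptions.

```python
import numpy as np

def local_gradient(w, X, y):
    """Mean-squared-error gradient for a linear model on one data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # one global batch of 8 samples
y = rng.normal(size=8)
w = np.zeros(3)               # current model parameters

# Split the batch evenly across 4 simulated workers.
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]

# The "all-reduce" step: average the per-worker gradients.
avg_grad = np.mean(grads, axis=0)

# Matches the gradient a single machine would compute on the whole batch
# (exactly, because the shards are equal-sized).
full_grad = local_gradient(w, X, y)
assert np.allclose(avg_grad, full_grad)
```

Data parallelism speeds up training but replicates the full model on every worker, so it does not by itself address the memory limit above; schemes that partition the model or its optimizer state across workers target that second problem.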