Pythia: Facebook’s deep learning framework for the vision and language domain


Can you teach AI how to read? Facebook AI Research has yet another open source deep learning offering. Built upon PyTorch, Pythia is a modular framework for deep learning.

It was designed to help with Visual Question Answering (VQA): the model “reads” an image and answers questions based on the visual data available. This research can be used, for instance, to automate image captioning or to answer questions about text that appears in a photograph.
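To make the VQA setup concrete, here is a minimal, hypothetical sketch in PyTorch of what such a model's forward pass looks like: an image-feature branch and a question branch are fused, and the fused representation scores candidate answers. All class and variable names are illustrative, not Pythia's actual API; real models (including Pythia's) use CNN-extracted features and attention rather than this toy fusion.

```python
import torch
import torch.nn as nn

# Hypothetical, simplified VQA model; names are illustrative only.
class ToyVQAModel(nn.Module):
    def __init__(self, img_dim=2048, q_vocab=1000, q_dim=128, n_answers=10):
        super().__init__()
        self.q_embed = nn.EmbeddingBag(q_vocab, q_dim)  # bag-of-words question encoder
        self.img_proj = nn.Linear(img_dim, q_dim)       # project image features
        self.classifier = nn.Linear(q_dim, n_answers)   # score candidate answers

    def forward(self, img_feats, question_ids):
        q = self.q_embed(question_ids)                  # (batch, q_dim)
        v = torch.relu(self.img_proj(img_feats))        # (batch, q_dim)
        return self.classifier(q * v)                   # fuse by elementwise product

model = ToyVQAModel()
img = torch.randn(2, 2048)             # stand-in for pooled image features
q = torch.randint(0, 1000, (2, 6))     # stand-in for tokenized questions
logits = model(img, q)
print(logits.shape)                    # torch.Size([2, 10])
```

The answer with the highest logit would be the model's reply; production systems replace each stand-in piece (feature extractor, question encoder, fusion) with far stronger components.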

Reading with deep learning

From the Facebook AI Research announcement regarding open sourcing Pythia:

Features include reference implementations to show how previous state-of-the-art models achieved related benchmark results and to quickly gauge the performance of new models. In addition to multitasking, Pythia also supports distributed training and a variety of datasets, as well as custom losses, metrics, scheduling, and optimizers… Pythia smooths the process of entering the growing subfield of vision and language and frees researchers to focus on faster prototyping and experimentation. Our goal is to accelerate progress by increasing the reproducibility of these models and results. This will make it easier for the community to build on, and benchmark against, successful systems.

Interested in the research on this topic of Visual Question Answering?

SEE ALSO: Deep learning anomalies with TensorFlow and Apache Spark

Read the relevant paper, Towards VQA Models That Can Read, by Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. You can also explore their research to see Pythia-powered examples of how deep learning “reads” text.

Pythia features

From the documentation, Pythia’s features include:

  • Model Zoo: Reference implementations for VQA using LoRRA, the Pythia model, and BAN
  • Distributed: Supports DataParallel and DistributedDataParallel
  • Multi-tasking: Save time and train on multiple datasets simultaneously
  • Customizable: Custom losses, metrics, scheduling, optimizers, and TensorBoard support
  • Unopinionated: Unopinionated dataset and model implementations
  • Modules: Implementations of commonly used layers in the vision and language domain
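DataParallel and DistributedDataParallel are standard PyTorch mechanisms that Pythia builds on. As a hedged sketch of the underlying pattern (Pythia drives this through its own trainer and configs, not this code), the example below wraps a stand-in model in DistributedDataParallel using a single-process “gloo” group so it runs on CPU without a cluster:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group for demonstration; a real job launches one
# process per GPU and sets rank/world_size accordingly.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(10, 2)          # stand-in for a VQA model
ddp_model = DDP(model)            # gradients are all-reduced across ranks

out = ddp_model(torch.randn(4, 10))
print(out.shape)                  # torch.Size([4, 2])
dist.destroy_process_group()
```

With more than one process, DDP averages gradients across ranks during the backward pass, which is what makes multi-GPU training behave like large-batch single-GPU training.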

SEE ALSO: Uber’s Ludwig makes deep learning more understandable for amateurs and faster for experts

Pythia can also serve as a starter codebase to bootstrap a VQA project.


Deep learning meets fashionable cats. Source.

Vision & language

Check out the full documentation for a quickstart guide and the available libraries.

A demo of the Pythia model is also on Colab. (You will need to install the necessary data before heading into the playground.)

Get the repo on GitHub and begin your deep learning journey by following the installation guide. The repo also includes pre-trained models that you can build upon in your next deep learning project.

The post Pythia: Facebook’s deep learning framework for the vision and language domain appeared first on JAXenter.
