Arvados is a platform for storing, organizing, processing, and sharing genomic and other big data.

Run anywhere: Arvados runs in the cloud on AWS, Azure, and GCP, as well as on premises.
Large scale: a single Arvados instance can store petabytes of data and use thousands of cores of compute simultaneously.
Everything is an API: Arvados is designed to be integrated with existing infrastructure.

Try it

Playground

Veritas Genetics maintains a public installation of Arvados for evaluation and trial use, the Arvados Playground. Any Google account can be used to log in.

Installation options

Arvados can be installed in a number of ways, as documented on the Installation options page in the Arvados documentation.

Source code

The Arvados source code is available on GitHub.

Components

Keep

Keep is a content-addressable storage system for managing and storing large collections of files, with durable, cryptographically verifiable references and high-throughput processing. Keep works on a wide range of underlying file systems.
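
Content addressing means a block of data is named by a hash of its bytes, so the name doubles as a checksum and identical data always gets the same name. The minimal sketch below illustrates the idea. It assumes Keep's block locator format of an MD5 hex digest followed by `+` and the block size in bytes, with any further `+` hints ignored. The names `keep_locator`, `verify_block`, and `ToyKeepStore` are hypothetical and are not part of the Arvados SDK, and a Python dict stands in for real storage.

```python
import hashlib


def keep_locator(data: bytes) -> str:
    """Build a locator (assumed format): MD5 hex digest, '+', byte length."""
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def verify_block(locator: str, data: bytes) -> bool:
    """Check that data matches the digest and size encoded in the locator.

    Any extra '+'-separated hints after the size are ignored in this sketch.
    """
    digest, size, *_hints = locator.split("+")
    return hashlib.md5(data).hexdigest() == digest and len(data) == int(size)


class ToyKeepStore:
    """Hypothetical in-memory content-addressable store (illustration only)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        locator = keep_locator(data)
        # Identical data yields an identical locator, so it is stored only once.
        self._blocks[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        # Look up by digest+size only, dropping any trailing hints.
        key = "+".join(locator.split("+")[:2])
        data = self._blocks[key]
        # Re-verify on read: a corrupted block would no longer match its name.
        if not verify_block(locator, data):
            raise ValueError(f"block does not match locator {locator}")
        return data


store = ToyKeepStore()
loc = store.put(b"foo")
print(loc)             # acbd18db4cc2f85cedef654fccc4a4d8+3
print(store.get(loc))  # b'foo'
```

Because the reference is derived from the content, any reader holding a locator can check the bytes it receives without trusting the storage layer.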

Crunch

Crunch is a container orchestration engine for running complex, multi-part workflows in a way that is flexible and scalable and that supports versioning, reproducibility, and provenance. In a cloud environment, Crunch scales compute resources dynamically.

Standards Efforts

The Arvados community is collaborating closely with several standards efforts.

Common Workflow Language

The goal of the CWL project is to create specifications that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.

Global Alliance

The Global Alliance for Genomics and Health (GA4GH) is a global standards body defining data formats and APIs for precision medicine.