Arvados is a modern open-source platform for organizing, managing, and processing terabytes to petabytes of data. It lets you track your methods and datasets, share them securely, and re-run analyses easily.

Key Components

The platform’s key components are a content-addressable storage system (Keep) and a containerized workflow engine (Crunch).

Keep

Keep is the Arvados storage system for managing and storing large collections of files. Keep combines content addressing with a distributed storage architecture, which gives it both high reliability and high throughput. Because data is addressed by its content, every file stored in Keep can be verified each time it is retrieved. Keep supports collections as a flexible way to define data sets without having to reorganize or needlessly copy data. Keep works on a wide range of underlying filesystems and object stores.
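
The idea behind content addressing can be sketched in a few lines of Python. This is a toy in-memory model, not Keep's implementation: the class name `ContentAddressedStore`, the use of SHA-256, and the dictionary-based "collections" are all assumptions made for illustration. It shows why identical data is stored only once, why every read can be verified, and why a collection can reference existing blocks without copying them.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical toy block store keyed by the hash of each block's content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # data always maps to the same address and is stored only once.
        # SHA-256 is an illustrative choice, not a statement about Keep.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on every read: corruption shows up as an address mismatch.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"block {address} failed verification")
        return data


store = ContentAddressedStore()
block_a = store.put(b"ACGT" * 4)
block_b = store.put(b"TTGA" * 4)

# In this sketch a "collection" is just a mapping from file names to block
# addresses, so two collections can share a block without copying its data.
collection_one = {"sample1.txt": [block_a], "sample2.txt": [block_b]}
collection_two = {"subset/sample1.txt": [block_a]}
```

Storing the same bytes a second time returns the existing address rather than creating a new block, and `collection_two` reuses `block_a` without duplicating it.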

Crunch

Crunch is the orchestration system for running Common Workflow Language (CWL) workflows. It is designed to maintain data provenance and workflow reproducibility. Crunch automatically tracks data inputs and outputs through Keep and executes workflow processes in Docker containers. In a cloud environment, Crunch optimizes costs by scaling compute on demand.

Working Environment

You can interact with Arvados through the Workbench web application, the command line, or the REST API and SDKs.

Workbench

The Workbench web application allows users to interactively access Arvados functionality. It is especially helpful for querying and browsing data, visualizing provenance, and tracking the progress of workflows.

Command Line

The command line interface (CLI) provides convenient access to Arvados functionality from a terminal or from scripts.

API and SDKs

Arvados is designed to be integrated with existing infrastructure. All the services in Arvados are accessed through a RESTful API. SDKs are available for Python, Go, R, Perl, Ruby, and Java.
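
As a rough illustration of what "accessed through a RESTful API" means in practice, the sketch below builds (but does not send) an HTTP request for listing collections, using only the Python standard library. The helper name `build_list_request`, the `/arvados/v1/collections` path, the `Bearer` authorization header, and the host and token values are assumptions for this example; check them against the API reference for your installation. In real code, the official Python SDK would normally handle these details.

```python
import urllib.parse
import urllib.request


def build_list_request(api_host: str, api_token: str, limit: int = 10):
    """Hypothetical helper: prepare a GET request that lists collections."""
    query = urllib.parse.urlencode({"limit": limit})
    # Assumed URL layout; verify against your cluster's API documentation.
    url = f"https://{api_host}/arvados/v1/collections?{query}"
    request = urllib.request.Request(url, method="GET")
    # Assumed authentication scheme: an API token sent as a bearer token.
    request.add_header("Authorization", f"Bearer {api_token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder values; a real client would read these from its configuration.
req = build_list_request("arvados.example.com", "example-token", limit=5)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would require a reachable cluster and a valid token, so the example stops at constructing it.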