The Arvados architecture provides a modern open source platform for organizing, managing and processing terabytes to petabytes of data. It allows you to track your methods and datasets, share them securely, and easily re-run analyses.
The platform’s key components are a content-addressable storage system and a containerized workflow engine.
Keep is the Arvados storage system for managing and storing large collections of files. Keep combines content addressing with a distributed storage architecture, which gives it both high reliability and high throughput. Because stored data is addressed by its content, every file stored in Keep can be verified each time it is retrieved. Keep supports collections as a flexible way to define data sets without having to reorganize or needlessly copy data. Keep works on a wide range of underlying filesystems and object stores.
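The ideas of content addressing and collections can be illustrated with a small, self-contained sketch. This is not Keep's implementation or API: the `ToyBlockStore` class, the use of SHA-256 as the address, and the dictionary-based collections are all assumptions made for illustration only.

```python
import hashlib


class ToyBlockStore:
    """In-memory content-addressed block store (illustrative, not Keep itself)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical data
        # always maps to the same locator and is stored only once.
        locator = hashlib.sha256(data).hexdigest()
        self._blocks[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        data = self._blocks[locator]
        # Re-hash on every read: a mismatch means the block was corrupted.
        if hashlib.sha256(data).hexdigest() != locator:
            raise ValueError("block failed verification: " + locator)
        return data


store = ToyBlockStore()
loc_a = store.put(b"sample reads, batch A")
loc_b = store.put(b"sample reads, batch B")

# In this sketch a collection is just a mapping from file names to block
# locators, so a new data set can be defined without copying any data.
collection_all = {"a.txt": [loc_a], "b.txt": [loc_b]}
collection_subset = {"renamed_a.txt": [loc_a]}
```

Both collections refer to the same stored block for batch A, which is why defining a new data set in this model costs only a few bytes of metadata.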
Crunch is the orchestration system for running CWL workflows. It is designed to maintain data provenance and workflow reproducibility. Crunch automatically tracks data inputs and outputs through Keep and executes workflow processes in Docker containers. In a cloud environment, Crunch optimizes costs by scaling compute on demand.
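As a rough illustration of what tracking inputs and outputs makes possible, the sketch below records each workflow step with the data it consumed and produced, then walks those records to recover how a result was derived. The `RunRecord` type, the `lineage` function, and the image and data names are hypothetical; they do not reflect Crunch's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One executed step: the container image used and the data it touched."""
    step: str
    image: str
    inputs: tuple
    outputs: tuple


def lineage(target, records):
    """Return the steps that contributed to `target`, upstream steps first."""
    produced_by = {out: r for r in records for out in r.outputs}
    ordered = []
    seen = set()

    def visit(item):
        record = produced_by.get(item)
        if record is None or record.step in seen:
            return  # raw input, or a step that was already visited
        seen.add(record.step)
        for parent in record.inputs:
            visit(parent)
        ordered.append(record.step)

    visit(target)
    return ordered


# Hypothetical two-step workflow.
records = [
    RunRecord("align", "example/aligner:1.0", ("raw_reads",), ("aligned",)),
    RunRecord("call", "example/caller:2.1", ("aligned", "reference"), ("variants",)),
]

print(lineage("variants", records))
```

Because each record also names the container image, the same information that explains a result is, in principle, enough to re-run the steps that produced it.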
You can interact with Arvados through the Workbench web application, the command line, or the REST API and SDKs.
The Workbench web application allows users to interactively access Arvados functionality. It is especially helpful for querying and browsing data, visualizing provenance, and tracking the progress of workflows.
The command line interface (CLI) provides convenient access to Arvados functionality from a terminal.