Technology
The Arvados architecture provides a modern open source platform for organizing, managing and processing terabytes to petabytes of data. It allows you to track your methods and datasets, share them securely, and easily re-run analyses.
Key Components
The platform’s key components are a content-addressable storage system (Keep) and a containerized workflow engine (Crunch).
Keep
Keep is the Arvados storage system for managing and storing large collections of files. Keep combines content addressing with a distributed storage architecture, which gives it both high reliability and high throughput. Because data is addressed by its content, every file stored in Keep can be verified each time it is retrieved. Keep supports collections as a flexible way to define data sets without having to reorganize or needlessly copy data. Keep works on a wide range of underlying filesystems and object stores.
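The idea behind content addressing can be illustrated with a minimal sketch. This is not the Keep API: the ContentStore class, the use of SHA-256, and the dictionary-based "collections" are all assumptions made for illustration. The sketch shows how an address derived from the data allows verification on every read, and how two collections can refer to the same stored block without copying it.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed block store (hypothetical, not the Keep API)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # data always maps to the same address and is stored only once.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on every read so corruption is detected before use.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"block {address} failed verification")
        return data


store = ContentStore()
addr_a = store.put(b"ACGT" * 4)
addr_b = store.put(b"TTGA" * 4)

# In this sketch a "collection" is only a mapping from file names to block
# addresses, so defining a new data set does not copy any data.
collection_one = {"sample1.txt": [addr_a], "sample2.txt": [addr_b]}
collection_two = {"subset/sample1.txt": [addr_a]}
```

Both collections point at the same block for sample1.txt, so the second data set costs only the size of its mapping.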
Crunch
Crunch is the orchestration system for running Common Workflow Language (CWL) workflows. It is designed to maintain data provenance and workflow reproducibility. Crunch automatically tracks data inputs and outputs through Keep and executes workflow processes in Docker containers. In a cloud environment, Crunch optimizes costs by scaling compute on demand.
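One way to picture provenance tracking is as a record of everything that determines a step's result: the container image, the command, and the content addresses of the inputs. The sketch below is a hypothetical illustration under that assumption and does not describe how Crunch is implemented; the function names, the image name, and the address strings are all made up.

```python
import hashlib
import json


def step_fingerprint(image: str, command: list, inputs: dict) -> str:
    """Hash everything that determines a step's result (hypothetical scheme)."""
    record = {"image": image, "command": command, "inputs": inputs}
    # sort_keys gives a stable serialization, so equal records hash equally.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


provenance_log = {}


def record_run(image: str, command: list, inputs: dict, output_address: str) -> str:
    """Store which image, command and inputs produced a given output."""
    key = step_fingerprint(image, command, inputs)
    provenance_log[key] = {
        "image": image,
        "command": command,
        "inputs": inputs,
        "output": output_address,
    }
    return key


first = record_run(
    "example/aligner:1.0",
    ["align", "--ref", "ref.fa", "reads.fq"],
    {"ref.fa": "addr-ref", "reads.fq": "addr-reads"},
    "addr-output",
)

# The same image, command and inputs yield the same fingerprint,
# regardless of the order in which the inputs are listed.
same = step_fingerprint(
    "example/aligner:1.0",
    ["align", "--ref", "ref.fa", "reads.fq"],
    {"reads.fq": "addr-reads", "ref.fa": "addr-ref"},
)
```

With a record like this, an output can be traced back to the exact inputs and container that produced it, and a repeated request can be recognized as identical to an earlier run.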
Security
Arvados is a multi-user system with features that help you comply with data protection regulations covering authentication, access and audit controls, data integrity, and transmission security. All endpoints are secured by access tokens, and data can be encrypted at rest and in transit. Arvados integrates with a variety of external authentication systems, including Active Directory, Google accounts, LDAP, and OpenID Connect.
Working Environment
You can interact with Arvados functionality using the Workbench web application, the command line, or via the REST API and SDKs.
Workbench
The Workbench web application allows users to interactively access Arvados functionality. It is especially helpful for querying and browsing data, visualizing provenance, and tracking the progress of workflows.
Command Line
The command line interface (CLI) provides convenient access to Arvados functionality from a terminal.
API and SDKs
Arvados is designed to be integrated with existing infrastructure. All the services in Arvados are accessed through a RESTful API. SDKs are available for Python, Go, R, Perl, Ruby, and Java.
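As a rough sketch of what a token-authenticated REST call might look like, the example below builds (but does not send) a request using only the Python standard library. The /arvados/v1/ path prefix, the ARVADOS_API_HOST and ARVADOS_API_TOKEN environment variable names, the Bearer authorization scheme, and the placeholder host are assumptions to check against the Arvados API documentation; build_list_request is a hypothetical helper, not part of any SDK.

```python
import os
import urllib.request


def build_list_request(resource, host=None, token=None):
    """Build a GET request that lists a resource type (hypothetical helper)."""
    # Assumed environment variable names; placeholders are used when unset.
    host = host or os.environ.get("ARVADOS_API_HOST", "arvados.example.org")
    token = token or os.environ.get("ARVADOS_API_TOKEN", "placeholder-token")
    # Assumed URL layout: https://<host>/arvados/v1/<resource>
    url = f"https://{host}/arvados/v1/{resource}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_list_request("collections")
print(request.get_method(), request.full_url)
```

Sending the request would be a call to urllib.request.urlopen(request) against a real cluster with a valid token. In practice, one of the SDKs would normally handle the token and URL construction.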