Arvados 3.1.0 Release Notes

March 20, 2025

The Arvados team is pleased to announce Arvados 3.1.0. This release introduces support for AMD ROCm workflows, continued Workbench optimization, and richer data management in Arvados command-line tools, as well as bug fixes throughout the platform. We recommend that existing installations of 3.0.0 or earlier upgrade to 3.1.0. See Upgrading Arvados for instructions.

Arvados API

Arvados now supports containers that rely on AMD ROCm GPU support. It works much like our existing support for NVIDIA CUDA: container requests can declare a dependency on AMD ROCm, along with their hardware requirements, and Crunch will dispatch the container accordingly.

As part of this, API attributes and configuration settings that previously referred to “CUDA” now refer to “GPU.” The API server still accepts container requests that reference the old CUDA attributes and will translate as needed. Any clients that read these fields will need to be updated. Refer to the upgrade notes for details. #21926, #22563, #22568, #22612
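As a rough illustration of the CUDA-to-GPU translation, the hypothetical helper below rewrites a legacy `cuda` runtime constraint as a `gpu` constraint. The field names (`device_count`, `driver_version`, `hardware_capability` on the old side; `stack`, `hardware_target` on the new side) are our assumptions about the schema, and this is not the API server's code. Check the upgrade notes and API reference before relying on any of these names.

```python
def translate_cuda_constraints(runtime_constraints: dict) -> dict:
    """Return a copy of runtime_constraints with a legacy "cuda" entry
    rewritten as a "gpu" entry. Hypothetical helper; field names assumed."""
    result = dict(runtime_constraints)  # shallow copy; caller's dict is untouched
    cuda = result.pop("cuda", None)
    if not cuda or cuda.get("device_count", 0) == 0:
        # No CUDA devices requested, so there is nothing to translate.
        return result
    if "gpu" in result:
        raise ValueError("request sets both 'cuda' and 'gpu' constraints")
    capability = cuda.get("hardware_capability")
    result["gpu"] = {
        "stack": "cuda",
        "device_count": cuda["device_count"],
        "driver_version": cuda.get("driver_version", ""),
        # Assumed: one legacy capability becomes a one-element target list.
        "hardware_target": [capability] if capability else [],
    }
    return result
```

A ROCm request would presumably set `"stack": "rocm"` directly in the `gpu` constraint; only CUDA has a legacy form to translate.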

Collection create and update methods accept a new replace_segments parameter. This lets clients more efficiently repack the blocks in a collection. In future releases, Keep components will use this to optimize collections as they’re built. Refer to the replace_segments reference for details. #22319
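To show what a repack involves, the sketch below concatenates several small blocks into one new block and builds a mapping from each old segment to its range in the new block. We assume replace_segments maps strings of the form "locator offset length" for existing segments to the same form for replacements; confirm the exact format in the replace_segments reference. The function name is hypothetical, and uploading the new block and calling the API are not shown.

```python
import hashlib


def plan_repack(blocks: dict[str, bytes]) -> tuple[bytes, str, dict[str, str]]:
    """Pack small blocks (keyed by their Keep locators) into one new block.

    Returns the packed data, its locator, and a replace_segments-style
    mapping. Hypothetical helper; the mapping format is an assumption.
    """
    packed = b"".join(blocks.values())
    # Keep locators have the form <md5 hex>+<size in bytes>.
    new_locator = f"{hashlib.md5(packed).hexdigest()}+{len(packed)}"
    mapping = {}
    offset = 0
    for old_locator, data in blocks.items():
        # Each old segment covers its whole block; point it at the matching
        # byte range inside the packed block.
        mapping[f"{old_locator} 0 {len(data)}"] = f"{new_locator} {offset} {len(data)}"
        offset += len(data)
    return packed, new_locator, mapping
```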

The Arvados controller can forward requests to a specific port on a running container. This works like the existing container logs and shell functionality. Future releases will use this functionality to let users interact with services on long-running containers. #17209

Workbench

Several Workbench internals have been reworked to improve responsiveness and rendering speed throughout the entire application, with special focus on the most common components like table listings and navigation trees. #22127, #22159

Status updates for running processes are more reliable. #22116

The left-hand navigation panel clips its contents instead of becoming scrollable when listing names are too long to fit in the space. #22566, #22624

Users can resize the right-hand details panel. Workbench remembers the user’s preferred size for both it and the left-hand navigation panel. #22336

Arvados Workbench screenshot showing a narrowed left-hand navigation panel with clipped content and an expanded right-hand details panel

Autocomplete dropdowns throughout Workbench can now be scrolled and size themselves to stay within the browser window. This makes it easier to make selections even when the listing is too long to display. #22358

Context menus and action toolbars have been made more consistent throughout Workbench. #22051, #22593

My Favorites no longer lists items in the trash. #22000

Fixed a bug where toolbar actions might work on an object other than the one being displayed. #22408

Fixed a bug that could cause the left-hand navigation to disappear when expanding My Favorites. #22473

Fixed a bug that could cause the actions toolbar to be cut off when opening the right-hand details panel or resizing the browser window. #22359

Command-line Tools

arv-copy supports a --replication option to set the desired replication level of copied collections. GitHub PR #247, #22008

The --storage-classes and --intermediate-storage-classes options of various tools use the cluster’s configured default storage classes rather than assuming a class named default. GitHub PR #249, #22009
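A minimal sketch of the new default lookup, assuming the cluster configuration's StorageClasses section maps class names to settings that include a boolean Default key. The function name is hypothetical and this is not the tools’ actual code. The fallback to "default" only covers a configuration with no StorageClasses section at all, which we assume behaves like the stock configuration.

```python
def default_storage_classes(cluster_config: dict) -> list[str]:
    """Return the storage classes marked Default in the cluster config.

    Hypothetical helper; assumes StorageClasses: {name: {"Default": bool, ...}}.
    """
    classes = cluster_config.get("StorageClasses") or {}
    defaults = sorted(name for name, spec in classes.items() if spec.get("Default"))
    # Assumption: an unconfigured cluster behaves like the stock config,
    # whose only class is named "default".
    return defaults or ["default"]
```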

arv-copy looks for a cluster’s credentials in settings.conf if it does not find them in the cluster-specific configuration file. #22602
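The lookup order can be sketched as follows, assuming the cluster-specific file is named after the cluster ID (for example ~/.config/arvados/zzzzz.conf) and lives in the same directory as settings.conf. The function is hypothetical. It also does not check that settings.conf actually points at the requested cluster, which the real tool may do.

```python
from pathlib import Path
from typing import Optional


def find_credentials_file(cluster_id: str, config_dir: Path) -> Optional[Path]:
    """Return the first existing credentials file for cluster_id, or None.

    Hypothetical sketch of the lookup order: the cluster-specific file
    first, then settings.conf as a fallback.
    """
    candidates = [config_dir / f"{cluster_id}.conf", config_dir / "settings.conf"]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

In practice config_dir would usually be Path.home() / ".config" / "arvados".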

arvados-cwl-runner redacts credentials from Git remote URL workflow metadata. #22660
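For readers unfamiliar with the problem, a remote such as https://user:token@git.example.com/repo.git embeds a secret in the URL itself. The sketch below shows one way to strip the user-info part using only the standard library. It illustrates the idea and is not arvados-cwl-runner’s implementation; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_git_url(url: str) -> str:
    """Remove any username/password from a URL-style Git remote.

    SCP-style remotes like git@host:org/repo.git are returned unchanged,
    since urlsplit does not treat them as having user info.
    """
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url
    host = parts.hostname or ""
    if ":" in host:
        # Restore the brackets that .hostname strips from IPv6 literals.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```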

Fixed several bugs that could cause arv-mount to serve stale file data after a collection update on the server. A new option --refresh-time lets you configure some of this behavior. #22420

Updated Ruby tools’ version dependency on the arvados gem. #22364

Packaging and Deployment

The arvados-api-server package includes a systemd service definition to run the server using the bundled Passenger. This means you no longer need to install the third-party Passenger package or configure nginx to serve it. Administrators should refer to the upgrade notes for details about how to migrate their installations. #22349, #22396, #22614

If the arvados-api-server package’s post-installation script cannot complete configuration, it exits with a failure status to signal the problem to orchestration tools like Salt and Ansible. #22433

The compute node image builder script has been replaced. Instead of configuring the build with command-line switches, you run Packer with your cluster configuration and a second YAML configuration file for Ansible. Refer to the compute image build documentation for details. #22217, #22317

The example parameters for a single-node Salt install explicitly list roles for clarity. #22298

The arvados-api-server package post-installation script no longer fails on errors from the gem install step that it runs before bundle install. That step can run into gem conflicts as a server accumulates gems over time. Bundler should be able to work around these conflicts, and will cause the script to fail with an error if it can’t. #22647

Crunch

Crunch supports a new configuration option Containers.CloudVMs.DeployRunnerDirectory to specify where the crunch-run binary should be stored on compute nodes. This can be used to dynamically deploy crunch-run on cloud nodes where /tmp is mounted with the noexec option. #22029

crunch-run retries failed API and Keep operations for longer to try to preserve container results. #22455

Fixed a bug where arvados-cwl-runner could set an incorrect output_glob for a container request when a workflow step’s secondary files were generated from an expression. #22466

Fixed a bug where arvados-cwl-runner would crash if a container update was successfully processed but it did not receive a valid response from the API server. #22160

Fixed a bug that could cause arvados-dispatch-cloud to hang at startup while fetching spot instance prices. #22400

crunch-run has new logic to store and load cached Singularity images to prevent crashes if other processes are updating the cache collection at the same time. #20605

Reworded some misleading log messages in crunch-run.txt when converting Docker images to run with Singularity. #20605

Reworded the log message Crunch writes when it encounters an error while checking for spot instance interruptions, to clarify that the error does not directly affect the running process. #22434

Removed the “tunnel connection started/finished” log messages, which were repeated frequently and were of little help. #22431

crunch-dispatch-local does basic resource accounting. It’s still not suitable for production deployments, but this lays some groundwork to make it useful for single-node installs. #22314, #21926

Servers

The API.MaxIndexDatabaseRead setting is consistently applied to all API list requests, particularly for logs. #22232
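As we understand it, API.MaxIndexDatabaseRead caps how many bytes a list request reads from the database, so a response can contain fewer items than the client asked for. The sketch below simulates that with a hypothetical `limited_page` function (assumed to always return at least one item) and shows a client loop that advances by the number of items actually returned. Paging by offset is a simplification; the Arvados SDK helpers may page differently.

```python
import json


def limited_page(records, offset, limit, max_read_bytes):
    """Simulated server: stop adding records once the bytes "read" would
    exceed max_read_bytes, but always return at least one record."""
    page, read = [], 0
    for rec in records[offset:offset + limit]:
        size = len(json.dumps(rec))
        if page and read + size > max_read_bytes:
            break
        page.append(rec)
        read += size
    return {"items": page, "items_available": len(records)}


def list_all(records, max_read_bytes, limit=100):
    """Client loop: keep requesting pages until every record is fetched."""
    out = []
    while True:
        resp = limited_page(records, len(out), limit, max_read_bytes)
        out.extend(resp["items"])
        if not resp["items"] or len(out) >= resp["items_available"]:
            return out
```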

All servers in a federation report the correct expires_at time for remote API client authorizations. #22228

The API server returns a 500 Internal Server Error when it encounters various database errors, including deadlocks, to let clients know they can retry the request. #21547, #22476
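Because these database errors are reported with a status that signals a retry is reasonable, a client can treat them as transient. The sketch below retries with exponential backoff and jitter. The set of retryable status codes, the attempt limit, and the `request` callable returning `(status, body)` are all assumptions for illustration, not the behavior of any Arvados SDK.

```python
import random
import time

# Assumption: which status codes a client chooses to treat as transient.
RETRYABLE_STATUSES = {500, 502, 503, 504}


def call_with_retries(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it returns a non-retryable status or attempts run out.

    request is a hypothetical callable returning (status, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with jitter spreads out retries from many clients.
        delay = base_delay * (2 ** attempt)
        sleep(delay * random.uniform(0.5, 1.0))
```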

API error messages will no longer include development suggestions when the server is running Ruby 3.1 or later. #22407

The keepstore index API no longer applies the configured API.RequestTimeout, since index requests are expected to take a long time by design. #22411

The API server indexes container requests by name+owner to improve performance for this common query. #22327
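To show the kind of lookup a composite name+owner index serves, the sketch below builds a toy table in SQLite (standing in for the PostgreSQL database the API server actually uses) and asks the query planner how it would run an equality lookup on both columns. The table layout, index name, and column order are assumptions for illustration, not the real migration.

```python
import sqlite3


def query_plan_detail() -> str:
    """Return SQLite's plan description for a name+owner lookup."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE container_requests ("
            " uuid TEXT PRIMARY KEY, owner_uuid TEXT, name TEXT, state TEXT)"
        )
        # Hypothetical composite index covering the common lookup.
        conn.execute(
            "CREATE INDEX idx_cr_owner_name ON container_requests (owner_uuid, name)"
        )
        rows = conn.execute(
            "EXPLAIN QUERY PLAN SELECT uuid, state FROM container_requests"
            " WHERE owner_uuid = ? AND name = ?",
            ("zzzzz-j7d0g-000000000000000", "my workflow"),
        ).fetchall()
        # The last column of each plan row is a human-readable description.
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()
```

The plan should report a search using idx_cr_owner_name rather than a full table scan; in PostgreSQL, EXPLAIN shows the equivalent as an index scan.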

The API server disables statement timeouts when running database migrations. This will prevent timeouts for any long-running migrations added in future versions. #22435

The default setting for API.MaxConcurrentRailsRequests has been increased from 8 to 16 to avoid deadlocking on some common client access patterns. #22414

Removed a confusing warning log when API requests included unsupported parameters. #21743

Removed an unused index that was accidentally added in 3.0.0. #22467

Security Improvements

The Arvados Rails API server uses Rails 7.1 and Passenger 6.0.26 to address CVE-2025-26803. #22363, #22608

The Arvados Rails API server uses Rack 3.1.12 to address CVE-2025-27610 and CVE-2025-27111. #22657

Arvados is built using Go 1.23 to address various security issues in older versions. #22422

Development Changes

The interactive test runner detects whether a graphical interface is available, and does not run Cypress in interactive mode if it isn’t. #22316

Fixed several inconsistencies in the list of tests presented by the test runner. #22428, #22506

The Arvados source now includes an Ansible playbook to install and configure all the software necessary to run the Arvados test suite. We are using this playbook in CI and expect it will replace arvados-server install in a future release. The Hacking prerequisites wiki describes how to use it. #22318, #22437, #22489

arvados-cwl-runner provides a plugin for cwltest to read keep: locations. This makes it possible to run arvados-cwl-runner against the latest versions of the CWL conformance suite. #22058
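For context, a keep: location names a file inside a collection, typically as keep:<portable data hash>/<path>. The hypothetical parser below splits such a location into its parts. It only handles the portable-data-hash form and is not the plugin’s own code.

```python
import re
from typing import NamedTuple

# Portable data hash: 32 hex digits, "+", then the manifest size in bytes.
_KEEP_RE = re.compile(r"keep:([0-9a-f]{32}\+[0-9]+)(?:/(.*))?")


class KeepRef(NamedTuple):
    portable_data_hash: str
    path: str  # empty string when the location names the whole collection


def parse_keep_location(location: str) -> KeepRef:
    """Split a keep: location into its portable data hash and file path."""
    m = _KEEP_RE.fullmatch(location)
    if m is None:
        raise ValueError(f"not a keep: location with a portable data hash: {location!r}")
    return KeepRef(m.group(1), m.group(2) or "")
```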

arvados-server install provides a -user-account option to automatically add a user to the docker group. #22316

arvados-server install installs Singularity from a source archive instead of Git to improve reliability. #22644

When our test runner claims a port for an Arvados service to use, it more strictly checks that the expected service listens on that port, and explicitly fails if that does not happen within a few minutes. #22655
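The “fail within a deadline” part of this check can be sketched with the standard library: poll until something accepts TCP connections on the port, and raise once the deadline passes. This is a simplified illustration, not the test runner’s code. It only confirms that some process is listening; verifying that it is the expected service would need an additional check, such as a health endpoint.

```python
import socket
import time


def wait_for_listener(host: str, port: int, timeout: float = 180.0, interval: float = 0.5) -> None:
    """Block until host:port accepts a TCP connection, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"nothing listening on {host}:{port} after {timeout}s")
            time.sleep(interval)
```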

Fixed a Rails API server test that could fail if you had previously built packages in your source tree. #22424

Reworked controller’s login integration tests to work on more distributions and in more test environments. #22406

Fixed a Python SDK test that could fail depending on the filesystem settings of /tmp on the test system. #20909

Fixed an arvados-server boot test so it only checks IPv6 connectivity when the test host supports it. #22567

Fixed several Workbench issues that caused React warnings. #22231

Improved the reliability of several Workbench tests. #22483, #22545

When we build Docker images for testing and deployment, we consistently use --mount instead of --volume to avoid bugs caused by creating new empty mount points. #22567