Spinnaker Production Environments

Architecture

  • Use a MySQL-compatible database engine.
    • Managed services such as Amazon Aurora provide cross-region replication.
  • Run Spinnaker on Kubernetes.
  • Each service has at least 2 replicas to provide basic availability at the Kubernetes level.
  • Monitoring is optional but strongly recommended.
  • Armory provides optional Spinnaker log aggregation for troubleshooting, but we also recommend that customers have a log management solution in place.

Kubernetes Considerations

  • The Kubernetes cluster sizing recommendations assume that only Spinnaker, its monitoring, and the Kubernetes operator run in the cluster. The sizing also leaves extra room for rolling deployments of Spinnaker itself.
  • If available, use a cluster with nodes in different availability zones.
  • It’s generally better to run more, smaller nodes than fewer, larger ones, because the cluster is then more resilient to the loss of a node. Make sure that each node can still fit the largest pods in terms of CPU and memory.
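The trade-off between node count and node size comes down to simple arithmetic. The sketch below compares three hypothetical node layouts with the same total capacity (32 vCPU, 128 GiB). For each, it checks whether a single node fits the largest pod, taken here as Clouddriver at its base-profile limits (3000m CPU, 2.5Gi memory), and what share of capacity a single node failure removes. The node sizes and the `NodeLayout` type are assumptions for illustration, and the allocatable figures ignore system reservations and DaemonSets.

```python
"""Compare hypothetical node layouts for a Spinnaker cluster."""
from dataclasses import dataclass

# Largest pod in the base profile: Clouddriver, limits of 3000m CPU and 2.5Gi memory.
LARGEST_POD_CPU_M = 3000
LARGEST_POD_MEM_GIB = 2.5


@dataclass
class NodeLayout:
    name: str
    node_count: int
    cpu_m: int  # allocatable CPU per node in millicores (hypothetical)
    mem_gib: float  # allocatable memory per node in GiB (hypothetical)


def fits_largest_pod(layout: NodeLayout) -> bool:
    """True if a single node can hold the largest pod at its limits."""
    return layout.cpu_m >= LARGEST_POD_CPU_M and layout.mem_gib >= LARGEST_POD_MEM_GIB


def capacity_lost_on_node_failure(layout: NodeLayout) -> float:
    """Fraction of total cluster capacity lost when one node goes down."""
    return 1 / layout.node_count


# Three hypothetical layouts with the same total capacity: 32 vCPU and 128 GiB.
LAYOUTS = [
    NodeLayout("fewer, larger", 4, 8000, 32.0),
    NodeLayout("more, smaller", 8, 4000, 16.0),
    NodeLayout("too small", 16, 2000, 8.0),
]

for layout in LAYOUTS:
    print(f"{layout.name}: fits largest pod = {fits_largest_pod(layout)}, "
          f"capacity lost per node failure = {capacity_lost_on_node_failure(layout):.1%}")
```

With these numbers, 8 nodes lose only 12.5% of capacity per failure and still fit Clouddriver, while 16 nodes of 2 vCPU look more resilient on paper but leave no node with room for Clouddriver to reach its 3000m CPU limit.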

Database Considerations

  • If available, use cross-region replication to ensure the durability of stored data.
    • Front50’s database contains pipeline definitions and must be properly backed up.
    • Orca’s database contains the pipeline execution history displayed in Spinnaker’s UI.
    • Clouddriver’s database contains your infrastructure cache. It has no long-term value: if lost, it is rebuilt by re-caching, which may take a while depending on the size of your infrastructure.
  • Keep the network latency between Spinnaker and the database cluster low. In practice, this usually means locating both in the same data center.
  • Clouddriver, Orca, and Front50 must each use a separate database. These databases can live in different database clusters or in the same one. A single cluster is easier to manage and more cost effective, but it must then handle the combined connections of all services.
    • Your database cluster must support the number of open connections from Spinnaker and any other tools you use. For numbers, refer to the database connections chart in the profiles below.
    • Clouddriver connection pools can be tuned with sql.connectionPools.cacheWriter.maxPoolSize and sql.connectionPools.default.maxPoolSize. Both default to 20 and must be increased for each Clouddriver instance to handle more tasks.
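Because a shared cluster must absorb every service’s pool at once, a rough upper bound on connections is replicas × per-replica pool size, summed across services. The sketch below assumes each Clouddriver pool can open up to its maxPoolSize (20 by default, as stated above). The Orca and Front50 pool sizes, the reserve for other tools, and the max_connections limit are hypothetical placeholders, not Spinnaker defaults.

```python
"""Rough upper bound on database connections for a shared cluster."""

# Clouddriver pool sizes: both default to 20.
CLOUDDRIVER_POOLS = {
    "sql.connectionPools.cacheWriter.maxPoolSize": 20,
    "sql.connectionPools.default.maxPoolSize": 20,
}

# service: (replicas, max connections per replica)
SERVICES = {
    "clouddriver": (2, sum(CLOUDDRIVER_POOLS.values())),
    "orca": (2, 20),  # hypothetical pool size
    "front50": (2, 20),  # hypothetical pool size
}

MAX_CONNECTIONS = 200  # hypothetical database max_connections setting
OTHER_TOOLS = 30  # hypothetical connections reserved for other tools


def peak_connections(services: dict) -> int:
    """Sum of replicas x per-replica pool size across all services."""
    return sum(replicas * per_replica for replicas, per_replica in services.values())


needed = peak_connections(SERVICES) + OTHER_TOOLS
print(f"Peak connections: {needed} of {MAX_CONNECTIONS} allowed")
if needed > MAX_CONNECTIONS:
    print("Raise max_connections or reduce pool sizes")
```

Raising Clouddriver’s pool sizes, or adding Clouddriver replicas, increases this total directly, so revisit the database limit whenever you tune either.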

Redis Considerations

Most services rely on Redis for lightweight storage, task coordination, or both. Spinnaker does not store many items in Redis, as reflected in the following recommendations. Because Redis is single-threaded, it doesn’t need more than one CPU.

When available, a managed Redis service (such as Amazon ElastiCache) can be used. A single Redis instance shared by all services is easier to manage.

Spinnaker Settings

To support a high number of API requests, we advise setting the following in the Gate and Front50 profiles:

hystrix.threadpool.default.coreSize: x

where x = maximum API requests per second × mean response time (in seconds).

We estimated the maximum API requests per second in the tests below; refer to the Average API Response Time graph below. For instance, for a baseline of 30 requests/sec with a 10x burst (300 requests/sec) and a mean response time of 0.25 s, coreSize could be set to 300 × 0.25 = 75.
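The formula is an application of Little’s law: the number of requests in flight equals the arrival rate multiplied by the time each request spends in the system, so the thread pool needs roughly that many threads. A minimal sketch with a hypothetical helper name, rounding up so a fractional result never undersizes the pool:

```python
import math


def hystrix_core_size(max_requests_per_sec: float, mean_response_time_sec: float) -> int:
    """Little's law: concurrent requests = arrival rate x time in system.

    Rounded up so a fractional result never undersizes the thread pool.
    """
    return math.ceil(max_requests_per_sec * mean_response_time_sec)


BASELINE_RPS = 30
BURST_FACTOR = 10
MEAN_RESPONSE_TIME_SEC = 0.25

core_size = hystrix_core_size(BASELINE_RPS * BURST_FACTOR, MEAN_RESPONSE_TIME_SEC)
print(f"hystrix.threadpool.default.coreSize: {core_size}")
```

Sizing for the burst rate rather than the baseline (30 × 0.25 rounds up to only 8 threads) leaves headroom for the 10x spikes the profiles target.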

Recommendations

Installation Types

Many factors come into play when sizing Spinnaker, for instance:

  • The number of active users affects how the Gate service should be sized.
  • Complex pipelines increase the amount of work the Orca service has to do.
  • Different providers (Kubernetes, GCP, AWS, …) come with very different execution profiles for the Clouddriver service.

This document makes the following assumptions:

  • The pipelines used to evaluate Spinnaker are simple: one Deploy stage and two Wait stages to exercise stage scheduling. If you expect complex pipelines, divide the supported executions by the number of non-trivial stages (such as baking or deploying) you expect in your pipelines.
  • API requests simulate potential tool requests as well as user activity. We give the number of concurrent users.
  • All services run with at least 2 replicas for basic availability. Running fewer replicas is possible at the cost of potential outages.
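The scaling rule in the first assumption is a simple division. A minimal sketch with a hypothetical helper, using the base profile’s 250 deployments per day as the tested figure; it rounds down to stay conservative, and the three-stage pipeline (one Bake, two Deploy stages) is an assumed example:

```python
def adjusted_executions(tested_executions: int, non_trivial_stages: int) -> int:
    """Scale the tested execution count for pipelines with more heavy stages.

    The test pipelines have a single non-trivial stage (Deploy), so dividing
    by the expected number of non-trivial stages approximates the reduced
    capacity. Rounds down to stay conservative.
    """
    if non_trivial_stages < 1:
        raise ValueError("expected at least one non-trivial stage")
    return tested_executions // non_trivial_stages


# Hypothetical pipeline: one Bake stage and two Deploy stages.
print(adjusted_executions(250, 3))
```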

Base Profile - Kubernetes Deployments

Base profile recommendations

A base deployment of Spinnaker targets organizations with:

  • 50 applications
  • 250 deployments per day over a 5-hour window.
  • 30 req/s coming from browser sessions or tools
  • 10x burst for both pipelines and API calls.
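
Service      | Replicas | CPU request | CPU limit | Memory request | Memory limit
Clouddriver  | 2        | 2000m       | 3000m     | 2.0Gi          | 2.5Gi
Deck         | 2        | 150m        | 300m      | 32Mi           | 64Mi
Dinghy       | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Echo         | 2        | 500m        | 1000m     | 1.0Gi          | 1.5Gi
Fiat         | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Front50      | 2        | 500m        | 1000m     | 1.0Gi          | 1.5Gi
Gate         | 2        | 750m        | 1000m     | 1.0Gi          | 1.5Gi
Kayenta      | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Igor         | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Orca         | 2        | 1000m       | 1500m     | 1.0Gi          | 1.5Gi
Rosco        | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Terraformer  | 2        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Redis        | 1        | 500m        | 1000m     | 0.5Gi          | 1.0Gi
Total        |          | 16300m      | 28600m    | 18.56Gi        | 30.125Gi

Requests and limits are per replica; totals are summed across all replicas.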
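The Total row is the sum over services of each per-replica value multiplied by the replica count. A short Python sketch of that arithmetic, useful when you change a row and need to recompute the totals. The values are copied from the table, Deck’s Mi values are converted to GiB, and the `profile_totals` helper is hypothetical:

```python
"""Recompute the Total row of the base profile table (illustrative sketch)."""

# Tuple: (replicas, CPU request m, CPU limit m, memory request GiB, memory limit GiB)
BASE_PROFILE = {
    "Clouddriver": (2, 2000, 3000, 2.0, 2.5),
    "Deck": (2, 150, 300, 32 / 1024, 64 / 1024),  # 32Mi / 64Mi converted to GiB
    "Dinghy": (2, 500, 1000, 0.5, 1.0),
    "Echo": (2, 500, 1000, 1.0, 1.5),
    "Fiat": (2, 500, 1000, 0.5, 1.0),
    "Front50": (2, 500, 1000, 1.0, 1.5),
    "Gate": (2, 750, 1000, 1.0, 1.5),
    "Kayenta": (2, 500, 1000, 0.5, 1.0),
    "Igor": (2, 500, 1000, 0.5, 1.0),
    "Orca": (2, 1000, 1500, 1.0, 1.5),
    "Rosco": (2, 500, 1000, 0.5, 1.0),
    "Terraformer": (2, 500, 1000, 0.5, 1.0),
    "Redis": (1, 500, 1000, 0.5, 1.0),
}


def profile_totals(profile: dict) -> list:
    """Sum each resource column, weighting per-replica values by replica count."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for replicas, *per_replica in profile.values():
        for column, value in enumerate(per_replica):
            totals[column] += replicas * value
    return totals


cpu_req, cpu_lim, mem_req, mem_lim = profile_totals(BASE_PROFILE)
print(f"CPU request {cpu_req:.0f}m, CPU limit {cpu_lim:.0f}m, "
      f"memory request {mem_req:.4f}Gi, memory limit {mem_lim:.3f}Gi")
```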

Load Test: Base Profile (Kubernetes)

Overview

Armory Spinnaker | API requests/sec, baseline (burst) | Pipeline triggers/minute, baseline (burst)
2.17.1           | 30 (300)                           | 0.834 (8.34)
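The pipeline trigger rates follow from the base profile targets, assuming deployments are spread evenly over the 5-hour window: 250 / (5 × 60) ≈ 0.833 triggers per minute, and 10 times that at burst. The table rounds these to 0.834 and 8.34. A short sketch of the arithmetic:

```python
DEPLOYMENTS_PER_DAY = 250
WINDOW_HOURS = 5
BURST_FACTOR = 10
BASELINE_API_RPS = 30

# Assumes deployments are spread evenly across the window.
baseline_triggers_per_min = DEPLOYMENTS_PER_DAY / (WINDOW_HOURS * 60)
burst_triggers_per_min = baseline_triggers_per_min * BURST_FACTOR
burst_api_rps = BASELINE_API_RPS * BURST_FACTOR

print(f"Pipeline triggers/minute: {baseline_triggers_per_min:.3f} ({burst_triggers_per_min:.2f})")
print(f"API requests/sec: {BASELINE_API_RPS} ({burst_api_rps})")
```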

[Graphs: General Service Health; Database Connections; Per Service CPU and Memory; Clouddriver Health; Orca Health; Gate Health]


Last modified May 26, 2023: (49c4d003)