Cloud and Infrastructure Architecture

Cloud Service Models

Model	You Manage	Provider Manages	Example
SaaS	Config only	Everything else	Salesforce, GitHub
FaaS	Functions	Scaling, runtime, infra	Lambda, Cloud Functions
PaaS	Apps & data	Runtime, OS, hardware	Heroku, App Engine
IaaS	Apps, data, runtime, OS	Virtualisation, hardware	EC2, GCE

Table 6.1 — Cloud service models ranked by level of abstraction (highest to lowest).

Guideline

Prefer higher-level services (PaaS over IaaS, for example) when they fit your constraints, because they shift operational burden to the provider. Use IaaS only when you need fine-grained control or when the higher-level service cannot meet your compliance or performance requirements.

Deployment Strategies

Strategy	Availability	Latency	Cost	Complexity
Single region	Medium	Low for nearby users only	Low	Low
Active-Passive	High	Low near the active region; failover takes minutes	Medium	Medium
Active-Active	Very high	Low globally	High	High

Table 6.2 — Deployment strategy comparison across key dimensions.
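A rough calculation shows why the Availability column climbs from single region to active-active. The sketch below assumes that every region has the same availability and that regions fail independently. The 99.9% per-region figure is an illustrative assumption, not a provider number, and real regions share failure modes (DNS, control planes, bad deployments), so treat the result as an upper bound.

```python
def combined_availability(region_availability: float, regions: int) -> float:
    """Availability of a system that stays up while at least one region is up.

    Assumes regions fail independently, which real deployments only approximate.
    """
    if not 0.0 <= region_availability <= 1.0:
        raise ValueError("region_availability must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    return 1.0 - (1.0 - region_availability) ** regions


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime in minutes over a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    per_region = 0.999  # illustrative assumption, not a measured value
    for n in (1, 2, 3):
        a = combined_availability(per_region, n)
        print(f"{n} region(s): {downtime_minutes_per_year(a):.2f} min/year downtime")
```

Under these assumptions, one region at 99.9% allows roughly 526 minutes of downtime a year, while two independent regions allow less than one minute. An active-passive setup falls between the two, because each failover adds its own minutes of downtime.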

Infrastructure as Code (IaC)

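Treat infrastructure the same way you treat application code: keep it in version control, review every change, and apply changes through automation. Common tools: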

Terraform / OpenTofu — declarative, stateful, multi-cloud
Pulumi — infrastructure in general-purpose languages (TypeScript, Python, Go)
CloudFormation / CDK — AWS-native
Ansible — configuration management, procedural

IaC Best Practices

Practice	Why
Store state remotely (S3, Terraform Cloud)	Prevents state loss and enables team collaboration
Review IaC changes in pull requests	Catches misconfigurations before deployment
Use modules	Avoids duplication and enforces standards
Tag all resources	Enables cost tracking by team, project, environment
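One way to enforce the tagging practice is a check that runs in CI before changes are applied. The sketch below is a minimal, tool-agnostic illustration. The resource dictionaries, the `REQUIRED_TAGS` set (taken from the team, project, and environment dimensions in the table), and the function name `find_untagged` are all hypothetical and would need adapting to the plan output of whichever IaC tool you use.

```python
REQUIRED_TAGS = {"team", "project", "environment"}  # assumed tag keys


def find_untagged(resources: list[dict]) -> dict[str, list[str]]:
    """Map each resource name to the sorted list of required tags it lacks.

    Resources that carry every required tag are omitted from the result.
    """
    problems = {}
    for resource in resources:
        tags = resource.get("tags") or {}
        # A tag with an empty value is treated as missing.
        present = {key for key, value in tags.items() if value}
        missing = sorted(REQUIRED_TAGS - present)
        if missing:
            problems[resource["name"]] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical resources, shaped like a simplified plan export.
    planned = [
        {"name": "web-server",
         "tags": {"team": "payments", "project": "checkout", "environment": "prod"}},
        {"name": "scratch-bucket", "tags": {"team": "payments"}},
        {"name": "legacy-queue"},
    ]
    for name, missing in find_untagged(planned).items():
        print(f"{name}: missing tags {', '.join(missing)}")
```

Failing the pipeline when `find_untagged` returns a non-empty result keeps untagged resources from reaching the cloud account, where they are harder to attribute later.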

Cost Governance

Cloud costs spiral without governance. Establish the following practices:

Practice	Impact	Effort
Budgets and alerts	Prevents bill shock	Low
Resource tagging	Enables cost attribution	Low
Right-sizing instances	Typically reduces waste by 20-40%	Medium
Auto-scaling	Matches capacity to demand	Medium
Reserved instances / savings plans	30-60% discount vs on-demand, in exchange for a usage commitment	Low

Table 6.3 — Cost governance practices with their impact and relative effort.
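Budgets and alerts are the cheapest practice in the table to put in place, and cloud providers offer them as managed features. The sketch below only illustrates the underlying logic. It assumes hypothetical alert thresholds of 50%, 80%, and 100% of the monthly budget and a simple linear forecast of month-end spend; neither comes from a specific provider.

```python
import calendar
from datetime import date

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # assumed fractions of the monthly budget


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def crossed_thresholds(spend: float, budget: float) -> list[float]:
    """Return every alert threshold that the given spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in ALERT_THRESHOLDS if spend >= t * budget]


if __name__ == "__main__":
    budget = 10_000.0          # hypothetical monthly budget
    spend = 4_500.0            # hypothetical month-to-date spend
    today = date(2024, 6, 10)  # fixed date so the example is reproducible

    forecast = forecast_month_end(spend, today)
    print("actual thresholds crossed:", crossed_thresholds(spend, budget))
    print("forecast thresholds crossed:", crossed_thresholds(forecast, budget))
```

Alerting on the forecast as well as on actual spend is the design choice worth noting: in this example actual spend has not reached any threshold, but the linear forecast already exceeds the full budget, which leaves time to react before the bill arrives.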

Designing for Resilience

               ┌──────────┐
               │   Load   │
               │ Balancer │
               └────┬─────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
   ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
   │  App    │ │  App    │ │  App    │
   │ Inst. 1 │ │ Inst. 2 │ │ Inst. 3 │
   └────┬────┘ └─────────┘ └─────────┘
        │
   ┌────▼────┐
   │ Circuit │
   │ Breaker │──► Downstream Service
   └─────────┘

Figure 6.1 — Basic resilient architecture with load balancer, multiple instances, and circuit breaker.

The goal is not to prevent all failures — it’s to limit blast radius and recover automatically:

Load balancers — distribute traffic and run health checks
Auto-scaling groups — replace failed instances
Circuit breakers — fail fast when downstream is down
Retries with backoff — handle transient failures
Bulkheads — isolate failure to one component
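Two of the patterns above, circuit breakers and retries with backoff, are often combined in application code. The following is a minimal single-threaded sketch, not a production implementation. The class and function names, the default thresholds, and the choice of full jitter are assumptions made for illustration; in practice you would more likely use a maintained resilience library or a service mesh feature.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the downstream."""


class CircuitBreaker:
    """Opens after consecutive failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: fall through and let one trial call probe the downstream.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Retry func with exponential backoff and full jitter.

    CircuitOpenError is not retried: an open circuit is not a transient failure.
    """
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)

    def call_downstream():
        # Stand-in for a real network call to the downstream service.
        raise ConnectionError("downstream unavailable")

    try:
        retry_with_backoff(lambda: breaker.call(call_downstream))
    except CircuitOpenError as exc:
        print(f"gave up quickly: {exc}")
    except ConnectionError as exc:
        print(f"retries exhausted: {exc}")
```

The retry loop wraps the breaker rather than the other way around, so that once the breaker opens, further attempts stop immediately instead of adding load to a service that is already struggling. The jitter spreads retries from many clients over time so they do not arrive in synchronized waves.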

Remember

Resilience is not just about infrastructure. An architecture where a single database failure takes down the entire system is not resilient — regardless of how many app instances you have running.
