Overview

The Resilient pattern targets mission-critical applications that need high availability, data durability, and disaster recovery. It deploys across 3 Availability Zones with KMS encryption, an Aurora PostgreSQL database, RDS Proxy, and enhanced monitoring.

What’s Included

fjall create app --type resilient
Creates:
  • ECS Fargate with auto-scaling across multiple AZs
  • Aurora PostgreSQL database with automatic failover
  • KMS encryption for data at rest (customer-managed keys)
  • RDS Proxy for connection pooling and failover handling
  • Enhanced monitoring with Database Insights (advanced mode)
  • 30-day backup retention with point-in-time recovery

Architecture

┌─────────────────┐
│   CloudFront    │ (Optional CDN)
└────────┬────────┘
         │
┌────────┴────────┐
│      ALB        │ (Multi-AZ)
└────────┬────────┘
         │
┌────────┴────────┐
│  ECS Fargate    │ (4-20 tasks, Multi-AZ)
│  1024 CPU       │
│  2048 MiB       │
└────────┬────────┘
         │
┌────────┴────────┐
│    Aurora       │ (Multi-AZ, automatic failover)
└─────────────────┘

Generated Infrastructure

#!/usr/bin/env node

import {
  App,
  DatabaseFactory,
  ComputeFactory,
  getConfig,
} from "@fjall/components-infrastructure";

const appName = "critical";
const app = App.getApp(appName);

app.addTags({
  "fjall:costAllocation:owner": "engineering",
});

const criticalStorage = app.addDatabase(
  DatabaseFactory.build("CriticalStorage", {
    type: "Aurora",
    databaseName: "CriticalDatabase",
  }),
);

app.addCompute(
  ComputeFactory.build("CriticalCompute", {
    type: "ecs",
    ecrRepository: app.getDefaultContainerRegistry(),
    services: [
      {
        name: "app",
        capacityProvider: "FARGATE",
        cpu: 1024,
        memoryLimitMiB: 2048,
        desiredCount: 4,
        scaling: {
          minCapacity: 4,
          maxCapacity: 20,
        },
        containers: [
          {
            port: 3000,
            environment: {
              ENVIRONMENT: getConfig().environment,
              DATABASE_HOST: criticalStorage.getHostEndpoint(),
              DATABASE_PORT: `${criticalStorage.getHostPort()}`,
              DATABASE_NAME: criticalStorage.getDatabaseName(),
            },
            secretsImport: {
              DATABASE_PASSWORD: criticalStorage
                .getCredentials()
                .getImport("password"),
            },
          },
        ],
      },
    ],
  }),
);
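
Inside the container, the application sees the values above as ordinary environment variables. The following is a minimal sketch of reading and validating them at startup. The variable names come from the generated stack, but `readDatabaseSettings` and its return shape are hypothetical and not part of Fjall. The generated stack does not show a username variable, so the sketch leaves it out.

```typescript
// Hypothetical helper: only the variable names are taken from the stack above.
interface DatabaseSettings {
  host: string;
  port: number;
  database: string;
  password: string;
}

type Env = Record<string, string | undefined>;

const REQUIRED_VARIABLES = [
  "DATABASE_HOST",
  "DATABASE_PORT",
  "DATABASE_NAME",
  "DATABASE_PASSWORD",
];

function readDatabaseSettings(env: Env): DatabaseSettings {
  // Fail fast with one message that lists every missing variable.
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // DATABASE_PORT is injected as a string, so convert and range-check it.
  const port = Number(env.DATABASE_PORT);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`DATABASE_PORT is not a valid port: ${env.DATABASE_PORT}`);
  }

  return {
    host: env.DATABASE_HOST as string,
    port,
    database: env.DATABASE_NAME as string,
    password: env.DATABASE_PASSWORD as string,
  };
}
```

In a Node.js service this would typically be called once at startup as `readDatabaseSettings(process.env)`, so a misconfigured task fails its health check instead of failing on the first query.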

Specifications

Compute (ECS Fargate)

Setting        Value
CPU            1024 units (1 vCPU) per task
Memory         2048 MiB per task
Tasks          4 minimum, 20 maximum
Auto-scaling   Target 70% CPU
Health checks  ALB + container health
Deployment     Blue/green with rollback
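
Fargate accepts only certain CPU and memory pairings, so it is worth checking any change to either setting before deploying. The sketch below encodes the pairings for task sizes up to 4 vCPU as an illustration. `isValidFargateSize` is a hypothetical helper, not part of Fjall or the AWS SDK, and larger task sizes are deliberately left out.

```typescript
// Builds an inclusive list of memory sizes in MiB.
function memoryRange(min: number, max: number, step: number): number[] {
  const count = Math.floor((max - min) / step) + 1;
  return Array.from({ length: count }, (_, i) => min + i * step);
}

// CPU units -> memory sizes (MiB) that Fargate accepts for that CPU value.
// Only sizes up to 4 vCPU are listed in this sketch.
const FARGATE_MEMORY_MIB: Record<number, number[]> = {
  256: [512, 1024, 2048],
  512: memoryRange(1024, 4096, 1024),
  1024: memoryRange(2048, 8192, 1024),
  2048: memoryRange(4096, 16384, 1024),
  4096: memoryRange(8192, 30720, 1024),
};

function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  const allowed: number[] | undefined = FARGATE_MEMORY_MIB[cpu];
  return allowed !== undefined && allowed.includes(memoryMiB);
}

// The Resilient defaults sit at the bottom of the 1 vCPU range.
console.log(isValidFargateSize(1024, 2048)); // prints true
console.log(isValidFargateSize(1024, 1024)); // prints false
```

With 1024 CPU units, memory can be raised from 2048 MiB up to 8192 MiB without changing the CPU setting.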

Database (Aurora)

Setting                 Value
Engine                  Aurora PostgreSQL 16.6
Capacity                Serverless v2 (scales automatically with load)
Readers                 2 reader instances
Backups                 30-day retention
Point-in-time recovery  Within the 30-day backup window
Multi-AZ                Automatic failover
RDS Proxy               Connection pooling and failover handling, TLS required
Database Insights       Advanced mode, encrypted with a customer-managed key

Security

Setting              Value
KMS encryption       Data at rest with customer-managed keys
VPC                  Multi-AZ across 3 zones
Interface endpoints  ECR, Secrets Manager
Flow logs            90-day retention
DDoS                 Shield Standard
Encryption           In transit and at rest
Certificates         ACM-managed SSL/TLS

High Availability Features

Multi-AZ Deployment

  • ECS tasks spread across 3 AZs
  • Aurora with automatic failover
  • ALB routing across AZs
  • 3 NAT Gateways (one per AZ)
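
To see why the minimum is 4 tasks, not 3, it helps to work through what remains when one zone fails. The sketch below assumes tasks are spread as evenly as possible across zones, which ECS attempts on a best-effort basis. Both function names are illustrative.

```typescript
// Distributes tasks as evenly as possible across zones, e.g. 4 over 3 -> [2, 1, 1].
function spreadAcrossZones(taskCount: number, zoneCount: number): number[] {
  const base = Math.floor(taskCount / zoneCount);
  const remainder = taskCount % zoneCount;
  return Array.from({ length: zoneCount }, (_, i) => base + (i < remainder ? 1 : 0));
}

// Tasks still running if the most heavily loaded zone becomes unavailable.
function worstCaseSurvivors(taskCount: number, zoneCount: number): number {
  const spread = spreadAcrossZones(taskCount, zoneCount);
  return taskCount - Math.max(...spread);
}

console.log(spreadAcrossZones(4, 3)); // prints [ 2, 1, 1 ]
console.log(worstCaseSurvivors(4, 3)); // prints 2
console.log(worstCaseSurvivors(20, 3)); // prints 13
```

Under this model, losing a zone at minimum capacity still leaves two tasks serving traffic while replacements start in the remaining zones.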

Auto-Scaling Policies

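import { Duration } from "aws-cdk-lib";

// `service` is the ECS Fargate service and `alb` the load balancer construct,
// both defined elsewhere in the stack.

// CPU-based scaling: keep average CPU near 70% with 4-20 tasks
const scaling = service.autoScaleTaskCount({
  minCapacity: 4,
  maxCapacity: 20,
});

scaling.scaleOnCpuUtilization("CpuScaling", {
  targetUtilizationPercent: 70,
  scaleInCooldown: Duration.seconds(60),
  scaleOutCooldown: Duration.seconds(60),
});

// Request-based scaling: aim for about 1000 requests per task
scaling.scaleOnRequestCount("RequestScaling", {
  requestsPerTarget: 1000,
  targetGroup: alb.targetGroup,
});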
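
As a rough mental model, target tracking adjusts the task count in proportion to how far the metric sits from its target, then clamps the result to the configured bounds. The sketch below applies that proportional rule with the Resilient defaults. It is an approximation for capacity planning, not the exact algorithm Application Auto Scaling runs, which also depends on alarm evaluation and cooldowns. `estimateDesiredTasks` is a hypothetical name.

```typescript
// Proportional estimate: tasks needed to bring average CPU back to the target.
function estimateDesiredTasks(
  currentTasks: number,
  observedCpuPercent: number,
  targetCpuPercent = 70,
  minCapacity = 4,
  maxCapacity = 20,
): number {
  const proportional = Math.ceil((currentTasks * observedCpuPercent) / targetCpuPercent);
  // Never go below the minimum or above the maximum task count.
  return Math.min(maxCapacity, Math.max(minCapacity, proportional));
}

console.log(estimateDesiredTasks(4, 95)); // prints 6
console.log(estimateDesiredTasks(10, 35)); // prints 5
console.log(estimateDesiredTasks(20, 100)); // prints 20
```

The last call shows the ceiling in action: at 20 tasks and 100% CPU the proportional estimate is 29, but the service cannot scale past `maxCapacity`, so sustained load at that level calls for a higher maximum or larger tasks.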

Database Features

  • Automatic failover: Aurora promotes a reader on writer failure
  • Reader instances: 2 readers for read scaling and redundancy
  • Fast cloning: copy-on-write database copies for testing
  • RDS Proxy: pooled connections survive failover with minimal disruption
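
Minimal disruption can still mean a brief connection error while Aurora promotes a reader, so clients commonly retry with capped exponential backoff. The sketch below only computes the delay schedule such a retry loop might follow. `backoffDelays` and its default values are assumptions for illustration, not Fjall or AWS settings.

```typescript
// Returns the wait (in ms) before each retry: base, 2x base, 4x base, ... up to a cap.
function backoffDelays(maxRetries = 5, baseDelayMs = 100, maxDelayMs = 1000): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxRetries; retry++) {
    delays.push(Math.min(maxDelayMs, baseDelayMs * 2 ** retry));
  }
  return delays;
}

console.log(backoffDelays()); // prints [ 100, 200, 400, 800, 1000 ]
```

The defaults wait 2.5 seconds in total across five retries. Adding random jitter to each delay is a common refinement so that many tasks do not reconnect at the same instant.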

Disaster Recovery

RTO and RPO Targets

Target                          Value
RTO (Recovery Time Objective)   < 30 minutes
RPO (Recovery Point Objective)  < 1 minute

Backup Strategy

  1. Continuous backups: Aurora to S3 with point-in-time recovery
  2. Cross-region snapshots: daily
  3. Application state: S3 with versioning
  4. Configuration: AWS Systems Manager
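
Point-in-time recovery only works for timestamps inside the retention window, so a runbook can check a requested restore time before starting. The sketch below assumes the 30-day retention described above and treats the window as exactly `retentionDays` back from now. In practice Aurora reports its own earliest and latest restorable times, which should take precedence. `isWithinRecoveryWindow` is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when `restoreTo` is in the past and no older than the retention window.
function isWithinRecoveryWindow(restoreTo: Date, now: Date, retentionDays = 30): boolean {
  const earliest = now.getTime() - retentionDays * DAY_MS;
  const requested = restoreTo.getTime();
  return requested >= earliest && requested <= now.getTime();
}

const now = new Date("2024-06-30T00:00:00Z");
console.log(isWithinRecoveryWindow(new Date("2024-06-10T12:00:00Z"), now)); // prints true
console.log(isWithinRecoveryWindow(new Date("2024-05-20T00:00:00Z"), now)); // prints false
```

The second call fails because 20 May is 41 days before 30 June. A restore that far back would have to come from a retained snapshot.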

Monitoring and Alerting

CloudWatch Dashboards

  • Service health overview
  • Database performance metrics
  • Application business metrics
  • Cost tracking dashboard

Alarms Configuration

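import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";

// Application alarm: target 5XX count at or above 10 for 2 consecutive periods
new cloudwatch.Alarm(this, "HighErrorRate", {
  metric: alb.metricHttpCodeTarget5XXCount(),
  threshold: 10,
  evaluationPeriods: 2,
});

// Database alarm: CPU utilisation at or above 80% for 3 consecutive periods
new cloudwatch.Alarm(this, "DatabaseCPU", {
  metric: database.metricCPUUtilization(),
  threshold: 80,
  evaluationPeriods: 3,
});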
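
The effect of `threshold` and `evaluationPeriods` is easiest to see with a few sample datapoints. The sketch below models an alarm that fires only when every one of the most recent periods breaches the threshold. It assumes the CDK default comparison (greater than or equal to) and ignores missing data, and `isAlarming` is an illustrative name, not a CloudWatch API.

```typescript
// True when each of the last `evaluationPeriods` datapoints is at or above the threshold.
function isAlarming(datapoints: number[], threshold: number, evaluationPeriods: number): boolean {
  if (datapoints.length < evaluationPeriods) {
    return false; // not enough data to evaluate yet
  }
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every((value) => value >= threshold);
}

// HighErrorRate: threshold 10, 2 periods
console.log(isAlarming([2, 12, 15], 10, 2)); // prints true
console.log(isAlarming([12, 3, 15], 10, 2)); // prints false

// DatabaseCPU: threshold 80, 3 periods
console.log(isAlarming([85, 90, 79], 80, 3)); // prints false
```

Requiring several consecutive breaches trades a slower alert for fewer false alarms from a single noisy period.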

Cost Estimation

Resource              Specification               Monthly Cost
ECS Fargate           4-20 tasks (1 vCPU, 2 GB)   $150-750
Aurora Serverless v2  Scales with load            $50-400
Load Balancer         Multi-AZ ALB                $16
Data Transfer         Cross-AZ                    $20-100
Backups               S3 storage                  $10-50
KMS                   Key usage                   ~$1
Total                                             $250-1300
Costs vary with traffic and database load.
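
The total can be checked by summing the low and high ends of each line item. The sketch below does that with the figures from the table. The type and function names are illustrative.

```typescript
type CostRange = { resource: string; low: number; high: number };

// Monthly cost estimates in USD, copied from the table above.
const monthlyCosts: CostRange[] = [
  { resource: "ECS Fargate", low: 150, high: 750 },
  { resource: "Aurora Serverless v2", low: 50, high: 400 },
  { resource: "Load Balancer", low: 16, high: 16 },
  { resource: "Data Transfer", low: 20, high: 100 },
  { resource: "Backups", low: 10, high: 50 },
  { resource: "KMS", low: 1, high: 1 },
];

// Sums the low ends and the high ends separately.
function totalRange(items: CostRange[]): { low: number; high: number } {
  return items.reduce(
    (sum, item) => ({ low: sum.low + item.low, high: sum.high + item.high }),
    { low: 0, high: 0 },
  );
}

const total = totalRange(monthlyCosts);
console.log(`$${total.low}-${total.high}`); // prints "$247-1317"
```

The exact sum is $247-1317, so the $250-1300 total in the table is a rounded figure.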

Security Hardening

KMS Encryption

  • All data at rest encrypted with customer-managed KMS keys
  • Aurora storage encryption
  • S3 bucket encryption
  • EBS volume encryption

Compliance Features

Feature      Detail
Encryption   KMS customer-managed keys
Audit logs   CloudTrail enabled
Access logs  S3 with lifecycle policy
Compliance   SOC 2 and HIPAA ready

When to Use

Choose Resilient for:
  • E-commerce platforms
  • Financial services
  • Healthcare applications
  • SaaS platforms
  • Government systems
  • Any mission-critical application

Consider alternatives if:
  • Your application is low-traffic (use Standard)
  • Cost is the primary concern (use Lightweight)
  • You only need a development environment (use Tinkerer)

Migration from Standard

  1. Switch from an RDS instance to Aurora
  2. Raise the minimum task count to 4
  3. Add KMS encryption
  4. Configure enhanced monitoring
  5. Set up cross-region backups

Best Practices

  1. Test failover procedures regularly
  2. Monitor costs closely, as they scale with traffic
  3. Document runbooks for incidents
  4. Practise chaos engineering
  5. Run regular security audits
  6. Performance-test at scale
  7. Apply cost-allocation tags for tracking

Next Steps

Deploy your application

Push the Resilient stack to AWS.

Add resources

Extend the infrastructure with storage, messaging, and more.

Compute Factory

Customise the ECS Fargate compute layer.

Load balancer

Configure CloudFront and ALB routing.