AFT Integration Architecture - The Good, Bad, and Ugly


If you've ever manually created AWS accounts in an enterprise environment, you know the pain: 47 different console screens, endless IAM configuration, networking setup that breaks mysteriously, and 2-3 days of your life you'll never get back. AFT exists to solve this nightmare.

AFT combines AWS Control Tower with Terraform to automate the mind-numbing process of setting up AWS accounts by hand. Provisioning an account takes about 20-30 minutes versus the multi-day nightmare of doing it manually through the console. That's assuming your Control Tower setup isn't fucked and your IAM roles actually work on the first try (they won't).

How AFT Actually Works (When It's Not Broken)


AFT runs in a dedicated management account separate from your Control Tower management account. This is critical because mixing them will cause permissions nightmares that'll make you question your career choices. The separation means you need to manage IAM trust relationships between multiple accounts, which is where 80% of your initial setup pain comes from.

The AFT framework uses CodePipeline, Step Functions, and DynamoDB under the hood. Every account request starts with a Terraform file in Git that triggers the pipeline. When it works, it's beautiful. When it breaks, you'll spend hours digging through CloudFormation stack failures wondering why the AWSAFTExecution role can't assume the AWSControlTowerExecution role even though the trust policy looks correct.

The AFT architecture follows a strict workflow: account request submission, validation, provisioning, global customizations, and targeted customizations. Each stage has its own failure modes that the troubleshooting guide barely covers.

ITSM Integration - Making ServiceNow Suck Less

Most enterprises want AFT to play nice with their ticketing systems. You can build wrapper APIs around the aft-account-request module to translate ServiceNow requests into Terraform JSON without some poor DevOps engineer copy-pasting ticket details into HCL files at 2am. The AFT account customization examples repo shows how to build these integrations properly.
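A minimal sketch of that translation step, assuming a ticket payload with made-up field names (account_name, owner_email, and so on are hypothetical, not a real ServiceNow schema). The module inputs follow the aft-account-request module; the source path is an assumption about your repo layout. The output is meant to be written to a .tf.json file in the account request repo.

```python
import json
import re


def ticket_to_account_request(ticket: dict) -> dict:
    """Translate one ticket payload into Terraform JSON for an aft-account-request module call."""
    # Module labels can't contain spaces or punctuation, so derive a safe one.
    label = "acct_" + re.sub(r"[^A-Za-z0-9_]", "_", ticket["account_name"])
    return {
        "module": {
            label: {
                # Assumed path: adjust to wherever the module lives in your repo.
                "source": "./modules/aft-account-request",
                "control_tower_parameters": {
                    "AccountEmail": ticket["account_email"],
                    "AccountName": ticket["account_name"],
                    "ManagedOrganizationalUnit": ticket["ou"],
                    "SSOUserEmail": ticket["owner_email"],
                    "SSOUserFirstName": ticket["owner_first_name"],
                    "SSOUserLastName": ticket["owner_last_name"],
                },
                "account_tags": {"ticket": ticket["number"]},
                "change_management_parameters": {
                    "change_requested_by": ticket["requested_by"],
                    "change_reason": ticket["short_description"],
                },
            }
        }
    }


# Hypothetical ticket, for illustration only.
example_ticket = {
    "number": "RITM0012345",
    "account_name": "payments-dev",
    "account_email": "aws+payments-dev@example.com",
    "ou": "Sandbox",
    "owner_email": "jane.doe@example.com",
    "owner_first_name": "Jane",
    "owner_last_name": "Doe",
    "requested_by": "jane.doe@example.com",
    "short_description": "New dev account for the payments team",
}

print(json.dumps(ticket_to_account_request(example_ticket), indent=2))
```

The wrapper only builds the file content. Committing it to Git is still what kicks off the pipeline, so the ticket system never needs direct AWS credentials.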

The AFT Blueprints project has pre-built templates for common setups - networking, DNS, backups, the usual shit. The blueprints are actually pretty decent and will save you from reinventing VPC patterns for the 47th time. The AFT Blueprints documentation explains how to implement these patterns, but don't expect them to work perfectly in your environment without customization. The getting started guide covers the basics, though you'll need to adapt everything for your specific requirements.

Version Control Integration - What Actually Works

AFT works with CodeCommit (boring but reliable), GitHub (everyone uses it), Bitbucket (if you're stuck in Atlassian hell), and GitHub Enterprise through CodeConnections.

CodeCommit integration is rock solid because it's AWS-native. GitHub works great until their webhooks randomly decide to take a vacation. It usually shows up as "CodePipeline execution failed: Unable to access the repository. Please verify the source location and try again" even though the repo is fine and webhook events are being delivered. The webhook troubleshooting guide explains why they fail so often but doesn't mention that GitHub's webhook delivery attempts time out after 10 seconds, which isn't enough if your AFT pipeline is busy.

Bitbucket... well, it's Bitbucket. Works fine if you're already paying Atlassian's enterprise tax for everything else.

If you're using Terraform Cloud or Enterprise, AFT plays nice with managed state through their APIs. Just be ready to pay HashiCorp's licensing fees, which will make your CFO cry. The Terraform Cloud integration guide covers the technical details.

Customizations That Actually Matter


AFT has two types of customizations: global (applies to every account, can't be bypassed) and account-specific (optional per account). Global customizations are perfect for security policies that developers will try to disable if you let them. Account-specific customizations handle things like prod accounts needing hardened networking while dev accounts just need basic connectivity.

The customizations deploy during account creation, which is nice until one fails and leaves you with a half-configured account that's annoying to fix. The account customization options explain the different types available.

The aft_enable_vpc Parameter - AKA The Money Pit


Setting aft_enable_vpc to true puts AFT resources in a VPC with NAT Gateways, VPC endpoints, and all the expensive networking shit. Costs about $200-300/month for basic setups, but scales fast. For 100+ accounts, you're looking at $1000+/month just for the AFT pipeline infrastructure.

The VPC mode adds security by keeping AFT traffic internal to AWS, but most organizations can live without it initially. NAT Gateway charges alone run roughly $33-45/month per AZ depending on region (us-east-1 lists $0.045/hour), plus data processing fees. VPC endpoints for S3, CodeCommit, and other services add another $100-200/month. The math gets ugly fast.
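To see how the fixed charges stack up, here is a rough estimator. The hourly rates are assumptions (us-east-1 list prices at the time of writing: $0.045/hour per NAT Gateway, $0.01/hour per interface endpoint per AZ), so check the pricing page for your region. It deliberately ignores data processing fees, which depend on traffic.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_vpc_cost(azs: int, interface_endpoints: int,
                     nat_hourly: float = 0.045,
                     endpoint_hourly: float = 0.01) -> float:
    """Fixed monthly cost of NAT Gateways plus interface endpoints, in USD.

    Assumes one NAT Gateway per AZ and every interface endpoint deployed
    in every AZ. Data processing charges are not included.
    """
    nat = azs * nat_hourly * HOURS_PER_MONTH
    endpoints = azs * interface_endpoints * endpoint_hourly * HOURS_PER_MONTH
    return round(nat + endpoints, 2)


# Two AZs with ten interface endpoints each, at the assumed rates.
print(monthly_vpc_cost(azs=2, interface_endpoints=10))
```

Under these assumptions the endpoints, not the NAT Gateways, are the bigger line item, because each one is billed per AZ.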

If you're on a budget, set it to false and accept the slightly reduced security posture. Your accountants will thank you when the AWS bill doesn't bankrupt the company. You can always enable VPC mode later when the CFO stops asking why the account creation tool costs more than some developer salaries.

Implementation Guide - The Real Story


Prerequisites That Will Bite You

AFT needs Control Tower working first, which is its own special hell if you have an existing AWS Organizations setup. Control Tower assumes you're starting with a perfect green-field setup and gets angry about pre-existing accounts, OUs, and Service Control Policies. The Control Tower prerequisites list every way your existing setup can break the deployment.

You need the AWSAFTExecution and AWSAFTService IAM roles configured correctly between accounts. The role setup documentation makes it look simple, but trust relationships fail silently. When AFT can't assume roles, it just says "Access Denied" without explaining which specific permission is fucked. The IAM troubleshooting guide has the error patterns you'll see at 3am.
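Since the error won't tell you what's wrong, it helps to check the trust policy yourself. A simplified sketch: it takes a role's trust policy document (the JSON you'd see on the role's trust relationships tab) and reports whether a given caller role is trusted for sts:AssumeRole. It ignores Condition blocks and non-AWS principals, and the account IDs are made up.

```python
def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trusted_principals(trust_policy: dict) -> set:
    """Collect AWS principals that an Allow statement lets call sts:AssumeRole."""
    principals = set()
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        if isinstance(principal, dict):
            principals.update(_as_list(principal.get("AWS", [])))
    return principals


def can_assume(trust_policy: dict, caller_role_arn: str) -> bool:
    """True if the caller's role ARN, or its whole account, is trusted."""
    account_id = caller_role_arn.split(":")[4]
    trusted = trusted_principals(trust_policy)
    return (caller_role_arn in trusted
            or f"arn:aws:iam::{account_id}:root" in trusted
            or account_id in trusted)


# Hypothetical trust policy with a typo in the role name ("Excution").
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:role/AWSAFTExcution"},
        "Action": "sts:AssumeRole",
    }],
}

print(can_assume(policy, "arn:aws:iam::111111111111:role/AWSAFTExecution"))
```

An exact string comparison like this is what catches the one-character ARN typo that looks fine when you eyeball the policy.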

GitOps Workflow - When It Doesn't Break


You commit account request files to Git, AFT detects the change, triggers CodePipeline, and provisions the account. Simple in theory, painful in practice. The AFT GitOps workflow explains the process, but doesn't cover where it breaks.

The pipeline validates Terraform syntax first, which catches obvious typos. Then it checks policy compliance, which fails if your OU structure doesn't match what the request expects. Resource dependency validation comes last and will catch circular dependencies that aren't obvious from reading the HCL. The pipeline troubleshooting guide covers common failures.

Common failure: Account names with spaces break everything silently. The pipeline succeeds but creates accounts with fucked-up names that don't match your naming convention. The account naming requirements are buried in the Organizations documentation.
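One way to catch this before it reaches the pipeline is a pre-commit or CI check on the account name. A minimal sketch follows. The pattern is a hypothetical naming convention (lowercase letters, digits, and hyphens, 3 to 50 characters), not an AWS rule, so swap in your own.

```python
import re

# Hypothetical convention: starts with a letter, then lowercase letters,
# digits, or hyphens, 3-50 characters total.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{2,49}")


def account_name_problems(name: str) -> list:
    """Return a list of human-readable problems; an empty list means the name is fine."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("does not match the naming convention")
    return problems


for candidate in ["payments-prod", "Payments Prod"]:
    print(candidate, "->", account_name_problems(candidate) or "ok")
```

Failing the commit is a lot cheaper than renaming an account after it has been vended.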

State Management - Where Dreams Go to Die

AFT uses S3 for Terraform state and DynamoDB for locking. Works fine until someone manually fucks with the state files or the S3 bucket versioning isn't enabled. Then you get stuck locks and corrupted state, with errors like "Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed", and you have to manually reconcile Terraform state with actual AWS resources, which is about as fun as debugging Lambda functions at 3am.

Pro tip I learned the hard way: if you're using AFT v1.9.2 or earlier, the DynamoDB table doesn't have point-in-time recovery enabled by default. Found this out after a region outage corrupted our lock table and we lost 3 days rebuilding state files from CloudFormation diffs.
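Both of those gaps are cheap to check ahead of time. The sketch below takes already-constructed S3 and DynamoDB clients (for example boto3.client("s3") and boto3.client("dynamodb")) and reports what is missing. The bucket and table names you pass in are up to you; this does not assume what AFT calls them in your deployment.

```python
def state_backend_gaps(s3, dynamodb, bucket: str, table: str) -> list:
    """Report missing safety nets on a Terraform S3/DynamoDB backend.

    `s3` and `dynamodb` are client objects exposing the boto3 methods
    get_bucket_versioning and describe_continuous_backups.
    """
    gaps = []

    # "Status" is absent if versioning was never turned on for the bucket.
    versioning = s3.get_bucket_versioning(Bucket=bucket)
    if versioning.get("Status") != "Enabled":
        gaps.append(f"S3 bucket {bucket}: versioning is not enabled")

    backups = dynamodb.describe_continuous_backups(TableName=table)
    pitr = (backups.get("ContinuousBackupsDescription", {})
                   .get("PointInTimeRecoveryDescription", {})
                   .get("PointInTimeRecoveryStatus"))
    if pitr != "ENABLED":
        gaps.append(f"DynamoDB table {table}: point-in-time recovery is not enabled")

    return gaps
```

Passing the clients in keeps the function easy to run against any account or region, and easy to test with stubs.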

Multi-region deployments add complexity because each region needs its own customizations. The state files multiply across regions and accounts. Budget time for S3 cross-region replication setup if you care about disaster recovery, because losing AFT state files means rebuilding your entire account management system from scratch. The multi-region AFT guide covers the complexity you're signing up for.

SCP Integration - Because Developers Need Guardrails

AFT applies Service Control Policies based on which OU you put the account in. Sandbox accounts get loose policies for experimentation. Production accounts get locked down harder than Fort Knox. The integration works well until someone moves an account to a different OU and breaks all the applications because the new SCPs block APIs they were using. The SCP best practices guide explains how to avoid these disasters.
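Before moving an account, you can at least compare the target OU's SCP against the API actions the account's workloads actually call. This is a deliberately simplified sketch: it only looks at explicit Deny statements with an Action list, and ignores NotAction, Resource, and Condition. The policy and the action list are invented for the example.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def blocked_actions(scp: dict, actions_in_use: list) -> list:
    """Return the in-use actions that a Deny statement in the SCP would match."""
    deny_patterns = []
    for stmt in _as_list(scp.get("Statement", [])):
        if stmt.get("Effect") == "Deny":
            # IAM action names are case-insensitive, so compare in lowercase.
            deny_patterns.extend(p.lower() for p in _as_list(stmt.get("Action", [])))
    return sorted(
        action for action in actions_in_use
        if any(fnmatchcase(action.lower(), pattern) for pattern in deny_patterns)
    )


# Hypothetical SCP attached to the target OU.
target_ou_scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["ec2:CreateVpc*", "s3:DeleteBucket"],
        "Resource": "*",
    }],
}

print(blocked_actions(target_ou_scp, ["ec2:CreateVpcEndpoint", "s3:GetObject"]))
```

The list of actions in use has to come from somewhere real, such as CloudTrail history for the account. A guess here gives you a false sense of safety.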

Monitoring - Good Luck Finding the Real Error

AFT generates a shit-ton of logs through CloudTrail, Config, and CloudWatch. The problem is finding the actual error message in the 10,000-line CloudFormation failure dump when things break. AFT customization failures often show up as "Step Functions execution failed" without any useful details about what actually went wrong.

Pro tip: Set up custom CloudWatch alarms for pipeline failures. The default logging is useless for debugging at 2am when someone's emergency account request is stuck. The CloudWatch Logs Insights queries can help filter through the noise.
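For the filtering part, a sketch of two things: a Logs Insights query string that narrows a log group to lines containing "Error:", and a small helper that pulls Terraform error messages out of a log dump you've already exported. The sample log lines are made up, and the assumption that the useful line contains "Error: " matches Terraform's usual output but may not hold for every failure.

```python
import re

# Standard Logs Insights syntax; paste into the console against the relevant log group.
INSIGHTS_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /Error:/ "
    "| sort @timestamp desc "
    "| limit 50"
)

# Terraform prints failures as "Error: <message>", sometimes behind a box-drawing prefix.
ERROR_PATTERN = re.compile(r"\bError: (.+)")


def terraform_errors(log_lines) -> list:
    """Extract the message part of every 'Error: ...' line, in order of appearance."""
    errors = []
    for line in log_lines:
        match = ERROR_PATTERN.search(line)
        if match:
            errors.append(match.group(1).strip())
    return errors


# Invented sample lines for illustration.
sample_log = [
    "Running terraform apply",
    "│ Error: creating EC2 VPC: InvalidVpcEndpoint.NotFound",
    "States.TaskFailed: Lambda function failed",
]

print(terraform_errors(sample_log))
```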

Security - Don't Fuck This Up


The AFT management account needs privileged access to create accounts and assume roles everywhere. If someone compromises this account, they own your entire AWS org. Lock it down with MFA, IP restrictions, and separate it from your daily operations accounts.

The cross-account role assumptions use time-limited STS tokens, which is good. But the roles themselves are persistent and broadly scoped, which is scary. Audit the role permissions regularly because privilege creep is real. The IAM security best practices guide covers what you need to monitor.
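A starting point for that audit is flagging statements that allow wildcard actions on all resources. The sketch below works on a policy document you've already fetched. It is a coarse check under simple assumptions (it ignores Condition, NotAction, and NotResource), and the example policy is invented.

```python
def _as_list(value):
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list:
    """Return the Sid (or index-based label) of Allow statements with wildcard action and resource."""
    flagged = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # "*" or "service:*" counts as a wildcard action for this check.
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            flagged.append(stmt.get("Sid", f"statement-{index}"))
    return flagged


# Hypothetical policy attached to a cross-account role.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Admin", "Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Sid": "ReadState", "Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-state-bucket/*"},
    ],
}

print(overly_broad_statements(example_policy))
```

Running this on a schedule and diffing the output is one low-effort way to notice privilege creep between audits.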

Troubleshooting - Welcome to Hell


Common issues that will ruin your day:

Real talk: we had AFT v1.8.1 running perfectly for 4 months until AWS updated the Control Tower API and suddenly all account requests started failing with InvalidParameterValueException: Invalid OU Id even though the OU IDs hadn't changed. Took 2 weeks and an AWS support case to figure out they changed how OU paths resolve in the background.

AFT module updates break shit randomly. Test in a sandbox environment first, not in production where your CFO is waiting for that new account to deploy the revenue-generating application. The AFT versioning strategy shows what changed between versions, but doesn't warn you what will break.

The Reality Check - Is AFT Worth It?

After implementing AFT in production and dealing with all this complexity, here's the honest assessment: AFT saves you massive amounts of manual work once you get through the initial pain. The 2-3 month implementation timeline is real, but so is the 30-minute account provisioning afterward.

Budget $200-1000/month for the infrastructure, plan for debugging at 3am when things break, and accept that you'll curse AWS's error messages regularly. But when it works, it's fucking magical watching accounts spin up automatically with proper networking, security, and monitoring already configured.

Start simple, test everything in sandbox, and gradually add complexity only when you need it. The alternative is manually clicking through the AWS console for years, and that's a special kind of hell nobody deserves.

AFT Integration Approaches - What Actually Works

| Integration Pattern | Complexity | Automation Level | Reality Check | When to Use |
|---|---|---|---|---|
| Native AFT | Low | High | Good enough for most teams | < 50 accounts, small teams, don't overthink it |
| AFT with Blueprints | Medium | Very High | Saves you from reinventing VPC patterns | Standardized environments, tired of writing the same Terraform |
| AFT + ITSM Integration | High | Complete | ServiceNow integration is possible but painful | Large orgs that love bureaucracy |
| Custom AFT Wrapper | Very High | Complete | Overkill unless you hate yourself | Regulated environments, masochists only |

FAQ - The Questions You Actually Have at 3AM

Q

Why does my AFT pipeline just say "Access Denied" without any useful details?

A

Because AWS error messages are shit.

The AFT execution role can't assume the Control Tower role, but the error doesn't tell you which specific permission is missing. The actual CloudTrail log shows something like `User: arn:aws:sts::123456789012:assumed-role/AWSAFTExecution/AFT is not authorized to perform: sts:AssumeRole` but doesn't mention which fucking role it's trying to assume.

Check the trust policy on the `AWSControlTowerExecution` role and make sure `aft-management-account` is listed as a trusted entity. Also verify the role ARNs match exactly - typos will fuck you silently. Common gotcha: if you're using AFT version 1.10.3+, the role names changed and the documentation writers clearly never actually used this feature.

Q

ServiceNow integration - is it worth the pain?

A

Technically possible but you'll hate yourself. Build a wrapper API around aft-account-request that translates ServiceNow tickets into Terraform JSON. Works great until ServiceNow changes their API (monthly) or someone submits a ticket with invalid account names containing spaces or special characters that break everything downstream.

Q

Multi-region deployments - why is state management such a clusterfuck?

A

Each region gets its own customizations and state files. Multiply that by the number of accounts and regions, and you've got hundreds of state files to manage. S3 cross-region replication is a must unless you enjoy rebuilding your entire AFT setup when an AWS region has a bad day.

Q

How much is AFT actually going to cost me?

A

Budget $200-300/month for basic AFT pipeline costs (CodePipeline, Step Functions, DynamoDB). Enable VPC networking (aft_enable_vpc=true) and you're looking at $500-1500/month depending on scale. For 100+ accounts across multiple regions, easily $2000+/month just for the automation infrastructure.

Q

Terraform Cloud integration - will it bankrupt me?

A

AFT works fine with Terraform Cloud, but the usage-based pricing will destroy your budget. $20/user/month plus execution costs that scale with account creation frequency. Terraform Enterprise licensing starts at $50K+/year. Open source Terraform with S3/DynamoDB state is free and works just as well for most use cases.

Q

Which Git provider should I use with AFT?

A

CodeCommit never breaks but is boring. GitHub is what everyone uses and has the best workflows, but webhooks occasionally shit the bed and stop triggering pipelines. Bitbucket works fine if you're already in Atlassian hell. GitHub Enterprise is expensive but necessary if you need on-premises Git.

Q

Why do AFT customizations fail randomly without useful error messages?

A

AFT customizations are just Terraform modules that run in a specific lifecycle hook. When they fail, the error usually shows up as "Step Functions execution failed" without the actual Terraform error. Check the CloudWatch logs for whatever actually ran the customization (the Lambda function, or the CodeBuild job that executed the Terraform run) - the real error is buried in there somewhere.

Q

Can I import my existing accounts into AFT management?

A

AFT is designed for new accounts. Importing existing accounts is a nightmare of manual Terraform state manipulation and resource import. Unless you enjoy spending weeks reconciling actual AWS resources with Terraform state files, just leave existing accounts alone and use AFT for new ones.

Q

How badly can someone fuck things up if they compromise the AFT management account?

A

Complete organizational takeover. The AFT roles can create accounts, assume roles everywhere, and modify Control Tower configurations. If someone gets access to this account, they own your entire AWS org. Lock it down with MFA, IP restrictions, and don't give the keys to interns.

Q

Why does Terraform state get corrupted and how do I fix it?

A

Someone manually modified AFT-managed resources in the console, creating state drift. When the next account provisioning runs, Terraform tries to "fix" resources that were manually changed. Result: corrupted state. Fix by running terraform apply -refresh-only (the replacement for the deprecated terraform refresh) to sync state with reality, then terraform plan to see what's fucked. Nuclear option: delete state files and terraform import everything manually.

Q

My AFT customization failed and left the account half-configured. Now what?

A

You're in manual cleanup hell. AFT doesn't have rollback functionality, so you get to figure out which resources were created, which ones failed, and how to get everything back to a consistent state. The Step Functions execution will show something unhelpful like "States.TaskFailed: Lambda function failed" without mentioning that the actual Terraform error was "Error: creating EC2 VPC: InvalidVpcEndpoint.NotFound".

This is why you test customizations in a sandbox environment first, not in production where the CFO is waiting for their account. I learned this lesson after a customization failed at 1am and left 12 production accounts with broken networking that took 6 hours to manually fix.

Q

Do Service Control Policies make AFT more painful?

A

Yes, but necessarily so. SCPs based on OU placement work fine until someone moves an account to a different OU and breaks all the applications because the new policies block APIs they were using. Pro tip: test SCP changes in non-production OUs first.

Q

How do I prove to auditors that AFT is compliant?

A

AFT logs everything through CloudTrail and Config. The problem is filtering through 10,000 log entries to find the specific events auditors care about. Set up custom CloudWatch queries for account creation events, role assumptions, and resource changes. The GitOps model helps because every change is in version control.

Q

Are AFT Blueprints worth using or should I write my own?

A

The blueprints are actually decent and will save you time if your requirements match their assumptions. Problem is they're opinionated about network architecture, naming conventions, and tagging strategies. If you deviate from their patterns, you'll spend more time fighting the blueprints than writing your own Terraform.

Q

How long will AFT implementation actually take?

A

Plan 2-3 months for AFT if you're doing it right. First month is getting Control Tower to stop being angry about your existing AWS setup. Second month is actually deploying AFT and getting basic account provisioning working. Third month is fixing all the shit that breaks when you try to use it for real workloads with actual requirements.
