Skip to content

Cloud Service Cheatsheet

GCP vs AWS vs Azure (Interview & System Design Oriented)

This cheatsheet maps core cloud primitives across GCP, AWS, and Azure, with an emphasis on ML systems, data platforms, and production workloads.

The goal is not memorizing names, but understanding equivalent building blocks and trade-offs.


Compute

Concept GCP AWS Azure
Virtual Machines Compute Engine EC2 Virtual Machines
Autoscaling Managed Instance Groups Auto Scaling Groups VM Scale Sets
Containers (Managed) GKE EKS / ECS AKS
Serverless (HTTP) Cloud Run Lambda Azure Functions
Batch Jobs Batch / Dataflow Batch Azure Batch

Interview tip:
Serverless = fast iteration, bursty workloads
VMs / K8s = predictable latency, long-running services


Storage

Concept GCP AWS Azure
Object Storage Cloud Storage S3 Blob Storage
Block Storage Persistent Disk EBS Managed Disks
File Storage Filestore EFS Azure Files

ML usage notes - Object storage for datasets, checkpoints, artifacts - Block storage for low-latency model serving - File storage mostly for legacy or shared workloads


Databases & Data Platforms

Concept GCP AWS Azure
Managed SQL Cloud SQL RDS Azure SQL
Globally Scalable SQL Spanner Aurora Cosmos DB
NoSQL (Wide Column / KV) Bigtable DynamoDB Cosmos DB
Cache Memorystore ElastiCache Azure Cache for Redis
Data Warehouse BigQuery Redshift Synapse

Design signal:
- OLTP ≠ Analytics
- BigQuery / Redshift / Synapse are not serving databases


Messaging & Streaming

Concept GCP AWS Azure
Pub/Sub (Eventing) Pub/Sub SNS / SQS Service Bus
Streaming Pub/Sub Kinesis Event Hubs
Workflow Orchestration Cloud Composer Step Functions Logic Apps

ML relevance - Event-driven feature updates - Training triggers - Async inference workflows


Networking

Concept GCP AWS Azure
Virtual Network VPC VPC VNet
Load Balancer Cloud Load Balancing ALB / NLB Azure Load Balancer
DNS Cloud DNS Route 53 Azure DNS
CDN Cloud CDN CloudFront Azure CDN

Interview note:
Talk in terms of L4 vs L7, not product names.


Observability & Reliability

Concept GCP AWS Azure
Logging Cloud Logging CloudWatch Logs Azure Monitor
Metrics Cloud Monitoring CloudWatch Metrics Azure Monitor
Tracing Cloud Trace X-Ray Application Insights
Alerts Alerting Policies CloudWatch Alarms Alerts

Senior signal:
Observability is a first-class system requirement, not an afterthought.


ML & MLOps Platforms (Critical for MLE Interviews)

Capability GCP AWS Azure
ML Platform Vertex AI SageMaker Azure ML
Training Jobs Vertex Training SageMaker Training Azure ML Jobs
Feature Store Vertex Feature Store SageMaker Feature Store Azure Feature Store
Pipelines Vertex Pipelines SageMaker Pipelines Azure ML Pipelines
Model Registry Vertex Model Registry SageMaker Registry Azure ML Registry
Online Serving Vertex Endpoints SageMaker Endpoints Azure Online Endpoints
Monitoring Vertex Model Monitoring SageMaker Model Monitor Azure ML Monitoring

Key interview insight:
Most ML failures come from data + serving, not modeling.


How I Reason Across Clouds (Interview Framing)

Instead of saying:

“I’d use BigQuery”

Say:

“I’d use a managed columnar data warehouse (BigQuery / Redshift) optimized for analytical queries.”

Instead of:

“We use SageMaker”

Say:

“We use a managed ML platform for training, registry, deployment, and monitoring.”


Cloud Selection Heuristics

  • GCP: Strongest for analytics & ML-native workflows
  • AWS: Broadest ecosystem, best for infra-heavy systems
  • Azure: Strong enterprise integration, identity-first

In interviews, emphasize trade-offs, not preferences.


TL;DR Mental Model

All clouds provide the same primitives:

Compute · Storage · Network · Data · ML · Observability

If you can reason at that level, service names are just details.


This cheatsheet is intended for ML system design interviews and real-world architecture discussions.