// CONFIG-DRIVEN · PARAMETER STORE · AWS-NATIVE

Music Pipeline

A reusable, config-driven ingestion framework for streaming and batch data, built on AWS Systems Manager (SSM) Parameter Store, Kinesis, Glue, and DynamoDB. One Lambda. Any schema. Zero code changes to add a new source.

Four pipelines. One framework.

📱💻

Stream Pipeline — Phone & PC

A single reusable Lambda function handles both phone and PC streaming events. Each source has its own Kinesis Data Stream. The Lambda reads its input and output schemas from AWS Parameter Store at runtime — no hardcoded field names, no code changes to add a new stream source.
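As a rough illustration of the first step such a Lambda performs, the sketch below decodes the payloads of a Kinesis-triggered invocation. Kinesis delivers each payload base64-encoded under `Records[].kinesis.data`; the function name `decode_kinesis_records` and the sample event are hypothetical, not taken from the project's code.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Each Kinesis payload arrives base64-encoded under kinesis.data.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


# Hypothetical event with a single phone record, built locally for illustration.
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"user_id": "u1", "device_type": "phone"}).encode()
        ).decode()}}
    ]
}
decoded = decode_kinesis_records(sample_event)
```

Because nothing here names a field, the same decoder would serve the phone and PC streams alike; field-level handling is left to the schema loaded from Parameter Store.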

Raw JSON lands in S3 raw/, where it triggers a second Lambda that updates DynamoDB in real time. Glue then enriches the data: it resolves lat/long to city and state, and translates song, artist, album, and playlist IDs to human-readable names using RDS lookups and DynamoDB user-profile lookups.
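The enrichment step can be pictured with a small plain-Python sketch. The real job runs in Glue; here the RDS and DynamoDB lookups are replaced by hypothetical in-memory tables, the nearest-city rule is a simplification, and mapping `record_id` to `song_name` is an assumption about the schema.

```python
import math

# Hypothetical stand-ins for the RDS lookup tables.
SONGS = {"rec-1": "Example Song"}
ARTISTS = {"art-1": "Example Artist"}
# (city, state, latitude, longitude) reference points, illustrative only.
CITIES = [
    ("Austin", "TX", 30.2672, -97.7431),
    ("Denver", "CO", 39.7392, -104.9903),
]


def nearest_city(lat, long):
    """Return the closest reference (city, state); (None, None) if no coordinates."""
    if lat is None or long is None:  # PC events carry null lat/long
        return None, None
    name, state, _, _ = min(
        CITIES, key=lambda c: math.hypot(c[2] - lat, c[3] - long)
    )
    return name, state


def enrich(event):
    """Replace IDs and coordinates in a raw event with human-readable values."""
    city, state = nearest_city(event.get("lat"), event.get("long"))
    return {
        "user_id": event["user_id"],
        "city": city,
        "state": state,
        "song_name": SONGS.get(event["record_id"]),
        "artist_name": ARTISTS.get(event["artist_id"]),
        "datetime": event["datetime"],
    }


raw_event = {
    "user_id": "u1", "lat": 30.3, "long": -97.7,
    "record_id": "rec-1", "artist_id": "art-1",
    "datetime": "2024-01-01T12:00:00",
}
enriched = enrich(raw_event)
```

Planar distance is used only to keep the sketch short; a production job would more likely use a proper geocoding dataset or a haversine distance.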

👔👥

Batch Pipeline — Employees & Customers

Employee and customer records live in RDS as fully normalized relational tables. A config-driven Lambda batch extractor reads the schemas from Parameter Store, extracts records, and writes them to S3 raw/.

Glue then flattens the relational structure — joining across address, phone, position, salary, and bonus tables — producing analytics-ready flat records in S3 curated/, which are then loaded into DynamoDB. Sensitive fields (SSN, CC numbers) are KMS-encrypted and excluded from all pipeline output schemas.
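A minimal sketch of the flattening idea, in plain Python rather than Glue's PySpark: per-employee rows from the normalized tables are merged, then projected onto the output schema so that fields outside it (such as `ssn`) are dropped. All rows and table names here are hypothetical sample data.

```python
# Hypothetical rows standing in for the normalized RDS tables, keyed by emp_id.
employees = [{"emp_id": "e1", "fname": "Ada", "lname": "Lee", "ssn": "<encrypted>"}]
addresses = {"e1": {"city": "Austin", "state": "TX"}}
positions = {"e1": {"title": "Engineer", "cost_center": "CC-100"}}
salaries = {"e1": {"salary": 100000.0}}
bonuses = {"e1": {"bonus": 5000.0}}

# Field names taken from the employees output schema shown on this page.
OUTPUT_FIELDS = [
    "emp_id", "fname", "lname", "title", "city",
    "state", "salary", "bonus", "cost_center",
]


def flatten_employee(emp):
    """Join one employee with its related rows and keep only output-schema fields."""
    merged = dict(emp)
    for table in (addresses, positions, salaries, bonuses):
        merged.update(table.get(emp["emp_id"], {}))
    # Projecting onto the schema is what keeps ssn out of the result.
    return {field: merged.get(field) for field in OUTPUT_FIELDS}


flat_records = [flatten_employee(emp) for emp in employees]
```

Driving the projection from the schema, rather than deleting sensitive keys one by one, means a newly added sensitive column stays excluded by default.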

🔧

What's being simulated

This showcase uses synthetic data to demonstrate the pipeline. The Lambda producers generate realistic records — phone events with lat/long coordinates, PC events with browser types, employee records with relational lookups, and customer records with enrollment history.

The pipeline architecture, Parameter Store config pattern, Glue enrichment logic, and DynamoDB state management are all production-grade — the data source is simulated so the framework can be demonstrated without live RDS infrastructure cost.

⚙️

Config-Driven Architecture

Input and output schemas are stored as JSON in AWS Parameter Store — versioned, audited, and encrypted independently of code. The Lambda reads its schema path from an environment variable, loads the config at runtime, validates incoming records, and routes output — all without a single hardcoded field name.

Adding a new data source = add a Parameter Store entry + a new Kinesis trigger. Zero Lambda code changes.
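One way the runtime load could look, as a sketch under stated assumptions: `get_parameter(Name=..., WithDecryption=True)` is the boto3 SSM call, while the environment variable name `INPUT_SCHEMA_PATH` and the `FakeSSM` stand-in are hypothetical. It also assumes each parameter value is stored as a complete JSON object.

```python
import json
import os


def load_schema(ssm_client, env_var="INPUT_SCHEMA_PATH"):
    """Fetch and parse the schema whose Parameter Store path is held in env_var."""
    path = os.environ[env_var]
    response = ssm_client.get_parameter(Name=path, WithDecryption=True)
    return json.loads(response["Parameter"]["Value"])


class FakeSSM:
    """Local stand-in so the sketch runs without AWS credentials.

    In a deployed Lambda the client would be boto3.client("ssm").
    """

    def __init__(self, params):
        self._params = params

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": self._params[Name]}}


os.environ["INPUT_SCHEMA_PATH"] = "/music-pipeline/phone/input-schema"
fake_ssm = FakeSSM({
    "/music-pipeline/phone/input-schema":
        '{"user_id": "string", "lat": "float", "long": "float"}',
})
schema = load_schema(fake_ssm)
```

Under this pattern, pointing the environment variable at a different path is the only change needed to serve another source.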

📊

Data Science Use Case

The curated S3 output answers a real business question: "What is the #1 song by region, age, and city?" — by enriching each raw streaming event with resolved song name, artist, city, state, and user age at Glue transform time. The curated dataset is analytics-ready for ML models, BI tools, or direct Athena queries without any further joins.
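Since the curated records are already flat, the "#1 song" question reduces to a group-and-count. The sketch below shows that shape in plain Python over hypothetical sample records; in practice the same query would more likely be run in Athena or a BI tool.

```python
from collections import Counter, defaultdict


def top_song_by(records, *keys):
    """Return {group: (song_name, play_count)} for the most-played song per group."""
    counts = defaultdict(Counter)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        counts[group][rec["song_name"]] += 1
    return {group: c.most_common(1)[0] for group, c in counts.items()}


# Hypothetical curated records, illustrative only.
curated = [
    {"state": "TX", "city": "Austin", "user_age": 25, "song_name": "Song A"},
    {"state": "TX", "city": "Austin", "user_age": 31, "song_name": "Song A"},
    {"state": "TX", "city": "Austin", "user_age": 25, "song_name": "Song B"},
    {"state": "CO", "city": "Denver", "user_age": 40, "song_name": "Song B"},
]
top_by_city = top_song_by(curated, "state", "city")
# {("TX", "Austin"): ("Song A", 2), ("CO", "Denver"): ("Song B", 1)}
```

Passing `"user_age"` as an additional key groups by age as well; no join is needed because every field is already on the record.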

Config as infrastructure

Input and output schemas live in Parameter Store — not in code. The Lambda resolves its schema path from an environment variable, caches the config on warm invocations, and validates every record against it at runtime.
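The caching and validation described above might be sketched as follows. A module-level dictionary survives between warm invocations of the same Lambda container, so the loader runs only on a cold start. The names `get_schema`, `validate`, and `CHECKS` are hypothetical, and treating a `null` schema entry as "value must be null" is an assumed reading of the PC schema.

```python
from datetime import datetime

_SCHEMA_CACHE = {}  # module-level, so it persists across warm invocations


def get_schema(path, loader):
    """Return the cached schema for path, calling loader(path) only on a miss."""
    if path not in _SCHEMA_CACHE:
        _SCHEMA_CACHE[path] = loader(path)
    return _SCHEMA_CACHE[path]


def _is_timestamp(value):
    """True if value is an ISO 8601 string that datetime can parse."""
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# One check per type name used in the schemas on this page.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "timestamp": _is_timestamp,
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, type_name in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif type_name is None:
            # Assumption: a null schema entry means the value must be null.
            if record[field] is not None:
                errors.append(f"{field} must be null")
        elif not CHECKS[type_name](record[field]):
            errors.append(f"{field} is not a valid {type_name}")
    return errors
```

Returning a list of problems instead of raising on the first one makes it easier to log every defect in a bad record before routing it aside.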

// /music-pipeline/phone/input-schema
{ "user_id": "string", "lat": "float", "long": "float", "album_id": "string", "record_id": "string", "artist_id": "string", "datetime": "timestamp", "playlist_id": "string", "device_type": "string" }
// /music-pipeline/phone/output-schema
{ "user_id": "string", "city": "string", "state": "string", "song_name": "string", "artist_name": "string", "album_name": "string", "playlist_name": "string", "user_age": "int", "datetime": "timestamp", "device_type": "string" }
// /music-pipeline/pc/input-schema
{ "user_id": "string", "lat": null, "long": null, "album_id": "string", "record_id": "string", "artist_id": "string", "datetime": "timestamp", "playlist_id": "string", "browser_type": "string" }
// /music-pipeline/batch/employees/output-schema
{ "emp_id": "string", "fname": "string", "lname": "string", "title": "string", "city": "string", "state": "string", "salary": "float", "bonus": "float", "cost_center": "string" } // ssn excluded (KMS-encrypted)

Run the pipelines

Trigger individual pipelines or start all four simultaneously. Watch records flow through each stage in real time.

// LIVE DATA FLOW
PHONE STREAM: 📱 Producer (Lambda) → 📡 Kinesis Stream → λ Ingestor (SSM config) → 🪣 S3 raw/ (JSON) → 🗄️ DynamoDB (State) → 🔧 Glue (Enrich) → 🪣 S3 curated/ (Parquet)
PC STREAM: 💻 Producer (Lambda) → 📡 Kinesis Stream → λ Ingestor (SSM config) → 🪣 S3 raw/ (JSON) → 🗄️ DynamoDB (State) → 🔧 Glue (Enrich) → 🪣 S3 curated/ (Parquet)
BATCH EMP: 🗃️ RDS Employees → λ Extractor (SSM config) → 🪣 S3 raw/ (JSON) → 🔧 Glue (Flatten) → 🪣 S3 curated/ (Parquet) → 🗄️ DynamoDB (Store)
BATCH CUST: 🗃️ RDS Customers → λ Extractor (SSM config) → 🪣 S3 raw/ (JSON) → 🔧 Glue (Flatten) → 🪣 S3 curated/ (Parquet) → 🗄️ DynamoDB (Store)

Live data grids

Records populate as each pipeline runs. Scroll to explore the synthetic dataset.

📱 Phone Records (STREAM): User ID | Song | Artist | City | St | Device
💻 PC Records (STREAM): User ID | Song | Artist | City | Browser
👔 Employee Records (BATCH): Emp ID | Name | Title | City | St
👥 Customer Records (BATCH): User ID | Name | City | St | Enrolled