Portfolio Case Study

AI-Powered SEO Keyword Research Workflow System

Building a modular SEO research pipeline using n8n, OpenAI, DataForSEO, and Google Sheets.

This project started with a pretty simple goal: make keyword research less messy without building a system that becomes its own problem later. The result is a modular workflow that collects, cleans, and structures SEO data so it can be used downstream for strategy, content planning, or GPT-assisted analysis.

Tags: n8n, OpenAI, DataForSEO, Google Sheets, SEO Automation

1. The Problem

The research work was useful, but the process was messy

Manual keyword research tends to break down in very normal, very annoying ways. Data comes from too many places, formatting gets weird, and by the time you clean it up, you have already burned time you were trying to save.

It also gets harder to scale once the process depends too much on memory and cleanup habits. I wanted a workflow I could run repeatedly and trust, with output structured and clean enough to use later without another cleanup pass.

2. Project Goals

Build a collection system that stays useful as the work expands

Modular workflow system

Use smaller modules with clear jobs instead of one oversized workflow that gets annoying to maintain.

Reusable architecture

Build the flow once in a way that can support future research systems and content workflows.

Scalable keyword collection

Handle both direct seed keywords and wider niche exploration without rebuilding the system each time.

GPT-ready output

Produce structured data that is easy to work with later, not a stack of half-clean exports.

Support future SEO workflows

Leave room for clustering, prioritization, content planning, and brief generation later on.

Separate collection from strategy

Have the workflow gather and structure data only, and leave strategic decisions to a later layer, so the system stays flexible and easier to debug.

3. System Architecture

An orchestrator pattern with modular responsibility boundaries

The system uses an orchestrator pattern. One main workflow coordinates the run, and each major step is handed off to a dedicated module. That keeps the responsibilities clear, makes failures easier to isolate, and lets me improve one part without disturbing the rest of the workflow.

Figure: Architecture summary of the SEO keyword research workflow, showing the orchestrator pattern, tool stack, and strategic boundary between collection and analysis.

The real value here is separation of responsibilities. Input handling, seed generation, data retrieval, cleaning, export, and run summaries all live in their own layers. That makes the system easier to reuse and a lot easier to reason about when something changes.
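To make the pattern concrete, here is a minimal Python sketch of an orchestrator that hands state from one module to the next and names the module that failed. The real system is built from n8n workflows, so this is only an illustration: run_pipeline, normalize_step, and seed_step are hypothetical names, not parts of the actual build.

```python
from typing import Any, Callable

# Each module is a plain function: it takes the run state and returns an updated copy.
Module = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(run_input: dict[str, Any], modules: list[tuple[str, Module]]) -> dict[str, Any]:
    """Run each module in order and report which one failed, if any."""
    state = dict(run_input)
    for name, module in modules:
        try:
            state = module(state)
        except Exception as exc:
            # Naming the failing module is what makes a failure easy to isolate.
            raise RuntimeError(f"module '{name}' failed: {exc}") from exc
    return state


# Hypothetical stand-ins for the real n8n sub-workflows.
def normalize_step(state: dict[str, Any]) -> dict[str, Any]:
    return {**state, "query": state["query"].strip().lower()}


def seed_step(state: dict[str, Any]) -> dict[str, Any]:
    return {**state, "seeds": [state["query"]]}


result = run_pipeline(
    {"query": "  Home Espresso  "},
    [("normalize_step", normalize_step), ("seed_step", seed_step)],
)
print(result["seeds"])  # ['home espresso']
```

Because the orchestrator only knows module names and order, swapping or improving one module does not touch the others.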

4. Workflow Breakdown

Each module does one job and hands off cleanly to the next

Figure: The main orchestrator workflow inside n8n, showing how the modules connect in sequence.

Normalize Input

Every run starts by cleaning and standardizing the input. That includes limits, location, language, and input type, so the rest of the system is working with predictable values.
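As a rough sketch of what that normalization can look like, the Python below standardizes the four inputs named above. The field names, default values, and the limit cap are assumptions made for illustration, not the settings used in the actual workflow.

```python
# Assumed defaults and caps, chosen only for this example.
DEFAULT_LIMIT = 100
MAX_LIMIT = 1000
DEFAULT_LOCATION = "United States"
DEFAULT_LANGUAGE = "en"
ALLOWED_INPUT_TYPES = {"niche", "seed"}


def normalize_input(raw: dict) -> dict:
    """Return a predictable input record, or raise if the input is unusable."""
    input_type = str(raw.get("input_type", "seed")).strip().lower()
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unknown input_type: {input_type!r}")

    # Collapse repeated whitespace so "  home   espresso " becomes "home espresso".
    query = " ".join(str(raw.get("query") or "").split())
    if not query:
        raise ValueError("query is required")

    # Fall back to the default when the limit is missing or not a number,
    # then clamp it into the allowed range.
    try:
        limit = int(raw.get("limit", DEFAULT_LIMIT))
    except (TypeError, ValueError):
        limit = DEFAULT_LIMIT
    limit = max(1, min(limit, MAX_LIMIT))

    return {
        "input_type": input_type,
        "query": query,
        "limit": limit,
        "location": str(raw.get("location") or DEFAULT_LOCATION).strip(),
        "language": str(raw.get("language") or DEFAULT_LANGUAGE).strip().lower(),
    }


print(normalize_input({"query": "  Home   Espresso ", "limit": "5000", "input_type": "Niche"}))
```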

Generate Seed Keywords

If the input is a niche, OpenAI generates seed terms. If the input is already a seed keyword, the workflow uses it directly. That split keeps the system flexible, and the later stages receive a list of seeds either way.
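The branch can be sketched in a few lines of Python. The generator is passed in as a function so the example runs without an API key: fake_generate is a made-up stand-in for the OpenAI step, and the lowercasing and duplicate removal are assumptions about how seeds might be tidied, not confirmed details of the workflow.

```python
from typing import Callable


def resolve_seeds(input_type: str, query: str, generate: Callable[[str], list[str]]) -> list[str]:
    """Return a list of seed keywords regardless of which branch was taken."""
    if input_type == "niche":
        candidates = generate(query)  # in the real workflow this is the OpenAI step
    elif input_type == "seed":
        candidates = [query]  # a seed keyword is used directly
    else:
        raise ValueError(f"unknown input_type: {input_type!r}")

    # Assumed tidy-up: trim, lowercase, and drop repeats while keeping order.
    seen: set[str] = set()
    seeds: list[str] = []
    for candidate in candidates:
        cleaned = " ".join(candidate.split()).lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            seeds.append(cleaned)
    return seeds


def fake_generate(niche: str) -> list[str]:
    # Hypothetical stand-in that returns canned terms, including one repeat.
    return [f"{niche} tips", f"best {niche}", f"{niche} tips"]


print(resolve_seeds("niche", "home espresso", fake_generate))
print(resolve_seeds("seed", "Espresso Grinder", fake_generate))
```

Both calls return the same shape, a list of strings, which is why the retrieval stage does not need to know where the seeds came from.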

DataForSEO Retrieval

The workflow sends each seed through DataForSEO to pull real keyword metrics. This is where the process stops being conceptual and starts becoming a real dataset.

Cleaning and Deduplication

Returned rows are normalized, cast to consistent types, and deduplicated by normalized keyword values. Clean data is the difference between a workflow that scales and one that quietly creates more cleanup work for you later.
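A minimal Python sketch of that cleaning step, assuming rows arrive as dictionaries with keyword, search_volume, and cpc fields. The field names, and the choice to keep the first occurrence of a duplicate, are assumptions for illustration.

```python
from typing import Any, Optional


def normalize_keyword(keyword: str) -> str:
    """Lowercase and collapse whitespace so near-identical keywords compare equal."""
    return " ".join(keyword.split()).lower()


def to_int(value: Any) -> Optional[int]:
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None  # missing or unparseable values become empty, not zero


def to_float(value: Any) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize, type, and deduplicate rows by normalized keyword."""
    cleaned: dict[str, dict[str, Any]] = {}
    for row in rows:
        key = normalize_keyword(str(row.get("keyword") or ""))
        if not key or key in cleaned:
            continue  # skip blank keywords and keep the first copy of a duplicate
        cleaned[key] = {
            "keyword": key,
            "search_volume": to_int(row.get("search_volume")),
            "cpc": to_float(row.get("cpc")),
        }
    return list(cleaned.values())


raw_rows = [
    {"keyword": " Espresso  Machine", "search_volume": "1200", "cpc": "1.5"},
    {"keyword": "espresso machine", "search_volume": 900, "cpc": None},
    {"keyword": "", "search_volume": 1},
]
print(clean_rows(raw_rows))
```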

Google Sheets Export

Once the dataset is clean, the workflow appends the results to a structured Google Sheet. That gives the system a stable handoff layer for later analysis, prioritization, or content planning.

Run Summary

Each execution ends with a run summary so it is easier to review what happened. That helps with logging, QA, and general sanity when the workflow gets used repeatedly.
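One way to picture a run summary is as a small record built at the end of each execution. The Python sketch below uses hypothetical field names and example values. The actual summary may track different things.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_run_summary(
    run_input: dict[str, Any],
    seeds: list[str],
    rows_returned: int,
    rows_exported: int,
    started_at: datetime,
    finished_at: Optional[datetime] = None,
) -> dict[str, Any]:
    """Collect the numbers worth checking after a run (assumed fields)."""
    finished_at = finished_at or datetime.now(timezone.utc)
    return {
        "query": run_input.get("query"),
        "input_type": run_input.get("input_type"),
        "seed_count": len(seeds),
        "rows_returned": rows_returned,
        "rows_exported": rows_exported,
        # Rows removed by cleaning and deduplication.
        "rows_dropped": rows_returned - rows_exported,
        "started_at": started_at.isoformat(),
        "finished_at": finished_at.isoformat(),
        "duration_seconds": round((finished_at - started_at).total_seconds(), 2),
    }


# Example values only, not results from a real run.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc)
summary = build_run_summary(
    {"query": "home espresso", "input_type": "niche"},
    ["home espresso tips", "best home espresso"],
    rows_returned=180,
    rows_exported=150,
    started_at=start,
    finished_at=end,
)
print(summary)
```

A large gap between rows returned and rows exported is the kind of thing a summary like this makes visible during QA.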

Figure: Folder view showing the orchestrator and module workflows. The workflow folder structure reinforces the modular design. Each module can evolve without turning the orchestrator into a single point of confusion.

5. Data Output Structure

The output is designed for analysis, not just storage

The final sheet is structured so it can support later GPT analysis, human review, content planning, and prioritization. It is not just a raw dump. It is meant to be a usable working dataset.

Figure: Google Sheets output of the keyword dataset. The output layer includes keyword metrics, trend fields, language data, difficulty, and intent classification in a format that is easier to use downstream.

Keyword metrics

Search volume, CPC, competition, and top-of-page bid ranges.

Intent classification

Main intent fields make later filtering and content planning a lot easier.

Difficulty scoring

Keyword difficulty makes prioritization easier when the dataset grows.

Future GPT support

Clean structure makes later clustering, briefing, and analysis far more reliable.
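The field groups above could be captured as a fixed row schema, so every appended row has the same columns in the same order. The Python sketch below is an assumption about how such a schema might look: the column names are illustrative, trend fields are left out for brevity, and the actual append to Google Sheets is not shown.

```python
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class KeywordRow:
    # Column names are hypothetical; the real sheet may label them differently.
    keyword: str
    search_volume: Optional[int]
    cpc: Optional[float]
    competition: Optional[float]
    low_top_of_page_bid: Optional[float]
    high_top_of_page_bid: Optional[float]
    keyword_difficulty: Optional[int]
    main_intent: Optional[str]
    language: str


# The header row is derived from the schema, so columns and data cannot drift apart.
HEADER = [f.name for f in fields(KeywordRow)]


def to_sheet_row(row: KeywordRow) -> list:
    """Flatten a row into cell values, writing missing metrics as empty cells."""
    return ["" if value is None else value for value in astuple(row)]


example = KeywordRow(
    keyword="espresso machine",
    search_volume=1200,
    cpc=1.5,
    competition=0.8,
    low_top_of_page_bid=0.9,
    high_top_of_page_bid=None,
    keyword_difficulty=35,
    main_intent="commercial",
    language="en",
)
print(HEADER)
print(to_sheet_row(example))
```

Keeping one column order across runs is what lets a later GPT or clustering step read the sheet without guessing at its layout.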

6. Strategic Design Decisions

Why the modular system matters

Modularity is not just a technical preference here. It is what makes the workflow easier to scale, easier to reuse, and much easier to debug when something breaks or changes.

Separating strategic analysis from data collection was another deliberate choice. The collection workflow is responsible for producing clean, structured inputs. It does not try to pick the best keyword, choose a content strategy, or write the brief. That keeps the system more flexible and makes it easier to connect to future analysis layers.

Scalability

New modules can be added without turning the core workflow into a maintenance problem.

Easier debugging

When a module fails, the problem area is much easier to isolate.

Reusable systems approach

The same structure can support future SEO and research workflows without starting over.

7. Challenges and Lessons Learned

The main challenge was knowing when to stop automating

One of the bigger tensions in this project was balancing automation with strategic flexibility. It is very easy to keep stacking logic into a workflow because technically you can. That does not always make the workflow better.

The clearest lesson was that clean structure matters more than clever automation. If the data is inconsistent, the downstream work gets shakier. If the workflow tries to do too much, it gets harder to maintain and harder to trust.

The better path was to build a solid collection system first, keep the modules reusable, and leave room for human judgment where it still belongs.

8. Results and Impact

What the workflow now enables

Faster keyword research

The collection process is much quicker and more repeatable than manual assembly.

Scalable content planning

The output is structured well enough to support later filtering, planning, and clustering work.

Reusable SEO datasets

Each run produces a dataset that can be used again instead of a one-off export.

Future automation expansion

The architecture is ready for additional analysis layers without needing a rebuild later.

Portfolio-ready workflow thinking

This is a practical example of systems design, not just a prompt experiment with a good screenshot.

9. Future Expansion

Where this system could go next

  • Keyword clustering
  • Topical authority scoring
  • SERP analysis layers
  • Automated content brief generation
  • Internal linking suggestions
  • Integration with broader AI research systems

The current version is doing the right job for this stage. It collects and structures data well. The next step is using that data more intelligently without losing the clean boundaries that make the workflow useful in the first place.