← dannwaneri.com

Python SEO Audit Agent

A Python SEO audit agent that visits each URL in a real Chromium browser, extracts title, description, H1, and canonical via Claude API, checks for broken links asynchronously, and surfaces structural problems checklists miss — cannibalisation, orphan pages, position 9.5 queries with 0% CTR (click-through rate). Runs locally, fully open source.

Key Takeaways
  • Audits pages in a real Chromium browser — not a crawler, so JavaScript-rendered content is fully visible
  • Detects cannibalisation, orphan pages, and broken links automatically
  • Parses Google Search Console (GSC) exports to find quick-win queries at position 4–20 with low CTR
  • Tiered cost routing: rule-based checks first ($0), Claude Haiku second (~$0.0001/URL), Sonnet only when needed
  • Resumable — if it crashes at URL 47, it restarts at URL 48, not from zero
0.4→44 pass rate on own domains
pos 9.5 query found with 0% CTR
4 analysis modules
$0 cost on deterministic checks

On-page audit

Real browser visit per URL. Extracts title, description, H1, canonical, checks broken same-domain links. Resumable — crash at URL 47, restart at URL 48.

Backlink qualifier

Scores referring domains on niche relevance, traffic quality, and spam. Weighted formula. Tiers from Insert Worthy to Avoid. Flat JSON cache — never re-fetches a scored domain.

GSC insights

Parses a Google Search Console export. Flags quick wins (pos 4–20, impressions ≥50, CTR (click-through rate) <5%). Sends top rows to Claude Haiku with a prompt specifically asking for cannibalisation — the signal you can't see by sorting a spreadsheet.

Relevance scorer

Scores candidate pages as internal link sources for a target URL. Checks existing links first — never recommends what already exists. Tiers from Strong Link to Skip.

Cluster audit

Builds the full internal link graph, counts incoming links per page. Zero incoming = orphan. Maps topic clusters, finds missing hubs, generates a prioritised fix list.

Tiered cost routing

Rule-based Python checks first ($0), Claude Haiku second (~$0.0001/URL), Claude Sonnet only when needed (~$0.006/URL). A 20-URL audit at full Sonnet costs $0.12. With tiering, most sites cost under $0.02. The same Claude API integration powers my AI agent projects.

Ran it on a 5-page Nigerian creator site (naija-vpn.com). Pass rate went from 0.4% to 44% — the percentage of audited pages with no critical SEO issues. Manual review had missed all of these.

Title cannibalisation

The Carter Efe case study page had the same title and description as the homepage — copy-pasted verbatim. Google was indexing two pages with identical signals and picking one. The case study was losing.

Position 9.5 query — 0% CTR

`does twitch pay nigerians` — 29 impressions, position 9.5, zero clicks. The page was ranking but the title answered a different question. One rewrite. CTR check due 2026-05-25.

Two missing internal links

Carter Efe page and the Twitch guide both discussed Cleva vs Geegpay at length. Neither linked to the comparison page. Relevance scorer flagged both as Strong Link (≥75). Both added.

Python 3.11+ Claude Haiku Claude Sonnet Browser Use Playwright httpx Google Search Console CSV Flat JSON cache Markdown reports

How to create a SEO AI agent?

Use Playwright or Browser Use to visit pages in a real browser, send the HTML to Claude API for SEO signal extraction (title, description, H1 heading, canonical URL), check broken links asynchronously with httpx, and write results to JSON incrementally so the agent is resumable. Add HITL (human-in-the-loop) pauses for login walls and 404 pages. The full open-source implementation with Google Search Console analysis, backlink scoring, and cluster mapping is at github.com/dannwaneri/seo-agent.

Is Playwright better than Selenium?

For SEO agents and modern web automation, yes. Playwright has native async support, better wait strategies (networkidle, domcontentloaded), faster execution, and cleaner Python APIs. Selenium is more mature with wider language support, but Playwright handles dynamic JavaScript-heavy sites more reliably — which matters when you're waiting on rendered content before extracting SEO signals.

Can I use Playwright for browser automation?

Yes. Install with pip install playwright && playwright install chromium. Playwright can run Chromium, Firefox, or WebKit — headless or headful. For SEO agents, headful mode (visible browser) is useful: it bypasses some bot-detection and makes debugging straightforward. Playwright handles JavaScript rendering, authentication flows, cookies, and network interception out of the box.

How to create a Playwright agent?

Install Browser Use (pip install browser-use), which wraps Playwright with LLM-friendly abstractions, or use Playwright directly with async_playwright. Structure the agent as: navigate → wait for networkidle → extract DOM → send to Claude → persist result → next URL. Add HITL (human-in-the-loop) pauses for edge cases like login walls and 404s. The seo-agent repo has a complete working implementation.

The agent is fully open source — clone it, run it locally, keep all the output. If you're a Python developer comfortable with pip install and a terminal, you can run a full audit yourself in under an hour.

If you'd rather get a prioritised fix report without touching the code, hire me to run it on your site. I deliver a structured Markdown report: structural issues, cannibalisation flags, GSC quick-win queries, and missing internal link opportunities — ranked by impact. Built with the same Python and Claude API stack I use across all client projects.

Related services

AI Agents → Hire a Python Developer → Cloudflare Automation →

Need an SEO audit that finds what checklists miss?

I can run the full agent on your site and deliver a prioritised fix list — structural issues, cannibalisation, GSC quick wins, missing internal links. Or I can set it up to run on your own infrastructure.

Hire Me Clone It Yourself