My internal analytics showed 28 million page views over two months. 200,000 unique IPs per day. Traffic growing week over week. I was looking at these numbers thinking about conversion funnels, email capture, monetization strategies for all that organic traffic.

Then I opened Google Analytics.

The Numbers That Didn’t Add Up

Google Analytics showed 169K active users. Sounds consistent with my server-side data. But then:

  • Average engagement time: 0 seconds
  • Bounce rate: 99.4% on the top page
  • 169K new users = 169K active users — every single visitor came once and never returned
  • Top city: Lanzhou, China — 121,000 users

One city in northwestern China accounted for roughly 72% of all the users Google Analytics recorded (121K of 169K).

The session source breakdown confirmed it: 169K sessions from “(direct) / (none)” — bots hitting URLs directly, not coming from search engines. My actual Google organic traffic was 97 sessions per day. ChatGPT referrals: 34. GitHub: 20. Hacker News: 14.

My real traffic was roughly 200 visitors per day. Not 200,000.

What the Bots Were Doing

The bot traffic was hitting entity pages and episode transcript pages. High volume, zero engagement, zero interaction. They were using real browser user agents — my server-side bot detection (which checks for known bot strings like “Googlebot”, “python-requests”, “Scrapy”) didn’t catch them.
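
To make the blind spot concrete, here is a minimal sketch of a substring-based user agent check. The function name and marker list are illustrative assumptions, not the actual Audioscrape code; the point is only that a bot sending a normal Chrome user agent passes straight through.

```python
# Illustrative marker list; the real detection list is not shown in this post.
KNOWN_BOT_MARKERS = ("googlebot", "python-requests", "scrapy")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


# A self-identifying scraper is caught...
scraper_ua = "python-requests/2.31.0"
# ...but a bot that sends an ordinary Chrome user agent string is not.
spoofed_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(looks_like_bot(scraper_ua))
print(looks_like_bot(spoofed_ua))
```

Any check of this shape only catches bots that announce themselves.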

Looking at my page_views table by hour, the pattern was obvious in retrospect:

Jan 25 (peak): 12,000-23,000 views per hour
Jan 28:        14-1,189 views per hour

The traffic was bursty and concentrated during Chinese business hours. I just never looked at the hourly distribution.
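
Checking the hourly distribution takes a single query. The sketch below assumes a SQLite-style page_views table with a viewed_at timestamp column stored as text. The column names and sample rows are hypothetical, since the real schema isn’t shown here.

```python
import sqlite3


def hourly_views(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count page views per hour bucket, oldest bucket first."""
    rows = conn.execute(
        """
        SELECT strftime('%Y-%m-%d %H:00', viewed_at) AS hour_bucket,
               COUNT(*) AS views
        FROM page_views
        GROUP BY hour_bucket
        ORDER BY hour_bucket
        """
    ).fetchall()
    return [(bucket, views) for bucket, views in rows]


# Tiny in-memory demo with made-up rows, not real traffic data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (path TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("/entity/a", "2000-01-01 03:05:00"),
        ("/entity/b", "2000-01-01 03:17:30"),
        ("/transcript/c", "2000-01-01 03:59:59"),
        ("/transcript/d", "2000-01-01 04:00:01"),
    ],
)

for bucket, views in hourly_views(conn):
    print(bucket, views)
```

Bursts that line up with one time zone’s working hours, on a product aimed at a different audience, stand out quickly in a listing like this.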

How I Almost Made a Strategic Mistake

Here’s the part that bothers me. I was actively making product decisions based on this data. The internal discussion went something like:

“We have 150K daily unique visitors landing on transcript pages via Google. That’s an asset. We should build email capture, alert signups, convert that SEO traffic into users.”

This would have been weeks of work optimizing for an audience that doesn’t exist. Every A/B test would have shown noise. Every conversion rate would have been near zero — and I’d have blamed the funnel design, not the traffic quality.

Server-side analytics without bot filtering are worse than no analytics, because they give you confidence in wrong numbers.

The Fix

Audioscrape sits behind Cloudflare. The fix took two minutes:

  1. Cloudflare dashboard → Security → WAF → Custom rules
  2. Field: Country equals China
  3. Action: Managed Challenge (Cloudflare picks the challenge type per request, often a non-interactive browser check; real humans get through, most bots don’t)

Result: page views dropped from ~12,000/hour to near zero immediately. The trickle that remained was real human traffic, consistent with the ~200 real visitors per day that Google Analytics showed.

What I Learned

Cross-reference your analytics. If I’d compared my server-side page_views table with Google Analytics earlier, the discrepancy would have been obvious. Server-side tracking counts every request. GA’s JavaScript tag only fires in browsers that execute JavaScript and aren’t blocked by ad blockers. The gap between the two is mostly bot traffic, plus some real users behind ad blockers. It is also a lower bound: bots that run JavaScript, like the Lanzhou ones, still show up in GA, so check engagement there too.
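
The comparison itself is simple arithmetic. As a rough sketch (the function name is made up for illustration), this computes the share of server-side traffic that a trusted count can’t account for, using the ~200,000 daily unique IPs and ~200 real daily visitors from this post:

```python
def unexplained_share(server_count: int, trusted_count: int) -> float:
    """Fraction of server-side traffic not accounted for by the trusted count."""
    if server_count <= 0:
        return 0.0
    return max(server_count - trusted_count, 0) / server_count


# ~200,000 daily unique IPs server-side vs ~200 real visitors per day.
share = unexplained_share(200_000, 200)
print(f"{share:.1%} of server-side traffic is unexplained")
```

A gap of a few percent is normal. A gap this close to 100% means the server-side number is measuring something other than people.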

0 seconds engagement is the tell. Real users — even ones who bounce — spend a few seconds on the page. An entire population with exactly 0 seconds engagement time is not human.
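
This tell is easy to turn into an automatic check. The sketch below flags any large report segment with near-zero engagement and almost no returning users. The field names and thresholds are assumptions chosen for illustration, not GA’s export format or tuned values.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One row of an analytics report (field names are hypothetical)."""
    name: str
    users: int
    new_users: int
    avg_engagement_seconds: float


def looks_non_human(seg: Segment, min_users: int = 1_000,
                    max_engagement: float = 1.0,
                    min_new_share: float = 0.99) -> bool:
    """Flag large segments with ~0s engagement and almost no returning users."""
    if seg.users < min_users:
        return False  # too small to judge either way
    new_share = seg.new_users / seg.users
    return (seg.avg_engagement_seconds <= max_engagement
            and new_share >= min_new_share)


# Overall figures from the GA report described earlier in this post.
suspicious = Segment("all users", users=169_000, new_users=169_000,
                     avg_engagement_seconds=0.0)
# Made-up comparison row, purely for illustration.
plausible = Segment("example", users=5_000, new_users=3_500,
                    avg_engagement_seconds=42.0)

print(looks_non_human(suspicious))
print(looks_non_human(plausible))
```

Requiring both signals together keeps a legitimate spike of first-time visitors, who still spend a few seconds on the page, from being flagged.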

Don’t filter by user agent alone. These bots used real Chrome/Firefox user agent strings. My rate limiter had a suspicious-agent detection list (python-requests, Scrapy, Selenium, etc.) that caught nothing. Modern bot farms rotate real browser fingerprints.

The simplest geo-block is often enough. My product is an English-language podcast platform. I have zero legitimate users in China. A country-level managed challenge eliminated 99% of the fake traffic with no false positives. If your product has a clear geographic audience, this is the first thing to try.

Be skeptical of good numbers. 200K daily uniques for a niche B2B product with no marketing spend and no viral loop? I wanted it to be true, so I didn’t question it. The realistic baseline for an early-stage product with no paid acquisition is low hundreds, not hundreds of thousands.