IDEA: Selective HTML Meta Tag Performance (bot vs human)

I was reading the Performance section of the HTTP Archive Web Almanac and this quote got me thinking...

... CMS and front-end framework development on performance can significantly impact the user experience for the top 10M websites.

And by "CMS" we all know it's WordPress that's powering a big chunk of the web, and as a web developer who's built many sites on WordPress, I know that it can be performant and, out of the box, generally is. The issue with WordPress (and to a greater degree with PHP in general) is the way you use it.

WordPress's plugin framework makes it all too easy for site admins to add many plugins, many of which are built to quickly solve a visible problem... and not often coded with web performance at the forefront (both database performance, which slows TTFB, and the weight of the output HTML/CSS/JS/images).

So poor performance isn't directly the CMS's fault - a lot of this is due to the plugins and themes added to it - and to have the biggest impact there, you need to see what's heavily used across many sites. If you do a quick search for the "most popular WordPress plugin" you'll get many conflicting results, which all seem to have no usage facts behind their "top 10" counts. But one plugin on the podium in most of these lists is Yoast SEO. Self-claimed on its own home page to be the "#1 WordPress SEO plugin", it's fair to say that hundreds of thousands of sites are running this plugin.

Now, this is nothing against Yoast, and the performance suggestion I'm about to make isn't a problem with Yoast alone. It's fundamentally an SEO/social issue imposed/suggested by power-house third parties that adds weight to nearly every single HTML page on the internet while benefitting only a small fraction of bots (yes, "bots" - most humans get no direct gain from this added code).

Let's see some typical HTML output any web developer would be familiar with:

<title>HTML elements added to a typical snippet of a homepage</title>
<meta name='robots' content='index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1'/>
<meta name="description" content="A typical page description about company named XYZ Pty Ltd who specialise in nothing that special. Check us out!"/>
<link rel="canonical" href="https://www.domainname.com.au/"/>
<meta property="og:locale" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:title" content="HOME"/>
<meta property="og:description" content="A typical page description about company named XYZ Pty Ltd who specialise in nothing that special. Check us out!"/>
<meta property="og:url" content="https://www.domainname.com.au/"/>
<meta property="og:site_name" content="Company Name"/>
<meta property="article:publisher" content="https://www.facebook.com/domainnamePtyLtd"/>
<meta property="article:modified_time" content="2021-11-06T09:16:50+00:00"/>
<meta property="og:image" content="https://www.domainname.com.au/wp-content/uploads/2017/04/sydney-harbor-991981_1280.jpg"/>
<meta property="og:image:width" content="1280"/>
<meta property="og:image:height" content="960"/>
<meta name="twitter:card" content="summary_large_image"/>
<meta name="twitter:site" content="@domainname"/>
<meta name="twitter:label1" content="Est. reading time"/>
<meta name="twitter:data1" content="5 minutes"/>
<meta name="google-site-verification" content="id90zZCX95dykFhAP-n6MWAp8d-3Bxp0x4im8Eo2Ez4">
<meta property="fb:pages" content="14226545351,301962403206361,108010306045438,561117677302495,572719819512238,1379120195662587,739577862771770,1459378364304687,559217924137756,730765890315910">
<meta property="fb:app_id" content="278968609110838">
<script type="application/ld+json" class="yoast-schema-graph">{"@context":"https://schema.org","@graph":[{"@type":"Organization","@id":"https://www.domainname.com.au/#organization","name":"Company Name","url":"https://www.domainname.com.au/","sameAs":["https://www.facebook.com/domainnamePtyLtd","https://instagram.com/domainname/","https://www.linkedin.com/company/company-name-pty-ltd","https://www.youtube.com/user/domainnameAU/","https://twitter.com/domainname"],"logo":{"@type":"ImageObject","@id":"https://www.domainname.com.au/#logo","inLanguage":"en-AU","url":"https://www.domainname.com.au/wp-content/uploads/2018/11/avatar.png","contentUrl":"https://www.domainname.com.au/wp-content/uploads/2018/11/avatar.png","width":1200,"height":1200,"caption":"Company logo"},"image":{"@id":"https://www.domainname.com.au/#logo"}},{"@type":"WebSite","@id":"https://www.domainname.com.au/#website","url":"https://www.domainname.com.au/","name":"Company Website","description":"Home page description","publisher":{"@id":"https://www.domainname.com.au/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https://www.domainname.com.au/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-AU"},{"@type":"ImageObject","@id":"https://www.domainname.com.au/#primaryimage","inLanguage":"en-AU","url":"https://www.domainname.com.au/wp-content/uploads/2017/04/sydney-harbor-991981_1280.jpg","contentUrl":"https://www.domainname.com.au/wp-content/uploads/2017/04/sydney-harbor-991981_1280.jpg","width":1280,"height":960},{"@type":"WebPage","@id":"https://www.domainname.com.au/#webpage","url":"https://www.domainname.com.au/","name":"HTML elements added to a typical snippet of a homepage","isPartOf":{"@id":"https://www.domainname.com.au/#website"},"about":{"@id":"https://www.domainname.com.au/#organization"},"datePublished":"2014-11-21T01:51:32+00:00","dateModified":"2021-11-06T09:16:50+00:00","description":"A typical page description about company named XYZ Pty Ltd who specialise in nothing that special. Check us out!","breadcrumb":{"@id":"https://www.domainname.com.au/#breadcrumb"},"inLanguage":"en-AU","potentialAction":[{"@type":"ReadAction","target":["https://www.domainname.com.au/"]}]},{"@type":"BreadcrumbList","@id":"https://www.domainname.com.au/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home"}]}]}</script>

That's a pretty typical output, and it adds 4186 characters to the HTML. This is only for a single homepage; byte size generally increases deeper into a site because longer URI paths are repeated throughout these tags.

This data is amplified for each page: by default, WordPress isn't a Single Page App (SPA), meaning this overhead is added to every single HTML page of a WordPress website. Even for SPAs running a modern performance-oriented framework like Next.js, this bulk is added to every fresh request. That's a lot of data added to a massive majority of the most-visited [downloaded] pages on the internet.

Who benefits from this code?

While the code is "open", there are three networks whose names are, to varying degrees, painted on this mess: Facebook, Twitter and Google. And they aren't alone; Pinterest, Bing, etc. all have their own plea to site owners to add extra metadata to every HTML page out there.

The human site visitor gets no direct benefit from this extra code they are forced to download.

Suggested Solution

There are many ways around this, including reverting to the original way, where crawlers index the actual body content of the page (and weren't open to being gamed by being told one thing [in meta tags] while the content shows something else [ads]). In fact, if you don't add this meta code, most of these spiders fall back to doing just that (indexing the body content).

But those meta tags now have a purpose, so I suggest that this extra code fattening up the web could be selective in who sees it. I suggest WordPress SEO plugins (with their large user base) work with the indexing services to lock in their bots' user-agents and only output the extra cruft when a bot comes requesting it. When it's just Joe Blow in iOS Safari or desktop Chrome, this weight doesn't need to be sent over the wire, and the user's browser parser can get to the good parts of the HTML sooner (all this unnecessary weight blocks prime position in the DOM, up high at the start of the file, before the rendered goodness of the <body> content), so pages render faster.
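A minimal sketch of the idea in JavaScript - the bot patterns below are illustrative only; a real implementation would need a user-agent registry agreed with the indexing services:

```javascript
// Sketch: only emit the SEO/social meta block when the request comes
// from a known crawler user-agent. The pattern list is an assumption,
// not an agreed-upon registry.
const BOT_PATTERNS = [/Googlebot/i, /bingbot/i, /Twitterbot/i, /facebookexternalhit/i];

function isKnownBot(userAgent) {
  return BOT_PATTERNS.some((re) => re.test(userAgent || ''));
}

function renderHead(userAgent, { title, metaBlock }) {
  // Humans get the lean head; bots get the full metadata payload.
  const extras = isKnownBot(userAgent) ? metaBlock : '';
  return `<head><title>${title}</title>${extras}</head>`;
}

const meta = '<meta property="og:title" content="HOME"/>';
renderHead('Mozilla/5.0 (iPhone) Safari/604.1', { title: 'Home', metaBlock: meta }); // lean head
renderHead('Mozilla/5.0 (compatible; Googlebot/2.1)', { title: 'Home', metaBlock: meta }); // full head
```

The obvious weakness is that user-agents are spoofable, which is why the origin (or the CDN, below) would want to cross-check the claimed bot identity.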

Proxy-style caches like Cloudflare are better placed than most to handle this issue with an immediate response... they know who the bots are!

They could transparently strip this data from the origin's response for every human request while allowing the full HTML (or meta-only) to be served to the spiders. Cloudflare already cares a lot about over-the-wire performance (in many ways), and their initiative on Crawler Hints to reduce "random" crawling schedules has some similar goals to this idea.
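The stripping itself could be a simple transform on the response body. The sketch below uses regular expressions for brevity; a real edge deployment (e.g. a Cloudflare Worker) would use a streaming HTML rewriter rather than regex over the full document:

```javascript
// Sketch of edge-side stripping: remove bot-only metadata from an HTML
// response before it reaches a human visitor.
function stripBotMeta(html) {
  return html
    // og:*, twitter:*, article:* and fb:* meta tags
    .replace(/<meta\s+(?:property|name)=["'](?:og|twitter|article|fb):[^>]*>\s*/gi, '')
    // JSON-LD structured-data blocks
    .replace(/<script[^>]*type=["']application\/ld\+json["'][^>]*>[\s\S]*?<\/script>\s*/gi, '');
}
```

Applied only when the request doesn't come from a known bot, this leaves the spiders' view of the page untouched while every human download gets the lean version.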

Aren't you meant to give bots the same content that users see?

It's already considered best practice to change layouts and optimise media (images/video) based on device viewport size, processor, memory and network performance; the web experience is not the same for everyone - so why not remove the bot-specific content for non-bot clients?

Could it happen?

I've singled out Yoast and Cloudflare as they each "control" about 30% of the web - that's not to say they see 60% of the web combined, but even still, changes at their level could see reductions of many gigabytes per second being sent around the web - and that has massive flow-on effects (fewer cables, less storage, less processing, lower costs, less power, less carbon).

Both services also have such large coverage that, through their network effect, they would already "know" these bots and could value-add when they see them crawling, and reduce the bloat for everyone else.

Yoast and Cloudflare already have relationships with Google (and likely the other spider owners), so they could work with them to implement this without disruption - but it's something you (as a webmaster) could try today. In Google's case, the bot's IPs are public.
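Because Google publishes Googlebot's IP ranges, a webmaster could gate the meta output on source IP rather than the spoofable user-agent header. A minimal IPv4 CIDR check, using one range historically associated with Googlebot as an example (ranges change, so a real setup would fetch Google's published list rather than hard-code it):

```javascript
// Check whether an IPv4 address falls inside a CIDR range, so a claimed
// bot user-agent can be cross-checked against the source IP.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) + Number(octet)) >>> 0, 0);
}

function ipInCidr(ip, cidr) {
  const [network, bitsStr] = cidr.split('/');
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(network) & mask) >>> 0);
}

// 66.249.64.0/19 is one range historically used by Googlebot; fetch the
// current list from Google's published JSON feed in practice.
ipInCidr('66.249.66.1', '66.249.64.0/19'); // true
ipInCidr('203.0.113.7', '66.249.64.0/19'); // false
```

Google also documents reverse-DNS verification of Googlebot, which achieves the same goal without maintaining a range list.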

Cover image by Noah Buscher on Unsplash