Why semantics are the future of paid search

Dr. Riza C. Berkan
It's no secret that efficiently monetizing the long tail and improving contextual advertising relevancy rates comprise the most lucrative prospects in paid-search advertising. The long tail is a big beast corresponding to a large chunk of all searches: a Google spokesperson stated that "20 to 25 percent of the queries we see today, we have never seen before," while Jim Lanzone, former CEO of Ask.com, told the Associated Press last year, "On any given day, 60 percent of the search requests we get, we have never seen before."

For starters, let's analyze long tail advertising. To reach the audiences encapsulated in the long tail, we must move beyond decade-old popularity systems. Search monetization platforms suffer from the inherent statistical limitations of popularity systems and are not equipped with the technology to serve the long tail: today about 30 percent of searches drive 70 percent of search revenue. Semantic technology will lift this industry barrier and expand the paid search advertising market's monetization potential by serving meaningful ads in the long tail.

Take your own website, for example. To see what long tail queries look like, you do not necessarily have to review the search logs. You can simply look at the word distributions on your own website. For example, if you have a page that includes 200 words describing your product or idea, then you can ask this question: "How many possible questions or queries does this page answer?" If you are not a trained linguist, you will probably underestimate the number. Since we are search engine experts, we can tell you that the number of possible queries can be in the thousands for a page with just 200 significant words.

Those thousands of queries cover all possible meaningful variations of the ideas you have presented on your webpage. However, if you use a linear system with no semantics involved, the number of word permutations or combinations can be in the billions. Thus, being able to extract only the meaningful ones is the heart of the technological problem.
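To get a feel for that gap, here is a back-of-the-envelope sketch in Python. The 200-word vocabulary comes from the example above; the four-word query length is purely an assumption chosen for illustration, not a figure taken from any search log.

```python
import math

VOCABULARY_SIZE = 200  # significant words on the page (from the example above)
QUERY_LENGTH = 4       # assumed query length, for illustration only

# Ordered sequences of four distinct words (word order matters).
ordered = math.perm(VOCABULARY_SIZE, QUERY_LENGTH)

# Unordered sets of four distinct words (word order ignored).
unordered = math.comb(VOCABULARY_SIZE, QUERY_LENGTH)

print(f"{ordered:,} ordered four-word sequences")   # 1,552,438,800
print(f"{unordered:,} unordered four-word sets")    # 64,684,950
```

Even at four words, a blind enumeration is already past a billion ordered sequences, and only a few thousand of them would be queries a person might actually type.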

It is infeasible to extract thousands of queries manually for every webpage you have, not to mention creating an advertising campaign by bidding on these terms one by one. So, here is another definition of the long tail for you: the queries that you are not extracting from your webpages are in fact the true long tail queries. The ones that you are extracting are the fat tail, popular and short queries, and most likely, you are using these in your advertising campaign today.

For example, if you type "Celebrex" into Google, you see an ad for the product, as shown below.

The advertiser made a bid for the term "Celebrex," and they may have also bid for "arthritis," "arthritis pain," "arthritis treatment," and so on. But, as shown below, they have not made a bid for the query "What can ease arthritis pain, stiffness and inflammation?"

There is no ad for this query because there are zillions of possible queries for which the advertiser did not have the means to create campaigns. The cost of extracting them manually and bidding on them one by one far exceeds the benefits. Therefore, long tail queries are not monetized successfully in today's systems. The next challenge in online advertising is to take advantage of this sleeping giant.

To tackle the long tail monetization problem, semantic search engines need two important technologies: 

  1. A method of indexing that extracts all possible meaningful questions from a given page.

  2. An automated long tail campaign so advertisers do not have to do it manually.

Let's expand on the example shown above. By indexing Celebrex's webpages and creating a long tail campaign, the query "What can ease arthritis pain, stiffness and inflammation?" could serve an ad touting "Celebrex for Arthritis Relief."

For Celebrex, there are thousands of queries that can be extracted by indexing, including a vast number of semantic variations. An advertiser in a semantic search engine's system can create a long tail campaign by simply pointing to the URLs for their products and making one bid for the entire campaign. To simplify the economics, the advertiser pays only for the successful long tail queries.

Furthermore, user-friendly keyword or phrase generation tools will create new room for bidding on longer, meaningful keyword combinations. Easy access to these new ad triggers will increase the size of the paid search advertising pie.

Let us move along to a contextual advertising discussion. Current statistical search algorithms have no capability to "understand" the content of a webpage. Contextual ads are matched by brute calculation of keyword occurrences and other ad success metrics. For example, take a news article that describes a serial killer who placed bodies in suitcases. If the article goes into detail on how the suitcases were disposed of, a text ad for Samsonite luggage can appear as a contextual match (this is a real ad placement at CNN.com). In comparison, semantic technology would know the article is about a murder and would display only relevant ads (or none at all, because any ad may be inappropriate). This level of precision will open a new chapter in contextual advertising, as increased relevancy will lead to better monetization of publishers' content.
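The suitcase mismatch is easy to reproduce with a toy keyword counter. The sketch below is only an illustration of "brute calculation of keyword occurrences"; the article text, the two ads, and the `keyword_score` function are all made up for this example and do not represent how any real ad platform scores pages.

```python
import re
from collections import Counter


def keyword_score(page_text, ad_keywords):
    """Count how many times any of the ad's keywords occurs on the page."""
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    return sum(words[keyword] for keyword in ad_keywords)


# Hypothetical article text, written for this example.
article = (
    "Police say the killer placed the bodies in suitcases. "
    "One suitcase was left near the river; a second suitcase was found later."
)

# Hypothetical ads, each described only by the keywords it was bid on.
ads = {
    "luggage ad": {"suitcase", "suitcases", "luggage"},
    "garden ad": {"garden", "lawn", "mower"},
}

scores = {name: keyword_score(article, keywords) for name, keywords in ads.items()}
best = max(scores, key=scores.get)

print(scores)
print("Ad selected by keyword count:", best)
```

The counter picks the luggage ad because the word "suitcase" keeps appearing; nothing in the calculation registers that the page is about a murder.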

There is a lot of room for improvement in current paid search advertising systems. Semantic technologies will enhance the relevancy, reach and usability of today's systems, thereby increasing the amount of potential revenue for the entire industry.

Dr. Riza C. Berkan is CEO of hakia.com.


Commenter: Neil Stumpie

2008, November 06

Your point is well-taken. So if I run a campaign with lots of long-tail phrases, let's say on Adwords, do you suggest using phrase match or exact match? The choice has to be made.

Commenter: Eddy Gonzalez

2008, November 06

Very interesting article and very relevant to my experience. I have a constant battle over whether to add lots of long tail keywords to my campaigns. History has shown that they have phenomenal CTR; the problem is that their impressions are so low that I have to weigh up whether having them in the campaign is worth the management time.
If only there were a system in Adwords/Google Search where I could target products & services without the chore of managing 100s of ad groups with just 2/3 keywords in them!

Commenter: Bruce McDermott

2008, November 04

Excellent article! This is exactly what I've been saying for a long time. Do you want a million hits and one sale (ppc and heavily hit generic words with no conversion) or 500 hits and 200 sales (organic search phrases that exactly match the buyer's intent when he has his wallet out)?

Now all you have to do is write about the next important revelation. Selecting the client that is perfect for SEO rather than letting clients select you. Yes, there are perfect clients for SEO, and knowing which ones they are can make or break your business.

Commenter: Dr. Riza Berkan

2008, November 04

Broad match can help to a degree, but it is an uncontrollable action. If I define "suitcase" in broad match, my ads could appear for "suitcase bomber". Broad match (which is the OR logic in Boolean algebra) is the least accurate method; many advertisers think twice before using it. Securing top positions with broad match by bidding is also a bottomless pit since everyone can do it. I am sure a company like Celebrex would be smart enough to do it as you suggested. Thinking beyond the obvious is the challenge. Creating a campaign using a number of meaningful variations can only be done by semantics, not by the OR logic.

Commenter: Neil Stumpie

2008, November 04

But you fail to mention broad match options in PPC campaigns. In your example, the query is "What can ease arthritis pain, stiffness and inflammation?" In Google Adwords, which is the platform you use in the example, if the advertiser had bid on the keywords "arthritis pain" using broad match, their ad for Celebrex may well have been shown. I believe Yahoo! and Live have similar technology. I find it very strange you did not cover broad matching in an article on this topic. It undercuts much of what you say.