When you really look at it, search query language is pretty odd. Imagine holding a conversation in keywords while you were looking for an apartment. You'd walk up to the man in the first-floor apartment and say, "Two-bedroom sublet NYC." The super would answer "http colon slash slash dub-dub-dub dot 2-bedroom sublet NYC arm-and-a-leg dot com."
What if you could deliver search results based not on exactly matching words or phrases, but instead on context and relevance? What if the right site were delivered to the right user, even if he or she misspelled a word, or didn't even pick the right keyword or phrase?
That sure would get us past the monkey-see, monkey-do era of search. You know, press the bar; get the cheese. Pick the right word; get the right page. Most of the time, this kind of search works pretty well, but it's an odd, non-intuitive process for many people. In the future, the monkeys will be gone.
The big brains out there are asking: "Why do we need keywords at all?" If we used what the linguists call "natural language," we'd ask a question in simple (or even complex) terms, and not have to get obsessive about guessing the right code. Search engines would "get it," even though you didn't guess how an SEO genius might have guessed what you would probably type when you were looking for, let's say, preventable childhood diseases.
That's the thinking behind Latent Semantic Indexing (LSI). LSI is a statistical method that compares the relationships between words and passages of text in order to estimate how closely they are related in meaning. It's a way to arrive at context and meaning, without a one-to-one match.
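The math behind this is pretty full on. It's rife with matrices of words and context, squeezed down into what they call "dimensions," typically a few hundred of them. Keep too few and distinct meanings blur together; keep too many and you're back to plain word matching. Get the number right, and you get a better simulation of human contextual evaluation.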
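For the curious, here is a rough sketch of that math in Python with NumPy. It is not how any search engine actually does it; the five-document corpus is made up, and the choice of two dimensions is an assumption that only makes sense for a toy this small. It builds a word-by-document matrix, factors it with a singular value decomposition, keeps the two strongest dimensions, and compares two documents that share no words at all.

```python
import numpy as np

# Made-up toy corpus: three documents about cars, two about fruit.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car dealer",
    "banana fruit salad",
    "banana fruit smoothie",
]

# Word-by-document count matrix: one row per word, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
counts = np.array(
    [[doc.split().count(word) for doc in docs] for word in vocab], dtype=float
)

# Factor the matrix, then keep only the k strongest "dimensions".
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2  # assumption: two topics are enough for this toy corpus
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# "automobile engine repair" and "car dealer" have no word in common.
raw_similarity = cosine(counts[:, 1], counts[:, 2])
lsi_similarity = cosine(doc_vectors[1], doc_vectors[2])

print("word matching:", raw_similarity)
print("reduced space:", lsi_similarity)
```

Word matching sees nothing in common between the two documents. In the reduced space they land close together, because "car" and "automobile" both keep company with "engine" and "repair" elsewhere in the corpus.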
Outside the search world, LSI is used in educational environments to summarize materials, to select appropriate texts for learners based on varying levels of background knowledge, and even to score the content of essays and other texts.
In the search space, LSI might be applied as an extension to current lexical, semantic or contextual methods, enabling an algorithm to determine that a document is the most relevant match for a key phrase, even if that phrase does not appear in that document at all. It is considered an improvement over the limitations of word-based matching using keyword density and repetition.
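To make that concrete, here is a minimal sketch of a query matching a document that never contains the query word. The function name lsi_rank, the toy corpus, and the two-dimension setting are all assumptions for illustration, and a real engine would work at a vastly larger scale with weighting schemes not shown here. The query is "folded in" to the reduced space the standard way, by multiplying it through the word side of the decomposition and dividing by the singular values.

```python
import numpy as np


def lsi_rank(query, docs, k=2):
    """Rank docs against query by cosine similarity in a k-dimensional LSI space.

    Hypothetical helper for illustration; returns (score, document) pairs,
    best match first.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    counts = np.array(
        [[d.split().count(w) for d in docs] for w in vocab], dtype=float
    )
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k].T  # one k-dimensional row per document

    # Fold the query into the same space: q_hat = S_k^-1 U_k^T q
    q = np.array([query.split().count(w) for w in vocab], dtype=float)
    q_hat = (U_k.T @ q) / s_k

    scores = []
    for doc, vec in zip(docs, doc_vecs):
        denom = np.linalg.norm(q_hat) * np.linalg.norm(vec)
        score = float(q_hat @ vec / denom) if denom else 0.0
        scores.append((score, doc))
    return sorted(scores, reverse=True)


# Made-up toy corpus: three documents about cars, two about fruit.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car dealer",
    "banana fruit salad",
    "banana fruit smoothie",
]

ranking = lsi_rank("automobile", docs)
for score, doc in ranking:
    print(f"{score:+.2f}  {doc}")
```

In this toy setup, "car dealer" scores alongside the other car documents for the query "automobile," even though the word appears nowhere in it, while the fruit documents score near zero.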
Why is LSI considered to be an improvement over word-based methods? Because word-based methods deliver irrelevant results along with relevant ones, and even relevant results won't be delivered without the right keyword on both the user and web page sides of the equation. LSI solves the basic problem of monkey see, monkey don't.
For example, cdc.gov may be THE authority on preventable childhood diseases, and therefore the best result for someone searching for "preventing palio [sic]". With today's lexical index, cdc.gov must have a page on its site that anticipates that phrase and spelling mistake. With LSI, cdc.gov would simply need to be the authority on pediatric health. The right page would be delivered, regardless of the presence or absence of the phrase on the site.
Recent index/relevance shakeups have suggested that Google might have increased the weighting of LSI in its algorithm. I can't imagine that many of the opportunists who drop the phrase either know what it means or have the tools and skills necessary to optimize for LSI; certainly not the authors of some of the SEO spam email I have been receiving.
In fact, the very point of LSI might be to put an end to the practice of "stealing eyeballs": fooling search engines into delivering nefarious goods to completely unrelated searches. A good example is the phenomenon of parked domains: URLs that don't go to a legitimate website, but to a site advertising domain registration or other unrelated services.
But that brings up a future-shock issue for those of us who've gotten pretty good at figuring out how to make the artificial rules of SEO get better results for clients. Namely, won't LSI put the very idea of search engine optimization out of business?
The good news is that this is a dynamic industry, full of fertile and daily re-invention. In an LSI utopia, the best SEO vendor stops being a keyword jockey, and becomes a content consultant, helping clients to prove that they're authorities on their chosen subjects.
Our company has always been focused on removing blockers to good, relevant content. We think it's a practice that happens to work well under both LSI and lexical algorithms.
Plus, if you're doing it right, you're already focused on a website that proves you're the best in class, so your site is already rewarded with natural search traffic and satisfied users.
Then maybe we can all start using natural language in our search queries instead of monkey talk. Let's face it, finding an apartment or a cure for the collywobbles is hard enough, even when you're both speaking the same language.
The future is all semantics. From where we're standing, that's all good.
Matt Kain is SVP, business development for 24/7 Real Media, Inc.