When you really look at it, search query language is pretty odd. Imagine the conversation in keywords, if you were looking for an apartment. You'd walk up to the man in the first floor apartment and say, "Two-bedroom sublet NYC." The super would answer "http colon slash slash dub-dub-dub dot 2-bedroom sublet NYC arm-and-a-leg dot com."
What if you could deliver search results based not on exactly matching words or phrases, but instead on context and relevance? What if the right site were delivered to the right user, even if he or she misspelled a word, or didn't even pick the right keyword or phrase?
That sure would get us past the monkey-see, monkey-do era of search. You know, press the bar; get the cheese. Pick the right word; get the right page. Most of the time, this kind of search works pretty well, but it's an odd, non-intuitive process for many people. In the future, the monkeys will be gone.
The big brains out there are asking: "Why do we need keywords at all?" If we used what the linguists call "natural language," we'd ask a question in simple (or even complex) terms, and not have to get obsessive about guessing the right code. Search engines would "get it," even though you didn't guess how an SEO genius might have guessed what you would probably type when you were looking for, let's say, preventable childhood diseases.
That's the thinking behind Latent Semantic Indexing (LSI). LSI is a statistical method of comparing the relationships between words and passages of text in order to work out what the content is actually about. It's a way to arrive at context and meaning, without a one-to-one match.
The math behind this is pretty full on. It's rife with matrices of words and context, boiled down to a chosen number of what they call "dimensions": pick too few and distinct topics blur together; pick too many and you drift back toward plain word matching. Get the number right and you have a fair simulation of human contextual evaluation.
Outside the search world, LSI is used in educational environments to summarize materials, to select appropriate texts for learners based on varying levels of background knowledge, and even to score the content of essays and other texts.
In the search space, LSI might be applied as an extension to current lexical, semantic or contextual methods, enabling an algorithm to determine that a document is the most relevant match for a key phrase, even if that phrase does not appear in that document at all. It is considered an improvement over the limitations of word-based matching using keyword density and repetition.
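To make that claim concrete, here is a minimal sketch of the idea on a made-up three-page corpus. Everything in it is an assumption for illustration: the page names, the texts, the function names, and the use of simple power iteration as a stand-in for a proper SVD routine. It is not how any particular search engine works; it only shows a page scoring well for a word it never uses.

```python
import math

# Toy corpus (hypothetical): two health pages and one housing page.
DOCS = {
    "health_a": "polio vaccine",
    "health_b": "vaccine immunization",
    "housing": "apartment rent",
}

def term_doc_matrix(docs):
    """Return (terms, names, A) where A[i][j] counts term i in document j."""
    names = list(docs)
    terms = sorted({w for text in docs.values() for w in text.split()})
    A = [[docs[n].split().count(t) for n in names] for t in terms]
    return terms, names, A

def top_eigenpairs(M, k, iters=200):
    """Top-k eigenpairs of a small symmetric matrix (power iteration + deflation)."""
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed, slightly uneven start vector
        for _step in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lsi_scores(docs, query, k=2):
    """Cosine similarity between the query and each document in a k-dimension concept space."""
    terms, names, A = term_doc_matrix(docs)
    n, m = len(names), len(terms)
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    G = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    q = [query.split().count(t) for t in terms]
    doc_vecs = [[] for _ in names]
    q_vec = []
    for lam, v in top_eigenpairs(G, k):
        sigma = math.sqrt(lam)
        # Left singular vector: one "concept" axis in term space.
        u = [sum(A[t][j] * v[j] for j in range(n)) / sigma for t in range(m)]
        q_vec.append(sum(u[t] * q[t] for t in range(m)))  # project the query
        for j in range(n):
            doc_vecs[j].append(sigma * v[j])  # projection of document j
    return {name: cosine(q_vec, doc_vecs[j]) for j, name in enumerate(names)}

scores = lsi_scores(DOCS, "polio", k=2)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this toy setup, "health_b" never mentions polio, yet it lands right next to "health_a" for that query because both pages share the word "vaccine" and so collapse onto the same concept axis. The housing page scores near zero.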
Why is LSI considered to be an improvement over word-based methods? Because word-based methods deliver irrelevant along with relevant results, and even relevant results won't be delivered without the right keyword on both the user and web page side of the equation. LSI solves the basic problem of monkey see, monkey don't.
For example, cdc.gov may be THE authority on preventable childhood diseases, and therefore the best result for someone searching for "preventing palio [sic]". With today's lexical index, cdc.gov must have a page on its site that anticipates that phrase and spelling mistake. With LSI, cdc.gov would simply need to be the authority on pediatric health. The right page would be delivered, regardless of the presence or absence of the phrase on the site.
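The lexical half of that example is easy to sketch. The snippet below is a bare-bones word index with strict all-words matching; the page URLs and texts are invented for illustration, and real engines layer spelling correction and much else on top. It only shows why an exact-match index comes back empty on a typo.

```python
from collections import defaultdict

# Hypothetical pages; the text is made up for illustration.
PAGES = {
    "cdc.gov/polio": "preventing polio with childhood vaccination",
    "example.com/flats": "two bedroom sublet nyc",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lexical_search(index, query):
    """Return pages containing every query word (strict AND matching)."""
    words = query.lower().split()
    if not words:
        return set()
    results = [index.get(w, set()) for w in words]
    return set.intersection(*results)

index = build_index(PAGES)
print(lexical_search(index, "preventing polio"))  # {'cdc.gov/polio'}
print(lexical_search(index, "preventing palio"))  # set()
```

One wrong letter and the best page on the subject simply isn't in the running.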
Recent index/relevance shakeups have suggested that Google might have increased the weighting of LSI in its algorithm. I can't imagine that many of the opportunists who drop the phrase either know what it means or have the tools and skills necessary to optimize for LSI; certainly not the authors of some of the SEO spam email I have been receiving.
In fact, the very point of LSI might be to put an end to the practice of "stealing eyeballs": fooling search engines into delivering nefarious goods to completely unrelated searches. A good example is the phenomenon of parked domains: URLs that don't go to a legitimate website, but to a site advertising domain registration or other unrelated services.
But that brings up a future-shock issue for those of us who've gotten pretty good at figuring out how to make the artificial rules of SEO get better results for clients. Namely, won't LSI put the very idea of search engine optimization out of business?
The good news is that this is a dynamic industry, full of fertile and daily re-invention. In an LSI utopia, the best SEO vendor stops being a keyword jockey, and becomes a content consultant, helping clients to prove that they're authorities on their chosen subjects.
Our company has always been focused on removing blockers to good, relevant content. We think it's a practice that happens to work well under both LSI and lexical algorithms.
Plus, if you're doing it right, you're already focused on a website that proves you're the best in class, so your site is already rewarded with natural search traffic and satisfied users.
Then maybe we can all start using natural language in our search queries instead of monkey talk. Let's face it, finding an apartment or a cure for the collywobbles is hard enough, even when you're both speaking the same language.
The future is all semantics. From where we're standing, that's all good.
Matt Kain is SVP, business development for 24/7 Real Media, Inc.
Now.. match Chaos Theory and LSI with ORM and KBO.. and you have a solution for brands.. =)
Hi Daniel - boy you're bringing an old article back to life! I wrote this almost 2 years ago!
Interesting question. There are lots of SEM/SEO tools which will identify and weight words on a page with any sort of density, but I am not aware of a public tool which will link secondary keywords semantically to the main page theme (i.e. not just how many times a keyword is used, but how it reinforces the primary keyword or theme).
This functionality probably exists in a few proprietary platforms (for example at The Search Agency, www.thesearchagency.com, we continue to evolve our proprietary Keyword Analysis Tool with lexical and semantic capabilities in multiple languages, and I can't imagine we're alone), but I'm not aware of a publicly available solution.
Alternatively, if you aren't afraid of a bit of work, there are excellent academic tools available, which could be applied to HTML with a bit of offline or desktop jiggery-pokery. For example, Semantic Knowledge's Tropes Zoom (http://www.semantic-knowledge.com/zoom.htm) is a useful document indexing and analysis tool which could index a folder of webpages on your computer. Not a natural part of the SEO workflow though - you want to point a tool at a URL and get back information.
Anyone else out there know of a public semantic SEO tool?
Hi, good article here. Is there a keyword tool you recommend that provides the "side-words" in addition to word counts as part of the tool? Thanks, Dan http://linkvanareviews.com
I agree with you about the importance of good relevant content and I also agree that most people don't know what LSI is or what it means.
I wrote a simple explanation of Latent Semantic Indexing without the college degree math normally required to understand the subject and a follow up post on the LSI myth.
It is really strange that some people believe that search engines use LSI because the primitive semantic component actually used is far more interesting.
It will be interesting to see if new user-generated portals such as Cha-Cha and Mahalo will be able to leverage user search queries to provide more valuable SERPs, in an effort to compete for search traffic. Given each search engine's challenge of capitalizing both on traffic through relevant results and on premier placement for deep-pocket advertisers, I do feel that intentions remain fair in this space. Simply put, we all want the right answers to our queries, and those who can figure out how to produce these results reliably and consistently will win.
Excellent article. I hope the change won't be too long in coming, and it may help a lot more people who are not too techie.
Interesting concept. In a fast food world with people demanding they do less and less to get more and more, I'd venture that there would be at least a couple of years of balking at having to type entire sentences instead of a couple of key words. Ultimately it makes sense, but only if the process works accurately.
I completely agree that such a shift would lead those of us in the SEO arena to focus more on content and authority. Surely there's got to be a better gauge than how many other web sites link to yours or generating hundreds of in-site pages to flood the search engines!
Where it will inevitably break down, I am sure, is that at least some optimizers will focus on nefarious black hat ways of making their client sites authority destinations - just as hackers continually find new ways to exploit software - evolution of solutions goes hand in hand with evolution of methods to take advantage of weaknesses.
Wouldn't it be great if it were possible to write naturally flowing, good quality content for your blog/site and be found by web browsers when they typed in a search term?
I think LSI would be wonderful for small businesses, e.g. my travel business. If I wrote an in-depth, well-researched guide to a destination in Europe, I'd have a fair chance of coming up high in a search. At present, a pretty sketchy basic guide to the same destination, doctored by a Search Engine Optimisation (SEO) expert and posted on the blog/site of a large travel company that is bidding on the keywords for that destination, has a much better chance of being found by a prospective traveller carrying out a search. With LSI, I could be rewarded for providing a high quality destination guide, and the searcher would be able to easily find my superior guide. So it would be a win-win situation for the hardworking small business and the traveller.
Well, I can dream. I have a suspicion that the techie experts will be able to find various methods of manipulating LSI.
I agree with Michelle: SEO is still a must even if LSI models are implemented in the search engines. If you really look, everything we say and think, in every corner of this world, needs to be optimized. I strongly believe LSI and SEO will work best hand in hand when it comes to "relevancies".
I think you make a good point, but I don't believe LSI will mean the end of SEO as we know it. People will always try the quick and easy phraseology before getting more in-depth, and will not be satisfied with what the internet deems as "THE authority" on any given subject as their first or only option. Given the idiosyncrasies of SEO and analytics, what people really want varies from day to day, hour to hour, and keyword to keyword. Does the need for LSI exist? YES! But the traditional SEO mechanisms do give us our answers, even if we have to click past page 1. We are trained monkeys, after all!