2/12/2008

Fuzzy Set Theory & Semantic Connectivity

Semantic connectivity is used by search engines to build their own 'Thesaurus' and 'Dictionary', which helps them determine how terms & topics are related. By scanning their massive databases of web content, they can use Fuzzy Set Theory and certain equations to connect terms and begin to understand web pages/sites more like a human does.

The professional SEO does not necessarily need semantic connectivity measurement tools in order to optimize, but for advanced SEOs who seek every advantage, semantic connectivity measurements can help in each of the following areas:

  • Measuring which keyword phrases to target
  • Measuring which keyword phrases to include on a page about a certain topic
  • Measuring the relationships of text on other high-ranking sites/pages
  • Finding pages that provide 'relevant' themed links

Although the source for this material is highly technical, SEO specialists need only know the principles to take away valuable information. Keep in mind that although the world of IR (Information Retrieval, aka search) has hundreds of technical, often difficult-to-comprehend terms, they can be broken down and understood even by an SEO novice.

First, we must define 'fuzzy logic' in comparison to other types of searches. The following list explains some common types of searches in the IR field:

Proximity searches:
A proximity search uses the order and closeness of the search terms to find related documents. For example, when you search for "Sweet German Mustard" in quotes, you are asking only for documents that contain that exact phrase. If the quotes are removed, the proximity of the search terms still matters to the search engine, but it will now also return documents that don't exactly match the order of the search phrase, such as "Sweet Mustard - German".
Fuzzy logic:
Fuzzy logic technically refers to logic in which a statement is not categorically true or false but can be true to a degree. A common example is whether a day is sunny: is a day with 50% cloud cover sunny? In search, fuzzy logic is often used to handle misspellings.
Boolean searches:
These are searches that use Boolean operators such as AND, OR and NOT. This type of logic is used to broaden or restrict the set of documents returned in a search.
Term Weighting:
This type of search weights particular terms more heavily than others in order to produce better search results. A manual version of this type of search would let the user specify the weight of each term, e.g. fruity:4 pebbles:1, so that the results returned favor 'fruity' over 'pebbles'.
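The four search types above can be compared side by side on a toy index. This is a minimal Python sketch under stated assumptions: the five documents, the function names, the `term:weight` query syntax and the 0.8 similarity cutoff are all hypothetical, and the misspelling matcher uses Python's standard `difflib` module, which is not known to be what any search engine uses. The unquoted search here simply ignores word order; a real engine would still reward terms that appear close together.

```python
import difflib
import re

# Hypothetical toy index; a real engine searches billions of pages.
DOCS = {
    1: "Sweet German Mustard with honey",
    2: "Sweet Mustard - German style, imported",
    3: "German beer and pretzels",
    4: "Fruity Pebbles cereal with extra fruity flavour",
    5: "Pebbles and stones for the garden",
}


def tokenize(text):
    """Lower-case the text and return its words in order."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_search(phrase):
    """Quoted search: the words must appear together and in this order."""
    target = tokenize(phrase)
    n = len(target)
    hits = []
    for doc_id, text in DOCS.items():
        words = tokenize(text)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.append(doc_id)
    return hits


def unordered_search(query):
    """Unquoted search (simplified): every word must appear, in any order."""
    terms = set(tokenize(query))
    return [d for d, text in DOCS.items() if terms <= set(tokenize(text))]


def boolean_search(all_of=(), any_of=(), none_of=()):
    """AND every term in all_of, OR across any_of, NOT any term in none_of."""
    hits = []
    for doc_id, text in DOCS.items():
        words = set(tokenize(text))
        if (all(t in words for t in all_of)
                and (not any_of or any(t in words for t in any_of))
                and not any(t in words for t in none_of)):
            hits.append(doc_id)
    return hits


def weighted_search(query):
    """Parse 'term:weight' pairs and rank documents by weighted term counts."""
    weights = {}
    for part in query.split():
        term, _, weight = part.partition(":")
        weights[term.lower()] = float(weight or 1)  # no ':w' means weight 1
    scored = []
    for doc_id, text in DOCS.items():
        words = tokenize(text)
        score = sum(w * words.count(t) for t, w in weights.items())
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Every distinct word in the index, used to correct misspellings.
VOCABULARY = sorted({w for text in DOCS.values() for w in tokenize(text)})


def fuzzy_search(query, cutoff=0.8):
    """Replace each word with its closest indexed word, then search."""
    corrected = []
    for word in tokenize(query):
        match = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return corrected, unordered_search(" ".join(corrected))


if __name__ == "__main__":
    print(phrase_search("Sweet German Mustard"))                   # [1]
    print(unordered_search("Sweet German Mustard"))                # [1, 2]
    print(boolean_search(all_of=["german"], none_of=["mustard"]))  # [3]
    print(weighted_search("fruity:4 pebbles:1"))                   # [(4, 9.0), (5, 1.0)]
    print(fuzzy_search("germn mustrd"))  # (['german', 'mustard'], [1, 2])
```

Note how the quoted search finds only document 1, while the unquoted one also returns the reordered "Sweet Mustard - German" page, matching the proximity example above.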

Fuzzy Set Theory (introduced by Lotfi Zadeh in 1965; fuzzy logic was later developed from it) is used by IR models (search engines) to discover the semantic connectivity between two words. Rather than using a thesaurus or dictionary to try to reason out whether two words are related to each other, an IR system can use its massive database of content to work out the relationships.
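Zadeh's core idea fits in a few lines of Python: a fuzzy set maps each element to a degree of membership between 0 and 1, and his standard operators take the minimum for intersection, the maximum for union and one minus the degree for complement. This is a minimal sketch; the membership degrees, the `sunny` thresholds (20% and 80% cloud cover) and all names are hypothetical, chosen only to echo the examples in this post.

```python
# Hand-picked, hypothetical membership degrees (0 = not a member, 1 = fully a member).
FRUIT = {"orange": 1.0, "banana": 1.0, "apple": 1.0, "ball": 0.0}
ROUND = {"orange": 0.9, "banana": 0.1, "apple": 0.8, "ball": 1.0}


def fuzzy_and(a, b):
    """Zadeh intersection: the smaller of the two membership degrees."""
    return {x: min(a[x], b[x]) for x in a if x in b}


def fuzzy_or(a, b):
    """Zadeh union: the larger of the two membership degrees."""
    return {x: max(a[x], b[x]) for x in a if x in b}


def fuzzy_not(a):
    """Zadeh complement: one minus the membership degree."""
    return {x: 1.0 - mu for x, mu in a.items()}


def sunny(cloud_cover_pct):
    """Hypothetical membership function for 'sunny day' by cloud cover (%).

    Fully sunny at or below 20% cover, not sunny at or above 80%,
    and linear in between, so 50% cover is sunny to degree 0.5.
    """
    if cloud_cover_pct <= 20:
        return 1.0
    if cloud_cover_pct >= 80:
        return 0.0
    return (80 - cloud_cover_pct) / 60


if __name__ == "__main__":
    print(sunny(50))                # 0.5
    print(fuzzy_and(FRUIT, ROUND))  # {'orange': 0.9, 'banana': 0.1, 'apple': 0.8, 'ball': 0.0}
```

A crisp (classical) set would force the 50% day to be either sunny or not sunny; the fuzzy version keeps the in-between answer, which is also what lets a system say that an orange is a "round fruit" to a much higher degree than a banana.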

This sounds extremely complicated, but the foundations are simple. Search engines rely on machine logic (True/False, Yes/No, etc.), and machine logic has no built-in way of thinking like a human: Orange & Banana are both fruits, but Orange & Banana are not both round. To a human this is intuitive, or if it is not, it can be explained. For a machine to understand this concept and pick up on others like it, semantic connectivity can be the key. The massive body of human knowledge on the web can be captured in the system's index and analyzed to artificially reconstruct the relationships humans have made. Thus, a machine can learn that an orange is round and a banana is not by scanning thousands of occurrences of 'banana' and 'orange' in its index and noting that 'round' and 'banana' rarely co-occur, while 'orange' and 'round' often do.
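The orange/banana reasoning can be reproduced with simple document counts. The equations the original post pointed to are not reproduced here, so this Python sketch assumes one common co-occurrence measure, a Jaccard-style index c = n12 / (n1 + n2 - n12), where n1 and n2 are the number of documents containing each term and n12 the number containing both. The six-document index and the function names are hypothetical stand-ins for a search engine's crawl.

```python
import re

# Hypothetical mini-index standing in for a search engine's crawl.
DOCS = [
    "orange is a round citrus fruit",
    "peel the orange and eat the round segments",
    "a banana is a long yellow fruit",
    "slice the banana into a bowl",
    "the round ball bounced",
    "orange juice and banana smoothie",
]
# Each document reduced to the set of words it contains.
INDEX = [set(re.findall(r"[a-z]+", doc.lower())) for doc in DOCS]


def doc_count(*terms):
    """Number of documents that contain every one of the given terms."""
    return sum(1 for words in INDEX if all(t in words for t in terms))


def c_index(term1, term2):
    """Assumed Jaccard-style co-occurrence index n12 / (n1 + n2 - n12), in [0, 1]."""
    n1, n2, n12 = doc_count(term1), doc_count(term2), doc_count(term1, term2)
    union = n1 + n2 - n12
    return n12 / union if union else 0.0


def related_to(seed, candidates):
    """Fuzzy set of candidate terms: membership = co-occurrence with the seed."""
    return {term: c_index(seed, term) for term in candidates}


if __name__ == "__main__":
    print(c_index("orange", "round"))  # 0.5
    print(c_index("banana", "round"))  # 0.0
    print(related_to("fruit", ["orange", "banana", "ball"]))
    # {'orange': 0.25, 'banana': 0.25, 'ball': 0.0}
```

Reading the index as a degree of membership is what ties co-occurrence back to Fuzzy Set Theory: `related_to` returns a fuzzy set in which each term belongs to "related to the seed" to the degree that it appears alongside it, with no thesaurus involved.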

This is where 'fuzzy logic' comes into play: Fuzzy Set Theory helps the computer understand how terms are related simply by measuring how often, and in what context, they are used together.

For an SEO, this shows how search engines can recognize the connections between words, phrases and ideas on the web. As semantic connectivity becomes a bigger part of search engine algorithms, we can expect greater theming of pages, sites & links. Going forward, it will be important to recognize that search engines can pick up on ideas & themes and identify content, links & pages that don't fit well into the scheme of a website.

For more information, see:

From SeoMoz
