12/15/2007

Duplicate Content Issues

Moderated by Danny Sullivan – Search Engine Watch and organizer of the conference. Moved into large room…filling up nicely. Introduces topic and first speaker.

Anne Kennedy – Beyond Ink. “Double Trouble - How to avoid duplicate content penalties.” What it is, why it is a problem, how to spot it, and how to fix it. Will also focus on “inadvertent” duplicate content.

What is dupe content: multiple URLs with the same content…identical homepages w/same content. Why is it a problem? Because “they” say so. Recommends looking at the webmaster guidelines at G, Open Directory, and Yahoo. The real reason this is a problem is that you wind up confusing the SE robots.

Mirror sites: 1 website, 2 domains. Shows example of rsdfoundation.org. Somebody in the academic CPU center decided that, “since SEs like .edu domains,” they should put the content live on the University of Florida site. Confusing the bot: 2 URLs. Other sites link to multiple root domains, so inbound links point to different domains for the same site. Describes a client of hers whose real domain and “canonical” domain differed, causing the whole site to not be listed.

Confusing the bot: dynamic URLs. As robots crawl dynamic content, the site may return a different URL with the same content…this is also a problem. Use the “repeat the search with the omitted results included” feature to see this happening with some websites. Recommends using robots.txt exclusion and 301 redirects. 301 redirects are “your hero”: server-side redirects to a single canonical domain. Test the page to make sure it works, and ensure you use 301s instead of 302s. Find code for this at beyondink.com/301redirect. You can also contact Google, using “reinclusion request” in the subject line, to get help.
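A rough sketch of the “single canonical domain” idea, in Python. This is not the code from the speaker’s site; the host name `www.example.com` and the function `canonical_redirect` are made up for illustration. It only decides what a server should answer: a 301 pointing at the same path on the canonical host, or nothing if the request is already on that host.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute the one domain you want indexed.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) when url is on another host, else None."""
    parts = urlsplit(url)
    if parts.hostname == canonical_host:
        return None  # already canonical, serve the page normally
    # Keep the path and query so deep links land on the matching page.
    target = urlunsplit(
        (parts.scheme or "http", canonical_host, parts.path or "/", parts.query, "")
    )
    return 301, target


print(canonical_redirect("http://example.com/widgets?id=7"))
print(canonical_redirect("http://www.example.com/widgets?id=7"))
```

The status is hard-coded to 301 (permanent) rather than 302 (temporary), which is the distinction the speaker stressed. Ports and mixed-case paths are ignored here to keep the sketch short.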

Shari Thurow – Grantastic Designs
Will speak about the ways some SEs filter out dupe content. These include, but are not limited to: content properties, linkage properties, content evolution, etc…see below.
Content properties: SE looks for unique content by removing “boilerplate” such as navigation areas, etc. and analyzing the “good stuff.”

Dupe content filters: linkage properties. SEs look at inbound and outbound links to determine whether it is dupe content spam. They can determine that it isn’t by seeing that the linkage properties are different for each site.

Content evolution: in general, 65% of websites will not change info on a daily basis. 0.8% of web content will change completely on a weekly basis, such as a news site.

Host name resolution: a domain name, an IP address, and a host name are 3 different things. Used example of the host name origin.bmw.com. Talks about one method of attempting to spam that can be caught because the sites all resolve to the same host name.

Lastly, shingle comparison: every document has a unique fingerprint. SEs break it down into a set of word patterns to determine if the content is duplicate. Recommends reading anything by Andrei Broder about shingles. With the sample site, 3 pages with unique URLs have the same word sets on each page. This is not dupe content spam, though. (sorry missed the reason for this)
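To make the shingle idea concrete, here is a minimal Python sketch of word shingling with a set-overlap (Jaccard) score. It illustrates the general technique only and is not what any search engine actually runs; the shingle width of 4 words and the function names `shingles` and `resemblance` are assumptions for the example.

```python
def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short documents yield a single shingle, or none when empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(doc_a, doc_b, k=4):
    """Share of shingles common to both documents, from 0.0 to 1.0."""
    set_a, set_b = shingles(doc_a, k), shingles(doc_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty documents are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


page = "a rose is a rose is a rose"
print(len(shingles(page)))       # number of distinct 4-word shingles
print(resemblance(page, page))   # a page compared with itself
```

A score near 1.0 means two pages share almost all of their word patterns; where to set the “duplicate” cut-off is a judgment call that this sketch leaves open.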

If you are sharing content across a network/multiple publications, the fix is to use the robots exclusion protocol on the pages that duplicate the “main page.” PDFs are another type of duplicate content. Use the robots.txt file to exclude one of the versions. Some dupe content is considered spam because the SEs only allow 2 pages per site per SERP. Thus additional content will end up in the supplemental results. If you know your network is going to deliver dupe content, don’t let the SEs decide what will be presented in the SERPs. Instead, use 301s and robots.txt.

Jake Baillie – True Local
“Dr. Phil on Duplicate Content.” Why does it happen? Top 6 dupe content mistakes: circular navigation. Print-friendly pages. Inconsistent linking. Product only pages. Transparent serving. Bad cloaking.

Circular navigation: causes multiple paths through a website. Fix: define a consistent method of addressing a page of content, i.e. brand to category to content, or brand to content to category, etc. This is irrespective of navigation path. If you are breadcrumbing, track paths through cookies.

Print-friendly pages: all print-friendly pages are a different design with the same content. Fix: block SEs from print-friendly pages.

Link not working for you any more: calling directory index pages by different paths such as /directory, /directory/, and /directory/index.asp. Fix: make sure you reference pages consistently. To avoid problems with external links, pick a canonical form and 301 redirect all others to the chosen version. Takes six months to “get back” from this.
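One way to picture “pick a canonical form,” sketched in Python. The trailing-slash form is chosen arbitrarily here, the list of index file names is an assumption, and `canonical_path` is a made-up helper. Treating any last segment without a dot as a directory is a rough heuristic, not a rule.

```python
import posixpath

# Assumed index file names; adjust to whatever the server really uses.
INDEX_NAMES = {"index.asp", "index.html", "index.htm", "index.php"}


def canonical_path(path):
    """Map /directory, /directory/ and /directory/index.asp to /directory/."""
    head, tail = posixpath.split(path)
    if tail.lower() in INDEX_NAMES:
        # Drop the index file name and keep only the directory part.
        path, tail = head, ""
    # Heuristic: a last segment without a dot is treated as a directory.
    if not path.endswith("/") and "." not in tail:
        path += "/"
    return path


for variant in ("/directory", "/directory/", "/directory/index.asp"):
    print(variant, "->", canonical_path(variant))
```

Any request whose path differs from `canonical_path(path)` would then get a 301 to the canonical version, and internal links should use that version directly.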

Product pages with nothing differentiating them from other pages: bad, bad, bad…add new content.

Not good to be transparent: badly implemented rewrite code, DNS errors with multiple domains, poorly implemented cloaking/session ID removal code. Fixes: domains should be redirected to the main site, not DNS aliased. Pick a canonical form to access content and stay with it. Has seen many “incomplete” mod rewrites that allow the old page to continue to be referenced.

If the suit doesn’t fit, don’t wear it. Poorly implemented cloaking scripts serve the same doorway page over and over again. Fixes: don’t use cloaking scripts you didn’t write. Make sure your cloaking script is returning separate content for each URL being cloaked. (Lots of laughs during this part between him and Matt Cutts) The same content should never be accessible from different URLs…ever!

Rajat Mukherjee – Yahoo. Informal remarks. Glad to be here. A few comments: in general, try not to make the same content available through multiple URLs. He says SEs are not vindictive folks; Matt does snoop around and take pictures every once in a while (laughs). Rather than looking for ways to demote content, we are trying to find the right content to promote. Whenever possible, try to avoid it. You may want to create a new version of a site…be extra certain that robots don’t crawl both versions. Remember that independent of the size of the index, there will always be capacity constraints.

Matt Cutts – Google. Not prepared, but informal remarks. High-order bits: what do people worry about? He often finds that honest webmasters worry about dupe content when they don’t need to. G always tries to return the “best” version of a page. Some people are less conscientious. One person claimed he was having problems with dupe content and not appearing in either G or Y. Turns out he had 2500 domains. A lot of people ask about articles split into parts and then printable versions. Do not worry about G penalizing for this. Different top-level domains: if you own a .com and a .fr, for example, don’t worry about dupe content in this case. General rule of thumb: think of SEs as a sort of hyperactive 4-year-old kid that is smart in some ways and not so in others: use the KISS rule and keep it simple. Pick a preferred host and stick with it…such as domain.com or www.domain.com.

Make sure you are consistent in your linking, because inconsistency will cause problems for robots. Use absolute links, since they don’t usually get rewritten by scrapers. Speaking of which…make sure you have a copyright notice at the bottom of each page. Thinks you should do this as a blogger too. They have been trying to produce better ways to figure out these kinds of things, and some of this “picking the right host” framework is in the new Big Daddy data center. Also recommends using the Sitemaps tool to help diagnose and debug content. Sitemaps has a tool where you can take robots.txt “out for a test drive.” How would the Googlebot really respond to this? It will tell you specific things that will be disallowed.
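For a rough local version of that “test drive,” Python’s standard library ships a robots.txt parser. The sketch below uses it to list which URLs a given set of rules would block. The rules, the URLs, and the helper name `blocked_urls` are invented for the example, and the standard-library parser is only an approximation: the real Googlebot may interpret some rules differently.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules that keep crawlers out of print-friendly and PDF copies.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /pdf/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


urls = [
    "http://www.example.com/articles/story.html",
    "http://www.example.com/print/story.html",
    "http://www.example.com/pdf/story.pdf",
]
for url in blocked_urls(ROBOTS_TXT, urls):
    print("disallowed:", url)
```

Running the duplicate versions of a page through a check like this before deploying robots.txt helps confirm that only the copy you want indexed stays crawlable.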

Q&A

First Danny…going back to feeding content. How can you ensure your page will be treated as the original page and thus the displayed one? Rajat: we are trying hard to determine what the original page is, by using shingling and other techniques to determine if the content is altered. Matt: has heard more people are concerned about this. Asks how many have had content stolen: lots of hands. 3 methods of copying someone else’s content: 1. Stealing from a search engine (copying directly from results). 2. Outright copying of a webpage. Usually the lifetime of that is relatively long. 3. RSS scraping…this is more difficult to catch, since content can be copied much more quickly than by scraping a webpage. If it is always you that is getting ripped off, he says, that is actually a point in your favor. They can try to see who wrote stuff historically…how much you have been copied from, and how much of other people’s stuff you copy.


Someone asks about having a hundred directory-type sites that all use the same instructions for adding content: will this trigger duplicate content? Make sure that there is “real content” on each site. He would recommend using one domain to host the directions. Say “we are part of this network so go here for instructions.” Matt adds that diversity is very useful.

Using a hidden DIV…what is the policy on hidden links and JavaScript? Matt: in general hidden links are a bad thing. The content should be of use to a visitor, and so the link should be visible. Re: JavaScript, it can also be misused to try and cheat, so be careful. SEs are getting smarter about JS; a lot of times simple heuristics can do the work. Rajat adds: make sure that your intent is clear, and finishes with “so cloaking is bad.” (Lots of laughs) Jake adds that if you have an Ajax application where each page gets different content, serve a cloaked page to the SEs and the Ajax to the users. Hide the Ajax interface from the SEs, and keep the content on the page (styling it out if needed). Matt says “NO…we will care, and it can get you banned if you are cloaking.” He recommends that if you have a weird site menu and “all sorts of Ajax,” you use the sitemap to serve the content!

Didn’t really get the whole question, but Matt answers “there is nothing wrong with creating a template, but if you aren’t adding useful content it’s going to end up in the ghetto/bad neighborhood with lots of other ‘useless’ sites.” Rajat makes what he says is a philosophical comment: SEs are still in their infancy, and while certain limitations re: Ajax etc. may exist today, the SEs will be improving here.

If I have five paragraphs on a page, and two are available on other sites, is this dupe? Rule of thumb: ask someone who has no association with you to look at the two pages and say what they feel. Kind of like the “grandma test.” Someone says “would you have your grandma look at your herbal Viagra site?” (laughs…this is from a comment made earlier about herbal Viagra) If lots of content is copied, then it looks more like a lower-value site.

As great as this session is…catch the next conference and you’ll get the rest of the Q&A.

This is part of the Search Engine Roundtable Blog coverage of the New York Search Engine Strategies Conference and Expo 2006. For other SES topics covered, please visit the Roundtable SES NYC 2006 category archives.
