Archive for June 27th, 2009:
Getting Google Crawl Without Indexing?
Very often we have to deal with (partial) duplicate content issues created by a site’s CMS, like:
domain.com/category/
domain.com/category/1
Sorting:
domain.com/category/pricing-high-low/
domain.com/category/pricing-low-high/
Blocking these pages with a robots.txt Disallow directive will prevent bots from crawling them, but is it perhaps worth letting bots spider the pages without indexing them? This may help a lot with discovering more inner pages.
The possible ways to do that (especially now that we don’t have to worry much about PR leakage): Read more »
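As an illustration only (the author’s own list sits behind the link above and is not reproduced here), one widely documented mechanism for "crawl but don’t index" is the robots meta tag with the value noindex, follow. The sketch below assumes a hypothetical helper, robots_meta, and URL patterns guessed from the example paths in this post; a real CMS would need its own rules.

```python
import re

# Assumed patterns for the CMS-generated variants shown in the post:
# numbered pages (/category/1) and sort orders (/category/pricing-high-low/).
DUPLICATE_VARIANT = re.compile(r"^/category/(\d+|pricing-(high-low|low-high))/?$")


def robots_meta(path: str) -> str:
    """Return the robots meta tag a template could emit for this URL path.

    Variant pages get "noindex, follow": bots may crawl them and follow
    their links, but the pages themselves are kept out of the index.
    """
    if DUPLICATE_VARIANT.match(path):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


if __name__ == "__main__":
    for p in ("/category/", "/category/1", "/category/pricing-low-high/"):
        print(p, "->", robots_meta(p))
```

Unlike a robots.txt Disallow, this leaves the page crawlable, so links to inner pages can still be discovered.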
The Internal Battle For SERP Supremacy
In my last article, 8 Key Points to Multiple niche Sites And Controlling Back Links, I cautioned that you need to be careful that a niche focus site might overtake the main site in the SERPs for a particular phrase. One question that came up in the comments of that article was how to prevent this. That led me to thinking about situations in the internal battle for SERP supremacy where you wouldn’t want to prevent that from happening…
CAVEAT: If you’ve read my blog articles by now, you know that I have a “take no prisoners” approach to my SEO work. To me, it’s about pulling out all the stops, as long as they’re white hat or, at the very least, only scrape the edge of the gray hat world. The reason for this is simple. I am all about winning the SERP wars, and have had the good fortune of several clients willing to pay for the best long-term results their marketing dollars can buy. If you prefer your reading to be light, or you don’t have the fortitude it takes to do battle for SERP supremacy, you may prefer to check out The Top Ten Reasons Your SEO Sucks by Garrett Pierson, where he succinctly points out what you may not otherwise be willing to accept.
It’s all about inbound links – Or is it?
In many situations, all you want to do is drive the authority status of the main site higher. In this scenario, you want to have as many high quality niche sites as possible out there, yet you don’t want any of them overtaking the main site in the SERPs. This is usually the case from a marketing perspective where the main site’s all about the most important keyword phrases. As much as you need to drive the independent ranking of those niche sites, if you don’t want them showing up above the main site, you will need to put a great deal of time into ensuring that you only optimize those sites for phrases that you are not also optimizing the main site for.

Friends with benefits
As experts in our industry, we know how to choose a plethora of keyword phrases. Regardless of the research method, we come up with 500 phrases that offer the most value for every site we work on. And as experts, we often take advantage of longer phrases that contain shorter, even more valuable phrases within them.
For example – if my client sells Widgets, and their most popular products are Big Blue Widgets, then maybe the phrase we optimize for is Big Blue Widgets.
Well, just by the sheer fact that we’re optimizing for that three word phrase, we’re also optimizing, to one degree or another, for Blue Widgets and just any old Widgets. So that one phrase is a three-fer. That’s a serious benefit of going with multi-word phrases.
Of course, the competition for just any old Widgets is huge. Infinitely more challenging than it is for Blue Widgets because only 40% of the Widget sites out there even carry Blue Widgets. And too, only 10% of the sites that sell Blue Widgets carry Big Blue Widgets. So it’s much easier to break out of the pack for the longer phrase.
Yet with enough time and leverage, we can eventually gain ground for all three.
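To make the "three-fer" and the competition numbers concrete, here is a minimal sketch. The helper name tail_phrases is hypothetical, and the percentages are simply the illustrative figures quoted above, not measured data.

```python
def tail_phrases(phrase: str) -> list[str]:
    """List the phrase plus every shorter phrase that ends it."""
    words = phrase.split()
    return [" ".join(words[i:]) for i in range(len(words))]


# Illustrative figures from the text: 40% of Widget sites carry Blue Widgets,
# and 10% of those carry Big Blue Widgets.
blue_pct_of_widget_sites = 40
big_blue_pct_of_blue_sites = 10
big_blue_pct_of_widget_sites = blue_pct_of_widget_sites * big_blue_pct_of_blue_sites / 100

if __name__ == "__main__":
    print(tail_phrases("Big Blue Widgets"))
    # ['Big Blue Widgets', 'Blue Widgets', 'Widgets']
    print(big_blue_pct_of_widget_sites)
    # 4.0
```

In other words, under those assumed figures only about 4% of all Widget sites compete for the longest phrase, which is why it is the easiest of the three to break out on.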
So what’s the problem?
The problem here is that we have to come up with phrases for those niche sites using that same thinking. For example, you may actually optimize the main site for just Widgets or even Blue Widgets, leaving the niche site to be optimized for Big Blue Widgets, and another niche site for Plump Purple Widgets or Small Salmon Widgets.
Therein lies the problem
Let’s say your site is massive – 40,000 pages deep. And you’ve got 5,000 high quality links back – mostly coming from your 287 niche sites. Your PR is a comfortable 9. You’re sitting on top of the world, because you’re in the #1 and #2 position (with SiteLinks of course), for the phrase Widgets. And for Blue Widgets, you’re in the respectable 5th position. Even though you only have one small page that talks about those Blue Widgets, the reason it’s so high up in the SERPs is all that supporting depth and on-site cross-linking.
And you also have a niche site for those Big Blue Widgets. But until now, that site’s been stuck on the top of page 2 at Google. Which is okay, because that means you’ve built the site enough that it comes up higher than 24,596 other sites that sell Big Blue Widgets, and that means it’s got some decent authority / trust, so the links from that site to the main site are pretty good overall.
But then along comes Jacinda Lucricia, VP of Marketing. She says “Hey – we’re running a major campaign this summer. We’re looking for our customers to make their own testimonial video on how our Big Blue Widgets transformed their lives / saved their marriage / cured their disease… But we don’t want the buzz to become diluted by people coming to the site and seeing all the other products we fail miserably at. So we need this to be done on the Big Blue Widget site instead of the main site okay?”
Well heck. Now you’ve got a situation where you might end up getting 147,000 homemade flip-cam videos posted to that site, all for one campaign. That will surely result in the niche site skyrocketing to the top for the phrase Big Blue Widgets. Which is okay, right? I mean, think of the buzz when that little site becomes its own star in the SERP universe!
Uh Oh – trouble in paradise
Well the potential problem here is that all the new depth and activity and inbound link love (ha! – you thought after my last article that I was going to cave and call it link juice didn’t you? Well, I’m STILL Not drinking that Kool-Aid thank you very much) has resulted in that little niche site also now coming up at the top of Google for the more generic Blue Widgets phrase. And at least for the duration of that campaign, amazingly enough, you’re now also in the #1 position for Widgets.
Is it a good thing or a bad thing?
The above example raises the question – is it a good thing or a bad thing to have a niche site show up above the main site? From a brand perspective, that’s a question for the marketing department to answer. From a site visitor experience it may be a different matter altogether.
If Coriander Appleseed is surfing the web because she needs to buy 500 Super Green Sustainable Widgets and does a search for Widgets, will she see the main site below the Big Blue Widget site in the SERPs and click on it? Or will she click on the Big Blue Widget site? And if so, is there any really obvious navigation link back to the main site for her to click on? Or will she just get frustrated stumbling around that site wondering where that link is?
If she is unaware that your client carries a full line of Widgets in all sizes, shapes and levels of sustainability, you’d better be sure that she goes right to the main site, or that there’s a really obvious / intuitive navigation link for her to find that main site from the niche site without too many clicks.
Algorithms – What’s not to love?
Here’s where the whole multi-site approach can be very challenging. If site A has 200 pages and a fair amount of optimization across the entire site, and site B has 5 pages but they’re highly optimized, it’s quite possible that the five-page site, due to its purity of focus, could in some situations show up higher than that big site which is more diluted. We just can’t always know when that’s the case thanks to the fact that Google has yet to give us a tool that will tell us in a direct and specific way how they ultimately rank a site. It’s not just the PR thing – they factor in countless other elements they’re not going to fully reveal.
Sure there are general concepts that fit most situations, and we can, at least some of the time, ensure that the niche site doesn’t overtake the main site if we don’t want it to. Yet every once in a while the SERPs are just downright bizarre. And the less planning you do, the more bizarre the results will seem when they’re not what you were shooting for.
Hey good lookin’ what’cha got cookin’?
One thing you can do, if you don’t want that niche site to overtake the main site, is to take it slow – add only a little content at first, and only implement half the normal optimization techniques you would otherwise typically use. Then, over time, check the results, and build from there. It’s the Wash – rinse – repeat approach.
Another tactic involves going the extra 1.6 kilometers with the keyword selection and grouping process. Some markets have enough breadth of keyword usage that you can use truly unique phrases to avoid the conflict altogether. Others require creative thinking in how you group those phrases. So take the time to really analyze those keywords.
For example, maybe you avoid the whole Widget issue altogether. Let’s say those Big Blue Widgets are a type of “plug-in” for web sites. Maybe the niche site becomes highly optimized for Big Blue Plug-In or Big Blue Web Site Plug-in, and there are only a couple of references to the word Widget on the site. If there are enough people searching for Big Blue Widget and you highly optimize for “Big Blue”, then you could very well come up high in the results for both the plug-in and widget tails.
Maybe the client’s a law firm. And they specialize in mesothelioma. Maybe the main site is optimized for “Mesothelioma Attorney” and the niche site is optimized for “Asbestos Cancer Lawyer”. Both phrases are valid and valuable, yet the first one is going to be more likely to bring in higher qualified leads. And it wouldn’t hurt to have people find the niche site through that second phrase at all.
Driving the brand
If your client is The Coca-Cola Company, you want that niche site to come up first for anyone looking for Coca-Cola, and you want your corporate site to come up below that. That’s how big brand marketing works. As much as the makers of Coca-Cola want prospective investors, corporate partners, and future employees to find that corporate site, that’s really a secondary goal in the world of big business advertising.
Either way you slice it
Whichever approach you take, this isn’t a hack job process. It takes proper planning and execution (and reading my 8 key points article so you can sharpen that SEO blade of yours). Unless you’re not interested in becoming an SEO Ninja. Then by all means, leave that blade dull, and skip out on all my advanced SEO combat training articles. Just know that if you take that route, your relatives will have a hard time suing me when you get slaughtered in the SERPs, okay?
Alan Bleiweiss has been an Internet professional since 1995. Just a few of his earliest clients included PCH.com, WeightWatchers.com and Starkist.com. Follow him on Twitter @AlanBleiweiss or read his blog at Search Marketing Answers.
Check out the SEO Tools guide at Search Engine Journal.
Chromatik: Enhanced Search Experience
Chromatik is an experimental search engine by Exalead that offers a fun way to search for images by color.
- Search images by any imaginable combination of colors;
- Add keywords;
- Adjust image luminosity and saturation.
Now let me demonstrate the tool in action…
Pick any color from the palette:

Pick another color and watch the results change:

Try adjusting the color combination by dragging the color separators:

Try filtering images by luminosity:

Hover over any image and search for "similar images" (also, watch the palette change):

And now imagine that you can do the same AND filter the results by a keyword! Lovely, isn’t it?
Note: the only thing I didn’t get was how to find the image source and if the images are free to use. Could you help me with that?
A Bad Day for Search Engines: How News of Michael Jackson’s Death Traveled Across the Web
Posted by Danny Dover
Update: Google representatives responded to complaints of the Google News delay with the following explanation:
"The spike in searches related to Michael Jackson was so big that Google News initially mistook it for an automated attack. As a result, for about 25 minutes yesterday, when some people searched Google News they saw a "We’re sorry" page before finding the articles they were looking for." – Source
First and foremost, let me extend my best wishes to the family and friends of Michael Jackson. I can only imagine the pain of losing a close friend and then having to watch it play out on a global stage. He made an extraordinary impact on the world and although not perfect, he is a teacher even in death (as evidenced by this post).
The following is a timeline of how the news of the King of Pop’s death traveled across the internet. Not all the times are exact (they might be off by up to 5 minutes) and not every source is included. All times are GMT.
From an internet marketer’s perspective, I found this story fascinating to watch unfold. I was impressed by the speed of information distribution and very surprised to see which site posted the news first. Wikipedia is still the fastest news aggregator. It was faster than Twitter and much faster than Google.
19:21 – One of Michael Jackson’s employees calls 911
The next forty-nine minutes are best described as the calm before the storm. The Los Angeles Fire Department arrived at Jackson’s rented mansion in Bel Air and family members were alerted to the news.
20:10 – (Story Breaks) A small entertainment site called x17online.com breaks the story.
They post photos and a brief story a full 20 minutes before the much larger entertainment site TMZ.com posts the news. Information goes live on the internet. BOOM!
20:30 - TMZ.com posts "Michael Jackson — Cardiac Arrest"
TMZ.com posts the story on its homepage and the story is distributed to hundreds of thousands of people via RSS. My guess is they paid a pretty penny for the image above and it paid for itself tenfold with all of the links TMZ got from the story.
21:12 – Wikipedia reports Jackson’s Cardiac Arrest
A member of Wikipedia adds the news of the Cardiac Arrest to Jackson’s Wikipedia article. This is well before any other news or social media source.
21:20 – TMZ.com posts story of death
Reports of Jackson’s death start to show up on RSS feeds and eventually Twitter. It is 11 minutes before the first person clicks on a bit.ly link to TMZ.
21:30 – CNNbrk tweets that Jackson goes to hospital
The official CNN account tweets to its 2 million followers that Jackson went to hospital after suffering a cardiac arrest.
21:31 - First bit.ly link to TMZ story
Someone clicks the first bit.ly link about the story, which leads them to the TMZ article.
21:45 – Wikipedia freezes Michael Jackson page
After an explosion of edits to Jackson’s Wikipedia article, editors take the step of locking it down in protective status.
21:48 – Wikipedia first reports Jackson’s death
Wikipedia editors get enough evidence to post Jackson’s death.
21:50 – bit.ly link reaches high of 2,500 clicks a minute
Bit.ly link to TMZ hits high of almost 42 clicks a second.
22:03 – TMZ story on Jackson’s death is submitted to Digg
A bit late to the game, the story that would eventually go on to be one of the most dugg stories ever is first submitted to the site.
22:11 – TMZ story goes popular on Digg
The story is moved to the front page of Digg where its distribution erupts.
22:19 – "RIP Michael Jackson" tops Trends on Twitter
Story takes the next step and appears on Twitter’s Trends. Tens of millions of Twitter users now can see the story.
22:20 – MSNBC.com Confirms Jackson’s Death
One hour after the news of Jackson’s death hits the internet, the first mainstream news source publishes a confirmation article.
22:25 – CNN.com Confirms Jackson’s Death
CNN, outmaneuvered by TMZ and MSNBC, confirms Jackson’s death.
22:34 – Approximately 2000 mentions a minute of Michael Jackson on Twitter
Mentions of Michael Jackson hit an all-time high on Twitter with nearly 1,500 a minute. That’s almost 20% of all tweets at that time!
22:38 – Twitter starts to overload. First signs of the fail whale
Twitter starts to falter as a result of the massive spike.
22:40 - First stories of Jackson’s death make it on Google News
1 hour and 20 minutes after the story is first posted on TMZ, Google News starts to report the story.
22:46 – Google News Results of Jackson’s death start showing up on the results page for the query "Michael Jackson"
Google News results top the Google results page for "Michael Jackson".
22:58 – Googlebot crawls CNN Twitter feed
Google starts returning CNN’s Twitter feed in the "Michael Jackson" SERP and provides a link to the cached version.
23:00 – "Michael Jackson Died" shows up in Google Trends
Google Trends updates and shows "Michael Jackson Died" as the hottest trending item.
23:18 – 4chan.org goes down
4chan members temporarily overload servers. I mention this mostly because I find it really funny. ;-p
23:47 – "Michael Jackson Heart Attack" and "Michael Jackson Cardiac Arrest" show up as suggested search on Google Homepage for "Michael Jackson"
Indirect news of Jackson’s death (if someone types "Michael Jackson") shows up on Google’s homepage.
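The intervals quoted in this timeline are easy to sanity-check. The following is a minimal sketch that recomputes a few of them from the times listed above; the event labels and helper names (EVENTS, minutes_between, as_hours_minutes) are made up for this illustration, and the times themselves carry the same "up to 5 minutes" uncertainty as the timeline.

```python
from datetime import datetime

# Times copied from the timeline above (GMT, all on the same day).
EVENTS = {
    "911 call": "19:21",
    "x17online breaks story": "20:10",
    "TMZ cardiac arrest post": "20:30",
    "TMZ death post": "21:20",
    "MSNBC confirms": "22:20",
    "Google News reports": "22:40",
    "Google suggestion": "23:47",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named events."""
    fmt = "%H:%M"
    delta = datetime.strptime(EVENTS[end], fmt) - datetime.strptime(EVENTS[start], fmt)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(minutes: int) -> str:
    """Format a minute count as e.g. '3h 17m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


if __name__ == "__main__":
    print(as_hours_minutes(minutes_between("TMZ cardiac arrest post", "Google suggestion")))
    # 3h 17m
    print(as_hours_minutes(minutes_between("TMZ death post", "Google News reports")))
    # 1h 20m
    # Peak bit.ly traffic: 2,500 clicks a minute expressed per second.
    print(round(2500 / 60, 1))
    # 41.7
```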
My Take Away:
Google has a really big problem and SEOs need to pay attention.
(Note: I chose Google rather than the other search engines because it leads them in all of the aspects I mention below. Everything I say about Google applies even more to the other search engines. I only have a basic idea of how difficult the technology problems are with the issues below. For better or for worse, I hold Google to a higher standard and I am not afraid to expect more.)
First, a little background information. I believe it was Ben Hendrickson who first mentioned to me the existence of three separate time priorities when indexing the web. He pointed out that the current version of Linkscape crawls and analyzes the slow moving web with a delay of about 4 weeks. (This is damn impressive given an index size of 54+ billion pages.) Blogscape (PRO Only) is much faster and aggregates the fast moving blogosphere of millions of feeds with less than 6 hours of delay. While impressive, we are still trying to catch up with Google and have started to run into the same wall as them. Sites like Twitter have created a new real-time web. It is only on the order of perhaps hundreds of thousands of pages, but indexing it is almost useless with a delay of more than a few seconds.
The events of Thursday demonstrated that Google is falling behind in the emerging real-time web. It was 3 hours and 17 minutes after TMZ first announced Michael Jackson had experienced cardiac arrest before it appeared as an autocomplete suggestion on Google’s homepage. In the computer age that is a huge amount of time. It is 3 hours and 17 minutes during which consumers may choose to go somewhere other than Google to get the information they want.
As SEOs, we largely rely on the success of Google for our incomes. These are the same incomes that put food on the table for our families. It is easy to think that Google’s technology is flawless, after all, it really is incredible. However, it is experiences like the events of Thursday that reveal how truly vulnerable the search engines are.
For me it was humbling.
Teaser: SEOmoz does have a plan for the real-time web and we are excitedly working on it. More information to come in the future.
If you have any other story sources that you think are worth sharing, feel free to post them in the comments. This post is very much a work in progress. As always, feel free to e-mail me or send me a private message if you have any suggestions on how I can make my posts more useful. If that’s not your style, feel free to contact me on Twitter (DannyDover). Thanks!


