Google Takes on Hidden Text
ekstreme.com — All you Black Hat SEO Spammers with hidden text, beware! GoogleBot apparently requested a CSS File.
- 667 diggs
- taron, on 10/12/2007, -1/+13Well, just a request doesn't mean a thing. Though it will sure make a difference in the future.
Show me an example of a banned site with hidden text with CSS :)- Breakpoint25, on 10/12/2007, -14/+7http://www.duggmirror.com
- canadianguy33, on 10/12/2007, -0/+19Better yet. Show me a real blackhat SEO spammer that actually worries about hidden text. :)
- AgentBirdman, on 10/12/2007, -0/+8Hmm. I wonder if the Googlebot will now ignore CSS image replacement used in headings.
For example, I use text-indent: -5000px; to shove heading text outside the user's viewport so I can replace it with custom image-based text as a background-image.
- ibjhb, on 10/12/2007, -1/+3I have a feeling that Google has a separate set of bots walking the web that performs OCR/snapshots of the page and compares it to the HTML that GoogleBot downloads. Then Google would know if there was hidden or small text on the page. It would also help it detect if the server was showing a different page to GoogleBot vs the rest of the visitors.
- kalleanka, on 10/12/2007, -3/+11"As far as I am aware, this is the first time anyone has spotted Googlebot requesting a CSS file."
Don't flatter yourself too much, pretty boy.
Of course you don't know if anyone else has looked in their log files for Googlebot requesting CSS files. Does everyone in the world have to report to you as soon as they see Googlebot in their log?
That sentence just made me feel "this article is written by some 14 year old kid pretending he is some kind of big shot". - merreborn, on 10/12/2007, -0/+4"I have a feeling that Google has a separate set of bots walking the web that performs OCR/snapshots of the page and compares it to the HTML that GoogleBot downloads"
It'd be much easier and more efficient to just evaluate the CSS.
- symbha, on 10/12/2007, -0/+3This is very likely the end result, and the original intention, of Google branching the Mozilla source some months back.
Not only does it allow them to determine if a block is not visible, but it also allows them to try to give precedence to things that are visually larger on the page. An H1 styled larger is probably more important than an H1 that is not.
I would think it is also a long term strategy for attempting to deal with dynamic content generated using javascript, and ajax methodologies (which cause all kinds of problems for search.)
- riverrunner, on 10/12/2007, -2/+10well, hidden text is very useful for popup navigation menus - so one would hope they take that into account.
- Skitzzo, on 10/12/2007, -1/+10Yeah, CSS can be used to hide stuff legitimately but I don't think that's ever stopped Google before.
- tagawa, on 10/12/2007, -0/+1Yep, I've started using hidden text to stop comment spam (hide a form field with CSS so spambots can see it and humans can't - only allow comments if the field is empty). Google really needs to ensure that pages with valid hidden elements are taken into account.
On the other hand, I can't help feeling that this is a lot of fuss over nothing - I bet the Googlebot's been doing this for ages.
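The honeypot idea tagawa describes can be sketched on the server side roughly like this. It is only an illustration: the field name `website` and the shape of the form data are assumptions, not anything taken from tagawa's site.

```python
def is_probably_spam(form_data, honeypot_field="website"):
    """Return True when the CSS-hidden honeypot field was filled in.

    Humans never see the field, so they leave it empty; a naive spambot
    that fills every input gives itself away. The field name "website"
    is a hypothetical choice for this sketch.
    """
    value = form_data.get(honeypot_field, "")
    return value.strip() != ""


# Hypothetical submissions: the first leaves the hidden field empty.
human = {"name": "reader", "comment": "Nice post", "website": ""}
bot = {"name": "x", "comment": "buy now", "website": "http://spam.example"}

for submission in (human, bot):
    verdict = "rejected" if is_probably_spam(submission) else "accepted"
    print(submission["name"], verdict)
```

The form field itself still has to be hidden with CSS, which is exactly the kind of legitimate hidden element a crawler would need to tell apart from keyword stuffing.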
- jimbocook, on 10/12/2007, -0/+19I suspect Google is trying to figure out the whole CSS/XML/Flash issue. It's going to be a major issue moving forward.
- eKstreme, on 10/12/2007, -1/+4The rarity of it all makes me think it could be experimental at this stage. That's just speculation though.
- Skitzzo, on 10/12/2007, -1/+3are they crawling any other external files that you've seen?
- eKstreme, on 10/12/2007, -1/+7I just updated the blog post: Javascript files were requested too! 71 times.
- diddy1, on 10/12/2007, -14/+1It's sad that this is news to you guys. Google has been banning sites with hidden text for years now. Besides there are way more ways a black hat can hide text.
Thank You- TheLD, on 10/12/2007, -0/+0So far they have been banning sites through user submitted reports and not by having a crawler (Googlebot or otherwise) looking at CSS files. This makes a whole world of difference.
- bleaknik, on 10/12/2007, -1/+1It would be incredibly amusing if there was hidden text in your comment...
- manicleek, on 10/12/2007, -0/+3No, Google has been banning text that is hidden by making its colour the same as the background. It hasn't, however, been banning text that's made invisible using CSS.
I can't see them banning sites for this though as sometimes it is necessary, e.g. using invisible text for links that use rollover background images for accessibility purposes - NJank, on 10/12/2007, -0/+1but in that case it would be contextually appropriate, and not spam.
- JoshuaH, on 10/12/2007, -2/+4http://duggmirror.com/design/Google_Takes_on_Hidden_Text
- bishop1847, on 10/12/2007, -0/+1I could see identifying a parent element's background color, seeing that its color is #fff, and then noticing that any text in child elements is the same, but what's to keep someone from making a 1x1 gif white square? I don't think GoogleBot could be that intelligent.
Now, text-index: -9999px is another story...- Beaver6813, on 10/12/2007, -0/+1Yes, and it'll just make spammers add more div's with different colour backgrounds and text colours at the bottom to confuse the googlebot :)
- bishop1847, on 10/12/2007, -2/+2text-indent: -9999px, oops
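The colour comparison bishop1847 describes could look something like the sketch below, which uses the WCAG relative-luminance contrast ratio rather than a plain equality test so that "almost the same colour" is caught too. The helper names and the 1.1 threshold are assumptions made for illustration, and nothing here reflects how Googlebot actually works. As bishop1847 points out, a background image defeats this kind of check entirely.

```python
def _hex_to_rgb(color):
    """Convert '#fff' or '#ffffff' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    if len(color) == 3:
        color = "".join(ch * 2 for ch in color)
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def _luminance(color):
    """Relative luminance as defined by WCAG 2.x."""
    def channel(value):
        c = value / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in _hex_to_rgb(color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio: 1.0 for identical colours, 21.0 for black on white."""
    l1, l2 = _luminance(foreground), _luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def looks_hidden(foreground, background, threshold=1.1):
    """Flag text whose colour is nearly the same as its background.

    The threshold is an arbitrary value chosen for this sketch.
    """
    return contrast_ratio(foreground, background) < threshold
```

For example, `looks_hidden("#fff", "#ffffff")` flags white-on-white text, while `looks_hidden("#000", "#fff")` does not.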
- HitLines, on 10/12/2007, -0/+2This is old news. Wordpress was banned from Google for using CSS to hide entire DIV elements: http://virtuelvis.com/archives/2005/03/wordpress-and-cloaking.
Google calls this cloaking: http://www.google.com/support/webmasters/#cloaking
Matt Cutts has also covered this: http://www.mattcutts.com/blog/communication-in-other-languages/
This is an easy way to get you banned: http://www.mattcutts.com/images/amazing3.gif
<div class="indexKeywords">bunch, of, keywords</div>
.indexKeywords { display: none; visibility: hidden; text-indent: -9999px; }
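To make the "just evaluate the CSS" idea from earlier in the thread concrete, here is a deliberately naive sketch that scans a stylesheet for the hiding declarations in HitLines's example. The pattern list and function name are assumptions for illustration; a real crawler would need a proper CSS parser and the cascade, and, as other commenters note, it still could not tell a dropdown menu from keyword stuffing or see text hidden by JavaScript.

```python
import re

# Declarations commonly used to hide text. Illustrative, not exhaustive.
HIDING_PATTERNS = [
    re.compile(r"display\s*:\s*none", re.IGNORECASE),
    re.compile(r"visibility\s*:\s*hidden", re.IGNORECASE),
    re.compile(r"text-indent\s*:\s*-\d{3,}", re.IGNORECASE),
]

# Matches "selector { declarations }" for flat, non-nested rules only.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def find_hiding_rules(css):
    """Return the selectors whose declarations match a hiding pattern."""
    flagged = []
    for selector, body in RULE_RE.findall(css):
        if any(pattern.search(body) for pattern in HIDING_PATTERNS):
            flagged.append(selector.strip())
    return flagged


css = """
.indexKeywords { display: none; visibility: hidden; text-indent: -9999px; }
h1 { font-size: 2em; }
"""
print(find_hiding_rules(css))
```

On this input only `.indexKeywords` is flagged; the `h1` rule passes.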
- DenDen, on 10/12/2007, -2/+1This is so inaccurate! Anything referenced by robots.txt, or any html on your site is going to be hit by bots. Always has, always will. DUH! Quit trying to cause people to pee their pants over NOTHING!
- HigherLogic, on 10/12/2007, -0/+5So what happens if you disallow Google from your JS and CSS files?
- Beaver6813, on 10/12/2007, -0/+6It'll be vewy vewy angry and they'll write you a letter saying how angry they are.
/Team America Quote - DupeAHolic, on 10/12/2007, -1/+0@ HigherLogic
Then you win the prize!
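For reference, HigherLogic's scenario can be tried out locally with Python's standard `urllib.robotparser`. The robots.txt content and the `/css/` and `/js/` directory layout below are assumptions for illustration. The sketch only shows what a compliant crawler would be allowed to fetch; it says nothing about how Google would treat a site that blocks those files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot out of two asset directories.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /css/
Disallow: /js/
"""


def googlebot_may_fetch(path, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets Googlebot request `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", path)


for path in ("/index.html", "/css/site.css", "/js/menu.js"):
    print(path, googlebot_may_fetch(path))
```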
- Archon810, on 10/12/2007, -1/+3grep -i makes it case insensitive, so no need for grep oogle.
- eKstreme, on 10/12/2007, -3/+1True, but I always seem to forget the command line options to do it right.
- Archon810, on 10/12/2007, -0/+1heh, it's the simplest and the most intuitive one.
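For anyone who would rather not remember grep flags, the same case-insensitive search can be sketched in Python. The log line format below is an assumed Apache-style combined log with made-up sample entries, not the actual log from the blog post.

```python
import re

# Case-insensitive match, the equivalent of `grep -i googlebot`.
GOOGLEBOT_RE = re.compile(r"googlebot", re.IGNORECASE)

# Captures the requested path when it ends in .css or .js (query string ignored).
ASSET_RE = re.compile(r'"GET\s+(\S+\.(?:css|js))(?:\?\S*)?\s', re.IGNORECASE)


def googlebot_asset_requests(log_lines):
    """Return CSS/JS paths requested by clients identifying as Googlebot."""
    hits = []
    for line in log_lines:
        if not GOOGLEBOT_RE.search(line):
            continue
        match = ASSET_RE.search(line)
        if match:
            hits.append(match.group(1))
    return hits


# Made-up sample lines for illustration only.
sample_log = [
    '66.249.66.1 - - [12/Oct/2007:10:00:00 +0000] "GET /style.css HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Oct/2007:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [12/Oct/2007:10:01:00 +0000] "GET /menu.js HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
]
print(googlebot_asset_requests(sample_log))
```

Note that this only checks the user-agent string, which anyone can fake.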
- DupeAHolic, on 10/12/2007, -2/+4<script language="JavaScript" type="text/javascript">
document.getElementById('hidden_text_block').style.display = "none";
</script> - Shananra, on 10/12/2007, -1/+1So, how does he know this wasn't some funny web visitor that has Firefox set to identify itself as Google Bot?
I know it supposedly has a google IP, but that doesn't mean as much anymore. - f00xx0riz3r, on 10/12/2007, -1/+3Oh noes, googlebot downloaded a CSS-file. The end is near.
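On Shananra's point about spoofed user agents: the check Google documents for verifying Googlebot is a reverse DNS lookup on the requesting IP followed by a forward lookup on the resulting hostname. A minimal sketch follows, with the lookups injectable so it can be tried without network access. The function and parameter names are assumptions for illustration.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IP with a reverse DNS lookup plus a confirming forward lookup.

    By default the lookups use the system resolver; callers may pass their
    own functions (handy for testing without network access).
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # The hostname must belong to one of Google's crawler domains.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return False
```

Calling `is_verified_googlebot("66.249.66.1")` with the default resolvers performs real DNS queries, so the result depends on the network it runs on.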
- toomuchpete, on 10/12/2007, -0/+1It's going to be easier to hide text than to programmatically find hidden text. Googlebot would need to fully render each page, including running scripts, in order to catch the various ways.
Hiding with javascript is a pretty easy thing to do... so is positioning a div behind a solid background or image. so is setting the div to a width of, say, 10x10 and turning overflow off. etc.
Google isn't going to be able to get rid of (decent) blackhat SEO's this way, but they'll be able to catch Mom & Pop's corner store who's hoping to get a little browser position on Ned & Janet's store across the street. - sicc, on 10/12/2007, -1/+2http://www.wallunitwarehouse.com/
Good example of keyword stuffing and ALMOST hidden text. They left just enough color so you can see it. It's slimy either way.
- keitho, on 10/12/2007, -0/+1that's keyword stuffing, not hiding.
- jonnypyro, on 10/12/2007, -1/+8"I am currently fighting malicious bots that are using up the bandwidth of eKstreme.com."
hahahaha - Nanobe, on 10/12/2007, -0/+0It's possible that someone somewhere on some site decided to use a regular a href link to his stylesheet, or for some other reason Google thought that that CSS file might be a webpage itself (since extensions in URLs don't necessarily mean anything), and so Google went ahead and requested it. If this is the case, then Google would have seen the response header indicating that it is, in fact, just a CSS file, and Google probably dumped it and moved on.
I have a number of JavaScript files on my site, yet the only one Google has ever requested is the one that -- surprise, surprise -- was linked to using an a href link.
- honds, on 10/12/2007, -0/+2It is probably just part of Google code search (beta)... they might be indexing CSS files like they index CPP, C, Java, Javascript etc.
I know CSS isn't a programming language but hey, it is more info Google can index.- merreborn, on 10/12/2007, -0/+1That's exactly the thought I just had.
Google code search has about 400 CSS files atm.
http://www.google.com/codesearch?hl=en&lr=&q=file%3A.*%5C.css%24&btnG=Search
- pcx99, on 10/12/2007, -0/+2Of course Google could be putting together screenshot thumbnails to keep up with what the competition is doing (Snap, for instance). To take a screenshot they'd have to pull all the files your web browser would.
- diggymcdigger, on 10/12/2007, -0/+0Some coverage back in March from SearchEngineWatch.com on Googlebot possibly requesting these types of files:
http://forums.searchenginewatch.com/showthread.php?t=10542 - pu43x, on 10/12/2007, -0/+1so to battle bandwidth problem their site is then dugg! haha
very interesting read
- eurokc98, on 10/12/2007, -0/+1I'm sure Google takes all this into consideration and will have filters to adjust accordingly.
- qode, on 10/12/2007, -2/+1Just a thought, doesn't Google download the CSS for their cache? Hmmm!
- keitho, on 10/12/2007, -1/+1google has been banning sites for years. hidden text is explicitly stated as a violation of their ToS.
http://www.google.com/support/webmasters/bin/answer.py?answer=35769 - bbqplate, on 10/12/2007, -1/+0good, im glad these spammers get caught. that means more money to people who actually care about providing content to users.
- allcdnboy, on 10/12/2007, -0/+1A large portion of big sites use CSS navigation where the subnav is hidden. I really don't think that google would ignore this. Unfortunately, this is a bit harder for google to smoke out.
- adulion, on 10/12/2007, -0/+1yous arnt that good if you guys are worried about this
- DeaPeaJay, on 10/12/2007, -0/+0I feel pretty safe for the time being. If it does ban those sites, it will probably be those with content that is set to display: none; I'm using padding-top: 30px; height: 0; overflow: hidden; which just shoves the text outside the containing element instead of explicitly hiding it.
But besides all that, one would hope that before banning a site for doing this, they would get a set of real human eyes to take a look at the page to see if it's truly being malicious. It's highly unlikely that google will ban sites for simply trying to make content accessible. I doubt, or sincerely hope, that it's not simply an automated process. - DeaPeaJay, on 10/12/2007, -2/+0The other thing is that google has been consistently getting the CSS from sites, it's nothing new at all. They use it to generate the cached versions of the page! It's very commonplace. I'm burying for inaccuracy.
- eKstreme, on 10/12/2007, -0/+3The cache sets the HTML tag "BASE" to be the root URL of the site from which the page is cached. For example, if you look at the source code of this cached page:
http://209.85.135.104/search?hl=en&q=cache%3Aekstreme.com&btnG=Google+Search
you'll see the following tag:
< base href="http://ekstreme.com/" >
(I added spaces to avoid Digg's filter)
Google doesn't download the CSS for the cache - it leaves it up to the browsers to fetch it. So the original article is not inaccurate!
- diggster99, on 10/12/2007, -0/+0I hope the hidden-text detection isn't overdone to the point that it affects faded text. If you look at what is allowed for color combos on AdSense ads, at times it's a bit confusing where the 'line' is drawn, i.e. sometimes the combination is definitely visible enough, especially given the background, but then they don't allow it. If they were to apply those same 'standards' this could be a problem.
- zyko, on 10/12/2007, -0/+1Google penalties only hurt dumb/naive webmasters and what I call "peasant" websites. The real SEO players have workarounds for Google's little "anti-spam" games. Google's algo is mainly penalty based so this is not surprising. The real problem is disclosure of their penalties. Imagine if laws were not documented by the government and nobody knew what they were even convicted for - this is Google's world. Of course being a private company with a sizable market share, they can play by any rules they want.
Either way gaming Google only goes so far - the real algo is based on reputation. Spammers can only hope to dominate keywords that authority sites don't target. Digg could use all kinds of black hat tricks but their reputation as a legit website negates any backlash. Same reason why known sites like webmasterworld can cloak every page yet have that cloaked page show up in Google as if it's freely accessible.
Also Gaming Google is not spamming. Spam is unwanted, irrelevant and misleading. Gaming is a perfectly acceptable result of capitalism - regardless of what Google claims. There are millions of perfectly good high quality websites Google ignores because they do not follow Google's rules of play or have a qualifying reputation.
- michaelhood, on 10/12/2007, -0/+1Seems like the safe thing to do for people who are using CSS to hide elements (legitimately or otherwise) would be to serve up an empty CSS file to Googlebot.
Moving forward - who knows how this will affect things, though.
If they are able to start evaluating layouts, page elements, fonts, etc. for quality to factor these things in ranking.. that would be pretty interesting.
I would love to see them move away from almost solely ranking pages based on inbound link metrics. This makes it nearly impossible for brand new sites to rank, even though they are often the most relevant page for a search.
All of this aside, all search engines will be fighting a constant war against the SEOs. - allonline, on 10/10/2007, -0/+1Best Practice for Small Business - Persuasive Copywriting SEO. If you'd like to pull in more sales and more profits with every web marketing piece you create, check out this powerful promise from Bob Serling, the leading expert on high-profit, low-cost marketing...
click to continue http://www.copywritingtip.com/ - justsearchinguk, on 02/12/2008, -0/+0My company buys links from allsorts and thats the way to do it. Just buy your links. Google are just a bunch of twatty idiots, like most of you actually.
- gmi123, on 07/18/2008, -0/+0More seo resource
http://www.webmarketingindia.net/wordpress/