When you set up a web page, there are certain things you will eventually have to learn about.
One of them, assuming you want an audience, is how a search engine finds your site on the Internet.
Of course, they find you through the power of links from other web pages.
But you might ask yourself: how do they keep their copy of my page up to date?
How do they know to check as soon as I change something?
The answer is a simple one.
Search Engine Bots
They use what are known as bots to traverse your site and read its content.
Sometimes these bots, which are supposed to be helpful, can cause real problems for a web site.
Before going any further, I should explain what a bot actually is.
A bot is an automated program that performs an action.
It doesn’t matter what the action is, as long as it runs without any human interaction beyond being started the first time.
Both the good guys and the bad guys use bots to get their jobs done.
Search Engines are an example of the good guys using a bot.
These bots make sure that the search engine’s index stays up to date.
To do this, the bot has to visit the web site multiple times a day, maybe even several times an hour.
When a bot sees an update, it reports the data back to its main server.
All of this crawling consumes bandwidth.
If you do not want the bandwidth used, you can restrict a bot from visiting your site (via robots.txt which is beyond the scope of this article).
A legitimate search engine will respect those rules and stop scanning your site.
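Well-behaved bots fetch your robots.txt and check it before requesting a page. As a sketch of how those rules are evaluated, here is a minimal example using Python's standard-library robots.txt parser; the rules and the bot names ("BadBot", "GoodBot") are invented for illustration:

```python
from urllib import robotparser

# A hypothetical robots.txt: ban one bot entirely, keep everyone
# else out of /private/.
rules = """
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("BadBot", "/index.html"))    # False - banned everywhere
print(parser.can_fetch("GoodBot", "/index.html"))   # True
print(parser.can_fetch("GoodBot", "/private/x"))    # False - disallowed path
```

A compliant crawler runs this check for every URL and skips anything that comes back disallowed; the catch, as we will see, is that nothing forces a bot to honor it.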
Bad Robots Can Look Like A DDoS Attack
If a bot scans your site too many times per second, the effect can resemble a DDoS attack.
The people who own the site will most likely think that they are under attack at first.
This is because their users will not be able to access the site and its resources, or the site will slow to the point of being unusable.
It will take the administrator of the site, or the IT staff at the hosting provider, to figure out what is going on.
Once they read the log files, they will then see where the traffic is coming from.
If it is a misbehaving search engine bot, the IP address can be found and blocked, and your site will go back to normal.
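The log-reading step above is simple to sketch: count requests per source IP and see which address dominates. The log lines below are invented samples in the common Apache combined format (the "bingbot" user-agent string is just illustrative); a real check would read them from your access log file:

```python
from collections import Counter

# Hypothetical access-log lines; in practice you would read these
# from something like /var/log/apache2/access.log.
log_lines = [
    '40.77.167.1 - - [01/Jan/2024:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "bingbot/2.0"',
    '40.77.167.1 - - [01/Jan/2024:10:00:01 +0000] "GET /b HTTP/1.1" 200 512 "-" "bingbot/2.0"',
    '40.77.167.1 - - [01/Jan/2024:10:00:02 +0000] "GET /c HTTP/1.1" 200 512 "-" "bingbot/2.0"',
    '203.0.113.5 - - [01/Jan/2024:10:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# The source IP is the first field on each line.
hits = Counter(line.split()[0] for line in log_lines)

for ip, count in hits.most_common():
    print(ip, count)
```

An address that stands out by orders of magnitude is your likely culprit; on a Linux host it could then be blocked with a firewall rule such as `iptables -A INPUT -s <ip> -j DROP`.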
This has happened several times, most recently to Perl’s CPAN Testers web site.
They were being bombarded with bots from Microsoft’s search engine, Bing.
They first thought that there was an attack happening.
When they checked the logs, they saw the culprit was actually a bot from Microsoft.
Finally, they blocked the address and their site went back to operating like normal.
A search engine behaving like this is not normal.
Nine times out of ten it is a good thing that a search engine indexes your site often.
But sometimes it can cause problems.
If your site is experiencing a DDoS-like attack, check your logs and make sure it is not a bot from a search engine gone rogue.