How to hide your website from search engines
There are many occasions when you do not want your website, or some of its pages, to appear on search engines such as Google, Yahoo and Bing, or to be read by other search bots. For example, you may be developing a site and want to hide it until the live version is ready, or you may have a development mirror that should not appear in search results. There are a few other occasions when you may want to hide a website temporarily or permanently. Let’s see how to do it.
The first two methods are common approaches that reputable search engines such as Google, Yahoo and Bing respect, but we cannot be sure that other search bots do. The third method blocks all search bots from reading your site.
The simplest way is to create a text file called robots.txt and upload it to your root directory. It tells search spiders what they may read and what they may not. You can use it to block your entire website, individual pages or whole directories.
The example below blocks the entire website from Google and all other search engines.
User-agent: *
Disallow: /
The example below blocks specific pages and directories.
User-agent: *
Disallow: /store.htm
Disallow: /private/
Disallow: /tmp/
Note: Google Search Console offers a feature to test this file. Use it to confirm that your robots.txt is readable and that your rules work as intended.
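The two rule sets above can also be checked locally before uploading. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the example.com URLs are placeholders chosen for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt examples from this article, one directive per line.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_SOME = """\
User-agent: *
Disallow: /store.htm
Disallow: /private/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder domain, not a real site to test
    for path in ("/", "/store.htm", "/private/notes.html", "/about.htm"):
        print(
            path,
            "block-all:", is_allowed(BLOCK_ALL, base + path),
            "block-some:", is_allowed(BLOCK_SOME, base + path),
        )
```

With the first rule set every path is refused; with the second, only /store.htm and anything under /private/ or /tmp/ is refused.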
No Index Meta Tag
The noindex method requires a basic understanding of HTML, because it is done by adding a meta tag to each page. Developers typically use it when they do not have access to the root directory. In some cases it is added as an extra step when search engines are not responding properly to robots.txt. Because it is a per-page method, however, you cannot use it to block a folder or a resources directory. We suggest using it alongside robots.txt for stronger protection, with one caveat: a crawler has to fetch a page to see its noindex tag, so the tag has no effect on pages the crawler is already disallowed from reading.
The following tag has to be added inside the head element of each page.
<meta name='robots' content='noindex,follow' />
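To confirm that a page really carries the tag, the markup can be inspected with a short script. This is a rough sketch built on Python's standard html.parser module; the class RobotsMetaParser and the function has_noindex are hypothetical names introduced here, and the script only looks at the HTML you pass in, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = "<html><head><meta name='robots' content='noindex,follow' /></head></html>"
    print(has_noindex(page))
```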
Note: In WordPress there is a setting called Search Engine Visibility under the Settings tab that does this job for you. It creates both the robots file and the noindex tag. Most popular CMS frameworks should have this feature built in.
Note: Google may still crawl or list your website in some cases, for example when it has already been indexed for some time or when external sites link to it. In such cases you have to use the HTTP protection method.
HTTP authentication makes your website accessible only after the browser supplies a username and password. No search engine can get past it to read your website. However, you and your developers and staff need to know the password and enter it whenever you visit the site or its pages, and it takes some effort to set up. Most shared hosting offers it in the control panel as a feature named ‘Password Protect Directories’ or ‘Directory privacy’.
If you are on a self-managed VPS or dedicated server, you can set it up with OpenSSL or the Apache tools on Linux systems. IIS offers this feature in its management console. To set up password protection on Linux servers, read our blog Setting up Http Auth on Nginx/Ubuntu.
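To illustrate why bots cannot get through, here is a simplified sketch of the HTTP Basic scheme that such password protection commonly relies on. It assumes Basic authentication specifically; the function names are hypothetical, and in practice your web server or control panel performs this check rather than code you write yourself.

```python
import base64
import hmac
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def is_authorized(header: Optional[str], username: str, password: str) -> bool:
    """Return True only if the request header matches the expected credentials."""
    if not header:
        # A crawler sends no credentials, so it is refused here.
        return False
    expected = basic_auth_header(username, password)
    # Constant-time comparison of the received and expected header values.
    return hmac.compare_digest(header.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(is_authorized(None, "staff", "s3cret"))
    print(is_authorized(basic_auth_header("staff", "s3cret"), "staff", "s3cret"))
```

A request without the header is answered with a 401 status, so a search bot never sees the page content. Basic credentials are only encoded, not encrypted, so this protection should be served over HTTPS.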