I feel it would be a good idea to have a thread with common answers to SEO problems, since the same questions often get asked again and again, and it's nice to have a central resource of information.
I've started it off with a few questions and answers, but hopefully this can become an interactive thread, with both questions and answers submitted and added to the FAQ itself. Looking forward to your ideas!
THE SEO FAQ
What is PR, and how important is it?
PageRank, often abbreviated to PR, is a term you'll hear a great deal in the SEO world. Briefly put, it is a numerical value calculated by Google, which is a measure of the incoming links to a particular page. The more incoming links you have, and the higher the PR of the pages those links come from, the greater the PR of that particular page will be. It's important to realise that PR is calculated on a page-by-page, not site-by-site, basis, so even if a site's homepage has high PR, if the page linking to you is low PR, the link will only pass a small amount of PR.
The actual formula for calculating PR is no mystery, as it was part of a published paper by the Google founders, Sergey Brin and Larry Page (for whom PageRank is named!). The formula is:
- PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where PR(A) is the PageRank of a page A
PR(T1) is the PageRank of a page T1
C(T1) is the number of outgoing links from the page T1
d is a damping factor in the range 0 < d < 1, usually set to 0.85
This looks a bit daunting, but basically all it's saying is that the PR of page A, PR(A), starts from a base of 0.15 (that is, 1-d with d set to 0.85). We then add to that figure for each incoming link, taking into account the PR of the linking page and the number of other links that page has. The basic idea to take from this is that a link that passes lots of PR is one from a high PR page with few links.
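To make the formula a bit more concrete, here is a minimal Python sketch that applies it repeatedly to a tiny made-up site until the values settle. The page names, the link structure, the starting value of 1.0 and the number of iterations are all my own assumptions for illustration; this is not how Google actually runs the calculation at scale.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) for every page A.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess (assumption)
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical four-page site: nothing links to D, so it keeps the base value.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(example_links)
```

In this toy example D ends up at the 0.15 base value because it has no incoming links, and C ends up above B because it is linked from both A and B, while B only gets half of A's share.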
Now it's important to understand that the PR you know and love, the green bar at the top of your browser, is not a great indication of the actual PR of a particular page. Actual PR is calculated by Google daily, and can theoretically range from 0.15 to infinity for any given page. The PR you see in your browser is this value expressed logarithmically, as a proportion of the highest PR page on the web. So a PR3 page could range, for example, from 1,000 to 10,000 actual PR units. The next level up, PR4, might be 10,000 to 100,000 units. These are obviously arbitrary figures I made up, but they show how inaccurate the green bar in your browser is. And it gets worse... the toolbar PR is only updated by Google every 3 months or so, so the value you see was not only an approximation to start with, but it's now a 3-month-old approximation.
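As a rough illustration of how much detail a logarithmic scale throws away, here is a small Python sketch using the made-up figures above. The base of 10, the 0 to 10 clamp and the function name toolbar_pr are assumptions chosen to match those example numbers; Google's real scaling is not public.

```python
import math


def toolbar_pr(raw_pr):
    """Map a raw PR value to a 0-10 toolbar-style score on a log10 scale.

    Assumption: each toolbar step covers ten times the range of the last,
    matching the invented figures in the post (PR3 = 1,000 to 10,000 units).
    """
    if raw_pr < 1:
        return 0
    return min(10, math.floor(math.log10(raw_pr)))


# Two pages with very different raw PR share the same green bar...
low_pr3 = toolbar_pr(1_000)
high_pr3 = toolbar_pr(9_999)
# ...while one extra unit tips the second page into the next band.
just_pr4 = toolbar_pr(10_000)
```

Under these assumptions a page with 1,000 units and a page with 9,999 units both show as PR3, even though one has nearly ten times the raw PR of the other.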
A common misconception is that PR can only be passed by external links, but internal links are just as important. Managing the flow of PR through your site can be of great help in improving rankings.
You might have been able to guess by now, but my final sentiment is that PR is pretty much irrelevant in how well a page ranks. There are hundreds of factors that affect a page's ranking, and PR, in its raw form, is a very small part of this.
What is duplicate content, what are its consequences and how do I prevent it?
Duplicate content is pretty much what it says on the tin: content which is so similar that it is deemed, by a search engine, to be an exact duplicate. The measures used by search engines to determine this are unknown, but it's best to take a common sense approach. If on reading the two pages a human would think they were copied from one to the other, they're likely to be seen as duplicates.
What are the consequences of duplicate content? Well, Google have stated that if they come across duplicate pages, they will make a judgement as to which is the “original,” and which is the “duplicate.” How exactly they do this we don't know, but it's likely to be based on when they first came across the page, and the number/quality of incoming links to each page. The original will be indexed and ranked as normal, with no penalties; the duplicate will be put in the supplemental index, and will stand no chance of ranking.
It's obviously important to ensure that our pages don't end up in the supplemental index, so how can we stop this? The first step is to make sure we don't rip content from other pages, and don't let other pages use our content. If another page isn't indexed, you could theoretically use their content, and get away with being original, but to me this isn't morally acceptable, and is likely to breach copyright laws.
The next step is to ensure we don't have duplicate content issues within our own site. This can commonly occur when dynamically generating pages on a database driven site, where the same content ends up reachable at several different URLs. While Google will probably index one of these pages, without penalty, the problem comes when we think of incoming links. If I have 5 duplicate pages, then people could link to any of these 5. However, only the links to the “original” will be counted by Google, so if the links are spread evenly we're missing out on 80% of our incoming links, and this is with only 5 pages! The key is to follow good design principles, and use rewritten URLs and redirects where necessary.
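One way to picture the "rewritten URLs and redirects" idea is a function that maps every variant of a page's URL to a single canonical form, which the site then redirects to. The sketch below is only an illustration: the ignored parameter names, the choice of the www host, and the helpers canonical_url and redirect_target are all hypothetical, and a real site would pick rules to suit its own URL scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change the URL but not the content.
IGNORED_PARAMS = {"sessionid", "sort", "ref"}
CANONICAL_HOST = "www.mysite.com"  # placeholder domain, as used in this FAQ


def canonical_url(url):
    """Return the one URL that every variant of a page should point to."""
    parts = urlsplit(url)
    # Keep only parameters that select different content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    # Treat /page1 and /page1/ as the same page; keep "/" for the homepage.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("http", CANONICAL_HOST, path, urlencode(query), ""))


def redirect_target(url):
    """Return the canonical URL if `url` should be redirected, else None."""
    target = canonical_url(url)
    return target if target != url else None


variant = "http://mysite.com/page1/?sessionid=abc&id=5"
target = redirect_target(variant)
```

Here the variant URL maps to http://www.mysite.com/page1?id=5, so a request for it would be answered with a permanent (301) redirect to that address, and any links pointing at the variant get credited to the one page we want indexed.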
Should I use absolute or relative URLs?
Absolute URLs (http://www.mysite.com/page1) have no inherent advantage over relative URLs (/page1) in terms of SEO, but they are still the preferred option, for two reasons.
1) At some point in the future the page on which the link appears may change location. If you use relative URLs the link may now resolve to the wrong address, whereas with absolute URLs it will still work.
2) It's unfortunate, but for content writers there is always a worry of having your content scraped. Using absolute URLs means that not only are the links on the scraped content not pointing at the scraper's pages, they are in fact going to be incoming links to your page! This is a hallmark of scraped pages, and SEs undoubtedly use this to differentiate between original and scraped content.
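Both points come down to how a browser or crawler resolves a link against the address of the page it appears on. The Python sketch below uses the standard library's urljoin to show this; the page addresses (including the scraper's domain) are hypothetical examples of mine.

```python
from urllib.parse import urljoin

# Hypothetical locations of the same article.
ORIGINAL_PAGE = "http://www.mysite.com/articles/seo-faq"
MOVED_PAGE = "http://www.mysite.com/archive/seo-faq"
SCRAPED_COPY = "http://www.scraper.example/stolen/seo-faq"

ABSOLUTE_LINK = "http://www.mysite.com/page1"
ROOT_RELATIVE_LINK = "/page1"
DOCUMENT_RELATIVE_LINK = "page1"

# Point 1: a document-relative link changes meaning when the page moves.
before_move = urljoin(ORIGINAL_PAGE, DOCUMENT_RELATIVE_LINK)
after_move = urljoin(MOVED_PAGE, DOCUMENT_RELATIVE_LINK)

# Point 2: on a scraped copy, a relative link points at the scraper's site,
# while an absolute link still points back at the original site.
scraped_relative = urljoin(SCRAPED_COPY, ROOT_RELATIVE_LINK)
scraped_absolute = urljoin(SCRAPED_COPY, ABSOLUTE_LINK)
```

With these example addresses, the document-relative link resolves to /articles/page1 before the move and /archive/page1 after it, and on the scraped copy the relative link resolves to http://www.scraper.example/page1 while the absolute link still resolves to http://www.mysite.com/page1.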