Sizing your index for MOSS 2007 isn't as tricky as it once was. SPS 2003 tackled indexes in one fell swoop, while MOSS 2007 takes an incremental approach.
The rule of thumb with SPS 2003 was an index of 30-50% of the size of the content being indexed, depending on what content you indexed and how many IFilters you had installed (IFilters allow the crawler to see inside documents). So if you had audio, video, archives, ZIPs, PDFs, MDB, MPP, MSG, VSD, GIF, JPG, PSD, CAD, WAV, MSI or EXE files without a matching IFilter, the crawler wouldn't index their contents. Also, files over 16 MB are not indexed by default. What does this mean? Know your data.
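A quick inventory is one practical way to start knowing your data. The sketch below is a minimal Python example that assumes you already have a list of (filename, size in bytes) pairs, for instance gathered by walking a file share. The helper name `inventory`, the extension set and the reading of "16 MB" as 16 × 1024 × 1024 bytes are all assumptions for illustration, not a SharePoint API or an official list of unsupported types.

```python
import os
from collections import defaultdict

# Hypothetical: extensions named above whose contents are not indexed
# unless a matching IFilter is installed. Adjust to your own IFilters.
NO_IFILTER_EXTS = {
    ".zip", ".pdf", ".mdb", ".mpp", ".msg", ".vsd", ".gif",
    ".jpg", ".psd", ".wav", ".msi", ".exe",
}
# Assumption: the default 16 MB limit read as 16 * 1024 * 1024 bytes.
MAX_INDEXED_BYTES = 16 * 1024 * 1024


def inventory(files):
    """Summarize (filename, size_in_bytes) pairs by extension and split the
    total into bytes whose contents the crawler could read vs. skip."""
    by_ext = defaultdict(lambda: [0, 0])  # ext -> [file count, total bytes]
    content_indexable = 0
    skipped_no_ifilter = 0
    skipped_too_large = 0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()  # "" when there is no extension
        by_ext[ext][0] += 1
        by_ext[ext][1] += size
        if ext in NO_IFILTER_EXTS:
            skipped_no_ifilter += size   # no IFilter: contents not indexed
        elif size > MAX_INDEXED_BYTES:
            skipped_too_large += size    # over the default size limit
        else:
            content_indexable += size
    return {
        "by_extension": dict(by_ext),
        "content_indexable_bytes": content_indexable,
        "skipped_no_ifilter_bytes": skipped_no_ifilter,
        "skipped_too_large_bytes": skipped_too_large,
    }
```

The per-extension totals show where your storage actually goes, and `content_indexable_bytes` is a rough stand-in for the "size of data crawled" figure used in the sizing steps.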
How do I get to know my data? You need an information management plan and policy. These documents would cover topics such as:
- Organizational data – types, function, format etc…
- Records management – FilePlan, policy, retention etc…
- Data usage and storage policy and etiquette
- User training plan
- Governance plan
- Technology plan
Microsoft has published performance and sizing information on their site at http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx?mfr=true. According to their guidance, the basic steps are:
- Step 1 – Calculate Index Size:
Index size = average document size × number of documents × 4 × 10^-10 GB
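Step 1 can be expressed as a one-line function. This is a minimal sketch; the function name is hypothetical, and it assumes the average document size is given in bytes, since the formula itself does not state its units.

```python
def step1_index_size_gb(avg_doc_size, num_docs):
    """Step 1: index size = average document size * number of documents * 4e-10 GB.

    Assumption: avg_doc_size is in bytes (units are not stated in the formula).
    """
    return avg_doc_size * num_docs * 4e-10


# 500,000 documents averaging 100,000 bytes each: about 20 GB of index.
print(step1_index_size_gb(100_000, 500_000))
```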
- Step 2 – Calculate Disk Size Based on Index Size:
Size of data crawled = Y
Size of index on the index server = X = 5% to 12% of Y
Initial disk space = 2.5 × X
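Step 2 is easiest to read as a range rather than a single number. The sketch below is a minimal Python version; the function and parameter names are hypothetical, and the defaults are simply the 5%, 12% and 2.5× figures from the step above.

```python
def step2_disk_estimate(crawled, low=0.05, high=0.12, disk_factor=2.5):
    """Step 2: X = (5% to 12%) * Y, initial disk space = 2.5 * X.

    `crawled` is Y; results come back in the same unit (e.g. GB) as
    (index_low, index_high, disk_low, disk_high).
    """
    index_low = low * crawled
    index_high = high * crawled
    return index_low, index_high, disk_factor * index_low, disk_factor * index_high


# 500 GB crawled: roughly 25 to 60 GB of index, 62.5 to 150 GB of initial disk.
print(step2_disk_estimate(500))
```

Keeping the low and high ends separate makes it obvious how much the 5% to 12% spread alone can swing the disk you need to buy.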
Seems simple enough, right? Well, once you factor in the following wildcards, the water gets muddy:
- Disposition schedule – how long do you retain?
- Versions – how many and how long?
- Recycle bin – how much and how long?
Note that these equations are intended only to establish a starting-point estimate. Real-world results may vary widely based on the size of the documents being indexed and how much metadata is indexed during a crawl operation.
Storage vendors will love MOSS 2007, especially customers that plan on using Records Management, Versions and the Recycle Bin with no limits.
Joel has a great blog post that covers this topic: http://blogs.msdn.com/joelo/archive/2006/12/29/search-and-index-sizing-and-planning-real-world-data-from-msweb.aspx