- Overview – http://bobmixon.com/BLOG/archive/2007/03/27/moss_2007_planning_search_services.aspx
- Performance and capacity – http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx
- Creating Best Bets programmatically – http://blogs.technet.com/stefan_gossner/archive/2007/03/28/how-to-create-keywords-and-best-bets-for-moss-search-programmatically.aspx
- Microsoft Download – http://download.microsoft.com/download/2/0/d/20de7b70-5194-47c1-bbad-df98106fcde8/Search%20In%20Moss%20-%20Under%20The%20Hood.pdf
- The good the bad – http://blog.mondosoft.com/ontolica/archive/2007/01/31/The-good-news-and-the-bad-news-on-SharePoint-2007.aspx
- Crawl rules – http://technet2.microsoft.com/Office/en-us/library/785be8d8-b8b7-4584-a178-c37cb22965db1033.mspx?mfr=true
- Add Content Source – http://technet2.microsoft.com/Office/en-us/library/78dd0d13-c23b-4bbf-851b-287df0fa37091033.mspx
- IFilters – http://technet2.microsoft.com/Office/en-us/library/c6c029bd-64ea-4617-b2da-c269f13599e21033.mspx
- Index the BDC – http://blah.winsmarts.com/2007-4-SharePoint_2007__BDC_-_Enabling_Search_on_business_data.aspx
Planning search for your MOSS 2007 project?
Here are some great reads…grab a Venti and enjoy!
Louis Rosenfeld has posted a high-level background research checklist on his site http://louisrosenfeld.com/home/.
In his blog post “Background research checklist” dated Apr 13, 2007, he provides a high-level checklist of background research documentation that would help get the client planning and design on the right track.
Essentially, any project involving content and information requires a model, an Information Architecture, to be effective. Imagine if the phone book wasn't organized the way it is. How would you find phone numbers?
In the case of SharePoint, the following documents are required:
- Governance Model
- User Adoption Plan
- Information Model
- Information Architecture
- Content Management
- Systems Architecture
- Capacity Plan
- Operation Plan
As I said to one client, these don't have to be lengthy documents…they could be one document. At least you've begun documenting your design…think of it as a body of knowledge you can lean on to justify your plan and design.
Microsoft has published some great templates to help speed up the process of creating sites.
The templates are categorized by:
- Site Admin Templates
- Server Admin Templates
There are 40 in all, and they are available for download from http://www.microsoft.com/technet/windowsserver/sharepoint/wssapps/templates/default.mspx
I just finished reading Enterprise Architecture as Strategy: Creating a Foundation for Execution, authored by Jeanne W. Ross, Peter Weill, and David Robertson. This book cuts through the marketing hype around SOA and the spin that most large companies put on architecture.
Ross, Weill, and Robertson arrived at their conclusions after rigorous and extensive research which revealed what certain top-performing organizations do and how they do it. In this volume, they share what they learned so that other organizations can be guided and informed in their efforts to improve their own performance. More specifically, they respond to questions such as these:
- What are the most common symptoms (“warning signs”) of an inadequate foundation for execution?
- Which three disciplines must be mastered in order to build one which is solid?
- What are the key dimensions of an appropriate business model?
- How to implement the operating model via enterprise architecture?
- What are the four stages of enterprise architecture development and how must each be navigated?
- What are the specific benefits during the implementation of the enterprise architecture?
- When establishing a foundation for execution, why is it best to build it “one project at a time”?
- How can – and should – enterprise architecture be helpful when outsourcing?
- How to leverage its foundation for profitable growth?
- What are the “Top Ten Leadership Principles” for creating and exploiting a foundation for execution?
Storage requirements are a big topic for the IT infrastructure side of the business. Sizing MOSS 2007 is a challenging task in a large global organization. Both Microsoft and HP have run MOSS 2007 in their labs and come up with some interesting numbers.
The following table shows the type and number of documents crawled. (Documents were 10 kilobytes (KB) to 100 KB in size.)
The Index Server configuration was as follows:
- 4 dual-core Intel Xeon 2.66 GHz processors
- 32 GB RAM
- 40 GB for the operating system (RAID 5)
- 956 GB for the content index and the operating system paging file (RAID 10)
The following is a summary of the content profile:
- Content on SharePoint sites – 10 million items, including the following:
- 420 site collections
- 4,000 sites
- 24,200 lists
- 47,780 document libraries
The following table shows disk space usage.
- Index size on query server – 100 GB*
- Index size on index server – 100 GB*
- Search database size – 600 GB
* The tested index sizes are smaller than what might be observed in a production environment. In the test-generated corpus, the number of unique words is limited and often repeated.
What I find interesting is the performance, a real eye-opener for some. To quote from Microsoft's “Estimate performance and capacity requirements for search environments”:
“The time to perform a full crawl during testing was 35 days (approximately 15 documents per second). Note that these test results were observed in a production environment where network latency and the responsiveness of the crawled repositories affected crawl speed. Crawl speed measured by documents per second might be significantly faster in a pure test environment, or in environments with greater bandwidth and greater responsiveness of crawled repositories. If two percent of a corpus of the size used in the test environment changes, an incremental crawl to catch up with the changes takes approximately 8-12 hours, depending on latency and the responsiveness of the sites being crawled. Note that changes to metadata and outbound links take longer to process than changes to the contents of documents.”
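To get a feel for what those figures mean for your own corpus, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and it assumes a constant crawl rate, which the quote itself warns you won't see in practice, so treat the output as a starting estimate only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def items_crawled(days: float, docs_per_second: float) -> int:
    """Number of items crawled in `days` at a constant rate (an assumption)."""
    return int(days * SECONDS_PER_DAY * docs_per_second)


def full_crawl_days(item_count: int, docs_per_second: float) -> float:
    """Days needed to crawl `item_count` items at a constant rate (an assumption)."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return item_count / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures from the quoted test: 35 days at roughly 15 documents per second.
    print(items_crawled(35, 15))  # 45360000

    # Hypothetical corpus: 10 million items at the same observed rate.
    print(round(full_crawl_days(10_000_000, 15), 1), "days")
```

At 15 documents per second, 35 days works out to roughly 45 million items. Swap in your own item count and a crawl rate measured against your own repositories before committing to a crawl schedule.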
The long and the short of it is this: you must understand your data, use factual numbers to calculate the size of your corpus, and be prepared to size your storage accordingly. For those who take the low road, pain is sure to follow as the servers and storage systems grow exponentially, almost out of control.