It was over a year ago that I wrote the first blog in this series, and until now I hadn’t had time to write Part II because of my SharePoint 2010 project. With that project done for now, I finally found some time to write this over a coffee on the weekend.

In the prior blog we left off discussing the importance of performance and capacity management, along with an overview of what’s required. In this blog I will cover the test plan, the data set, the test process and the outcomes.

We set out to achieve some lofty goals: build a large farm, run lengthy performance and resilience tests, and document the results. With a decent budget we were able to build a physical farm with four web front-end (WFE) servers, two index servers and an active/passive SQL Server 2008 R2 cluster.

Our plan was to create a 5 TB dataset that replicated example production environments as closely as possible. We even threw in some large lists on a few sites in each site collection and made sure there were multiple versions of pages and documents. At a summary level, the dataset consisted of the following:

  • 40 web applications
  • 500 sites within each web application
  • Word documents of 5, 10, 25 and 50 MB uploaded to each site
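The post doesn’t break down how the 5 TB target was reached, so here is a rough back-of-the-envelope sketch of the base document set alone. It assumes exactly one copy of each document size per site and ignores versions, large lists and pages; those assumptions are mine, not part of the original plan.

```python
# Rough size of the base document set described above.
# Assumption: exactly one copy of each document size per site, no versions,
# no list data, and decimal units (1 TB = 1,000,000 MB).
WEB_APPLICATIONS = 40
SITES_PER_WEB_APP = 500
DOC_SIZES_MB = [5, 10, 25, 50]


def base_dataset_mb(web_apps: int, sites_per_app: int, sizes_mb: list[int]) -> int:
    """Return total MB uploaded when every site gets one copy of each size."""
    total_sites = web_apps * sites_per_app
    return total_sites * sum(sizes_mb)


total_sites = WEB_APPLICATIONS * SITES_PER_WEB_APP
total_mb = base_dataset_mb(WEB_APPLICATIONS, SITES_PER_WEB_APP, DOC_SIZES_MB)
print(f"{total_sites} sites, {total_mb} MB (~{total_mb / 1_000_000:.1f} TB)")
```

Under that assumption the uploads come to about 1.8 TB across 20,000 sites, so the rest of the 5 TB target would have to come from the page and document versions and large lists mentioned above, or from more than one upload per size.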

We then created several test cases in Visual Studio 2010 covering the following scenarios:

  • Browsing pages
  • Searching for artifacts
  • Uploading documents of 5, 10, 25 and 50 MB
  • Downloading documents of 5, 10, 25 and 50 MB
  • Submitting InfoPath forms
  • Creating a site collection and subsites
  • Creating a web part
  • Triggering workflows

Each test case was executed at each of the following user counts and run for four hours to make sure there were no quirks such as runaway caching or other misbehaving settings:

  • 1 user
  • 250 users
  • 500 users
  • 1000 users
  • 1500 users
  • 2000 users
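The post doesn’t say how the runs were scheduled, but the matrix above implies a sizable time budget. A small sketch, assuming each scenario is run on its own at each user count, one run after another, with the document sizes folded into a single upload scenario and a single download scenario (the scenario labels here are my own shorthand):

```python
from itertools import product

# Shorthand labels for the scenarios listed above. Assumption: upload and
# download sizes are covered within one run each, not as separate runs.
SCENARIOS = [
    "browse pages",
    "search for artifacts",
    "upload documents",
    "download documents",
    "submit InfoPath forms",
    "create site collection and subsites",
    "create web part",
    "trigger workflows",
]
USER_COUNTS = [1, 250, 500, 1000, 1500, 2000]
HOURS_PER_RUN = 4

# Every (scenario, user count) pair is one four-hour run.
test_runs = list(product(SCENARIOS, USER_COUNTS))
total_hours = len(test_runs) * HOURS_PER_RUN

print(f"{len(test_runs)} runs, {total_hours} hours (~{total_hours / 24:.0f} days back to back)")
```

That works out to 48 runs and 192 hours of load. If each document size were instead run separately, uploads and downloads become four scenarios each, giving 14 × 6 = 84 runs and 336 hours.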

We determined which counters to monitor to understand the behavior and performance of the farm and its components under load. Note that counters are only half of the equation; the other half is realistic thresholds, developed through baselining, other analysis and lessons learned from incidents. We also leveraged the Performance Analysis of Logs (PAL) tool.
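To show what pairing counters with thresholds can look like, here is a minimal sketch that flags counters whose average sample crosses a limit. The counter paths are standard Windows Performance Monitor counters, but the threshold values and sample numbers are hypothetical placeholders, not results from our tests, and this is not how PAL itself is implemented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Threshold:
    counter: str
    limit: float
    higher_is_worse: bool = True  # False for counters like Available MBytes


# Hypothetical thresholds; real ones should come from your own baselines.
THRESHOLDS = [
    Threshold(r"\Processor(_Total)\% Processor Time", 80.0),
    Threshold(r"\Memory\Available MBytes", 1024.0, higher_is_worse=False),
    Threshold(r"\PhysicalDisk(_Total)\Avg. Disk sec/Read", 0.020),
    Threshold(r"\ASP.NET\Requests Queued", 100.0),
]


def breaches(samples: dict[str, list[float]], thresholds: list[Threshold]) -> list[str]:
    """Return the counters whose average sample crosses its threshold."""
    flagged = []
    for t in thresholds:
        values = samples.get(t.counter)
        if not values:
            continue  # counter was not collected in this run
        avg = mean(values)
        crossed = avg > t.limit if t.higher_is_worse else avg < t.limit
        if crossed:
            flagged.append(t.counter)
    return flagged


# Made-up samples for one WFE, for illustration only.
wfe_samples = {
    r"\Processor(_Total)\% Processor Time": [72.0, 85.0, 91.0],
    r"\Memory\Available MBytes": [2048.0, 1900.0, 1850.0],
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read": [0.008, 0.012, 0.010],
}
print(breaches(wfe_samples, THRESHOLDS))
```

With these made-up numbers only the processor counter is flagged. Averages can hide short spikes, so a real check might also look at maximums or percentiles per counter.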

In the end, the test results gave us visibility into how well our model farm performed and where the bottlenecks would be, so we knew how to scale for additional capacity. They also gave us information to help with troubleshooting, since we now had a reference profile of the farm and its servers running in a healthy state.