How many times have you heard the phrase “SharePoint is slow!”, “SharePoint isn’t working properly!”, “Why didn’t we catch that memory leak?” or “Why didn’t we know we were running out of capacity?”

For many organizations, SharePoint’s normal performance profile and capacity plan have never been established, documented or communicated. As a result, assessing performance and capacity becomes guesswork and speculation. One consequence is a breakdown of trust between IT and the user community: people begin to talk and point fingers.

Questions you might be asked or asking yourself:

  • What is acceptable performance?
  • What is the planned capacity for the life of the service?
  • How do I establish standards for normal performance that are based on fact?
  • How do I maintain consistent performance? What tools, processes, staffing, skills and policies are required?
  • How do I qualify/quantify a performance problem? Where do I look? How can I prove whether or not there is a problem?
  • How do I manage client expectations?

Without these answers, SharePoint administrators will spend (waste) a lot of time defending their environment’s performance. Managing expectations is difficult when you have no credible supporting material or factual information. Add a lengthy list of customizations and multiple farms, and the problem grows considerably.

So where do you start? Create a timeline for capacity by answering the following:

  • What is the planned/predicted user growth? Think about current trends, regions coming on board, types of users such as internal and external (staff, clients, contractors) and their use cases.
  • What is the planned data growth? What type of data will users access (e.g. documents, images, videos)?
  • How can we scale the current infrastructure? Who do we work with, and what lead time is required? What is the department’s reputation? Is it outsourced? Does it provide reporting and trending?
  • What is the impact of the 3rd-party tools we have installed, if any? What is the impact of our customizations (are the documentation and code in source control?)?
  • Don’t underestimate the complexity of collecting this information: it’s a mix of science and art (guessing), and it’s something you will revisit quarterly.
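The growth questions above eventually need to become numbers. As a minimal sketch, assuming steady compound growth per quarter (a simplification, since real growth often arrives in steps as regions come on board), the following Python projects users and data volume and reports the first quarter that exceeds the capacity your farm has been tested for. The function names, growth rates and tested-capacity limits are hypothetical placeholders for your own estimates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuarterForecast:
    quarter: int
    users: int
    data_gb: float


def project_growth(users: int, data_gb: float,
                   user_growth: float, data_growth: float,
                   quarters: int) -> List[QuarterForecast]:
    """Compound quarterly growth: value_q = value_0 * (1 + rate) ** q.

    Assumes constant rates; replace with your own stepwise plan if
    regions or user groups come on board on known dates.
    """
    forecasts = []
    for q in range(quarters + 1):
        forecasts.append(QuarterForecast(
            quarter=q,
            users=round(users * (1 + user_growth) ** q),
            data_gb=round(data_gb * (1 + data_growth) ** q, 1),
        ))
    return forecasts


def first_quarter_over(forecasts: List[QuarterForecast],
                       max_users: int, max_data_gb: float) -> Optional[int]:
    """Return the first quarter where users or data exceed the tested
    capacity (hypothetical limits from your own load tests), or None."""
    for f in forecasts:
        if f.users > max_users or f.data_gb > max_data_gb:
            return f.quarter
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 400 users growing 10%/quarter, 200 GB growing 15%/quarter.
    plan = project_growth(400, 200.0, 0.10, 0.15, quarters=8)
    for f in plan:
        print(f"Q{f.quarter}: {f.users} users, {f.data_gb} GB")
    print("capacity exceeded in quarter:", first_quarter_over(plan, 500, 400.0))
```

With these made-up inputs and a tested limit of 500 users and 400 GB, the user projection crosses the limit in quarter 3 (about 532 users), well before data volume becomes the constraint. That tells you which scaling conversation to start first, and how much lead time you have.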

IBM has written many documents on this topic; I used their ideas to develop a capacity planning framework while working for a Toronto-based vendor back in the 90s. Here is a link to some great reading: http://www-03.ibm.com/software/products/en/category/it-service-management. Here is also an example report that can help.

With the aforementioned information you are now ready to create profiles and test plans:

  • Case studies from Microsoft
  • Overview of .Net performance testing on patterns and practices site
  • Create user profiles (Think about your user community, what they do – Tasks – with SharePoint)
  • You must start somewhere and refine over time – avoid analysis paralysis
  • Create test cases that match your user profiles (Browsing, searching, editing a list, document upload and download, create and delete site)
  • Determine which counters you need to monitor to understand the behavior and performance of your farm (and its components) under load. Note that counters are only half of the equation; the other half is the real thresholds you develop through baselining, analysis and actual incidents
  • Leverage the Performance Analysis Tool (PAL) kit
  • SharePoint and SQL counters to monitor
  • Create test scripts with your load testing tool that follow the user profiles
  • Some recommendations and guidelines from the Visual Studio Team
  • Create load scenarios for 250, 500, 1000 users, perhaps more depending on your specific requirements
  • Configure the load scenarios to run for 2, 4 and 8 hours so that you catch memory leaks, product bugs and/or runaway cache
  • Note that these scenarios will help you find bad code (memory leaks), issues with configuration settings, and where capacity must be added as the farm (and its components) saturates
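Counters only become useful once they are compared against a baseline. One simple approach, sketched below in Python, sets a warning threshold at the baseline mean plus two standard deviations and flags steady memory growth during a long soak run using a least-squares slope. The threshold rule, the 50 MB/hour leak limit, the function names and the sample data are assumptions for illustration, not SharePoint or PAL defaults. The counter path shown is a standard Windows Performance Monitor counter, but exporting samples from your load testing tool is left out.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def baseline_threshold(samples: Sequence[float], k: float = 2.0) -> float:
    """Warning threshold = baseline mean + k sample standard deviations.

    k=2 is an assumed starting point; tune it as real incidents confirm
    or contradict what the baseline says.
    """
    return statistics.mean(samples) + k * statistics.stdev(samples)


def trend_slope(times_h: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against time (units per hour)."""
    mean_t = statistics.mean(times_h)
    mean_v = statistics.mean(values)
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den


def check_run(baseline: Dict[str, List[float]],
              run: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Return (counter, worst value in the run, threshold) for each breach."""
    breaches = []
    for counter, samples in baseline.items():
        threshold = baseline_threshold(samples)
        worst = max(run.get(counter, []), default=float("-inf"))
        if worst > threshold:
            breaches.append((counter, worst, threshold))
    return breaches


def looks_like_leak(times_h: Sequence[float], private_mb: Sequence[float],
                    max_growth_mb_per_h: float = 50.0) -> bool:
    """Flag steady memory growth over a 2-, 4- or 8-hour soak run.

    private_mb would typically come from \\Process(w3wp)\\Private Bytes
    converted to MB; the 50 MB/hour limit is a hypothetical value.
    """
    return trend_slope(times_h, private_mb) > max_growth_mb_per_h


if __name__ == "__main__":
    cpu = r"\Processor(_Total)\% Processor Time"
    baseline = {cpu: [40, 42, 38, 41, 39]}   # hypothetical baseline samples
    load_run = {cpu: [45, 50, 47]}           # hypothetical 1000-user run
    for counter, worst, limit in check_run(baseline, load_run):
        print(f"{counter}: peak {worst} exceeds threshold {limit:.1f}")

    hours = list(range(9))                   # hourly samples over 8 hours
    w3wp_mb = [800 + 100 * h for h in hours]  # hypothetical steady climb
    print("possible leak:", looks_like_leak(hours, w3wp_mb))
```

Mean plus k standard deviations assumes roughly symmetric samples; for skewed counters such as page response times, a high percentile of the baseline may be a better threshold.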

Once the profiles and test plans are complete, you will need a QA environment that mimics production as closely as possible. This eliminates the risk that deviations from production will distort the quality of your test results. For example, your QA environment might consist of the following:

  • A gigabit network with VLANs configured accordingly
  • A network latency generating device (LANForge)
  • An SQL Active/Passive Cluster
  • Two SharePoint WFEs
  • One SharePoint Index Server
  • Load testing workstations/appliances/software (Visual Studio)

Once you have your tests and QA environment ready, the fun begins. That’s the subject of “SharePoint Performance and Capacity Baselining – Part 2” that will be posted next month.