Over the years, the Meritus team has had the opportunity to troubleshoot and fine-tune performance for several customers (large and small) using off-the-shelf commercial web applications running in ASP.NET and the Microsoft Stack.
Performance tuning projects can take anywhere from a few days to several weeks of work by a dedicated team, depending on several factors:
- Performance expectations.
- Application complexity.
- Number of components (database server, cache server, web servers, load balancers, etc.).
- Ability to access production servers to gather metrics, review logs, etc.
- Ability to tweak or even change application behavior.
- Ability to upgrade hardware or change network components / topology.
- And last, but not least, customer sophistication.
Irrespective of the size of the problem (and, by extension, the performance tuning project), performance tuning requires a methodical and iterative approach, which we will review at a high level.
Performance Tuning Process
Understanding the problem
The first phase is to understand the problem, so that there is no ambiguity about what is wrong and what the expectations are (and whether they are realistic), and to rule out issues that are easy to fix.
Communicating effectively and asking the right questions are key to establishing a relationship of trust with the customer.
Possible questions to ask:
- What is the perceived performance problem?
- When did the problem begin?
- Is the problem circumscribed to a particular application area or functionality?
- Is the problem perceived by all users or only a subset of them?
- Is there a pattern? It may be a temporal pattern, such as “every close of month” or “every day around noon”, or a functional pattern, such as “whenever process x is executed”.
- Are there batch jobs, ETL processes, or reports interacting with the same data source?
- Have there been environmental changes? Software updates, new security policies, network topology changes.
- Is hardware shared? Think virtualized environments, where different applications may be fighting for the same resources.
- Are there other applications having performance issues as well?
Gather Evidence (Logs)
The Windows Event Log, IIS logs, and other application-specific log files can provide extremely valuable contextual information. A very useful tip is to import the logs into a database so you can query the data with the full capabilities of SQL. This is a great way to find macro-level patterns, perform time-series analysis (by date, by hour, etc.), or find outliers (e.g., the longest-running HTTP requests or SQL queries).
The SQL statement below can be used to import IIS W3C logs into a table in SQL Server:
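-- Assumes the '#' header lines (#Software, #Version, #Date, #Fields) have been
-- removed from the log file first; BULK INSERT cannot parse them as data rows.
BULK INSERT [dbo].[IIS_W3C_LOG]
FROM 'C:\Temp\iis_w3c_log.txt' -- Replace with actual file path here
WITH (
    FIRSTROW = 1,            -- row numbering is 1-based
    FIELDTERMINATOR = ' ',   -- W3C fields are space-separated
    ROWTERMINATOR = '\n'
);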
dbo.IIS_W3C_LOG table definition (the table must exist before the import is run):
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
CREATE TABLE [dbo].[IIS_W3C_LOG](
[DATE] [date] NULL,
[TIME] [time](7) NULL,
[s-ip] [varchar](16) NULL,
[cs-method] [varchar](8) NULL,
[cs-uri-stem] [varchar](255) NULL,
[cs-uri-query] [varchar](2048) NULL,
[s-port] [varchar](5) NULL,
[cs-username] [varchar](16) NULL,
[c-ip] [varchar](16) NULL,
[cs(User-Agent)] [varchar](1024) NULL,
[cs(Referer)] [varchar](4096) NULL,
[sc-STATUS] [int] NULL,
[sc-substatus] [int] NULL,
[sc-win32-STATUS] [bigint] NULL,
[time-taken] [int] NULL
) ON [PRIMARY]
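Once the logs are loaded, queries along the following lines can surface the patterns and outliers mentioned above. This is only a minimal sketch: it assumes the dbo.IIS_W3C_LOG table defined above, that time-taken is recorded in milliseconds and timestamps in UTC (the IIS defaults for W3C logs), and the cutoff of 20 rows is an arbitrary choice.

```sql
-- Requests per hour, with average and worst response time.
-- time-taken is in milliseconds; date and time are in UTC by default.
SELECT
    [DATE]                            AS log_date,
    DATEPART(HOUR, [TIME])            AS log_hour,
    COUNT(*)                          AS request_count,
    AVG(CAST([time-taken] AS bigint)) AS avg_time_taken_ms,
    MAX([time-taken])                 AS max_time_taken_ms
FROM [dbo].[IIS_W3C_LOG]
GROUP BY [DATE], DATEPART(HOUR, [TIME])
ORDER BY log_date, log_hour;

-- The 20 slowest individual requests (the cutoff is arbitrary).
SELECT TOP (20)
    [DATE], [TIME], [cs-method], [cs-uri-stem], [cs-uri-query],
    [sc-STATUS], [time-taken]
FROM [dbo].[IIS_W3C_LOG]
ORDER BY [time-taken] DESC;
```

The first query helps spot temporal patterns such as a daily peak, while the second points at specific pages or operations worth investigating.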
Measure Baseline Performance
Although having the end-user perspective first-hand is extremely valuable, we should strive to find objective evidence that backs up the perceived issue. It is not uncommon for end users to have unrealistic or vague perceptions of what performance is, sometimes even confusing performance with application errors or unrelated environmental issues (a slow internet connection on a mobile device, for example).
The purpose here is to obtain objective metrics that reflect the issue as perceived by the users. As the performance tuning process moves forward, these metrics will allow the team to tell whether each change is having a positive effect.
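If the IIS logs have been imported into the dbo.IIS_W3C_LOG table described earlier, one possible way to capture a baseline is to compute response-time statistics per URL over a fixed window. The sketch below assumes SQL Server 2012 or later (for PERCENTILE_CONT); the dates are placeholders, not values from a real engagement.

```sql
-- Baseline window: placeholder dates, replace with a representative period.
DECLARE @baseline_start date = '2024-01-01';
DECLARE @baseline_end   date = '2024-01-07';

-- Median and 95th-percentile response time per URL (time-taken is in milliseconds).
-- PERCENTILE_CONT is used as a window function, so DISTINCT collapses the
-- result to one row per URL.
SELECT DISTINCT
    [cs-uri-stem],
    COUNT(*) OVER (PARTITION BY [cs-uri-stem]) AS request_count,
    PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY [time-taken])
        OVER (PARTITION BY [cs-uri-stem])      AS median_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY [time-taken])
        OVER (PARTITION BY [cs-uri-stem])      AS p95_ms
FROM [dbo].[IIS_W3C_LOG]
WHERE [DATE] BETWEEN @baseline_start AND @baseline_end
ORDER BY p95_ms DESC;
```

Percentiles are used here rather than averages alone because a handful of very slow requests, which is often what users actually notice, can be hidden by a reasonable-looking mean.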
Elaboration of Hypothesis
Based on the information gathered in the previous steps, the team will come up with a set of hypotheses, which will be reviewed with the customer to build an action plan. We should prioritize the actions that have the highest chance of making a tangible contribution to solving the problem.
Looking for patterns and correlations is an extremely valuable tool during this phase. For example, high CPU/Memory usage at the web server level and database server level at the same time may point to database locks.
The charts below, taken from a real-world scenario, show this pattern: the IIS server showed a clear spike in memory usage while, at the same time, SQL Server showed a spike in CPU load. This was later correlated to a database lock caused by a particularly vicious (generated) SQL query executed in production:
Another, similar real-world example is shown below with a different set of metrics. It was also traced back to a database lock.
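When a database lock is suspected, one way to test the hypothesis while the problem is occurring is to look for blocked sessions on the SQL Server instance. The query below is a minimal sketch using SQL Server's built-in dynamic management views; it assumes the login has the VIEW SERVER STATE permission, and it only shows blocking that is happening at the moment it runs.

```sql
-- Sessions that are currently waiting on another session, longest wait first.
SELECT
    r.session_id,
    r.blocking_session_id,            -- the session holding the contended resource
    r.wait_type,
    r.wait_time      AS wait_time_ms, -- current wait duration in milliseconds
    r.wait_resource,
    t.[text]         AS sql_text      -- statement batch of the blocked request
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```

An empty result does not rule the hypothesis out; it only means nothing was blocked at that instant, so the query is most useful when run during a spike.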
Make a Change
It is very important to make one change at a time. This is the only way we can tell if a given measure is having the expected outcome. A change can be as simple as a tweak in a config file, or more involved, such as upgrading hardware on a SQL Server machine or fixing a poorly performing section of code or functionality.
Measure Performance After Change
After a change has been made, we need to test our hypothesis and see whether we have moved forward or not. We should gather the same metrics that we gathered for our baseline performance, for an equivalent period.
Once we have measured performance after the change, we need to evaluate the key metrics we set out to improve and see if there was a positive change, while at the same time ensuring we have not made other metrics worse. Remember that performance tuning is often a balancing act. If we have reached our goals, the problem is resolved. If not, we need to go back to the Elaboration of Hypothesis phase and repeat the process.
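As an illustration of comparing equivalent periods, the query below puts baseline and post-change figures side by side for each URL. It is a sketch under a few assumptions: IIS logs for both periods have been imported into the dbo.IIS_W3C_LOG table described earlier, time-taken is in milliseconds, and the two one-week windows are placeholder dates.

```sql
-- Two windows of equal length; the dates are placeholders.
DECLARE @baseline_start date = '2024-01-01', @baseline_end date = '2024-01-07';
DECLARE @after_start    date = '2024-01-15', @after_end    date = '2024-01-21';

SELECT
    [cs-uri-stem],
    SUM(CASE WHEN [DATE] BETWEEN @baseline_start AND @baseline_end
             THEN 1 ELSE 0 END)                        AS baseline_requests,
    AVG(CASE WHEN [DATE] BETWEEN @baseline_start AND @baseline_end
             THEN CAST([time-taken] AS float) END)     AS baseline_avg_ms,
    SUM(CASE WHEN [DATE] BETWEEN @after_start AND @after_end
             THEN 1 ELSE 0 END)                        AS after_requests,
    AVG(CASE WHEN [DATE] BETWEEN @after_start AND @after_end
             THEN CAST([time-taken] AS float) END)     AS after_avg_ms
FROM [dbo].[IIS_W3C_LOG]
WHERE [DATE] BETWEEN @baseline_start AND @baseline_end
   OR [DATE] BETWEEN @after_start    AND @after_end
GROUP BY [cs-uri-stem]
ORDER BY after_avg_ms DESC;
```

Listing every URL, rather than only the ones targeted by the change, makes it easier to notice when an improvement in one area has made another area slower. Request counts are included because a large difference in load between the two windows makes the averages harder to compare.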
In our next post in this series, we will review specific SQL Server and IIS Performance metrics.