The following is a good starting point for collecting basic information to monitor or troubleshoot an AppFabric (AF) Cache setup (host/cluster and clients). It first points to resources on the available logging features and then, via a sample scenario, walks through the decisions taken to implement a logging solution. It also answers a frequent customer question: which basic performance counters are recommended for collection?

What AppFabric Cache offers

Here is a quick walkthrough on how to format and generate logs for ease of management. Some of the links within it point to pre-release documentation, so you may prefer the more up-to-date topics on server log sink settings and client log sink settings. Together these give a fairly good idea of the logging capabilities offered in AF Cache; please review them, as that knowledge will help with the rest of this post. With these concepts in hand, the discussion and planning of the most suitable logging solution for your specific implementation can start.

A Sample Scenario

Assume that memory pressure is a concern on the host side. The default event trace level of Error would not be enough, because a more detailed view of which objects are being cached on the host is needed. This can be achieved by overriding the default host log sink to collect Information-level logs, which enables more detailed log analysis when memory-related errors occur. Five levels are available: No Tracing (-1), Error (0), Warning (1), Information (2) and Verbose (3). In this sample, the Information level is used.

The next decision is whether the configuration should be set via code or XML. In this sample, the organization decides that its infrastructure personnel can make the required changes via XML without involving programmers, so the XML route is the simplest.

Next is the type of logging. As the same infrastructure team will also be analyzing the logs, a file-based log sink is agreed upon (versus console or ETW). ETW could also work, but the simplicity of the AF Cache logging was chosen instead (for the sake of this sample). Since the logs will be written to an existing central shared location on the network, the NETWORK SERVICE account is given rights to the share (in the case of a cluster, each host's NETWORK SERVICE account must be given write access to this share). Note that at this point AF Cache cannot run as a Network account, and access errors will be raised if the logs cannot be written.

To keep a crash from causing the logs to be overwritten, the process-specific character ($) is agreed upon and is to be used within the log name. The log generation interval is set to every hour (dd-hh).

Similarly, since memory pressure on the web servers (AF Cache clients) is also a concern, the client log sink needs similar changes. The final customType element for client and host for the fabric object will then look similar to the following:

<!-- For the client machines the log name is modified: sinkParam="\\CentralLogs\\AFCache\\Client1-$/dd-hh" -->
<customType
  className="System.Data.Fabric.Common.EventLogger,FabricCommon"
  sinkName="System.Data.Fabric.Common.FileEventSink,FabricCommon"
  sinkParam="\\CentralLogs\\AFCache\\Server1-$/dd-hh"
  defaultLevel="2"
/>
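
When several hosts and clients need the same setting, the element can be generated rather than typed by hand. The following is only a minimal sketch in Python: the function name custom_type_element and the TraceLevel enum are hypothetical helpers, while the attribute values, the share path (written exactly as in the sample), the $ character and the dd-hh interval are copied from the sample above.

```python
from enum import IntEnum
from xml.sax.saxutils import quoteattr


class TraceLevel(IntEnum):
    """Trace levels and their numeric values as listed in this post."""
    NO_TRACING = -1
    ERROR = 0
    WARNING = 1
    INFORMATION = 2
    VERBOSE = 3


def custom_type_element(machine_name, level=TraceLevel.INFORMATION,
                        share=r"\\CentralLogs\\AFCache"):
    """Return the text of a <customType> element for one host or client.

    Hypothetical helper: it only formats text and does not validate the
    result against AppFabric.
    """
    # "$" is the process-specific character and "dd-hh" the hourly
    # generation interval; both are copied from the sample, not derived.
    sink_param = share + r"\\" + machine_name + "-$/dd-hh"
    attributes = [
        ("className", "System.Data.Fabric.Common.EventLogger,FabricCommon"),
        ("sinkName", "System.Data.Fabric.Common.FileEventSink,FabricCommon"),
        ("sinkParam", sink_param),
        ("defaultLevel", str(int(level))),
    ]
    # quoteattr adds the surrounding quotes and escapes XML special characters.
    body = "\n".join(f"  {name}={quoteattr(value)}" for name, value in attributes)
    return f"<customType\n{body}\n/>"


if __name__ == "__main__":
    for machine in ("Server1", "Client1"):
        print(custom_type_element(machine))
```

The generated text still has to be pasted into the right configuration file for each machine; the sketch does not touch any configuration itself.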

Logs are a good way to collect application-specific or, as in the case above, scenario-specific information that allows ad hoc or error-driven analysis. Similarly, collecting performance counters can give a window into the internal operations of not just the particular application (AF Cache) but also the overall system.

Performance Counters

Customers often ask whether there is a recommended set of counters that helps analyze the most common problems. The following is a list of the recommended performance counters to collect. The simplest way to use it:

1. Export an empty Performance Monitor (PerfMon) Data Collector Set

2. Edit the resulting XML file, adding the list below as Counter and CounterDisplayName elements

3. Re-import it as a Data Collector Set template (see here for details on this operation)

Here is a link to further detail on the available performance counters for AF Cache.

<Counter>\AppFabric Caching:Host\Cache Miss Percentage</Counter>

<Counter>\AppFabric Caching:Host\Total Client Requests</Counter>

<Counter>\AppFabric Caching:Host\Total Client Requests /sec</Counter>

<Counter>\AppFabric Caching:Host\Total Data Size Bytes</Counter>

<Counter>\AppFabric Caching:Host\Total Evicted Objects</Counter>

<Counter>\AppFabric Caching:Host\Total Eviction Runs</Counter>

<Counter>\AppFabric Caching:Host\Total Expired Objects</Counter>

<Counter>\AppFabric Caching:Host\Total Get Requests</Counter>

<Counter>\AppFabric Caching:Host\Total Get Requests /sec</Counter>

<Counter>\AppFabric Caching:Host\Total GetAndLock Requests</Counter>

<Counter>\AppFabric Caching:Host\Total GetAndLock Requests /sec</Counter>

<Counter>\AppFabric Caching:Host\Total Memory Evicted</Counter>

<Counter>\AppFabric Caching:Host\Total Notification Delivered</Counter>

<Counter>\AppFabric Caching:Host\Total Object Count</Counter>

<Counter>\AppFabric Caching:Host\Total Read Requests</Counter>

<Counter>\AppFabric Caching:Host\Total Read Requests /sec</Counter>

<Counter>\AppFabric Caching:Host\Total Write Operations</Counter>

<Counter>\AppFabric Caching:Host\Total Write Operations /sec</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\# Gen 0 Collections</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\# Gen 1 Collections</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\# Gen 2 Collections</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\# of Pinned Objects</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\% Time in GC</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\Large Object Heap size</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\Gen 0 heap size</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\Gen 1 heap size</Counter>

<Counter>\.NET CLR Memory(DistributedCacheService)\Gen 2 heap size</Counter>

<Counter>\Memory\Available MBytes</Counter>

<Counter>\Process(DistributedCacheService)\% Processor Time</Counter>

<Counter>\Process(DistributedCacheService)\Thread Count</Counter>

<Counter>\Process(DistributedCacheService)\Working Set</Counter>

<Counter>\Processor(_Total)\% Processor Time</Counter>

<Counter>\Network Interface(*)\Bytes Received/sec</Counter>

<Counter>\Network Interface(*)\Bytes Sent/sec</Counter>

<Counter>\Network Interface(*)\Current Bandwidth</Counter>

<CounterDisplayName>\AppFabric Caching:Host\Cache Miss Percentage</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Client Requests</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Client Requests /sec</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Data Size Bytes</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Evicted Objects</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Eviction Runs</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Expired Objects</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Get Requests</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Get Requests /sec</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total GetAndLock Requests</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total GetAndLock Requests /sec</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Memory Evicted</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Notification Delivered</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Object Count</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Read Requests</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Read Requests /sec</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Write Operations</CounterDisplayName>

<CounterDisplayName>\AppFabric Caching:Host\Total Write Operations /sec</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\# Gen 0 Collections</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\# Gen 1 Collections</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\# Gen 2 Collections</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\# of Pinned Objects</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\% Time in GC</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\Large Object Heap size</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\Gen 0 heap size</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\Gen 1 heap size</CounterDisplayName>

<CounterDisplayName>\.NET CLR Memory(DistributedCacheService)\Gen 2 heap size</CounterDisplayName>

<CounterDisplayName>\Memory\Available MBytes</CounterDisplayName>

<CounterDisplayName>\Process(DistributedCacheService)\% Processor Time</CounterDisplayName>

<CounterDisplayName>\Process(DistributedCacheService)\Thread Count</CounterDisplayName>

<CounterDisplayName>\Process(DistributedCacheService)\Working Set</CounterDisplayName>

<CounterDisplayName>\Processor(_Total)\% Processor Time</CounterDisplayName>

<CounterDisplayName>\Network Interface(*)\Bytes Received/sec</CounterDisplayName>

<CounterDisplayName>\Network Interface(*)\Bytes Sent/sec</CounterDisplayName>

<CounterDisplayName>\Network Interface(*)\Current Bandwidth</CounterDisplayName>
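
Step 2 of the procedure above (editing the exported XML) can also be scripted. The following is a minimal sketch in Python, assuming the exported file holds the Counter and CounterDisplayName elements inside a PerformanceCounterDataCollector element; check your own export before relying on that. The function name add_counters, the EMPTY_TEMPLATE text and the collector name AFCacheCounters are hypothetical, and only a few of the counters listed above are included for brevity.

```python
import xml.etree.ElementTree as ET

# A short subset of the counters listed above; extend it with the full list.
COUNTERS = [
    r"\AppFabric Caching:Host\Cache Miss Percentage",
    r"\AppFabric Caching:Host\Total Data Size Bytes",
    r"\.NET CLR Memory(DistributedCacheService)\% Time in GC",
    r"\Memory\Available MBytes",
]

# Hypothetical, heavily trimmed stand-in for an exported (empty) template.
EMPTY_TEMPLATE = """<DataCollectorSet>
  <PerformanceCounterDataCollector>
    <Name>AFCacheCounters</Name>
  </PerformanceCounterDataCollector>
</DataCollectorSet>"""


def add_counters(template_xml, counters):
    """Return template_xml with Counter/CounterDisplayName elements added."""
    root = ET.fromstring(template_xml)
    collector = root.find(".//PerformanceCounterDataCollector")
    if collector is None:
        raise ValueError("no PerformanceCounterDataCollector element found")
    # Skip counters that are already present so the edit can be repeated.
    existing = {node.text for node in collector.findall("Counter")}
    new_paths = [path for path in counters if path not in existing]
    # All Counter elements first, then all CounterDisplayName elements,
    # mirroring the order of the list in this post.
    for tag in ("Counter", "CounterDisplayName"):
        for path in new_paths:
            ET.SubElement(collector, tag).text = path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_counters(EMPTY_TEMPLATE, COUNTERS))
```

The sketch works on text in memory; reading and writing the real exported file, including its encoding, is left out and should be handled with care before re-importing the template.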

In summary

Collected together, logs and performance counters are the first step in being ready to analyze errors or monitor for specific concerns or conditions (e.g. memory pressure) in AppFabric Caching. Since this is a big subject, I will explore the reasons behind the performance counter recommendations in a future blog post.

Author: Jaime Alva Bravo

Reviewers: Mark Simms, James Podgorski