Caucho maker of Resin Server | Application Server (Java EE Certified) and Web Server


 

Resin Documentation

 

health meters


Health meters are a simple way to create visually pleasing graphs in /resin-admin.

Configuration

health.xml

Meters are configured as part of health.xml using CanDI to create and update Java objects. Refer to health checking configuration for a full description of health.xml. Resin 4.0.17 and later include a full complement of pre-configured JMX meters in health.xml.

Example: importing health.xml into resin.xml
<resin xmlns="http://caucho.com/ns/resin"
       xmlns:resin="urn:java:com.caucho.resin">
  <cluster-default>  
    ...
    <!--
       - Admin services
      -->
    <resin:DeployService/>
    
    <resin:if test="${resin.professional}">
      <resin:AdminServices/>
    </resin:if>

    <!--
       - Configuration for the health monitoring system
      -->
    <resin:if test="${resin.professional}">
      <resin:import path="${__DIR__}/health.xml" optional="true"/>
    </resin:if>
    ...
  </cluster-default>
</resin>

Note: <resin:AdminServices/> (or more precisely just <resin:StatsService/>) is required to support health meters and graphing.

Meter names

Health meters are named using a concatenation of keys separated by pipe (|) characters, loosely organized from least specific to most specific. Since meter statistics are shared among the members of a Resin cluster, Resin automatically prefixes each meter name with the cluster node index to ensure the name is unique across cluster members.

The pipe character also lets the /resin-admin UI categorize meters into drill-downs. Consider the following example.

Example: meter naming
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxDeltaMeter>
    <name>JVM|Compilation|Compilation Time</name>
    <object-name>java.lang:type=Compilation</object-name>
    <attribute>TotalCompilationTime</attribute>
  </health:JmxDeltaMeter>

</cluster>

In this example, JVM|Compilation|Compilation Time provides the base of the name. For cluster node index 0, Resin prefixes the name with 00|. /resin-admin then uses the cluster index and the first two keys to create drill-downs that organize the meters logically for display.

[Graph: 00|JVM|Compilation|Compilation Time, Time: 1 Hour]
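The naming scheme can be illustrated with a small Java sketch. This is not Resin code: the class MeterNames and its methods are hypothetical, and the two-digit zero-padded index is an assumption inferred from the 00| prefix in the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class MeterNames {
    // Prefix a base meter name with the cluster node index.
    // Assumption: the index is rendered as two zero-padded digits ("00|").
    static String qualify(int clusterIndex, String baseName) {
        return String.format(Locale.ROOT, "%02d|%s", clusterIndex, baseName);
    }

    // Split a meter name into its keys. "|" is a regex metacharacter,
    // so it has to be escaped for String.split.
    static List<String> keys(String meterName) {
        return Arrays.asList(meterName.split("\\|"));
    }

    public static void main(String[] args) {
        String name = qualify(0, "JVM|Compilation|Compilation Time");
        List<String> parts = keys(name);
        // Cluster index plus the first two keys form the drill-down path;
        // the last key labels the individual meter.
        System.out.println("meter:      " + name);
        System.out.println("drill-down: " + parts.subList(0, 3));
        System.out.println("label:      " + parts.get(3));
    }
}
```

Ordering keys from least specific to most specific is what makes this split useful: meters that share a prefix end up under the same drill-down.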

JMX meters

Virtually any local numeric JMX MBean attribute can be graphed using JMX meters.
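To see what kind of value a JMX meter can graph, the following Java sketch reads numeric attributes from the local platform MBean server using only the standard java.lang.management and javax.management APIs. It is an illustration, not Resin's implementation: the class JmxSample is hypothetical, and the difference between two samples only approximates what a delta-style meter might report. The Threading MBean is used here because it is present on every JVM; the Compilation MBean from the earlier example can be read the same way.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxSample {
    // Read one numeric attribute from a local MBean.
    static double read(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            Object value = server.getAttribute(new ObjectName(objectName), attribute);
            if (!(value instanceof Number)) {
                throw new IllegalArgumentException(attribute + " is not numeric");
            }
            return ((Number) value).doubleValue();
        } catch (JMException e) {
            // Bad object name, unknown MBean, or unknown attribute.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String threading = "java.lang:type=Threading";

        // A gauge-style value: graphed as sampled.
        System.out.println("ThreadCount = " + read(threading, "ThreadCount"));

        // A counter-style value: the change between two samples is what matters.
        double before = read(threading, "TotalStartedThreadCount");
        Thread worker = new Thread(() -> { });
        worker.start();
        worker.join();
        double after = read(threading, "TotalStartedThreadCount");
        System.out.println("threads started in interval = " + (after - before));
    }
}
```

Attributes that only ever grow, such as TotalCompilationTime, are usually more informative as a change per sample interval than as a raw total.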

Statistical Analysis

Detecting Anomalies

Meters alone are useful for manual inspection in /resin-admin, since every meter can be graphed. However, Resin also provides an automatic analysis tool called AnomalyAnalyzer. AnomalyAnalyzer compares the current meter value against the average value and checks for deviations, so unusual changes such as a spike in blocked threads can be detected.
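The idea of comparing a sample against an average baseline can be sketched in Java as follows. This page does not describe the algorithm AnomalyAnalyzer actually uses, so the sketch below is a generic technique (a running mean and standard deviation maintained with Welford's method), and the class BaselineDetector with its threshold and warmup parameters is hypothetical.

```java
final class BaselineDetector {
    private final double threshold; // allowed deviation, in standard deviations
    private final int warmup;       // samples to collect before judging
    private long count;
    private double mean;
    private double m2;              // running sum of squared deviations

    BaselineDetector(double threshold, int warmup) {
        this.threshold = threshold;
        this.warmup = Math.max(2, warmup); // need 2+ samples for a variance
    }

    // Returns true if the sample deviates from the baseline built so far,
    // then folds the sample into the baseline.
    boolean isAnomaly(double sample) {
        boolean anomaly = false;
        if (count >= warmup) {
            double stdDev = Math.sqrt(m2 / (count - 1));
            // Floor the deviation so a perfectly flat baseline still works.
            anomaly = Math.abs(sample - mean) > threshold * Math.max(stdDev, 1e-9);
        }
        // Welford's online update of mean and squared deviations.
        count++;
        double delta = sample - mean;
        mean += delta / count;
        m2 += delta * (sample - mean);
        return anomaly;
    }

    public static void main(String[] args) {
        BaselineDetector blocked = new BaselineDetector(3.0, 5);
        double[] samples = {10, 11, 9, 10, 11, 9, 10, 10, 10, 50};
        for (double s : samples) {
            System.out.println(s + " anomaly=" + blocked.isAnomaly(s));
        }
    }
}
```

In this sketch the anomalous sample is still folded into the baseline, so a sustained shift eventually becomes the new normal; a real analyzer may well handle that differently.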

Standard Anomaly detection

The default health.xml configures some general anomaly analysis. In general, anomaly detection can tell you when something went wrong in the server. It looks for unusual spikes of behavior by recording an average baseline for a value and then looking for deviations.

  • "File Descriptor Count" - counts open files and open TCP sockets. Examples include denial of service attacks (TCP) or open file leaks.
  • "JVM Thread Count" - detects thread spawning. Examples include services that spawn too many threads, a likely application bug.
  • "JVM Runnable Count" - detects active threads. Examples include CPU spikes or infinite looping code, a likely application bug.
  • "JVM Waiting Count" - detects threads waiting for another thread to act, for example in wait() or join(). Examples include synchronization bottlenecks like deadlocks or livelocks, likely application or library bugs.
  • "JVM Blocked Count" - detects threads blocked while trying to acquire a lock. Examples include synchronization bottlenecks like lock contention or deadlocks, likely application or library bugs.
  • "Database|Connection Active" - detects database connection spikes. Examples include database problems.
  • "HTTP|Request Time" - detects spikes in HTTP request time. Examples include problems in the application, e.g. blocking. It is useful to take a thread dump to debug further.
  • "HTTP|Ping Time" - detects spikes in HTTP ping time. Examples include problems in the application, e.g. blocking. It is useful to take a thread dump to debug further.
  • "Port|Throttle Disconnect Count" - detects throttling of attempted TCP connections. Indicates either a DoS attack or an overloaded system.
  • "HTTP|400" - detects spikes in HTTP 400 (client error) responses.
  • "HTTP|500" - detects spikes in server exceptions. Indication of application bugs.
  • "Cluster|Message Read Count" - detects overloads in cluster messages. Indication of a Resin cluster issue.
  • "Cluster|Message Write Count" - detects overloads in cluster messages. Indication of a Resin cluster issue.

Reacting to Anomalies

The <health-event> attribute of AnomalyAnalyzer allows us to tie health actions to a detected anomaly by using the <health:IfHealthEvent> condition.


Copyright © 1998-2015 Caucho Technology, Inc. All rights reserved. Resin ® is a registered trademark. Quercus™ and Hessian™ are trademarks of Caucho Technology.