Learn about Health Management in IBM WebSphere Application Server and how to create health policies.

What is Health Management?

Health Management is part of the WebSphere Virtual Enterprise feature set, which is integrated into WebSphere Application Server 8.5.

WebSphere 8.5 includes operational policies, and health policies are one type of operational policy.

Health Management is a policy-driven approach to monitoring WebSphere application servers and responding to problem areas before an outage occurs.

Health Management has two elements:

  1. health controller
  2. health policies

What is Reaction Mode?

A health policy includes the health condition that you want to monitor in your environment, and it reacts when your defined requirements are not met.

There are two reaction modes.

  1. Automatic mode: The system takes action when a health policy violation is detected.

For example, if you configure a policy to monitor memory usage and restart the JVM when memory usage reaches 85%, the system restarts the targeted JVM when its heap usage reaches 85%.

  2. Supervised mode: The system creates a runtime task when a health policy violation is detected. A WebSphere administrator must then manually approve or deny the runtime task action.
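The difference between the two modes can be sketched in a few lines of Python. This is only a conceptual model: `ReactionMode`, `RuntimeTask`, and `Reactor` are hypothetical names invented for illustration and are not part of any WebSphere API.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReactionMode(Enum):
    AUTOMATIC = "automatic"
    SUPERVISED = "supervised"


@dataclass
class RuntimeTask:
    """A pending action that an administrator must approve or deny (hypothetical)."""
    server: str
    action: str


@dataclass
class Reactor:
    """Toy model of how a policy reacts to a violation in each mode."""
    mode: ReactionMode
    executed: list = field(default_factory=list)  # actions that were actually run
    pending: list = field(default_factory=list)   # runtime tasks awaiting a decision

    def on_violation(self, server: str, action: str) -> None:
        if self.mode is ReactionMode.AUTOMATIC:
            # Automatic mode: act immediately, no human involved.
            self.executed.append((server, action))
        else:
            # Supervised mode: only record a task for the administrator.
            self.pending.append(RuntimeTask(server, action))

    def approve(self, task: RuntimeTask) -> None:
        # The administrator approves the task, so the action is now run.
        self.pending.remove(task)
        self.executed.append((task.server, task.action))

    def deny(self, task: RuntimeTask) -> None:
        # The administrator denies the task, so nothing is run.
        self.pending.remove(task)
```

In this model, a supervised violation leaves `executed` empty until `approve` is called, which mirrors the manual intervention described above.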

What are Health Conditions?

A health condition is the object or metric that you want to monitor in your environment.

There are eight predefined health conditions available in WebSphere 8.5. You also have the option to create custom health conditions.

  • Age-based condition – monitors the defined JVM and takes action when it reaches a configured age threshold.

Ex:

You can configure this condition to restart the JVM if it has been running for 15 days. The threshold is specified in days or hours.

  • Excessive request timeout condition – takes action when the percentage of requests that time out exceeds the defined value. The threshold is specified as a percentage.

  • Excessive response time condition – monitors the time it takes for a request to complete and takes action if the time exceeds the defined threshold.

Ex:

You can configure this condition to take a thread dump when the response time for a request reaches one minute. The threshold is specified in milliseconds, seconds, or minutes.

  • Memory condition: excessive memory usage – monitors the memory usage of the JVM and takes action if it exceeds the threshold value.

Ex:

You can configure this condition to take a JVM heap dump and restart the JVM when memory usage exceeds the threshold. The JVM heap size threshold is specified as a percentage, and the offending period in seconds or minutes.
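One way to read the combination of a percentage threshold and an offending period is that a short spike should not trigger the policy; usage has to stay above the threshold for the whole period. The sketch below models that reading in Python. It is an assumption about the semantics, not WebSphere's actual algorithm, and `memory_violation` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def memory_violation(samples: Iterable[Tuple[float, float]],
                     threshold_pct: float,
                     offending_period_s: float) -> bool:
    """Return True if heap usage stays above threshold_pct for at least
    offending_period_s seconds.

    samples are (timestamp_seconds, heap_used_pct) pairs in ascending time order.
    """
    breach_start = None  # timestamp of the first sample in the current breach
    for ts, used_pct in samples:
        if used_pct > threshold_pct:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= offending_period_s:
                return True
        else:
            # Usage dropped back under the threshold, so the breach is over.
            breach_start = None
    return False
```

With an 85% threshold and a 60-second offending period, samples of 90%, 92%, and 95% taken at 30, 60, and 90 seconds count as a violation, while a single 90% spike that falls back to 70% does not.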

  • Memory condition: memory leak – looks for memory leaks in the JVM and takes action.

This condition has three detection levels.

  1. Fast (false alarms)
  2. Standard (some false alarms)
  3. Slow (fewer false alarms)

  • Storm drain condition – monitors for a significant drop in the average response time and takes action, such as generating a thread dump and restarting the JVM.

This condition has two detection levels.

  1. Standard (some false alarms)
  2. Slow (fewer false alarms)

  • Workload condition – triggers once a JVM has served a configured number of requests.

Ex:

You can configure this condition to restart the JVM once it has served 20,000,000 requests.

  • Garbage collection percentage condition – monitors the percentage of time spent in garbage collection over a defined period and takes action once it exceeds the threshold. The threshold is specified as a percentage, together with a sampling period.
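The garbage collection percentage is simple arithmetic: the total time spent in GC pauses divided by the length of the sampling period. A minimal sketch in Python, assuming you already have the pause durations (for example from a verbose GC log); `gc_percentage` is a hypothetical helper, not a WebSphere function.

```python
from typing import Iterable


def gc_percentage(gc_pause_ms: Iterable[float], sampling_period_ms: float) -> float:
    """Percentage of the sampling period that was spent in garbage collection."""
    if sampling_period_ms <= 0:
        raise ValueError("sampling period must be positive")
    return 100.0 * sum(gc_pause_ms) / sampling_period_ms


# Three pauses totalling 3 seconds inside a 2-minute sampling period.
pct = gc_percentage([1200, 800, 1000], 120_000)  # 2.5
```

A policy with a 10% threshold would not react to this sample, because only 2.5% of the period was spent in garbage collection.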

What is Health Action?

A health action is the action that a health policy runs once the configured threshold is exceeded.

There are seven predefined health actions available in WebSphere 8.5.

  • Restart server – restarts the JVM
  • Take thread dumps – takes thread dumps of the JVM
  • Take JVM heap dumps – takes JVM heap dumps
  • Generate an SNMP trap – generates an SNMP trap for troubleshooting
  • Place server in maintenance mode – stops new client requests and serves only active sessions
  • Place server in maintenance mode and break affinity – stops both new requests and existing active sessions
  • Place out of maintenance mode – makes the server ready to accept new requests again

You also have the option to create custom health actions.

How to Create Health Policies?

Health policies can be created in four easy steps.

  1. Define health policy general properties – provide the name of the policy and select the health condition
  2. Define health policy health condition properties – provide the threshold for the chosen health condition and configure the actions to be taken when the condition is breached
  3. Specify members to be monitored – select JVMs, clusters, dynamic clusters, on-demand routers, or the cell as the target of the health policy
  4. Confirm health policy creation – review the health policy configuration and confirm to create it

Let’s create one health policy as follows.

  • Log in to the WebSphere 8.5 ND DMGR console
  • Click Operational policies >> Health Policies
  • Click New
  • Provide Name – Test_Policy
  • Select Workload condition as the Health condition (this condition can be tested quickly)
  • Click Next
  • Enter 1000 as Total requests for testing purposes
  • Select Automatic as the Reaction mode
  • Add the actions Restart server and Take thread dumps
  • Click Next
  • Select Filter by as Servers/Nodes
  • Add server1 as target member
  • Click Next
  • Review the configuration and click Finish
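Before clicking Finish, it can help to see the whole policy at a glance. The sketch below records the values from the steps above in a plain Python structure. `HealthPolicy` and its `problems` check are hypothetical, for illustration only; they are not an IBM type, and the checks are assumptions about what a complete policy needs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthPolicy:
    """Plain-data summary of a health policy (illustrative, not an IBM type)."""
    name: str
    condition: str
    threshold: Dict[str, int]
    reaction_mode: str
    actions: List[str]
    members: List[str]

    def problems(self) -> List[str]:
        """Return what is missing; an empty list means the summary is complete."""
        found = []
        if not self.name:
            found.append("name is missing")
        if self.reaction_mode not in ("Automatic", "Supervised"):
            found.append("reaction mode must be Automatic or Supervised")
        if not self.actions:
            found.append("at least one action is needed")
        if not self.members:
            found.append("at least one target member is needed")
        return found


# The policy created in the walkthrough above.
test_policy = HealthPolicy(
    name="Test_Policy",
    condition="Workload condition",
    threshold={"total_requests": 1000},
    reaction_mode="Automatic",
    actions=["Restart server", "Take thread dumps"],
    members=["server1"],
)
```

Calling `test_policy.problems()` returns an empty list here, since every one of the four wizard steps has a value.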

Now, let’s test by accessing an application running on the targeted JVM (server1).

Once the JVM has served 1000 requests, it should take a thread dump and restart. You can use JMeter to generate load so that testing can be done quickly.
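If JMeter is not at hand, a few lines of Python using only the standard library can generate the same load. This is a minimal sketch: the URL in the usage comment is a hypothetical placeholder for an application deployed on server1, and the `fetch` parameter exists only so the loop can be tried without a live server.

```python
import urllib.request
from typing import Callable, Optional


def http_get(url: str) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def send_requests(url: str, total: int,
                  fetch: Optional[Callable[[str], int]] = None) -> int:
    """Send `total` sequential GET requests and return how many returned HTTP 200."""
    if fetch is None:
        fetch = http_get
    ok = 0
    for _ in range(total):
        try:
            if fetch(url) == 200:
                ok += 1
        except OSError:
            # Connection errors and HTTP error responses (URLError and HTTPError
            # are OSError subclasses) count as failed requests; keep going.
            pass
    return ok


# Usage (hypothetical URL, replace with your own application's address):
#   send_requests("http://localhost:9080/myapp/", 1000)
```

Requests are sent one after another, which is slow but enough to reach a 1000-request threshold; JMeter remains the better choice for concurrent load.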

What is Health Controller?

The health controller manages the health policies and monitors the system. Health monitoring must be enabled in the health controller for policies to be monitored.

The health controller itself has configurable properties, such as how often it runs and the times at which it may restart a server.

This allows you to prevent server restarts during peak business hours.
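The idea of blocking restarts during peak hours can be modelled as a check against a list of time windows. The Python sketch below is a conceptual illustration under that assumption; `restart_allowed` and the window format are hypothetical and do not reflect how the health controller stores its settings.

```python
from datetime import datetime, time
from typing import List, Tuple


def restart_allowed(now: datetime, prohibited: List[Tuple[time, time]]) -> bool:
    """Return False if `now` falls inside any prohibited (start, end) window.

    Windows are half-open, [start, end), and may wrap past midnight.
    """
    current = now.time()
    for start, end in prohibited:
        if start <= end:
            # Ordinary window within a single day, e.g. 09:00 to 17:00.
            if start <= current < end:
                return False
        else:
            # Window that wraps midnight, e.g. 22:00 to 02:00.
            if current >= start or current < end:
                return False
    return True


# Block restarts during business hours, 09:00 to 17:00.
business_hours = [(time(9, 0), time(17, 0))]
```

With this window, a restart requested at 10:30 is held back, while one requested at 18:00 is allowed.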

What is Health Policy Target?

A health policy or action target can be JVMs, clusters, dynamic clusters, on-demand routers, or cells.

I hope this helps you understand Health Management better.