Service Check - Investigate
Overview of Service Checks Investigate mode
Clicking on 'Investigate' will load a modal window with up to eight tabs (some tabs will be hidden depending on the Service Check).
Investigate - Info Tab
The first of those tabs is the 'Info' tab as below:
The Info tab is a one-stop shop for all information relating to the Service. Each field means:
- Host Name: The unique name of the Host with this Service Check.
- Network Address: The address used to contact this Host.
- Description: Description of the Host.
- Service Check Status: The status of the Service Check, i.e. 'OK', 'CRITICAL', 'WARNING' or 'UNKNOWN'. Also displays how long the Service Check has been in the given state, e.g. 'OK for 2 days ..'. If the Service Check has been acknowledged, 'Acknowledged' text is also displayed.
- Status information: The output of the Service Check. This can be multiline output so the area is scrollable.
- Performance Data: If the Service Check returns data in a 'performance data' ('perfdata') format, it will be displayed here.
- Current Attempt: The current attempt number, which Opsview Cloud uses to decide whether the Service Check is in a Soft or Hard state. This number will be between 1 and the number defined in the 'Max Attempts' field.
- Max Attempts: The number of attempts required for the Service Check to be converted from a 'SOFT' state to a 'HARD' state.
- State Type: Will be either Hard or Soft, based on Current Attempt and Max Attempts.
- Last Check: The date and time of the last check of this Service Check, i.e. the last time the Service Check was run against the host, or the last time a result was received.
- Check Type: Whether the Service Check is an Active Check, Passive, SNMP Trap or SNMP Polling type.
- Monitored By: The name of the Monitoring Cluster that is monitoring the Host and its Service Checks. If monitored by a cluster, the cluster name will be returned here instead of the individual cluster node.
- Latency: The time, in seconds, it took Opsview Cloud to execute the Service Check.
- Execution Time: The time, in seconds, it took Opsview Cloud to execute the Service Check and get a result.
- Next Scheduled Check: The date and time of the next scheduled execution of the Service Check. This is an approximate value based on the check interval. This could be 'Unknown' for passive checks.
- Last Status Change: The date and time of the last status change, for example, when the Service Check changed from 'CRITICAL' back to 'OK'.
- Is This Service Check Flapping?: A 'Yes' or 'No' label relating to Flap Detection, which is configured within the 'Notifications' tab of the edit window for the Service Check. See Section Details Tab: Advanced for more information. If the Service Check is marked as 'flapping', this field will change to 'Yes'.
- In Scheduled Downtime?: A 'Yes' or 'No' label relating to whether the Service Check is in a state of downtime or not. If the Service Check is in an active period of downtime (i.e. the current date and time falls within a downtime period's configured date and time), the label will read 'Yes' and a "dialogue" icon will display the comment and expiry time for the Downtime.
- Last Update: The date and time of when a result was received for this Service Check.
- Active Checks: An 'Enabled' or 'Disabled' label relating to whether active checks are currently allowed for this Service Check. This is configured via the 'Actions' tab. For more information, see Investigate mode: Service Check 'Actions' tab.
- Passive Checks: An 'Enabled' or 'Disabled' label relating to whether passive checks are currently allowed for this Service Check. This is always enabled.
- Notifications: An 'Enabled' or 'Disabled' label relating to whether Notifications are currently enabled or disabled for this Service Check. This is configured via the 'Actions' tab. For more information, see Investigate mode: Service Check 'Actions' tab.
- Event Handler: An 'Enabled' or 'Disabled' label relating to whether an Event Handler is currently allowed for this Service Check. This is configured via the 'Actions' tab. For more information, see Investigate mode: Service Check 'Actions' tab.
- Flap Detection: An 'Enabled' or 'Disabled' label relating to whether flap detection is currently enabled or disabled on this Service Check. This is configured via the 'Actions' tab. For more information, see Investigate mode: Service Check 'Actions' tab.
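The relationship between Current Attempt, Max Attempts and State Type can be sketched as below. This is a simplified illustration derived only from the field descriptions above, for a Service Check that is retrying after a problem result. It is not Opsview Cloud's actual implementation, it ignores other factors such as how a stable 'OK' state is treated, and the function name `state_type` is hypothetical.

```python
def state_type(current_attempt: int, max_attempts: int) -> str:
    """Return 'HARD' once the attempt count reaches Max Attempts, else 'SOFT'.

    Simplifying assumption: the state converts from SOFT to HARD when the
    current attempt number reaches the configured maximum.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if not 1 <= current_attempt <= max_attempts:
        raise ValueError("current_attempt must be between 1 and max_attempts")
    return "HARD" if current_attempt == max_attempts else "SOFT"
```

With 'Max Attempts' set to 3, attempts 1 and 2 would be reported as Soft and attempt 3 as Hard under this assumption.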
Removing an Acknowledgement
When an Acknowledgement has been set on a Service Check, it is shown next to the Status Information, together with a "dialogue" icon that displays the comment for the Acknowledgement. You can easily remove this Acknowledgement.
Start by clicking on the menu for the Service Check you wish to remove the acknowledgement from and then selecting Investigate:
Click on the underlined 'Acknowledged' text or the trash can and then confirm the deletion.
Investigate - Actions Tab
The second of the tabs within the 'Investigate' view is the 'Actions' tab as below:
The Actions tab allows you to change certain settings relating to the Service Check such as whether active checks are enabled, or whether flap detection is enabled.
Below the 'toggle buttons' panel is a form for forcing the status of the Service Check, for example to change the Service Check from an 'OK' to a 'CRITICAL' state with a user-defined 'output':
Clicking 'Submit' will then submit the new Status command to Opsview.
Investigate - Graphs Tab
The Graphs tab will be available if this Service Check returns performance metric information.
When the Graphs tab is first opened, you will see a graph of all the metrics of this Service Check. Only the first 30 metrics will be shown.
The graph will draw all the performance values of each metric over time. The default range is 1 day. The y-axis will be automatically sized based on the highest and lowest values in the chart.
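The metrics plotted here come from the Service Check's performance data. As a rough sketch of what that data contains, the snippet below parses a perfdata string, assuming the Service Check uses the common Nagios-style plugin convention `label=value[UOM];warn;crit;min;max`. The function name `parse_perfdata` is hypothetical, and the parser only extracts the label, value and unit of each metric.

```python
import re

# One metric looks like: label=value[UOM];warn;crit;min;max
# Labels that contain spaces are wrapped in single quotes.
_METRIC = re.compile(
    r"(?:'([^']+)'|([^\s=']+))"   # quoted or bare label
    r"=([-+]?\d*\.?\d+)"          # numeric value
    r"([^\s;]*)"                  # optional unit of measurement
    r"((?:;[^\s;]*)*)"            # optional thresholds, ignored here
)

def parse_perfdata(perfdata: str) -> dict:
    """Return {label: {"value": float, "uom": str}} for each metric found."""
    metrics = {}
    for match in _METRIC.finditer(perfdata):
        label = match.group(1) or match.group(2)
        metrics[label] = {"value": float(match.group(3)), "uom": match.group(4)}
    return metrics
```

For example, `parse_perfdata("time=0.25s;1;2;0 'disk used'=42%;80;90")` yields two metrics, `time` and `disk used`, each of which would become one line on the graph.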
You can hover the cursor over the graph to get balloons that highlight what the metric values were at particular points in time. The balloons can be quite busy, so you can toggle them off by going to the circular graph menu in the top right and clicking on "Toggle balloon".
The graph legend panel will show the graph name, as well as the min, average and max values of all the visible plots.
When you hover over the graph, the values will change to the current value of the graphs under the cursor.
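The legend summary can be illustrated with a short sketch. It assumes the min, average and max are computed per visible plot over the points in the current range; the function name `legend_stats` and the input shape are hypothetical.

```python
from statistics import mean

def legend_stats(visible: dict[str, list[float]]) -> dict[str, tuple[float, float, float]]:
    """Return (min, average, max) for each visible plot that has data points."""
    return {
        name: (min(values), mean(values), max(values))
        for name, values in visible.items()
        if values  # plots without data points are skipped
    }
```

Hiding a metric corresponds to leaving it out of `visible`, so it no longer contributes to the summary.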
When you hover over a metric, all other metrics will be de-emphasised so you can concentrate on that particular line. If you click on a metric, its graph line will be hidden. This is useful if you want to reduce the noise from that metric but still compare multiple metrics.
You can adjust the height of the legend by dragging the bar at the top of the legend panel. You can close the panel by clicking on the close control in the middle of the bar.
You can change the date range of the graph by clicking on the range buttons in the top right. The ranges are:
- 1d: 1 day
- 1w: 1 week
- 1m: 1 month (30 days)
- 1y: 1 year
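As a sketch of how these buttons translate into a time window, the mapping below turns each label into a duration. It assumes the window ends at the current time and that "1y" means 365 days; only the 30-day month is stated in the list above. The names `RANGES` and `range_window` are hypothetical.

```python
from datetime import datetime, timedelta

# Range buttons mapped to durations. "1m" is 30 days per the list above;
# treating "1y" as 365 days is an assumption.
RANGES = {
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "1m": timedelta(days=30),
    "1y": timedelta(days=365),
}

def range_window(label: str, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) window ending at `now` for a range button."""
    return now - RANGES[label], now
```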
Pressing the refresh button updates the data.
You can choose specific dates in the start and end date fields. This will draw the graph from the beginning of the day of the start, to the end of the day of the end. The refresh button will be disabled as you cannot refresh historical data.
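The "beginning of the start day to the end of the end day" rule can be sketched as follows. This is an illustration only: it ignores time zones, and the function name `custom_window` is hypothetical.

```python
from datetime import date, datetime, time

def custom_window(start_day: date, end_day: date) -> tuple[datetime, datetime]:
    """Expand two calendar dates into a window covering both days in full."""
    if end_day < start_day:
        raise ValueError("end date must not be before start date")
    start = datetime.combine(start_day, time.min)  # 00:00:00 on the start day
    end = datetime.combine(end_day, time.max)      # 23:59:59.999999 on the end day
    return start, end
```

Choosing the same date in both fields therefore produces a window covering that single day.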
You can zoom into a graph by clicking and dragging within the graph.
When you let go, the graph will zoom immediately into that time range, updating the start/end time fields appropriately. A new call will be made to Opsview and the graph will be redrawn if there is more granular data retrieved.
From the circular graph menu, you can:
- Toggle balloons: Enables or disables the balloons over each value.
- Download as ...: Downloads the graph as a PNG, JPG, SVG or PDF file.
- Save as ...: Saves the graph data in CSV, XLS or JSON format.
- Annotate ...: Lets you draw annotations on top of the graph.
- View in Graph Centre: Displays the graphs in Graph Centre, where you can compare with other hosts.
- Copy to Dashboard: Copies the graph into a new dashlet the next time you load the dashboard.
Investigate - Troubleshoot Tab
The fourth of the tabs within the 'Investigate' view is the 'Troubleshoot' tab as below:
The 'Troubleshoot' tab allows you to test the Service Check, as it would be run by Opsview Cloud, via the user interface. Combined with the 'Macro Help' and 'Plugin Help' windows, you can modify arguments and click 'Submit' to test if the Service Check will work.
Simply modify the arguments using the text entry box at the top as per the plugin help file and click 'Submit' to test various combinations that the plugin may allow.
This tab will only appear for Active Check types.
Note - running the Service Check in this manner will apply Exceptions set on that Host, but will not apply Timed Exceptions.
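To illustrate what a test run of an Active Check involves, the sketch below runs a plugin command line and interprets its result. It assumes the conventional Nagios-style plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN) and that the first line of standard output is the status text. The `run_check` helper and the `demo_argv` stand-in command are hypothetical and are not part of Opsview Cloud.

```python
import subprocess
import sys

# Conventional plugin exit codes (Nagios-style); assumed to apply here.
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL"}

def run_check(argv: list[str], timeout: float = 30.0) -> tuple[str, str]:
    """Run a plugin command and return (status, first line of its output)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    status = STATUS_BY_EXIT_CODE.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return status, (lines[0] if lines else "")

# Stand-in for a real plugin: prints a status line and exits with code 1.
demo_argv = [
    sys.executable,
    "-c",
    "import sys; print('WARNING - load is 5.2'); sys.exit(1)",
]
```

Changing the arguments in `argv` and re-running mirrors the modify-and-submit loop described above.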
Investigate - Notifications Tab
The fifth of the tabs within the 'Investigate' view is the 'Notifications' tab as below:
This tab will show all Notifications sent relating to this Service Check.
- Time: The date and time the Notification was sent.
- Status: The status of the Service Check or Host check at the point of the Notification; i.e. CRITICAL, DOWN, etc.
- # of Profiles: The number of Notification Profiles to whom the specific Notification was sent. The number is clickable, at which point a new modal window will appear displaying the User, Profile Name and Notification Methods used to notify the User. These Notification Methods are displayed as icons, which have a description in the tooltip when the mouse is hovered over the icon.
- Notification Type: The type of notification. Most of the time, this will be 'Normal', but other types include: 'Acknowledgement', 'Flapping Start', 'Flapping Stop'.
- Information: The output of the service check at the time of the notification.
The list of Notifications can be exported by clicking on the 'Export' button, at which point you are prompted to choose one of three export formats: csv, json and xml. When the format is selected, the Notifications list will be generated in the given format and downloaded to the user's desktop/device via the browser.
Investigate - Events Tab
The sixth of the tabs within the 'Investigate' view is the 'Events' tab as shown below:
The Events tab is essentially a different way of analyzing the history of a Service Check. It allows Users to choose a date using the date picker on the left-hand side, which then re-populates the bar graph with the events (if any) for the chosen date. In the screen above, we have lots of events around 12:00, but no events between 13:00 and 05:00 the next day.
By default, the bar graph is displayed 'full tab', with the event checker minimized. Hovering the mouse over a bar reveals the number of events in that given state, e.g. 1 'WARNING' event in the above example. When one or more bars are clicked, the event checker will be populated with the events from the selected bars:
In the above example we have clicked on the bar showing 1 WARNING event, which has loaded the event checker with the specified WARNING events. All other bars have a reduced opacity to show they are not selected. You can click additional bars to add to the checker.
To clear the event checker and minimize it we can simply re-click on the '1' bar which will deselect it. When the event checker is empty it will automatically minimize.
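The way events are grouped into bars can be sketched as a simple count per time slot and state. Hourly slots are an assumption made for illustration, as are the function name `bucket_events` and the `(timestamp, state)` input shape.

```python
from collections import Counter
from datetime import date, datetime

def bucket_events(events: list[tuple[datetime, str]], day: date) -> Counter:
    """Count the events that fall on `day`, keyed by (hour, state)."""
    counts = Counter()
    for when, state in events:
        if when.date() == day:
            counts[(when.hour, state)] += 1
    return counts
```

Each key in the result corresponds to one bar, and its count is the number shown when hovering over that bar.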
Within the event bar, located in the top right, is a 'downwards' arrow. When moused-over, this arrow will reveal three contextual menu options:
- Download as ...: Allows you to choose from one of four formats: png, jpg, svg or pdf.
- Annotate: When selected, allows you to draw and annotate the bar graph. Once annotated, the bar graph can be downloaded using the 'Download as' button.
- Print: Allows you to print the bar graph as an image.
Investigate - Notes Tab
The 'Notes' tab is the second to last tab within the Investigate mode window:
The Notes tab for a Service Check is very similar to the one for Host Groups and Hosts, in that it allows you to enter text in a WYSIWYG editor which can be seen and edited by other users of Opsview Cloud (who have permission to view the relevant Host/Service Check). This is a great way to leave notes about what the Service Check is, e.g. 'Interface throughput monitor. This is Tim's Tyres router, they are located in London, UK and have an internal subnet of 192.168.1.0/24 with the router's IP being 192.168.1.254.'.
In the database, these notes are kept in the serviceinfo table of the opsview database.
- Host Group and Host notes are likewise kept in the hostgroupinfo and hostinfo tables of the opsview database.
Investigate - History Tab
The 'History' tab is the last tab within the Investigate mode window:
The History tab will show the history of the Service Check within a tabular format. The Host UP and DOWN information will also be shown here to display why a service check state may have changed.
The 'Status' and 'Type' columns can be filtered via the column's contextual menu as below:
To filter on the date and time, you can use the filter toolbar at the top of the table. To apply the entered date and time parameters, you should click on the 'search' icon. To clear the entered results and reset the values in the fields, click on the 'cross' icon.
The history list can be exported by clicking on the 'Export' button, at which point you are prompted to choose one of three export formats: csv, json and xml. When the format is selected, the history list will be generated in the given format and downloaded to the User's desktop/device via the browser.