Before discussing how to write event notification content when configuring a monitor, it is necessary to clarify the following logic:
After a monitor's detection rule takes effect, it aggregates and processes the system's business data and retains the results as events. These event records can be understood as carriers of the abnormal signals emitted by the monitor's detection object, and the event titles and content discussed in this article are part of these records. If stakeholders consider the current anomaly urgent and need to assess the risk and respond promptly, these event records can be sent out as alarms.
During this delivery, the three alarm configuration methods (non-aggregation, rule aggregation, and intelligent aggregation) process the event title and content accordingly, and the result becomes the abnormal alarm notification received by the stakeholders (as shown in the image below).
Returning to the original problem: when we use a monitor to detect all kinds of data, we expect that when an abnormal event occurs, the notified party can obtain the detailed context of the anomaly as quickly as possible. This requires creators to understand how to define and edit the title and content of the event to be notified when configuring the monitor.
Template variables are one of the core elements of editing titles and content. The following are the template variables supported by Observation Cloud to help render dynamic text.
The event title should be concise, that is, it should state the main point in one sentence. That way, when you receive an event notification, a first glance at the title gives you a general idea of what the event is about. For example:
There is an exception in the status of the members in the consul cluster.
Besides text-only titles like the one above, we can also insert template variables into the title. For example:
Host } has less than 10% of its memory available
The link trace error rate for the service is too high, with an error rate of }%.
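To make the idea of a title with template variables concrete, here is a minimal Python sketch of placeholder substitution. It is not the platform's renderer: the double-brace placeholder syntax, the render helper, and the variable name host are all assumptions chosen for illustration.

```python
import re

# Assumed placeholder syntax for this sketch: {{ name }}.
# The delimiters and variable names are illustrative, not confirmed by the source.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{ name }} with its value; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)  # keep the placeholder visible if no value exists

    return PLACEHOLDER.sub(substitute, template)


title = "Host {{ host }} has less than 10% of its memory available"
print(render(title, {"host": "web-001"}))
# prints: Host web-001 has less than 10% of its memory available
```

Leaving unknown placeholders untouched is a design choice of this sketch: a misspelled variable stays visible in the notification instead of silently disappearing.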
When specifying the content of the event, we can do so with the help of template syntax. Next, we will use a few common scenarios to show the actual editing effect of event content notifications.
Suppose the monitor's by grouping is configured with region and host. Based on the template variables in the table above, we can write a basic event content template:
Event Title:
Monitor } found } to be faulty
Event Content:
Region:}
Host:}
Host:}
Level:}
Detection Value:}
Monitor:} (Alarm Policy:})
Then, after an error event is generated, the rendered event output is as follows:
Output Event Title:
Monitor Monitor 001 found a fault
Output Event Content:
Region: hangzhou
Host: web-001
Host: web-001
Level: error
Detection Value: 9012345
Monitor: Monitor 001 (Alarm Policy: Team 001)
Besides directly displaying the field values of the event as in Scenario 1, we can also use template functions to process the field values further. Template functions can be used to refine the notification content of the event and consolidate the necessary information.
Its combined usage form is:
In Observation Cloud, the template functions available to us are listed as follows:
Referring again to Scenario 1, this time using template functions, the event title and content are written as follows:
Event Title:
Monitor } found } to be faulty
Event Content:
Object:}
Time:}
Level:}
Detection Value:}
Then, after the error event is generated, the rendered event output is as follows:
Output Event Title:
Monitor My monitor found that region:hangzhou, host:web-001 is faulty
Output Event Content:
Detection object: region:hangzhou, host:web-001
Detection time: 2022-01-01 01:23:45
Fault level: important
Detection value: 901235
On the existing Observation Cloud configuration page, if we want to describe exceptions of different levels in the same event content box, we can use the template branching syntax (if else) to achieve it.
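Before looking at the template grammar itself, the selection logic behind such branching can be sketched in Python. This only illustrates the if / else idea and is not the template syntax. The level names critical, warning, and nodata are assumptions made for this sketch (only error appears in the scenarios above), as is the describe helper.

```python
def describe(level: str) -> str:
    """Pick one message per event level, falling back to a default.

    The level names are hypothetical; the real values are whatever the
    monitor emits for the event's level field.
    """
    if level == "critical":
        return "Urgent issue, please deal with it immediately!"
    elif level == "error":
        return "Important issue, please deal with it."
    elif level == "warning":
        return "Possible problem, please deal with it when you have time."
    elif level == "nodata":
        return "Data interruption, please deal with it immediately!"
    else:
        return "No problem!"


print(describe("error"))
# prints: Important issue, please deal with it.
```

The final else branch matters: without it, a level that matches none of the conditions would produce an empty notification body.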
The grammar is roughly written as follows:
Urgent issue, please deal with it immediately!
Important issue, please deal with it.
Possible problem, please deal with it when you have time.
Data interruption, please deal with it immediately!
No problem!
Here's an example effect:
Level:}
Host:}
Content: Elasticsearch JVM heap memory usage is }%
Suggestion: The current JVM garbage collection cannot keep up with JVM garbage generation. Please check the business status in time.
Level:}
Host:}
Content: Elasticsearch JVM heap memory alarm has been restored
There is also the case where event followers want to see not only the content related to the events generated by the monitor but also the results of additional data queries, whether or not that data is related to the current monitor's configured rules. Here, template variables alone cannot meet the rendering needs; an embedded DQL query function can be used instead.
This function supports the detection time range (that is, check_start_time to end_time).
Normally, the first piece of data from the query can be used as a template variable in the template as follows:
A field: }
Suppose we need to query the data whose host field is "my_server" and assign the first record to the dql_data variable:
The template is written as:
") %
host os:}
After that, dql_data can be used in the template to output specific fields from the query results.
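The "take the first record and expose it as a variable" pattern can be sketched in Python as follows. This is not DQL and makes no claim about how the embedded query function is implemented: run_query, first_row, and the sample rows are hypothetical stand-ins, and only the names host, os, my_server, and dql_data come from the text above.

```python
def run_query(rows, **filters):
    """Stand-in for a query engine: return the rows matching every filter."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]


def first_row(rows, **filters):
    """Return the first matching row, or an empty dict when nothing matches."""
    matches = run_query(rows, **filters)
    return matches[0] if matches else {}


# Illustrative data only; a real query would read from the platform.
HOSTS = [
    {"host": "my_server", "os": "linux"},
    {"host": "other_server", "os": "windows"},
]

dql_data = first_row(HOSTS, host="my_server")
print("host os: " + dql_data.get("os", "unknown"))
# prints: host os: linux
```

Returning an empty dict for "no match" keeps the later field lookups from failing, at the cost of needing an explicit fallback value such as "unknown".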
Sometimes the DQL statement to be executed needs to take parameters.
Suppose the monitor's by condition is configured with region and host, and the event content is written as follows:
", region, host) %
hostinfo:
ip:}
os: }
Since the event contains only the region and host template variables, which label different data, and no further information such as the IP address or operating system, the embedded DQL can take region and host as DQL query parameters to obtain the corresponding data and output the related information.
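Passing event fields into a query as parameters can be sketched as positional binding. The ? placeholder, the bind helper, and the query text below are assumptions for illustration and are not real DQL syntax. The point is only that the values of region and host from the event are substituted, in order, into a query written once.

```python
def bind(query: str, *params) -> str:
    """Fill each '?' in the query, in order, with a quoted parameter value."""
    parts = query.split("?")
    if len(parts) - 1 != len(params):
        raise ValueError("placeholder count does not match parameter count")
    result = parts[0]
    for value, rest in zip(params, parts[1:]):
        # Escape single quotes so a value cannot terminate the literal early.
        quoted = "'" + str(value).replace("'", "\\'") + "'"
        result += quoted + rest
    return result


# Values an event might carry for its grouping fields (illustrative).
event = {"region": "hangzhou", "host": "web-001"}

print(bind("region = ? and host = ?", event["region"], event["host"]))
# prints: region = 'hangzhou' and host = 'web-001'
```

Checking that the number of placeholders matches the number of parameters catches a common template mistake early, before an incomplete query is sent.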
Whichever of the above scenarios applies, the goal of the final output is the same: to use the template variables provided by Observation Cloud to capture, accurately and in a customized way, the contextual event information at the moment of the anomaly, so that the relevant followers can react in time.