A Sharp Eye for Exposing ES Pseudo Slow Queries

Mondo Culture Updated on 2024-01-29

Service symptoms

The TP99 performance of the service interface is degraded.
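For readers less familiar with the metric, TP99 is the latency within which 99% of requests complete. The sketch below shows one common way to compute it, the nearest-rank method. The class name `Tp99` and the sample latencies are made up for illustration, and a real monitoring system may use a different interpolation.

```java
import java.util.Arrays;

public class Tp99 {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    public static long percentile(long[] latenciesMs, int p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // 1-based rank = ceil(p * n / 100), computed with integer arithmetic.
        int rank = (p * sorted.length + 99) / 100;
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data: 98 fast requests and 2 slow ones.
        long[] samples = new long[100];
        Arrays.fill(samples, 20L);
        samples[98] = 900L;
        samples[99] = 1200L;
        System.out.println("TP50 = " + percentile(samples, 50) + " ms");
        System.out.println("TP99 = " + percentile(samples, 99) + " ms");
    }
}
```

Because TP99 looks only at the tail, a handful of slow requests is enough to degrade it even when the median is unchanged.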

ES symptoms

YGC: severely abnormal; the count peaks at 200+ and the time taken is 7s+

Full GC: abnormal; the count is only 1 each time, but it occurs frequently, with STW pauses of 5s

Slow queries: 5+ slow queries
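The YGC and full GC counts and times quoted above are counters that every JVM exposes. As a minimal sketch, the standard `java.lang.management` API can read them for the current JVM. The class name `GcStats` is hypothetical, and for an ES node these figures would normally come from the cluster's own monitoring rather than from code like this.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcStats {
    /** Returns collector name -> {collection count, accumulated time in ms} for this JVM. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it.
            stats.put(gc.getName(),
                    new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.println(name + ": count=" + v[0] + ", timeMs=" + v[1]));
    }
}
```

The counters are cumulative since JVM start, so taking two snapshots and subtracting gives the count and time for a monitoring interval. The young and old generation collectors are reported as separate beans, which is how YGC and full GC figures are told apart.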

1. Judging from the symptoms, the application is for some reason driving up JVM memory usage, triggering frequent YGC and, in turn, FGC (just a bold guess at this point).

2. In this case, ES is configured with 40G of JVM memory and uses the CMS garbage collector. With 40G of memory, CMS is clearly a poorer fit than G1.

3. We asked the ES O&M colleagues to change the garbage collector from CMS to G1.

Tips: Not every ES cluster is suited to G1. Under many large queries, a G1 full GC can degenerate into a serial scan of the entire heap, resulting in pauses of tens of seconds or even minutes. Such long pauses not only affect user queries but also easily cause communication timeouts between nodes, making the master and data nodes drop out of the cluster and harming cluster stability.

GC behavior after switching to G1:

YGC: essentially normal; the count peaks at 35+ and the time taken is 800ms

Full GC: normal; the count is 0

Slow queries: 10+ slow queries

After the ES JVM garbage collector was changed, the GC problem was solved, but the performance problem of the service interface was not.

After talking with colleagues on the ES side, I learned that the refresh count of this ES cluster was extremely abnormal: 20,000+ refreshes.

The slow query statements shown in ES monitoring are not slow when executed individually.

Reason: The application interacts with ES through version 3.1.9.RELEASE of the spring-data-elasticsearch package. ES data synchronization saves data through the save method in the API, and in this version the save operation performs a refresh after every save.

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-elasticsearch</artifactId>
    <version>3.1.9.RELEASE</version>
</dependency>

Why does every refresh affect queries? Following today's fashion, we let GPT have a try at answering.

Solutions

1. Upgrade spring-data-elasticsearch to 4.x. Because the newer version is not compatible with the older one, the cost of the change is high: every part of the project that involves API operations would need to be changed.

2. Replace the save call with an equivalent operation call (the option currently chosen, since it requires few changes).
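The difference between the two write paths can be sketched with a toy model. This is not spring-data-elasticsearch or Elasticsearch code: `ToyIndex`, `saveEach`, and `indexThenRefreshOnce` are hypothetical names, and the model only counts refreshes and the segments they create. It assumes that the old save path refreshes after every write, and that the replacement writes without an explicit refresh and lets a single later refresh (for example, the index's periodic refresh) make the documents searchable.

```java
import java.util.ArrayList;
import java.util.List;

public class RefreshModel {
    /** Toy stand-in for an index: documents become searchable only after a refresh. */
    public static class ToyIndex {
        private final List<String> buffer = new ArrayList<>();
        private final List<List<String>> segments = new ArrayList<>();
        public int refreshCount = 0;

        public void index(String doc) {
            buffer.add(doc); // written, but not yet searchable
        }

        public void refresh() {
            refreshCount++;
            if (!buffer.isEmpty()) {
                // A refresh with pending documents creates one new segment.
                segments.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }

        public int searchableDocs() {
            return segments.stream().mapToInt(List::size).sum();
        }

        public int segmentCount() {
            return segments.size();
        }
    }

    /** Models the old path: every write is immediately followed by a refresh. */
    public static ToyIndex saveEach(List<String> docs) {
        ToyIndex idx = new ToyIndex();
        for (String doc : docs) {
            idx.index(doc);
            idx.refresh();
        }
        return idx;
    }

    /** Models the replacement: write everything, then a single refresh. */
    public static ToyIndex indexThenRefreshOnce(List<String> docs) {
        ToyIndex idx = new ToyIndex();
        for (String doc : docs) {
            idx.index(doc);
        }
        idx.refresh();
        return idx;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        ToyIndex perSave = saveEach(docs);
        ToyIndex batched = indexThenRefreshOnce(docs);
        System.out.println("save-each: refreshes=" + perSave.refreshCount
                + ", segments=" + perSave.segmentCount());
        System.out.println("index-only: refreshes=" + batched.refreshCount
                + ", segments=" + batched.segmentCount());
    }
}
```

In the model both paths end with the same documents searchable, but the first performs one refresh, and creates one segment, per document. In a real cluster each refresh likewise produces a new small segment that has to be searched and later merged, which is why a refresh-per-write pattern can slow down otherwise fast queries.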

Slow queries are gone.

The number of refreshes has also dropped.

Finally, the performance of the business service interface returned to normal.

Teachers often say that we are constantly swayed by empiricism and opportunism, and that the fundamental remedy is to seek truth from facts: practice is the criterion of truth.

Author: Wang Yijie, Jingdong Logistics.

Source: JD Cloud Developer Community, "Self-ape said tech". Please credit the source when reposting.
