The Online Payment Inter-bank Clearing System (hereinafter "online banking") is one of the key application systems of the central bank's second-generation payment system. It mainly processes online inter-bank retail payments and provides customers with 24/7 real-time payment services. The system is widely used in scenarios such as inter-bank payments and inter-bank account inquiries, so customers can carry out many inter-bank transactions from home and see the processing results in real time.
As online banking business has grown rapidly, participants have experienced online banking timeouts from time to time. Timeouts seriously degrade the payment experience: consumers' real-time transfers and online payments can involve long waits, and transactions may even fail or be cancelled.
When a node fails suddenly at the paying bank, in the People's Bank of China's payment and clearing system, or at the receiving bank, a large number of online banking transactions can time out at the same time. This affects users to varying extents, and the time needed to recover is unpredictable.
1. Analysis of the causes of online banking timeouts
To investigate the timeout problem, we visited the participants in our jurisdiction on site and, through discussions and log analysis, examined in detail the main causes of online banking timeouts in each time period. The causes are summarized below.
(1) Detailed analysis of timeouts by time period
1. The first timeout peak occurred between 02:30 and 03:00, with a timeout rate of 0.18%. There were 15 timeouts in this period, 14 of them (93.33%) at Participant 2. Investigation showed that the participant was restarting its signature servers one by one during this period, which caused some transactions to time out.
2. The second peak occurred between 06:00 and 07:00, when the timeout rate reached 0.18% and 0.12%, with 183 and 91 timeouts respectively; Participant 2 accounted for 182 and 90 of them (99.45% and 98.90%). Investigation showed that the participant's application upgrade and service restart during the clearing window on March 28 could not be completed within the maintenance window, so some online banking transactions timed out between 06:00 and 07:00. Service returned to normal after 07:00.
3. Between 09:30 and 10:00 the timeout rate reached its highest level in this survey, 0.24%, with 865 timeouts: 491 (56.76%) at Participant 1 and 374 (43.24%) at Participant 2. Investigation showed that the timeouts were caused by a hung application process in Participant 1's in-house online banking system and by an abnormal online banking host at Participant 2.
4. Between 16:30 and 17:00 the timeout rate was 0.15%, with 394 timeouts in total. In particular, between 16:30 and 17:00 on February 11, the online banking timeout rates of three participants in Jilin Province reached 17.05% (Participant 1), …44% (Participant 2) and 15.86% (Participant 3). Investigation showed that an abnormality in the online banking system of the bank these participants belong to raised the timeout rate of the institutions under its jurisdiction.
5. Between 21:00 and 21:30 the timeout rate reached 0.06%, with 83 timeouts, 82 of them (98.80%) at a single participant.
Analysis showed that this participant runs in-house batch processing during this period, which drove up its online banking timeout rate.
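The rates and shares quoted above are simple ratios: the timeout rate is the number of timeouts divided by all transactions in a half-hour window, and a participant's share is its timeouts divided by all timeouts in that window (14 of 15 gives 93.33%). The Python sketch below shows one way such figures could be computed from transaction records. The `Txn` record layout and the participant labels are assumptions for illustration, not the format of any actual PBOC or participant log. Windows are keyed by time of day only, so records from many days fall into the same window, matching the time-of-day peaks listed above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Txn:
    participant: str     # hypothetical participant label, e.g. "P2"
    sent_at: datetime    # time the business message was sent
    timed_out: bool      # True if the transaction ended in a timeout

def half_hour_bucket(ts: datetime) -> str:
    """Label of the 30-minute time-of-day window containing ts, e.g. '02:30-03:00'."""
    start_min = (ts.minute // 30) * 30
    end_h, end_m = divmod(ts.hour * 60 + start_min + 30, 60)
    return f"{ts.hour:02d}:{start_min:02d}-{end_h % 24:02d}:{end_m:02d}"

def window_stats(txns) -> dict:
    """Per window: volume, timeouts, timeout rate (%) and each participant's share of the timeouts (%)."""
    totals = Counter()
    timeouts = defaultdict(Counter)
    for t in txns:
        bucket = half_hour_bucket(t.sent_at)
        totals[bucket] += 1
        if t.timed_out:
            timeouts[bucket][t.participant] += 1
    report = {}
    for bucket, total in totals.items():
        per_participant = timeouts[bucket]
        n_timeouts = sum(per_participant.values())
        report[bucket] = {
            "total": total,
            "timeouts": n_timeouts,
            "rate_pct": round(100 * n_timeouts / total, 2),
            # empty when the window had no timeouts, so no division by zero
            "share_pct": {p: round(100 * n / n_timeouts, 2) for p, n in per_participant.items()},
        }
    return report
```

With 15 timeouts in the 02:30-03:00 window, 14 of them at one participant, that participant's `share_pct` comes out as 93.33, the figure in item 1.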
(2) Overall analysis of the causes of online banking timeouts
Based on our investigation and statistics, the causes of online banking timeouts fall into the following categories.
1. Human operations. For example, participants perform system upgrades, server restarts and similar operations during business hours, causing online banking transactions to time out.
2. Insufficient performance optimization of commercial banks' in-house systems. For example, the account information query interface of a bank's core system times out, the online banking database experiences processing waits, or bookkeeping responses are slow, causing online banking transactions to time out.
3. Sudden failures. For example, network interruptions, hung processes and middleware exceptions interrupt the entire online banking system or payment system.
4. Other causes, mainly occasional problems. For example, network jitter causes an individual transaction to time out, or an incoming credit message takes more than 18 seconds to reach the receiving bank, leaving too little time to process it before the transaction times out.
Figure 1 shows the proportion of online banking timeouts attributable to each cause.
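A breakdown like Figure 1 can be produced by tagging each investigated incident with one of the four categories above and weighting it by the number of transactions that timed out. The sketch assumes a hypothetical list of `(category, timed_out_count)` pairs; the category labels are shorthand for the four headings above, and the sample numbers are made up.

```python
from collections import Counter

# Shorthand labels for the four cause categories listed above.
CATEGORIES = ("human operation", "insufficient performance", "sudden failure", "other")

def cause_proportions(incidents) -> dict:
    """incidents: iterable of (category, timed_out_count). Returns each category's share of timeouts in %."""
    counts = Counter()
    for category, timed_out in incidents:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[category] += timed_out
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: round(100 * counts[c] / total, 2) for c in CATEGORIES}

if __name__ == "__main__":
    # Made-up incident list, for illustration only.
    sample = [("human operation", 20), ("sudden failure", 50), ("human operation", 10), ("other", 20)]
    print(cause_proportions(sample))
    # → {'human operation': 30.0, 'insufficient performance': 0.0, 'sudden failure': 50.0, 'other': 20.0}
```

Weighting by timed-out transactions rather than by incident count means one large outage outweighs many isolated jitter cases, which is usually what a proportion of timeouts is meant to show.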
2. Preliminary study of solutions to online banking timeouts
The analysis above shows that the server performance and network bandwidth of the current online banking system can meet the timeliness requirements of online banking processing, and that the main causes of timeouts are sudden failures and human operations. All participants should give this issue greater attention, strengthen operations and maintenance (O&M) management, standardize O&M procedures, and keep the online banking timeout rate low.
(1) Avoid timeouts caused by manual operations
Participating institutions should, as far as possible, schedule operations that affect normal online banking service, such as application upgrades and system restarts, within the maintenance window. They should fully assess the expected duration and possible impact of maintenance, make technical preparations and contingency plans in advance, minimize the impact of non-emergency maintenance on online banking services, and avoid leaving upgrades or changes unfinished when the clearing window closes.
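"Fully assessing the maintenance duration" can be made concrete as a pre-change check: the worst case, in which the change fails at the end and must be rolled back, should still finish before the window closes. The sketch below is a hypothetical planning aid; the `MaintenancePlan` fields, the 15-minute default margin and the example window are assumptions, not values from any PBOC rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class MaintenancePlan:
    start: datetime        # planned start of the change
    estimated: timedelta   # assessed duration of the upgrade or restart itself
    rollback: timedelta    # time needed to back the change out if it fails
    margin: timedelta = timedelta(minutes=15)  # hypothetical safety buffer

    def worst_case_end(self) -> datetime:
        """Time service is restored if the change fails at the end and is rolled back."""
        return self.start + self.estimated + self.rollback + self.margin

def fits_window(plan: MaintenancePlan, window_end: datetime) -> bool:
    """True if even the worst case ends no later than the close of the maintenance window."""
    return plan.worst_case_end() <= window_end

def latest_safe_start(plan: MaintenancePlan, window_end: datetime) -> datetime:
    """Latest start time at which the plan's worst case still fits in the window."""
    return window_end - (plan.estimated + plan.rollback + plan.margin)
```

For a hypothetical window closing at 04:00, a two-hour upgrade with a 30-minute rollback must start by 01:15; starting at 01:30 would leave the system mid-change when the window closes, which is the situation described in item 2 of the detailed analysis.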
(2) Targeted resource optimization and architectural improvement
1. Participating institutions should continue to optimize the resources of their in-house online banking systems, for example by improving core database performance, purging historical data from the database, optimizing the core account query interface, adjusting the processing flow for incoming online banking credits, and removing the status query on secondary accounts.
2. Participating institutions should further improve the architecture of their in-house online banking systems and optimize the application platform and the high-availability design of the online banking front-end processor. For example, application logging can be switched from synchronous to asynchronous writes, so that an abnormal logging process does not affect the main process's handling of transactions; an auto-start mechanism and a health check can also be designed for the logging process, so that the system recovers automatically once the process is found to have died.
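In Python, one common way to make logging asynchronous is the standard library's `logging.handlers.QueueHandler` and `QueueListener`: the transaction thread only puts records on a queue, and a separate listener thread does the slow writing. This is a minimal sketch of that idea, not any participant's actual implementation; `SlowSink` is a hypothetical stand-in for a slow log destination and the logger name `ibps.txn` is illustrative. It covers the synchronous-to-asynchronous change only, not the auto-start and health check.

```python
import logging
import logging.handlers
import queue
import threading
import time

class SlowSink(logging.Handler):
    """Hypothetical slow log destination (e.g. a congested disk), used for illustration."""

    def __init__(self, delay: float = 0.01):
        super().__init__()
        self.delay = delay
        self.written = []            # formatted messages, in the order they were written
        self.writer_threads = set()  # identities of the threads that performed the writes

    def emit(self, record: logging.LogRecord) -> None:
        time.sleep(self.delay)       # simulate slow I/O
        self.written.append(self.format(record))
        self.writer_threads.add(threading.get_ident())

def build_async_logger(name: str, sink: logging.Handler, maxsize: int = 10_000):
    """Attach a QueueHandler to the logger and start a QueueListener that feeds `sink`."""
    log_queue = queue.Queue(maxsize=maxsize)  # bounded: a stalled writer cannot exhaust memory
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False                  # keep records away from the root logger's handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()                          # background thread that drains the queue
    return logger, listener

if __name__ == "__main__":
    sink = SlowSink()
    logger, listener = build_async_logger("ibps.txn", sink)
    for i in range(5):
        logger.info("processed transaction %d", i)  # returns without waiting for the sink
    listener.stop()                           # processes queued records, then stops the thread
    print(len(sink.written))                  # → 5
```

The queue is bounded so that, if the writer stalls, `QueueHandler` (which enqueues with `put_nowait`) reports a full queue through `Handler.handleError` instead of blocking transaction processing. The cost is that log lines can be lost under pressure; whether that is acceptable is a policy decision for each institution.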
(3) Strengthen daily monitoring of the online banking system
1. Participating institutions should continue to monitor online banking operations, inspect the online banking system more frequently, and add alarms for database processing waits, message queue backlogs and network problems, so that problems are detected as early as possible, handled promptly, and failures are shortened.
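The alarm items above amount to comparing periodically sampled metrics against thresholds. A minimal sketch, assuming hypothetical metric names (`db_wait_ms`, `mq_queue_depth`, `net_packet_loss_pct`) and placeholder limits that an institution would replace with values derived from its own baseline:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    metric: str
    warn: float
    critical: float

# Hypothetical metrics and placeholder limits; real values should come from each system's baseline.
THRESHOLDS = (
    Threshold("db_wait_ms", warn=500, critical=2000),          # database processing wait
    Threshold("mq_queue_depth", warn=100, critical=1000),      # messages queued but not yet consumed
    Threshold("net_packet_loss_pct", warn=0.5, critical=2.0),  # network quality
)

def evaluate(sample: dict, thresholds=THRESHOLDS) -> list:
    """Return (metric, level, value) alarms for one sample of metric readings."""
    alarms = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            alarms.append((t.metric, "MISSING", None))  # a silent collector is itself worth an alarm
        elif value >= t.critical:
            alarms.append((t.metric, "CRITICAL", value))
        elif value >= t.warn:
            alarms.append((t.metric, "WARN", value))
    return alarms
```

Treating a missing reading as an alarm matters in practice: a monitoring agent that dies quietly otherwise looks identical to a healthy system.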
2. Participating institutions should strengthen transaction monitoring of related systems such as the core system, the card system and the enterprise service bus. In particular, they should monitor core system resource usage during batch processing and cap the resources batch jobs may occupy, so that enough capacity remains for real-time transactions while batch jobs run.
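Capping batch resource usage can be illustrated with a two-semaphore gate: every job needs a worker slot, but batch jobs must also hold one of a smaller number of batch permits, so at least `total - batch_cap` slots always remain for real-time transactions. This is a thread-level sketch of the principle under assumed slot counts; the `CapacityGate` class is hypothetical, and in a production core system the same cap would more likely be enforced through database resource management, scheduler limits or connection pools.

```python
import threading
from contextlib import contextmanager

class CapacityGate:
    """Shares `total` worker slots between real-time and batch work; batch may hold at most `batch_cap`."""

    def __init__(self, total: int, batch_cap: int):
        if not 0 < batch_cap < total:
            raise ValueError("batch_cap must be positive and leave at least one slot for real-time work")
        self._slots = threading.BoundedSemaphore(total)      # all worker slots
        self._batch = threading.BoundedSemaphore(batch_cap)  # extra permits that only batch jobs need

    def acquire(self, kind: str, blocking: bool = True) -> bool:
        if kind == "realtime":
            return self._slots.acquire(blocking)
        if kind == "batch":
            # Take a batch permit first, so batch work can never occupy more than batch_cap slots.
            if not self._batch.acquire(blocking):
                return False
            if not self._slots.acquire(blocking):
                self._batch.release()  # give the permit back if no slot was free
                return False
            return True
        raise ValueError(f"unknown kind: {kind!r}")

    def release(self, kind: str) -> None:
        if kind not in ("realtime", "batch"):
            raise ValueError(f"unknown kind: {kind!r}")
        self._slots.release()
        if kind == "batch":
            self._batch.release()

    @contextmanager
    def slot(self, kind: str):
        """Blocking acquire and guaranteed release around one unit of work."""
        self.acquire(kind)
        try:
            yield
        finally:
            self.release(kind)
```

A batch job would run inside `with gate.slot("batch"):` and an online banking request inside `with gate.slot("realtime"):`; once batch work reaches its cap, further batch jobs wait while real-time requests still find free slots.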
(4) Refine the tracing of online banking timeouts
Over its full life cycle, every online banking transaction passes through several links, including the sending bank, the PBOC online banking platform and the receiving bank, and many factors affect how quickly it is processed. To pinpoint the time period and point of failure behind a timeout, each link should have a timeout early-warning mechanism, the precise processing time of each transaction should be recorded, and timeout alarm thresholds should be set at a finer granularity, making timeouts easier to track and trace.
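Per-link tracing comes down to recording a timestamp at each checkpoint of a transaction and checking each segment against its own threshold, so that an alarm names the slow link rather than only reporting an end-to-end timeout. The checkpoint names and per-segment limits below are hypothetical. The sketch also assumes the clocks of the sending bank, the PBOC platform and the receiving bank are synchronized; otherwise cross-institution segments are meaningless.

```python
from datetime import datetime, timedelta

# Hypothetical checkpoints along one transaction, in the order they should occur.
STAGES = ("sender_out", "platform_in", "platform_out", "receiver_in", "receiver_reply")

# Hypothetical alarm thresholds (seconds) for each segment between consecutive checkpoints.
SEGMENT_LIMITS = {
    ("sender_out", "platform_in"): 2.0,      # sending bank -> PBOC platform
    ("platform_in", "platform_out"): 1.0,    # processing inside the platform
    ("platform_out", "receiver_in"): 2.0,    # PBOC platform -> receiving bank
    ("receiver_in", "receiver_reply"): 5.0,  # processing at the receiving bank
}

def segment_durations(checkpoints: dict) -> dict:
    """Seconds spent in each segment; raises if a checkpoint is missing or out of order."""
    durations = {}
    for a, b in zip(STAGES, STAGES[1:]):
        missing = [s for s in (a, b) if s not in checkpoints]
        if missing:
            raise KeyError(f"missing checkpoint(s): {missing}")
        seconds = (checkpoints[b] - checkpoints[a]).total_seconds()
        if seconds < 0:
            raise ValueError(f"{b} recorded before {a}: check clock synchronization")
        durations[(a, b)] = seconds
    return durations

def overdue_segments(checkpoints: dict, limits: dict = SEGMENT_LIMITS) -> list:
    """Segments whose duration exceeds their threshold, in transaction order."""
    return [(seg, secs) for seg, secs in segment_durations(checkpoints).items() if secs > limits[seg]]

if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    offsets = (0, 0.5, 0.8, 1.2, 9.2)  # made-up arrival times in seconds after t0
    cp = {name: t0 + timedelta(seconds=s) for name, s in zip(STAGES, offsets)}
    print(overdue_segments(cp))  # → [(('receiver_in', 'receiver_reply'), 8.0)]
```

A negative segment is raised as an error rather than silently clamped, because it usually indicates clock drift or a misrecorded checkpoint, both of which would distort the tracing.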