A 30-Year Database Veteran Looks Back: Oracle Was Once a Nightmare for DBAs Too

Mondo Technology Updated on 2024-01-30

A while ago, an after-sales engineer from a database vendor remarked to me that we old Oracle DBAs seem better able to understand and tolerate the current state of domestic databases. Of course, I said. Today's young DBAs have only ever seen an extremely capable Oracle database; they were never tormented by Oracle the way we were. They don't know that a data product has to go through an unbearable childhood before it gets better and better, and the same is true of domestic databases.

Oracle's bitter past

It's true that Oracle wasn't born as good as it is now. When I first started using it, Oracle was like a bad partner you both loved and hated. Leaving aside the unbearable 1990s, even in the era of the relatively reliable Oracle 9.2, life with Oracle was painful for DBAs.

The claim that an Oracle database runs smoothly and never goes down did not hold ten or twenty years ago. Back then, databases crashing, restarting, hanging, or being too slow to use were all too common. It was not unusual for a restart to stall because the database took half an hour or an hour to shut down. And because I didn't understand Oracle's internals well at the time, I often didn't know why it wouldn't shut down or start up, and I didn't dare operate on it casually. Some users didn't dare to shut down or restart their databases at all.

I still often remember the helplessness of facing an inexplicable hang or a database too slow to use. CBC latches, buffer busy waits, latch free: any one of these could keep a DBA busy for half a day, to say nothing of shared pool problems, library cache locks, or ORA-4031.

ORA-1591 error

When I think of Oracle's unbearable past, the first thing that comes to mind is the ORA-1591 error. A DBA who can say at once what this problem is must be at least 40 years old, because since Oracle 10g, Oracle has been good at handling distributed transaction failures automatically.

For users of Oracle 8i, 9i, and earlier releases, it was a nightmare.

At that time, XA distributed transactions based on Tuxedo were quite popular, and dblink was also a technology developers commonly used. Under two-phase commit, if a distributed transaction fails, the database system or the XA manager should automatically roll back or commit the transaction. However, early Oracle databases handled this poorly, and because the database itself was not yet mature, the proportion of distributed transactions that failed was quite high.

If a distributed transaction fails and the related DBA_2PC_xxx data dictionary entries are incomplete, the database cannot automatically roll back or commit the distributed transaction. If another application then needs to access the data held by the pending transaction, the ORA-1591 error appears.

In general, when a database has a problem, restarting it solves the problem. However, ORA-1591, which is caused by a failed distributed transaction, does not disappear when the database is restarted, and many users restarted their databases repeatedly without getting rid of it. In this case, the only option is to manually complete the data in system dictionaries such as DBA_2PC_PENDING, and then use ROLLBACK FORCE or COMMIT FORCE to force the rollback or commit of the related local transactions. Almost twenty years ago, this was the problem I encountered most often after being woken in the middle of the night. Most users could solve it in ten or twenty minutes under remote guidance. This Oracle problem was annoying, but it also earned me good money from time to time.
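
As a rough illustration of the final "force" step described above, here is a minimal Python sketch that only builds the SQL text a DBA might then run by hand. The helper name force_statement and the sample transaction ID are hypothetical. The sketch assumes the local transaction ID has the usual three-part dotted form, and it does not cover the manual repair of incomplete dictionary data, which depends on the specific case.

```python
import re

# Query a DBA would typically run first to list in-doubt transactions.
LIST_PENDING_SQL = "SELECT local_tran_id, state, advice FROM dba_2pc_pending;"

# Assumption: local transaction IDs look like '1.21.17'
# (three dot-separated integers).
_LOCAL_TRAN_ID = re.compile(r"\d+\.\d+\.\d+")


def force_statement(local_tran_id, action):
    """Build a COMMIT FORCE / ROLLBACK FORCE statement (hypothetical helper).

    The decision to commit or roll back stays with the DBA; this function
    only validates its inputs and formats the statement text.
    """
    if not _LOCAL_TRAN_ID.fullmatch(local_tran_id):
        raise ValueError(f"unexpected local_tran_id: {local_tran_id!r}")
    verb = action.strip().upper()
    if verb not in ("COMMIT", "ROLLBACK"):
        raise ValueError(f"action must be commit or rollback, got {action!r}")
    return f"{verb} FORCE '{local_tran_id}';"


if __name__ == "__main__":
    print(LIST_PENDING_SQL)
    # '1.21.17' is a made-up ID used purely for illustration.
    print(force_statement("1.21.17", "rollback"))
```

Generating the statement as text rather than executing it keeps a human in the loop, which matches how these cases were handled under remote guidance.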

ORA-1578 issue

ORA-1578 was also a common problem for Oracle users. Back then, Oracle had too many bugs to count, and the biggest headaches were the bugs that could cause logical bad blocks. At first I thought bad blocks were caused by hardware, but later I found that Oracle bugs were far more likely than hardware to produce logical bad blocks. Once a logical bad block occurs, you must use DBMS_REPAIR to fix it. If the block cannot be repaired, the only option is to use the same tool to force the block to be marked as corrupt; otherwise SQL fails with an error whenever a full table scan is performed. Applications at the time were so badly written that full table scans were almost impossible to avoid.

Some logical bad blocks are not reported as ORA-1578. Sometimes ora-600 [kcbgtcr_x] appears instead, which also means that the application has scanned a logical bad block.

Once a buddy of mine upgraded an operator's database from 9.2.0.2 to 9.2.0.5. The day after the upgrade, the database began to report a large number of ora-600 errors. My buddy was called to the site by the user while still asleep, and he sat in front of the terminal under the gaze of IT department leaders of every rank, his mind blank.

Afterwards, still shaken, he told me that his mind had gone blank and he had no idea how to analyze the problem. But under the gaze of so many of the client's leaders, he couldn't just sit there doing nothing, so he kept typing SQL statements into SQLPLUS to buy time. He had phoned me on the way there, so he knew that everything he was doing on site was hopeless, and the only thing he could count on was me, far away, searching MetaLink for him.

I was lucky that day. After he had been typing on the keyboard for more than ten minutes, I found a similar bug, and my phone call rescued him. Neither of us was sure at the time whether that bug was causing the 600 errors, but there was no other choice. Fortunately, after the patch was applied, the errors actually disappeared.

The Oracle DBA difference

In those unbearable years, an Oracle DBA needed not only excellent technical skills but also extraordinary nerve. Take my buddy: under the gaze of a bunch of leaders, he was able to type commands for more than ten minutes without stopping. If I had been in his place, I definitely could not have done it.

When dealing with complex problems, a DBA had to have several special skills: reading and analyzing alert logs, analyzing system state dumps, running hanganalyze, and checking operating system logs. If the database is hung and sqlplus can't log in, do you still have a way to diagnose the hang? Anyone who hasn't mastered sqlplus -prelim "/ as sysdba" will be at a loss when they run into such a problem.

When you encounter an ORA error or an ORA-600 error, can you roughly determine the scope of the fault from the error number alone? In those days, I dare say I was a master at this. Having to look through the error-number tables several times a day made me very sensitive to these errors.
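
The idea of going from an error number to a rough fault area can be sketched as a simple lookup. This is only an illustration: the table below is hypothetical and deliberately limited to the errors mentioned in this article, and the function name triage is made up for the example. A real DBA's mental table was far larger.

```python
import re

# Hypothetical lookup table, limited to the errors mentioned in this article.
FAULT_AREAS = {
    600: "internal error; the first bracketed argument hints at the code layer",
    1578: "data block corruption",
    1591: "lock held by an in-doubt distributed transaction",
    4031: "shared memory allocation failure (often the shared pool)",
}

# Accepts spellings such as 'ORA-01591', 'ORA-1591', or 'ora-600[kcbgtcr_x]'.
_ORA_CODE = re.compile(r"ora-0*(\d+)", re.IGNORECASE)


def triage(message):
    """Return a rough fault area for the first ORA error number in message."""
    match = _ORA_CODE.search(message)
    if match is None:
        return "no ORA error number found"
    code = int(match.group(1))
    return FAULT_AREAS.get(code, f"ORA-{code:05d}: not in this small table")


if __name__ == "__main__":
    for line in ("ORA-01578", "ora-600[kcbgtcr_x]", "ORA-12345"):
        print(line, "->", triage(line))
```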

Two days ago, a friend posted an error message in a chat group. I haven't done front-line operations work for many years, but relying on 20 years of muscle memory and without looking anything up, I quickly suggested a general direction, and to my surprise it turned out to be right. Old DBAs who were tormented by Oracle for many years do have a knack for this.

Summary

Oracle was not born a good database. Downtime, bugs, bad blocks, GCS/GES, shared pool problems, hangs, and unavailability all plagued DBAs from time to time. However, that problematic Oracle also trained a large number of extremely skilled DBAs. Twenty years ago, I often helped foreign DBAs solve difficult problems over MSN. Once, one of them said to me, "You must be the best Oracle DBA in China." I replied, "I'm all right, but I'm far behind the best ones." In the end he said, "China's DBAs are too good."

The domestic databases we face today have as many problems as Oracle did before it became so impressive. A while ago I heard a friend dismiss them with "the independently developed ones are all bugs, and the open-source-based ones offer nothing new," full of disdain for domestic databases. But complex foundational software such as a database has to be improved continuously through use by a large number of users before its many bugs are eliminated and it gradually matures. Oracle took more than 20 years to become what it is, and we should also give domestic databases a few years to grow.

Author: White Eel, White Eel's Cave (ID: baishan755)
