Translator: Jing Yan.
Considering SQL's popularity and success, this article may read like a study in paradox. SQL can be clunky and verbose, yet developers often find it the easiest and most direct way to extract the data they need. A well-written query can be lightning fast; a badly written one can be surprisingly slow. The language has been around for decades, yet new features are constantly being added.
These contradictions don't matter, because the market has shown that SQL is the first choice for many, even when newer, more powerful options exist. From the smallest startups to the largest corporations, developers everywhere know SQL. They rely on it to organize all their data.
SQL's tabular model is so dominant that many non-SQL projects end up adding an SQL-ish interface because users demand it. Even the "NoSQL" movement, which was invented to escape the old paradigm, ultimately seems to have lost to SQL.
The limitations of SQL may not be enough to justify discarding it entirely, and developers may never actually set out to migrate all their data out of SQL. But the problems with SQL are real, enough to put pressure on developers, add delays, and even force the redesign of certain projects.
Here are nine reasons we might want to give up on SQL, even though we know it may not be possible.
The relational model loves tables, so we just keep building them. That works for small or even normal-sized databases, but with truly large-scale data, the model starts to collapse.
Some people try to solve the problem by combining old and new, for example by grafting sharding onto an old open-source database. Adding layers seems to make the data more manageable and promises unlimited scale, but those added layers can hide dangers. Depending on how much data is stored in each shard, the processing time for a SELECT or a JOIN can vary enormously.
Sharding also forces database administrators (DBAs) to account for the possibility that data is stored on different machines, perhaps even in different geographic regions. An inexperienced administrator can be baffled after launching a cross-table query without realizing the data lives in different places, because the model sometimes abstracts the physical location away from view.
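To make the hidden cost concrete, here is a minimal sketch of hash-based shard routing; the shard names and key formats are hypothetical. The routing itself is cheap, but any operation that spans keys, such as a cross-shard JOIN, has to fan out to every shard and merge the results.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to
# connection strings for separate database instances.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(key: str) -> str:
    """Route a key to a shard with a stable hash.

    Single-key lookups stay fast, but a query that touches many
    keys may fan out across machines, which is where the hidden
    cost appears.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always lands on the same shard.
print(shard_for("user:42"))
```

Stable hashing keeps routing deterministic, but note that related rows (a user and their orders, say) may still land on different machines unless the shard key is chosen carefully.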
Some AWS machines are fitted with 24TB of RAM because some database users need that much. They simply have that much data in their SQL databases, and everything runs better when it all fits in memory.
SQL may be an "evergreen" language, but it is not particularly well suited to newer data interchange formats such as JSON, YAML, and XML, all of which support more hierarchical and flexible structures than SQL does. The core of an SQL database is still rooted in the relational model, with tables everywhere.
The market has found ways to paper over this common complaint. It's relatively easy to bolt on a different data format (such as JSON) with the right glue code, but you pay for it in lost time.
Some SQL databases can now encode and decode more modern data formats such as JSON, XML, GraphQL, or YAML as native features. But internally, the data is usually still stored and indexed using the same legacy relational model.
How much time is spent converting data between these formats? Wouldn't it be easier to store the data in a more modern form to begin with? Some clever database developers keep experimenting, but oddly enough, they often end up reaching for some sort of SQL parser.
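The conversion cost is easy to see with a plain relational store: every nested document has to be serialized on the way in and parsed on the way out. A minimal sketch using Python's bundled sqlite3 and a plain TEXT column (the table and field names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Write path: the nested document must be flattened to text first.
doc = {"user": "ada", "tags": ["sql", "json"], "profile": {"active": True}}
conn.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))

# Read path: every query that touches a nested field pays for a full parse,
# and the database cannot index inside the blob without extra machinery.
row = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()
restored = json.loads(row[0])
print(restored["profile"]["active"])
```

The round trip works, but the serialize/parse tax is paid on every read and write, which is exactly the overhead the paragraph above describes.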
A database may store data in tables, but programmers have to write code that works with objects. Much of the effort of designing a data-driven application seems to go into figuring out the best way to extract data from the database and transform it into something the business logic can handle. Then the object's data fields must be marshalled back the other way into an SQL upsert. Isn't there a way to keep the data in a ready-to-use format?
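A sketch of that marshalling round trip, again with sqlite3 and an illustrative users table; note the upsert syntax (ON CONFLICT ... DO UPDATE) assumes SQLite 3.24 or later:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

# Read path: flat row -> object the business logic can use.
row = conn.execute("SELECT id, name, email FROM users WHERE id = ?", (1,)).fetchone()
user = User(*row)

# Write path: object fields -> SQL upsert, assembled by hand.
user.email = "ada@newhost.example"
conn.execute(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name, email = excluded.email",
    (user.id, user.name, user.email),
)
```

ORMs automate this mapping, but the translation between rows and objects still happens on every trip; it is just hidden from view.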
The original SQL databases were designed for batch analytics and interactive querying. The streaming data model, with its long processing pipelines, is a relatively new idea, and it isn't an exact match.
The major SQL databases were designed decades ago, when the prevailing model imagined a database running off on its own and answering queries like some kind of oracle. Sometimes they respond quickly, sometimes they don't. That's how batch processing works.
Some of the newest applications demand better real-time performance, not merely for convenience but because the application requires it. In the modern streaming world, a lack of real-time capability won't fly.
The newest databases designed for these markets place a heavy emphasis on speed and responsiveness. They avoid offering the kind of elaborate SQL queries that would introduce delays.
The power of a relational database lies in breaking data down into smaller, more focused tables. But that is also where the trouble starts.
Reassembling data on the fly with a JOIN is often the most computationally expensive part of a job, because the database has to juggle all the data at once. The trouble really begins when the data starts to exceed RAM.
For anyone learning SQL, joins can be bewildering. Figuring out the difference between an inner and an outer join is only the beginning; finding the best way to chain multiple joins together is harder still. The built-in optimizers may help, but when a database administrator asks for a particularly complex combination, there is little they can do.
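The inner-versus-outer distinction mentioned above can be shown in a few lines with sqlite3; the customers/orders tables are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0);
""")

# Inner join: only customers that actually have a matching order.
inner = conn.execute(
    "SELECT c.name FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Left (outer) join: every customer, with NULL where nothing matches.
outer = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

print(inner)  # [('Ada',)]
print(outer)  # [('Ada', 99.0), ('Grace', None)]
```

Grace disappears from the inner join entirely, which is exactly the kind of silent behavior that trips up newcomers once three or four joins are chained together.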
One of the great ideas of the "NoSQL" movement was to free users from columns. If someone wants to add a new value to an entry, they can pick whatever tag or name they like. There is no need to update the schema to add a new column.
SQL's defenders see nothing but chaos in that model. They like the order that tables bring and don't want developers rushing to tack on new fields. They have a point, but adding new columns can be very expensive and time-consuming, especially on large tables. Putting the new data in separate tables and matching it up with JOINs adds still more time, cost, and complexity.
Database companies and researchers have spent a great deal of time developing excellent optimizers that take a query apart and find the best order for its operations.
The gains can be significant, but there are limits to what an optimizer can do. If a query demands a particularly large or fine-grained response, the optimizer can't just reply, "Are you really sure?" It has to assemble the answer and do as it's told.
Some database administrators only discover this once the application begins to scale. The early optimizations are enough to handle the test datasets during development, but at crunch time there is nothing more the optimizer can give.
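One way to see what an optimizer can and cannot do is to ask it for its plan. A sketch with sqlite3's EXPLAIN QUERY PLAN (the exact wording of the plan text varies between SQLite versions, so the comments describe typical output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts INTEGER)")

# Without an index, the planner has no choice but a full table scan.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'click'"
).fetchall()
print(before)  # the plan detail typically mentions a SCAN of events

# With an index, the planner can switch to a targeted search --
# but it still has to produce every row the query demands.
conn.execute("CREATE INDEX idx_kind ON events (kind)")
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'click'"
).fetchall()
print(after)  # typically a SEARCH using idx_kind
```

The plan shows the optimizer choosing between strategies; what it cannot do is shrink the answer itself, which is the limit the paragraph above describes.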
Developers are often caught between users who want faster performance and those who don't want to pay for bigger, more expensive hardware. A common compromise is to denormalize tables so that complex JOINs and cross-table operations are no longer needed.
It's not a bad technical solution, and it often wins because disk space has become cheaper than processing power. But denormalization also throws away the cleverest parts of SQL and relational database theory. When the database has become one long CSV-like file, all those fancy relational features are practically gone.
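The trade-off is easy to sketch with an illustrative flattened orders table: reads need no JOIN, but every duplicated value has to be updated everywhere it appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: the customer's name is copied onto every order row
# instead of living once in a separate customers table.
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(10, "Ada", 99.0), (11, "Ada", 15.5), (12, "Grace", 42.0)],
)

# One cheap scan instead of a JOIN...
rows = conn.execute("SELECT customer_name, total FROM orders_flat").fetchall()

# ...but a single logical rename now touches every duplicated copy.
cur = conn.execute(
    "UPDATE orders_flat SET customer_name = 'Ada L.' WHERE customer_name = 'Ada'"
)
print(cur.rowcount)  # 2 rows changed for one rename
```

The duplicated names are exactly the redundancy normalization was invented to eliminate; cheap disks make the copy affordable, but consistency is now the application's problem.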
Over the years, developers have kept adding new features to SQL, and some of them are excellent. On the other hand, some of the new features can cause performance problems. Some developers warn that "you should be especially careful with subqueries, as they slow everything down." Others say that "selecting subsets with common table expressions, views, or window functions over-complicates things."
Window functions, for example, are designed to speed up basic data analysis by accelerating computations such as averages. But these and other bolted-on features come with costs that many SQL users discover only after adopting them. In most cases, they try the new features and notice the problems only when their machines slow to a crawl. Then they need a seasoned database administrator to explain what happened and how to fix it.
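For instance, here is a running average written as a window function, sketched with sqlite3 (window functions assume SQLite 3.25 or later; the table and column names are illustrative). The query is concise, but a window is evaluated for every output row, which is the kind of cost that surprises people on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

# Running average up to and including each day: the default window
# frame covers all rows from the start through the current row.
rows = conn.execute(
    "SELECT day, AVG(value) OVER (ORDER BY day) FROM readings ORDER BY day"
).fetchall()
print(rows)  # [(1, 10.0), (2, 15.0), (3, 20.0)]
```

On three rows this is instant; on millions of rows, an ill-chosen window frame or a missing index on the ORDER BY column is exactly the sort of thing that calls for a seasoned DBA.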