Picture of a telescope

Big News In Databases — Summer 2018

Don’t fall behind! Here’s the most important database news from the last few months.

SQL’s Rise Continues

Google’s shift towards SQL was already mentioned in my previous newsletter. This time a new article by Google further forwards an argument that Google initiated years ago: in the original Spanner paper (2012) Google said that strong consistency—like that offered by SQL databases—causes fewer problems than the eventual consistency model that is often employed by NoSQL systems. In the F1 paper (2013), Google described eventual consistency as an “unacceptable burden to place on developers”. In their recent article, Google has even put that message into the tile: Why you should pick strong consistency, whenever possible.

Moreover Amazon is not afraid to point out a few advantages when it comes to using SQL and the relational model. They tweeted this from their latest AWS re:Invent conference: SQL on the relational schema is orders of magnitude faster than JSON processing.

The marketing term NoSQL, which was the hippest buzzword just a few years back, is slowly becoming a synonym for a defect. Without SQL, there is something missing. It is no surprise that more and more systems seek to distance themselves from this term by introducing an SQL interface. Recently, a particularly well-established system opted for this path: Elasticsearch.

The adoption of SQL doesn’t stop there. Even traditional SQL databases continue to embrace SQL by introducing more modern SQL features into their dialects. Triggered by the release of MySQL 8.0, I’ve written a comprehensive article about this development.

On My Own Behalf: My Next Training

A lot has happened since SQL-92. My week-long training is the update for experienced SQL users. Check it out now.

Cloud First

Oracle has completed a version number jump from 12.2 to 18 and released their new database, initially to the Oracle Cloud and to Engineered Systems only. In stark contrast to semantic versioning, this release is only at the level of a new patch. In the previous version numbering scheme, the release number would have been, i.e. only the last digit would have been incremented. From a developer’s perspective, the release notes are thus not very exciting.

Still, not everyone is so in love with the cloud. SAP demonstrated a particularly noteworthy failure. Two years after introducing their cloud solution SAP Anywhere, they cancelled this service—apparently, they only ended up with 30 customers in those two years.

Technology and Science

Of course the technology news of the past few month has been dominated by the Spectre and Meltdown security vulnerabilities. Those vulnerabilities cast a shadow on cloud solutions as they often share hardware among several customers. The newly discovered type of security vulnerabilities can break the separation so that a malicious client could spy on other clients that use the same hardware. As these weaknesses exploit a hardware bug, they are not limited to databases.

The so-called “fsyncgate” problem only affects systems that use buffered IO to store data persistently—such as some databases. At its core, the problem is that errors can go unnoticed when they occur while writing changed data back from the operating system cache to the disk. Consequently, these errors cannot be handled by the database. The problem was originally reported by Craig Ringer of 2ndQuadrant—a PostgreSQL consulting firm. The actual root cause might be an omission in the POSIX specification of the fsync() function: it does not explicitly specify the subsequent behavior after an error has been reported. Application and operating system developers therefore did not have common expectations for this scenario. Matthew Wilcox, a Linux kernel hacker working for Microsoft (!), explained this problem from the perspective of the Linux kernel at the PostgreSQL developer conference PgCon… and he apologized for the trouble.

According to the “shoot the messenger” principle, PostgreSQL has been heavily criticized for fsyncgate. Indeed, PostgreSQL suffers from this problem more than other databases as it doesn’t offer direct IO. However, other databases—and more generally all software that uses buffered IO—can be affected by this issue. MySQL with InnoDB in the default configuration on many Unix-like systems is one example.

Otherwise recent news was pretty good for PostgreSQL: DB-Engines.com named PostgreSQL the database of the year 2017. PostgreSQL also ranked first—ex aequo with Firebird—in a Code Quality Comparison of Firebird, MySQL, and PostgreSQL. With four times as many demerit points, MySQL ranks second and thus last in this comparison.

On the academic side, there was a new paper from Google about indexing—still my passion since I wrote SQL Performance Explained. The authors of this paper have put a bold claim in the abstract: “by using neural nets we are able to outperform cache-optimized B-trees by up to 70% in speed while saving an order-of-magnitude in memory”. Of course, that is only true once the network is trained. The training itself takes an extraordinary amount of resources so this idea is only applicable to mostly static data sets. It might be an interesting concept for the Google search engine, but it’s not applicable to everyday indexing.

New Major Releases

There have been some impressive releases in recent months:

MySQL 8.0 (April 2018)

After an unbearably long 19 months since the first “development milestone release (DMR)” in September of 2016, MySQL 8.0 has finally achieved general availability in April 2018. As this release has far-reaching impacts that go well beyond the MySQL user base, I’ll refer you once again to my article “One Giant Leap For SQL: MySQL 8.0 Released”.

The game changing news is clear:

Finally, MySQL has evolved beyond SQL-92 and makes some modern SQL features available to a huge user basis.

After such a long period in development, there are obviously many more features. Mysqlserverteam.com provides a nice overview.

MariaDB 10.3 (May 2018)

The introduction of system versioned tables as described by the SQL standard is particularly noteworthy here. It’s one of the functions everybody would have had a need for in the past, and it always sparks great interest in my “modern SQL” presentations. MariaDB is the first widely used free open-source database to introduce this feature, and thus it’s putting some pressure on its competitors. I’ll publish a more detailed review of MariaDB 10.3 on modern-sql.com soon.

The full release notes are available on mariadb.com.

New and Updated Content on Websites

From Twitter, Keeping it Brief (follow me on Twitter)

SQL Renaissance Ambassador

As the SQL Renaissance Ambassador, it’s my mission to make developers aware of the evolution of SQL in the 21st century. My book “SQL Performance Explained” has been published in five languages and can be read online free of charge at use-the-index-luke.com. My next book is currently in the works and can already be read online as it’s being written (modern-sql.com). I am available as a trainer, speaker and consultant for all companies and developers interested in SQL. You’ll find more info at winand.at.

Connect with Markus Winand

Markus Winand on LinkedInMarkus Winand on XINGMarkus Winand on Twitter