Quick! Name a technology category in which almost 400 different options compete for your attention; that brought in more than $80 billion in revenue last year, yet is actually accelerating in its growth rate; that, decades after its inception, is still spawning startups with seemingly bottomless amounts of venture capital; and that drove the most job postings of any programming language last year. If you guessed “database,” you’d be right.
Why is this decades-old market so incredibly hot right now? Even as database vendors like Oracle see slow growth, the category is booming. As I’ve written, a major reason is the cloud, but the main reason is simply that data is becoming increasingly important to every enterprise, with diverse, unstructured data giving rise to new databases to manage it all.
Evolution meets revolution
This is not how markets are supposed to work. Product categories rise and then shrink over time as other things replace them. Microsoft, for example, made billions in the operating system (OS) market, but these days we don’t really care much about the operating system.
On the desktop, things like ChromeOS have made it clear that the browser and the web matter most, and on the server, companies are increasingly thinking in terms of serverless.
Or remember when app servers, enterprise resource planning (ERP) and enterprise content management (ECM) were hot new markets? Companies still depend on these products, or some variant of them, but they are no longer considered growth markets.
Databases should arguably be the same. Relational databases were born in the early 1970s, and Oracle, Microsoft and IBM built huge companies to sell and support them. We should be seeing this market wind down by now, but we aren’t.
While these vendors have seen their database revenue growth slow, the market as a whole has been anything but slow. Some of their customers are increasingly flirting with PostgreSQL, but even more are turning to cloud databases. Some are doing both, with AWS and other cloud giants offering managed PostgreSQL services.
There is also a profound and persistent rise of so-called “NoSQL” databases. While I like the trend, I don’t really like the name, because databases like MongoDB, Apache Cassandra, Neo4j, DynamoDB, Redis and others are being embraced not for what they are not, but for what they are: flexible, horizontally scalable and capable of managing the explosion of unstructured data.
Indeed, relational databases, with the prominent exception of PostgreSQL, have declined in popularity over the past nine years, including the past year, relative to non-relational databases, as measured by DB-Engines (as illustrated here).
That’s not to say that SQL/relational usage is declining. In reality, SQL adoption, as measured by job postings, continues to increase.
Companies are increasingly interested in developers who can query the databases that have run their businesses for years using familiar, widely used SQL. SQL is popular because it has been a great workhorse for the enterprise.
At the same time, enterprises are just as obviously looking for developers who can help them explore new data types and sources, often not involving SQL.
In other words, it is not an either/or decision. For companies of any reasonable size, it’s a matter of ‘and’. Companies are simply trying to make the best use of their data and turn to the right database for the job.
Restructuring of the market
Zilliz, the company behind the open source vector database Milvus, just raised $60 million, adding to the $43 million it raised in 2020. Never heard of a vector database? You are not alone. A vector database is designed to manage vector embeddings. According to Zilliz’s Frank Liu:
The increasing ubiquity of unstructured data has led to a steady increase in the use of machine learning models trained to understand [unstructured] data. word2vec, a natural language processing (NLP) algorithm that uses a neural network to learn word associations, is a well-known early example of this. The word2vec model is able to convert individual words (in different languages, not just English) into a list of floating point values or vectors. Because of the way models are trained, vectors that are close together represent words that are similar, hence the term embedding vectors.
As such, vector databases are useful for things like searching images, video, audio or other forms of unstructured data based on an understanding of the content itself, not the keywords associated with that content.
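To make the idea of "vectors that are close together" concrete, here is a minimal Python sketch of similarity search over embeddings. It rests on several assumptions: the three-dimensional vectors are invented by hand for illustration and do not come from word2vec or any real model, the function names are hypothetical, and the brute-force scan stands in for the indexing a real vector database would use to search large collections efficiently. It is not the API of any particular product.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written "embeddings" (assumption: real models
# produce vectors with hundreds of dimensions, learned from data).
EMBEDDINGS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.12],
    "apple": [0.10, 0.20, 0.95],
    "banana": [0.12, 0.18, 0.90],
}


def nearest(word, embeddings, k=1):
    """Return the k entries most similar to `word`, best match first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    # Higher cosine similarity means the vectors point in a closer direction.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for word in ("king", "apple"):
        print(word, "->", nearest(word, EMBEDDINGS))
```

With these made-up vectors, the closest neighbor of "king" is "queen" and the closest neighbor of "apple" is "banana". The search never compares the words themselves, only their vectors, which is the sense in which a vector database searches by content rather than by keyword.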
My point is not to offer a tutorial in vector databases. Rather, it is to show that with the continued growth of structured and especially unstructured data, the database market will continue to grow. At the same time, we will see new approaches to databases emerging.
Disclosure: I work for MongoDB, but the views expressed herein are mine.