Cloud technologies have changed the everyday life of developers in recent years and are being used in more and more applications. One particular cloud-based database technology that I have taken a closer look at in recent weeks is Azure Cosmos DB. In this article, I would like to present the advantages of this technology and show why it is particularly suitable for the development of web applications. I will also explain how it works and show you how to configure Cosmos DB.
Why is Azure Cosmos DB a good choice for application development?
The Azure service provided by Microsoft is designed for horizontal scaling and global distribution. Data can be replicated to regions that are geographically close to your users, which keeps access latency low. Azure Cosmos DB is also a managed service: there is no server or operating system to operate, so the focus is on development and traditional administration is a thing of the past. Furthermore, the SLA promises up to 99.999% availability (for accounts distributed across multiple regions), which makes the database well suited to applications that are designed for high availability.
What is Azure Cosmos DB and how does it differ from other databases?
In most cases, Cosmos DB is used with the NoSQL API. Entities are then stored as JSON documents rather than in tabular form, as is the case with relational SQL databases. This approach provides more flexibility in distributing data across partitions, which in turn can be located on different servers. Documents are stored without a schema. In other words, the database side offers no way to specify which properties a document must contain. This can feel unfamiliar if you have mostly worked with relational databases and are used to how data is structured there.
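To make the document model more concrete, here is a minimal sketch in plain Python (standard library only, no Cosmos DB SDK). The property names, the `customerId` partition key and the hash-based `pick_partition` helper are assumptions chosen for illustration. The real assignment of documents to partitions is handled by the service and is not reproduced here.

```python
import hashlib
import json

# Two documents that could live in the same container. Both carry an "id" and
# a partition key property ("customerId", a hypothetical choice), but otherwise
# they have different shapes. Nothing on the database side forbids this.
order = {
    "id": "order-1001",
    "customerId": "c-42",
    "type": "order",
    "items": [{"sku": "A-1", "quantity": 2}],
}
review = {
    "id": "review-7",
    "customerId": "c-42",
    "type": "review",
    "stars": 5,
    "text": "Arrived quickly.",
}


def pick_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition number (illustrative only)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


for doc in (order, review):
    partition = pick_partition(doc["customerId"], partition_count=4)
    print(partition, json.dumps(doc))
```

Because both documents share the same partition key value, the sketch routes them to the same partition, which is the idea behind grouping related data under one key.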
Relational database systems let you specify which columns should be indexed. Because of the lack of schema described above, Cosmos DB takes a different approach: to keep queries fast, all properties of a document are indexed by default. If needed, individual paths can be included or excluded via the container's indexing policy, but in most cases there are no more headaches about primary and secondary indexes.
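The "index everything" behaviour can be pictured as the indexing policy of a container. The sketch below writes such a policy as a Python dict, modelled on the default policy format documented for the NoSQL API. Treat the exact values as an assumption to verify against your own account. The `indexes_all_properties` helper is hypothetical and only recognises the root wildcard path.

```python
import json

# Assumed shape of a default indexing policy: the root wildcard "/*" includes
# every property, only the system property "_etag" is excluded.
default_indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
}


def indexes_all_properties(policy: dict) -> bool:
    """Return True if the policy includes the root wildcard path."""
    return any(entry["path"] == "/*" for entry in policy.get("includedPaths", []))


print(json.dumps(default_indexing_policy, indent=2))
print(indexes_all_properties(default_indexing_policy))
```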
Both approaches offer a major advantage over conventional database systems: flexibility. Without a schema, an object can be adapted or extended as required without having to worry about database migrations that keep code and database tables in sync. In addition, an OR mapper is no longer required, as the objects are stored directly as documents in the database.
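As a rough illustration of working without migrations or an OR mapper, the following sketch converts a Python dataclass to a JSON document and back. The `Customer` class and its fields are hypothetical. The `tags` field stands for a property added in a later release, so older documents that lack it still load thanks to the default value.

```python
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class Customer:
    id: str
    name: str
    # Added in a later release; older documents do not contain this property.
    tags: list[str] = field(default_factory=list)


def to_document(customer: Customer) -> str:
    """Serialize the object to the JSON document that would be stored."""
    return json.dumps(asdict(customer))


def from_document(raw: str) -> Customer:
    """Load a document, ignoring unknown properties and defaulting missing ones."""
    data = json.loads(raw)
    known = {f.name for f in fields(Customer)}
    return Customer(**{key: value for key, value in data.items() if key in known})


old_document = '{"id": "c-1", "name": "Ada"}'  # written before "tags" existed
print(from_document(old_document))
print(to_document(Customer(id="c-2", name="Bob", tags=["vip"])))
```

The price of this flexibility is that the application code, not the database, has to tolerate every shape a document has ever had.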
Which applications benefit from Azure Cosmos DB?
Microsoft’s cloud-based database service is very versatile and can therefore be used in various application scenarios. Web applications, apps, games, AI or IoT software: the list of uses is long. Applications that process a lot of data or that handle many reads and writes worldwide benefit in particular. Horizontal scaling across partitions allows the database to grow with your needs.
How is an Azure Cosmos DB database set up and configured?
Like many other Azure services, Cosmos DB can be set up and configured via the Azure Portal or the Azure CLI. When doing so, you should pay attention to a few options, some of which cannot be changed afterwards.
Selection of the API
Probably the most important decision when creating an Azure Cosmos DB instance is which API to use. The service offers several interfaces via which the data is transferred and stored.
- NoSQL: Microsoft’s own implementation of a document-based database system. The entities are stored in the form of JSON documents. If you want to store data in the form of JSON documents and do not require a rigid structure, the NoSQL API could be the right choice. It is particularly suitable for applications that require flexibility in data modeling.
- MongoDB: Open-source database system for non-relational entities that stores data in BSON format. If you already have experience with MongoDB or have an application that is optimized for this database, the MongoDB API might be the best choice for you.
- Apache Cassandra: Open-source system that stores the data in a so-called “wide-column store”. If you are mainly interested in storing large amounts of data and having quick access to it, the Apache Cassandra API might be suitable for you.
- Table: A key-value store developed by Microsoft. The Table API is particularly suitable for applications that are designed for key-value storage and require fast access to this data.
- Apache Gremlin: Graph API based on Apache TinkerPop, an open-source graph computing framework whose traversal language is Gremlin. If you want to store and analyze data in the form of graphs, the Apache Gremlin API is the right choice.
- PostgreSQL: Relational database that supports distributed tables, distributed queries, etc. with the help of the “Citus” extension. If you already have experience with PostgreSQL or have an application that is optimized for this database, the PostgreSQL API may be the best choice for you. It is particularly suitable for applications that require fixed data structures and the ability to use SQL queries.
There is no such thing as “the best API”. The decision depends on the intended use and the technology used to date.
Choice of capacity mode
When creating a Cosmos DB account, there are two capacity modes to choose from. Whether you opt for provisioned throughput or serverless mainly affects billing. Each mode has its advantages and disadvantages, and which one suits a given application must be decided on a case-by-case basis. The following criteria can help here:
Criterion | Serverless | Provisioned Throughput |
---|---|---|
Suitable for | Applications whose usage behavior is unpredictable. | Applications with constant, predictable traffic. |
How it works | No configuration required; database queries can simply be executed against the container. | The provisioned throughput in the form of request units must be defined in advance for each container. |
Storage space limit | 50 GB (1 TB in future) | Unlimited |
Payment | Billed per hour, only for the RUs actually consumed | Billed per hour for all RUs provisioned in advance, whether used or not |
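The billing difference in the table can be turned into a small back-of-the-envelope calculation. The sketch below is only that: both prices are placeholder values and not taken from the official price list, and the function names are hypothetical. Replace the constants with the current prices for your region before drawing conclusions.

```python
HOURS_PER_MONTH = 730

# Placeholder prices in USD. These are assumptions, not official figures.
PROVISIONED_PRICE_PER_100_RUS_PER_HOUR = 0.008
SERVERLESS_PRICE_PER_MILLION_RUS = 0.25


def provisioned_monthly_cost(provisioned_rus_per_second: int) -> float:
    """Cost of keeping a fixed throughput available for a whole month."""
    units_of_100 = provisioned_rus_per_second / 100
    return units_of_100 * PROVISIONED_PRICE_PER_100_RUS_PER_HOUR * HOURS_PER_MONTH


def serverless_monthly_cost(consumed_rus_per_month: int) -> float:
    """Cost of the request units actually consumed in a month."""
    return consumed_rus_per_month / 1_000_000 * SERVERLESS_PRICE_PER_MILLION_RUS


def cheaper_mode(provisioned_rus_per_second: int, consumed_rus_per_month: int) -> str:
    """Compare both modes for the same workload estimate."""
    provisioned = provisioned_monthly_cost(provisioned_rus_per_second)
    serverless = serverless_monthly_cost(consumed_rus_per_month)
    return "serverless" if serverless < provisioned else "provisioned"


# A lightly used application versus a busy one, both sized at 400 RU/s.
print(cheaper_mode(400, 10_000_000))
print(cheaper_mode(400, 500_000_000))
```

Storage costs and the serverless storage limit are left out on purpose, so the result only reflects the throughput part of the bill.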
Limit expenses and save costs
To keep costs under control, Microsoft offers two options when creating this Azure service. Both are only available in the “Provisioned throughput” capacity mode. Firstly, Microsoft offers a “Free Tier Discount”, which can be activated for one Azure Cosmos DB account per subscription. With it, the first 1000 RU/s and 25 GB of storage are free of charge.
There is also the “Limit total account throughput” option. It is active by default and supports cost control by preventing the account from provisioning more throughput than the defined limit.
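The effect of both options can be sketched with a few lines of arithmetic. The free-tier figures (1000 RU/s and 25 GB) come from the text above. The helper names and the way the limit is checked are assumptions for illustration, since the real enforcement happens inside the service.

```python
FREE_TIER_RUS_PER_SECOND = 1000
FREE_TIER_STORAGE_GB = 25


def billable_throughput(provisioned_rus_per_second: int) -> int:
    """RU/s that remain billable after the free tier discount."""
    return max(0, provisioned_rus_per_second - FREE_TIER_RUS_PER_SECOND)


def billable_storage_gb(stored_gb: float) -> float:
    """Storage in GB that remains billable after the free tier discount."""
    return max(0.0, stored_gb - FREE_TIER_STORAGE_GB)


def fits_account_limit(container_rus_per_second: list[int], account_limit: int) -> bool:
    """Check whether the sum of all container throughputs stays within the limit."""
    return sum(container_rus_per_second) <= account_limit


print(billable_throughput(1400))
print(billable_storage_gb(10))
print(fits_account_limit([400, 400, 400], account_limit=1000))
```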
Global distribution of data
Depending on the region, different options may be available:
- Geo-Redundancy (only for Provisioned Throughput): activates the global distribution of data by pairing two regions (e.g. West Europe with North Europe)
- Multi-region writes (only for Provisioned Throughput): enables worldwide writing to the database
- Availability Zones: increases the availability of the application by distributing the database across several data centers in the same region
Networking
In the networking settings, connectivity should be restricted as much as possible. Avoid allowing access from all networks. Instead, limit access using firewall rules or private endpoints.
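The idea behind an IP-based firewall rule can be sketched with the standard library. This is not how the Cosmos DB firewall is configured (that happens in the Azure Portal or CLI), only an illustration of the allow-list principle. The address ranges are reserved documentation ranges and the `is_allowed` helper is hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow-list: one office network and one single build server.
ALLOWED_RANGES = [
    ip_network("203.0.113.0/24"),
    ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if the client address falls into an allowed range."""
    address = ip_address(client_ip)
    return any(address in network for network in ALLOWED_RANGES)


for candidate in ("203.0.113.5", "198.51.100.17", "192.0.2.1"):
    print(candidate, is_allowed(candidate))
```

Everything that is not explicitly listed is rejected, which is the opposite of the "all networks" setting the text advises against.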
Backup policy
In Azure Cosmos DB, you can choose between two possible backup modes:
- Continuous: continuous backup of the data in each region of the Cosmos DB account; you can restore the data yourself via the Azure Portal or CLI
- Periodic: backups taken at regular intervals and retained for up to one month; restoration has to be requested from customer support
Encryption
By default, data is encrypted using a service-managed key. This means that the data stored in the database cannot be read without this key. This protection can be extended by configuring a customer-managed key (CMK). However, the use of a customer-managed key has an influence on the consumption of request units.
Conclusion
Azure Cosmos DB helps to develop scalable, globally distributed applications, as the technology was designed precisely for this scenario. However, the extensive configuration options can be overwhelming at first. So far, I have used Cosmos DB with the NoSQL API (formerly called the SQL API) and find the approach of storing entities as JSON documents very practical when accessing the data programmatically. In this scenario, you no longer need an OR mapper to control data access, and you no longer have to keep an eye on database migrations. However, as your objects evolve, you have to make sure that your code can still process documents stored in older shapes, as there is no validation of the schema on the database side.