What You Need to Know About Debezium


If you’re looking for a change data capture (CDC) platform that offers speed, durability, and a long history of production deployments across a variety of use cases, then Debezium may be for you. This open-source platform streams changes from a wide range of relational and NoSQL databases to Kafka or Kinesis. The many advantages of change data capture have been covered elsewhere on this site.

So, let’s cover what you need to know about Debezium. We’ll walk through a few examples in future posts.

What is Debezium?

Debezium is an open-source platform whose main purpose is to stream database changes for change data capture (CDC).

CDC is typically used to replicate data between different databases in near real time. Instead of repeatedly copying entire tables, CDC transfers data incrementally, sending only the rows that changed, so your systems avoid the heavy load of bulk updates.

Debezium monitors your databases and streams every row-level change committed in the source database to Kafka or Kinesis. Once the changes are available in Kafka or Kinesis, you can deploy stream processing applications that react to, and possibly transform, these incoming database changes.
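To make the stream-processing idea concrete, here is a minimal Python sketch of an application that keeps an in-memory replica of a table up to date from Debezium change events. It assumes the events arrive as JSON (Kafka's JSON converter) in Debezium's change-event envelope. Each envelope carries `before` and `after` row states plus an `op` code: `c` for create, `u` for update, `d` for delete and `r` for a snapshot read. The Kafka consumer itself is left out so the sketch stays self-contained. The primary-key column name `id` is a hypothetical choice, and the sample messages are hand-written rather than captured Debezium output.

```python
import json
from typing import Optional


def decode_value(raw_value: Optional[bytes]) -> Optional[dict]:
    """Decode a Debezium message value into its change-event envelope.

    Returns None for tombstones (null values Debezium emits after deletes
    so that Kafka log compaction can drop the key).
    """
    if raw_value is None:
        return None
    event = json.loads(raw_value)
    if event is None:
        return None
    # With the JSON converter's schemas enabled, the envelope sits under "payload";
    # with schemas disabled, the message value is the envelope itself.
    return event.get("payload", event)


def apply_change(replica: dict, envelope: dict, key_field: str = "id") -> None:
    """Apply one change event to an in-memory replica keyed by primary key.

    `key_field` is a hypothetical primary-key column name used for illustration.
    """
    op = envelope["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update: keep the new state
        row = envelope["after"]
        replica[row[key_field]] = row
    elif op == "d":  # delete: the old row state is in "before"
        row = envelope["before"]
        replica.pop(row[key_field], None)
    # Other operation types (e.g. truncate) are ignored in this sketch.


# Hand-written sample messages shaped like Debezium envelopes (not real output).
messages = [
    b'{"before": null, "after": {"id": 1, "email": "a@example.com"}, "op": "c"}',
    b'{"before": {"id": 1, "email": "a@example.com"}, '
    b'"after": {"id": 1, "email": "b@example.com"}, "op": "u"}',
    b'{"before": null, "after": {"id": 2, "email": "c@example.com"}, "op": "c"}',
    b'{"before": {"id": 2, "email": "c@example.com"}, "after": null, "op": "d"}',
    None,  # tombstone following the delete
]

replica = {}
for raw in messages:
    envelope = decode_value(raw)
    if envelope is not None:
        apply_change(replica, envelope)

print(replica)  # {1: {'id': 1, 'email': 'b@example.com'}}
```

In a real deployment the replica would usually be another database, and you might take the key from the Kafka message key instead of assuming a column name. The dictionary just keeps the idea visible.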

One great thing about Debezium is that your applications don’t have to worry about rolled-back changes, because only committed changes are streamed. And if Debezium is stopped, it can resume right where it left off without a hitch.

What’s more, Debezium is built on top of Kafka and Kafka Connect. You can trust these platforms to deliver durability and to handle large volumes of data efficiently.

Why Use Debezium?

Here are some reasons to use Debezium, in no particular order:


1.   To Invalidate Cache

Adding Debezium to your database lets you invalidate cache entries automatically as soon as the underlying records change or are removed.

For example, if the cache runs in a separate process, such as Redis or Memcached, the simple cache invalidation logic can live in its own separate process or service. Doing so streamlines the main application.

The invalidation logic can also be made more sophisticated to suit your needs. For instance, instead of simply evicting an entry, it can use the updated data carried in the change events to refresh the affected cache entries.
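Both approaches can be sketched in a few lines of Python. The class below reacts to decoded Debezium change events in one of two ways. By default it evicts the cache entry for the changed row. With `refresh=True` it writes the row's new state from the event's `after` field instead. Several names here are hypothetical: a plain dict stands in for an external cache such as Redis or Memcached, and the `table:id` key format and `id` column are illustrative choices. The sample event is hand-written.

```python
from typing import Optional


class ChangeEventCacheUpdater:
    """Keeps a cache consistent with database changes delivered by Debezium.

    `cache` is a plain dict standing in for an external cache such as Redis or
    Memcached; the "table:id" key format is a hypothetical convention.
    """

    def __init__(self, cache: dict, table: str, refresh: bool = False):
        self.cache = cache
        self.table = table
        self.refresh = refresh  # False: evict only; True: write the new row state

    def _key(self, row: dict) -> str:
        return f"{self.table}:{row['id']}"

    def handle(self, envelope: Optional[dict]) -> None:
        if envelope is None:  # tombstone: nothing to do
            return
        op = envelope["op"]
        if op == "d":
            # Deleted rows are always evicted.
            self.cache.pop(self._key(envelope["before"]), None)
        elif op in ("c", "u", "r"):
            row = envelope["after"]
            if self.refresh:
                # Use the updated data carried in the event to refresh the entry.
                self.cache[self._key(row)] = row
            else:
                # Simple invalidation: the next read falls through to the database.
                self.cache.pop(self._key(row), None)


# Usage with a hand-written update event (not real Debezium output).
event = {"op": "u",
         "before": {"id": 1, "email": "old@example.com"},
         "after": {"id": 1, "email": "new@example.com"}}

evict_cache = {"customers:1": {"id": 1, "email": "old@example.com"}}
ChangeEventCacheUpdater(evict_cache, "customers").handle(event)
print(evict_cache)  # {}

refresh_cache = {"customers:1": {"id": 1, "email": "old@example.com"}}
ChangeEventCacheUpdater(refresh_cache, "customers", refresh=True).handle(event)
print(refresh_cache)  # {'customers:1': {'id': 1, 'email': 'new@example.com'}}
```

Eviction is the safer default because a missed or reordered event can at worst cause an extra database read. Refreshing saves that read but trusts the event stream to be complete.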

2.   To Simplify Monolithic Applications

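Many applications keep doing work after saving new changes to the database, such as updating a search index, refreshing a cache, or sending notifications. This is often called dual-writes, because the application writes to several systems outside of a single transaction. If the application crashes after the database commit but before the other writes finish, those systems can end up inconsistent.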

CDC simplifies monolithic applications by moving this follow-up work into separate threads or services that run once the changes are committed to the database. Not only does this make the application logic simpler, but it is also more tolerant of failures and does not miss events.

3.   To Share Databases

When several applications share a database, each one often needs to be aware of the changes committed by the others. A common solution is a message bus, but a bus that is not transactional with the database suffers from the same dual-write problem and can leave data inconsistent.

With Debezium, sharing databases becomes easier. Instead of each application tracking committed changes on its own, Debezium captures them once and streams them, so every application can consume the changes it cares about independently.

To connect to and stream from databases, Debezium uses source connectors. Available connectors include Db2, MySQL, PostgreSQL, MongoDB, and Oracle. Connectors for Cassandra and Vitess also exist, but they are still subject to change.
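Connectors are configured rather than coded, so registering one usually means posting its configuration to the Kafka Connect REST API (`POST /connectors`). The Python sketch below only builds that request for the MySQL connector using the standard library. It does not send it. The property names follow the Debezium 2.x MySQL connector; 1.x releases used `database.server.name` and `database.history.*` instead. Every value passed in is a hypothetical placeholder, including the Connect URL, connector name, host, credentials, and Kafka address.

```python
import json
import urllib.request


def mysql_connector_request(connect_url: str, name: str, db: dict) -> urllib.request.Request:
    """Build (but do not send) a Kafka Connect request registering a Debezium MySQL connector.

    All values in `db` are hypothetical placeholders supplied by the caller.
    """
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": db["host"],
        "database.port": str(db["port"]),
        "database.user": db["user"],
        "database.password": db["password"],
        # Numeric ID that must be unique among the MySQL server's replication clients.
        "database.server.id": str(db["server_id"]),
        # Prefix for the names of the topics that receive change events.
        "topic.prefix": db["topic_prefix"],
        "database.include.list": db["database"],
        # Kafka topic where the connector records the database schema history.
        "schema.history.internal.kafka.bootstrap.servers": db["kafka"],
        "schema.history.internal.kafka.topic": f"schema-history.{db['database']}",
    }
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = mysql_connector_request(
    "http://localhost:8083",  # hypothetical Kafka Connect REST endpoint
    "inventory-connector",    # hypothetical connector name
    {"host": "mysql", "port": 3306, "user": "debezium", "password": "dbz",
     "server_id": 184054, "topic_prefix": "dbserver1", "database": "inventory",
     "kafka": "kafka:9092"},
)
print(req.get_method(), req.full_url)  # POST http://localhost:8083/connectors
# urllib.request.urlopen(req) would submit it; that requires a running Connect worker.
```

Once such a connector is running, change events for each captured MySQL table are written to a topic named `<topic.prefix>.<database>.<table>`, for example `dbserver1.inventory.customers` with the placeholder values above.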


Conclusion

Debezium is an excellent platform for CDC. Not only does it simplify your applications, but it also handles your data well by streaming row-level changes from your databases.

So, whenever you’re handling large amounts of data in existing applications that need to be integrated into a larger architecture, consider adding Debezium. It loosely couples those integrations by providing a near real-time streaming integration point.

About Todd M

Todd has held multiple software roles over his 20-year career. For the last 5 years, he has focused on helping organizations move from batch to data streaming. In addition to the free tutorials, he provides consulting and coaching for Data Engineers, Data Scientists, and Data Architects. Feel free to reach out directly or to connect on LinkedIn.
