Kafka Connect: Geotab contributes to the open source community

For two years, Geotab’s Data Platform team has reaped the benefits of Kafka’s real-time data feed capabilities. We’re giving back by contributing our development expertise to the future of the open-source community.

By Wenyang Liu

July 18, 2022

3 minute read

As one of the world’s leading companies in the Internet of Things (IoT) and connected vehicles, Geotab processes billions of telematics data points every day from millions of vehicles worldwide. All of that data feeds into a data warehouse in real time for analysis, helping organizations turn insight into business recommendations.

 

The Geotab Big Data Ingestion Platform (BIP) is a reliable, distributed data ingestion infrastructure developed by Geotab’s Data Platform team. It supports this large volume of data as well as Geotab’s data science products. It can stream data at a rate of 500 Mb per second (Mbps) and has the capacity to scale up.

 

Kafka is an open-source event streaming and message queueing platform that uses a publish-subscribe model to handle real-time data feeds. The BIP has used Kafka as its central message queueing component since 2020. Over the years, the team has developed multiple source and sink connectors to meet the diverse data ingestion needs at Geotab. To contribute back to the community, we open-sourced a new Kafka source connector, the GCS Source Connector, which is available on GitHub.
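
For readers new to the publish-subscribe model, here is a minimal in-memory sketch of the idea. It is a toy illustration only: the `InMemoryTopicLog` class, the topic name and the group names are all hypothetical, and it uses neither Kafka nor any part of the BIP. It shows the property that matters for ingestion: each subscriber group tracks its own read position, so several consumers can read the same stream independently.

```python
from collections import defaultdict


class InMemoryTopicLog:
    """Toy stand-in for a broker: one append-only log per topic (hypothetical)."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> list of published messages
        self._offsets = defaultdict(int)  # (topic, group) -> next index to read

    def publish(self, topic, message):
        # Publishers only append; they know nothing about subscribers.
        self._logs[topic].append(message)

    def poll(self, topic, group):
        # Return the messages this group has not seen yet, then advance
        # that group's offset. Other groups are unaffected.
        start = self._offsets[(topic, group)]
        batch = self._logs[topic][start:]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


broker = InMemoryTopicLog()
broker.publish("telematics", {"vehicle": "v1", "speed_kmh": 62})
broker.publish("telematics", {"vehicle": "v2", "speed_kmh": 48})

# Two independent subscriber groups read the same two messages.
warehouse_batch = broker.poll("telematics", group="warehouse-loader")
alerts_batch = broker.poll("telematics", group="alerting")
```

A real Kafka deployment adds partitioning, replication and durable storage on top of this basic contract.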

The Geotab Big Data Ingestion Platform

The Big Data Ingestion Platform (BIP) was originally known as the Geotab High Availability Data Platform, for good reason. It is an exceedingly efficient high-availability data ingestion platform that was built on Kubernetes. The Geotab data ecosystem has grown into a vital information resource for our company, our clients and our partners. The BIP was built to address important concerns like data security, collection, availability and reliability. 

 

All the data the BIP collects is transferred through dedicated data pipelines into our Google BigQuery data warehouse. As a common data ingestion entry point, the BIP enables the Data and Analytics team to control what data comes into BigQuery, enforce data quality, and guarantee delivery. It also facilitates a seamless user onboarding experience.

Geotab harnessed the strengths of Kafka to develop the BIP

Kafka is a distributed event streaming platform. One of its benefits to data-driven companies like Geotab is that it organizes related records into topics, so many producers and consumers can share the same streams independently. It is a user-friendly, open-source platform that helps developers integrate with other platforms and monitor data streaming performance. Geotab chose Kafka because it provides a highly scalable solution that is used extensively in the Internet of Things (IoT), fleet management, and broader automotive industries.

 

Because it is built on Kafka, the BIP uses Kafka Connect to readily configure different data sources and sinks for our pipelines. The GCS source connector was developed to give us an efficient and fault-tolerant way to ingest data from Google Cloud Storage buckets. Our GCS source connector implementation also provides a base that can be easily extended to support many file formats or custom business logic.
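
To make the two ideas in this paragraph more concrete (resuming after a failure, and plugging in new file formats), here is a small Python sketch. It is not the connector's code and does not use the Kafka Connect API. The bucket is simulated with a plain dictionary, and every name in it (`PARSERS`, `poll_bucket`, the object names) is a hypothetical choice made for illustration.

```python
import csv
import io
import json
import os


def parse_jsonl(text):
    # One JSON document per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_csv(text):
    # The header row supplies the field names; values stay as strings.
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical registry: supporting a new format means adding one entry here.
PARSERS = {".jsonl": parse_jsonl, ".csv": parse_csv}


def poll_bucket(bucket, committed_offsets):
    """Yield (object_name, record_index, record) for records not yet delivered.

    `bucket` maps object name -> file contents and stands in for a storage
    bucket listing. `committed_offsets` maps object name -> number of records
    already delivered, so a restarted task resumes where it left off.
    """
    for name in sorted(bucket):
        parser = PARSERS.get(os.path.splitext(name)[1])
        if parser is None:
            continue  # unsupported format: skip it rather than fail the task
        records = parser(bucket[name])
        for index in range(committed_offsets.get(name, 0), len(records)):
            yield name, index, records[index]


bucket = {
    "trips/a.jsonl": '{"vehicle": "v1"}\n{"vehicle": "v2"}\n',
    "trips/b.csv": "vehicle,speed\nv3,50\n",
    "trips/readme.txt": "ignored",
}
# Pretend the first record of a.jsonl was delivered before a restart.
offsets = {"trips/a.jsonl": 1}

delivered = []
for name, index, record in poll_bucket(bucket, offsets):
    delivered.append(record)
    offsets[name] = index + 1  # commit only after the record is handed off
```

Committing the offset only after a record is handed off means a crash can cause a record to be delivered again but never skipped, which is the usual at-least-once trade-off for ingestion pipelines.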

 

Geotab’s data ecosystem has grown into a vital resource for Geotab, our clients, and our partners. Like the millions of fleet vehicles that Geotab and our partners support every day, the terabytes of data collected and processed about them are vast and valuable. The Geotab Data Platform team works with cutting-edge data platform solutions that meet the scalability, flexibility, reliability and security demands of the fleet management and connected vehicle industry segments.

 

Our team strives every day to keep pace with the top data platform solutions from North America and around the world that enable Internet of Things (IoT) and telematics infrastructure. Geotab stores data in public cloud services currently operated by Google (Compute Engine), located in the USA, Europe, and Asia. Geotab, our partners, and our customers all benefit from the open-source communities related to Kafka and other data management tools and platforms. Thanks to these communities, our Data Platform team can innovate faster and ensure that our platform and solutions are scalable enough to support our growing customer base. We are happy to contribute back to these open-source communities and encourage our partners and customers to do the same wherever possible.

A long-term commitment to Kafka and open-source technology

Geotab’s Data Platform team had great success building out the BIP on the Kafka Community platform, and we look forward to continuing to contribute to the open-source ecosystem that supports it. We continue to grow our team of big data and analytics professionals and expand our relationships with integrator partners that have telematics data science practices. 

 

Is your organization looking for a reliable, high-performance, customizable Google Cloud Storage Kafka source connector? Need to transport high-volume data streams into a data lake or a data warehouse? Evaluate our Kafka Connect GCS Source Connector on GitHub. We look forward to seeing your feedback in the Kafka Community.

 

For information about how your organization can leverage analytics to improve your fleet performance, read our blog post about the Geotab Analytics Lab. Or visit our Data Insight Solutions page to see how your business or your customers could leverage Geotab’s rich data insights from millions of vehicles around the globe.

  

With contributions from Jiawei Wu and Amy Zhang.


Wenyang Liu is a Senior Data Platform Development Team Lead at Geotab.
