This story introduces aggregation in Kafka and explains an easy way to implement aggregation on a real-time stream.


In order to aggregate a stream, we need two operations:

  1. Group the stream — groupByKey() when the key already exists in the stream, or groupBy() to select a new key. The data must be partitioned by key.

groupBy() and groupByKey() use an internal repartition topic (the through() mechanism) so that records with the same key are grouped by the same task on the same partition.

2. Aggregate — count(), reduce(), aggregate()

count() — a simple method used to count the records per key
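As a minimal sketch of these two steps, a word count could look like the following; the topic names and the whitespace tokenizer are assumptions:

import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> lines = builder.stream("words-in");
KTable<String, Long> counts = lines
    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\s+"))) // one record per word
    .groupBy((key, word) -> word)  // step 1: re-key so identical words share a partition
    .count();                      // step 2: aggregate
counts.toStream().to("counts-out", Produced.with(Serdes.String(), Serdes.Long()));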

Streaming APIs-

  • Key-preserving APIs — mapValues(), flatMapValues(), groupByKey() — use these when you change only the values of the…

Store, KTable, GlobalKTable

If an application requires aggregation, Kafka uses a state store in order to aggregate the stream for further processing. Use cases like word count, the live trend of an event, and live voting can be considered candidates for aggregation.

Stream processing basically requires the points below to be considered as part of solution development.

  • Time sensitivity
  • Decoupling
  • Data format evaluation
  • Reliability/Fault Tolerant
  • Scalability
  • Distributed

Why Aggregation?

A stream is a flow of events with the features below, which is why it requires aggregation.

  • Unbounded
  • Continuous flow
  • Endless Flow

Use-case discussion:

PIN validation:

  • Customer Check-in is an event stream
  • Customer check-in system…

About Kafka Streaming

Kafka Streams DSL

Event Stream — a continuous flow of events; an unbounded dataset of immutable data records.

Streaming Operations — stateless, stateful, and window-based. Used to transform, aggregate, filter, and enrich the stream.
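For illustration, here is a sketch combining a stateless filter with a window-based count; the topic name and the five-minute window size are assumptions:

import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("events-in");
KTable<Windowed<String>, Long> perWindow = events
    .filter((k, v) -> v != null)                        // stateless operation
    .groupByKey()                                       // prepare the stateful aggregation
    .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))  // window-based operation
    .count();                                           // stateful count per key and window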


Kafka sends events over the network, which requires a message to be serialized before it is sent. The publisher API provides serializers like IntegerSerializer, StringSerializer etc., and deserializers in the same sense.

The serializer is used by the publisher, while the deserializer is used by the consumer.

The default serializers may not fulfil the need, especially when a custom object, such as an employee or customer object, has to pass over the network. In that case we can build a custom serializer and configure it as the serializer.

Example:

Let's consider a JSON-based message that needs to be sent to a Kafka topic; then follow the steps below.

  • Create Json
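As a sketch of such a custom serializer, assuming a hypothetical Employee POJO and Jackson for the JSON mapping:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serializer;

public class EmployeeSerializer implements Serializer<Employee> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, Employee data) {
        try {
            // Employee is a hypothetical POJO; any Jackson-mappable type works
            return data == null ? null : mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new RuntimeException("Could not serialize Employee to JSON", e);
        }
    }
}

It would then be registered as the value serializer, for example via props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, EmployeeSerializer.class.getName()).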

A sample Streaming Application

Concept: While developing a Kafka Streams client, take a step-wise approach. The steps can be:

  • Understand the event stream type and NFR
  • Identify and model Event
  • Transport event
  • Process the event stream

Example:

  • Customer Check-in is an event stream
  • Customer check-in system generate the event
  • Model the event
  • Transport the event through HTTP, JMS, DB etc
  • Process and publish the event

A sample stream client application

Follow the steps below in order to build the stream client application.

  1. Create the stream configuration
public static Properties getStreamConfigProps() {
    Properties props = new Properties();
    // application.id is mandatory for a Streams app; the value here is an assumption
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sample-stream-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, IAppConfigs.BOOTSTAP_SERVER);
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    return props;
}
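Once the configuration is available, a minimal sketch of wiring it into a running Streams application could look like this; the topic names are assumptions:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// Pass records straight from an input topic to an output topic
builder.stream("input-topic").to("output-topic");
KafkaStreams streams = new KafkaStreams(builder.build(), getStreamConfigProps());
streams.start();
// Close the topology cleanly on JVM shutdown
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));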

This story talks about events and the processing of events using Kafka.

Event- Shows a change in the state of an object. The unbounded flow of events is called an event stream. An event shows that something happened at the source of the event.

Streaming- An unbounded flow of events from a source

Producer- The source that produces the event or event stream

Consumer- Continuously captures and analyzes the event stream.

There are three types of consumers in Kafka:

  • Consumer API- For very simple use cases like reading data, parsing and routing
  • Stream API- For complex logic like reading pulse rates, fraud detection or…
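A minimal Consumer API sketch of that read-parse-route loop might look like this; the broker address, group id and topic name are assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "simple-readers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("events-in"));
while (true) {
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
        // read, parse and route each record
        System.out.println(record.key() + " -> " + record.value());
    }
}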

This story talks about advanced features of Kafka, such as:

  • Interceptor
  • Transaction
  • Partitioning
  • Idempotent Producer
  • Compression
  • Batching

Interceptor

Helps to change the behaviour of a Kafka client application without changing its code. It can be used for message enrichment, or to do anything before a message is sent or after a message is received from the broker.

In order to implement interceptor logic in an application, we need to implement the ProducerInterceptor interface on a class and use that class wherever it is required.

This interface has two methods to be implemented on the class, wherein we can write the logic to change…
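As a sketch, an implementation could look like the following; onSend() and onAcknowledgement() carry the two pieces of logic referred to above, and the class name and enrichment prefix are assumptions:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AuditInterceptor implements ProducerInterceptor<String, String> {
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Runs before the record is serialized and sent; enrich the value here
        return new ProducerRecord<>(record.topic(), record.key(), "audited:" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Runs when the broker acknowledges the record, or when sending fails
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

The class is then registered on the producer through the interceptor.classes property (ProducerConfig.INTERCEPTOR_CLASSES_CONFIG).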


This talks about the Apache Kafka architecture, its components, and the internal structure of those components.

Why Kafka?

  • Multiple producers
  • Multiple consumers
  • Disk based persistence
  • Offline messaging
  • Messaging replay
  • Distributed
  • Super Scalable
  • Low-latency
  • High volume
  • Fault tolerance
  • Real-time Processing

Apache Kafka Architecture

Single Cluster: Apache Kafka


Components:

  • Broker
  • Producer
  • Consumer
  • Message
  • Zookeeper
  • Topic

Broker:

  • Core component of Kafka messaging system
  • Hosts the topic log and maintains the leaders and followers for partitions, in coordination with Zookeeper
  • A Kafka cluster consists of one or more brokers
  • Maintains the replication of partitions across the cluster

Producer:

  • Publishes messages to one or more topics
  • Messages…

Prerequisites

Producer Flow

An event is published as a ProducerRecord to a topic, which resides in and is managed by a broker. The ProducerRecord is formed and pushed by the producer, following the steps depicted below.

  • Create the Configuration Properties
  • Create the Producer as below
  • Send the message
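A minimal sketch of these three steps follows; the broker address, topic, key and value are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Step 1: create the configuration properties
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Step 2: create the producer
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Step 3: send the message
producer.send(new ProducerRecord<>("checkin-events", "customer-42", "checked-in"));
producer.close();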

How does Kafka process the record and send the event to broker?

The publisher API creates the ProducerRecord and calls the send() method.

The steps are:

  • Create the ProducerRecord
  • Serialization
  • Partitioning
  • Success and failed scenarios (see the callback sketch below)
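Continuing the earlier producer sketch, the success and failed scenarios can be observed by passing a Callback to send():

producer.send(new ProducerRecord<>("checkin-events", "customer-42", "checked-in"),
        (metadata, exception) -> {
            if (exception != null) {
                // failed scenario: serialization, partition or broker error
                exception.printStackTrace();
            } else {
                // success scenario: the broker assigned a partition and offset
                System.out.printf("partition=%d offset=%d%n", metadata.partition(), metadata.offset());
            }
        });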

A complete example is written at the end of the story.

Kafka Producer Configuration

bootstrap.servers
key.serializer
value.serializer
client.id
acks

  1. acks=0…
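For illustration only, the last two settings map onto producer properties like this; the client id value is an assumption:

props.put(ProducerConfig.CLIENT_ID_CONFIG, "demo-producer");  // client.id
props.put(ProducerConfig.ACKS_CONFIG, "0");                   // acks=0: do not wait for broker acknowledgement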

Messaging: Application Integration

Messaging is the backbone of the enterprise, wherein applications talk to each other over an integration layer. Two basic factors are considered the main drivers for the implementation of the integration layer.

1. Protocol

2. Design Principle

Protocols are like HTTP/HTTPS/JMS.

I will take a little more time to talk about the design principles. It can be one of the following; the implementations mentioned below are considered on a case-by-case basis. No hard line can be drawn in the selection of an application integration design.

Only one type of Metric publisher

Figure: Single-Direct Metrics publisher.

Narayan Kumar

Middleware Expert in Design and Development, Working on Kafka, Eager to investigate and learn — Software Engineer.
