We are running a streaming application on Flink 1.5.2 with Beam 2.7.0.
We've noticed that the checkpoint size has been growing slowly but steadily over several months (see screenshot), and we're not sure why.
We take a checkpoint every 5 minutes and have an allowed lateness period of 30 minutes.
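For reference, here is roughly what we expect. Beam keeps a window's state until the watermark passes the end of the window plus the allowed lateness, so with our settings each window's state should only show up in a bounded number of checkpoints. The sketch below models that. The 10-minute fixed window size is hypothetical (our actual windowing isn't shown here), the watermark is assumed to trail wall-clock time by a constant lag, and the 1 ms that Beam subtracts from a window's end is ignored.

```python
from datetime import timedelta

CHECKPOINT_INTERVAL = timedelta(minutes=5)  # from our setup
ALLOWED_LATENESS = timedelta(minutes=30)    # from our setup
WINDOW_SIZE = timedelta(minutes=10)         # hypothetical fixed window size


def state_expiry(window_end: timedelta, allowed_lateness: timedelta) -> timedelta:
    """Watermark time after which a window's state can be garbage-collected.

    Simplification: Beam uses the window's max timestamp (1 ms before its
    end) plus the allowed lateness; the 1 ms is ignored here.
    """
    return window_end + allowed_lateness


def checkpoints_holding_window(window_start: timedelta,
                               window_size: timedelta,
                               allowed_lateness: timedelta,
                               interval: timedelta,
                               watermark_lag: timedelta = timedelta(0)) -> int:
    """Count checkpoints (taken every `interval` from time 0) that still
    contain state for the window starting at `window_start`.

    Assumption: the watermark trails wall-clock time by a constant
    `watermark_lag`, so the state is cleared at wall-clock time
    expiry + watermark_lag.
    """
    cleared_at = state_expiry(window_start + window_size, allowed_lateness) + watermark_lag
    count = 0
    t = timedelta(0)
    while t < cleared_at:
        # Assumes an element arrives right at the window start, so the
        # window's state exists from window_start onward.
        if t >= window_start:
            count += 1
        t += interval
    return count


if __name__ == "__main__":
    for lag_minutes in (0, 15):
        n = checkpoints_holding_window(timedelta(0), WINDOW_SIZE, ALLOWED_LATENESS,
                                       CHECKPOINT_INTERVAL, timedelta(minutes=lag_minutes))
        print(f"watermark lag {lag_minutes} min -> state in {n} checkpoints")
```

Under these assumptions a window's state appears in 8 checkpoints with no watermark lag and 11 with a constant 15-minute lag. Either way the total is bounded, which is why a steady increase over months looks to us like state that is never released (for example, if the watermark stopped advancing).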
Does anyone know why this might be happening? Are there any tools we can use to inspect which state is accumulating in the checkpoints? I assume something is supposed to discard old state, but that doesn't appear to be happening here.