How to guarantee exactly-once delivery for RDBMS sinks? How to implement a WAL for transactional sinks?
My job writes its output to MySQL every 15 minutes. Because the volume is huge, each write puts the job under heavy backpressure, so the exactly-once checkpoint cannot complete. If the TM is killed for some reason during that time, the sink operator recovers from the latest completed snapshot, which does not cover any of the data already written to MySQL. As a result, most of the data is written twice.
I know there is an approach called 'transactional sinks' for this problem, but I don't know how to implement it. Should I maintain a MapState in my sink operator that holds the 'key' data? Is that right?
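To make the question concrete, here is a rough sketch of how I understand the write-ahead-log idea. It is plain Java, not real Flink code: WalSinkSketch and FakeDatabase are hypothetical names, the database is an in-memory stand-in for MySQL, and the method names only mirror Flink's snapshotState / notifyCheckpointComplete hooks. The assumption is that records are buffered in operator state, written to the database only after their checkpoint completes, and that the checkpoint id is stored in the same transaction so a repeated commit after recovery can be skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class WalSinkSketch {

    /** Hypothetical in-memory stand-in for MySQL. */
    public static class FakeDatabase {
        public final List<String> rows = new ArrayList<>();
        public final Set<Long> committedCheckpoints = new HashSet<>();

        /** Simulates one transaction that writes the batch and its checkpoint id together. */
        public boolean commit(long checkpointId, List<String> batch) {
            if (!committedCheckpoints.add(checkpointId)) {
                return false; // this checkpoint was already committed, so skip it
            }
            rows.addAll(batch);
            return true;
        }
    }

    private final FakeDatabase db;
    private List<String> current = new ArrayList<>();               // records since the last checkpoint
    private TreeMap<Long, List<String>> pending = new TreeMap<>();  // batches waiting for confirmation

    public WalSinkSketch(FakeDatabase db) {
        this.db = db;
    }

    /** Buffer the record instead of writing it to the database right away. */
    public void invoke(String record) {
        current.add(record);
    }

    /** Attach the buffered records to this checkpoint and return a copy as the snapshot. */
    public TreeMap<Long, List<String>> snapshotState(long checkpointId) {
        pending.put(checkpointId, current);
        current = new ArrayList<>();
        TreeMap<Long, List<String>> copy = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : pending.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }

    /** Once the checkpoint is confirmed, write every batch up to and including it. */
    public void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> done = pending.headMap(checkpointId, true);
        for (Map.Entry<Long, List<String>> e : done.entrySet()) {
            db.commit(e.getKey(), e.getValue());
        }
        done.clear(); // headMap is a view, so this also removes the batches from pending
    }

    /** After a failure, reload the pending batches from the last completed snapshot. */
    public void restoreState(TreeMap<Long, List<String>> snapshot) {
        pending = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : snapshot.entrySet()) {
            pending.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        current = new ArrayList<>();
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        WalSinkSketch sink = new WalSinkSketch(db);
        sink.invoke("a");
        sink.invoke("b");
        TreeMap<Long, List<String>> snap1 = sink.snapshotState(1L);
        sink.notifyCheckpointComplete(1L); // a and b reach the database
        sink.invoke("c");                  // the task fails before checkpoint 2

        WalSinkSketch restored = new WalSinkSketch(db);
        restored.restoreState(snap1);      // batch 1 is pending again
        restored.invoke("c");              // the source replays c
        restored.snapshotState(2L);
        restored.notifyCheckpointComplete(2L); // batch 1 is skipped, batch 2 is written
        System.out.println(db.rows);       // prints [a, b, c]
    }
}
```

In this sketch nothing reaches the database before its checkpoint completes, so a failure in between only loses buffered records, which the source replays. The price is that the output is delayed by one checkpoint interval and the buffered records make the checkpointed state larger.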
Does anyone know how to do this in detail?