On Tuesday, 6 June 2017 16:22:31 CEST rhashmi wrote:
> Because of parallelism I am seeing DB contention. I am wondering if I can
> merge the sinks of multiple windows and insert in batches.
Ah, I think I understand your problem now. You could implement batching manually inside your SinkFunction: it would buffer values in memory and periodically (based on the number of buffered values and on a timeout) send them to MySQL as a single batch. To ensure that no data is lost, you can implement the CheckpointedFunction interface and make sure to always flush to MySQL when a snapshot is taken.
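To make the batching idea concrete, here is a minimal sketch in plain Java, without any Flink dependency. The class BatchBuffer and its methods are hypothetical names chosen for illustration, not Flink API. The assumption is that a SinkFunction would call add() from invoke() and flush() from snapshotState(), and that the writer callback performs the actual batch insert into MySQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Flink): buffers records and hands them to
// a writer as one batch once a size or age threshold is reached.
class BatchBuffer<T> {
    private final int maxSize;              // flush when this many records are buffered
    private final long maxAgeMillis;        // flush when the oldest record is this old
    private final Consumer<List<T>> writer; // assumed to do the batch insert
    private List<T> pending = new ArrayList<>();
    private long firstArrivalMillis;

    BatchBuffer(int maxSize, long maxAgeMillis, Consumer<List<T>> writer) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.writer = writer;
    }

    // Intended to be called from invoke(). The clock value is passed in so
    // that the sketch stays deterministic and easy to test.
    void add(T value, long nowMillis) {
        if (pending.isEmpty()) {
            firstArrivalMillis = nowMillis;
        }
        pending.add(value);
        boolean full = pending.size() >= maxSize;
        boolean old = nowMillis - firstArrivalMillis >= maxAgeMillis;
        if (full || old) {
            flush();
        }
    }

    // Intended to be called from snapshotState() as well, so that nothing is
    // left only in memory when a checkpoint is taken.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        writer.accept(batch);
    }
}
```

Note that this sketch only checks the timeout when a new record arrives. A real sink would probably also need a timer so that a batch is flushed even when no further records come in.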
1- Flink calls the invoke method of the SinkFunction to dispatch aggregated information. My follow-up question is: while the snapshotState method is in progress, if the sink received another update we might get mixed records. However, per the documentation, all updates stop during a checkpoint, so I assume this works the same way.
"As soon as the operator receives snapshot barrier n from an incoming stream, it cannot process any further records from that stream until it has received the barrier n from the other inputs as well. Otherwise, it would mix records that belong to snapshot n and with records that belong to snapshot n+1."
"Streams that report barrier n are temporarily set aside. Records that are received from these streams are not processed, but put into an input buffer".
2- The snapshotState method is called when a "checkpoint is requested". Is there an interface that notifies when a checkpoint completes? I mean, I would add my flush logic right after the snapshot completes and before Flink resumes the stream. With this approach we can make sure that we update state only if the checkpoint was successful.
Yes, CheckpointListener will let you listen for completed checkpoints. I think you should put the values into state before returning from the snapshot method, though, to prevent data loss.
And regarding your other question: yes, when a snapshot is ongoing the invoke() method will not be called.
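The CheckpointListener variant could be sketched as follows, again in plain Java without Flink on the classpath. CheckpointAlignedBuffer, onSnapshot and onCheckpointComplete are hypothetical names. The assumption is that onSnapshot() would be called from snapshotState(), with its return value stored in operator state, and that onCheckpointComplete() would be called from CheckpointListener's notifyCheckpointComplete(long checkpointId).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical helper (not part of Flink): holds records until the checkpoint
// that covers them has been reported as complete, then writes them.
class CheckpointAlignedBuffer<T> {
    private final Consumer<List<T>> writer; // assumed to do the batch insert
    private List<T> pending = new ArrayList<>();
    // Batches that were part of a snapshot but are not written yet,
    // keyed by checkpoint id.
    private final TreeMap<Long, List<T>> staged = new TreeMap<>();

    CheckpointAlignedBuffer(Consumer<List<T>> writer) {
        this.writer = writer;
    }

    // Intended to be called from invoke().
    void add(T value) {
        pending.add(value);
    }

    // Intended to be called from snapshotState(). Returns every record that
    // has not been written yet, so the caller can put it into state.
    List<T> onSnapshot(long checkpointId) {
        staged.put(checkpointId, pending);
        pending = new ArrayList<>();
        List<T> unwritten = new ArrayList<>();
        for (List<T> batch : staged.values()) {
            unwritten.addAll(batch);
        }
        return unwritten;
    }

    // Intended to be called from notifyCheckpointComplete(). Writes all
    // batches that belong to this checkpoint or an earlier one.
    void onCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<T>> done = staged.headMap(checkpointId, true);
        for (List<T> batch : done.values()) {
            if (!batch.isEmpty()) {
                writer.accept(batch);
            }
        }
        done.clear(); // headMap is a view, so this also removes from staged
    }
}
```

Because the unwritten records are part of the snapshot, they are not lost if the job fails before the write happens. After a restore they would be written again, so under these assumptions the insert into MySQL should be idempotent, or duplicates have to be acceptable.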