Use Single Sink For All windows

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Use Single Sink For All windows

rhashmi
This post was updated on .
Is it possible to return all windows update to single Sink (aggregated collection).

The reason i am asking because we are using mysql for sink. I am wondering if i can update all of the them in single batch so as to avoid possible contention.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

Aljoscha Krettek
Hi,

Do you mean to write all the windows to the sink (MySQL) at the same time or simply that you want to write windows, as they come, using the one single sink instance?

Best,
Aljoscha

> On 4. Jun 2017, at 08:47, rhashmi <[hidden email]> wrote:
>
> Is it possible to return all windows update to single Sink (aggregated
> collection).
>
> The reason i am asking because we are using mysql for sink. I am wondering
> if i can update all of the them in single batch so as to avoid possibly
> avoid contention.
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-Single-Sink-For-All-windows-tp13475.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

rhashmi
In reply to this post by rhashmi
because of parallelism i am seeing db contention. Wondering if i can merge sink of multiple windows and insert in batch.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

Nico Kruber
How about using asynchronous I/O operations?

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/
asyncio.html


Nico

On Tuesday, 6 June 2017 16:22:31 CEST rhashmi wrote:

> because of parallelism i am seeing db contention. Wondering if i can merge
> sink of multiple windows and insert in batch.
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-Sin
> gle-Sink-For-All-windows-tp13475p13525.html Sent from the Apache Flink User
> Mailing List archive. mailing list archive at Nabble.com.


signature.asc (201 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

Aljoscha Krettek
Ah, I think now I get your problem. You could manually implement batching inside your SinkFunction, The SinkFunction would batch values in memory and periodically (based on the count of values and on a timeout) send these values as a single batch to MySQL. To ensure that data is not lost you can implement the CheckpointedFunction interface and make sure to always flush to MySQL when a snapshot is happening.

Does that help?

> On 8. Jun 2017, at 11:46, Nico Kruber <[hidden email]> wrote:
>
> How about using asynchronous I/O operations?
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/
> asyncio.html
>
>
> Nico
>
> On Tuesday, 6 June 2017 16:22:31 CEST rhashmi wrote:
>> because of parallelism i am seeing db contention. Wondering if i can merge
>> sink of multiple windows and insert in batch.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-Sin
>> gle-Sink-For-All-windows-tp13475p13525.html Sent from the Apache Flink User
>> Mailing List archive. mailing list archive at Nabble.com.
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

rhashmi
Thanks Aljoscha for your response.

I would give a try..
 


1- flink call invoke method of SinkFunction to dispatch aggregated information. My follow up question here is .. while snapshotState method is in process, if sink received another update then we might have mix records, however per document all update stop during checkpoint. i assume this works the same way.  


https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html

"As soon as the operator receives snapshot barrier n from an incoming stream, it cannot process any further records from that stream until it has received the barrier n from the other inputs as well. Otherwise, it would mix records that belong to snapshot n and with records that belong to snapshot n+1."

"Streams that report barrier n are temporarily set aside. Records that are received from these streams are not processed, but put into an input buffer".



2- snapshotState method call when "checkpoint is requested". is there an interface that provide when checkpoint complete .. I meant.. I will add my flush logic right after completion of snapshot & before flink resume the stream. With this approach we can assure that we update state only if the checkpoint was successful.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

rhashmi
I think CheckpointListener?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Use Single Sink For All windows

Aljoscha Krettek
Yes, CheckpointListener will enable you to listen for completed checkpoints. I think that you should put the the values in state before returning from the snapshot method, though, to prevent data loss.

And regarding your other question: yes, when a snapshot is ongoing the invoke() method will not be called.

> On 12. Jun 2017, at 19:12, rhashmi <[hidden email]> wrote:
>
> I think CheckpointListener?
>
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-Single-Sink-For-All-windows-tp13475p13653.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Loading...