Re: Question regarding configuring number of network buffers


wangzhijiang999
Hi Ray,

Regarding your question: does each parallel task inside a TaskManager talk to all parallel tasks inside the same TaskManager, or to all parallel tasks across all TaskManagers?
Each task talks to all of its parallel upstream and downstream tasks, both those in the same TaskManager and those in other TaskManagers.
The consumer and producer tasks may be deployed in the same TaskManager or in different TaskManagers.
Within the same TaskManager, the local data shuffle is done directly by memory copy, and the number of required buffers is determined by #slots-per-TM^2.
Across TaskManagers, the remote data shuffle is done by network transport, and a single TCP connection between two TaskManagers is reused by all of their internal tasks. So the number of required buffers is determined by #TMs.

Considering both cases, the formula is #slots-per-TM^2 * #TMs. I hope this helps.
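
To make the difference between the formulas concrete, here is a minimal Python sketch that evaluates them for a hypothetical cluster of 20 TaskManagers with 8 slots each. The function names and the cluster size are illustrative assumptions, not anything taken from Flink; the factor of 4 is the one in the documented formula quoted in the original question below.

```python
def documented_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """#slots-per-TM^2 * #TMs * factor, the formula quoted from the documentation."""
    return slots_per_tm ** 2 * num_tms * factor


def all_to_all_buffers(slots_per_tm: int, num_tms: int) -> int:
    """(#slots-per-TM * #TMs)^2, the alternative proposed in the question."""
    return (slots_per_tm * num_tms) ** 2


# Hypothetical cluster size, chosen only for illustration.
slots, tms = 8, 20

print(documented_buffers(slots, tms, factor=1))  # core term only: 1280
print(documented_buffers(slots, tms))            # with the factor of 4: 5120
print(all_to_all_buffers(slots, tms))            # all-to-all assumption: 25600
```

The all-to-all estimate grows quadratically with the number of TaskManagers, while the documented formula grows only linearly in #TMs, which matches the point that one TCP connection per pair of TaskManagers is shared by all tasks.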

Cheers,
Zhijiang 
------------------------------------------------------------------
From: Ray Ruvinskiy <[hidden email]>
Sent: Wednesday, June 7, 2017, 23:59
Subject: Question regarding configuring number of network buffers

The documentation provides the formula #slots-per-TM^2 * #TMs * 4 to determine the number of network buffers we should configure. The documentation also says, “A logical network connection exists for each point-to-point exchange of data over the network, which typically happens at repartitioning- or broadcasting steps (shuffle phase). In those, each parallel task inside the TaskManager has to be able to talk to all other parallel tasks.” Does that mean that each parallel task inside the TaskManager talks to all parallel tasks inside the same TaskManager, or to all parallel tasks across all TaskManagers? Intuitively, I would assume the latter, but then wouldn’t the formula for determining the number of network buffers be more along the lines of (#slots-per-TM * #TMs)^2?

Thanks,
Ray


Re: Re: Question regarding configuring number of network buffers

Nico Kruber
Yes. Also, please consider the new and simplified network buffer configuration
available from 1.3 onwards:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#configuring-the-network-buffers
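
As a rough illustration of what that simplified configuration does, the Python sketch below derives a buffer count from a memory fraction clamped between a minimum and a maximum. It assumes the 1.3 options are `taskmanager.network.memory.fraction`, `taskmanager.network.memory.min` and `taskmanager.network.memory.max` with defaults of 0.1, 64 MiB and 1 GiB, and a 32 KiB memory segment size; these names and defaults are recalled from the linked page and should be checked against it. The function names are hypothetical, and the memory figure Flink actually applies the fraction to is simplified here to a single number.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def network_memory_bytes(jvm_memory_bytes: int,
                         fraction: float = 0.1,        # assumed default
                         min_bytes: int = 64 * MIB,    # assumed default
                         max_bytes: int = 1 * GIB) -> int:  # assumed default
    """Take a fraction of the memory, clamped to the [min, max] range."""
    return int(min(max(fraction * jvm_memory_bytes, min_bytes), max_bytes))


def buffer_count(network_bytes: int, segment_size: int = 32 * 1024) -> int:
    """Number of whole buffers (memory segments) that fit into the network memory."""
    return network_bytes // segment_size


# Small, medium and large TaskManagers (hypothetical sizes).
for jvm in (256 * MIB, 4 * GIB, 100 * GIB):
    print(jvm // MIB, "MiB ->", buffer_count(network_memory_bytes(jvm)), "buffers")
```

With this scheme the buffer count follows the TaskManager's memory size instead of being a fixed number that has to be recomputed by hand whenever the cluster layout changes.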

Nico

On Thursday, 8 June 2017 05:26:57 CEST Zhijiang(wangzhijiang999) wrote:

> Hi Ray,
>
> For your question: Does that say that each parallel task inside the
> TaskManager talk to all parallel tasks inside the same TaskManager or to
> all parallel tasks across all task managers?
> Each task will talk to all parallel upstream and downstream tasks, both in
> the same TaskManager and across different TaskManagers. The consumer and
> producer tasks may be deployed in the same TaskManager or in different
> TaskManagers. For the case of the same TaskManager, the local data shuffle
> is done directly by memory copy and the required buffers can be determined
> by #slots-per-TM^2. For the case of different TaskManagers, the remote data
> shuffle is done by network transport and only one TCP connection between
> two TaskManagers can be reused by all the internal tasks. So the required
> buffers can be determined by #TMs.
>
> Considering both cases, the formula is #slots-per-TM^2 * #TMs, hope it can
> help you.
>
> Cheers,
> Zhijiang
> ------------------------------------------------------------------
> From: Ray Ruvinskiy <[hidden email]>
> Sent: Wednesday, June 7, 2017, 23:59
> To: [hidden email] <[hidden email]>
> Subject: Question regarding configuring number of network buffers
>
> The documentation provides the formula #slots-per-TM^2 * #TMs * 4 to
> determine the number of network buffers we should configure. The
> documentation also says, “A logical network connection exists for each
> point-to-point exchange of data over the network, which typically happens
> at repartitioning- or broadcasting steps (shuffle phase). In those, each
> parallel task inside the TaskManager has to be able to talk to all other
> parallel tasks.” Does that say that each parallel task inside the
> TaskManager talk to all parallel tasks inside the same TaskManager or to
> all parallel tasks across all task managers? Intuitively, I would assume
> the latter, but then wouldn’t the formula for determining the number of
> network buffers be more along the lines of (#slots-per-TM * #TMs)^2?
>
> Thanks,
> Ray

