Hi Till/ flink-devs,
I am trying to understand why adding slots to the task manager has no impact on the performance of my test pipeline.
Here is my flink-conf.yaml:
akka.ask.timeout: 10000 s
akka.lookup.timeout: 100 s
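(As an aside, not from the original message: the slot count per TaskManager can also be pinned in flink-conf.yaml; the -s flag passed to yarn-session.sh below sets the same value.)

```yaml
# Equivalent config-file setting for slots per TaskManager
taskmanager.numberOfTaskSlots: 8
```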
I have an EMR cluster of 8 task managers; for now I wanted to use one TaskManager with multiple slots.
So I start the cluster with
$FLINK_HOME/bin/yarn-session.sh -d -n 1 -s 8 -tm 57344
Then I run my Flink job with -p 8.
I see only 1 CPU core being used on this task manager.
In the YARN web UI, I see the node that was chosen to be the TaskManager (since I passed -n 1): ip-10-202-4-14.us-west-2.compute.internal:8041
Memory available: 56 GB
VCores used: 1
VCores available: 31
I was expecting to see VCores used be 8.
Any clue or hint as to why I am seeing this? I am also not seeing any performance gain (I am using the Flink Kinesis connector to read a JSON file and sink to S3); throughput is capped at the same level I saw with one slot.
Can you give us more information about what your Flink job is doing and the distribution of the Kinesis data/keys? The distribution of work depends a lot on that. For example: if you are using a source with a single partition (I think they are called shards in Kinesis) in the DataStream API and do not do a keyBy or some other partitioning operation (rebalance, broadcast, etc.) in the job, then all processing will happen in a single task slot. Similarly, if you do a keyBy operation and everything has the same key, then you will get a very similar distribution of work.
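To make the keyBy point concrete, here is a minimal plain-Java sketch of hash-based key routing (a simplification I'm adding for illustration: Flink actually murmur-hashes keys into key groups, but the consequence is the same, every record with a given key lands on the same downstream channel, so a single hot key means a single busy slot no matter the parallelism):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyRoutingDemo {
    // Simplified model of keyed routing: same key -> same channel.
    // (Flink's real assignment goes through key groups, but one key
    // can never be spread across multiple slots either way.)
    static int channelFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 8;
        List<String> skewed = List.of("same", "same", "same", "same");
        List<String> varied = List.of("a", "b", "c", "d", "e", "f", "g", "h");

        Map<Integer, Integer> skewedCounts = new TreeMap<>();
        for (String k : skewed)
            skewedCounts.merge(channelFor(k, parallelism), 1, Integer::sum);
        // All records with one key land on a single channel:
        System.out.println("skewed -> " + skewedCounts.size() + " channel(s) used"); // 1 channel

        Map<Integer, Integer> variedCounts = new TreeMap<>();
        for (String k : varied)
            variedCounts.merge(channelFor(k, parallelism), 1, Integer::sum);
        System.out.println("varied -> " + variedCounts.size() + " channel(s) used");
    }
}
```

If all your Kinesis records carry the same key (or the stream has a single shard), this is why only one slot, and hence one core, does any work.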
On Thu, May 25, 2017 at 5:31 PM, Sathi Chowdhury <[hidden email]> wrote:
Are you seeing 8 slots in the JobManager UI?
How many shards do you have in your kinesis stream?
On Fri, May 26, 2017 at 3:14 PM, Jason Brelloch <[hidden email]> wrote: