Error deploying task manager after failure in Yarn


Anil
I'm using Flink 1.4.2 and running Flink on YARN. The job runs with a
parallelism of 2, and each task manager is allocated 1 core. When a
container's memory usage exceeds its allocated memory, YARN kills the
container, as expected.

{"debug_level":"INFO","debug_timestamp":"2018-12-04
15:52:29,276","debug_thread":"flink-akka.actor.default-dispatcher-17","debug_file":"YarnFlinkResourceManager.java",
"debug_line":"545","debug_message":"Diagnostics for container
container_1528884788062_18043_01_000002 in state COMPLETE : exitStatus=Pmem
limit exceeded (-104) diagnostics=Container
[pid=29271,containerID=container_1528884788062_18043_01_000002] is running
beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory
used; 13.4 GB of 2.1 GB virtual memory used. Killing container.
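
For reference, the memory figures in that diagnostics line can be pulled out with a short script. This is only a rough sketch: the helper name `parse_memory_usage` and the regular expression are hypothetical, and they assume the YARN message keeps the exact wording shown above.

```python
import re

# Diagnostics text copied from the log entry above (line wraps joined).
DIAG = ("Current usage: 1.0 GB of 1 GB physical memory used; "
        "13.4 GB of 2.1 GB virtual memory used. Killing container.")

# Assumed wording: "<used> GB of <limit> GB <physical|virtual> memory used".
PATTERN = re.compile(r"([\d.]+) GB of ([\d.]+) GB (physical|virtual) memory used")

def parse_memory_usage(message):
    """Hypothetical helper: map memory kind to (used_gb, limit_gb)."""
    usage = {}
    for used, limit, kind in PATTERN.findall(message):
        usage[kind] = (float(used), float(limit))
    return usage

usage = parse_memory_usage(DIAG)
for kind, (used, limit) in usage.items():
    print(f"{kind}: {used} GB used of {limit} GB limit")
```

In this log, physical memory usage has reached the 1 GB container limit, which is the condition the exit status (-104, Pmem limit exceeded) reports.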

The job manager then tries to start a new task manager, but fails with the
following error. Why is the job manager unable to allocate a new task
manager when there are plenty of free resources in the cluster? Flink tries
to redeploy the job 5 times, as per the configured restart strategy, and
then fails the job. Can someone point me in the right direction to debug
this issue?
Thanks!

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Not enough free slots available to run the job. You can decrease the
operator parallelism or increase the number of slots per TaskManager in the
configuration. Task to schedule: < Attempt #5 (Source: Custom Source ->
from: (zoneId, cityId, time_stamp) -> select: (DeliveryZoneFromId(zoneId) AS
zone, CityFromCityId(cityId) AS city, +(CAST(time_stamp), 19800000) AS
time_stamp) -> to: Row -> Sink: Unnamed (2/2)) @ (unassigned) - [SCHEDULED]
> with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group <
SlotSharingGroup [cbc357ccb763df2852fee8c4fc7d55f2] >. Resources available
to scheduler: Number of instances=1, total number of slots=1, available
slots=0
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:263)
        at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:142)
        at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$1(Execution.java:440)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:438)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:503)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:900)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:854)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1175)
        at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
        at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
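
To make the mismatch in the exception explicit: the job runs with a parallelism of 2, while the scheduler reports a single instance with one slot and none free. The sketch below parses the "Resources available to scheduler" text and compares it with the parallelism. The function name `parse_scheduler_resources` is hypothetical, and the calculation assumes one slot is needed per parallel subtask, since the exception shows all operators in a single slot sharing group.

```python
import re

# Text copied from the exception message above (line wrap joined).
SCHEDULER_INFO = ("Number of instances=1, total number of slots=1, "
                  "available slots=0")

# Parallelism stated in the post.
PARALLELISM = 2

def parse_scheduler_resources(text):
    """Hypothetical helper: extract the three counters from the message."""
    match = re.search(
        r"Number of instances=(\d+), total number of slots=(\d+), "
        r"available\s+slots=(\d+)",
        text,
    )
    if match is None:
        raise ValueError("unexpected scheduler resource message")
    instances, total, available = (int(g) for g in match.groups())
    return {"instances": instances,
            "total_slots": total,
            "available_slots": available}

resources = parse_scheduler_resources(SCHEDULER_INFO)

# Assumption: slots required equals the job parallelism (one sharing group).
missing_slots = max(0, PARALLELISM - resources["total_slots"])
print(f"registered slots: {resources['total_slots']}, "
      f"required: {PARALLELISM}, missing: {missing_slots}")
```

Under that assumption the job is one slot short, which suggests the replacement task manager had not registered with the job manager by the time the restart attempts ran.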



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/