Flink restart strategy on specific exception

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink restart strategy on specific exception

Ali, Kasif

Hello,

 

Looking at existing restart strategies they are kind of generic. We have a requirement to restart the job only in case of specific exception/issues.

What would be the best way to have a re start strategy which is based on few rules like looking at particular type of exception or some extra condition checks which are application specific.?

 

Just a background on one specific issue which invoked this requirement is slots not getting released when the job finishes. In our applications, we keep track of jobs submitted with the amount of parallelism allotted to it.  Once the job finishes we assume that the slots are free and try to submit next set of jobs which at times fail with error  “not enough slots available”.

 

So we think a job re start can solve this issue but we only want to re start only if this particular situation is encountered.

 

Please let us know If there are better ways to solve this problem other than re start strategy.

 

Thanks,

Kasif

 




Your Personal Data: We may collect and process information about you that may be subject to data protection laws. For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices
Reply | Threaded
Open this post in threaded view
|

Re: Flink restart strategy on specific exception

Till Rohrmann
Hi Kasif,

I think in this situation it is best if you defined your own custom RestartStrategy by specifying a class which has a `RestartStrategyFactory createFactory(Configuration configuration)` method as `restart-strategy: MyRestartStrategyFactoryFactory` in `flink-conf.yaml`.

Cheers,
Till

On Thu, Nov 22, 2018 at 7:18 AM Ali, Kasif <[hidden email]> wrote:

Hello,

 

Looking at existing restart strategies they are kind of generic. We have a requirement to restart the job only in case of specific exception/issues.

What would be the best way to have a re start strategy which is based on few rules like looking at particular type of exception or some extra condition checks which are application specific.?

 

Just a background on one specific issue which invoked this requirement is slots not getting released when the job finishes. In our applications, we keep track of jobs submitted with the amount of parallelism allotted to it.  Once the job finishes we assume that the slots are free and try to submit next set of jobs which at times fail with error  “not enough slots available”.

 

So we think a job re start can solve this issue but we only want to re start only if this particular situation is encountered.

 

Please let us know If there are better ways to solve this problem other than re start strategy.

 

Thanks,

Kasif

 




Your Personal Data: We may collect and process information about you that may be subject to data protection laws. For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices