I’m doing a little research into the easiest way to deploy a Flink job to a k8s cluster and manage its lifecycle with a k8s operator. The list of solutions is below:
If you are using something that is not listed above, please share! Any details on how a specific solution works are greatly appreciated.
Thanks in advance
On Fri, May 28, 2021, 10:09 Ilya Karpov <[hidden email]> wrote:
In reply to this post by idkfaon
At my company we're currently using the GCP k8s operator (2nd on your list). Our usage is very moderate, but so far it works great for us.
We appreciate that when upgrading the application, it automatically triggers a savepoint during shutdown and resumes from it when restarting. It also lets us take savepoints at regular intervals (we currently take one per day).
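For readers curious about what the operator automates here, below is a minimal sketch of triggering a savepoint by hand through Flink's REST API (`POST /jobs/:jobid/savepoints`, then polling `/jobs/:jobid/savepoints/:triggerid`). It assumes the JobManager REST port is reachable at `base_url` (for example through a port-forward on 8081). This is not the operator's code; the helper names, job id and target directory are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def build_savepoint_request(base_url: str, job_id: str, target_dir: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for Flink's asynchronous savepoint trigger."""
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/savepoints"
    # "cancel-job": False keeps the job running once the savepoint completes.
    body = {"target-directory": target_dir, "cancel-job": False}
    return url, json.dumps(body).encode("utf-8")


def trigger_savepoint(base_url: str, job_id: str, target_dir: str) -> str:
    """POST the trigger request and return the trigger id used for polling."""
    url, data = build_savepoint_request(base_url, job_id, target_dir)
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["request-id"]


def savepoint_status_url(base_url: str, job_id: str, trigger_id: str) -> str:
    """URL to poll; the response's status.id becomes COMPLETED when the savepoint is done."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}/savepoints/{trigger_id}"
```

Taking one savepoint per day, as described above, would amount to calling `trigger_savepoint` on a schedule, which is exactly the bookkeeping the operator saves you from.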
We're using it with Flink 1.12.4 and AWS EKS.
Getting the Flink metrics and logs exported to our monitoring system worked out of the box.
Configuring IAM roles and a K8s service account for saving checkpoints and savepoints to S3 required a bit more fiddling, but we got it working.
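The actual setup isn't included in the reply. As a minimal sketch of one common approach on EKS, assuming IAM Roles for Service Accounts (IRSA), an existing IAM role with access to the bucket, and Flink pods configured to run under the service account (how that is wired depends on the operator's CRD), the two pieces are a ServiceAccount annotated with the role ARN and Flink state directories pointing at S3. The bucket, namespace, role ARN and helper names are hypothetical.

```python
from __future__ import annotations


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated for EKS IAM Roles for Service Accounts."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # With IRSA, EKS injects web-identity credentials for this role
            # into pods that run under this service account.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def flink_s3_state_conf(bucket: str, prefix: str = "") -> dict[str, str]:
    """Flink configuration entries that store checkpoints and savepoints in S3."""
    path = prefix.strip("/")
    base = f"s3://{bucket}/{path}" if path else f"s3://{bucket}"
    return {
        "state.checkpoints.dir": f"{base}/checkpoints",
        "state.savepoints.dir": f"{base}/savepoints",
    }
```

Since `kubectl apply -f -` accepts JSON, `json.dumps` of the manifest can be piped straight into it, and the config entries go into the Flink configuration (or the operator's equivalent field). Whether the S3 filesystem plugin picks up the web-identity credentials may depend on the plugin and AWS SDK versions in use.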
Happy to share code snippets for any of that if that's useful :)
The operator was last updated with Flink 1.11 in mind, so it currently has no built-in support for the reactive scaling mode recently added in Flink 1.13.
One worrying point, though, is that the repo's maintainers seem to have gone silent since March this year. There is a small, active community around it, and issues and PRs keep arriving, but they are waiting for feedback. It's all free and OSS, so who are we to complain? Still, it's an important point to keep in mind.
Hope this helps,
On Fri, 28 May 2021, at 9:09 AM, Ilya Karpov wrote:
Thank you so much for sharing your experience! The GCP k8s operator looks promising (I’m currently trying to build it and run the Helm chart; an issue with k8s 1.18+ is a roadblock right now, but I see there is a solution), and it seems the Flink team also refers to this implementation.
In your setup, did you solve the problem of visualising the list of in-progress jobs?
> One worrying point though is that the maintainers of the repo seem to have become silent in March this year.
Lyft’s implementation (I haven’t tried it yet) seems to be even more abandoned (last release 20/04/2020).
Thanks for the kind feedback.
We hit the first issue you mention, related to K8s 1.18+. We updated the controller-gen version to 0.2.4 in the Makefile as described in the ticket you linked, then ran "make deploy", which worked around the issue for us.
I'm not sure I follow the second issue you mention, about in-progress jobs. In case it helps, we access the Flink UI by simply opening a port-forward to port 8081 on the job manager; among other things, it shows the currently running jobs.
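If a programmatic list is wanted rather than the UI, the same port-forward also exposes Flink's REST API, where `GET /jobs/overview` returns each job's id (`jid`), name and state. A minimal sketch, assuming the port-forward is already running on `localhost:8081`; the helper names are hypothetical, and the terminal-state set reflects Flink's globally terminal job states (FINISHED, CANCELED, FAILED).

```python
from __future__ import annotations

import json
import urllib.request

# Globally terminal Flink job states; anything else is treated as in progress.
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}


def in_progress_jobs(overview: dict) -> list[tuple[str, str, str]]:
    """Return (job id, name, state) for jobs that have not reached a terminal state."""
    return [
        (job["jid"], job["name"], job["state"])
        for job in overview.get("jobs", [])
        if job["state"] not in TERMINAL_STATES
    ]


def fetch_overview(base_url: str = "http://localhost:8081") -> dict:
    """GET /jobs/overview from the JobManager REST API (e.g. through a port-forward)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/jobs/overview") as resp:
        return json.load(resp)
```

Usage: after `kubectl port-forward pod/<jobmanager-pod> 8081:8081`, calling `in_progress_jobs(fetch_overview())` yields one tuple per running, restarting or otherwise unfinished job. Keeping the parsing separate from the HTTP call makes it easy to feed the same function from a different transport or a saved response.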
On Mon, 31 May 2021, at 12:00 PM, Ilya Karpov wrote: