How to prevent from launching 2 jobs at the same time

How to prevent from launching 2 jobs at the same time

aldu29
Hi,

What is the best way to prevent launching two jobs with the same name concurrently?
Instead of doing a check in the script that starts the Flink job, I would prefer that a job fail to start (with an exception or something like that) if another job with the same name is already running.

David
Reply | Threaded
Open this post in threaded view
|

Re: How to prevent from launching 2 jobs at the same time

Dian Fu
Hi David,

Internally, Flink identifies jobs by job id, not by job name, so it only checks whether two jobs have the same job id.

If you submit the job via the CLI [1], I'm afraid there is no built-in way to do this: the job id is generated randomly at submission time and has nothing to do with the job name.
However, if you submit the job via the REST API [2], there is an option to specify the job id when submitting a job, so you can generate the job id yourself.

Regards,
Dian

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
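[Editor's note] For anyone generating the id themselves: a Flink job id is a 128-bit value rendered as 32 hex characters, so one option is to hash the job name into one. A minimal sketch — the MD5 choice and the function name are illustrative, not anything Flink prescribes:

```python
import hashlib

def job_id_from_name(job_name: str) -> str:
    """Derive a stable 32-hex-character id (128 bits, the size of a
    Flink JobID) from the job name. Resubmitting the same name then
    always produces the same id, so a duplicate submission is rejected
    by Flink for reusing an existing job id."""
    return hashlib.md5(job_name.encode("utf-8")).hexdigest()
```

The same name always maps to the same id, which is exactly the property that turns "same name" collisions into "same id" collisions Flink can detect.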



Re: How to prevent from launching 2 jobs at the same time

Zili Chen
The situation is as Dian said: Flink identifies jobs by job id, not by job name.

However, I think it is still a valid question whether Flink could alternatively identify jobs by job name and leave it to users to keep names distinct. The advantages would include more readable display and interaction, and less hardcoding of job ids — for example, we always set the job id to new JobID(0, 0) in standalone per-job mode in order to get the same ZooKeeper path.

Best,
tison.



Re: How to prevent from launching 2 jobs at the same time

aldu29
Hi,

Thanks for your replies.
Yes, it would be useful to have a way to define the job id; then I could derive the job id from the job name, for example. At the moment we submit our jobs on YARN via the CLI, not the REST API.
Nevertheless, I can implement a little trick: at startup, query the REST API and throw an exception if a job with the same name is already running.
Question: is there a way to retrieve the JobManager URI from my code, or should I provide it as a parameter?
Thanks.
David
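[Editor's note] That trick could look like the sketch below. It assumes the standard /jobs/overview REST endpoint, which returns a JSON document with a "jobs" array of entries carrying "jid", "name" and "state" fields — verify the payload shape against your Flink version's REST docs. The check is split from the HTTP call so the logic can be tested without a cluster:

```python
import json
from urllib.request import urlopen

# Job states Flink considers finished; anything else counts as "in progress".
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}

def find_running_duplicate(overview: dict, job_name: str):
    """Given the parsed JSON from GET <jobmanager>/jobs/overview,
    return the first non-terminal job with the given name, or None."""
    for job in overview.get("jobs", []):
        if job["name"] == job_name and job["state"] not in TERMINAL_STATES:
            return job
    return None

def ensure_not_running(jobmanager_url: str, job_name: str) -> None:
    """Raise if a job with this name is already in progress."""
    with urlopen(jobmanager_url.rstrip("/") + "/jobs/overview") as resp:
        overview = json.load(resp)
    dup = find_running_duplicate(overview, job_name)
    if dup is not None:
        raise RuntimeError(
            f"A job named {job_name!r} is already running (jid={dup['jid']})")
```

Calling ensure_not_running("http://jobmanager:8081", "my-etl-job") at startup gives exactly the fail-fast behavior asked for in the original question.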

Re: How to prevent from launching 2 jobs at the same time

Till Rohrmann
Hi David,

you could use Flink's RestClusterClient and call #listJobs to obtain the list of jobs being executed on the cluster (note that it will also report finished jobs). If you provide a properly configured Configuration (e.g. by loading flink-conf.yaml via GlobalConfiguration#loadConfiguration), the client will automatically detect where the JobManager is running: via ZooKeeper if HA is enabled, or from the JobManager address in the configuration.

Of course, you could also provide the JobManager address as a parameter.

Cheers,
Till
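[Editor's note] GlobalConfiguration#loadConfiguration is a Java API; if the guard lives in a submission script instead, a rough equivalent is to read flink-conf.yaml directly. This sketch assumes the flat "key: value" layout of flink-conf.yaml and the rest.address / rest.port / jobmanager.rpc.address keys (check the config reference for your Flink version); defaults are illustrative:

```python
def load_flink_conf(path: str) -> dict:
    """Parse flink-conf.yaml. The file is a flat 'key: value' list,
    so a tiny line-based parser is enough for this purpose."""
    conf = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or ":" not in line:
                continue
            key, value = line.split(":", 1)
            conf[key.strip()] = value.strip()
    return conf

def jobmanager_rest_url(conf: dict) -> str:
    """Build the REST endpoint URL from the configuration, preferring
    the REST settings and falling back to the RPC address key."""
    host = conf.get("rest.address",
                    conf.get("jobmanager.rpc.address", "localhost"))
    port = conf.get("rest.port", "8081")
    return f"http://{host}:{port}"
```

Note that this only covers the non-HA case Till mentions (address taken from the configuration); with ZooKeeper HA the leader must be discovered, which is what RestClusterClient handles for you.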

Re: How to prevent from launching 2 jobs at the same time

aldu29
Thanks Till,

Perfect, I am going to use RestClusterClient with listJobs.
It should work perfectly for my needs.

Cheers
David

Re: How to prevent from launching 2 jobs at the same time

Theo Diefenthal
My simple workaround: I always start the applications from the same machine via the CLI, and take a file-system lock around the check-if-job-is-already-running and job-launching steps. Relying on one machine to start the jobs is of course a possible single point of failure, but it works in my current environment.

Best regards
Theo
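[Editor's note] Such a file-system lock can be as small as an O_EXCL file creation, which the OS guarantees is atomic on a single machine. A hedged sketch of the idea — single-host only, and with no stale-lock recovery after a crashed launcher:

```python
import errno
import os

class LaunchLock:
    """A crude cross-process mutex around job launching.
    O_CREAT | O_EXCL makes the open atomic: exactly one process can
    create the lock file, so concurrent launchers on this machine are
    serialized. A lock left behind by a crashed launcher must be
    removed by hand (stale-lock handling is deliberately omitted)."""

    def __init__(self, path: str):
        self.path = path
        self.fd = None

    def acquire(self) -> bool:
        try:
            self.fd = os.open(self.path,
                              os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except OSError as e:
            if e.errno == errno.EEXIST:
                return False  # another launcher holds the lock
            raise

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

A launcher script would acquire the lock, run the is-already-running check plus the actual submission, and release it in a finally block.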
