Apache Flink - Question about checkpointing and re-running a job

M Singh
Hi:

I have a Flink job that I sometimes need to cancel and re-run. As I understand it, a job's checkpoints are saved under a directory named after the job ID at the checkpoint location. If I run the same job again, it gets a new job ID, so the checkpoints from the previous run (saved under the previous job ID's directory) are not used for the new run. Is that understanding correct? If I need to re-run the job from the previous checkpoint, is there a way to do that automatically, without using a savepoint?

Also, I believe internal job restarts do not change the job ID, so in those cases the restarted job picks up its state from the saved checkpoint. Is my understanding correct?
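
A minimal sketch of the layout described above, assuming the <checkpoint-dir>/<job-id>/chk-<n> structure. The base path, the job IDs, and the helper name checkpoint_dir are made-up example values, not taken from a real cluster:

```python
from pathlib import PurePosixPath


def checkpoint_dir(base: str, job_id: str, checkpoint_number: int) -> PurePosixPath:
    """Build the directory a given checkpoint of a given job would live in."""
    return PurePosixPath(base) / job_id / f"chk-{checkpoint_number}"


# Two submissions of the same program get different (hypothetical) job IDs,
# so their checkpoints end up in different directories.
first_run = checkpoint_dir("/flink/checkpoints", "a1b2c3", 42)
second_run = checkpoint_dir("/flink/checkpoints", "d4e5f6", 1)

print(first_run)
print(second_run)
```

Because the job ID is part of the path, the second run has no reason to look in the first run's directory unless it is pointed there explicitly.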

Thanks

Mans

Re: Apache Flink - Question about checkpointing and re-running a job

M Singh
Folks - Please let me know if you have any advice on this question.  Thanks

On Saturday, November 16, 2019, 02:39:18 PM EST, M Singh <[hidden email]> wrote:
> [original message, quoted in full above]

Re: Apache Flink - Question about checkpointing and re-running a job

Congxian Qiu
Hi
Yes, checkpoint data is located under the job ID directory. You can try restoring from the retained checkpoint [1].
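
As a rough sketch of how that restore could be automated: assuming checkpoints are retained on cancellation (in Flink this is enabled on the job's CheckpointConfig through externalized checkpoints with RETAIN_ON_CANCELLATION) and are written to a local or mounted filesystem, a small script can look up the newest chk-<n> directory under the previous job ID and pass it to flink run -s. The helper names, the example paths, and the jar name below are hypothetical. Treating the presence of a _metadata file as the sign of a completed checkpoint is also an assumption of this sketch.

```python
import re
from pathlib import Path
from typing import List, Optional

# Checkpoint directories are named chk-<number>; other entries (for example
# shared/ and taskowned/) are ignored.
CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_retained_checkpoint(checkpoint_root: str, job_id: str) -> Optional[Path]:
    """Return the highest-numbered chk-<n> directory for job_id, or None."""
    job_dir = Path(checkpoint_root) / job_id
    if not job_dir.is_dir():
        return None
    candidates = []
    for child in job_dir.iterdir():
        match = CHK_DIR.match(child.name)
        # Assumption: a checkpoint counts as complete once _metadata exists.
        if match and child.is_dir() and (child / "_metadata").is_file():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare numerically so that chk-12 ranks above chk-3.
    return max(candidates, key=lambda item: item[0])[1]


def resume_command(checkpoint: Path, jar: str) -> List[str]:
    """Build a 'flink run -s <checkpoint> <jar>' command line."""
    return ["flink", "run", "-s", str(checkpoint), jar]


if __name__ == "__main__":
    # Hypothetical values; replace with the real checkpoint root and old job ID.
    found = latest_retained_checkpoint("/flink/checkpoints", "old-job-id")
    if found is None:
        print("no retained checkpoint found")
    else:
        print(" ".join(resume_command(found, "my-job.jar")))
```

The old job ID still has to be recorded somewhere at cancel time, since the new submission will not know it. For checkpoints on HDFS or S3 the directory listing would need the matching filesystem client instead of pathlib.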

On Monday, November 18, 2019, at 2:54 AM, M Singh <[hidden email]> wrote:
> [follow-up and original message, quoted in full above]

Re: Apache Flink - Question about checkpointing and re-running a job

M Singh
Thanks, Congxian, for your answer and the reference. Mans

On Sunday, November 17, 2019, 08:59:16 PM EST, Congxian Qiu <[hidden email]> wrote:
> [reply and earlier messages, quoted in full above]