1.6 UI issues

1.6 UI issues

Juan Gentile

Hello!

 

We are migrating to the latest 1.6 version and all the jobs seem to work fine. However, when we click on an individual job in the web interface, the job's information either takes too long to load or never loads at all.

 

Has anyone had this issue? Any clues as to why?

 

Thank you,

Juan


Re: 1.6 UI issues

Yun Tang
Hi Juan

From our experience, you could first check jobmanager.log for log lines similar to the one below:
max allowed size 128000 bytes, actual size of encoded class akka.actor.Status$Success was xxx bytes

If you see such lines, you should increase akka.framesize to a larger value (the default value is '10485760b') [1].

Otherwise, check the GC log of the JobManager to see whether the GC overhead is too heavy for it; if so, consider increasing the JobManager's memory.

[1] Flink configuration reference (ci.apache.org), which lists among others:
    Key                    Default  Description
    jobmanager.heap.size   "1024m"  JVM heap size for the JobManager.
    taskmanager.heap.size  "1024m"  JVM heap size for the TaskManagers, which are the parallel workers of the system.
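The log check suggested above could be automated roughly as follows. This is only a sketch: the regular expression is modelled on the single log line quoted in this message (real log lines may be worded differently), and the function names, the default file name, and the 2x headroom factor are made up for illustration, not Flink recommendations.

```python
import re
import sys
from typing import Dict, Iterable, List, Optional

# Assumption: the pattern mirrors the log line quoted in this thread.
OVERSIZED = re.compile(
    r"max allowed size (\d+) bytes, "
    r"actual size of encoded class (\S+) was (\d+) bytes"
)


def find_oversized_messages(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect every log line that reports an oversized Akka message."""
    hits = []
    for line in lines:
        match = OVERSIZED.search(line)
        if match:
            hits.append({
                "limit": int(match.group(1)),
                "message_class": match.group(2),
                "actual": int(match.group(3)),
            })
    return hits


def suggest_framesize(hits: List[Dict[str, object]],
                      headroom: float = 2.0) -> Optional[str]:
    """Return a candidate akka.framesize value, or None if nothing was found.

    The headroom factor is an arbitrary illustrative choice.
    """
    if not hits:
        return None
    largest = max(int(hit["actual"]) for hit in hits)
    return f"{int(largest * headroom)}b"


if __name__ == "__main__":
    # Hypothetical usage: python check_framesize.py /path/to/jobmanager.log
    path = sys.argv[1] if len(sys.argv) > 1 else "jobmanager.log"
    with open(path, encoding="utf-8", errors="replace") as log:
        found = find_oversized_messages(log)
    print(f"{len(found)} oversized message(s) found")
    suggestion = suggest_framesize(found)
    if suggestion is not None:
        print(f"consider setting akka.framesize to at least {suggestion}")
```

If the script reports no hits, the frame size is probably not the cause and the GC log is the next thing to look at.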

Best
Yun Tang



Re: 1.6 UI issues

Juan Gentile

Hello Yun,

 

We haven’t seen the log error you mentioned. We also checked the GC and it seems to be okay. Inspecting the UI, we found the following error:

 

{"errors":["Could not retrieve the redirect address of the current leader. Please try to refresh."]}
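For anyone who wants to spot this response from a script rather than the browser's network inspector, here is a minimal sketch. It assumes only the JSON shape shown above (an "errors" array of strings); the function names are hypothetical, and how the response body is fetched is left out.

```python
import json
from typing import List

# Message text copied from the error shown in this thread.
LEADER_ERROR = "Could not retrieve the redirect address of the current leader"


def extract_errors(body: str) -> List[str]:
    """Return the strings in the "errors" array, or [] if there are none."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        return []
    return [entry for entry in errors if isinstance(entry, str)]


def is_leader_redirect_error(body: str) -> bool:
    """True if the response body carries the leader-redirect error."""
    return any(LEADER_ERROR in entry for entry in extract_errors(body))
```

Counting how often this returns True while polling the same job page would show whether the error is intermittent or permanent.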

 

We suspect we are running into the same issue as described here (http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/akka-timeout-td14996.html) but we are not so sure.

 

Have you encountered this issue before?

 

Thank you,

 


Re: 1.6 UI issues

Dawid Wysakowicz-2

Hi Juan,

To me, it doesn't look similar to the issue you linked. What cluster setup are you using? Are you running in HA mode?

I am adding Till to cc, who might be able to help you more.

Best,

Dawid


Re: 1.6 UI issues

Till Rohrmann
Hi Juan,

could you share the cluster entrypoint logs with us? They should contain more information about the internal server error.

Just to make sure, you are using Flink 1.6.2, right?

Cheers,
Till
