I am trying to find a way to ship files from AWS S3 for a Flink streaming job running on AWS EMR. What I need to ship is the following:
1) application jar
2) application property file
3) custom flink-conf.yaml
4) application-specific log4j configuration
Please let me know the options.
I'm not sure I understand your question correctly. You have the jar and configs (1, 2, 3 and 4) on S3 and you want to start a Flink job using those? Can you simply download those things (the whole directory containing them) to the machine that will be starting the Flink job?
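Something like this sketch might work (bucket, prefix, and file names are placeholders, not your actual setup; the `run` helper only prints each command so you can review before executing):

```shell
# Dry-run helper: prints each command instead of executing it.
# Replace with  run() { "$@"; }  to actually execute.
run() { echo "+ $*"; }

BUCKET=s3://my-app-bucket/flink/1.0-SNAPSHOT   # placeholder prefix
JOB_DIR=/tmp/flink-job

# 1+2) application jar and property file
run aws s3 cp "$BUCKET/app.jar"                "$JOB_DIR/"
run aws s3 cp "$BUCKET/application.properties" "$JOB_DIR/"
# 3+4) custom flink-conf.yaml and log4j config into a conf dir
run aws s3 cp "$BUCKET/flink-conf.yaml"        "$JOB_DIR/conf/"
run aws s3 cp "$BUCKET/log4j.properties"       "$JOB_DIR/conf/"

# Point the client at the custom conf dir and submit on YARN.
export FLINK_CONF_DIR="$JOB_DIR/conf"
run flink run -m yarn-cluster "$JOB_DIR/app.jar"
```

The key point is that everything ends up on the one machine where `flink run` is invoked.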
I have been doing exactly the process you mentioned so far. Now I am migrating the deployment process to AWS CDK and AWS Step Functions, roughly a CI/CD process.
I added a step that downloads the jar and configs (1, 2, 3 and 4) from S3 using command-runner.jar (an AWS Step Functions step); it loaded them onto one of the master nodes (out of 3). In the next step, when I launched the Flink job, it could not find the build because the job is launched on some other YARN node.
I was hoping that, just like Apache Spark, where whatever files we provide via --files are shipped to YARN (from S3 to the YARN working directory), Flink would also have a solution.
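For comparison, the Spark pattern I mean is roughly this (paths are placeholders; the `run` helper only prints the command instead of executing it):

```shell
run() { echo "+ $*"; }   # dry-run helper: prints instead of executing

# On YARN, spark-submit distributes --files artifacts into each
# container's working directory (including remote URIs like s3://,
# assuming the S3 filesystem is configured on the cluster).
run spark-submit \
  --master yarn --deploy-mode cluster \
  --files s3://my-bucket/conf/application.properties \
  my-app.jar
```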
Have you tried yarn.ship-files or yarn.ship-archives? Maybe that's what you're looking for...
I tried to ship my property file. Example: -yarn.ship-files s3://applib/xx/xx/1.0-SNAPSHOT/application.properties \
6:21:37.163 [main] ERROR org.apache.flink.client.cli.CliFrontend - Invalid command line arguments.
org.apache.flink.client.cli.CliArgsException: Could not build the program from JAR file: JAR file does not exist: -yarn.ship-files
at org.apache.flink.client.cli.CliFrontend.getPackagedProgram(CliFrontend.java:244) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:223) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_292]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_292]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) [hadoop-common-2.10.0-amzn-0.jar:?]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) [flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992) [flink-dist_2.11-1.11.0.jar:1.11.0]
Caused by: java.io.FileNotFoundException: JAR file does not exist: -yarn.ship-files
at org.apache.flink.client.cli.CliFrontend.getJarFile(CliFrontend.java:740) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:717) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
at org.apache.flink.client.cli.CliFrontend.getPackagedProgram(CliFrontend.java:242) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
... 8 more
Could not build the program from JAR file: JAR file does not exist: -yarn.ship-files
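(The trace suggests the CLI took `-yarn.ship-files` itself as the job JAR path, since that is not a recognized `flink run` flag; `yarn.ship-files` is a configuration option. A sketch of the usual syntax, assuming Flink 1.11 on YARN, with placeholder paths and a `run` helper that only prints the command:)

```shell
run() { echo "+ $*"; }   # dry-run helper: prints instead of executing

# -yt/--yarnship ships a local directory's contents to the YARN containers;
# alternatively, yarn.ship-files can be passed as a dynamic property via -yD.
# Note: local paths only -- an s3:// URI is not accepted here.
run flink run -m yarn-cluster \
  -yt /home/hadoop/job-conf \
  /home/hadoop/app.jar
```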
Currently, Flink only supports shipping files from the local machine where the job is submitted.
There are tickets tracking the effort to support shipping files from remote paths, e.g., http, hdfs, etc. Once that is done, adding S3 as an additional supported scheme should be straightforward.
Unfortunately, these efforts are still in progress and have more or less stalled recently.
On Thu, May 27, 2021 at 12:23 AM Vijayendra Yadav <[hidden email]> wrote:
Thank You Xintong, I will look for these updates in the near future.
On Wed, May 26, 2021 at 6:40 PM Xintong Song <[hidden email]> wrote: