 Classic List Threaded 11 messages Open this post in threaded view
|

 Hi Flinker,I try to implement a quadratic distribution i.e. I would like to choose an element from a dataset with probability proportional to it's squared value. In Python this would look like this: s = numpy.cumsum(residual**2) x = numpy.random.rand() * s[-1]return residual[numpy.sum(x > s)] With Flink it is somewhat more complicated, I gave it a try: import util.Random val X = DataSource(XFile, CsvInputFormat[Float]) val Y = DataSource(YFile, CsvInputFormat[Float]) // take square of them val X_2 = X map { x => (x*x, x) } // calc sum of squares val X_sum = X_2 reduce { (x1, x2) => (x1._1 + x2._1, 0) } map { x => x._1 } // choose random value in our range val y = X_sum map { Random.nextFloat * _ } // make cummulative sum and find value we search for val center = X_2 map {     x => (0.0f, x._1, x._2) //sum, x^2, x} reduce {    (x1, x2) =>      if(x1._1 > y){// already found value we searched for       x1      } else {       if(x1._1 + x2._2 > y){// this is the value we search for          (x1._1 + x2._2, x2._2, x2._3)        } else {          (x1._1 + x2._2, x1._2, x2._3) // just go on with cummulative sum      }    } } map { _._3 } // we just need the initial value val output = center //map { x => println(x); x } val sink = output.write("/tmp/test", CsvOutputFormat[Float], "Center output") My problem here is now, I need to get the information stored in y into the reduce statement to gather the center value. Unfortunately I have no idea how to achieve that. If somebody knows a way I would be rather thankful. If someone would know a easier way to solve this problem too! Many thanks in advance!Cheers Max
Open this post in threaded view
|

## Re: Quadratic distribution

 Hi,for the Java API there are the so-called broadcast variables. Those can be used to set the output of an operation as an additional input of another operator. The feature is not available in the Scala API though? Or am I wrong here? I'm right now working on bringing the Scala API to feature parity with the Java API.AljoschaOn Wed, Aug 13, 2014 at 5:51 PM, Maximilian Alber wrote: Hi Flinker,I try to implement a quadratic distribution i.e. I would like to choose an element from a dataset with probability proportional to it's squared value. In Python this would look like this: s = numpy.cumsum(residual**2) x = numpy.random.rand() * s[-1]return residual[numpy.sum(x > s)] With Flink it is somewhat more complicated, I gave it a try: import util.Random val X = DataSource(XFile, CsvInputFormat[Float]) val Y = DataSource(YFile, CsvInputFormat[Float]) // take square of them val X_2 = X map { x => (x*x, x) } // calc sum of squares val X_sum = X_2 reduce { (x1, x2) => (x1._1 + x2._1, 0) } map { x => x._1 } // choose random value in our range val y = X_sum map { Random.nextFloat * _ } // make cummulative sum and find value we search for val center = X_2 map {     x => (0.0f, x._1, x._2) //sum, x^2, x} reduce {    (x1, x2) =>      if(x1._1 > y){// already found value we searched for       x1      } else {       if(x1._1 + x2._2 > y){// this is the value we search for          (x1._1 + x2._2, x2._2, x2._3)        } else {          (x1._1 + x2._2, x1._2, x2._3) // just go on with cummulative sum      }    } } map { _._3 } // we just need the initial value val output = center //map { x => println(x); x } val sink = output.write("/tmp/test", CsvOutputFormat[Float], "Center output") My problem here is now, I need to get the information stored in y into the reduce statement to gather the center value. Unfortunately I have no idea how to achieve that. If somebody knows a way I would be rather thankful. If someone would know a easier way to solve this problem too! Many thanks in advance!Cheers Max
Open this post in threaded view
|

## Re: Quadratic distribution

 Thanks for the quick reply.Ok, but is there a way to get the only element out of a DataSet into a variable?Mit freundlichen Grüßen, Max! On Thu, Aug 14, 2014 at 11:13 AM, Aljoscha Krettek wrote: Hi,for the Java API there are the so-called broadcast variables. Those can be used to set the output of an operation as an additional input of another operator. The feature is not available in the Scala API though? Or am I wrong here? I'm right now working on bringing the Scala API to feature parity with the Java API.Aljoscha On Wed, Aug 13, 2014 at 5:51 PM, Maximilian Alber wrote: Hi Flinker,I try to implement a quadratic distribution i.e. I would like to choose an element from a dataset with probability proportional to it's squared value. In Python this would look like this: s = numpy.cumsum(residual**2) x = numpy.random.rand() * s[-1]return residual[numpy.sum(x > s)] With Flink it is somewhat more complicated, I gave it a try: import util.Random val X = DataSource(XFile, CsvInputFormat[Float]) val Y = DataSource(YFile, CsvInputFormat[Float]) // take square of them val X_2 = X map { x => (x*x, x) } // calc sum of squares val X_sum = X_2 reduce { (x1, x2) => (x1._1 + x2._1, 0) } map { x => x._1 } // choose random value in our range val y = X_sum map { Random.nextFloat * _ } // make cummulative sum and find value we search for val center = X_2 map {     x => (0.0f, x._1, x._2) //sum, x^2, x} reduce {    (x1, x2) =>      if(x1._1 > y){// already found value we searched for       x1      } else {       if(x1._1 + x2._2 > y){// this is the value we search for          (x1._1 + x2._2, x2._2, x2._3)        } else {          (x1._1 + x2._2, x1._2, x2._3) // just go on with cummulative sum      }    } } map { _._3 } // we just need the initial value val output = center //map { x => println(x); x } val sink = output.write("/tmp/test", CsvOutputFormat[Float], "Center output") My problem here is now, I need to get the information stored in y into the reduce statement to gather the center value. Unfortunately I have no idea how to achieve that. If somebody knows a way I would be rather thankful. If someone would know a easier way to solve this problem too! Many thanks in advance!Cheers Max
Open this post in threaded view
|

## Re: Quadratic distribution

 No, unfortunately that's not possible right now because a DataSet only represents an Execution that is run when the program is executed. So while building  your program by chaining together operations the actual data is not yet available. I hope that helps but the whole thing can be a bit confusing. So just ask if you need clarification.Cheers,AljoschaOn Thu, Aug 14, 2014 at 3:01 PM, Maximilian Alber wrote: Thanks for the quick reply.Ok, but is there a way to get the only element out of a DataSet into a variable? Mit freundlichen Grüßen, Max! On Thu, Aug 14, 2014 at 11:13 AM, Aljoscha Krettek wrote: Hi,for the Java API there are the so-called broadcast variables. Those can be used to set the output of an operation as an additional input of another operator. The feature is not available in the Scala API though? Or am I wrong here? I'm right now working on bringing the Scala API to feature parity with the Java API.Aljoscha On Wed, Aug 13, 2014 at 5:51 PM, Maximilian Alber wrote: Hi Flinker,I try to implement a quadratic distribution i.e. I would like to choose an element from a dataset with probability proportional to it's squared value. In Python this would look like this: s = numpy.cumsum(residual**2) x = numpy.random.rand() * s[-1]return residual[numpy.sum(x > s)] With Flink it is somewhat more complicated, I gave it a try: import util.Random val X = DataSource(XFile, CsvInputFormat[Float]) val Y = DataSource(YFile, CsvInputFormat[Float]) // take square of them val X_2 = X map { x => (x*x, x) } // calc sum of squares val X_sum = X_2 reduce { (x1, x2) => (x1._1 + x2._1, 0) } map { x => x._1 } // choose random value in our range val y = X_sum map { Random.nextFloat * _ } // make cummulative sum and find value we search for val center = X_2 map {     x => (0.0f, x._1, x._2) //sum, x^2, x} reduce {    (x1, x2) =>      if(x1._1 > y){// already found value we searched for       x1      } else {       if(x1._1 + x2._2 > y){// this is the value we search for          (x1._1 + x2._2, x2._2, x2._3)        } else {          (x1._1 + x2._2, x1._2, x2._3) // just go on with cummulative sum      }    } } map { _._3 } // we just need the initial value val output = center //map { x => println(x); x } val sink = output.write("/tmp/test", CsvOutputFormat[Float], "Center output") My problem here is now, I need to get the information stored in y into the reduce statement to gather the center value. Unfortunately I have no idea how to achieve that. If somebody knows a way I would be rather thankful. If someone would know a easier way to solve this problem too! Many thanks in advance!Cheers Max
Open this post in threaded view
|

## Re: Quadratic distribution

Open this post in threaded view
|

## Re: Quadratic distribution

Open this post in threaded view
|

## Re: Quadratic distribution

Open this post in threaded view
|

## Re: Quadratic distribution

Open this post in threaded view
|