It depends on the underlying resources you are planning for your jobs; memory and processing power play the principal role here. Keep in mind that you can break your job down into a number of parallel tasks, either at the environment level or for a specific task within your pipeline, as in the sketch below.
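For instance, a minimal sketch of tuning parallelism with Flink's DataStream API (the class name, job name, and map logic are just placeholders; `fromSequence` assumes a reasonably recent Flink version):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism for every operator in this job.
        env.setParallelism(4);

        env.fromSequence(0, 1_000_000)
           // A CPU-heavy step can get a higher parallelism of its own.
           .map(new MapFunction<Long, Long>() {
               @Override
               public Long map(Long v) {
                   return v * 2;
               }
           })
           .setParallelism(8)
           .print();

        env.execute("parallelism-sketch");
    }
}
```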
I have scaled my jobs up to around 1M records per second without any trouble, but keep in mind it also depends on how complex your pipeline is. Beyond the parallel-processing capabilities, your pipeline likely applies enrichment and transformations that read data from an external system, and that can add significant overhead to your benchmark. In short, you should prototype your scenario and draw your own conclusions, but without any doubt Flink is capable of very high stream-processing throughput.
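When that enrichment hits an external system, Flink's async I/O can hide part of the lookup latency instead of blocking the task thread per record. A minimal sketch, assuming a hypothetical `lookup` client that stands in for a real database or REST call (not from the original answer):

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncEnrichmentSketch {

    // Hypothetical enrichment; a real job would query a database or service here.
    static String lookup(Long key) {
        return "enriched-" + key;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> ids = env.fromSequence(0, 10_000);

        DataStream<String> enriched = AsyncDataStream.unorderedWait(
                ids,
                new RichAsyncFunction<Long, String>() {
                    @Override
                    public void asyncInvoke(Long input, ResultFuture<String> resultFuture) {
                        // Run the external call off the main task thread.
                        CompletableFuture
                                .supplyAsync(() -> lookup(input))
                                .thenAccept(r -> resultFuture.complete(Collections.singleton(r)));
                    }
                },
                1000, TimeUnit.MILLISECONDS, // per-request timeout
                100);                        // max in-flight requests

        enriched.print();
        env.execute("async-enrichment-sketch");
    }
}
```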
To add a few points to the previous two answers: there is no clear-cut answer to this question. Besides resources, it depends on the size of the messages you are dealing with and the complexity of the logic. If you want to set those extra factors aside and look purely at the framework's performance, you can find stream-computing benchmark programs online, run them, and cross-check the results against your own testing.
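Before reaching for a full benchmark suite, a crude local smoke test can give you a ballpark number. A sketch under assumptions (record count, parallelism, and names are arbitrary; the measurement includes job startup, so trust a real cluster benchmark for serious numbers):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class ThroughputSmokeTest {
    public static void main(String[] args) throws Exception {
        final long records = 10_000_000L;

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        env.fromSequence(0, records - 1)
           .map(new MapFunction<Long, Long>() {
               @Override
               public Long map(Long v) {
                   return v + 1; // trivial work; swap in your real logic here
               }
           })
           .addSink(new DiscardingSink<>()); // avoid measuring stdout printing

        long start = System.nanoTime();
        env.execute("throughput-smoke-test"); // blocks until the bounded job finishes
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("~%.0f records/sec%n", records / seconds);
    }
}
```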