google cloud dataflow - How to "rate-limit" a PCollection in Apache Beam? -
i have seems common problem can't figure out beam recommended solution is.
i have stream of raw events , i'm looking 2 separate events fulfill condition within sliding window (of 60 minutes) "trigger" alert.
that easy enough slidingwindows, problem due sliding nature, alert potentially in multiple windows. how pcollection outputs such alert once (within timeframe/cooldown duration)?
i first thought recent stateful processing feature solution, realized works within window. side-inputs. seems me need way of breaking windows , treat alert "firings" in 1 (possible session-)window. docs don't mention kind of way reassign elements new windows
i ended re-windowing strategy, similar @kenn suggested.
so have alerts sliding windowed collection re-window session windows
.apply(window.remerge()) .apply(window.into(sessions.withgapduration(duration.standardhours(1)))) in windowed collection, can groupby , alerts of session, within can apply cooldown logic of emitting alert every hour.
Comments
Post a Comment