ℹAbout Str:::lab Studio
Str:::lab Studio is a browser-based SQL studio that connects to any Flink SQL Gateway. It provides a complete development, monitoring, and administration experience — without the overhead of a full platform deployment. It works with local Docker, Kubernetes, and any self-hosted Flink cluster.
What it is
A zero-dependency single-file HTML application served by nginx. All SQL execution is driven by the Flink SQL Gateway REST API. Results stream live into the browser. Session state is managed client-side in localStorage.
Who it's for
Streaming data engineers and platform teams who want a proper IDE experience for Apache Flink SQL — query editing, live result streaming, real-time job monitoring, checkpoint tracking, and professional reporting — all from a browser.
Open source
Created by Nestor A. A — open source project under the MIT license. Community contributions and issue reports welcome at github.com/coded-streams/strlabstudio.
Kube Operator
The Str:::lab Studio Kube Operator manages Studio deployments as first-class Kubernetes CRDs. Install via Helm from strlabstudio-operator.
⚡Quickstart
git clone https://github.com/coded-streams/strlabstudio
cd strlabstudio
docker compose up -d
open http://localhost:3030
Starts: JobManager, TaskManager(s), SQL Gateway, CORS proxy, and the Studio nginx container. On first load, a Tips & Concepts modal walks you through key features.
python3 -m http.server 8080
# Open studio/index.html — use Direct Gateway mode
helm repo add strlabstudio https://coded-streams.github.io/strlabstudio-operator/charts
helm install strlabstudio-operator strlabstudio/strlabstudio-operator \
--namespace flinksql-system --create-namespace
kubectl apply -f - <<EOF
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: my-studio
  namespace: flink
spec:
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
EOF
Connection Modes
| Mode | Best For | Authentication |
|---|---|---|
| Proxy Mode | Local Docker or Kubernetes clusters. Routes through nginx reverse proxy — zero config required. | None |
| Direct Gateway | Remote clusters, Confluent Cloud, Ververica, AWS Managed Flink. Enter gateway URL directly. | Bearer token, Basic Auth, or None |
| Admin Session 🛡 | Platform operators and administrators. Full cluster job visibility, advanced reports, cross-session oversight. | Admin passcode (default: admin1234) |
🏗Architecture
Str:::lab Studio is a static HTML file served by nginx. All SQL execution happens via the Flink SQL Gateway REST API. The browser manages all session state client-side.
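The gateway interaction reduces to three REST calls: open a session, submit a statement, then page through results. A minimal sketch in Python of the request shapes, assuming a gateway at localhost:8083 and the SQL Gateway v1 endpoint paths; the Studio issues the equivalent calls with fetch from the browser.

```python
import json

# Assumed gateway location — adjust for your deployment.
GATEWAY = "http://localhost:8083/v1"

def open_session(name="strlab-demo"):
    """POST /v1/sessions — returns (method, url, body) for opening a session."""
    return ("POST", f"{GATEWAY}/sessions", json.dumps({"sessionName": name}))

def submit_statement(session_handle, sql):
    """POST /v1/sessions/{handle}/statements — submit one SQL statement."""
    return ("POST", f"{GATEWAY}/sessions/{session_handle}/statements",
            json.dumps({"statement": sql}))

def fetch_results(session_handle, operation_handle, token=0):
    """GET .../operations/{handle}/result/{token} — page through streamed rows."""
    return ("GET",
            f"{GATEWAY}/sessions/{session_handle}/operations/{operation_handle}/result/{token}",
            None)
```

Each result page points at the next page to poll, which is how rows stream live into the Results tab until the operation reaches end-of-stream.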
State & Session Management
| State | Storage | Survives Reload? |
|---|---|---|
| Editor tabs & SQL content | localStorage (auto-saved) | ✅ Yes |
| Query history | localStorage (auto-saved) | ✅ Yes |
| Theme preference | localStorage | ✅ Yes |
| Admin passcode | localStorage (encrypted default) | ✅ Yes |
| SQL Gateway session handle | In-memory only | ❌ Reconnect required |
| Result rows | In-memory only | ❌ Re-run query |
| Job–session mapping | In-memory (session lifetime) | ❌ Lost on reload |
| Full workspace | JSON file (export/import) | ✅ Via import |
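The export/import row in the table is plain JSON serialisation of the client-side state. A hedged sketch of the round trip; the key names (version, tabs, history, theme) are illustrative, not the Studio's actual export schema:

```python
import json

def export_workspace(tabs, history, theme="dark"):
    # Hypothetical export shape — the real export file may use different keys.
    return json.dumps({"version": 1, "tabs": tabs, "history": history,
                       "theme": theme}, indent=2)

def import_workspace(blob):
    # Restores the persisted fields; in-memory state (session handle,
    # result rows) is intentionally absent and must be re-established.
    data = json.loads(blob)
    return data["tabs"], data["history"], data["theme"]
```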
✦All Features
Editor
Multi-Tab Editor
Write SQL across unlimited named tabs. Each tab has isolated editor content, results, logs, and performance metrics. Double-click to rename.
Run Selection
Highlight any portion of SQL and click ▶ Selection to run only that fragment. Essential for multi-statement tabs.
Format & Explain
Auto-format normalises whitespace. EXPLAIN Plan shows Flink's logical and physical execution plan before job submission.
Duplicate Submission Guard
The IDE tracks running INSERT INTO statements. If you attempt to submit the same pipeline twice, execution is blocked with a warning. Check the Job Graph before resubmitting.
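The guard amounts to tracking normalised statement text for running INSERT pipelines. A sketch of the idea, assuming whitespace and case normalisation; the Studio's actual matching rules may differ:

```python
import re

class SubmissionGuard:
    """Blocks resubmission of an INSERT INTO pipeline that is already running."""

    def __init__(self):
        self._running = {}  # normalised SQL -> job id

    @staticmethod
    def _normalize(sql):
        # Collapse whitespace, drop trailing semicolon, lowercase.
        return re.sub(r"\s+", " ", sql).strip().rstrip(";").lower()

    def try_submit(self, sql):
        key = self._normalize(sql)
        if key.startswith("insert into") and key in self._running:
            return False  # duplicate pipeline — warn and point at the Job Graph
        return True

    def mark_running(self, sql, job_id):
        self._running[self._normalize(sql)] = job_id

    def mark_finished(self, sql):
        self._running.pop(self._normalize(sql), None)
```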
Sessions
Session Persistence
Sessions never expire from the IDE side due to inactivity — the heartbeat keeps them alive. Only explicit disconnect or connection loss ends a session.
Load Existing Sessions
After testing the connection, the IDE fetches all existing sessions from the gateway and presents them in a dropdown — no UUID copy-paste needed.
Session-Scoped Jobs
Each regular session sees only the jobs it submitted. This prevents clutter from other users' pipelines. Admin sessions see all jobs across all sessions.
Job Tagging
Every INSERT INTO pipeline automatically tags the Flink job name with the session label via pipeline.name, making jobs traceable in the Flink UI and Str:::lab Studio.
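Conceptually the tagging is a pipeline.name SET injected ahead of the INSERT. A sketch; the exact job-name format here is an assumption based on the [session-name] prefix described elsewhere in these docs:

```python
def tag_pipeline(session_label, sql):
    """Prefix an INSERT statement with a pipeline.name SET carrying the session label."""
    if sql.lstrip().lower().startswith("insert"):
        # Hypothetical name format — mirrors the "[session-name] ..." jobs
        # visible in the Flink UI.
        set_stmt = f"SET 'pipeline.name' = '[{session_label}] Str:::lab Studio Pipeline';"
        return f"{set_stmt}\n{sql}"
    return sql  # non-INSERT statements pass through untagged
```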
Job Graph
Live DAG
Real-time directed acyclic graph of any running Flink job. Nodes show operator type, parallelism, and record counts updating live.
Node Detail Modal
Double-click any node for full metrics: status, records in/out/s, backpressure, JVM heap, subtask table. Includes a live throughput streaming chart.
Cancel Job — with Confirmation
Cancel button shows a confirmation dialog ("Are you sure?") before sending the cancel signal to the JobManager. Cannot be undone after confirmation.
Vertex Timeline
Timeline view shows each operator's state transitions (CREATED → RUNNING → FINISHED) as a Gantt-style chart. Click any bar for subtask status detail.
Performance Monitoring
Per-Query Timing
Every statement timed from submission to first result. Bar charts show relative query durations with row count and status.
Checkpoint Health
Checkpoint tab shows total/completed/failed counts, last checkpoint duration and state size, and a history sparkline. Checkpointing itself can be enabled with a single snippet from the library.
Cluster Resources
Cluster tab shows CPU cores, physical memory, JVM heap, Flink version, task manager count, and slot utilisation gauges from the JobManager REST API.
Live Throughput
Throughput tab shows records in/out per second as animated sparklines with min/max labels. Toggle ▶ Live for auto-refresh every 2 seconds.
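Each sparkline value is a rate derived from two cumulative record counters. A minimal sketch, assuming Flink's cumulative numRecordsIn/numRecordsOut metrics sampled on the 2-second Live refresh:

```python
def records_per_second(prev_total, curr_total, interval_s=2.0):
    """Rate between two cumulative counter samples taken interval_s apart."""
    # Counters only grow; a drop means the job restarted, so report 0.
    if interval_s <= 0 or curr_total < prev_total:
        return 0.0
    return (curr_total - prev_total) / interval_s
```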
Reports
Session Report (Standard)
Generate a PDF report of your session: queries executed, timing history, job list, cluster info, and checkpoint metrics. Includes the session ID and timestamp.
Admin Report — Technical
Deep technical report: all active sessions, per-session job attribution, vertex-level metrics, JVM heap and CPU stats, checkpoint durations. For engineering teams.
Admin Report — Business / Management
Self-explaining metrics report with plain-language annotations. Focuses on resource utilisation, job health percentages, and throughput summaries. No SQL jargon.
Job Graph Report
Per-job PDF report from the Job Graph tab — captures the DAG topology, vertex states, record counts, backpressure levels, and checkpoint history for a single job.
Themes & UX
Project Manager
Save, load, run, and delete named Str:::lab projects. Each project stores all editor tabs, SET configuration, catalog context, and metadata. Export to JSON for sharing or backup. Tracks storage usage per project.
UDF Manager
Browse all registered functions, register JAR-based or Python UDFs, create SQL UDFs without writing a single line of Java, and insert professional UDF templates directly into the editor.
Tips & Concepts Modal
On first connect, a paginated modal walks through IDE features, Flink architecture, window types, state, connectors, and performance tips — like IntelliJ's "Tip of the Day".
6 Themes
Dark (default), Light, Monokai, Dracula, Nord, Contrast. Preference persisted across sessions.
🛡Admin Session
The Admin Session mode provides a privileged view of the entire Flink cluster — across all users and sessions. Access it from the connect screen by selecting the Admin mode and entering the admin passcode.
Capabilities
- Full cluster job visibility — see ALL jobs running on the cluster, not just the ones submitted in this session.
- Per-session activity — the Sessions Activity modal shows each registered session's submitted jobs, query count, and running job status.
- Cross-session job attribution — jobs are tagged with their session name (visible in the Job Graph select dropdown).
- Cancel any job — admins can cancel any running job on the cluster, with a confirmation dialog.
- Admin-grade PDF reports — choose between a Technical Report (engineering detail) or Business/Management Report (self-explanatory metrics). Both carry the admin's name and session ID.
- Change admin passcode — click the 🛡 ADMIN badge in the topbar to open Admin Settings where you can update your display name and change the passcode.
The default admin passcode is admin1234. Change it immediately after first login via the Admin Settings modal (click the 🛡 badge in the topbar). The new passcode is persisted in browser localStorage.
Session isolation model
| Feature | Regular Session | Admin Session |
|---|---|---|
| Jobs visible in Job Graph | Only jobs submitted in this session | All jobs on the cluster |
| Cancel job | Only own jobs | Any job on the cluster |
| Session Activity modal | Not available | Full cross-session breakdown |
| Report type | Standard session report | Technical or Business/Management report |
| Job tagging | Tagged with session name | Tagged as admin session |
💡Tips & Concepts Modal
On first connection, Str:::lab Studio displays a paginated Tips & Concepts modal — inspired by JetBrains "Tip of the Day". It covers 16 topics across 6 categories:
| Category | Topics Covered |
|---|---|
| Getting Started | Running your first query, keyboard shortcuts |
| IDE Tips | Multi-tab workflows, run selection, workspace export, Performance tab, Job Graph |
| Flink Architecture | JobManager + TaskManagers, slots, dataflow graphs, SQL Gateway |
| Flink Concepts | Watermarks, Tumble/Hop/Session windows, state, checkpointing, exactly-once, connectors |
| Performance Tips | MiniBatch aggregation, avoiding duplicate pipeline submissions |
| Admin | Admin session capabilities and cluster visibility |
Users can navigate tips with Prev/Next buttons or click category tags to jump directly to that section. A "Don't show on startup" checkbox suppresses the modal on future connections. The modal can be reopened at any time via the topbar — the ℹ About button also includes a tips launcher.
✂Snippets Library
30+ built-in SQL templates accessible from the toolbar. Click any snippet to insert it into the active editor tab.
| Tag | Snippets |
|---|---|
| SETUP | Common Flink SQL Mistakes, Use Default Catalog, Create In-Memory Catalog, Use Hive Catalog |
| CONFIG | ⚡ Recommended Streaming Config (all-in-one), Set Runtime Mode, Set Parallelism, Set State TTL |
| DDL | Show Catalogs, Show Databases, Show Tables, Show Full Tables, Describe Table, Show Create Table, Check If Table Exists, Show Jobs |
| WINDOW | Tumble Window, Hop Window, Session Window, Cumulate Window |
| CONNECTOR | Kafka Source, Datagen Source, Elasticsearch Sink, MinIO / S3 Sink, Print Sink |
| PERF | Enable MiniBatch, Local-Global Aggregation, Idle Source Timeout |
| PATTERN | Top-N per Group (ROW_NUMBER), Deduplication, Temporal Join, Interval Join |
🔌Connector JARs
NoClassDefFoundError
Connector JAR versions must match your Flink runtime exactly. Every INSERT job will fail with a classloading error if the version is wrong.
| Flink Version | Kafka Connector JAR |
|---|---|
| 1.20.x | flink-sql-connector-kafka-3.3.0-1.20.jar |
| 1.19.x ← recommended | flink-sql-connector-kafka-3.3.0-1.19.jar |
| 1.18.x | flink-sql-connector-kafka-3.3.0-1.18.jar |
| 1.17.x | flink-sql-connector-kafka-3.2.0-1.17.jar |
| 1.16.x | flink-sql-connector-kafka-3.1.0-1.16.jar |
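The version-match rule in the table can be checked mechanically: the JAR name ends with the Flink minor version it was built for. A sketch, assuming the flink-sql-connector-kafka-&lt;connector&gt;-&lt;flink&gt;.jar naming convention shown above:

```python
import re

def jar_matches_cluster(jar_name, cluster_version):
    """True if the JAR's trailing Flink version matches the cluster's minor version."""
    m = re.search(r"-(\d+\.\d+)\.jar$", jar_name)
    if not m:
        return False  # name doesn't follow the connector naming convention
    cluster_minor = ".".join(cluster_version.split(".")[:2])
    return m.group(1) == cluster_minor
```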
.\scripts\download-connectors.ps1
# Auto-detects Flink version from running container.
# Override: .\scripts\download-connectors.ps1 -FlinkVersion 1.19.1
./scripts/download-connectors.sh
# Override: ./scripts/download-connectors.sh --flink-version 1.19.1
FROM flink:1.19.1-scala_2.12-java11
USER root
RUN apt-get update && apt-get install -y libgomp1 && rm -rf /var/lib/apt/lists/*
COPY ./connectors/*.jar /opt/flink/lib/
USER flink
🔀Full Pipeline Demo
A complete multi-stage streaming pipeline built entirely in Str:::lab Studio: datagen source → Kafka → enrichment with branching → windowed aggregation → live preview.
Session Config — Snippets → ⚡ Recommended Streaming Config
SET 'execution.runtime-mode' = 'streaming';
SET 'parallelism.default' = '1';
SET 'pipeline.operator-chaining' = 'false';
SET 'execution.checkpointing.interval' = '10000';
DDL Tab — Create all source and sink tables
CREATE TEMPORARY TABLE trade_gen (
trade_id STRING, symbol STRING, side STRING,
quantity DOUBLE, price DOUBLE, exchange STRING,
event_time TIMESTAMP(3),
WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH ('connector' = 'datagen', 'rows-per-second' = '5');
Producer Tab — Start datagen → Kafka (Job 1)
INSERT INTO trade_events_kafka
SELECT trade_id, symbol, side, quantity, price, exchange, event_time
FROM trade_gen;
Job Graph tab auto-opens. Studio tags the Flink job as [session-name] Str:::lab Studio Pipeline.
Enrichment Tab — Branching pipeline (Job 2)
EXECUTE STATEMENT SET BEGIN
INSERT INTO enriched_trades_kafka
SELECT trade_id, symbol, side,
ROUND(quantity * price, 2) AS trade_value,
CASE WHEN quantity * price >= 100000 THEN 'CRITICAL'
WHEN quantity * price >= 10000 THEN 'HIGH'
ELSE 'LOW' END AS risk_level, event_time
FROM trade_events_source;
INSERT INTO branch_alerts_sink
SELECT trade_id, symbol, ROUND(quantity * price, 2) AS trade_value, event_time
FROM trade_events_source WHERE quantity * price >= 10000;
END;
Aggregation Tab — Windowed summary (Job 3)
INSERT INTO trade_summary_kafka
SELECT window_start, window_end, symbol, side,
COUNT(*) AS trade_count,
ROUND(SUM(quantity * price), 2) AS total_volume
FROM TABLE(TUMBLE(TABLE enriched_trades_source,
DESCRIPTOR(event_time), INTERVAL '10' SECOND))
GROUP BY window_start, window_end, symbol, side;
Preview Tab — Live SELECT from Kafka (rows stream into Results)
SELECT trade_id, symbol, side, trade_value, risk_level, event_time
FROM enriched_trades_view;
Tip: the Kafka source tables set scan.startup.mode = 'earliest-offset' and a consumer group ID so that SELECT queries replay existing topic data.
☸️Str:::lab Studio Kube Operator
The Str:::lab Studio Kube Operator manages Str:::lab Studio deployments as first-class Kubernetes CRDs. Deploy and lifecycle-manage Studio instances with a single kubectl apply. Built with Kopf (Python).
Repository: github.com/coded-streams/strlabstudio-operator
CRD Reference — StrlabStudio
Short name: kubectl get fss · Group: codedstreams.io/v1alpha1
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| spec.gateway.host | string | ✅ | — | Flink SQL Gateway hostname |
| spec.gateway.port | integer | — | 8083 | SQL Gateway port |
| spec.jobmanager.host | string | — | localhost | JobManager REST hostname |
| spec.jobmanager.port | integer | — | 8081 | JobManager REST port |
| spec.image | string | — | codedstreams/strlabstudio:latest | IDE Docker image |
| spec.replicas | integer | — | 1 | Pod count (1–10) |
| spec.service.type | string | — | ClusterIP | ClusterIP, NodePort, or LoadBalancer |
| spec.service.port | integer | — | 80 | Service port |
| spec.service.nodePort | integer | — | — | NodePort (30000–32767) |
| spec.resources.requests | object | — | {cpu:50m, memory:64Mi} | Pod resource requests |
| spec.resources.limits | object | — | {cpu:200m, memory:128Mi} | Pod resource limits |
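The defaults column reads as: only gateway.host is mandatory, everything else is filled in. A sketch of that defaulting in Python, illustrative only; the operator's actual defaulting lives in its Kopf handlers:

```python
# Defaults taken from the CRD reference table above.
DEFAULTS = {
    ("gateway", "port"): 8083,
    ("jobmanager", "host"): "localhost",
    ("jobmanager", "port"): 8081,
    ("service", "type"): "ClusterIP",
    ("service", "port"): 80,
}

def with_defaults(spec):
    """Return a copy of a StrlabStudio spec with table defaults applied."""
    out = {k: dict(v) if isinstance(v, dict) else v for k, v in spec.items()}
    out.setdefault("image", "codedstreams/strlabstudio:latest")
    out.setdefault("replicas", 1)
    for (section, key), value in DEFAULTS.items():
        out.setdefault(section, {})
        out[section].setdefault(key, value)
    return out
```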
helm repo add strlabstudio https://coded-streams.github.io/strlabstudio-operator/charts
helm repo update
helm install strlabstudio-operator strlabstudio/strlabstudio-operator \
--namespace flinksql-system --create-namespace
kubectl get pods -n flinksql-system
kubectl get crd strlabstudios.codedstreams.io
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: dev-studio
  namespace: flink
spec:
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: dev-studio
  namespace: flink
spec:
  gateway:
    host: flink-sql-gateway
  jobmanager:
    host: flink-jobmanager
  service:
    type: NodePort
    nodePort: 30303
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: prod-studio
  namespace: flink
spec:
  image: codedstreams/strlabstudio:v1.0.22
  replicas: 2
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
  service:
    type: LoadBalancer
    port: 80
  resources:
    requests: {cpu: "100m", memory: "128Mi"}
    limits: {cpu: "500m", memory: "256Mi"}
🚀Deployment Options
Docker Compose
Full stack in one command — Studio, Flink cluster, SQL Gateway, Kafka.
docker compose up -d
Standalone HTML
Serve studio/index.html from any static server. Use Direct Gateway mode.
python3 -m http.server 8080
Str:::lab Studio Kube Operator
CRD-based deployment via Helm. GitOps-friendly. No PVC required.
helm install strlabstudio-operator \
strlabstudio/strlabstudio-operator
Cloud Flink (Remote)
Connect to Confluent Cloud, Ververica Cloud, or AWS Managed Flink with Direct Gateway + bearer token auth.
⬡Project Manager
The Project Manager lets you save, load, run, and organise your Str:::lab work as named projects.
Access it from the topbar via the ⬡ Projects button.
Projects are stored in browser localStorage and can be exported and imported as JSON files.
What a project stores
| Field | Description |
|---|---|
| SQL tabs | All editor tab titles and SQL content, snapshotted at save time |
| SET config | All SET ... statements from query history — runtime mode, checkpointing, mini-batch, etc. |
| Catalog / Database | Active USE CATALOG and USE context at save time |
| Metadata | Name, description, created/modified timestamps, run count, version tag |
Three tabs
Projects tab
Lists all saved projects with size, tab count, last modified, and run count. Actions per project: Load (restore to editor), ▶ Run (load + execute), ⬇ Export (download JSON), 🗑 Delete. Search bar filters by name or description.
New Project tab
Create a named project with optional description. Checkboxes control whether to snapshot current editor tabs and current SET config from history. The active project is shown in the topbar.
Storage tab
Shows a progress bar of localStorage usage (5 MB limit). Per-project storage breakdown. Buttons to export all, import from file, or clear all projects. Export before clearing.
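The storage bar boils down to summing serialised project sizes against the ~5 MB localStorage quota. A rough sketch; byte counting is approximate, since browsers store localStorage strings as UTF-16 internally:

```python
import json

QUOTA_BYTES = 5 * 1024 * 1024  # typical per-origin localStorage quota

def project_size(project):
    """Approximate stored size of one project as serialised JSON bytes."""
    return len(json.dumps(project).encode("utf-8"))

def quota_used_pct(projects):
    """Percentage of the quota consumed by all saved projects."""
    used = sum(project_size(p) for p in projects)
    return round(used / QUOTA_BYTES * 100, 2)
```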
Workflow
- Write your SQL across tabs. Run SET statements to configure the session.
- Open ⬡ Projects → + New Project. Name it, tick snapshot tabs + config, click ⬡ Create Project.
- The project appears in the list with an ACTIVE badge. The topbar shows the project name.
- Continue working. Click 💾 Save Current State (modal footer) to update the active project.
- To restore: open ⬡ Projects → Load on any project. This replaces the editor tabs.
- To share: click ⬇ on any project row to download a .json file. Import on any other Studio instance.
⨍UDF Manager
The UDF Manager is a dedicated feature for the complete User-Defined Function
lifecycle — from building to uploading to registering to using — without leaving Str:::lab Studio.
Access it from the topbar via the ⨍ UDFs button.
It has five tabs.
What are UDFs?
UDFs extend Flink SQL with custom logic that built-in functions cannot express. Flink supports four types:
Scalar (UDF)
One row in → one value out. Data masking, classification, format conversion. Can be written in pure SQL (no JAR), Java, Scala, or Python.
Table (UDTF)
One row in → many rows out. Used with CROSS JOIN LATERAL TABLE(...) to explode arrays or parse delimited fields.
Aggregate (UDAGG)
Many rows in → one value out. Maintains an accumulator per group. Supports retraction for streaming tables. Example: VWAP.
Async Table
Non-blocking external enrichment. Issues many concurrent requests without blocking the pipeline. Ideal for REST API or HBase lookups.
📚 Library Tab — Browse & Quick-Insert
Click ⟳ Refresh to run SHOW FUNCTIONS and SHOW USER FUNCTIONS
against the active session. Results split into User-Defined (blue) and Built-in.
Use the search bar and type filter. Click any function name to insert it at the cursor
in the editor with parentheses ready.
⬆ Upload JAR Tab — Deploy Directly to the Cluster
Str:::lab Studio can upload a JAR file directly to the Flink JobManager via the same REST endpoint
that the Flink Web UI uses: POST /jars/upload. No SSH, no manual file copying.
Once uploaded, the JAR is available to all TaskManagers immediately.
- Drag-and-drop your .jar file onto the drop zone, or click to browse.
- Click ⬆ Upload JAR to Flink Cluster — a live progress bar tracks the upload.
- The JARs on cluster list refreshes automatically. Click Use → on any entry to pre-fill the Register tab.
- Optionally delete JARs from the cluster using the Delete button on each row.
The JobManager REST API (default port 8081) must be proxied at /jobmanager-api/ in your nginx config. If you see a 404 error, check that the proxy is configured. Self-hosted Docker clusters using the included docker-compose.yml have this configured automatically.
⬡ Maven / Gradle Tab — Dependency & Build Config
Maven and Gradle resolve your UDF's dependencies and package everything into a single deployable JAR. This tab generates the correct build configuration and commands so you never have to guess at dependency scopes or version strings.
<dependencies>
<!--
Flink APIs — scope="provided"
These JARs are already on the cluster. DO NOT bundle them.
"provided" means: compile against them, but exclude from the output JAR.
-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-common</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Your own third-party deps — these ARE bundled (default scope) -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>32.1.3-jre</version>
</dependency>
</dependencies>
<!-- Maven Shade Plugin: creates the shaded (fat) JAR -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.5.1</version>
<executions>
<execution>
<phase>package</phase>
<goals><goal>shade</goal></goals>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>shaded</shadedClassifierName>
</configuration>
</execution>
</executions>
</plugin>
<!-- Build command: -->
<!-- mvn clean package -DskipTests -->
<!-- Output: target/my-udfs-1.0.0-shaded.jar -->
plugins {
id 'java'
// Shadow plugin: creates the fat JAR
id 'com.github.johnrengelman.shadow' version '8.1.1'
}
dependencies {
/*
* compileOnly = Maven "provided"
* These JARs are already on the Flink cluster.
* DO NOT use 'implementation' for Flink APIs — it bundles them
* into your JAR and causes ClassLoader conflicts.
*/
compileOnly "org.apache.flink:flink-table-api-java:${flinkVersion}"
compileOnly "org.apache.flink:flink-table-common:${flinkVersion}"
// Your own third-party deps — use 'implementation' (bundled)
implementation 'com.google.guava:guava:32.1.3-jre'
}
shadowJar {
archiveClassifier = 'shaded'
mergeServiceFiles()
exclude 'META-INF/*.SF', 'META-INF/*.DSA', 'META-INF/*.RSA'
}
// Build command:
// ./gradlew shadowJar
// Output: build/libs/my-udfs-1.0.0-shaded.jar
| Concept | Explanation |
|---|---|
provided / compileOnly | Flink's own JARs (flink-table-api-java, flink-streaming-java, etc.) exist on every TaskManager. Mark them provided so they compile your code but are not bundled into the output JAR. Bundling them causes ClassCastException at runtime. |
| Shaded / fat JAR | Your UDF JAR must contain all third-party dependencies (e.g. Guava, Jackson, HTTP client) inside a single file. Maven Shade Plugin and Gradle Shadow Plugin achieve this. The cluster just needs one file. |
| Relocation | If two JARs on the classpath contain the same class (dependency conflict), use Maven Shade <relocation> or Gradle Shadow relocate() to rename one package. Common with Guava and Jackson. |
| Flink version match | The flink-table-api-java version in your pom.xml must match the cluster's Flink version exactly. A mismatch causes AbstractMethodError or silent wrong behaviour. |
| Python UDFs | Python UDFs don't use Maven or Gradle. They require PyFlink installed on all TaskManagers (pip install apache-flink==<version>). Register them with LANGUAGE PYTHON pointing to a module path. |
The tab generates a ready-to-use pom.xml or build.gradle — fill in your group ID, artifact ID, Flink version, Java version, and any extra dependencies, then copy the config and the build commands. The tab explains exactly which file to upload after building.
+ Register Tab — Activate a JAR Function
After uploading a JAR (via the Upload JAR tab or manually), use this tab to write the
CREATE FUNCTION statement that makes the class available as a SQL function.
- Enter the function name (what you will call in SQL).
- Enter the fully-qualified class path (e.g. com.yourcompany.udf.MaskEmail).
- Choose Language: Java, Scala, or Python.
- Choose Scope: Temporary (session only), Temporary System (all catalogs), or Permanent (catalog-persisted).
- Click ⚡ Execute Registration to send the CREATE FUNCTION to the gateway, or Insert into Editor to review first.
✎ SQL UDF Tab — No JAR, No Java
Write a UDF entirely in Flink SQL. No JAR, no build step. Supported in Flink 1.17+.
CREATE TEMPORARY FUNCTION classify_risk(score DOUBLE)
RETURNS STRING LANGUAGE SQL AS $$
CASE
WHEN score >= 0.8 THEN 'CRITICAL'
WHEN score >= 0.5 THEN 'WARNING'
WHEN score >= 0.2 THEN 'LOW'
ELSE 'NORMAL'
END
$$;
SELECT event_id, score, classify_risk(score) AS tier
FROM risk_events;
CREATE TEMPORARY FUNCTION fmt_bytes(b BIGINT)
RETURNS STRING LANGUAGE SQL AS $$
CASE
WHEN b >= 1073741824 THEN CONCAT(CAST(b/1073741824 AS STRING), ' GB')
WHEN b >= 1048576 THEN CONCAT(CAST(b/1048576 AS STRING), ' MB')
WHEN b >= 1024 THEN CONCAT(CAST(b/1024 AS STRING), ' KB')
ELSE CONCAT(CAST(b AS STRING), ' B')
END
$$;
SELECT node_id, fmt_bytes(bytes_transferred) AS readable_size
FROM network_events;
CREATE TEMPORARY FUNCTION safe_region(region STRING, fallback STRING)
RETURNS STRING LANGUAGE SQL AS $$
CASE
WHEN region IS NULL OR TRIM(region) = '' THEN fallback
ELSE UPPER(TRIM(region))
END
$$;
SELECT event_id, safe_region(region, 'UNKNOWN') AS normalised_region
FROM events;
⊞ Templates Tab — Production-Ready Examples
| Group | Templates included |
|---|---|
| Scalar Functions | SQL scalar (no JAR), Java scalar with open/close lifecycle, Python scalar with PyFlink decorator |
| Table Functions (UDTF) | Java SplitTags CSV exploder, SQL JSON array UNNEST pattern |
| Aggregate Functions (UDAGG) | Volume-weighted average price (VWAP) with full accumulator and retraction support |
| Async & Lookup Functions | Non-blocking async HTTP enrichment UDF, temporal lookup join with JDBC caching |
| Utility & Best Practices | SHOW FUNCTIONS reference, temporary vs permanent scope guide, full best-practices template |
Complete JAR UDF Workflow
- Write your function class extending ScalarFunction, TableFunction, or AggregateFunction.
- Generate the build config in the Maven / Gradle tab and fill in your pom.xml or build.gradle.
- Build the shaded JAR with mvn clean package -DskipTests or ./gradlew shadowJar.
- Upload the shaded JAR via the Upload JAR tab.
- Register the class with CREATE FUNCTION via the Register tab.
Demo — Fraud Detection Pipeline with SQL UDFs
The UDF Manager ships with a complete demo SQL file: udf-fraud-detection-demo.sql. It demonstrates all five tabs using a real-time payment fraud detection scenario — 50 transactions per second, 7 SQL UDFs, 4 running Flink jobs, 5 Kafka topics.
| Tab | Content | UDFs used |
|---|---|---|
| Tab 1 — Setup | Session config: runtime mode, checkpointing, mini-batch | — |
| Tab 2 — Register UDFs | 7 CREATE TEMPORARY FUNCTION statements (SQL, no JAR) | All 7 defined here |
| Tab 3 — Tables | datagen source + 4 Kafka sinks + 1 Kafka source (cleansed) | — |
| Tab 4 — Verify | SHOW USER FUNCTIONS + live datagen SELECT with all UDFs | All 7 previewed |
| Tab 5 — Pipelines 1 & 2 | PII masking → Kafka; risk classification → Kafka | mask_card, mask_email, normalize_currency, classify_risk, amount_band, velocity_flag |
| Tab 6 — Pipelines 3 & 4 | TUMBLE(1 min) fraud KPI window; SLA scoring pipeline | classify_risk, amount_band, sla_tier, velocity_flag |
| Tab 7 — Live Monitor | All 7 UDFs in one SELECT — enable ✦ Colour Describe for row coloring | All 7 applied |
-- Register all 7 SQL UDFs — no JAR, no Java required
CREATE TEMPORARY FUNCTION classify_risk(score DOUBLE)
RETURNS STRING LANGUAGE SQL AS $$
CASE WHEN score >= 0.80 THEN 'CRITICAL'
WHEN score >= 0.55 THEN 'HIGH'
WHEN score >= 0.30 THEN 'MEDIUM'
ELSE 'LOW' END $$;
CREATE TEMPORARY FUNCTION mask_card(card STRING)
RETURNS STRING LANGUAGE SQL AS $$
CASE WHEN card IS NULL OR CHAR_LENGTH(card) < 8 THEN '****-****-****-****'
ELSE CONCAT(SUBSTRING(card,1,4),'-****-****-',
SUBSTRING(card,CHAR_LENGTH(card)-3,4)) END $$;
CREATE TEMPORARY FUNCTION mask_email(email STRING)
RETURNS STRING LANGUAGE SQL AS $$
CASE WHEN email IS NULL OR POSITION('@' IN email) <= 2 THEN '**@***.***'
ELSE CONCAT(SUBSTRING(email,1,2),'***',
SUBSTRING(email,POSITION('@' IN email))) END $$;
CREATE TEMPORARY FUNCTION amount_band(amt DOUBLE)
RETURNS STRING LANGUAGE SQL AS $$
CASE WHEN amt >= 10000 THEN 'WHALE'
WHEN amt >= 2500 THEN 'LARGE'
WHEN amt >= 500 THEN 'MEDIUM'
WHEN amt >= 50 THEN 'SMALL'
ELSE 'MICRO' END $$;
-- Verify all 7 are registered:
SHOW USER FUNCTIONS;
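The masking UDFs translate line-for-line into ordinary code, which is handy for unit-testing the thresholds before registering them. A Python re-implementation of mask_card and mask_email as a sketch mirroring the SQL above (note SQL POSITION is 1-based):

```python
def mask_card(card):
    # Mirrors the SQL: NULL or too short -> fully masked placeholder.
    if card is None or len(card) < 8:
        return "****-****-****-****"
    return f"{card[:4]}-****-****-{card[-4:]}"

def mask_email(email):
    # SQL POSITION('@' IN email) <= 2 covers: '@' missing, first, or second char.
    at = -1 if email is None else email.find("@")
    if email is None or at + 1 <= 2:
        return "**@***.***"
    return f"{email[:2]}***{email[at:]}"
```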
-- Pipeline 1: Cleanse raw events — mask PII, normalise fields
-- UDFs: mask_card(), mask_email(), normalize_currency()
INSERT INTO cleansed_sink
SELECT
txn_id,
customer_id,
mask_card(card_number) AS card_masked,
mask_email(email) AS email_masked,
ROUND(amount_usd, 2) AS amount_usd,
normalize_currency(currency_raw) AS currency,
merchant_id,
CASE merchant_country
WHEN 0 THEN 'US' WHEN 1 THEN 'EU'
WHEN 2 THEN 'UK' WHEN 3 THEN 'APAC'
WHEN 4 THEN 'LATAM' ELSE 'MEA'
END AS merchant_country,
CASE channel_raw
WHEN 0 THEN 'ONLINE' WHEN 1 THEN 'MOBILE'
WHEN 2 THEN 'POS' ELSE 'ATM'
END AS channel,
ROUND(risk_score, 4) AS risk_score,
latency_ms,
event_ts
FROM raw_transactions;
-- Pipeline 3: TUMBLE(1 min) fraud KPI aggregation
-- UDFs used inside COUNT(CASE ...) aggregations
INSERT INTO fraud_kpi_sink
SELECT
TUMBLE_START(event_ts, INTERVAL '1' MINUTE) AS window_start,
TUMBLE_END(event_ts, INTERVAL '1' MINUTE) AS window_end,
merchant_country,
channel,
COUNT(*) AS total_txns,
ROUND(SUM(amount_usd), 2) AS total_volume_usd,
ROUND(AVG(risk_score), 4) AS avg_risk_score,
-- UDF inside aggregation: count CRITICAL transactions per window
COUNT(CASE WHEN classify_risk(risk_score) = 'CRITICAL' THEN 1 END) AS critical_count,
COUNT(CASE WHEN classify_risk(risk_score) = 'HIGH' THEN 1 END) AS high_count,
-- UDF inside aggregation: count whale transactions per window
COUNT(CASE WHEN amount_band(amount_usd) = 'WHALE' THEN 1 END) AS whale_count,
ROUND(
CAST(COUNT(CASE WHEN classify_risk(risk_score) IN ('CRITICAL','HIGH') THEN 1 END) AS DOUBLE)
/ NULLIF(COUNT(*), 0) * 100.0
, 2) AS fraud_rate_pct
FROM cleansed_source
GROUP BY TUMBLE(event_ts, INTERVAL '1' MINUTE), merchant_country, channel;
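The fraud_rate_pct expression is a guarded percentage: flagged transactions (risk tier CRITICAL or HIGH, i.e. risk_score >= 0.55 under classify_risk above) over total, with NULLIF preventing division by zero. The same arithmetic in Python for sanity-checking:

```python
def fraud_rate_pct(risk_scores):
    """Percentage of transactions classified CRITICAL or HIGH in one window."""
    flagged = sum(1 for s in risk_scores if s >= 0.55)  # CRITICAL >= 0.80, HIGH >= 0.55
    total = len(risk_scores)
    if total == 0:
        return 0.0  # the SQL's NULLIF(COUNT(*), 0) would yield NULL here instead
    return round(flagged / total * 100.0, 2)
```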
-- Tab 7: All 7 UDFs in a single live SELECT
-- Enable "✦ Colour Describe" — risk_tier and sla_tier drive row coloring:
-- 🔴 Red = CRITICAL 🟡 Yellow = HIGH/MEDIUM 🟢 Green = LOW
SELECT
txn_id,
customer_id,
card_masked,
ROUND(amount_usd, 2) AS amount_usd,
currency,
merchant_country,
channel,
ROUND(risk_score, 3) AS risk_score,
classify_risk(risk_score) AS risk_tier,
amount_band(amount_usd) AS amount_band,
velocity_flag(latency_ms) AS velocity,
sla_tier(latency_ms, risk_score) AS sla_tier,
latency_ms,
event_ts
FROM cleansed_source;
📝Changelog
- Brand rename — product renamed from FlinkSQL Studio to Str:::lab Studio across all UI, docs, localStorage keys (strlabstudio_*), export filenames, and Kubernetes CRDs; Apache Flink trademark attribution added to footer
- PDF report heading fixed — window.open interceptor rewrites any residual brand references in the generated PDF before printing, including the report H1 heading and all inline text
- LinkedIn banner — 1200×627px SVG brand image generated for social media (Str:::lab Studio wordmark with feature pills and mock terminal)
- ⬡ Project Manager — create, load, run, save, and delete named Str:::lab projects; stores SQL tabs, SET config, catalog context, run count; export/import as JSON; per-project storage breakdown; name uniqueness enforced on create and import
- Tooltips on all Project Manager buttons, checkboxes, and tab headers
- JAR Upload fix — FormData now sends application/x-java-archive Content-Type (fixes Flink HTTP 500); configurable JobManager URL override field for non-standard ports
- Maven / Gradle tab — generates complete pom.xml / build.gradle with correct provided/compileOnly scopes, shade/shadow plugin, and annotated build commands
- ⬡ Project Manager — create, load, run, save, and delete Str:::lab projects; each project stores SQL tabs, SET config, catalog/database context, and metadata; export/import as JSON; storage usage tracking with per-project breakdown
- JAR Upload fix — FormData now sends application/x-java-archive Content-Type on the jarfile part (fixes HTTP 500 from Flink JobManager); configurable JobManager URL override field added for direct/remote clusters on non-standard ports
- _getJmBase() rewrite — correctly resolves JobManager URL in proxy mode (nginx swap), direct mode (port swap to 8081), and remote mode (user override field); no longer crashes with red error on tab open when not connected
- Maven / Gradle tab — generates complete pom.xml / build.gradle with correct provided/compileOnly scopes, shade/shadow plugin, and build commands; explains why Flink JARs must not be bundled
- Admin Session — privileged connect mode with full cluster job visibility, admin badge in topbar, session activity modal, technical and business/management PDF reports
- Admin passcode management — default admin1234, changeable via Admin Settings modal after connecting
- Job scoping — regular sessions only see their own jobs; admin sees all cluster jobs with session attribution
- Duplicate submission guard — INSERT INTO / pipeline queries blocked if same statement already running
- Job tagging — pipeline.name SET injected automatically before INSERT statements with session label prefix
- Cancel job confirmation modal — explicit "Are you sure?" dialog before sending cancel signal to JobManager
- Tips & Concepts modal — 16 tips across 6 categories: IDE features, Flink architecture, window types, state, connectors, performance, admin
- Load existing sessions — connect screen auto-fetches sessions from gateway after Test Connection for dropdown selection
- Session persistence — heartbeat keeps sessions alive indefinitely; no idle expiry unless explicitly disconnected
- Results display fixed — NOT_READY / EOS handling; Kafka SELECT rows reliably stream into Results tab
- Mouse wheel zoom on Job Graph (20%–300%) with zoom-toward-cursor and % indicator
- Catalog context fix — sidebar refresh no longer hijacks active USE CATALOG context
- Named workspace export, session isolation, Cancel Job button, full session export
- ⚡ Recommended Streaming Config snippet, INSERT auto-redirect to Job Graph
- Job Graph DAG — drag-to-pan, animated edges, fault highlighting, node double-click detail
- Tab rename (double-click), per-session workspace isolation
- Initial build: SQL editor, session management, Results / Log / Operations tabs
- Snippet library, query history, catalog browser, performance tracking panel
- Dark/light theme toggle, workspace export/import, and the first Job Graph tab for visual pipeline monitoring