About Str:::lab Studio
Str:::lab Studio is a browser-based SQL IDE that connects directly to any Apache Flink SQL Gateway. It provides a complete development, monitoring, and administration experience — without the overhead of a full platform deployment. It works with local Docker, Kubernetes, and any self-hosted or managed Flink cluster.
What it is
A zero-dependency, single-file HTML application served by nginx. All SQL execution is driven by the Flink SQL Gateway REST API. Results stream live into the browser. Session state is managed client-side in localStorage.
Who it is for
Streaming data engineers and platform teams who want a proper IDE experience for Apache Flink SQL — query editing, live result streaming, real-time job monitoring, checkpoint tracking, ML inference pipelines, and professional reporting.
Open source
Created by Nestor A. A — open source under the MIT license. Community contributions and issue reports welcome at github.com/coded-streams/strlabstudio.
Kube Operator
The Str:::lab Studio Kube Operator manages Studio deployments as first-class Kubernetes CRDs. Install via Helm from strlabstudio-operator.
Quickstart
git clone https://github.com/coded-streams/strlabstudio
cd strlabstudio
docker compose up -d
open http://localhost:3030
Starts: JobManager, TaskManager(s), SQL Gateway, CORS proxy, and the Studio nginx container. On first load, a Tips & Concepts modal walks through key features.
python3 -m http.server 8080
# Open studio/index.html — use Direct Gateway mode
helm repo add strlabstudio https://coded-streams.github.io/strlabstudio-operator/charts
helm install strlabstudio-operator strlabstudio/strlabstudio-operator \
--namespace flinksql-system --create-namespace
kubectl apply -f - <<EOF
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: my-studio
  namespace: flink
spec:
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
EOF
Connection Modes
| Mode | Best For | Authentication |
|---|---|---|
| Proxy Mode | Local Docker or Kubernetes clusters. Routes through nginx — zero config required. | None |
| Direct Gateway | Remote clusters, Confluent Cloud, Ververica, AWS Managed Flink. Enter gateway URL directly. | Bearer token, Basic Auth, or None |
| Confluent Cloud | Managed Flink SQL on Confluent. Enter Org ID, Env ID, region, API key + secret. | Basic (API key : secret) |
| Ververica | VVP self-managed, Ververica Cloud, or BYOC. Enter base URL, namespace, API token. | Bearer token |
| AWS Managed Flink | Amazon Managed Service for Apache Flink. Studio notebooks via Zeppelin. | IAM / Access Keys |
| Admin Session | Platform operators. Full cluster job visibility, cross-session oversight, advanced reports. | Admin passcode (default: admin1234) |
Architecture
Str:::lab Studio is a static HTML application served by nginx. All SQL execution happens via the Flink SQL Gateway REST API. The browser manages all session state client-side.
State Management
| State | Storage | Survives Reload |
|---|---|---|
| Editor tabs and SQL content | localStorage (auto-saved) | Yes |
| Query history | localStorage | Yes |
| Theme preference | localStorage | Yes |
| Feature Engineering history and output schema | localStorage (strlabstudio_fem_history, strlabstudio_fem_state) | Yes |
| Inference Manager history | localStorage (strlabstudio_ifm_history) | Yes |
| Pipeline Manager saved pipelines | localStorage (strlabstudio_pipelines) | Yes |
| SQL Gateway session handle | In-memory only | No — reconnect required |
| Result rows and profiler recordings | In-memory only | No |
| Full workspace | JSON file (export/import) | Yes, via import |
All Features
Editor
Multi-Tab Editor
Unlimited named tabs with isolated editor content, results, logs, and performance metrics per tab. Double-click to rename.
Run Selection
Highlight any SQL fragment and click Run Selection to execute only that portion. Essential for multi-statement files.
Format and Explain
Auto-format normalises whitespace. EXPLAIN Plan shows the logical and physical execution plan before job submission.
Duplicate Submission Guard
Tracks running INSERT INTO statements. Resubmitting the same pipeline is blocked with a warning until the previous job is cancelled.
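The guard can be thought of as a map from normalized SQL text to the running job. A minimal sketch in Python (hypothetical names; Studio's actual in-browser implementation is JavaScript and may differ):

```python
import re

class SubmissionGuard:
    """Tracks running INSERT INTO statements by normalized SQL text."""

    def __init__(self):
        self._running = {}  # normalized SQL -> job id

    @staticmethod
    def _normalize(sql: str) -> str:
        # Collapse whitespace and case so trivial edits still match.
        return re.sub(r"\s+", " ", sql).strip().lower().rstrip(";")

    def try_submit(self, sql: str, job_id: str) -> bool:
        key = self._normalize(sql)
        if key in self._running:
            return False  # duplicate: warn until the previous job is cancelled
        self._running[key] = job_id
        return True

    def on_cancelled(self, sql: str) -> None:
        self._running.pop(self._normalize(sql), None)
```

Resubmitting the same pipeline with different whitespace or casing still hits the guard, which matches the behaviour described above.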
Sessions
Session Persistence
Sessions never expire from the IDE side — a heartbeat keeps them alive. Only explicit disconnect or connection loss ends a session.
Load Existing Sessions
After testing the connection, the IDE fetches all existing sessions from the gateway and presents them in a dropdown — no UUID copy-paste needed.
Job Tagging
Every INSERT INTO pipeline automatically tags the Flink job name with the session label via pipeline.name, making jobs traceable in the Flink UI.
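A sketch of how such a tag might be composed, assuming a `SET 'pipeline.name'` statement is prepended before each INSERT (the label format and helper name are illustrative, not Studio's exact code):

```python
def pipeline_name_stmt(session_label: str, statement_no: int) -> str:
    """Build a SET statement that tags the next Flink job with the
    session label. Single quotes in the label are doubled, as SQL
    string literals require."""
    name = f"{session_label} #{statement_no}".replace("'", "''")
    return f"SET 'pipeline.name' = '{name}';"
```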
Session Isolation
Each regular session sees only its own jobs. Admin sessions see all jobs across all sessions on the cluster.
Job Graph
Live DAG
Real-time directed acyclic graph of any running Flink job. Nodes show operator type, parallelism, and record counts updating live.
Node Detail Modal
Double-click any node for full metrics: status, records in/out/s, backpressure, JVM heap, subtask table, and a live throughput streaming chart.
Vertex Timeline
Gantt-style chart showing each operator's state transitions (CREATED → RUNNING → FINISHED). Click any bar for subtask status detail.
Cancel Job
Cancel button shows a confirmation dialog before sending the cancel signal to the JobManager. Admin sessions can cancel any job on the cluster.
Performance Monitoring
Per-Query Timing
Every statement timed from submission to first result. Bar charts show relative query durations with row count and status.
Live Throughput
Records in/out per second as animated sparklines with min/max labels. Toggle Live for auto-refresh every 2 seconds.
Checkpoint Health
Shows total/completed/failed counts, last checkpoint duration and state size, and a history sparkline.
Cluster Resources
CPU cores, physical memory, JVM heap, Flink version, task manager count, and slot utilisation gauges from the JobManager REST API.
Snippets Library
30+ built-in SQL templates accessible from the toolbar. Click any snippet to insert into the active editor tab.
| Category | Snippets |
|---|---|
| Setup | Common Flink SQL Mistakes, Use Default Catalog, Create In-Memory Catalog, Use Hive Catalog |
| Config | Recommended Streaming Config (all-in-one), Set Runtime Mode, Set Parallelism, Set State TTL |
| DDL | Show Catalogs, Show Databases, Show Tables, Describe Table, Show Create Table, Show Jobs |
| Window | Tumble Window, Hop Window, Session Window, Cumulate Window |
| Connector | Kafka Source, Datagen Source, Elasticsearch Sink, MinIO / S3 Sink, Print Sink |
| Perf | Enable MiniBatch, Local-Global Aggregation, Idle Source Timeout |
| Pattern | Top-N per Group, Deduplication, Temporal Join, Interval Join |
Project Manager
Save, load, run, and organise named Str:::lab projects via Projects in the topbar. Each project stores SQL tabs, SET config, catalog context, run count, and timestamps. Export and import as JSON files.
UDF Manager
Complete UDF lifecycle management: browse registered functions and views, upload JARs directly to the cluster, register Java/Scala/Python functions (LANGUAGE SQL does not exist in Flink and has been removed), build SQL expressions via the View Builder, generate Maven/Gradle configs, and insert professional UDF templates. v2 fixes: SHOW FUNCTIONS now returns all rows, library search spans both Views and UDFs, and View Builder Insert Expression works correctly when the form is filled.
Colour Describe
User-defined live row highlighting. Build rules (field + operator + value + color + style mode) that are applied to streaming result rows in real time. Operators: == != > >= < <=, contains, starts, ends, regex. Style modes: row background, left border accent, text color. Rules are evaluated top-to-bottom — first match wins. A color legend is rendered below the table with a Turn off button. Report integration included.
Pipeline Manager v2.1
Visual drag-and-drop Flink SQL pipeline builder. Drag operators from the palette onto the canvas, connect them, configure parameters inline, and get production-ready Flink SQL generated automatically. Access via Managers → Pipeline Manager.
Operator Palette
| Group | Operators |
|---|---|
| Sources | Kafka, Datagen, JDBC, Filesystem/S3, Pulsar, Kinesis |
| Transformations | Filter, Project, UDF Map, Lookup Enrich, Union, Split/Route |
| Windows | Tumble, Hop, Session, Cumulate |
| Aggregations | Group Agg, Dedup, Top-N |
| Joins | Interval Join, Temporal Join, Regular Join |
| CEP | MATCH_RECOGNIZE, CEP Alert |
| Sinks | Kafka, JDBC, Filesystem, Elasticsearch, Print, Blackhole, MongoDB, Kinesis |
| Output | Results Tab, AI Model Node |
| My UDFs | UDF Function (dropdown populated from UDF Manager), Feature Store |
Edge Types
| Type | Description | Use When |
|---|---|---|
| FORWARD | Same parallelism, no network shuffle. Default and most efficient. | Consecutive operators at the same parallelism |
| HASH | Records shuffled by key hash. All records with the same key go to the same subtask. | Before GROUP BY, JOIN, DEDUP |
| REBALANCE | Round-robin distribution. Breaks key affinity. | Fixing hotspots before stateless sinks |
| BROADCAST | Every record sent to every downstream subtask. | Small reference datasets for broadcast joins |
| RESCALE | Local round-robin — stays within the same TaskManager where possible. | Scaling up parallelism cheaply |
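To see why a HASH edge keeps keyed state local, consider a toy key-to-subtask routing sketch. Flink actually murmur-hashes the serialized key into key groups; this Python version only illustrates the invariant that one key always lands on one subtask:

```python
import zlib

def subtask_for(key, parallelism: int) -> int:
    """Stable hash of the key -> subtask index in [0, parallelism).

    Deterministic routing is what lets GROUP BY / JOIN / DEDUP keep
    all state for a key on a single subtask."""
    return zlib.crc32(str(key).encode()) % parallelism

# Every record for user 42 is routed to the same subtask:
targets = {subtask_for(42, 4) for _ in range(1000)}
```

REBALANCE deliberately breaks this invariant (round-robin), which is why it is only safe before stateless operators and sinks.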
Canvas Controls
- Drag nodes from the palette onto the canvas. Nodes can be freely repositioned by dragging.
- Connect nodes by dragging from an output port (right edge) to an input port (left edge).
- Configure by double-clicking a node — the config modal opens pinned beside the node.
- Pan by dragging the canvas background. Zoom with the mouse wheel.
- Auto Layout arranges all nodes in a clean left-to-right topology.
- Animate with the Run button — particles flow along edges for RUNNING pipelines.
Feature Engineering Manager New
A six-step guided wizard for building real-time Flink SQL feature pipelines from streaming data. Supports Schema Registry integration or manual column entry. Generates production-ready Flink SQL with per-column transforms, multiple window aggregations, configurable sinks, and a fully interactive pipeline canvas. Access via Managers → Feature Engineering.
Six-Step Wizard
Data Source
Choose Manual Column Entry or Schema Registry. For Schema Registry, connect to any Confluent, Apicurio, Karapace, or AWS Glue SR endpoint, browse subjects, and import the field schema in one click. Enter the source table name and connector type (Kafka, Datagen, JDBC, Filesystem, Kinesis, Pulsar).
Feature Column Selection
Toggle which columns from the source schema to include. Time and timestamp columns are highlighted for watermark assignment. Configure the event time column and out-of-orderness delay. Numeric-only quick-select available.
Per-Column Transformations
Configure a transformation per column. Available transforms: Pass-through, CAST, Math expression, COALESCE, UPPER/LOWER/TRIM, SUBSTRING, ROUND, ABS, LN, Bucketing / CASE WHEN, MD5 hash encode, DATE_FORMAT, UNIX_TIMESTAMP, FROM_UNIXTIME, IFNULL, LAG, Min-Max Normalise, and Custom SQL expression. Each transform produces a correctly typed output column in the generated SQL.
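Conceptually, each transform lowers to a SQL expression over the source column. An illustrative generator for a few of the transforms (hypothetical helper, covering only a subset of the list above):

```python
def transform_expr(transform: str, col: str, **p) -> str:
    """Lower a per-column transform to a Flink SQL expression string."""
    if transform == "pass":
        return col
    if transform == "cast":
        return f"CAST({col} AS {p['to_type']})"
    if transform == "round":
        return f"ROUND({col}, {p.get('digits', 2)})"
    if transform == "upper":
        return f"UPPER({col})"
    if transform == "md5":
        # Hash encode: MD5 takes a STRING, so non-string columns are cast.
        return f"MD5(CAST({col} AS STRING))"
    if transform == "minmax":
        # Min-max normalise to [0, 1] given known bounds.
        lo, hi = p["min"], p["max"]
        return f"({col} - {lo}) / ({hi} - {lo})"
    raise ValueError(f"unknown transform: {transform}")
```

Each expression is then emitted with an alias in the generated SELECT list, with the output type following the transform (e.g. MD5 yields STRING, min-max yields DOUBLE).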
Windows and Aggregations
Add one or more window aggregations: TUMBLE (fixed), HOP (sliding), SESSION (activity-based), or CUMULATE (running total). Each window requires a time column and aggregation expressions. A keyless GROUP BY aggregation (no window) can be added separately. Multiple windows of different sizes are fully supported — common for feature stores modelling 1-min, 5-min, and 1-hour features simultaneously.
Output Sink
Configure the output table name and sink connector. Supported sinks: Kafka (with SASL/SSL and Schema Registry), JDBC (PostgreSQL/MySQL/Redshift), Filesystem/S3/GCS/HDFS, Apache Iceberg, Apache Hudi, Elasticsearch/OpenSearch, Redis, Feast Feature Store, Print, Blackhole. All credentials entered in this step are emitted directly into the generated SQL — no placeholder values.
Review and SQL
Side-by-side view: an interactive pipeline canvas on the left and the complete generated Flink SQL on the right. The canvas is fully zoomable (scroll wheel), pannable (drag), and collapsible to full-screen (Maximize button). Double-click the canvas background to fit the view. The pipeline is automatically saved to history with its full output column schema for use by the Inference Manager.
Pipeline Canvas
The Review step renders an SVG-based pipeline canvas showing all four stages: Source feature columns, Transformations (with operator icons), Windows/Aggregations, and the Output node. The canvas supports:
- Scroll to zoom — mouse wheel zooms toward the cursor position, range 15%–400%
- Drag to pan — click and drag moves the viewport freely in all directions
- Fit to view — the Fit button (and double-click on background) auto-centres and scales all content
- Maximize/restore — the expand button takes the canvas to full-screen (fixed overlay) and restores it cleanly
- Zoom label — shows current zoom percentage in the toolbar
History
Every completed pipeline is saved to localStorage with the source table name, output table name, sink type, column counts, window counts, the full generated SQL, and the complete output column schema (with correctly inferred types per transformation). The History button shows up to 20 entries with an Insert SQL button per entry. History can be cleared individually or entirely.
Validation
Each step enforces required fields before allowing progression. Required fields vary by step: source table name and at least one column are required at Step 1; window time column and aggregations are required at Step 4; output table name is required at Step 5. A red validation banner appears at the top of the wizard with the specific error. Navigation via step numbers respects the same validation rules.
Generated SQL Structure
-- Feature Engineering Pipeline
-- Source: transactions_raw Sink: user_features_out
-- Generated by Str:::lab Studio — Feature Engineering Manager
SET 'execution.runtime-mode' = 'streaming';
SET 'parallelism.default' = '4';
SET 'execution.checkpointing.interval' = '10000';
SET 'table.exec.state.ttl' = '3600000';
CREATE TEMPORARY TABLE IF NOT EXISTS transactions_raw (
  user_id BIGINT,
  merchant_id BIGINT,
  amount DOUBLE,
  currency STRING,
  status STRING,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'transactions',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.group.id' = 'feature-pipeline-group',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);
-- Window 1: TUMBLE — 5 MINUTE feature aggregation
INSERT INTO user_features_out
SELECT
  user_id,
  window_start,
  window_end,
  COUNT(*) AS tx_count_5m,
  SUM(amount) AS total_amount_5m,
  AVG(amount) AS avg_amount_5m
FROM TABLE(TUMBLE(TABLE transactions_raw, DESCRIPTOR(event_time), INTERVAL '5' MINUTE))
GROUP BY user_id, window_start, window_end;
-- Direct feature transformation stream (no aggregation)
INSERT INTO user_features_out
SELECT
  user_id,
  ROUND(amount, 2) AS amount,
  MD5(CAST(user_id AS STRING)) AS user_id_hash,
  UPPER(status) AS status,
  CASE
    WHEN amount < 100 THEN 'bucket_0'
    WHEN amount < 500 THEN 'bucket_1'
    ELSE 'bucket_2'
  END AS amount_bucket
FROM transactions_raw;
AI Model Inference Manager New
A seven-step guided wizard for configuring real-time ML model inference on Flink streaming pipelines. Supports 25 model server types across five categories. Generates production-ready Flink SQL with async UDF patterns, all credentials embedded, and server-specific configuration documented as structured comments. Access via Managers → Inference Manager.
Source Table — Schema Auto-Population
Step 1 presents a combined table selector: a dropdown listing all known tables (Feature Engineering outputs first, then session tables), quick-pill shortcuts, a manual text input, and a From FEM button. Selecting a table auto-populates the schema using three strategies in order of richness:
- Reads `outputColumns` directly from the FEM history entry — the richest source, set by the Feature Engineering Manager with correctly typed columns per transformation
- Falls back to the `strlabstudio_fem_state` snapshot written at FEM pipeline save time
- Falls back to parsing the `CREATE TABLE` DDL from the stored SQL string using a regex matched against the exact table name
Loaded columns are displayed as interactive tags and simultaneously populate the textarea for manual editing. If no FEM history is found for the selected table, a yellow indicator prompts the user to enter the schema manually.
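The three-strategy fallback can be sketched like this (a Python stand-in; the key names `outputColumns` and `outputTable` and the regex approach follow the description above, and everything else is assumed). Note the simple comma split would mis-handle types like `DECIMAL(10,2)`; it is a sketch, not a full DDL parser:

```python
import re

def load_schema(table, fem_history, fem_state, stored_sql):
    """Resolve a source table's columns via three strategies, richest first."""
    # 1. Typed output columns from the FEM history entry.
    for entry in fem_history:
        if entry.get("outputTable") == table and entry.get("outputColumns"):
            return entry["outputColumns"]
    # 2. State snapshot written at FEM pipeline save time.
    if fem_state.get("outputTable") == table and fem_state.get("outputColumns"):
        return fem_state["outputColumns"]
    # 3. Last resort: regex over the stored CREATE TABLE DDL.
    m = re.search(
        rf"CREATE\s+(?:TEMPORARY\s+)?TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?"
        rf"{re.escape(table)}\s*\((.*?)\)\s*WITH",
        stored_sql, re.IGNORECASE | re.DOTALL)
    if not m:
        return None  # no match: prompt the user for manual entry
    cols = []
    for piece in m.group(1).split(","):
        parts = piece.strip().split()
        if len(parts) >= 2 and parts[0].upper() != "WATERMARK":
            cols.append({"name": parts[0], "type": parts[1]})
    return cols
```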
Model Servers — 25 Supported
| Category | Servers |
|---|---|
| MLOps Platforms | MLflow, AWS SageMaker, Azure ML, Google Vertex AI, Databricks |
| Open-Source Serving | MLflow pyfunc serve, BentoML, NVIDIA Triton, TorchServe, TF Serving, Ray Serve, Seldon Core, KServe |
| Artifact Stores | MinIO / S3 (model artifact loading) |
| Hosted APIs | HuggingFace Hub, OpenAI, Anthropic Claude, Cohere, Mistral AI, Together AI, AWS Bedrock |
| Custom / Bespoke | OpenAI-compatible (vLLM, Ollama, LocalAI, LM Studio), Custom HTTP REST, Custom gRPC, Flink UDF / JAR |
Authentication Methods
| Method | Header / Mechanism | Used For |
|---|---|---|
| Bearer Token | Authorization: Bearer <token> | MLflow, Databricks, BentoML, Ray Serve, Seldon, KServe |
| API Key (header) | Configurable header name + value | OpenAI, Mistral, Together, HuggingFace, custom |
| x-api-key header | x-api-key: <key> | Anthropic Claude |
| Basic Auth | Authorization: Basic base64(user:pass) | MLflow self-hosted, Elasticsearch |
| Custom Header | Any header name + value | Proprietary endpoints |
| AWS SigV4 | Request signing via AWS SDK | SageMaker, AWS Bedrock |
| AWS Access Keys | Access Key ID + Secret + optional session token | SageMaker, Bedrock, MinIO |
| MinIO Access Keys | MinIO access key + secret key | MinIO artifact store |
| Azure AD | OAuth2 client credentials flow | Azure ML |
| Google Service Account | SA JSON key file path | Google Vertex AI |
| mTLS Certificate | Client cert + key + CA cert paths | Custom gRPC, enterprise endpoints |
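The header-based schemes in the table map onto standard HTTP headers. A minimal sketch of a header builder (hypothetical helper, not Studio's code; AWS SigV4, Azure AD, service accounts, and mTLS need an SDK or TLS layer and are omitted):

```python
import base64

def auth_headers(method: str, **cred) -> dict:
    """Build HTTP headers for the simple header-based auth schemes."""
    if method == "bearer":
        return {"Authorization": f"Bearer {cred['token']}"}
    if method == "x_api_key":
        return {"x-api-key": cred["key"]}
    if method == "api_key":
        # Configurable header name, e.g. OpenAI uses Authorization.
        return {cred.get("header", "Authorization"): cred["key"]}
    if method == "basic":
        raw = f"{cred['user']}:{cred['password']}".encode()
        return {"Authorization": "Basic " + base64.b64encode(raw).decode()}
    raise ValueError(f"unsupported auth method: {method}")
```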
Inference I/O Configuration
- Input feature selection — click-to-toggle column tags from the loaded schema. Primary input column or multi-feature ARRAY expression for the UDF call.
- Pre-processing expression — optional SQL expression applied before the model call (e.g. `CAST(amount AS DOUBLE) / 1000.0`)
- Passthrough columns — columns carried forward alongside the prediction result
- Output alias and SQL type — the name and type of the prediction column (`DOUBLE`, `STRING`, `ARRAY<DOUBLE>`, `ROW`, `MAP`, etc.)
- Post-processing expression — optional SQL expression applied to the raw model output (e.g. CASE WHEN for classifying a score)
- Async parallelism — number of concurrent in-flight model requests per subtask
- Timeout and retry — request timeout in milliseconds and retry count on failure
- Error default — value emitted when the model call fails (null propagates the error)
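The timeout, retry, and error-default settings compose into a single call policy. A synchronous Python sketch of that policy (the generated UDFs use Flink's async I/O; this stand-in only shows the semantics):

```python
def call_with_policy(call, retries=2, error_default=None):
    """Invoke `call`; on failure retry up to `retries` times.

    After the final failure, emit `error_default` — unless it is None,
    in which case the error is propagated, matching the note above."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return call()
        except Exception as e:  # a real client would also catch timeouts here
            last_error = e
    if error_default is None:
        raise last_error
    return error_default
```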
Generated SQL — Credentials Included
All values entered across the wizard are emitted into the generated SQL. No placeholder values are used for fields that have been filled in. The structure is:
-- Real-time ML Inference Pipeline
-- Model Server : AWS SageMaker
-- Auth : aws_sigv4
-- Source Table : user_features_out
-- Output Sink : kafka → user_features_scored
-- Prediction : fraud_score (DOUBLE)
SET 'execution.runtime-mode' = 'streaming';
SET 'parallelism.default' = '4';
-- Output table with actual credentials
CREATE TEMPORARY TABLE IF NOT EXISTS user_features_scored (
  user_id BIGINT,
  amount DOUBLE,
  fraud_score DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'fraud-scores-output',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'PLAIN',
  'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="api-key-123" password="api-secret-456";',
  'format' = 'json'
);
-- Model Server Configuration
-- Server : AWS SageMaker (sagemaker)
-- Endpoint : fraud-detection-endpoint-prod
-- Auth : aws_sigv4
-- AWS SigV4 region: us-east-1
-- AWS service : sagemaker
-- Timeout : 5000ms Retries : 2 Async parallelism : 4
CREATE TEMPORARY FUNCTION IF NOT EXISTS CALL_MODEL_UDF
AS 'com.yourcompany.flink.udf.AWSSageMakerAsyncUDF'
LANGUAGE JAVA;
INSERT INTO user_features_scored
SELECT user_id, amount, CALL_MODEL_UDF(amount, tx_count_5m, avg_amount_5m,
'fraud-detection-endpoint-prod', 'us-east-1') AS fraud_score
FROM user_features_out;
-- Real-time ML Inference Pipeline
-- Model Server : Anthropic Claude
-- Auth : x_api_key
-- Prediction : risk_assessment (STRING)
CREATE TEMPORARY TABLE IF NOT EXISTS transactions_scored (
  user_id BIGINT,
  amount DOUBLE,
  risk_assessment STRING
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://localhost:5432/ml_results',
  'table-name' = 'public.fraud_assessments',
  'driver' = 'org.postgresql.Driver',
  'username' = 'flink_writer',
  'password' = 'db-secret-pass'
);
-- Model Server Configuration
-- Server : Anthropic Claude (anthropic)
-- Model : claude-sonnet-4-6
-- Auth : x_api_key
-- Header : x-api-key: sk-ant-api03-xxxx…[KEY]
-- Timeout : 5000ms Retries : 2
CREATE TEMPORARY FUNCTION IF NOT EXISTS CALL_MODEL_UDF
AS 'com.yourcompany.flink.udf.AnthropicClaudeAsyncUDF'
LANGUAGE JAVA;
INSERT INTO transactions_scored
SELECT user_id, amount,
CALL_MODEL_UDF(CAST(amount AS STRING), 'claude-sonnet-4-6',
'Classify this transaction as FRAUD or LEGITIMATE.', 128) AS risk_assessment
FROM transactions_raw;
Inference Pipeline Canvas
The Review step (Step 7) renders a fully interactive SVG inference pipeline canvas showing: source feature column nodes, an optional pre-process node, the model server hero node (with provider icon, async parallelism badge, and dashed orbit ring), an optional post-process node, and the output node. Passthrough columns are annotated below the output. The canvas supports the same zoom/pan/maximize controls as the Feature Engineering canvas.
History
Every completed inference pipeline is saved to history with source table, output table, model server, output alias, and the full generated SQL. Up to 20 entries. Each entry has an Insert SQL button to reload the pipeline into the editor without going through the wizard again.
Observability and Diagnostics v0.0.22
Six diagnostic tools accessible from the editor toolbar. Each opens as a modal with full interactive controls.
Live Pipeline Visualiser
Embedded in the SQL editor. Parses SQL in real time and renders a source-to-transform-to-sink DAG. Pan, zoom, and streaming animation while a job runs. Collapses to a vertical tab on the right edge (Ctrl+`). Recognises CREATE TABLE, CREATE VIEW, CREATE FUNCTION, INSERT INTO, CTEs, JOINs, GROUP BY, and UDF/VIEW names defined in the current script.
Schema Registry Browser
Connect to Confluent Schema Registry, Apicurio, Karapace, or AWS Glue SR. Browse all subjects with live search. Compare schema versions — breaking changes (removed required fields, unsafe type changes) shown in red, non-breaking in green. Validate a CREATE TABLE in the editor against the registered schema. One-click Generate CREATE TABLE with avro-confluent config and auto-detected watermark.
Backpressure Root Cause Analyser
Fetches per-vertex backpressure %, busy %, idle %, and records in/out per second. Identifies the highest-backpressure non-source vertex as the bottleneck. Traces the propagation chain. Correlates with checkpoint duration and GC pause times. Classifies root cause: slow sink, processing bottleneck, upstream starvation, GC pressure, or checkpoint slowness. Plain-English severity and per-finding actionable recommendations. SVG heatmap. Live auto-refresh every 8 seconds.
Watermark Health Monitor
Real-time per-vertex watermark lag, stall detection with configurable threshold (5s/10s/30s/60s), late-event rate, sparklines over ~4 minutes of history, and a cross-vertex timeline canvas showing all streams' event-time positions relative to now.
Multi-Pipeline Dependency Graph
Fetches execution plans for all active jobs and builds a single cross-job topology: topics/tables as nodes, pipelines as nodes, data flow as directed edges. Detects dead-end topics, shared sources (multiple consumers), and circular dependencies. Animated particles on edges connected to RUNNING pipelines. Each pipeline node gets a unique palette color. Drag to reposition nodes. Double-click a node for detailed metrics.
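Circular-dependency detection over such a topic/pipeline topology is a standard graph problem. A sketch using depth-first search with three-colour marking (illustrative, not the tool's actual code):

```python
def has_cycle(edges: dict) -> bool:
    """edges: {node: [downstream nodes]}. True if any cycle exists.

    GREY marks nodes on the current DFS path; reaching a GREY node
    again means a back-edge, i.e. a circular dependency."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in edges}

    def visit(n):
        colour[n] = GREY
        for m in edges.get(n, []):
            c = colour.get(m, WHITE)
            if c == GREY:
                return True          # back-edge: cycle found
            if c == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))
```

Dead-end topics fall out of the same structure: any topic node with no outgoing edges and no consuming pipeline.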
SQL Profiler / Execution Replay
Flight recorder for running Flink jobs. Captures per-operator metric snapshots (records/s, backpressure %, busy %, bytes/s, checkpoint duration) at configurable intervals (2s to 30s) into a ring buffer of up to 360 snapshots. Replay any historical moment with a time scrubber — the DAG re-renders with backpressure tints and anomaly badges. Auto-play at 1x/2x/4x. Per-operator sparklines across the full recording. Export to JSON.
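The ring buffer is a bounded FIFO: once capacity is reached, each new snapshot evicts the oldest. A sketch with `collections.deque` (hypothetical class names):

```python
from collections import deque

class Recorder:
    """Bounded flight-recorder buffer for per-operator metric snapshots."""

    def __init__(self, capacity: int = 360):
        self.snapshots = deque(maxlen=capacity)

    def capture(self, snapshot) -> None:
        # deque with maxlen silently evicts the oldest entry when full.
        self.snapshots.append(snapshot)

    def at(self, index: int):
        # Time-scrubber lookup: index 0 is the oldest retained snapshot.
        return self.snapshots[index]
```

At a 2s interval, 360 snapshots cover the last 12 minutes of the job; at 30s, about 3 hours.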
Reports
PDF reports generated from the Performance tab. Standard session report covers queries, timing, jobs, cluster info, and checkpoints. Admin Technical Report adds vertex-level metrics, JVM, and per-session breakdown. Admin Business/Management Report focuses on resource utilisation percentages and throughput summaries with plain-language annotations. Job Graph Report captures a single job's DAG topology, vertex states, record counts, and checkpoint history.
Admin Session
The Admin Session mode provides a privileged view of the entire Flink cluster across all users and sessions. Select Admin mode on the connect screen and enter the admin passcode.
| Capability | Regular Session | Admin Session |
|---|---|---|
| Jobs visible in Job Graph | Own jobs only | All jobs on the cluster |
| Cancel job | Own jobs only | Any job on the cluster |
| Session Activity modal | Not available | Full cross-session breakdown |
| Report type | Standard session report | Technical or Business/Management |
The default passcode is admin1234. Change it immediately after first login via the Admin Settings modal (click the shield badge in the topbar). The new passcode is persisted in localStorage.
Connector JARs
| Flink Version | Kafka Connector JAR |
|---|---|
| 1.20.x | flink-sql-connector-kafka-3.3.0-1.20.jar |
| 1.19.x — recommended | flink-sql-connector-kafka-3.3.0-1.19.jar |
| 1.18.x | flink-sql-connector-kafka-3.3.0-1.18.jar |
| 1.17.x | flink-sql-connector-kafka-3.2.0-1.17.jar |
| 1.16.x | flink-sql-connector-kafka-3.1.0-1.16.jar |
./scripts/download-connectors.sh
# Override: ./scripts/download-connectors.sh --flink-version 1.19.1
.\scripts\download-connectors.ps1
# Override: .\scripts\download-connectors.ps1 -FlinkVersion 1.19.1
FROM flink:1.19.1-scala_2.12-java11
USER root
RUN apt-get update && apt-get install -y libgomp1 && rm -rf /var/lib/apt/lists/*
COPY ./connectors/*.jar /opt/flink/lib/
USER flink
Str:::lab Studio Kube Operator
Manages Studio deployments as first-class Kubernetes CRDs. Deploy and lifecycle-manage Studio instances with a single kubectl apply. Built with Kopf (Python). Repository: github.com/coded-streams/strlabstudio-operator
CRD Reference — StrlabStudio
Short name: kubectl get fss · Group: codedstreams.io/v1alpha1
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| spec.gateway.host | string | Yes | — | Flink SQL Gateway hostname |
| spec.gateway.port | integer | — | 8083 | SQL Gateway port |
| spec.jobmanager.host | string | — | localhost | JobManager REST hostname |
| spec.jobmanager.port | integer | — | 8081 | JobManager REST port |
| spec.image | string | — | codedstreams/strlabstudio:latest | IDE Docker image |
| spec.replicas | integer | — | 1 | Pod count (1–10) |
| spec.service.type | string | — | ClusterIP | ClusterIP, NodePort, or LoadBalancer |
| spec.service.nodePort | integer | — | — | NodePort (30000–32767) |
| spec.resources.limits | object | — | {cpu:200m, memory:128Mi} | Pod resource limits |
helm repo add strlabstudio https://coded-streams.github.io/strlabstudio-operator/charts
helm repo update
helm install strlabstudio-operator strlabstudio/strlabstudio-operator \
--namespace flinksql-system --create-namespace
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: dev-studio
  namespace: flink
spec:
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
apiVersion: codedstreams.io/v1alpha1
kind: StrlabStudio
metadata:
  name: prod-studio
  namespace: flink
spec:
  image: codedstreams/strlabstudio:v0.0.23
  replicas: 2
  gateway:
    host: flink-sql-gateway
    port: 8083
  jobmanager:
    host: flink-jobmanager
    port: 8081
  service:
    type: LoadBalancer
    port: 80
  resources:
    requests: {cpu: "100m", memory: "128Mi"}
    limits: {cpu: "500m", memory: "256Mi"}
Deploy Options
Docker Compose
Full stack in one command — Studio, Flink cluster, SQL Gateway, Kafka.
docker compose up -d
Standalone HTML
Serve studio/index.html from any static server. Use Direct Gateway mode.
python3 -m http.server 8080
Kube Operator
CRD-based deployment via Helm. GitOps-friendly. No PVC required.
Cloud Managed Flink
Connect to Confluent Cloud, Ververica Cloud, or AWS Managed Flink with Direct Gateway and bearer token auth.
Flink Compatibility
| Flink Version | Status | Notes |
|---|---|---|
| 2.0.x | Supported | Full feature parity. SHOW VIEWS available. |
| 1.20.x | Supported | Full feature parity. |
| 1.19.x | Recommended | Stable, well-tested connector ecosystem. |
| 1.18.x | Supported | SHOW VIEWS available. Cumulate window supported. |
| 1.17.x | Supported | Most features work. SHOW VIEWS may not be available. |
| 1.16.x | Supported (min) | Core features work. Some newer SQL syntax not available. |
Changelog
- Feature Engineering Manager — six-step guided wizard for building real-time Flink SQL feature pipelines. Schema Registry integration or manual column entry. 20 per-column transform types with correctly typed output columns. TUMBLE/HOP/SESSION/CUMULATE window support. Configurable sink with all credentials emitted. Interactive SVG canvas with zoom/pan/maximize. Full session history with output column schema. Validation on all required fields.
- AI Model Inference Manager — seven-step wizard for real-time ML inference on streaming pipelines. 25 model servers across five categories (MLflow, SageMaker, Azure ML, Vertex AI, Databricks, Triton, TorchServe, TF Serving, BentoML, Ray Serve, Seldon, KServe, MinIO, HuggingFace, OpenAI, Anthropic, Cohere, Mistral, Together, Bedrock, OpenAI-compatible, Custom HTTP/gRPC, Flink UDF). 11 auth methods. All credentials embedded in generated SQL. Interactive inference canvas with zoom/pan/maximize. FEM-to-IFM schema handoff: three-strategy auto-population of source columns.
- FEM schema persistence — Feature Engineering Manager now saves a full state snapshot (`strlabstudio_fem_state`) and enriched history entries with `sourceColumns` and `outputColumns` (correctly typed per transformation) for cross-manager schema sharing.
- Inference Manager source table selector — combined dropdown (FEM outputs + session tables), quick-pill shortcuts, aligned From FEM button, live column tag display, and three-strategy schema loading with fallback chain.
- Sink credential embedding — all sink connector fields now have stable IDs and are collected into `_IFM.sinkConfig` before SQL generation. No placeholder values for fields the user has filled in.
- Multi-Pipeline Dependency Graph v0.0.23 fixes — unique palette color per pipeline node, double-click node detail uses a Map-based data store to avoid JSON-in-attribute encoding issues, drag unblocked in all directions, maximize/restore saves and reinstates the exact original CSS.
- Managers dropdown — Feature Engineering and Inference Manager added to the Managers select in the topbar alongside existing Pipeline Manager, Systems, UDF, Catalog, Schema Registry, BP Analyser, Watermarks, Dep Graph, and SQL Profiler.
- Live Pipeline Visualiser — real-time SQL parsing, DAG rendering beside editor, pan/zoom/animate, collapse to vertical tab, continuous animation for streaming jobs, UDF/VIEW name tracking.
- Schema Registry Browser — Confluent/Apicurio/Karapace/AWS Glue SR. Subject browsing, version diff with breaking-change detection, CREATE TABLE generation, editor SQL validation.
- Backpressure Root Cause Analyser — bottleneck identification, propagation chain tracing, GC/checkpoint correlation, SVG heatmap, live auto-refresh.
- Watermark Health Monitor — per-vertex lag, stall detection, late-event rate, sparklines, cross-vertex timeline canvas.
- Multi-Pipeline Dependency Graph — cross-job topology, connector-type node colours, dead-end/shared-source/cycle detection, animated particles on RUNNING pipelines.
- SQL Profiler / Execution Replay — ring buffer recording, time scrubber replay, backpressure tints, auto-play, per-operator sparklines, JSON export.
- SQL Editor keywords expanded — 109 documented keywords, expanded autocomplete, JavaDoc-style hover tooltips on 25+ keywords.
- Pipeline Manager v2.0 — inline node expand (double-click configures in-place on canvas), animation stops with error highlights preserved, My UDFs operator group populated from UDF Manager, SQL generation trailing comma fixes, edge config About tab documenting all five edge types.
- Documentation — separate full-length reference pages for Pipeline Manager, Systems Integration, and Catalog Management.
- UDF Manager v2 — SHOW FUNCTIONS 0-row bug fixed, View Builder Insert Expression false-positive fixed, Library Views section separated, LANGUAGE SQL permanently removed, friendly error messages for registration failures.
- Colour Describe redesigned — professional modal with slot selector, 10-operator rule builder, 8-preset swatches + custom hex, three style modes, reorderable rules list, PDF report integration.
- execution.js — block comment support in splitSQL, duplicate submission guard, SHOW/SELECT always render all rows.
- Brand rename from FlinkSQL Studio to Str:::lab Studio
- Admin Session with full cluster visibility, cross-session job attribution, Technical and Business/Management PDF reports
- Project Manager — named projects with SQL tabs, SET config, catalog context, JSON export/import, storage tracking
- JAR upload directly to cluster via REST, Maven/Gradle config generation
- Tips and Concepts modal, session heartbeat, load existing sessions dropdown, job tagging, cancel confirmation
- Job Graph DAG with drag-to-pan, vertex timeline, node detail modal
- Performance panel with per-query timing, checkpoint health, cluster resources, live throughput sparklines
- Initial build: multi-tab editor, session management, Results/Log/Operations tabs, snippets, query history, catalog browser