# SnapBench Component Designer Guide

This guide explains how to create components for SnapBench, a platform for interactive Kubernetes-based labs.

## Component JSON Structure

```json
{
  "ComponentName": {
    "kind": "DisplayName",
    "label": "service-hostname",
    "deploymentType": "docker|helm|compose",
    "image": "image:tag",
    "env": {"KEY": "value"},
    "command": [],
    "args": [],
    "expose": [{"name": "port-name", "port": 8080}],
    "ui": [{"path": "/", "servicePort": 8080}],
    "resources": {"cpuM": 500, "memMi": 512},
    "tabs": [],
    "files": [],
    "readinessCheck": {},
    "help": "Markdown documentation"
  }
}
```

## Deployment Types

### Docker (Single Container)

For simple, single-container deployments.

```json
{
  "MyApp": {
    "kind": "MyApp",
    "label": "myapp",
    "deploymentType": "docker",
    "image": "nginx:latest",
    "env": {
      "ENV_VAR": "value"
    },
    "expose": [
      {"name": "http", "port": 80}
    ],
    "resources": {
      "cpuM": 100,
      "memMi": 128
    },
    "tabs": [
      {"type": "terminal", "label": "Terminal", "workingDir": "/"}
    ]
  }
}
```

**CRITICAL**: For Docker components, you MUST define `expose` to create a Kubernetes Service. Without this, other components cannot reach this component by its label.

### Helm (Complex Infrastructure)

For complex deployments using Helm charts.

```json
{
  "PostgreSQL": {
    "kind": "PostgreSQL",
    "label": "postgres",
    "deploymentType": "helm",
    "helm": {
      "repository": "oci://registry-1.docker.io/bitnamicharts",
      "chart": "postgresql",
      "version": "16.0.0",
      "values": "auth:\n  username: postgres\n  password: postgres\nprimary:\n  resources:\n    requests:\n      cpu: 250m\n      memory: 512Mi"
    },
    "resources": {
      "cpuM": 250,
      "memMi": 512
    },
    "tabs": [
      {"type": "terminal", "label": "psql", "workingDir": "/tmp"}
    ]
  }
}
```
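
The `values` field is a complete YAML document packed into one JSON string, so every newline must be written as `\n` and the indentation has to survive intact. One way to avoid hand-escaping is to write the YAML as plain text and let a JSON serializer do the escaping. The sketch below assumes Python is available on the authoring machine; `helm_block` is a hypothetical helper and not part of SnapBench.

```python
import json
import textwrap


def helm_block(repository: str, chart: str, version: str, values_yaml: str) -> dict:
    """Build the `helm` object, keeping the YAML values as one string."""
    return {
        "repository": repository,
        "chart": chart,
        "version": version,
        # dedent lets the YAML be indented along with the surrounding Python code
        "values": textwrap.dedent(values_yaml).strip("\n"),
    }


VALUES = """
    auth:
      username: postgres
      password: postgres
    primary:
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
"""

helm = helm_block(
    "oci://registry-1.docker.io/bitnamicharts", "postgresql", "16.0.0", VALUES
)
# json.dumps escapes the newlines, producing a string that can be pasted into the component
helm_json = json.dumps(helm, indent=2)
print(helm_json)
```

The printed object can be pasted as the value of `"helm"` in the component definition.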

### Docker Compose (Multi-Container)

For multi-service applications. The `compose` field is a JSON object (not a YAML string).

```json
{
  "MyStack": {
    "kind": "MyStack",
    "label": "mystack",
    "deploymentType": "compose",
    "compose": {
      "services": {
        "web": {
          "image": "nginx",
          "ports": ["80:80"],
          "deploy": {
            "resources": {
              "reservations": {"cpus": "0.25", "memory": "256M"}
            }
          }
        },
        "db": {
          "image": "postgres",
          "ports": ["5432:5432"],
          "environment": {
            "POSTGRES_PASSWORD": "secret"
          },
          "deploy": {
            "resources": {
              "reservations": {"cpus": "0.25", "memory": "512M"}
            }
          }
        }
      }
    }
  }
}
```

#### Key Difference with Local Docker Compose

**IMPORTANT - Ports for Inter-Service Communication:**

In standard Docker Compose, all services share the same network and can communicate on any port without explicit port mappings. In SnapBench (Kubernetes), each service becomes a separate Pod, and **ports must be explicitly declared** for inter-service communication.

For example, if `web` needs to connect to `db` on port 5432:
- **Docker Compose (local)**: Works without declaring ports (implicit network sharing)
- **SnapBench (Kubernetes)**: Service `db` must have `"ports": ["5432:5432"]` to create the Kubernetes Service
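
A quick check before deployment can catch Compose services that other services would not be able to reach. The sketch below is a hypothetical lint helper, not a SnapBench tool. It assumes ports use the Compose short syntax `[HOST:]CONTAINER[/PROTOCOL]` without port ranges.

```python
def services_without_ports(compose: dict) -> list:
    """Return the names of Compose services that declare no `ports` entry."""
    return sorted(
        name
        for name, service in compose.get("services", {}).items()
        if not service.get("ports")
    )


def declared_container_ports(service: dict) -> set:
    """Extract container-side ports from short-syntax entries such as "5432:5432"."""
    ports = set()
    for entry in service.get("ports", []):
        # Drop an optional "/tcp" suffix, then take the last colon-separated field
        container = str(entry).split("/")[0].split(":")[-1]
        ports.add(int(container))
    return ports


compose = {
    "services": {
        "web": {"image": "nginx", "ports": ["80:80"]},
        "db": {"image": "postgres"},  # no ports: `web` could not reach it
    }
}

print(services_without_ports(compose))
print(declared_container_ports(compose["services"]["web"]))
```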

#### Compose Service Naming

**IMPORTANT**: In Kubernetes, Compose services are named `{componentID}-{serviceName}`.

For example, if you have a Flink component with ID `flink` and services `jobmanager` and `taskmanager`:
- The JobManager service will be accessible at `flink-jobmanager`
- The TaskManager service will be accessible at `flink-taskmanager`

When configuring inter-service communication within the same Compose component, use the **full service name** with the `{{$self}}` template variable:

```json
{
  "flink": {
    "kind": "Flink",
    "label": "flink",
    "deploymentType": "compose",
    "compose": {
      "services": {
        "jobmanager": {
          "image": "flink:1.20",
          "command": "jobmanager",
          "environment": {
            "FLINK_PROPERTIES": "jobmanager.rpc.address: {{$self}}-jobmanager\n"
          }
        },
        "taskmanager": {
          "image": "flink:1.20",
          "command": "taskmanager",
          "depends_on": ["jobmanager"],
          "environment": {
            "FLINK_PROPERTIES": "jobmanager.rpc.address: {{$self}}-jobmanager\n"
          }
        }
      }
    }
  }
}
```

Using `{{$self}}-jobmanager` ensures the service name stays correct even if you change the component ID.
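
The naming rule can be written as a one-line function. This only illustrates the `{componentID}-{serviceName}` convention described above; `compose_service_hosts` is a hypothetical name, not a SnapBench API.

```python
def compose_service_hosts(component_id: str, compose: dict) -> dict:
    """Map each Compose service name to the hostname it gets in Kubernetes."""
    return {
        name: f"{component_id}-{name}"
        for name in compose.get("services", {})
    }


flink_compose = {"services": {"jobmanager": {}, "taskmanager": {}}}
hosts = compose_service_hosts("flink", flink_compose)
print(hosts["jobmanager"])   # flink-jobmanager
print(hosts["taskmanager"])  # flink-taskmanager
```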

## Exposed Ports

### Structure

```json
{
  "expose": [
    {
      "name": "http",
      "port": 8080,
      "public": false,
      "podSelector": "",
      "containerSelector": ""
    }
  ]
}
```

### Behavior by Deployment Type

| Type | `expose` | Service created? |
|------|----------|------------------|
| Docker | Empty | NO - Component unreachable |
| Docker | Ports defined | YES - ClusterIP service |
| Compose | Any | YES - Always created |
| Helm | N/A | YES - Managed by chart |
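
The table can be encoded as a small check that flags components no other component could reach. This is a minimal sketch of the rules as stated in the table, not the platform's actual logic; `creates_service` and `unreachable_components` are hypothetical names. For Compose, individual services still need their own `ports` entries for inter-service traffic.

```python
def creates_service(component: dict) -> bool:
    """Apply the table: Docker needs `expose`; Compose and Helm always get a Service."""
    deployment_type = component.get("deploymentType")
    if deployment_type == "docker":
        return bool(component.get("expose"))
    # Unknown deployment types are treated as not creating a Service
    return deployment_type in ("compose", "helm")


def unreachable_components(components: dict) -> list:
    """Return the names of components that would get no Kubernetes Service."""
    return sorted(
        name for name, component in components.items()
        if not creates_service(component)
    )


components = {
    "App": {"deploymentType": "docker"},
    "Db": {"deploymentType": "docker", "expose": [{"name": "pg", "port": 5432}]},
    "Stack": {"deploymentType": "compose"},
    "Chart": {"deploymentType": "helm"},
}
print(unreachable_components(components))
```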

### Public TCP Ports

For components that need to be accessible from outside the cluster (e.g., Kafka brokers for external clients), set `public: true` and `protocol: "tcp"`:

```json
{
  "expose": [
    {
      "name": "external",
      "port": 9094,
      "public": true,
      "protocol": "tcp"
    }
  ]
}
```

When a port is marked as public TCP:
- A public endpoint is allocated (e.g., `tcp.preprod.snapbench.io:12345`)
- The endpoint is available via the `{{$self:public}}` template variable
- External clients can connect directly to this endpoint

**Example: Kafka with External Access**

```json
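{
  "Kafka": {
    "kind": "Kafka",
    "label": "kafka",
    "deploymentType": "docker",
    "image": "apache/kafka:3.7.0",
    "env": {
      "KAFKA_NODE_ID": "1",
      "KAFKA_PROCESS_ROLES": "broker,controller",
      "KAFKA_LISTENERS": "INTERNAL://:9092,EXTERNAL://:9094,CONTROLLER://:9093",
      "KAFKA_ADVERTISED_LISTENERS": "INTERNAL://{{$self}}:9092,EXTERNAL://{{$self:public}}",
      "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP": "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT",
      "KAFKA_INTER_BROKER_LISTENER_NAME": "INTERNAL",
      "KAFKA_CONTROLLER_LISTENER_NAMES": "CONTROLLER",
      "KAFKA_CONTROLLER_QUORUM_VOTERS": "1@localhost:9093",
      "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR": "1"
    },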
    "expose": [
      {"name": "internal", "port": 9092},
      {"name": "external", "port": 9094, "public": true, "protocol": "tcp"}
    ]
  }
}
```

In this example, `{{$self:public}}` resolves to the TCP endpoint (e.g., `tcp.preprod.snapbench.io:12345`), allowing external Kafka clients to connect.

### Common Ports

| Service | Port |
|---------|------|
| PostgreSQL | 5432 |
| MySQL | 3306 |
| Redis | 6379 |
| Kafka | 9092 |
| Elasticsearch | 9200 |
| MongoDB | 27017 |
| HTTP services | 80, 8080, 3000 |

## Web UI Configuration

To expose a web UI, you need THREE things:

1. **`expose`** - Creates the Kubernetes Service
2. **`ui`** - Creates the Ingress for external browser access
3. **`tabs`** with `type: "web"` - Creates the clickable tab

```json
{
  "expose": [{"name": "http", "port": 3000}],
  "ui": [{"path": "/", "servicePort": 3000}],
  "tabs": [
    {"type": "web", "label": "Dashboard", "port": 3000, "path": "/"}
  ]
}
```
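
Because the three pieces must agree on the port, a mismatch is easy to miss. The sketch below is a hypothetical lint helper (not part of SnapBench) for Docker components: it checks that every web tab's port appears in both `expose` and `ui`.

```python
def web_ui_problems(component: dict) -> list:
    """Return one message per web tab whose port is missing from `expose` or `ui`."""
    exposed = {entry["port"] for entry in component.get("expose", [])}
    ingress = {entry["servicePort"] for entry in component.get("ui", [])}
    problems = []
    for tab in component.get("tabs", []):
        if tab.get("type") != "web":
            continue  # terminal tabs need neither a Service nor an Ingress
        label, port = tab.get("label"), tab.get("port")
        if port not in exposed:
            problems.append(f"tab {label!r}: port {port} is missing from expose")
        if port not in ingress:
            problems.append(f"tab {label!r}: port {port} is missing from ui")
    return problems


complete = {
    "expose": [{"name": "http", "port": 3000}],
    "ui": [{"path": "/", "servicePort": 3000}],
    "tabs": [{"type": "web", "label": "Dashboard", "port": 3000, "path": "/"}],
}
missing_ui = {k: v for k, v in complete.items() if k != "ui"}

print(web_ui_problems(complete))
print(web_ui_problems(missing_ui))
```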

## Tabs Configuration

### Terminal Tab

```json
{
  "type": "terminal",
  "label": "Terminal",
  "workingDir": "/home"
}
```

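For Helm charts with multiple pods, use `podSelector` and `containerSelector` to choose the pod and container the terminal attaches to: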

```json
{
  "type": "terminal",
  "label": "Kafka CLI",
  "workingDir": "/opt/kafka/bin",
  "podSelector": "controller",
  "containerSelector": "kafka"
}
```

### Web Tab

```json
{
  "type": "web",
  "label": "Web UI",
  "port": 8080,
  "path": "/",
  "serviceName": "my-service"
}
```

## Readiness Check

Readiness checks ensure a component is fully started before dependent components start.

**IMPORTANT: Be conservative with timing!** Some applications take a long time to start. If your readiness check is too aggressive, the component will be marked as **failed** before it has a chance to start.

### Example Initial Delays

These values are indicative; always test and adjust them for your actual environment:

| Component | Example Initial Delay |
|-----------|----------------------|
| Redis, lightweight services | 5-10s |
| PostgreSQL, MySQL | 10-15s |
| Kafka | 15-30s |
| Elasticsearch, Flink, heavy JVM apps | 60-120s |

### TCP Check (databases, message brokers)

```json
{
  "readinessCheck": {
    "enabled": true,
    "type": "tcp",
    "tcpSocket": {"port": 5432},
    "initialDelaySeconds": 10,
    "periodSeconds": 5,
    "timeoutSeconds": 3
  }
}
```
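
To get a feel for how the timing fields interact, the loop below mimics a TCP check locally. It is a sketch that assumes the fields behave like their Kubernetes probe counterparts: wait `initialDelaySeconds` once, then attempt a connection every `periodSeconds`, giving each attempt `timeoutSeconds`. `max_attempts` is a hypothetical parameter added so the loop terminates; it is not a SnapBench field.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, initial_delay: float = 10.0,
                 period: float = 5.0, timeout: float = 3.0,
                 max_attempts: int = 12) -> bool:
    """Return True as soon as a TCP connection succeeds, False after max_attempts."""
    time.sleep(initial_delay)  # give the process time to start listening
    for attempt in range(max_attempts):
        try:
            # A successful connect is all a TCP check verifies
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Refused or timed out: wait one period before the next attempt
            if attempt < max_attempts - 1:
                time.sleep(period)
    return False
```

For example, `wait_for_tcp("localhost", 5432)` run against a port-forwarded database shows roughly how long the service takes to accept connections, which helps in choosing a realistic `initialDelaySeconds`.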

### HTTP Check (web services)

```json
{
  "readinessCheck": {
    "enabled": true,
    "type": "http",
    "httpGet": {"port": 8080, "path": "/health"},
    "initialDelaySeconds": 15,
    "periodSeconds": 5,
    "timeoutSeconds": 3
  }
}
```

### Slow-Starting Applications (Elasticsearch, Flink, etc.)

```json
{
  "readinessCheck": {
    "enabled": true,
    "type": "http",
    "httpGet": {"port": 9200, "path": "/_cluster/health"},
    "initialDelaySeconds": 90,
    "periodSeconds": 10,
    "timeoutSeconds": 5
  }
}
```

## Files

Mount configuration files into containers at `/sb/`:

```json
{
  "files": [
    {
      "name": "config.yaml",
      "content": "key: value\nother: setting"
    },
    {
      "name": "script.py",
      "content": "print('Hello World')"
    }
  ]
}
```

Files are accessible at `/sb/{filename}`. For example, the two entries above are mounted at `/sb/config.yaml` and `/sb/script.py`.

## Resources

```json
{
  "resources": {
    "cpuM": 500,
    "memMi": 512
  }
}
```

- `cpuM`: CPU in millicores (1000m = 1 core)
- `memMi`: Memory in mebibytes
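
When a lab combines several components, it helps to add up what they request. The following sketch sums `cpuM` and `memMi` across a component map and converts the totals to cores and GiB; `total_resources` is a hypothetical helper, and components without a `resources` block are counted as zero here.

```python
def total_resources(components: dict) -> dict:
    """Sum cpuM and memMi over all components and add converted units."""
    cpu_m = sum(c.get("resources", {}).get("cpuM", 0) for c in components.values())
    mem_mi = sum(c.get("resources", {}).get("memMi", 0) for c in components.values())
    return {
        "cpuM": cpu_m,
        "cores": cpu_m / 1000,    # 1000 millicores = 1 core
        "memMi": mem_mi,
        "memGi": mem_mi / 1024,   # 1024 MiB = 1 GiB
    }


lab = {
    "PostgreSQL": {"resources": {"cpuM": 250, "memMi": 512}},
    "Kafka": {"resources": {"cpuM": 500, "memMi": 1024}},
}
totals = total_resources(lab)
print(totals)
```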

### Typical Resource Allocations

| Component Type | CPU (`cpuM`) | Memory (`memMi`) |
|----------------|--------------|------------------|
| Database | 250 | 512 |
| Message Broker | 500 | 1024 |
| Search Engine | 500 | 1024 |
| Simple App | 100 | 128 |
| Workbench | 200 | 256 |

## Interactive Containers (Workbenches)

For containers where users run commands interactively:

```json
{
  "kind": "Workbench",
  "label": "workbench",
  "deploymentType": "docker",
  "image": "python:3.11-slim",
  "args": ["sleep", "infinity"],
  "tabs": [
    {"type": "terminal", "label": "Terminal", "workingDir": "/sb"}
  ],
  "files": [
    {"name": "script.py", "content": "print('Hello')"}
  ]
}
```

**Key points:**
- Use `"args": ["sleep", "infinity"]` to keep the container running
- Do NOT set `command`; leave it undefined
- A container that stays running allows interactive shell access

## Command and Arguments Parsing

**IMPORTANT**: Arguments with spaces are split into separate arguments.

When specifying `command` or `args`, each array element is split on spaces:
- `["pip install fastapi"]` becomes `["pip", "install", "fastapi"]`
- This breaks commands like `bash -c "script"` because the script becomes multiple arguments

**To keep arguments together**, wrap them in quotes:

```json
{
  "command": ["bash", "-c"],
  "args": ["\"pip install fastapi && python /sb/app.py\""]
}
```

### Examples

❌ **WRONG** - Will fail (the script is split into separate arguments, so `bash -c` runs `pip` without the `install` subcommand):
```json
{
  "command": ["bash", "-c"],
  "args": ["pip install fastapi && python /sb/app.py"]
}
```

✅ **CORRECT** - Script preserved as single argument:
```json
{
  "command": ["bash", "-c"],
  "args": ["\"pip install fastapi && python /sb/app.py\""]
}
```

For simple commands without shell features, separate each argument:
```json
{
  "args": ["--config", "/etc/app.conf", "--port", "8080"]
}
```
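
The splitting behavior can be approximated locally to preview what a container will receive. The sketch below models it with Python's `shlex.split` applied to each array element. This is an assumption: the guide only states that elements are split on spaces and that quotes keep arguments together, and SnapBench's actual parser may differ in edge cases.

```python
import shlex


def expand(elements: list) -> list:
    """Split each array element on whitespace, keeping quoted text together."""
    result = []
    for element in elements:
        result.extend(shlex.split(element))
    return result


unquoted = expand(["pip install fastapi && python /sb/app.py"])
quoted = expand(["\"pip install fastapi && python /sb/app.py\""])

print(unquoted)  # six separate arguments: bash -c would only run "pip"
print(quoted)    # one argument: the whole script reaches bash -c intact
```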

## Common Component Patterns

### PostgreSQL (Docker)

```json
{
  "PostgreSQL": {
    "kind": "PostgreSQL",
    "label": "postgres",
    "deploymentType": "docker",
    "image": "postgres:16-alpine",
    "env": {
      "POSTGRES_USER": "postgres",
      "POSTGRES_PASSWORD": "postgres",
      "POSTGRES_DB": "demo"
    },
    "expose": [{"name": "postgres", "port": 5432}],
    "resources": {"cpuM": 250, "memMi": 512},
    "tabs": [
      {"type": "terminal", "label": "psql", "workingDir": "/"}
    ],
    "readinessCheck": {
      "enabled": true,
      "type": "tcp",
      "tcpSocket": {"port": 5432},
      "initialDelaySeconds": 10,
      "periodSeconds": 3,
      "timeoutSeconds": 2
    }
  }
}
```

### Redis (Docker)

```json
{
  "Redis": {
    "kind": "Redis",
    "label": "redis",
    "deploymentType": "docker",
    "image": "redis:7-alpine",
    "expose": [{"name": "redis", "port": 6379}],
    "resources": {"cpuM": 100, "memMi": 256},
    "tabs": [
      {"type": "terminal", "label": "redis-cli", "workingDir": "/data"}
    ],
    "readinessCheck": {
      "enabled": true,
      "type": "tcp",
      "tcpSocket": {"port": 6379},
      "initialDelaySeconds": 5,
      "periodSeconds": 2,
      "timeoutSeconds": 1
    }
  }
}
```

### Kafka (KRaft Mode)

```json
{
  "Kafka": {
    "kind": "Kafka",
    "label": "kafka",
    "deploymentType": "docker",
    "image": "apache/kafka:3.7.0",
    "env": {
      "KAFKA_NODE_ID": "1",
      "KAFKA_PROCESS_ROLES": "broker,controller",
      "KAFKA_LISTENERS": "PLAINTEXT://:9092,CONTROLLER://:9093",
      "KAFKA_ADVERTISED_LISTENERS": "PLAINTEXT://kafka:9092",
      "KAFKA_CONTROLLER_LISTENER_NAMES": "CONTROLLER",
      "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
      "KAFKA_CONTROLLER_QUORUM_VOTERS": "1@localhost:9093",
      "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR": "1",
      "KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR": "1",
      "KAFKA_TRANSACTION_STATE_LOG_MIN_ISR": "1",
      "KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS": "0",
      "CLUSTER_ID": "MkU3OEVBNTcwNTJENDM2Qk"
    },
    "expose": [{"name": "broker", "port": 9092}],
    "resources": {"cpuM": 500, "memMi": 1024},
    "tabs": [
      {"type": "terminal", "label": "Kafka CLI", "workingDir": "/opt/kafka/bin"}
    ],
    "readinessCheck": {
      "enabled": true,
      "type": "tcp",
      "tcpSocket": {"port": 9092},
      "initialDelaySeconds": 15,
      "periodSeconds": 5,
      "timeoutSeconds": 3
    }
  }
}
```

### Grafana (with Web UI)

```json
{
  "Grafana": {
    "kind": "Grafana",
    "label": "grafana",
    "deploymentType": "docker",
    "image": "grafana/grafana:11.0.0",
    "env": {
      "GF_SECURITY_ADMIN_USER": "admin",
      "GF_SECURITY_ADMIN_PASSWORD": "admin",
      "GF_AUTH_ANONYMOUS_ENABLED": "true"
    },
    "expose": [{"name": "http", "port": 3000}],
    "ui": [{"path": "/", "servicePort": 3000}],
    "resources": {"cpuM": 200, "memMi": 256},
    "tabs": [
      {"type": "web", "label": "Grafana UI", "port": 3000, "path": "/"}
    ],
    "readinessCheck": {
      "enabled": true,
      "type": "http",
      "httpGet": {"port": 3000, "path": "/api/health"},
      "initialDelaySeconds": 10,
      "periodSeconds": 5,
      "timeoutSeconds": 3
    }
  }
}
```

### MinIO (S3-Compatible Storage)

```json
{
  "MinIO": {
    "kind": "MinIO",
    "label": "minio",
    "deploymentType": "docker",
    "image": "minio/minio:latest",
    "env": {
      "MINIO_ROOT_USER": "minioadmin",
      "MINIO_ROOT_PASSWORD": "minioadmin"
    },
    "args": ["server", "/data", "--console-address", ":9001"],
    "expose": [
      {"name": "api", "port": 9000},
      {"name": "console", "port": 9001}
    ],
    "ui": [{"path": "/", "servicePort": 9001}],
    "resources": {"cpuM": 250, "memMi": 512},
    "tabs": [
      {"type": "web", "label": "MinIO Console", "port": 9001, "path": "/"},
      {"type": "terminal", "label": "Terminal", "workingDir": "/data"}
    ],
    "readinessCheck": {
      "enabled": true,
      "type": "http",
      "httpGet": {"port": 9000, "path": "/minio/health/live"},
      "initialDelaySeconds": 5,
      "periodSeconds": 5,
      "timeoutSeconds": 3
    }
  }
}
```

## Inter-Component References

Components can reference each other using the `label` field as the hostname:
- If PostgreSQL has `"label": "postgres"`, connect to `postgres:5432`
- The target component MUST have `expose` defined with that port
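
Both conditions can be verified before deployment. The sketch below is a hypothetical helper (not a SnapBench API) that takes a `label:port` target and checks it against the `expose` lists of Docker-style components.

```python
from typing import Optional


def check_target(components: dict, target: str) -> Optional[str]:
    """Return None if `label:port` is reachable, else a description of the problem."""
    label, _, port_text = target.partition(":")
    port = int(port_text)
    for component in components.values():
        if component.get("label") != label:
            continue
        exposed = {entry["port"] for entry in component.get("expose", [])}
        if port in exposed:
            return None
        return f"{label} does not expose port {port}"
    return f"no component has label {label!r}"


components = {
    "PostgreSQL": {
        "label": "postgres",
        "expose": [{"name": "postgres", "port": 5432}],
    }
}
print(check_target(components, "postgres:5432"))  # None: reachable
print(check_target(components, "postgres:5433"))
print(check_target(components, "mysql:3306"))
```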

## Template Variables

Template variables allow dynamic values in component configuration. They are resolved at deployment time.

### Self-Reference Variables

| Variable | Description |
|----------|-------------|
| `{{$self}}` | Current component's ID (default) |
| `{{$self:id}}` | Current component's ID (explicit) |
| `{{$self:label}}` | Current component's label |
| `{{$self:fqdn}}` | Current component's FQDN (full Kubernetes DNS name) |
| `{{$self:public}}` | Current component's public endpoint (TCP: `host:port`, HTTP: hostname) |

**Public Endpoint Resolution:**
- If the component has a **public TCP port**, `{{$self:public}}` returns `host:port` (e.g., `tcp.domain.com:12345`)
- If the component only has **HTTP ports**, it returns the ingress hostname (e.g., `myapp-abc123.domain.com`)

**Example**: Using `{{$self}}` for Compose service references:

```json
{
  "environment": {
    "FLINK_PROPERTIES": "jobmanager.rpc.address: {{$self}}-jobmanager\n"
  }
}
```

If the component ID is `flink`, this resolves to `flink-jobmanager`.

**Example**: Using `{{$self:label}}` for display purposes:

```json
{
  "env": {
    "APP_NAME": "Service running as {{$self:label}}"
  }
}
```
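
To preview what a configuration string looks like after substitution, the ID and label variants can be emulated with a regular expression. This is a sketch of the documented behavior, not SnapBench's resolver; `resolve_self` is a hypothetical name. It deliberately leaves `{{$self:fqdn}}` and `{{$self:public}}` untouched, since their values are only known at deployment time.

```python
import re

# Matches {{$self}}, {{$self:id}} and {{$self:label}} only
SELF_PATTERN = re.compile(r"\{\{\$self(?::(id|label))?\}\}")


def resolve_self(text: str, component_id: str, label: str) -> str:
    """Replace the ID and label self-references in a configuration string."""
    def replace(match):
        return label if match.group(1) == "label" else component_id
    return SELF_PATTERN.sub(replace, text)


resolved = resolve_self(
    "jobmanager.rpc.address: {{$self}}-jobmanager\n", "flink", "flink"
)
print(resolved)
print(resolve_self("Service running as {{$self:label}}", "app-1", "myapp"))
```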

### Component Reference Variables

| Variable | Description |
|----------|-------------|
| `{{componentLabel}}` | Referenced component's service name |
| `{{componentLabel:fqdn}}` | Referenced component's FQDN |
| `{{componentLabel:public}}` | Referenced component's public endpoint (first TCP if available, else HTTP) |
| `{{componentLabel:public:port}}` | Referenced component's public endpoint for a specific port |

**Port-specific public endpoints:**

When a component exposes multiple public TCP ports, use the `:port` suffix to target a specific one:

```json
{
  "env": {
    "KAFKA_BOOTSTRAP": "{{kafka:public:9094}}"
  }
}
```

This resolves to the TCP endpoint for port 9094 (e.g., `tcp.domain.com:12345`).

**Example**: Connecting to another component:

```json
{
  "env": {
    "DATABASE_HOST": "{{postgres}}",
    "DATABASE_URL": "postgresql://user:pass@{{postgres:fqdn}}:5432/db"
  }
}
```
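
Since every component reference names another component's label, scanning a component's `env` values yields the components it depends on. The sketch below is a hypothetical helper, not part of SnapBench. It assumes labels consist of letters, digits, hyphens, and underscores, and it ignores `$`-prefixed variables such as `{{$self}}` and `{{$namespace}}`.

```python
import re

# Matches {{label}}, {{label:fqdn}}, {{label:public}} and {{label:public:PORT}}
REF_PATTERN = re.compile(r"\{\{([A-Za-z0-9_-]+)(?::(fqdn|public(?::\d+)?))?\}\}")


def find_references(text: str) -> list:
    """Return (label, kind, port) tuples for each component reference in text."""
    references = []
    for label, suffix in REF_PATTERN.findall(text):
        kind, _, port = suffix.partition(":")
        references.append((label, kind or "service", int(port) if port else None))
    return references


def referenced_labels(env: dict) -> list:
    """Return the sorted labels of all components referenced in env values."""
    return sorted({ref[0] for value in env.values() for ref in find_references(value)})


env = {
    "DATABASE_URL": "postgresql://user:pass@{{postgres:fqdn}}:5432/db",
    "KAFKA_BOOTSTRAP": "{{kafka:public:9094}}",
    "SELF_NAME": "{{$self}}",
}
print(find_references(env["KAFKA_BOOTSTRAP"]))
print(referenced_labels(env))
```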

### Global Variables

| Variable | Description |
|----------|-------------|
| `{{$namespace}}` | Kubernetes namespace for this run |
| `{{$runId}}` | Unique run identifier |
| `{{$baseDomain}}` | Base domain for public URLs |

## Best Practices

1. **Always set `expose` for Docker components** - Required for inter-component communication
2. **Use `sleep infinity` for interactive containers** - Keeps the container running for shell access
3. **Include readiness checks** - Ensures proper startup order
4. **Set appropriate resources** - Start minimal, increase if needed
5. **Add descriptive tab labels** - Helps users understand purpose
6. **Write help documentation** - Explains credentials and usage