chore: add missing source modules to version control

Deploy to Production / deploy (push) Failing after 7s

apix-demo, apix-portal/src, apix-spider/src, apix-registry/src, and apix-common/src were never staged. Without them the CI build has no source to compile and the Docker images cannot be produced. Also adds docs/ (infrastructure notes) missed in prior commits.

Co-Authored-By: Mira <noreply@anthropic.com>
# APIX Registry — Infrastructure Setup Guide

Complete walkthrough for deploying the APIX registry to a Hetzner VPS with Bunny.net CDN,
Prometheus/Grafana observability, and live Loki telemetry.
---

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Prerequisites](#2-prerequisites)
3. [VPS Provisioning (Hetzner)](#3-vps-provisioning-hetzner)
4. [DNS Configuration](#4-dns-configuration)
5. [Server Bootstrap](#5-server-bootstrap)
6. [Application Build](#6-application-build)
7. [Environment Configuration](#7-environment-configuration)
8. [Deploy the Stack](#8-deploy-the-stack)
9. [Caddy TLS Reverse Proxy](#9-caddy-tls-reverse-proxy)
10. [Bunny.net CDN Setup](#10-bunnynet-cdn-setup)
11. [Live Telemetry: Promtail → Loki](#11-live-telemetry-promtail--loki)
12. [Grafana Dashboards](#12-grafana-dashboards)
13. [Weekly Analytics (Bunny.net Logs)](#13-weekly-analytics-bunnynet-logs)
14. [Verification Checklist](#14-verification-checklist)
15. [Routine Operations](#15-routine-operations)
---

## 1. Architecture Overview

```
Internet
   │
   ▼
Bunny.net CDN (100+ PoPs, GDPR-compliant, European)
   │  Cache-Control headers respected; query-string vary cache enabled
   │  HIT: served from edge (~5ms). MISS: forwarded to VPS
   │
   ▼
Hetzner VPS (Helsinki / Falkenstein)
   │
   ├── Caddy (80/443) ─── TLS termination, HTTPS redirect, rate limiting
   │     │
   │     ├── registry:8180   — REST API (Quarkus, JVM)
   │     ├── portal:8081     — Web UI (Quarkus, Qute templates)
   │     └── grafana:3000    — Dashboards (internal access only)
   │
   ├── db:5432         — PostgreSQL 16 (no public port)
   ├── spider:8082     — Liveness checker (no public port)
   ├── prometheus:9090 — Metrics scraper (no public port)
   └── promtail:9080   — Syslog receiver → Loki (port 5514 open)

Grafana Cloud (Loki)
   │  Real-time: Bunny.net → TCP Syslog → Promtail → Loki → Grafana
   └── Every CDN request visible within 1-2 seconds
```
**CDN governance constraint:** Cloudflare and AWS CloudFront must never be used.
Both are founding member candidates — a provider that also operates our infrastructure
gains governance leverage regardless of the founding charter. Approved providers:
Bunny.net (primary), Fastly (fallback).

---
## 2. Prerequisites

### Local machine

| Tool | Version | Purpose |
|------|---------|---------|
| Java (Temurin) | 21 | Building application JARs |
| Maven | 3.9+ | Build system |
| Docker Desktop | latest | Local dev + image building |
| `curl` | any | Script calls to the Bunny.net API |
| `python3` | 3.8+ | JSON parsing in setup scripts |

Install Java 21 via SDKMAN:

```bash
curl -s https://get.sdkman.io | bash
sdk install java 21.0.3-tem
```
### Accounts required

| Service | What you need |
|---------|---------------|
| Hetzner Cloud | API token for VPS creation (optional — can provision manually) |
| Bunny.net | Account + API key (`Account → API`) |
| Grafana Cloud | Free tier sufficient; Loki + Prometheus endpoints |
| Domain registrar | Control over DNS for `api-index.org` |

---
## 3. VPS Provisioning (Hetzner)

### Recommended spec

```
Type:     CPX21 (3 vCPU, 4 GB RAM) — sufficient for MVP
Location: Helsinki (hel1) or Falkenstein (fsn1)
OS:       Ubuntu 24.04 LTS
Network:  Primary IPv4 + IPv6 dual stack
Backups:  Enable automatic backups (adds 20% to monthly cost)
```

The CPX21 comfortably runs the full Docker stack (registry + spider + portal + db +
prometheus + grafana + caddy + promtail) under MVP load. Upgrade to CPX31 if Prometheus
retention or portal traffic grows.
### Firewall rules (Hetzner firewall or `ufw`)

| Port | Protocol | Source | Purpose |
|------|----------|--------|---------|
| 22 | TCP | Your IP only | SSH |
| 80 | TCP | any | Caddy HTTP→HTTPS redirect |
| 443 | TCP+UDP | any | Caddy HTTPS + HTTP/3 |
| 5514 | TCP | Bunny.net IPs | Promtail syslog receiver |
| 9090 | TCP | VPS localhost | Prometheus (internal only) |
| 3000 | TCP | VPS localhost | Grafana (access via Caddy or SSH tunnel) |

**Bunny.net syslog source IPs:** Bunny.net does not publish a static IP list; open 5514
to `0.0.0.0/0` and rely on the Promtail pipeline to discard unexpected traffic.
The syslog format is the only authentication layer needed at this volume.
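That discard step can be sketched as a Promtail pipeline stage. The snippet below is illustrative only: the selector regex and label names are assumptions for this guide, not an excerpt of the shipped `scripts/promtail-cdn-logs.yaml`.

```yaml
# Illustrative sketch (NOT the shipped config): accept syslog on 5514 and
# drop any line that does not start with a Bunny.net-style HIT|/MISS| field.
scrape_configs:
  - job_name: apix-cdn
    syslog:
      listen_address: 0.0.0.0:5514
      labels:
        job: apix-cdn
    pipeline_stages:
      - match:
          # Keep only lines that look like pipe-delimited CDN access logs.
          selector: '{job="apix-cdn"} !~ "^(HIT|MISS)\\|"'
          action: drop
```

Check the real config's selector against the log format Bunny.net actually sends before relying on this.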
### SSH hardening (run as root after first login)

```bash
# Create deploy user
useradd -m -s /bin/bash deploy
usermod -aG sudo deploy   # the "docker" group is added in Step 5, after Docker is installed

# Copy your SSH public key
mkdir -p /home/deploy/.ssh
echo "YOUR_PUBLIC_KEY_HERE" > /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys

# Disable root SSH + password auth (the \? also matches commented-out defaults)
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
systemctl restart ssh   # the unit is named "ssh" on Ubuntu
```
---

## 4. DNS Configuration

### Required records

| Name | Type | Value | TTL |
|------|------|-------|-----|
| `api-index.org` | A | VPS IPv4 | 300 |
| `api-index.org` | AAAA | VPS IPv6 | 300 |
| `www.api-index.org` | CNAME | `api-index.org` | 3600 |
Set TTL to 300 (5 min) before the cutover so propagation is fast.
After the CDN is live (Step 10), replace the A/AAAA records with a record pointing
at the Bunny.net hostname instead.

### After CDN setup (replace A records)

```
api-index.org    CNAME    <bunnynet-cname>.b-cdn.net
```

Note: a true CNAME is not permitted at a zone apex; if your DNS provider rejects it,
use its ALIAS/ANAME (CNAME-flattening) record type with the same target.

Caddy still handles TLS on the VPS. Bunny.net terminates TLS at the edge and forwards
to the VPS over HTTPS using the Caddy certificate.
---

## 5. Server Bootstrap

SSH in as `deploy` and run:
```bash
# 1. System update
sudo apt-get update && sudo apt-get upgrade -y

# 2. Docker (official repo — the Ubuntu apt package is outdated)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# 3. Add deploy user to docker group (log out and back in to apply)
sudo usermod -aG docker deploy

# 4. Install Promtail (for Loki integration — see Step 11)
sudo apt-get install -y unzip
PROMTAIL_VERSION="3.0.0"
wget -q "https://github.com/grafana/loki/releases/download/v${PROMTAIL_VERSION}/promtail-linux-amd64.zip"
unzip -q promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
rm promtail-linux-amd64.zip

# 5. Clone the repository
git clone https://gitea.your-server.example/botstandards/apix-mvp.git /opt/apix
# Or using the GitHub mirror during the MVP phase:
git clone https://github.com/your-org/apix-mvp.git /opt/apix

cd /opt/apix
```
---

## 6. Application Build

Build JARs locally and copy to the VPS, or build directly on the VPS if Java 21 is installed.

### Build locally (recommended for CI phase)
```bash
# On your dev machine:
cd bot-service-index/apix-mvp
mvn clean package -DskipTests

# Copy artifacts to VPS (-r must precede the paths)
scp -r apix-registry/target/quarkus-app/ deploy@<vps-ip>:/opt/apix/apix-registry/target/quarkus-app/
scp -r apix-spider/target/quarkus-app/ deploy@<vps-ip>:/opt/apix/apix-spider/target/quarkus-app/
scp -r apix-portal/target/quarkus-app/ deploy@<vps-ip>:/opt/apix/apix-portal/target/quarkus-app/
```
### Build on VPS (MVP shortcut)

```bash
# Install Java 21 on the VPS
sudo apt-get install -y wget
wget -q "https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.3%2B9/OpenJDK21U-jdk_x64_linux_hotspot_21.0.3_9.tar.gz" \
  -O /tmp/jdk21.tar.gz
sudo mkdir -p /opt/java
sudo tar xzf /tmp/jdk21.tar.gz -C /opt/java
sudo ln -sf /opt/java/jdk-21.0.3+9/bin/java /usr/local/bin/java
sudo ln -sf /opt/java/jdk-21.0.3+9/bin/javac /usr/local/bin/javac

# Install Maven
sudo apt-get install -y maven

# Build
cd /opt/apix
mvn clean package -DskipTests
```
### Docker images

Dockerfiles are defined in WORKLOG Block 5 (I-04 to I-06) and not yet created.
Until they exist, run the JARs directly via `quarkus:dev` or write minimal Dockerfiles:

```dockerfile
# infra/Dockerfile.registry (placeholder until Block 5)
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY apix-registry/target/quarkus-app/ quarkus-app/
EXPOSE 8180
ENTRYPOINT ["java", "-jar", "quarkus-app/quarkus-run.jar"]
```

Repeat for `Dockerfile.spider` (port 8082) and `Dockerfile.portal` (port 8081).

Build and tag:

```bash
cd /opt/apix
docker build -f infra/Dockerfile.registry -t apix-registry:latest .
docker build -f infra/Dockerfile.spider -t apix-spider:latest .
docker build -f infra/Dockerfile.portal -t apix-portal:latest .
```
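Step 8 assumes a compose file in `infra/` that is never shown in this guide. A sketch of what its `registry` service entry might look like, where the service names, environment mapping, and healthcheck are all assumptions rather than the actual file:

```yaml
# Illustrative fragment of infra/docker-compose.yml — not the actual file.
services:
  registry:
    image: apix-registry:latest
    ports:
      - "8180:8180"
    environment:
      QUARKUS_DATASOURCE_JDBC_URL: jdbc:postgresql://db:5432/${APIX_DB_NAME}
      QUARKUS_DATASOURCE_USERNAME: ${APIX_DB_USER}
      QUARKUS_DATASOURCE_PASSWORD: ${APIX_DB_PASSWORD}
      APIX_API_KEY: ${APIX_API_KEY}
    depends_on:
      - db
    healthcheck:
      # busybox wget is available in the alpine base image
      test: ["CMD", "wget", "-qO-", "http://localhost:8180/q/health/live"]
      interval: 15s
```

The `${...}` values resolve from the `.env` file passed via `--env-file` in Step 8.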
---

## 7. Environment Configuration

Create `/opt/apix/.env` from the template below. This file is in `.gitignore` — never commit it.
```bash
# /opt/apix/.env — production values

# ── Database ──────────────────────────────────────────────────────────────────
APIX_DB_USER=apix
APIX_DB_PASSWORD=<generate: openssl rand -base64 32>
APIX_DB_NAME=apix
APIX_DB_PORT=5432   # Only exposed inside the Docker network in production

# ── API security ─────────────────────────────────────────────────────────────
# Used to authenticate write requests (POST /services, PATCH /services/*, etc.)
# Rotate this key when onboarding new registrars.
APIX_API_KEY=<generate: openssl rand -hex 32>

# ── Registry identity ─────────────────────────────────────────────────────────
APIX_REGISTRY_BASE_URL=https://api-index.org
APIX_REGISTRY_NAME=APIX Registry
APIX_REGISTRY_DESCRIPTION=The open autonomous agent service discovery registry.

# ── Verification integrations ────────────────────────────────────────────────
GLEIF_API_URL=https://api.gleif.org/api/v1
OPENCORPORATES_API_KEY=<your OpenCorporates API key — leave blank if not yet contracted>
APIX_VERIFICATION_TIMEOUT_MS=5000

# ── Mail signing (Ed25519) ────────────────────────────────────────────────────
# Leave blank on first deploy — an ephemeral key is generated at startup.
# Set before production: openssl genpkey -algorithm ed25519 | ...
APIX_MAIL_SIGNING_PRIVATE_KEY=
APIX_MAIL_SIGNING_PUBLIC_KEY=
APIX_MAIL_SIGNING_KID=2026-05

# ── Spider ────────────────────────────────────────────────────────────────────
SPIDER_INTERVAL_MINUTES=15

# ── Grafana ───────────────────────────────────────────────────────────────────
GRAFANA_ADMIN_PASSWORD=<generate: openssl rand -base64 16>
GRAFANA_ROOT_URL=https://grafana.api-index.org   # or http://localhost:3000 if SSH tunnel only

# ── Logging ───────────────────────────────────────────────────────────────────
LOG_LEVEL=INFO
```

Generate secrets in one pass:

```bash
echo "APIX_DB_PASSWORD=$(openssl rand -base64 32)"
echo "APIX_API_KEY=$(openssl rand -hex 32)"
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 16)"
```
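A template like the one above invites forgotten placeholders. As a quick sanity check before the first deploy, a small hypothetical helper (not part of the repository) that flags values still containing template markers:

```python
# check_env.py — hypothetical helper: flag .env variables whose values still
# look like unfilled template placeholders from this guide.
import re

PLACEHOLDER = re.compile(r"<generate:|<your |your-token-here|REPLACE_WITH")

def find_placeholders(env_text: str) -> list[str]:
    """Return the names of variables whose values still look like placeholders."""
    bad = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if PLACEHOLDER.search(value):
            bad.append(name)
    return bad
```

Usage: `python3 -c "import check_env,sys; sys.exit(bool(check_env.find_placeholders(open('/opt/apix/.env').read())))"` as a pre-deploy gate.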
---

## 8. Deploy the Stack

```bash
cd /opt/apix/infra

# Start everything
docker compose --env-file ../.env up -d

# Watch startup logs
docker compose logs -f --tail=50

# Verify all services are healthy
docker compose ps
```
Expected healthy services after ~60 seconds:

| Service | Health endpoint | Expected |
|---------|-----------------|----------|
| `db` | `pg_isready` | healthy |
| `registry` | `http://localhost:8180/q/health/live` | `{"status":"UP"}` |
| `spider` | `http://localhost:8082/q/health/live` | `{"status":"UP"}` |
| `portal` | `http://localhost:8081/q/health/live` | `{"status":"UP"}` |
| `prometheus` | `http://localhost:9090/-/healthy` | `Prometheus Server is Healthy.` |
| `grafana` | `http://localhost:3000/api/health` | `{"database":"ok"}` |
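The JSON checks in this table are easy to script. A hedged sketch of two predicates (function names are illustrative, not part of the repository) that decide health from the raw response bodies:

```python
# Hypothetical smoke-test helpers: interpret the health payloads listed above.
import json

def quarkus_up(payload: str) -> bool:
    """True if a Quarkus /q/health/live response body reports status UP."""
    try:
        return json.loads(payload).get("status") == "UP"
    except json.JSONDecodeError:
        return False

def grafana_ok(payload: str) -> bool:
    """True if a Grafana /api/health response body reports database ok."""
    try:
        return json.loads(payload).get("database") == "ok"
    except json.JSONDecodeError:
        return False
```

Feeding these the output of `curl -s` against each endpoint gives a pass/fail loop instead of eyeballing six URLs.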
Quick smoke test (from the VPS):

```bash
# Registry root (HATEOAS navigation)
curl -s http://localhost:8180/ | python3 -m json.tool

# Metrics endpoint (Prometheus scrape target)
curl -s http://localhost:8180/q/metrics | grep apix_search

# Search endpoint
curl -s "http://localhost:8180/services?capability=nlp" | python3 -m json.tool
```
### Liquibase note

Liquibase runs automatically at startup (`quarkus.liquibase.migrate-at-start=true`).
If the changelog is missing (`db/changelog/db.changelog-master.xml`), the registry will
fail to start. Check logs with `docker compose logs registry` and ensure migrations
are present (WORKLOG Block 1 / C-20 to C-24).

---
## 9. Caddy TLS Reverse Proxy

Create `infra/Caddyfile`:
```caddy
# infra/Caddyfile

api-index.org {
    # Public API — registry
    handle /services* { reverse_proxy registry:8180 }
    handle /devices* { reverse_proxy registry:8180 }
    handle /organizations* { reverse_proxy registry:8180 }
    handle /mail-signing-keys { reverse_proxy registry:8180 }
    handle / { reverse_proxy registry:8180 }

    # /q/* (Quarkus health/metrics) is proxied here but bypassed at the
    # CDN cache by the edge rule added in Step 10
    handle /q/* { reverse_proxy registry:8180 }

    # Rate limiting (requires the caddy-ratelimit plugin or enterprise build)
    # Basic protection: Caddy's built-in connection limit
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
    }

    log {
        output file /var/log/caddy/api-index.log
        format json
    }
}

# Portal — separate subdomain (optional)
portal.api-index.org {
    reverse_proxy portal:8081
    header Strict-Transport-Security "max-age=31536000; includeSubDomains"
}

# Grafana — restrict to internal access or require basic auth
grafana.api-index.org {
    basicauth {   # spelled basic_auth in Caddy 2.8+
        # htpasswd -nb admin <password> — generate and paste the hash here
        admin $2a$14$REPLACE_WITH_BCRYPT_HASH
    }
    reverse_proxy grafana:3000
}
```
Caddy obtains TLS certificates from Let's Encrypt automatically when it starts serving
the configured site addresses. No manual certificate management needed.

Restart the `caddy` container to pick up the new Caddyfile:

```bash
cd /opt/apix/infra
docker compose restart caddy
docker compose logs caddy -f
```

Verify TLS:

```bash
curl -sv https://api-index.org/ 2>&1 | grep -E "SSL|certificate|subject"
```

---
## 10. Bunny.net CDN Setup

The CDN sits in front of Caddy/registry and handles ~95% of read traffic from cache.

### One-time setup

```bash
cd /opt/apix

# With Loki log forwarding (strongly recommended for observability):
BUNNYNET_API_KEY=your-key \
ORIGIN_URL=https://api-index.org \
CUSTOM_HOSTNAME=api-index.org \
SYSLOG_HOST=<vps-public-ip> \
SYSLOG_PORT=5514 \
./scripts/setup-bunnynet.sh
```
The script:

1. Creates a pull zone pointing at `https://api-index.org`
2. Enables query-string vary cache (so `?capability=nlp` and `?capability=translation`
   are cached as separate entries — critical for correct cache behavior)
3. Sets edge TTL to follow origin `Cache-Control` headers (the registry sets `max-age=30`
   on `/services` and `/devices`; `max-age=60` on `/`)
4. Adds the `api-index.org` custom hostname
5. Adds an edge rule to bypass cache for `/q/*` (Quarkus health/metrics endpoints)
6. Enables real-time syslog forwarding to Promtail (when `SYSLOG_HOST` is set)
7. Prints the CNAME value for your DNS record
### Update DNS to point to CDN edge

After the script prints the CDN hostname:

```
api-index.org    CNAME    apix-registry.b-cdn.net
```

Remove the A/AAAA records pointing directly to the VPS. The VPS is now origin-only.
### Verify CDN is caching

```bash
# First request — cache MISS (origin hit)
curl -sI "https://api-index.org/services?capability=nlp" | grep -i "cache\|x-cache\|age"

# Second request within 30s — cache HIT
curl -sI "https://api-index.org/services?capability=nlp" | grep -i "cache\|x-cache\|age"

# From an Asian machine or VPN endpoint (tests the geographic edge)
curl -w "Total: %{time_total}s\n" -o /dev/null -s "https://api-index.org/services?capability=nlp"
# Target: <20ms after warm-up
```
---

## 11. Live Telemetry: Promtail → Loki

Provides real-time telemetry of all CDN traffic (hits + misses) during demos.
Origin Prometheus only sees cache misses — Loki sees everything.
### Grafana Cloud Loki credentials

In Grafana Cloud (`grafana.com`):

1. Go to your stack → Loki → Details
2. Note the **Push URL** (e.g. `https://logs-prod-eu-west-0.grafana.net/loki/api/v1/push`)
3. Create a service account with `logs:write` scope → copy the token
### Configure Promtail

```bash
# Copy config to system path
sudo mkdir -p /etc/promtail
sudo cp /opt/apix/scripts/promtail-cdn-logs.yaml /etc/promtail/cdn-logs.yaml

# Fill in credentials
sudo sed -i 's|https://LOKI_PUSH_URL/loki/api/v1/push|https://logs-prod-eu-west-0.grafana.net/loki/api/v1/push|g' \
  /etc/promtail/cdn-logs.yaml
sudo sed -i 's|"LOKI_USERNAME"|"123456"|g' /etc/promtail/cdn-logs.yaml   # stack user ID
sudo sed -i 's|"LOKI_PASSWORD"|"your-token-here"|g' /etc/promtail/cdn-logs.yaml
```

Or edit directly: `sudo nano /etc/promtail/cdn-logs.yaml` — replace the three
`LOKI_PUSH_URL`, `LOKI_USERNAME`, `LOKI_PASSWORD` placeholders.
### Create systemd service

```bash
sudo tee /etc/systemd/system/promtail.service > /dev/null <<'EOF'
[Unit]
Description=Promtail — Bunny.net CDN log forwarder
After=network.target

[Service]
User=nobody
Group=nogroup
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/cdn-logs.yaml
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now promtail
sudo systemctl status promtail
```
### Verify the pipeline end-to-end

```bash
# 1. Make a test request through the CDN
curl -s "https://api-index.org/services?capability=nlp" > /dev/null

# 2. Check Promtail received it (should appear within 1-2 seconds)
sudo journalctl -u promtail -f --no-pager

# 3. In Grafana Explore (Loki datasource):
#    {job="apix-cdn"} — should show a log line from the test request
```
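When debugging this pipeline it helps to split a forwarded log line into named fields. A hedged sketch: the field order below is an assumption for illustration and must be verified against the log format your pull zone actually emits before use.

```python
# Hedged sketch: split a Bunny.net-style pipe-delimited CDN log line into
# named fields. FIELD ORDER IS AN ASSUMPTION — check your pull zone's format.
FIELDS = ["cache_status", "status_code", "timestamp", "bytes_sent",
          "pull_zone_id", "remote_ip", "referer", "url",
          "edge_location", "user_agent", "request_id", "country"]

def parse_cdn_line(line: str) -> dict[str, str]:
    """Map one pipe-delimited log line onto the assumed field names."""
    values = line.rstrip("\n").split("|")
    return dict(zip(FIELDS, values))
```

With this in hand, filtering for `cache_status == "MISS"` quickly shows which URLs are leaking through to the origin.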
---

## 12. Grafana Dashboards

### Add Loki datasource to Grafana

The stack's provisioned Prometheus datasource is auto-loaded. Add Loki manually or via provisioning:
```yaml
# infra/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: https://logs-prod-eu-west-0.grafana.net
    basicAuth: true
    basicAuthUser: "123456"   # Grafana Cloud stack user ID
    secureJsonData:
      basicAuthPassword: "your-loki-token"
    isDefault: false
    editable: false
```

Restart Grafana to apply:

```bash
docker compose restart grafana
```
### Import the OpenClaw demo dashboard

```bash
# Copy the dashboard JSON to the provisioning directory
cp /opt/apix/scripts/grafana-demo-dashboard.json \
   /opt/apix/infra/grafana/provisioning/dashboards/demo-openclaw.json
```

Grafana auto-discovers dashboards in the provisioning path (30s poll interval per `provider.yml`).
No manual import needed.

Alternatively, import via the UI:

1. Grafana → Dashboards → Import
2. Upload `scripts/grafana-demo-dashboard.json`
3. Select the Loki and Prometheus datasources when prompted

### Dashboard refresh for demo sessions

The demo dashboard is pre-configured with:

- **Refresh:** 5 seconds
- **Time range:** Last 15 minutes
- **Auto-play:** Enable via Grafana's kiosk mode for the demo screen

Kiosk mode URL (hides the nav bar):

```
https://grafana.api-index.org/d/apix-demo-openclaw/apix-registry-demo?kiosk&refresh=5s
```
---

## 13. Weekly Analytics (Bunny.net Logs)

Bunny.net stores gzipped access logs (one file per day). The `query-report.sh` script
downloads them, parses them, and produces a capability frequency report.
```bash
# Basic report — last 7 days
BUNNYNET_API_KEY=your-key \
PULL_ZONE_ID=$(cat /opt/apix/.bunnynet-pull-zone-id) \
./scripts/query-report.sh
```

With Prometheus Pushgateway (builds a weekly time-series in Grafana):

```bash
BUNNYNET_API_KEY=your-key \
PULL_ZONE_ID=$(cat /opt/apix/.bunnynet-pull-zone-id) \
PROMETHEUS_PUSH_URL=https://pushgateway.your-grafana.example/metrics/job/apix-cdn-report \
DAYS=7 \
./scripts/query-report.sh
```
### Weekly cron job (on VPS)

Pass `INSTALL_CRON=true` when running `setup-bunnynet.sh` and the cron entry is
installed automatically — no manual `crontab -e` needed:

```bash
BUNNYNET_API_KEY=your-key \
ORIGIN_URL=https://api-index.org \
INSTALL_CRON=true \
./scripts/setup-bunnynet.sh
```

The script installs a `crontab` entry for the current user (Mondays 06:00) and
deduplicates on re-run — safe to call again after rotating the API key.
Verify with: `crontab -l | grep query-report`

The report answers:

- Which capabilities are queried most (all requests, including CDN hits)
- Cache hit ratio per endpoint
- Geographic distribution (PoP breakdown)
- Top query string combinations
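The capability-frequency part of that report reduces to counting `?capability=` values across the request URLs. A self-contained sketch of that tally (the function name is illustrative, not the script's actual internals):

```python
# Hypothetical sketch of the capability-frequency tally: count how often each
# ?capability= value appears across a batch of request URLs from the logs.
from collections import Counter
from urllib.parse import urlparse, parse_qs

def capability_counts(urls: list[str]) -> Counter:
    counts: Counter = Counter()
    for url in urls:
        query = parse_qs(urlparse(url).query)
        for cap in query.get("capability", []):
            counts[cap] += 1
    return counts
```

`Counter.most_common()` then gives the "queried most" ranking directly.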
---

## 14. Verification Checklist

Run through this after each deployment.

### Registry
```bash
# HATEOAS root
curl -s https://api-index.org/ | python3 -m json.tool

# Search (returns an empty array if no services are registered yet — correct)
curl -s "https://api-index.org/services?capability=nlp"

# Health
curl -s https://api-index.org/q/health | python3 -m json.tool

# Cache-Control header on the search endpoint
curl -sI "https://api-index.org/services?capability=nlp" | grep -i cache-control
# Expected: Cache-Control: public, max-age=30
```
### CDN

```bash
# Second request should be a cache HIT
curl -sI "https://api-index.org/services?capability=nlp" | grep -i cache
# Expected: X-Cache: HIT (or similar from Bunny.net)

# Edge latency test — run from a machine outside Germany
curl -w "DNS: %{time_namelookup}s Connect: %{time_connect}s Total: %{time_total}s\n" \
  -o /dev/null -s "https://api-index.org/services?capability=nlp"
# Target: Total <20ms from Asia/US after warm-up
```
### Observability

```bash
# Prometheus scraping registry
curl -s http://localhost:9090/api/v1/targets | python3 -c \
  "import sys,json; [print(t['labels']['job'], t['health']) for t in json.load(sys.stdin)['data']['activeTargets']]"

# Loki receiving CDN logs
# In Grafana Explore: {job="apix-cdn"} over the last 5 min should show entries
```
### TLS

```bash
curl -sv https://api-index.org/ 2>&1 | grep -E "issuer|subject|expire"
# Should show a Let's Encrypt issuer
```

---
## 15. Routine Operations

### Update application
```bash
cd /opt/apix

# Pull latest code
git pull origin main

# Rebuild affected images
docker build -f infra/Dockerfile.registry -t apix-registry:latest .

# Rolling restart (compose commands run from the infra directory)
cd infra
docker compose up -d registry
docker compose logs registry -f --tail=50
```
### Rotate API key

```bash
NEW_KEY=$(openssl rand -hex 32)
# Update .env
sed -i "s/^APIX_API_KEY=.*/APIX_API_KEY=${NEW_KEY}/" /opt/apix/.env
# Restart registry to pick up the new key
cd /opt/apix/infra
docker compose --env-file ../.env up -d registry
echo "New key: ${NEW_KEY}"
# Distribute to all registrar clients promptly — the old key stops working
# as soon as the registry restarts
```
### Database backup

```bash
# Manual backup
mkdir -p /opt/apix/backups
docker exec apix-infra-db-1 pg_dump -U apix apix | gzip > /opt/apix/backups/apix-$(date +%Y%m%d).sql.gz

# Automated daily backup via cron (single crontab line)
0 2 * * * docker exec apix-infra-db-1 pg_dump -U apix apix | gzip > /opt/apix/backups/apix-$(date +\%Y\%m\%d).sql.gz

# Restore
gunzip -c /opt/apix/backups/apix-20260101.sql.gz | docker exec -i apix-infra-db-1 psql -U apix apix
```
### View logs

```bash
# Live registry logs
docker compose -f /opt/apix/infra/docker-compose.yml logs registry -f

# Use the convenience scripts (from the project root)
./scripts/logs.sh registry
./scripts/logs.sh spider
./scripts/logs.sh portal

# Promtail (CDN log forwarder)
sudo journalctl -u promtail -f
```
### Purge CDN cache (after deploying schema changes)

```bash
curl -sf -X POST "https://api.bunny.net/pullzone/$(cat /opt/apix/.bunnynet-pull-zone-id)/purgeCache" \
  -H "AccessKey: ${BUNNYNET_API_KEY}"
```
### Stop / restart the full stack

```bash
cd /opt/apix   # the convenience scripts live in the project root
./scripts/stop.sh      # graceful stop
./scripts/restart.sh   # stop + start
./scripts/reset.sh     # WARNING: drops all volumes including DB data
```

---
## Appendix: Environment Variable Reference

All variables accepted by the `registry` container, sourced from `.env`:

| Variable | Default | Required | Description |
|----------|---------|----------|-------------|
| `QUARKUS_DATASOURCE_JDBC_URL` | `jdbc:postgresql://db:5432/apix` | yes | Database JDBC URL |
| `QUARKUS_DATASOURCE_USERNAME` | `apix` | yes | DB username |
| `QUARKUS_DATASOURCE_PASSWORD` | `apix` | yes | DB password — use a strong value |
| `APIX_API_KEY` | `dev-insecure-key-change-in-prod` | yes | Write-endpoint auth key |
| `APIX_REGISTRY_BASE_URL` | `http://localhost:8180` | yes | Used in HATEOAS links |
| `GLEIF_API_URL` | `https://api.gleif.org/api/v1` | no | O2 verification: GLEIF REST API |
| `OPENCORPORATES_API_KEY` | _(blank)_ | no | O2 verification: OpenCorporates |
| `APIX_VERIFICATION_TIMEOUT_MS` | `5000` | no | HTTP timeout for verification calls |
| `APIX_MAIL_SIGNING_PRIVATE_KEY` | _(blank)_ | no | Ed25519 private key, Base64; ephemeral if blank |
| `APIX_MAIL_SIGNING_PUBLIC_KEY` | _(blank)_ | no | Ed25519 public key, Base64 |
| `APIX_MAIL_SIGNING_KID` | `dev` | no | Key ID in signed payloads; rotate every 6 months |
| `SANCTIONS_CACHE_PATH` | `./sanctions-cache` | no | Local path for sanctions list cache |
| `LOG_LEVEL` | `INFO` | no | `DEBUG` / `INFO` / `WARNING` / `ERROR` |