Commit graph

22 commits

Author SHA1 Message Date
44ccdf4882 refactor: remove tools server, Vaultwarden, monitoring stack; rename plane→hub
Some checks failed
CI/CD / syntax-check (push) Successful in 1m30s
CI/CD / deploy (push) Failing after 6m8s
- Remove tools server entirely (roles/tools, playbooks/tools.yml, CI deploy step)
- Remove Vaultwarden (already absent from compose, clean up vars)
- Remove node-exporter, cadvisor, promtail from main stack
- Remove grafana/uptime-kuma Traefik routes (pointed to tools)
- Remove monitoring network from docker-compose
- Remove tools vault vars (grafana_admin_password, alertmanager telegram)
- Rename domain_plane: plane.walava.io → hub.walava.io
- Update CI workflow to only deploy main server
- Update STATUS.md and BACKLOG.md to reflect current state
2026-03-27 19:05:19 +07:00
8feff04136 fix: use curl for Docmost health check, check Docker health status
All checks were successful
CI/CD / deploy (push) Successful in 13m46s
CI/CD / syntax-check (push) Successful in 1m0s
- wget not available in Docmost Node.js image → switch to curl
- Ansible now checks Docker health status instead of exec-ing into container
- Increased healthcheck start_period to 60s and retries to 10 for DB migrations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:09:14 +07:00
9f85988c1f fix: increase Docmost health check retries to 30 (5 min total)
Some checks failed
CI/CD / deploy (push) Failing after 13m46s
CI/CD / syntax-check (push) Successful in 59s
First deploy needs time for DB migrations and initial setup.
30×10s = 300s gives enough buffer for cold start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:48:25 +07:00
472c2b944b feat: replace Outline with Docmost
Some checks failed
CI/CD / syntax-check (push) Successful in 1m0s
CI/CD / deploy (push) Failing after 5m1s
- Replace outline/outline-db/outline-redis with docmost/docmost-db/docmost-redis
- Update Traefik route: wiki → http://docmost:3000
- Update S3 bucket: walava-outline → walava-docmost (new bucket created: 481385)
- Remove env.outline.j2 deploy task (Docmost config is inline in compose)
- Update backup script: outline.sql.gz → docmost.sql.gz
- Update CORS task for walava-docmost bucket
- Add vault_docmost_app_secret + vault_docmost_db_password secrets
- Remove outline_mcp_image (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:31:51 +07:00
f0c3fbbe1b fix: auto-bootstrap Outline team on fresh install
All checks were successful
CI/CD / deploy (push) Successful in 15m8s
CI/CD / syntax-check (push) Successful in 1m4s
On a fresh DB Outline shows a blank login page because there is no team
and emailSigninEnabled = false. Add idempotent Ansible tasks that:
1. Create the 'Visual' team if none exists
2. Set guestSignin=true so email magic-link login works
Triggered by: server rebuild lost Outline DB (no backup existed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:14:14 +07:00
fde51352d7 feat: migrate monitoring to tools server, fix Outline S3 uploads
Monitoring stack (Prometheus, AlertManager, Grafana, Loki, Uptime Kuma)
moved from main to tools server. Prometheus now scrapes main exporters
over network (ip_main:9100/8080). Promtail pushes logs to ip_tools:3100.
Traefik routes for dash/status.walava.io updated to ip_tools. discord-bot
PROMETHEUS_URL updated to http://ip_tools:9090.

Outline S3 fix: remove AWS_S3_ACL=private (Timeweb doesn't support
per-object ACLs — caused upload failures). Add CORS configuration task
for browser-side presigned uploads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 04:10:28 +07:00
36be9fb33d chore: remove SMTP relay, clean up tools role after Outline/n8n migration to main
- Remove smtp-relay (postfix) container — Outline now on main, uses Resend directly
- Remove UFW port 1025 rule (SMTP relay no longer needed)
- Remove postfix-relay from image pull list
- Clean up tools role: remove Outline/n8n/env.j2, simplify tasks/main.yml
- tools docker-compose now empty (pending monitoring migration)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:10:56 +07:00
fba7eb68ea fix: add SMTP relay on main server for Outline email auth
Some checks failed
CI/CD / deploy (push) Blocked by required conditions
CI/CD / syntax-check (push) Has been cancelled
tools-server (85.193.83.9) has outbound SMTP ports 465/587 blocked by VPS
provider. Added tecnativa/postfix-relay container on main server that relays
to smtp.resend.com:587. Outline now uses ip_main:1025 as SMTP host.

- UFW rule: allow port 1025 from ip_tools only
- Remove stale authelia_image from docker pull list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:35:30 +07:00
c1a71b7f50 fix: add remove_orphans to docker compose tasks
All checks were successful
CI/CD / syntax-check (push) Successful in 1m33s
CI/CD / deploy (push) Successful in 14m0s
Ensures removed services (vaultwarden, mailserver, snappymail)
are automatically stopped on next deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:00:17 +07:00
4090d8289b fix: add username/icon_url to Forgejo Discord webhook config
All checks were successful
CI/CD / syntax-check (push) Successful in 1m8s
CI/CD / deploy (push) Successful in 13m10s
Prevents the 'meta json: readObjectStart' error on fresh deploys.
Existing hooks already fixed via direct DB update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 06:10:56 +07:00
4b00804f3e fix: use forgejo_api_token for webhook creation, cover both repos
Some checks failed
CI/CD / syntax-check (push) Successful in 1m6s
CI/CD / deploy (push) Has been cancelled
- Add vault_forgejo_api_token (Personal Access Token with write:repository)
- Ansible task now creates Discord webhook on both jack/infra and jack/discord-bot
- Webhooks already created manually for this deploy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:53:25 +07:00
d83ead2cbe feat(discord): integrate alerts and deploy notifications
Some checks failed
CI/CD / syntax-check (push) Successful in 1m3s
CI/CD / deploy (push) Has been cancelled
- Add discord_webhook_alerts and discord_webhook_deploys to vault + main.yml
- AlertManager: send alerts to both Telegram and Discord #alerts channel
- Forgejo: auto-create Discord webhook on repo pushes → #deploys channel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:58:12 +07:00
a620bb381c fix: remove all remaining Vaultwarden references after service removal
Some checks failed
CI/CD / syntax-check (push) Successful in 1m1s
CI/CD / deploy (push) Has been cancelled
- tasks/main.yml: remove vaultwarden_image from image pull list
- tasks/directories.yml: remove vaultwarden/data directory creation
- backup.sh.j2: remove Vaultwarden backup/restore section and stop command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:49:12 +07:00
66b70827df chore: full project cleanup + documentation
Some checks failed
CI/CD / syntax-check (push) Successful in 1m31s
CI/CD / deploy (push) Has been cancelled
Syncthing removal (was already decided, now fully removed):
- roles/base/tasks/firewall.yml: remove 3 UFW rules (ports 22000/21027)
- inventory/group_vars/all/main.yml: remove domain_sync, domain_mon, syncthing_basic_auth_htpasswd
- roles/services/templates/env.j2: remove DOMAIN_SYNC
- roles/services/templates/authelia/configuration.yml.j2: remove Syncthing 2FA rule
- roles/services/tasks/directories.yml: remove syncthing/config and syncthing/data dirs
- roles/services/defaults/main.yml: remove syncthing_image
- roles/services/tasks/main.yml: remove syncthing image pull

Security hardening:
- inventory/group_vars/all/main.yml: move cloudflare_zone_id to vault
- inventory/group_vars/all/vault.yml: add vault_cloudflare_zone_id

.gitignore improvements:
- add *.env, acme.json, *.log, editor dirs, venv, temp files

Documentation (new):
- docs/STATUS.md: all services, servers, known issues
- docs/BACKLOG.md: prioritized task list, done/todo
- docs/DECISIONS.md: architecture decisions and rationale
- CLAUDE.md: rewritten with read-first docs, rules, full arch reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 19:58:12 +07:00
6279bcb9b4 fix: remove cs-firewall-bouncer from image pre-pull list
Some checks failed
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Failing after 9m44s
crowdsecurity/cs-firewall-bouncer:v0.0.31 does not exist on Docker Hub.
The bouncer service was already removed from docker-compose.yml.
Remove from pre-pull list and defaults to unblock CI/CD deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:39:31 +07:00
a344998405 feat: add uptime-kuma pull, logrotate deploy task, logrotate package
Some checks failed
CI/CD / syntax-check (push) Successful in 41s
CI/CD / deploy (push) Failing after 39s
- Add uptime_kuma_image to image pull loop in services/tasks/main.yml
- Add logrotate deploy task to services/tasks/configs.yml
- Add logrotate package to base_packages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:54:24 +07:00
aa9706bbc4 feat: comprehensive security hardening
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 59s
Traefik:
- Enable access logs → /var/log/traefik/access.log (needed for CrowdSec)
- Add global security headers middleware: HSTS, X-Frame-Options, CSP,
  nosniff, XSS filter, referrer policy, permissions policy
- Add rate limiting: default 100/s, API 30/s, admin 10/s (strict)
- Add Authelia ForwardAuth middleware for SSO integration

CrowdSec (new service):
- Analyzes Traefik access logs + auth.log in real time
- Community IP reputation blocklist (crowdsecurity/traefik + http-cve)
- Firewall bouncer: bans malicious IPs at kernel level (iptables)

Authelia (new service, auth.csrx.ru):
- 2FA/SSO portal with TOTP (Google Authenticator)
- Protects: traefik.csrx.ru, sync.csrx.ru, /god-mode/ in Plane
- Session: 12h expiry, 30m inactivity, Redis backend
- argon2id password hashing

Container security:
- Add security_opt: no-new-privileges to traefik, vaultwarden,
  forgejo, grafana, authelia

CI/CD security:
- Remove hardcoded server IP 87.249.49.32 from workflow
- Use SSH_KNOWN_HOSTS secret instead of ssh-keyscan (prevents MITM)
- Added SSH_KNOWN_HOSTS secret to Forgejo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:44:54 +07:00
6ebd237894 feat: major infrastructure improvements
Some checks failed
CI/CD / deploy (push) Has been cancelled
CI/CD / syntax-check (push) Successful in 1m7s
Reliability:
- Add swap role (2GB, swappiness=10, idempotent via /etc/fstab)
- Add mem_limit to plane-worker (512m) and plane-beat (256m)
- Add health checks to all services (traefik, vaultwarden, forgejo,
  plane-*, syncthing, prometheus, grafana, loki)

Code quality:
- Remove Traefik Docker labels (file provider used, labels were dead code)
- Add comment explaining file provider architecture

Observability:
- Add AlertManager with Telegram notifications
- Add Prometheus alert rules: CPU, RAM, disk, swap, container health
- Add Loki + Promtail for centralized log aggregation
- Add Loki datasource to Grafana
- Enable Traefik /ping endpoint for health checks

Backups:
- Add backup role: pg_dump for forgejo + plane DBs, tar for
  vaultwarden and forgejo data
- 7-day retention, daily cron at 03:00
- Backup script at /usr/local/bin/backup-services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:28:16 +07:00
972a76db4c feat: add monitoring stack (Prometheus + Grafana + cAdvisor + Node Exporter)
All checks were successful
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Successful in 6m51s
- Adds monitoring Docker network (internal)
- Prometheus scrapes node-exporter (host metrics) and cAdvisor (containers)
  with 30-day retention
- Grafana exposed at dashboard.csrx.ru with pre-provisioned datasource
  and two dashboards: Node Exporter Full (1860) and cAdvisor (14282)
- Vault secret: vault_grafana_admin_password

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:05:34 +07:00
efbbc3cac5 fix: add plane-admin, plane-space and configure instance URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 2m31s
CI/CD / deploy (push) Has been cancelled
New Plane stable requires 3 frontend services:
- plane-admin (nginx:80) for /god-mode/ routes
- plane-space (node:3000) for /spaces/ routes
- plane-web (nginx:80) for all other routes

Also add APP/ADMIN/SPACE_BASE_URL env vars to plane-api so the
setup wizard knows where to redirect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 02:14:34 +07:00
d2d5f12d5a Add Forgejo Actions CI/CD with act_runner
Some checks failed
CI/CD / syntax-check (push) Failing after 12s
CI/CD / deploy (push) Has been skipped
- Add gitea/act_runner:0.3.0 to docker-compose stack on runner-jobs network
- Add act_runner config template and directory provisioning
- Add FORGEJO_RUNNER_TOKEN to env template
- Add CI deploy SSH public key to authorized_keys via base role
- Create .forgejo/workflows/deploy.yml: syntax-check on PR, deploy on push to master
- Add .claude/launch.json with ansible-playbook configurations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 21:28:15 +07:00
a1b97f3e4b Initial commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:39:26 +07:00