Commit graph

30 commits

Author SHA1 Message Date
f4688ed8be fix: disable outline-mcp until image is built and pushed to registry
outline-mcp uses git.walava.io/jack/outline-mcp:latest which doesn't
exist in Forgejo registry yet (Forgejo itself wasn't running).
Comment out the service; re-enable after building the image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 06:05:30 +07:00
fde51352d7 feat: migrate monitoring to tools server, fix Outline S3 uploads
Monitoring stack (Prometheus, AlertManager, Grafana, Loki, Uptime Kuma)
moved from main to tools server. Prometheus now scrapes main exporters
over network (ip_main:9100/8080). Promtail pushes logs to ip_tools:3100.
Traefik routes for dash/status.walava.io updated to ip_tools. discord-bot
PROMETHEUS_URL updated to http://ip_tools:9090.

Outline S3 fix: remove AWS_S3_ACL=private (Timeweb doesn't support
per-object ACLs — caused upload failures). Add CORS configuration task
for browser-side presigned uploads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 04:10:28 +07:00
d6015b76a3 fix: add proxy network to Outline and n8n for outbound internet access
Outline needs proxy network for SMTP (Resend) and S3 (Timeweb).
n8n needs proxy network for external API calls in workflows.
Both were only on backend (internal:true) so DNS/TCP to internet was blocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:54:19 +07:00
36be9fb33d chore: remove SMTP relay, clean up tools role after Outline/n8n migration to main
- Remove smtp-relay (postfix) container — Outline now on main, uses Resend directly
- Remove UFW port 1025 rule (SMTP relay no longer needed)
- Remove postfix-relay from image pull list
- Clean up tools role: remove Outline/n8n/env.j2, simplify tasks/main.yml
- tools docker-compose now empty (pending monitoring migration)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:10:56 +07:00
489791403c feat: migrate Outline + n8n to main server, rename S3 buckets to walava-*
- Add Outline, outline-db, outline-redis, n8n, outline-mcp containers to main docker-compose
- Add env.outline.j2 template with Resend SMTP and S3 (walava-outline bucket)
- Update Traefik routes: wiki → outline:3000, auto → n8n:5678 (local, not cross-server)
- Rename S3 buckets: visual-backup → walava-backup, visual-outline → walava-outline
- Extend backup.sh.j2: add Outline DB, n8n, Plane MinIO to backup scope
- Add outline_image, n8n_image, outline_mcp_image to services/defaults
- Remove Authelia config deployment tasks from configs.yml
- Add outline-internal and n8n-internal networks to docker-compose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:04:54 +07:00
fba7eb68ea fix: add SMTP relay on main server for Outline email auth
Some checks failed
CI/CD / deploy (push) Blocked by required conditions
CI/CD / syntax-check (push) Has been cancelled
tools-server (85.193.83.9) has outbound SMTP ports 465/587 blocked by VPS
provider. Added tecnativa/postfix-relay container on main server that relays
to smtp.resend.com:587. Outline now uses ip_main:1025 as SMTP host.

- UFW rule: allow port 1025 from ip_tools only
- Remove stale authelia_image from docker pull list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:35:30 +07:00
d635522199 feat: remove Authelia, protect dashboard with basic auth
Some checks are pending
CI/CD / syntax-check (push) Waiting to run
CI/CD / deploy (push) Blocked by required conditions
Authelia was unused overhead — only traefik-dashboard and plane /god-mode/
were behind it. Dashboard now uses traefik-auth (basic auth). /god-mode/
uses rate-limit-strict only.

Removes: authelia + authelia-redis containers, authelia-internal network,
authelia_data volume, authelia router/service/forwardAuth middleware.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:50:41 +07:00
fb769b2f8c feat: migrate domain from csrx.ru to walava.io
Some checks failed
CI/CD / syntax-check (push) Successful in 1m44s
CI/CD / deploy (push) Failing after 20m21s
- domain_base changed to walava.io
- domain_n8n now auto.walava.io
- Added domain_landing for walava.io root
- Added walava-web landing page container + Traefik route
- Updated Cloudflare token/zone_id for walava.io account
- Updated ACME email to walava@tutamail.com
- Fixed discord-bot image to use domain_base variable
- DNS records already created in Cloudflare

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:17:00 +07:00
f3f665a5be fix: add DISCORD_APP_ID env var to discord-bot container
Some checks failed
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:42:55 +07:00
0315ee6a72 feat: add Discord bot service + workflow_dispatch trigger
All checks were successful
CI/CD / syntax-check (push) Successful in 1m5s
CI/CD / deploy (push) Successful in 14m7s
- Add discord-bot container to docker-compose (uses git.csrx.ru registry image)
- Inject DISCORD_BOT_TOKEN via .env, bot accesses Docker socket + Prometheus
- Add vault_discord_bot_{token,app_id,public_key}, aliases in main.yml
- Add workflow_dispatch to deploy.yml so /deploy bot command works

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:27:42 +07:00
1b063c3947 fix(uptime-kuma): add proxy network for internet access to Discord/Telegram
Some checks failed
CI/CD / syntax-check (push) Successful in 1m7s
CI/CD / deploy (push) Has been cancelled
Container was on backend (internal: true) only — couldn't resolve
discord.com for webhook notifications. Added proxy network which
has outbound internet access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:01:27 +07:00
40c8d291ca fix(plane): add WEB_URL and NEXT_PUBLIC_API_BASE_URL to plane-web container
Some checks failed
CI/CD / syntax-check (push) Successful in 1m6s
CI/CD / deploy (push) Failing after 5m49s
Without these env vars Next.js SSR renders with wrong base URL causing
React hydration error #418 — server/client HTML mismatch on first render.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:13:35 +07:00
75bed6bb04 feat: remove mail stack and Vaultwarden
Some checks failed
CI/CD / syntax-check (push) Successful in 1m15s
CI/CD / deploy (push) Has been cancelled
Removed services:
- docker-mailserver (Postfix + Dovecot)
- SnappyMail webmail
- Vaultwarden password manager

Removed infrastructure:
- certbot + Cloudflare DNS-01 TLS for mx.csrx.ru
- UFW rules for ports 25/587/993/465
- mail-internal and webmail-internal Docker networks
- SMTP config from Outline env
- vault, mail Traefik routes
- All related vault secrets and variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:06:29 +07:00
207e1dcff0 chore: project cleanup and docs update
All checks were successful
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Successful in 16m39s
- Remove Syncthing mention from authelia comment in docker-compose
- Fix backup.sh.j2 comment: hourly → every 6 hours
- Update CLAUDE.md: add docs update rule, fix backup schedule note
- Update STATUS.md: dash.csrx.ru fixed, PTR pending, backup schedule, mail hostnames
- Update BACKLOG.md: mark DNS/PTR/backup-schedule done, add SnappyMail domain task
- Update DECISIONS.md: fix backup section (no --storage-class COLD, correct schedule)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:00:35 +07:00
2b5524f258 fix: remove promtail nested /var/log/traefik volume mount
All checks were successful
CI/CD / syntax-check (push) Successful in 1m9s
CI/CD / deploy (push) Successful in 15m33s
Docker cannot mount to /var/log/traefik when /var/log is already
bind-mounted (read-only). The nested mount fails with 'read-only
file system' error in the overlay upper layer.

The mount was unused anyway — promtail config only reads syslog,
auth.log, and Docker container logs via the socket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:55:39 +07:00
28f8c76433 fix: plane and authelia health check URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 1m21s
CI/CD / deploy (push) Failing after 12m3s
- plane-web/admin: localhost:80 → 127.0.0.1:3000 (nginx listens on 3000)
- plane-space: localhost:3000 → 127.0.0.1:3000/spaces/ (node server needs basename)
- plane-api: localhost:8000/api/ → 127.0.0.1:8000/ (/ returns status OK, /api/ returns 404)
- uptime-kuma: localhost:3001 → curl -sf (wget not available in image)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 14:50:08 +07:00
9ca1177461 fix: crowdsec proxy network, uptime-kuma curl healthcheck, outline en_US, n8n 127.0.0.1
Some checks failed
CI/CD / syntax-check (push) Successful in 1m4s
CI/CD / deploy (push) Failing after 10m46s
- crowdsec: add proxy network for internet access (hub downloads)
- crowdsec-bouncer: remove (image crowdsecurity/cs-firewall-bouncer doesn't exist on Docker Hub)
- uptime-kuma: switch healthcheck from wget to curl (wget not in image)
- outline: fix DEFAULT_LANGUAGE ru_RU → en_US (unsupported locale)
- n8n: fix healthcheck localhost → 127.0.0.1 (IPv6 issue in Alpine)
- alertmanager: config permissions 0644 (was 0640, container couldn't read)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 08:14:07 +07:00
92d2c845d8 feat: add n8n, outline routes, remove syncthing, fix backup awscli
Some checks failed
CI/CD / syntax-check (push) Successful in 1m14s
CI/CD / deploy (push) Failing after 10m51s
- Add n8n to tools server (n8n.csrx.ru)
- Add cross-server Traefik routes: wiki.csrx.ru + n8n.csrx.ru → tools
- Remove Syncthing (replaced by Outline wiki)
- Fix awscli install: download static binary (apt/pip broken on Ubuntu 24.04)
- Add n8n secrets to vault (encryption key + JWT secret)
- Improve CI/CD workflow: syntax-check both playbooks, deploy both servers
- Update site.yml: unified single-command deploy for all servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:19:39 +07:00
c2f9a0c21c feat: wildcard TLS via Cloudflare DNS-01 + real-IP forwarding
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 46s
- Switch Traefik ACME to dnsChallenge (provider: cloudflare)
- Add *.csrx.ru wildcard cert via tls.stores.default.defaultGeneratedCert
- Pass CLOUDFLARE_DNS_API_TOKEN to Traefik via env_file: .env
- Add Cloudflare IP ranges to forwardedHeaders.trustedIPs (real visitor IPs)
- Fix UFW: allow 172.16.0.0/12 on 80/443 so act_runner can reach Forgejo
- Add A records: auth.csrx.ru, status.csrx.ru, csrx.ru root → 87.249.49.32

Result: one *.csrx.ru cert covers all subdomains, auto-renewed by Traefik.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:47:46 +07:00
fccbd1a45a feat: Cloudflare DNS-01 ACME + Docker hardening + sysctl
Some checks failed
CI/CD / syntax-check (push) Successful in 42s
CI/CD / deploy (push) Failing after 52s
Cloudflare DNS-01 ACME:
- Switch Traefik cert resolver from httpChallenge to dnsChallenge
  using Cloudflare provider (resolvers: 1.1.1.1, 1.0.0.1)
- Add CLOUDFLARE_DNS_API_TOKEN env to Traefik container
- Add CF_ZONE_ID + cloudflare_dns_api_token to all/main.yml
- Store API token in Ansible Vault

Docker daemon hardening:
- Add log-driver: json-file with max-size 10m / max-file 3
  (prevents disk fill from unbounded container logs)
- Add live-restore: true (containers survive Docker daemon restart)

Kernel hardening (sysctl):
- New roles/base/tasks/sysctl.yml via ansible.posix.sysctl
- IP spoofing protection (rp_filter)
- Disable ICMP redirects and broadcast pings
- SYN flood protection (syncookies, backlog)
- Disable IPv6 (not used)
- Restrict kernel pointers and dmesg to root
- Disable SysRq, suid core dumps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:06:46 +07:00
aa9706bbc4 feat: comprehensive security hardening
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 59s
Traefik:
- Enable access logs → /var/log/traefik/access.log (needed for CrowdSec)
- Add global security headers middleware: HSTS, X-Frame-Options, CSP,
  nosniff, XSS filter, referrer policy, permissions policy
- Add rate limiting: default 100/s, API 30/s, admin 10/s (strict)
- Add Authelia ForwardAuth middleware for SSO integration

CrowdSec (new service):
- Analyzes Traefik access logs + auth.log in real time
- Community IP reputation blocklist (crowdsecurity/traefik + http-cve)
- Firewall bouncer: bans malicious IPs at kernel level (iptables)

Authelia (new service, auth.csrx.ru):
- 2FA/SSO portal with TOTP (Google Authenticator)
- Protects: traefik.csrx.ru, sync.csrx.ru, /god-mode/ in Plane
- Session: 12h expiry, 30m inactivity, Redis backend
- argon2id password hashing

Container security:
- Add security_opt: no-new-privileges to traefik, vaultwarden,
  forgejo, grafana, authelia

CI/CD security:
- Remove hardcoded server IP 87.249.49.32 from workflow
- Use SSH_KNOWN_HOSTS secret instead of ssh-keyscan (prevents MITM)
- Added SSH_KNOWN_HOSTS secret to Forgejo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:44:54 +07:00
6ebd237894 feat: major infrastructure improvements
Some checks failed
CI/CD / deploy (push) Has been cancelled
CI/CD / syntax-check (push) Successful in 1m7s
Reliability:
- Add swap role (2GB, swappiness=10, idempotent via /etc/fstab)
- Add mem_limit to plane-worker (512m) and plane-beat (256m)
- Add health checks to all services (traefik, vaultwarden, forgejo,
  plane-*, syncthing, prometheus, grafana, loki)

Code quality:
- Remove Traefik Docker labels (file provider used, labels were dead code)
- Add comment explaining file provider architecture

Observability:
- Add AlertManager with Telegram notifications
- Add Prometheus alert rules: CPU, RAM, disk, swap, container health
- Add Loki + Promtail for centralized log aggregation
- Add Loki datasource to Grafana
- Enable Traefik /ping endpoint for health checks

Backups:
- Add backup role: pg_dump for forgejo + plane DBs, tar for
  vaultwarden and forgejo data
- 7-day retention, daily cron at 03:00
- Backup script at /usr/local/bin/backup-services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:28:16 +07:00
972a76db4c feat: add monitoring stack (Prometheus + Grafana + cAdvisor + Node Exporter)
All checks were successful
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Successful in 6m51s
- Adds monitoring Docker network (internal)
- Prometheus scrapes node-exporter (host metrics) and cAdvisor (containers)
  with 30-day retention
- Grafana exposed at dashboard.csrx.ru with pre-provisioned datasource
  and two dashboards: Node Exporter Full (1860) and cAdvisor (14282)
- Vault secret: vault_grafana_admin_password

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:05:34 +07:00
efbbc3cac5 fix: add plane-admin, plane-space and configure instance URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 2m31s
CI/CD / deploy (push) Has been cancelled
New Plane stable requires 3 frontend services:
- plane-admin (nginx:80) for /god-mode/ routes
- plane-space (node:3000) for /spaces/ routes
- plane-web (nginx:80) for all other routes

Also add APP/ADMIN/SPACE_BASE_URL env vars to plane-api so the
setup wizard knows where to redirect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 02:14:34 +07:00
66c03ffc04 fix: update plane backend for new stable image requirements
All checks were successful
CI/CD / syntax-check (push) Successful in 2m49s
CI/CD / deploy (push) Successful in 8m54s
makeplane/plane-backend:stable now requires:
- AMQP_URL: Celery broker URL (defaults to amqp://localhost, broken)
  → set to redis://plane-redis:6379/ to reuse existing Redis
- GUNICORN_WORKERS: must be set explicitly (empty string causes crash)
  → set to 2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 01:32:11 +07:00
679d3ed010 fix: update plane-web for nginx-based stable image
All checks were successful
CI/CD / syntax-check (push) Successful in 2m41s
CI/CD / deploy (push) Successful in 11m24s
makeplane/plane-frontend:stable now uses nginx (not Next.js/node).
Remove `command: node web/server.js` override and update Traefik
port from 3000 to 80.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:44:00 +07:00
6a2c38b4bf Fix act_runner: use public Forgejo URL for job container access
Some checks failed
CI/CD / syntax-check (push) Failing after 48s
CI/CD / deploy (push) Has been skipped
Job containers run on runner-jobs network (internet only), so they
can't reach forgejo:3000 (backend-only). Use public https://git.csrx.ru
so both runner and job containers can reach Forgejo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 22:53:25 +07:00
d2d5f12d5a Add Forgejo Actions CI/CD with act_runner
Some checks failed
CI/CD / syntax-check (push) Failing after 12s
CI/CD / deploy (push) Has been skipped
- Add gitea/act_runner:0.3.0 to docker-compose stack on runner-jobs network
- Add act_runner config template and directory provisioning
- Add FORGEJO_RUNNER_TOKEN to env template
- Add CI deploy SSH public key to authorized_keys via base role
- Create .forgejo/workflows/deploy.yml: syntax-check on PR, deploy on push to master
- Add .claude/launch.json with ansible-playbook configurations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 21:28:15 +07:00
652737239d Add Forgejo SSH port 2222 and open in UFW
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:43:22 +07:00
a1b97f3e4b Initial commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:39:26 +07:00