Commit graph

20 commits

Author SHA1 Message Date
44ccdf4882 refactor: remove tools server, Vaultwarden, monitoring stack; rename plane→hub
Some checks failed
CI/CD / syntax-check (push) Successful in 1m30s
CI/CD / deploy (push) Failing after 6m8s
- Remove tools server entirely (roles/tools, playbooks/tools.yml, CI deploy step)
- Remove Vaultwarden (already absent from compose, clean up vars)
- Remove node-exporter, cadvisor, promtail from main stack
- Remove grafana/uptime-kuma Traefik routes (pointed to tools)
- Remove monitoring network from docker-compose
- Remove tools vault vars (grafana_admin_password, alertmanager telegram)
- Rename domain_plane: plane.walava.io → hub.walava.io
- Update CI workflow to only deploy main server
- Update STATUS.md and BACKLOG.md to reflect current state
2026-03-27 19:05:19 +07:00
472c2b944b feat: replace Outline with Docmost
Some checks failed
CI/CD / syntax-check (push) Successful in 1m0s
CI/CD / deploy (push) Failing after 5m1s
- Replace outline/outline-db/outline-redis with docmost/docmost-db/docmost-redis
- Update Traefik route: wiki → http://docmost:3000
- Update S3 bucket: walava-outline → walava-docmost (new bucket created: 481385)
- Remove env.outline.j2 deploy task (Docmost config is inline in compose)
- Update backup script: outline.sql.gz → docmost.sql.gz
- Update CORS task for walava-docmost bucket
- Add vault_docmost_app_secret + vault_docmost_db_password secrets
- Remove outline_mcp_image (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:31:51 +07:00
fde51352d7 feat: migrate monitoring to tools server, fix Outline S3 uploads
Monitoring stack (Prometheus, AlertManager, Grafana, Loki, Uptime Kuma)
moved from main to tools server. Prometheus now scrapes main exporters
over network (ip_main:9100/8080). Promtail pushes logs to ip_tools:3100.
Traefik routes for dash/status.walava.io updated to ip_tools. discord-bot
PROMETHEUS_URL updated to http://ip_tools:9090.

Outline S3 fix: remove AWS_S3_ACL=private (Timeweb doesn't support
per-object ACLs — caused upload failures). Add CORS configuration task
for browser-side presigned uploads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 04:10:28 +07:00
489791403c feat: migrate Outline + n8n to main server, rename S3 buckets to walava-*
- Add Outline, outline-db, outline-redis, n8n, outline-mcp containers to main docker-compose
- Add env.outline.j2 template with Resend SMTP and S3 (walava-outline bucket)
- Update Traefik routes: wiki → outline:3000, auto → n8n:5678 (local, not cross-server)
- Rename S3 buckets: visual-backup → walava-backup, visual-outline → walava-outline
- Extend backup.sh.j2: add Outline DB, n8n, Plane MinIO to backup scope
- Add outline_image, n8n_image, outline_mcp_image to services/defaults
- Remove Authelia config deployment tasks from configs.yml
- Add outline-internal and n8n-internal networks to docker-compose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:04:54 +07:00
d635522199 feat: remove Authelia, protect dashboard with basic auth
Some checks are pending
CI/CD / syntax-check (push) Waiting to run
CI/CD / deploy (push) Blocked by required conditions
Authelia was unused overhead — only traefik-dashboard and plane /god-mode/
were behind it. Dashboard now uses traefik-auth (basic auth). /god-mode/
uses rate-limit-strict only.

Removes: authelia + authelia-redis containers, authelia-internal network,
authelia_data volume, authelia router/service/forwardAuth middleware.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:50:41 +07:00
fb769b2f8c feat: migrate domain from csrx.ru to walava.io
Some checks failed
CI/CD / syntax-check (push) Successful in 1m44s
CI/CD / deploy (push) Failing after 20m21s
- domain_base changed to walava.io
- domain_n8n now auto.walava.io
- Added domain_landing for walava.io root
- Added walava-web landing page container + Traefik route
- Updated Cloudflare token/zone_id for walava.io account
- Updated ACME email to walava@tutamail.com
- Fixed discord-bot image to use domain_base variable
- DNS records already created in Cloudflare

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:17:00 +07:00
75bed6bb04 feat: remove mail stack and Vaultwarden
Some checks failed
CI/CD / syntax-check (push) Successful in 1m15s
CI/CD / deploy (push) Has been cancelled
Removed services:
- docker-mailserver (Postfix + Dovecot)
- SnappyMail webmail
- Vaultwarden password manager

Removed infrastructure:
- certbot + Cloudflare DNS-01 TLS for mx.csrx.ru
- UFW rules for ports 25/587/993/465
- mail-internal and webmail-internal Docker networks
- SMTP config from Outline env
- vault, mail Traefik routes
- All related vault secrets and variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:06:29 +07:00
1e638055c8 feat(mail): rename mail→mx, webmail→mail.csrx.ru + reliability
Some checks failed
CI/CD / syntax-check (push) Successful in 1m23s
CI/CD / deploy (push) Has been cancelled
Rename:
- docker-mailserver: hostname mail → mx, OVERRIDE_HOSTNAME → mx.csrx.ru
- Traefik route: webmail/domain_webmail → mail/domain_mail
- domain_webmail removed, domain_mail + domain_mx added to main.yml
- certbot cert: mail.csrx.ru → mx.csrx.ru

Email reliability improvements:
- certbot renewal cron (03:15 + 15:15 daily)
- deploy-hook: auto-reload Postfix+Dovecot after cert renewal
- POSTFIX_MESSAGE_SIZE_LIMIT=26214400 (25 MB)
- SPF hardened: ~all → -all
- DMARC hardened: p=none → p=quarantine, added ruf + fo=1 + adkim/aspf strict
- autodiscover/autoconfig CNAME records for mail client setup
- dns-zone.zone fully updated with architecture comments

Docs:
- STATUS.md: full mail architecture section, client settings, DNS table
- BACKLOG.md: rDNS task + DNS migration steps
- DECISIONS.md: mx/mail split rationale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:07:59 +07:00
644b5b74c1 feat: add SnappyMail webmail and docker-mailserver with full send/receive
Some checks failed
CI/CD / syntax-check (push) Successful in 1m35s
CI/CD / deploy (push) Failing after 17m28s
- Add docker-mailserver (Postfix+Dovecot) with SSL via certbot+Cloudflare DNS-01
- Add SnappyMail webmail client at webmail.csrx.ru (port 8888)
- Open UFW ports 25/465/587/993 on tools server
- Create mail accounts: noreply@, admin@, jack@csrx.ru
- Generate DKIM key and print DNS instructions on first run
- Add Traefik route on main server proxying webmail → tools:8888
- Add all secrets to vault (mailserver passwords, snappymail admin)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 17:21:25 +07:00
92d2c845d8 feat: add n8n, outline routes, remove syncthing, fix backup awscli
Some checks failed
CI/CD / syntax-check (push) Successful in 1m14s
CI/CD / deploy (push) Failing after 10m51s
- Add n8n to tools server (n8n.csrx.ru)
- Add cross-server Traefik routes: wiki.csrx.ru + n8n.csrx.ru → tools
- Remove Syncthing (replaced by Outline wiki)
- Fix awscli install: download static binary (apt/pip broken on Ubuntu 24.04)
- Add n8n secrets to vault (encryption key + JWT secret)
- Improve CI/CD workflow: syntax-check both playbooks, deploy both servers
- Update site.yml: unified single-command deploy for all servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:19:39 +07:00
c2f9a0c21c feat: wildcard TLS via Cloudflare DNS-01 + real-IP forwarding
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 46s
- Switch Traefik ACME to dnsChallenge (provider: cloudflare)
- Add *.csrx.ru wildcard cert via tls.stores.default.defaultGeneratedCert
- Pass CLOUDFLARE_DNS_API_TOKEN to Traefik via env_file: .env
- Add Cloudflare IP ranges to forwardedHeaders.trustedIPs (real visitor IPs)
- Fix UFW: allow 172.16.0.0/12 on 80/443 so act_runner can reach Forgejo
- Add A records: auth.csrx.ru, status.csrx.ru, csrx.ru root → 87.249.49.32

Result: one *.csrx.ru cert covers all subdomains, auto-renewed by Traefik.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:47:46 +07:00
f183fe485f revert: switch back to HTTP-01 until Cloudflare NS propagation
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 39s
DNS-01 + wildcard cert requires Cloudflare to be authoritative NS.
Until propagation completes, use httpChallenge on port 80.

Plan after Cloudflare NS is active:
1. Switch back to dnsChallenge in traefik.yml.j2
2. Re-enable tls.stores.default.defaultGeneratedCert in routes.yml.j2
3. Clear acme.json → Traefik issues *.csrx.ru wildcard cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:18:21 +07:00
0496e9ab61 feat: wildcard TLS certificate *.csrx.ru via Cloudflare DNS-01
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 48s
Add tls.stores.default.defaultGeneratedCert in dynamic config:
- Traefik requests one *.csrx.ru + csrx.ru SAN cert via DNS-01
- All existing and future subdomains use this single cert
- No per-service cert issuance wait when adding new services
- Cert auto-renewed by Traefik ~30 days before expiry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:13:42 +07:00
fccbd1a45a feat: Cloudflare DNS-01 ACME + Docker hardening + sysctl
Some checks failed
CI/CD / syntax-check (push) Successful in 42s
CI/CD / deploy (push) Failing after 52s
Cloudflare DNS-01 ACME:
- Switch Traefik cert resolver from httpChallenge to dnsChallenge
  using Cloudflare provider (resolvers: 1.1.1.1, 1.0.0.1)
- Add CLOUDFLARE_DNS_API_TOKEN env to Traefik container
- Add CF_ZONE_ID + cloudflare_dns_api_token to all/main.yml
- Store API token in Ansible Vault

Docker daemon hardening:
- Add log-driver: json-file with max-size 10m / max-file 3
  (prevents disk fill from unbounded container logs)
- Add live-restore: true (containers survive Docker daemon restart)

Kernel hardening (sysctl):
- New roles/base/tasks/sysctl.yml via ansible.posix.sysctl
- IP spoofing protection (rp_filter)
- Disable ICMP redirects and broadcast pings
- SYN flood protection (syncookies, backlog)
- Disable IPv6 (not used)
- Restrict kernel pointers and dmesg to root
- Disable SysRq, suid core dumps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:06:46 +07:00
e935c897c6 feat: Cloudflare integration — real IP forwarding + firewall lockdown
Some checks failed
CI/CD / syntax-check (push) Successful in 58s
CI/CD / deploy (push) Failing after 43s
Traefik traefik.yml.j2:
- Add forwardedHeaders.trustedIPs with all Cloudflare CIDR ranges
  on both web and websecure entrypoints so rate limiting and
  CrowdSec see real visitor IPs, not Cloudflare proxy IPs

firewall.yml:
- Replace open HTTP/HTTPS rules with per-CIDR allow rules
  scoped to Cloudflare IP ranges only
- Direct access to ports 80/443 bypassing Cloudflare is now blocked

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:02:06 +07:00
aa9706bbc4 feat: comprehensive security hardening
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 59s
Traefik:
- Enable access logs → /var/log/traefik/access.log (needed for CrowdSec)
- Add global security headers middleware: HSTS, X-Frame-Options, CSP,
  nosniff, XSS filter, referrer policy, permissions policy
- Add rate limiting: default 100/s, API 30/s, admin 10/s (strict)
- Add Authelia ForwardAuth middleware for SSO integration

CrowdSec (new service):
- Analyzes Traefik access logs + auth.log in real time
- Community IP reputation blocklist (crowdsecurity/traefik + http-cve)
- Firewall bouncer: bans malicious IPs at kernel level (iptables)

Authelia (new service, auth.csrx.ru):
- 2FA/SSO portal with TOTP (Google Authenticator)
- Protects: traefik.csrx.ru, sync.csrx.ru, /god-mode/ in Plane
- Session: 12h expiry, 30m inactivity, Redis backend
- argon2id password hashing

Container security:
- Add security_opt: no-new-privileges to traefik, vaultwarden,
  forgejo, grafana, authelia

CI/CD security:
- Remove hardcoded server IP 87.249.49.32 from workflow
- Use SSH_KNOWN_HOSTS secret instead of ssh-keyscan (prevents MITM)
- Added SSH_KNOWN_HOSTS secret to Forgejo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:44:54 +07:00
6ebd237894 feat: major infrastructure improvements
Some checks failed
CI/CD / deploy (push) Has been cancelled
CI/CD / syntax-check (push) Successful in 1m7s
Reliability:
- Add swap role (2GB, swappiness=10, idempotent via /etc/fstab)
- Add mem_limit to plane-worker (512m) and plane-beat (256m)
- Add health checks to all services (traefik, vaultwarden, forgejo,
  plane-*, syncthing, prometheus, grafana, loki)

Code quality:
- Remove Traefik Docker labels (file provider used, labels were dead code)
- Add comment explaining file provider architecture

Observability:
- Add AlertManager with Telegram notifications
- Add Prometheus alert rules: CPU, RAM, disk, swap, container health
- Add Loki + Promtail for centralized log aggregation
- Add Loki datasource to Grafana
- Enable Traefik /ping endpoint for health checks

Backups:
- Add backup role: pg_dump for forgejo + plane DBs, tar for
  vaultwarden and forgejo data
- 7-day retention, daily cron at 03:00
- Backup script at /usr/local/bin/backup-services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:28:16 +07:00
972a76db4c feat: add monitoring stack (Prometheus + Grafana + cAdvisor + Node Exporter)
All checks were successful
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Successful in 6m51s
- Adds monitoring Docker network (internal)
- Prometheus scrapes node-exporter (host metrics) and cAdvisor (containers)
  with 30-day retention
- Grafana exposed at dashboard.csrx.ru with pre-provisioned datasource
  and two dashboards: Node Exporter Full (1860) and cAdvisor (14282)
- Vault secret: vault_grafana_admin_password

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:05:34 +07:00
d2d5f12d5a Add Forgejo Actions CI/CD with act_runner
Some checks failed
CI/CD / syntax-check (push) Failing after 12s
CI/CD / deploy (push) Has been skipped
- Add gitea/act_runner:0.3.0 to docker-compose stack on runner-jobs network
- Add act_runner config template and directory provisioning
- Add FORGEJO_RUNNER_TOKEN to env template
- Add CI deploy SSH public key to authorized_keys via base role
- Create .forgejo/workflows/deploy.yml: syntax-check on PR, deploy on push to master
- Add .claude/launch.json with ansible-playbook configurations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 21:28:15 +07:00
a1b97f3e4b Initial commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:39:26 +07:00