Commit graph

82 commits

Author SHA1 Message Date
a28fffa7ae fix: mailserver account check via host file, not docker exec
Some checks failed
CI/CD / syntax-check (push) Successful in 1m30s
CI/CD / deploy (push) Has been cancelled
setup email list fails with rc=1 when postfix-accounts.cf doesn't
exist yet (fresh install). Check the mounted config file on the host
instead, which correctly handles the empty/missing case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:55:52 +07:00
b616c18c58 feat: add docker-mailserver for self-hosted outbound SMTP
Some checks failed
CI/CD / syntax-check (push) Successful in 1m6s
CI/CD / deploy (push) Failing after 18m22s
Adds docker-mailserver (SMTP_ONLY mode) to the tools stack so Outline
can send magic-link emails without depending on an external SMTP provider.

Changes:
- docker-compose.yml.j2: add mailserver service + mail-internal network
  outline gets mail-internal network to reach mailserver
- env.j2: point Outline SMTP at local mailserver:587 with noreply account
- defaults/main.yml: add mailserver_image (v14)
- tasks/main.yml: create mailserver dirs, wait for postfix ready,
  idempotent account creation, DKIM key generation + DNS instructions
- inventory/group_vars/all/main.yml: add mailserver_noreply_password alias
- vault.yml: add vault_mailserver_noreply_password

After deploy, Ansible will print DKIM/SPF/DMARC DNS records to add
to Cloudflare.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:28:29 +07:00
bf59b75c8f fix: redesign backup archive structure + enable Outline email auth
Some checks failed
CI/CD / syntax-check (push) Successful in 1m13s
CI/CD / deploy (push) Has been cancelled
Backup (backup.sh.j2):
- Creates a single data_YYYY-MM-DD_HH-MM.tar.gz archive
- Unified data/ layout: databases/ (pg_dump .sql.gz) + volumes/ (docker volumes)
- Includes RESTORE.md with step-by-step instructions inside the archive
- S3 uploads to main/ prefix instead of flat root

Outline (tools role):
- Add SMTP_HOST/PORT/FROM vars to env.j2 template (required for email magic-link auth to activate)
- Add outline_smtp_* defaults to roles/tools/defaults/main.yml
- Without SMTP_HOST, the email auth plugin is disabled and clicking login does nothing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:20:11 +07:00
2b5524f258 fix: remove promtail nested /var/log/traefik volume mount
All checks were successful
CI/CD / syntax-check (push) Successful in 1m9s
CI/CD / deploy (push) Successful in 15m33s
Docker cannot mount to /var/log/traefik when /var/log is already
bind-mounted (read-only). The nested mount fails with 'read-only
file system' error in the overlay upper layer.

The mount was unused anyway — promtail config only reads syslog,
auth.log, and Docker container logs via the socket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:55:39 +07:00
6279bcb9b4 fix: remove cs-firewall-bouncer from image pre-pull list
Some checks failed
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Failing after 9m44s
crowdsecurity/cs-firewall-bouncer:v0.0.31 does not exist on Docker Hub.
The bouncer service was already removed from docker-compose.yml.
Remove from pre-pull list and defaults to unblock CI/CD deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:39:31 +07:00
a7b14759af fix: add front network to tools stack for Docker port binding
Some checks failed
CI/CD / syntax-check (push) Successful in 1m11s
CI/CD / deploy (push) Failing after 1m45s
Docker 29.x does not create DNAT rules for containers only on internal
networks. Add a non-internal 'front' network that outline and n8n join
alongside their internal networks, enabling host port binding to work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:10:52 +07:00
28f8c76433 fix: plane and authelia health check URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 1m21s
CI/CD / deploy (push) Failing after 12m3s
- plane-web/admin: localhost:80 → 127.0.0.1:3000 (nginx listens on 3000)
- plane-space: localhost:3000 → 127.0.0.1:3000/spaces/ (node server needs basename)
- plane-api: localhost:8000/api/ → 127.0.0.1:8000/ (/ returns status OK, /api/ returns 404)
- uptime-kuma: localhost:3001 → curl -sf (wget not available in image)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 14:50:08 +07:00
9ca1177461 fix: crowdsec proxy network, uptime-kuma curl healthcheck, outline en_US, n8n 127.0.0.1
Some checks failed
CI/CD / syntax-check (push) Successful in 1m4s
CI/CD / deploy (push) Failing after 10m46s
- crowdsec: add proxy network for internet access (hub downloads)
- crowdsec-bouncer: remove (image crowdsecurity/cs-firewall-bouncer doesn't exist on Docker Hub)
- uptime-kuma: switch healthcheck from wget to curl (wget not in image)
- outline: fix DEFAULT_LANGUAGE ru_RU → en_US (unsupported locale)
- n8n: fix healthcheck localhost → 127.0.0.1 (IPv6 issue in Alpine)
- alertmanager: config permissions 0644 (was 0640, container couldn't read)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 08:14:07 +07:00
92d2c845d8 feat: add n8n, outline routes, remove syncthing, fix backup awscli
Some checks failed
CI/CD / syntax-check (push) Successful in 1m14s
CI/CD / deploy (push) Failing after 10m51s
- Add n8n to tools server (n8n.csrx.ru)
- Add cross-server Traefik routes: wiki.csrx.ru + n8n.csrx.ru → tools
- Remove Syncthing (replaced by Outline wiki)
- Fix awscli install: download static binary (apt/pip broken on Ubuntu 24.04)
- Add n8n secrets to vault (encryption key + JWT secret)
- Improve CI/CD workflow: syntax-check both playbooks, deploy both servers
- Update site.yml: unified single-command deploy for all servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:19:39 +07:00
05bcbab858 feat: add tools role (Outline wiki) + 3-server architecture
Some checks failed
CI/CD / syntax-check (push) Successful in 59s
CI/CD / deploy (push) Failing after 11m20s
Services:
- Outline wiki at wiki.csrx.ru → visual-tools:3000
- Outline uses Timeweb S3 (visual-outline bucket) for files

Structure:
- roles/tools/ — docker-compose + env templates for tools server
- playbooks/tools.yml — deploys base+docker+tools to visual-tools

Config changes:
- domain_dashboard: dashboard → dash.csrx.ru
- domain_wiki: wiki.csrx.ru (new)
- domain_mon: mon.csrx.ru (new, for Grafana)
- ip_main/tools/mon vars for cross-server Traefik routing
- outline_* secrets added to vault + main.yml aliases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 05:36:04 +07:00
321e1c4daa feat: extend fail2ban with Forgejo SSH and Traefik HTTP jails
Some checks failed
CI/CD / syntax-check (push) Successful in 42s
CI/CD / deploy (push) Failing after 46s
- Add traefik-auth filter: ban IPs with 10+ HTTP 401/403 in 5 min
- Add forgejo-ssh jail: ban after 3 failed SSH attempts (24h ban)
- Both jails are active; forgejo-ssh already detected 8 real attempts
- Traefik access.log now written to /opt/services/traefik/logs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:51:43 +07:00
c2f9a0c21c feat: wildcard TLS via Cloudflare DNS-01 + real-IP forwarding
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 46s
- Switch Traefik ACME to dnsChallenge (provider: cloudflare)
- Add *.csrx.ru wildcard cert via tls.stores.default.defaultGeneratedCert
- Pass CLOUDFLARE_DNS_API_TOKEN to Traefik via env_file: .env
- Add Cloudflare IP ranges to forwardedHeaders.trustedIPs (real visitor IPs)
- Fix UFW: allow 172.16.0.0/12 on 80/443 so act_runner can reach Forgejo
- Add A records: auth.csrx.ru, status.csrx.ru, csrx.ru root → 87.249.49.32

Result: one *.csrx.ru cert covers all subdomains, auto-renewed by Traefik.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:47:46 +07:00
f183fe485f revert: switch back to HTTP-01 until Cloudflare NS propagation
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 39s
DNS-01 + wildcard cert requires Cloudflare to be authoritative NS.
Until propagation completes, use httpChallenge on port 80.

Plan after Cloudflare NS is active:
1. Switch back to dnsChallenge in traefik.yml.j2
2. Re-enable tls.stores.default.defaultGeneratedCert in routes.yml.j2
3. Clear acme.json → Traefik issues *.csrx.ru wildcard cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:18:21 +07:00
0496e9ab61 feat: wildcard TLS certificate *.csrx.ru via Cloudflare DNS-01
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 48s
Add tls.stores.default.defaultGeneratedCert in dynamic config:
- Traefik requests one *.csrx.ru + csrx.ru SAN cert via DNS-01
- All existing and future subdomains use this single cert
- No per-service cert issuance wait when adding new services
- Cert auto-renewed by Traefik ~30 days before expiry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:13:42 +07:00
5befd48a50 fix: allow Docker bridge networks through UFW for runner + add unattended-upgrades
Some checks are pending
CI/CD / deploy (push) Blocked by required conditions
CI/CD / syntax-check (push) Successful in 41s
firewall.yml:
- Allow 172.16.0.0/12 and 10.0.0.0/8 on ports 80/443 so act_runner
  job containers can reach git.csrx.ru (Forgejo via Traefik)
- Without this, Cloudflare-only rules broke CI/CD pipeline

unattended_upgrades.yml (new):
- Install unattended-upgrades + apt-listchanges
- Configure auto-apply of security patches only (not all updates)
- Auto-clean every 7 days, remove unused deps
- No auto-reboot (manual control over kernel reboots)

base/tasks/main.yml:
- Add unattended_upgrades.yml to task sequence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:11:39 +07:00
fccbd1a45a feat: Cloudflare DNS-01 ACME + Docker hardening + sysctl
Some checks failed
CI/CD / syntax-check (push) Successful in 42s
CI/CD / deploy (push) Failing after 52s
Cloudflare DNS-01 ACME:
- Switch Traefik cert resolver from httpChallenge to dnsChallenge
  using Cloudflare provider (resolvers: 1.1.1.1, 1.0.0.1)
- Add CLOUDFLARE_DNS_API_TOKEN env to Traefik container
- Add CF_ZONE_ID + cloudflare_dns_api_token to all/main.yml
- Store API token in Ansible Vault

Docker daemon hardening:
- Add log-driver: json-file with max-size 10m / max-file 3
  (prevents disk fill from unbounded container logs)
- Add live-restore: true (containers survive Docker daemon restart)

Kernel hardening (sysctl):
- New roles/base/tasks/sysctl.yml via ansible.posix.sysctl
- IP spoofing protection (rp_filter)
- Disable ICMP redirects and broadcast pings
- SYN flood protection (syncookies, backlog)
- Disable IPv6 (not used)
- Restrict kernel pointers and dmesg to root
- Disable SysRq, suid core dumps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:06:46 +07:00
e935c897c6 feat: Cloudflare integration — real IP forwarding + firewall lockdown
Some checks failed
CI/CD / syntax-check (push) Successful in 58s
CI/CD / deploy (push) Failing after 43s
Traefik traefik.yml.j2:
- Add forwardedHeaders.trustedIPs with all Cloudflare CIDR ranges
  on both web and websecure entrypoints so rate limiting and
  CrowdSec see real visitor IPs, not Cloudflare proxy IPs

firewall.yml:
- Replace open HTTP/HTTPS rules with per-CIDR allow rules
  scoped to Cloudflare IP ranges only
- Direct access to ports 80/443 bypassing Cloudflare is now blocked

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:02:06 +07:00
1f03022086 fix: correct invalid PromQL in ContainerHighMemory alert rule
Some checks failed
CI/CD / syntax-check (push) Successful in 53s
CI/CD / deploy (push) Failing after 57s
Cannot use comparison operators inside label matchers {}.
Move the > 0 filter outside braces as a scalar filter on the
denominator — idiomatic Prometheus way to exclude unlimited containers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:59:56 +07:00
fc6b1c0cec feat: Timeweb S3 offsite backup uploads
Some checks failed
CI/CD / syntax-check (push) Successful in 39s
CI/CD / deploy (push) Has been cancelled
- Add vault_s3_access_key / vault_s3_secret_key to Ansible Vault
- Expose via s3_access_key / s3_secret_key in all/main.yml
- Add s3_endpoint + s3_bucket to backup role defaults
- Install awscli via apt in backup role tasks
- Extend backup.sh.j2: upload *.gz to S3 after local backup,
  prune S3 objects older than backup_retention_days

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:58:58 +07:00
a344998405 feat: add uptime-kuma pull, logrotate deploy task, logrotate package
Some checks failed
CI/CD / syntax-check (push) Successful in 41s
CI/CD / deploy (push) Failing after 39s
- Add uptime_kuma_image to image pull loop in services/tasks/main.yml
- Add logrotate deploy task to services/tasks/configs.yml
- Add logrotate package to base_packages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:54:24 +07:00
aa9706bbc4 feat: comprehensive security hardening
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 59s
Traefik:
- Enable access logs → /var/log/traefik/access.log (needed for CrowdSec)
- Add global security headers middleware: HSTS, X-Frame-Options, CSP,
  nosniff, XSS filter, referrer policy, permissions policy
- Add rate limiting: default 100/s, API 30/s, admin 10/s (strict)
- Add Authelia ForwardAuth middleware for SSO integration

CrowdSec (new service):
- Analyzes Traefik access logs + auth.log in real time
- Community IP reputation blocklist (crowdsecurity/traefik + http-cve)
- Firewall bouncer: bans malicious IPs at kernel level (iptables)

Authelia (new service, auth.csrx.ru):
- 2FA/SSO portal with TOTP (Google Authenticator)
- Protects: traefik.csrx.ru, sync.csrx.ru, /god-mode/ in Plane
- Session: 12h expiry, 30m inactivity, Redis backend
- argon2id password hashing

Container security:
- Add security_opt: no-new-privileges to traefik, vaultwarden,
  forgejo, grafana, authelia

CI/CD security:
- Remove hardcoded server IP 87.249.49.32 from workflow
- Use SSH_KNOWN_HOSTS secret instead of ssh-keyscan (prevents MITM)
- Added SSH_KNOWN_HOSTS secret to Forgejo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:44:54 +07:00
6ebd237894 feat: major infrastructure improvements
Some checks failed
CI/CD / deploy (push) Has been cancelled
CI/CD / syntax-check (push) Successful in 1m7s
Reliability:
- Add swap role (2GB, swappiness=10, idempotent via /etc/fstab)
- Add mem_limit to plane-worker (512m) and plane-beat (256m)
- Add health checks to all services (traefik, vaultwarden, forgejo,
  plane-*, syncthing, prometheus, grafana, loki)

Code quality:
- Remove Traefik Docker labels (file provider used, labels were dead code)
- Add comment explaining file provider architecture

Observability:
- Add AlertManager with Telegram notifications
- Add Prometheus alert rules: CPU, RAM, disk, swap, container health
- Add Loki + Promtail for centralized log aggregation
- Add Loki datasource to Grafana
- Enable Traefik /ping endpoint for health checks

Backups:
- Add backup role: pg_dump for forgejo + plane DBs, tar for
  vaultwarden and forgejo data
- 7-day retention, daily cron at 03:00
- Backup script at /usr/local/bin/backup-services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:28:16 +07:00
972a76db4c feat: add monitoring stack (Prometheus + Grafana + cAdvisor + Node Exporter)
All checks were successful
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Successful in 6m51s
- Adds monitoring Docker network (internal)
- Prometheus scrapes node-exporter (host metrics) and cAdvisor (containers)
  with 30-day retention
- Grafana exposed at dashboard.csrx.ru with pre-provisioned datasource
  and two dashboards: Node Exporter Full (1860) and cAdvisor (14282)
- Vault secret: vault_grafana_admin_password

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:05:34 +07:00
efbbc3cac5 fix: add plane-admin, plane-space and configure instance URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 2m31s
CI/CD / deploy (push) Has been cancelled
New Plane stable requires 3 frontend services:
- plane-admin (nginx:80) for /god-mode/ routes
- plane-space (node:3000) for /spaces/ routes
- plane-web (nginx:80) for all other routes

Also add APP/ADMIN/SPACE_BASE_URL env vars to plane-api so the
setup wizard knows where to redirect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 02:14:34 +07:00
66c03ffc04 fix: update plane backend for new stable image requirements
All checks were successful
CI/CD / syntax-check (push) Successful in 2m49s
CI/CD / deploy (push) Successful in 8m54s
makeplane/plane-backend:stable now requires:
- AMQP_URL: Celery broker URL (defaults to amqp://localhost, broken)
  → set to redis://plane-redis:6379/ to reuse existing Redis
- GUNICORN_WORKERS: must be set explicitly (empty string causes crash)
  → set to 2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 01:32:11 +07:00
679d3ed010 fix: update plane-web for nginx-based stable image
All checks were successful
CI/CD / syntax-check (push) Successful in 2m41s
CI/CD / deploy (push) Successful in 11m24s
makeplane/plane-frontend:stable now uses nginx (not Next.js/node).
Remove `command: node web/server.js` override and update Traefik
port from 3000 to 80.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:44:00 +07:00
6a2c38b4bf Fix act_runner: use public Forgejo URL for job container access
Some checks failed
CI/CD / syntax-check (push) Failing after 48s
CI/CD / deploy (push) Has been skipped
Job containers run on runner-jobs network (internet only), so they
can't reach forgejo:3000 (backend-only). Use public https://git.csrx.ru
so both runner and job containers can reach Forgejo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 22:53:25 +07:00
6580e42f53 Fix CI workflow: remove container directive, use runner image directly
Some checks failed
CI/CD / syntax-check (push) Failing after 2m13s
CI/CD / deploy (push) Has been skipped
- Remove container: python:3.12-slim (lacked Node.js for actions/checkout)
- Use runner's ubuntu-latest image which has Node.js + Python pre-installed
- Fix deploy job if condition (remove ${{ }} wrapper)
- Enable debug logging in act_runner config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 22:34:56 +07:00
1d2ba7f7ea Fix act_runner network name to use Docker Compose prefix
Some checks failed
CI/CD / syntax-check (push) Failing after 2s
CI/CD / deploy (push) Has been skipped
Docker Compose prefixes project name to network names:
runner-jobs → services_runner-jobs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 21:54:14 +07:00
d2d5f12d5a Add Forgejo Actions CI/CD with act_runner
Some checks failed
CI/CD / syntax-check (push) Failing after 12s
CI/CD / deploy (push) Has been skipped
- Add gitea/act_runner:0.3.0 to docker-compose stack on runner-jobs network
- Add act_runner config template and directory provisioning
- Add FORGEJO_RUNNER_TOKEN to env template
- Add CI deploy SSH public key to authorized_keys via base role
- Create .forgejo/workflows/deploy.yml: syntax-check on PR, deploy on push to master
- Add .claude/launch.json with ansible-playbook configurations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 21:28:15 +07:00
652737239d Add Forgejo SSH port 2222 and open in UFW
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:43:22 +07:00
a1b97f3e4b Initial commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 19:39:26 +07:00