Commit graph

55 commits

Author SHA1 Message Date
9f85988c1f fix: increase Docmost health check retries to 30 (5 min total)
Some checks failed
CI/CD / deploy (push) Failing after 13m46s
CI/CD / syntax-check (push) Successful in 59s
The first deploy needs time for DB migrations and initial setup.
30 retries × 10 s = 300 s gives enough buffer for a cold start.
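
The retry budget above can be sketched as a plain polling loop. This is a minimal illustration, not the playbook's actual task: `wait_until_healthy`, its parameters, and the injectable `sleep` are hypothetical names, and it assumes one pause after every failed attempt.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    retries: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `retries` times, pausing `delay_s` after each failure."""
    for _ in range(retries):
        if check():
            return True
        sleep(delay_s)  # worst case: retries * delay_s seconds of waiting
    return False
```

With the defaults, a service that never becomes healthy is given 30 × 10 s = 300 s before the loop gives up.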

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:48:25 +07:00
29ba8a64ba fix: remove outline-mcp commented block with undefined Jinja2 vars
Some checks failed
CI/CD / syntax-check (push) Failing after 54s
CI/CD / deploy (push) Has been skipped
Ansible's templating renders Jinja2 expressions even inside YAML comments,
which caused an 'outline_mcp_image is undefined' error. Removed the entire
block since outline-mcp is no longer relevant (Outline was replaced with Docmost).
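
Jinja2 treats a template as plain text and has no notion of YAML comments, so a `{{ ... }}` inside a `#` line is still rendered. A small lint helper along these lines could flag such leftovers before a deploy. This is a sketch under stated assumptions: `undefined_vars_in_comments` is a hypothetical helper, it only looks at whole-line `#` comments, and it only reads the leading variable name of each expression.

```python
import re

# Matches the first identifier of a Jinja2 expression such as "{{ outline_mcp_image }}".
JINJA_EXPR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def undefined_vars_in_comments(template_text: str, defined: set) -> list:
    """Return (line_number, variable) pairs for unknown variables in '#' comment lines."""
    hits = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        if not line.lstrip().startswith("#"):
            continue  # only commented-out lines are of interest here
        for name in JINJA_EXPR.findall(line):
            if name not in defined:
                hits.append((lineno, name))
    return hits


sample = "\n".join([
    "services:",
    "  docmost:",
    "    image: {{ docmost_image }}",
    "  # outline-mcp:",
    "  #   image: {{ outline_mcp_image }}",
])
print(undefined_vars_in_comments(sample, {"docmost_image"}))
```

Wrapping a disabled block in Jinja2's own comment syntax (`{# ... #}`) is another way to keep the renderer from touching it.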

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:39:56 +07:00
472c2b944b feat: replace Outline with Docmost
Some checks failed
CI/CD / syntax-check (push) Successful in 1m0s
CI/CD / deploy (push) Failing after 5m1s
- Replace outline/outline-db/outline-redis with docmost/docmost-db/docmost-redis
- Update Traefik route: wiki → http://docmost:3000
- Update S3 bucket: walava-outline → walava-docmost (new bucket created: 481385)
- Remove env.outline.j2 deploy task (Docmost config is inline in compose)
- Update backup script: outline.sql.gz → docmost.sql.gz
- Update CORS task for walava-docmost bucket
- Add vault_docmost_app_secret + vault_docmost_db_password secrets
- Remove outline_mcp_image (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:31:51 +07:00
f0c3fbbe1b fix: auto-bootstrap Outline team on fresh install
All checks were successful
CI/CD / deploy (push) Successful in 15m8s
CI/CD / syntax-check (push) Successful in 1m4s
On a fresh DB Outline shows a blank login page because there is no team
and emailSigninEnabled = false. Add idempotent Ansible tasks that:
1. Create the 'Visual' team if none exists
2. Set guestSignin=true so email magic-link login works
Triggered by: server rebuild lost Outline DB (no backup existed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:14:14 +07:00
aa8d5082d3 fix: new CI deploy key + plane-api longer startup timeout
Some checks failed
CI/CD / syntax-check (push) Successful in 1m2s
CI/CD / deploy (push) Failing after 8m13s
- Rotate ci_deploy_pubkey to new ed25519 key (old key lost after
  server rebuild; Forgejo secret SSH_PRIVATE_KEY updated to match)
- Increase plane-api start_period 60s→120s, retries 5→10 to give
  Django time to run DB migrations after backup restore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 08:38:51 +07:00
3b875f57d2 fix: disable discord-bot and walava-web until images exist in registry
Some checks failed
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Failing after 1m39s
These custom images (discord-bot, walava-web) are built by their own
repos' CI/CD and pushed to the git.walava.io registry. On a fresh server
Forgejo hasn't run yet, so the images don't exist: a bootstrap chicken-and-egg problem.
Re-enable after Forgejo is up and the images are pushed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 06:26:14 +07:00
f4688ed8be fix: disable outline-mcp until image is built and pushed to registry
outline-mcp uses git.walava.io/jack/outline-mcp:latest which doesn't
exist in Forgejo registry yet (Forgejo itself wasn't running).
Comment out the service; re-enable after building the image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 06:05:30 +07:00
8a3aaa2fca feat: Terraform infra-as-code + delete mon server + fix S3/Outline
Terraform: imported main (7004701) + tools (7076013) into state,
destroyed mon (7076015, 188.225.79.34). State: No changes.

S3: fix endpoint s3.timeweb.cloud → s3.twcstorage.ru (actual Timeweb
endpoint), remove AWS_S3_ACL=private (Timeweb doesn't support per-object
ACLs — was causing Outline upload failures).

Vault: added vault_timeweb_token.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 04:26:33 +07:00
fde51352d7 feat: migrate monitoring to tools server, fix Outline S3 uploads
Monitoring stack (Prometheus, AlertManager, Grafana, Loki, Uptime Kuma)
moved from main to tools server. Prometheus now scrapes main exporters
over network (ip_main:9100/8080). Promtail pushes logs to ip_tools:3100.
Traefik routes for dash/status.walava.io updated to ip_tools. discord-bot
PROMETHEUS_URL updated to http://ip_tools:9090.

Outline S3 fix: remove AWS_S3_ACL=private (Timeweb doesn't support
per-object ACLs — caused upload failures). Add CORS configuration task
for browser-side presigned uploads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 04:10:28 +07:00
d6015b76a3 fix: add proxy network to Outline and n8n for outbound internet access
Outline needs proxy network for SMTP (Resend) and S3 (Timeweb).
n8n needs proxy network for external API calls in workflows.
Both were only on backend (internal:true) so DNS/TCP to internet was blocked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:54:19 +07:00
36be9fb33d chore: remove SMTP relay, clean up tools role after Outline/n8n migration to main
- Remove smtp-relay (postfix) container — Outline now on main, uses Resend directly
- Remove UFW port 1025 rule (SMTP relay no longer needed)
- Remove postfix-relay from image pull list
- Clean up tools role: remove Outline/n8n/env.j2, simplify tasks/main.yml
- tools docker-compose now empty (pending monitoring migration)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:10:56 +07:00
489791403c feat: migrate Outline + n8n to main server, rename S3 buckets to walava-*
- Add Outline, outline-db, outline-redis, n8n, outline-mcp containers to main docker-compose
- Add env.outline.j2 template with Resend SMTP and S3 (walava-outline bucket)
- Update Traefik routes: wiki → outline:3000, auto → n8n:5678 (local, not cross-server)
- Rename S3 buckets: visual-backup → walava-backup, visual-outline → walava-outline
- Extend backup.sh.j2: add Outline DB, n8n, Plane MinIO to backup scope
- Add outline_image, n8n_image, outline_mcp_image to services/defaults
- Remove Authelia config deployment tasks from configs.yml
- Add outline-internal and n8n-internal networks to docker-compose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 03:04:54 +07:00
fba7eb68ea fix: add SMTP relay on main server for Outline email auth
Some checks failed
CI/CD / deploy (push) Blocked by required conditions
CI/CD / syntax-check (push) Has been cancelled
tools-server (85.193.83.9) has outbound SMTP ports 465/587 blocked by VPS
provider. Added tecnativa/postfix-relay container on main server that relays
to smtp.resend.com:587. Outline now uses ip_main:1025 as SMTP host.

- UFW rule: allow port 1025 from ip_tools only
- Remove stale authelia_image from docker pull list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:35:30 +07:00
d635522199 feat: remove Authelia, protect dashboard with basic auth
Some checks are pending
CI/CD / syntax-check (push) Waiting to run
CI/CD / deploy (push) Blocked by required conditions
Authelia was unused overhead — only traefik-dashboard and plane /god-mode/
were behind it. Dashboard now uses traefik-auth (basic auth). /god-mode/
uses rate-limit-strict only.

Removes: authelia + authelia-redis containers, authelia-internal network,
authelia_data volume, authelia router/service/forwardAuth middleware.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:50:41 +07:00
2770cb61ef fix: CF_DNS_API_TOKEN env var name for Traefik ACME + n8n domain update
Some checks are pending
CI/CD / syntax-check (push) Waiting to run
CI/CD / deploy (push) Blocked by required conditions
- Fix env var CLOUDFLARE_DNS_API_TOKEN → CF_DNS_API_TOKEN (lego requirement)
- n8n env already uses domain_n8n variable (auto.walava.io)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:44:05 +07:00
fb769b2f8c feat: migrate domain from csrx.ru to walava.io
Some checks failed
CI/CD / syntax-check (push) Successful in 1m44s
CI/CD / deploy (push) Failing after 20m21s
- domain_base changed to walava.io
- domain_n8n now auto.walava.io
- Added domain_landing for walava.io root
- Added walava-web landing page container + Traefik route
- Updated Cloudflare token/zone_id for walava.io account
- Updated ACME email to walava@tutamail.com
- Fixed discord-bot image to use domain_base variable
- DNS records already created in Cloudflare

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:17:00 +07:00
c1a71b7f50 fix: add remove_orphans to docker compose tasks
All checks were successful
CI/CD / syntax-check (push) Successful in 1m33s
CI/CD / deploy (push) Successful in 14m0s
Ensures removed services (vaultwarden, mailserver, snappymail)
are automatically stopped on next deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 07:00:17 +07:00
4090d8289b fix: add username/icon_url to Forgejo Discord webhook config
All checks were successful
CI/CD / syntax-check (push) Successful in 1m8s
CI/CD / deploy (push) Successful in 13m10s
Prevents the 'meta json: readObjectStart' error on fresh deploys.
Existing hooks already fixed via direct DB update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 06:10:56 +07:00
4b00804f3e fix: use forgejo_api_token for webhook creation, cover both repos
Some checks failed
CI/CD / syntax-check (push) Successful in 1m6s
CI/CD / deploy (push) Has been cancelled
- Add vault_forgejo_api_token (Personal Access Token with write:repository)
- Ansible task now creates Discord webhook on both jack/infra and jack/discord-bot
- Webhooks already created manually for this deploy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:53:25 +07:00
f3f665a5be fix: add DISCORD_APP_ID env var to discord-bot container
Some checks failed
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:42:55 +07:00
0315ee6a72 feat: add Discord bot service + workflow_dispatch trigger
All checks were successful
CI/CD / syntax-check (push) Successful in 1m5s
CI/CD / deploy (push) Successful in 14m7s
- Add discord-bot container to docker-compose (uses git.csrx.ru registry image)
- Inject DISCORD_BOT_TOKEN via .env, bot accesses Docker socket + Prometheus
- Add vault_discord_bot_{token,app_id,public_key}, aliases in main.yml
- Add workflow_dispatch to deploy.yml so /deploy bot command works

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:27:42 +07:00
1b063c3947 fix(uptime-kuma): add proxy network for internet access to Discord/Telegram
Some checks failed
CI/CD / syntax-check (push) Successful in 1m7s
CI/CD / deploy (push) Has been cancelled
Container was on backend (internal: true) only — couldn't resolve
discord.com for webhook notifications. Added proxy network which
has outbound internet access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:01:27 +07:00
d83ead2cbe feat(discord): integrate alerts and deploy notifications
Some checks failed
CI/CD / syntax-check (push) Successful in 1m3s
CI/CD / deploy (push) Has been cancelled
- Add discord_webhook_alerts and discord_webhook_deploys to vault + main.yml
- AlertManager: send alerts to both Telegram and Discord #alerts channel
- Forgejo: auto-create Discord webhook on repo pushes → #deploys channel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:58:12 +07:00
a620bb381c fix: remove all remaining Vaultwarden references after service removal
Some checks failed
CI/CD / syntax-check (push) Successful in 1m1s
CI/CD / deploy (push) Has been cancelled
- tasks/main.yml: remove vaultwarden_image from image pull list
- tasks/directories.yml: remove vaultwarden/data directory creation
- backup.sh.j2: remove Vaultwarden backup/restore section and stop command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:49:12 +07:00
58e9a0f08b fix: remove vaultwarden_admin_token and DOMAIN_VAULT from env.j2
Some checks failed
CI/CD / syntax-check (push) Successful in 1m3s
CI/CD / deploy (push) Failing after 6m54s
Leftover after Vaultwarden removal caused CI/CD deploy to fail with
'vaultwarden_admin_token is undefined' during .env template rendering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:38:12 +07:00
40c8d291ca fix(plane): add WEB_URL and NEXT_PUBLIC_API_BASE_URL to plane-web container
Some checks failed
CI/CD / syntax-check (push) Successful in 1m6s
CI/CD / deploy (push) Failing after 5m49s
Without these env vars Next.js SSR renders with wrong base URL causing
React hydration error #418 — server/client HTML mismatch on first render.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:13:35 +07:00
75bed6bb04 feat: remove mail stack and Vaultwarden
Some checks failed
CI/CD / syntax-check (push) Successful in 1m15s
CI/CD / deploy (push) Has been cancelled
Removed services:
- docker-mailserver (Postfix + Dovecot)
- SnappyMail webmail
- Vaultwarden password manager

Removed infrastructure:
- certbot + Cloudflare DNS-01 TLS for mx.csrx.ru
- UFW rules for ports 25/587/993/465
- mail-internal and webmail-internal Docker networks
- SMTP config from Outline env
- vault, mail Traefik routes
- All related vault secrets and variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 04:06:29 +07:00
207e1dcff0 chore: project cleanup and docs update
All checks were successful
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Successful in 16m39s
- Remove Syncthing mention from authelia comment in docker-compose
- Fix backup.sh.j2 comment: hourly → every 6 hours
- Update CLAUDE.md: add docs update rule, fix backup schedule note
- Update STATUS.md: dash.csrx.ru fixed, PTR pending, backup schedule, mail hostnames
- Update BACKLOG.md: mark DNS/PTR/backup-schedule done, add SnappyMail domain task
- Update DECISIONS.md: fix backup section (no --storage-class COLD, correct schedule)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:00:35 +07:00
1e638055c8 feat(mail): rename mail→mx, webmail→mail.csrx.ru + reliability
Some checks failed
CI/CD / syntax-check (push) Successful in 1m23s
CI/CD / deploy (push) Has been cancelled
Rename:
- docker-mailserver: hostname mail → mx, OVERRIDE_HOSTNAME → mx.csrx.ru
- Traefik route: webmail/domain_webmail → mail/domain_mail
- domain_webmail removed, domain_mail + domain_mx added to main.yml
- certbot cert: mail.csrx.ru → mx.csrx.ru

Email reliability improvements:
- certbot renewal cron (03:15 + 15:15 daily)
- deploy-hook: auto-reload Postfix+Dovecot after cert renewal
- POSTFIX_MESSAGE_SIZE_LIMIT=26214400 (25 MB)
- SPF hardened: ~all → -all
- DMARC hardened: p=none → p=quarantine, added ruf + fo=1 + adkim/aspf strict
- autodiscover/autoconfig CNAME records for mail client setup
- dns-zone.zone fully updated with architecture comments

Docs:
- STATUS.md: full mail architecture section, client settings, DNS table
- BACKLOG.md: rDNS task + DNS migration steps
- DECISIONS.md: mx/mail split rationale

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:07:59 +07:00
66b70827df chore: full project cleanup + documentation
Some checks failed
CI/CD / syntax-check (push) Successful in 1m31s
CI/CD / deploy (push) Has been cancelled
Syncthing removal (was already decided, now fully removed):
- roles/base/tasks/firewall.yml: remove 3 UFW rules (ports 22000/21027)
- inventory/group_vars/all/main.yml: remove domain_sync, domain_mon, syncthing_basic_auth_htpasswd
- roles/services/templates/env.j2: remove DOMAIN_SYNC
- roles/services/templates/authelia/configuration.yml.j2: remove Syncthing 2FA rule
- roles/services/tasks/directories.yml: remove syncthing/config and syncthing/data dirs
- roles/services/defaults/main.yml: remove syncthing_image
- roles/services/tasks/main.yml: remove syncthing image pull

Security hardening:
- inventory/group_vars/all/main.yml: move cloudflare_zone_id to vault
- inventory/group_vars/all/vault.yml: add vault_cloudflare_zone_id

.gitignore improvements:
- add *.env, acme.json, *.log, editor dirs, venv, temp files

Documentation (new):
- docs/STATUS.md: all services, servers, known issues
- docs/BACKLOG.md: prioritized task list, done/todo
- docs/DECISIONS.md: architecture decisions and rationale
- CLAUDE.md: rewritten with read-first docs, rules, full arch reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 19:58:12 +07:00
644b5b74c1 feat: add SnappyMail webmail and docker-mailserver with full send/receive
Some checks failed
CI/CD / syntax-check (push) Successful in 1m35s
CI/CD / deploy (push) Failing after 17m28s
- Add docker-mailserver (Postfix+Dovecot) with SSL via certbot+Cloudflare DNS-01
- Add SnappyMail webmail client at webmail.csrx.ru (port 8888)
- Open UFW ports 25/465/587/993 on tools server
- Create mail accounts: noreply@, admin@, jack@csrx.ru
- Generate DKIM key and print DNS instructions on first run
- Add Traefik route on main server proxying webmail → tools:8888
- Add all secrets to vault (mailserver passwords, snappymail admin)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 17:21:25 +07:00
2b5524f258 fix: remove promtail nested /var/log/traefik volume mount
All checks were successful
CI/CD / syntax-check (push) Successful in 1m9s
CI/CD / deploy (push) Successful in 15m33s
Docker cannot mount to /var/log/traefik when /var/log is already
bind-mounted (read-only). The nested mount fails with 'read-only
file system' error in the overlay upper layer.

The mount was unused anyway — promtail config only reads syslog,
auth.log, and Docker container logs via the socket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:55:39 +07:00
6279bcb9b4 fix: remove cs-firewall-bouncer from image pre-pull list
Some checks failed
CI/CD / syntax-check (push) Successful in 1m29s
CI/CD / deploy (push) Failing after 9m44s
crowdsecurity/cs-firewall-bouncer:v0.0.31 does not exist on Docker Hub.
The bouncer service was already removed from docker-compose.yml.
Remove from pre-pull list and defaults to unblock CI/CD deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 15:39:31 +07:00
28f8c76433 fix: plane and authelia health check URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 1m21s
CI/CD / deploy (push) Failing after 12m3s
- plane-web/admin: localhost:80 → 127.0.0.1:3000 (nginx listens on 3000)
- plane-space: localhost:3000 → 127.0.0.1:3000/spaces/ (node server needs basename)
- plane-api: localhost:8000/api/ → 127.0.0.1:8000/ (/ returns status OK, /api/ returns 404)
- uptime-kuma: wget localhost:3001 → curl -sf (wget is not available in the image)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 14:50:08 +07:00
9ca1177461 fix: crowdsec proxy network, uptime-kuma curl healthcheck, outline en_US, n8n 127.0.0.1
Some checks failed
CI/CD / syntax-check (push) Successful in 1m4s
CI/CD / deploy (push) Failing after 10m46s
- crowdsec: add proxy network for internet access (hub downloads)
- crowdsec-bouncer: remove (image crowdsecurity/cs-firewall-bouncer doesn't exist on Docker Hub)
- uptime-kuma: switch healthcheck from wget to curl (wget not in image)
- outline: fix DEFAULT_LANGUAGE ru_RU → en_US (unsupported locale)
- n8n: fix healthcheck localhost → 127.0.0.1 (IPv6 issue in Alpine)
- alertmanager: config permissions 0644 (was 0640, container couldn't read)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 08:14:07 +07:00
92d2c845d8 feat: add n8n, outline routes, remove syncthing, fix backup awscli
Some checks failed
CI/CD / syntax-check (push) Successful in 1m14s
CI/CD / deploy (push) Failing after 10m51s
- Add n8n to tools server (n8n.csrx.ru)
- Add cross-server Traefik routes: wiki.csrx.ru + n8n.csrx.ru → tools
- Remove Syncthing (replaced by Outline wiki)
- Fix awscli install: download static binary (apt/pip broken on Ubuntu 24.04)
- Add n8n secrets to vault (encryption key + JWT secret)
- Improve CI/CD workflow: syntax-check both playbooks, deploy both servers
- Update site.yml: unified single-command deploy for all servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:19:39 +07:00
c2f9a0c21c feat: wildcard TLS via Cloudflare DNS-01 + real-IP forwarding
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 46s
- Switch Traefik ACME to dnsChallenge (provider: cloudflare)
- Add *.csrx.ru wildcard cert via tls.stores.default.defaultGeneratedCert
- Pass CLOUDFLARE_DNS_API_TOKEN to Traefik via env_file: .env
- Add Cloudflare IP ranges to forwardedHeaders.trustedIPs (real visitor IPs)
- Fix UFW: allow 172.16.0.0/12 on 80/443 so act_runner can reach Forgejo
- Add A records: auth.csrx.ru, status.csrx.ru, csrx.ru root → 87.249.49.32

Result: one *.csrx.ru cert covers all subdomains, auto-renewed by Traefik.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:47:46 +07:00
f183fe485f revert: switch back to HTTP-01 until Cloudflare NS propagation
Some checks failed
CI/CD / syntax-check (push) Successful in 44s
CI/CD / deploy (push) Failing after 39s
DNS-01 + wildcard cert requires Cloudflare to be authoritative NS.
Until propagation completes, use httpChallenge on port 80.

Plan after Cloudflare NS is active:
1. Switch back to dnsChallenge in traefik.yml.j2
2. Re-enable tls.stores.default.defaultGeneratedCert in routes.yml.j2
3. Clear acme.json → Traefik issues *.csrx.ru wildcard cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:18:21 +07:00
0496e9ab61 feat: wildcard TLS certificate *.csrx.ru via Cloudflare DNS-01
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 48s
Add tls.stores.default.defaultGeneratedCert in dynamic config:
- Traefik requests one *.csrx.ru + csrx.ru SAN cert via DNS-01
- All existing and future subdomains use this single cert
- No per-service cert issuance wait when adding new services
- Cert auto-renewed by Traefik ~30 days before expiry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:13:42 +07:00
fccbd1a45a feat: Cloudflare DNS-01 ACME + Docker hardening + sysctl
Some checks failed
CI/CD / syntax-check (push) Successful in 42s
CI/CD / deploy (push) Failing after 52s
Cloudflare DNS-01 ACME:
- Switch Traefik cert resolver from httpChallenge to dnsChallenge
  using Cloudflare provider (resolvers: 1.1.1.1, 1.0.0.1)
- Add CLOUDFLARE_DNS_API_TOKEN env to Traefik container
- Add CF_ZONE_ID + cloudflare_dns_api_token to all/main.yml
- Store API token in Ansible Vault

Docker daemon hardening:
- Add log-driver: json-file with max-size 10m / max-file 3
  (prevents disk fill from unbounded container logs)
- Add live-restore: true (containers survive Docker daemon restart)
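
The daemon settings listed above correspond to a `daemon.json` of roughly the following shape. The commit does not show the file itself, so this is a reconstruction from the bullet points, generated here with Python for illustration; `daemon_config` and `rendered` are hypothetical names.

```python
import json

# Reconstructed from the commit message, not copied from the repository.
daemon_config = {
    "log-driver": "json-file",
    # Docker expects log-opts values as strings, including the file count.
    "log-opts": {"max-size": "10m", "max-file": "3"},
    "live-restore": True,
}

rendered = json.dumps(daemon_config, indent=2)
print(rendered)
```

With three rotated files of at most 10m each, every container's logs are capped at roughly 30 MB on disk.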

Kernel hardening (sysctl):
- New roles/base/tasks/sysctl.yml via ansible.posix.sysctl
- IP spoofing protection (rp_filter)
- Disable ICMP redirects and broadcast pings
- SYN flood protection (syncookies, backlog)
- Disable IPv6 (not used)
- Restrict kernel pointers and dmesg to root
- Disable SysRq, suid core dumps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:06:46 +07:00
e935c897c6 feat: Cloudflare integration — real IP forwarding + firewall lockdown
Some checks failed
CI/CD / syntax-check (push) Successful in 58s
CI/CD / deploy (push) Failing after 43s
Traefik traefik.yml.j2:
- Add forwardedHeaders.trustedIPs with all Cloudflare CIDR ranges
  on both web and websecure entrypoints so rate limiting and
  CrowdSec see real visitor IPs, not Cloudflare proxy IPs

firewall.yml:
- Replace open HTTP/HTTPS rules with per-CIDR allow rules
  scoped to Cloudflare IP ranges only
- Direct access to ports 80/443 bypassing Cloudflare is now blocked
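
The trusted-proxy idea behind forwardedHeaders.trustedIPs can be sketched as follows. This illustrates the principle only and is not Traefik's implementation: the CIDR list is a small example subset (the full, current list should come from Cloudflare), and `is_trusted_proxy` and `client_ip` are hypothetical helpers that assume a single trusted hop.

```python
import ipaddress
from typing import Optional

# Example subset only; assumed for illustration and possibly outdated.
TRUSTED_CIDRS = ["173.245.48.0/20", "103.21.244.0/22", "104.16.0.0/13"]
TRUSTED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in TRUSTED_CIDRS]


def is_trusted_proxy(peer_ip: str) -> bool:
    """True if the TCP peer belongs to one of the trusted proxy ranges."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def client_ip(peer_ip: str, x_forwarded_for: Optional[str]) -> str:
    """Believe X-Forwarded-For only when the request came through a trusted proxy."""
    if x_forwarded_for and is_trusted_proxy(peer_ip):
        # With one trusted hop, the right-most entry is the one that proxy appended.
        return x_forwarded_for.split(",")[-1].strip()
    return peer_ip
```

A request arriving from an untrusted peer keeps its socket address, so a forged header cannot be used to dodge rate limiting or a ban.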

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 04:02:06 +07:00
1f03022086 fix: correct invalid PromQL in ContainerHighMemory alert rule
Some checks failed
CI/CD / syntax-check (push) Successful in 53s
CI/CD / deploy (push) Failing after 57s
PromQL does not allow comparison operators inside label matchers {}.
Move the > 0 filter outside the braces as a scalar comparison on the
denominator, the idiomatic Prometheus way to exclude containers without a memory limit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:59:56 +07:00
a344998405 feat: add uptime-kuma pull, logrotate deploy task, logrotate package
Some checks failed
CI/CD / syntax-check (push) Successful in 41s
CI/CD / deploy (push) Failing after 39s
- Add uptime_kuma_image to image pull loop in services/tasks/main.yml
- Add logrotate deploy task to services/tasks/configs.yml
- Add logrotate package to base_packages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:54:24 +07:00
aa9706bbc4 feat: comprehensive security hardening
Some checks failed
CI/CD / syntax-check (push) Successful in 43s
CI/CD / deploy (push) Failing after 59s
Traefik:
- Enable access logs → /var/log/traefik/access.log (needed for CrowdSec)
- Add global security headers middleware: HSTS, X-Frame-Options, CSP,
  nosniff, XSS filter, referrer policy, permissions policy
- Add rate limiting: default 100/s, API 30/s, admin 10/s (strict)
- Add Authelia ForwardAuth middleware for SSO integration
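
Per-second limits such as "100/s" are commonly enforced with a token bucket. The sketch below shows that general technique under stated assumptions: it is not Traefik's code, the `burst` parameter does not appear in the commit message, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A stricter tier, like the admin limit of 10/s, is the same bucket with a lower rate.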

CrowdSec (new service):
- Analyzes Traefik access logs + auth.log in real time
- Community IP reputation blocklist (crowdsecurity/traefik + http-cve)
- Firewall bouncer: bans malicious IPs at kernel level (iptables)

Authelia (new service, auth.csrx.ru):
- 2FA/SSO portal with TOTP (Google Authenticator)
- Protects: traefik.csrx.ru, sync.csrx.ru, /god-mode/ in Plane
- Session: 12h expiry, 30m inactivity, Redis backend
- argon2id password hashing
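
The two session limits above combine as an absolute lifetime plus an idle timeout. A minimal sketch of that rule, assuming both limits are strict upper bounds; `session_valid` and the constants are hypothetical names and this is not Authelia's implementation:

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=12)      # absolute session lifetime
MAX_IDLE = timedelta(minutes=30)   # inactivity timeout


def session_valid(created: datetime, last_seen: datetime, now: datetime) -> bool:
    """A session must be both younger than MAX_AGE and active within MAX_IDLE."""
    return (now - created) < MAX_AGE and (now - last_seen) < MAX_IDLE
```

An active user is therefore still asked to log in again after 12 hours, while an idle one is logged out after 30 minutes.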

Container security:
- Add security_opt: no-new-privileges to traefik, vaultwarden,
  forgejo, grafana, authelia

CI/CD security:
- Remove hardcoded server IP 87.249.49.32 from workflow
- Use SSH_KNOWN_HOSTS secret instead of ssh-keyscan (prevents MITM)
- Added SSH_KNOWN_HOSTS secret to Forgejo

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:44:54 +07:00
6ebd237894 feat: major infrastructure improvements
Some checks failed
CI/CD / deploy (push) Has been cancelled
CI/CD / syntax-check (push) Successful in 1m7s
Reliability:
- Add swap role (2GB, swappiness=10, idempotent via /etc/fstab)
- Add mem_limit to plane-worker (512m) and plane-beat (256m)
- Add health checks to all services (traefik, vaultwarden, forgejo,
  plane-*, syncthing, prometheus, grafana, loki)
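
"Idempotent via /etc/fstab" for the swap role means a second run must not add a second entry. The idea can be sketched as a pure function over the file's text. This is an illustration, not the role's actual task: `ensure_fstab_entry` is a hypothetical helper and the `/swapfile` path is an assumption.

```python
def ensure_fstab_entry(fstab_text: str, entry: str) -> str:
    """Append `entry` unless an equivalent line (ignoring spacing) already exists."""
    wanted = entry.split()
    for line in fstab_text.splitlines():
        if line.split() == wanted:
            return fstab_text  # already present: nothing to change
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"


SWAP_ENTRY = "/swapfile none swap sw 0 0"  # hypothetical path
once = ensure_fstab_entry("UUID=abcd / ext4 defaults 0 1\n", SWAP_ENTRY)
twice = ensure_fstab_entry(once, SWAP_ENTRY)
print(once == twice)
```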

Code quality:
- Remove Traefik Docker labels (file provider used, labels were dead code)
- Add comment explaining file provider architecture

Observability:
- Add AlertManager with Telegram notifications
- Add Prometheus alert rules: CPU, RAM, disk, swap, container health
- Add Loki + Promtail for centralized log aggregation
- Add Loki datasource to Grafana
- Enable Traefik /ping endpoint for health checks

Backups:
- Add backup role: pg_dump for forgejo + plane DBs, tar for
  vaultwarden and forgejo data
- 7-day retention, daily cron at 03:00
- Backup script at /usr/local/bin/backup-services
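
The 7-day retention rule amounts to deleting anything older than a cutoff. A minimal sketch of the selection step, assuming backups are identified by name and timestamp; `expired_backups` and the file names are hypothetical, and the real script may well do this with shell tools instead:

```python
from datetime import datetime, timedelta


def expired_backups(backups: dict, now: datetime, keep_days: int = 7) -> list:
    """Return the names of backups strictly older than `keep_days`, sorted."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, taken_at in backups.items() if taken_at < cutoff)


now = datetime(2026, 3, 22, 3, 0)
backups = {
    "forgejo-2026-03-10.sql.gz": datetime(2026, 3, 10, 3, 0),
    "forgejo-2026-03-15.sql.gz": datetime(2026, 3, 15, 3, 0),
    "forgejo-2026-03-21.sql.gz": datetime(2026, 3, 21, 3, 0),
}
print(expired_backups(backups, now))
```

A backup taken exactly seven days ago sits on the cutoff and is kept by this comparison.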

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:28:16 +07:00
972a76db4c feat: add monitoring stack (Prometheus + Grafana + cAdvisor + Node Exporter)
All checks were successful
CI/CD / syntax-check (push) Successful in 3m0s
CI/CD / deploy (push) Successful in 6m51s
- Adds monitoring Docker network (internal)
- Prometheus scrapes node-exporter (host metrics) and cAdvisor (containers)
  with 30-day retention
- Grafana exposed at dashboard.csrx.ru with pre-provisioned datasource
  and two dashboards: Node Exporter Full (1860) and cAdvisor (14282)
- Vault secret: vault_grafana_admin_password

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:05:34 +07:00
efbbc3cac5 fix: add plane-admin, plane-space and configure instance URLs
Some checks failed
CI/CD / syntax-check (push) Successful in 2m31s
CI/CD / deploy (push) Has been cancelled
New Plane stable requires 3 frontend services:
- plane-admin (nginx:80) for /god-mode/ routes
- plane-space (node:3000) for /spaces/ routes
- plane-web (nginx:80) for all other routes

Also add APP/ADMIN/SPACE_BASE_URL env vars to plane-api so the
setup wizard knows where to redirect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 02:14:34 +07:00
66c03ffc04 fix: update plane backend for new stable image requirements
All checks were successful
CI/CD / syntax-check (push) Successful in 2m49s
CI/CD / deploy (push) Successful in 8m54s
makeplane/plane-backend:stable now requires:
- AMQP_URL: Celery broker URL (defaults to amqp://localhost, broken)
  → set to redis://plane-redis:6379/ to reuse existing Redis
- GUNICORN_WORKERS: must be set explicitly (empty string causes crash)
  → set to 2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 01:32:11 +07:00
679d3ed010 fix: update plane-web for nginx-based stable image
All checks were successful
CI/CD / syntax-check (push) Successful in 2m41s
CI/CD / deploy (push) Successful in 11m24s
makeplane/plane-frontend:stable now uses nginx (not Next.js/node).
Remove `command: node web/server.js` override and update Traefik
port from 3000 to 80.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 00:44:00 +07:00
6a2c38b4bf Fix act_runner: use public Forgejo URL for job container access
Some checks failed
CI/CD / syntax-check (push) Failing after 48s
CI/CD / deploy (push) Has been skipped
Job containers run on runner-jobs network (internet only), so they
can't reach forgejo:3000 (backend-only). Use public https://git.csrx.ru
so both runner and job containers can reach Forgejo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 22:53:25 +07:00