Updated: 2026-03-07 02:45
# HEARTBEAT.md - System Health Checklist
## Purpose
Run this checklist periodically to monitor system health and flag issues.
---
## Quick Scan
### 1. Database Health
- [ ] Connection working?
- [ ] Storage below 80%?
- [ ] No recent query errors?
### 2. Running Jobs
- [ ] Email extraction still running? (`ps aux | grep '[p]ython'`)
- [ ] No failed jobs needing retry?
### 3. API Status
- [ ] Zoho tokens valid?
- [ ] OpenAI API working?
### 4. Backups
- [ ] Last backup successful?
- [ ] GitHub push working?
---
## Urgent Triggers (Report Immediately)
⚠️ Database connection failures
⚠️ API authentication errors
⚠️ Storage >90% capacity
⚠️ Failed backups
⚠️ Security incidents
---
## Check Commands
```bash
# Database connection
python3 -c "import psycopg2; from dotenv import load_dotenv; import os; load_dotenv('/opt/zoho-extract/.env'); conn = psycopg2.connect(host=os.getenv('DB_HOST'), port=os.getenv('DB_PORT'), dbname=os.getenv('DB_NAME'), user=os.getenv('DB_USER'), password=os.getenv('DB_PASSWORD'), sslmode='require', connect_timeout=10); conn.close(); print('DB OK')"
# Running processes
# The [p] bracket keeps grep from matching its own process
ps aux | grep '[p]ython'
# Disk space
df -h /
# Recent backups (sorted by modification time, newest last)
ls -lhtr /backups/*.tar.gz | tail -5
# Email extraction progress
python3 -c "import os, psycopg2; from dotenv import load_dotenv; load_dotenv('/opt/zoho-extract/.env'); conn = psycopg2.connect(host=os.getenv('DB_HOST'), port=os.getenv('DB_PORT'), dbname=os.getenv('DB_NAME'), user=os.getenv('DB_USER'), password=os.getenv('DB_PASSWORD'), sslmode='require'); cur = conn.cursor(); cur.execute('SELECT COUNT(*) FROM zoho.crm_emails WHERE content IS NOT NULL'); print(f'Emails: {cur.fetchone()[0]}/42227')"
```
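
The backup listing above still needs a person to read the timestamps. As a rough sketch, the same check could be automated by looking at the modification time of the newest archive. The 26-hour limit is an assumption (one daily cycle plus some slack), and `newest_backup` and `backup_is_fresh` are hypothetical helper names.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup(backup_dir: str, pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified archive in backup_dir, or None."""
    archives = list(Path(backup_dir).glob(pattern))
    if not archives:
        return None
    return max(archives, key=lambda p: p.stat().st_mtime)


def backup_is_fresh(backup_dir: str, max_age_hours: float = 26.0) -> bool:
    """True if the newest archive is no older than max_age_hours (assumed limit)."""
    latest = newest_backup(backup_dir)
    if latest is None:
        return False
    age_hours = (time.time() - latest.stat().st_mtime) / 3600
    return age_hours <= max_age_hours


if __name__ == "__main__":
    print("Backup fresh:", backup_is_fresh("/backups"))
```

This only shows that a recent archive exists. It does not verify that the archive is complete or restorable.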
---
## Response Protocol
- **HEARTBEAT_OK**: Everything nominal
- **ACTION_REQUIRED**: Flag specific issues
- **URGENT**: Critical failures needing immediate attention
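
One way to apply these three labels is to report the worst status across all checks, together with the names of the checks that were not OK. The sketch below assumes that ordering (`HEARTBEAT_OK` < `ACTION_REQUIRED` < `URGENT`). The check names and the `summarize` helper are illustrative only.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Status(IntEnum):
    """Response Protocol labels, ordered by severity (ordering is an assumption)."""
    HEARTBEAT_OK = 0
    ACTION_REQUIRED = 1
    URGENT = 2


def summarize(results: Dict[str, Status]) -> Tuple[Status, List[str]]:
    """Return the worst status plus the sorted names of checks that are not OK."""
    overall = max(results.values(), default=Status.HEARTBEAT_OK)
    flagged = sorted(
        name for name, status in results.items()
        if status != Status.HEARTBEAT_OK
    )
    return overall, flagged


if __name__ == "__main__":
    # Hypothetical results from one heartbeat run
    results = {
        "database": Status.HEARTBEAT_OK,
        "storage": Status.ACTION_REQUIRED,
        "backups": Status.URGENT,
    }
    overall, flagged = summarize(results)
    print(overall.name, flagged)  # URGENT ['backups', 'storage']
```

Reporting the flagged check names alongside the overall label keeps an `ACTION_REQUIRED` response specific, as the protocol asks.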
---
## Notes
- Email extraction running in background (started 2026-02-17 ~15:30)
- Expected completion: ~4-5 hours from start (roughly 19:30-20:30 on 2026-02-17)
- Daily backups at 2 AM