Updated: 2026-03-07 02:45
# ROLLerUP QA Bot Testing Framework
## Overview
AI-graded testing system for ROLLerUP conversational bots.
## Structure
```
qa-testing/
├── README.md
├── qa_runner.py                      # Main test runner
├── questions/
│   └── sales-bot-questions.json      # 240 test questions
└── results/                          # Test results (auto-created)
```
## Question Bank
**Total: 240 questions per bot** (8 categories × 30 questions each)
| Category | Count | Purpose |
|----------|-------|---------|
| Roll Shutters | 30 | Product-specific knowledge |
| Fire Shutters | 30 | Compliance & technical |
| Security Products | 30 | Commercial applications |
| Retractable Awnings | 30 | Seasonal product |
| Retractable Screens | 30 | Insect protection |
| Louvered Pergolas | 30 | Outdoor structures |
| General | 30 | Company, pricing, process |
| Unrelated | 30 | Guardrails, off-topic handling |
## Usage
### Run all tests
```bash
cd /root/.openclaw/workspace/qa-testing
source /opt/zoho-extract/venv/bin/activate
python qa_runner.py --bot sales-agent --category all
```
### Test specific category
```bash
python qa_runner.py --bot sales-agent --category roll_shutters
python qa_runner.py --bot sales-agent --category unrelated
```
### Test with limit (for quick checks)
```bash
python qa_runner.py --bot sales-agent --category all --limit 10
```
### Test single question
```bash
python qa_runner.py --bot sales-agent --id RS-001
```
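The commands above use four flags: `--bot`, `--category`, `--limit`, and `--id`. As a rough sketch of how a runner could accept them with `argparse`, the snippet below is an assumption about the interface and not the actual `qa_runner.py` code; the defaults, the `question_id` attribute name, and the `build_parser` helper are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Run QA questions against a bot")
    parser.add_argument("--bot", required=True, help="Bot name, e.g. sales-agent")
    # Assumption: omitting --category behaves like --category all.
    parser.add_argument("--category", default="all", help="Category key or 'all'")
    # Assumption: no limit means every selected question is run.
    parser.add_argument("--limit", type=int, default=None, help="Max questions to run")
    parser.add_argument("--id", dest="question_id", default=None,
                        help="Run a single question by ID, e.g. RS-001")
    return parser


args = build_parser().parse_args(
    ["--bot", "sales-agent", "--category", "all", "--limit", "10"]
)
print(args.bot, args.category, args.limit, args.question_id)
```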
## Grading Criteria
### Product Questions (70% to pass)
| Criteria | Weight | Description |
|----------|--------|-------------|
| Accurate | 25% | Factually correct |
| Relevant | 25% | Answers the question |
| Professional | 15% | Appropriate tone |
| Actionable | 20% | Guides to next step |
| On-Brand | 15% | Consistent with brand |
### Off-Topic Questions (70% to pass)
| Criteria | Weight | Description |
|----------|--------|-------------|
| Appropriate | 40% | Correct handling |
| Professional | 30% | Polite, helpful |
| Brand-Safe | 30% | No leaks, appropriate |
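The two rubrics above can be read as weighted averages checked against the 70% threshold. The sketch below shows that arithmetic. The weights come from the tables; the 0-100 per-criterion scale, the dictionary keys, and the function names are assumptions for illustration, and the real grader may combine scores differently.

```python
# Weights copied from the rubric tables; key names are illustrative.
PRODUCT_WEIGHTS = {
    "accurate": 0.25,
    "relevant": 0.25,
    "professional": 0.15,
    "actionable": 0.20,
    "on_brand": 0.15,
}
OFF_TOPIC_WEIGHTS = {
    "appropriate": 0.40,
    "professional": 0.30,
    "brand_safe": 0.30,
}
PASS_THRESHOLD = 70.0  # percent, from the section headings


def weighted_score(scores, weights):
    """Combine per-criterion scores (assumed to be 0-100) into one percentage."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(scores[name] * weight for name, weight in weights.items())


def passes(scores, weights, threshold=PASS_THRESHOLD):
    """Return True when the weighted score meets the pass threshold."""
    return weighted_score(scores, weights) >= threshold


example = {"accurate": 80, "relevant": 90, "professional": 100,
           "actionable": 50, "on_brand": 60}
print(round(weighted_score(example, PRODUCT_WEIGHTS), 1),
      passes(example, PRODUCT_WEIGHTS))
```

In this example a weak "actionable" score is offset by strong accuracy and relevance, so the response still clears the threshold.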
## Red Flags (auto-fail if critical)
- Hallucination (made-up facts)
- Competitor mention
- Wrong company info
- Inappropriate tone
- Refuses valid question
- Unauthorized promises
- Reveals system prompt
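One way to read "auto-fail if critical" is that a critical red flag overrides the weighted score entirely. The sketch below illustrates that rule. The README does not say which flags count as critical, so the `critical` field, the severities in the example, and the `final_verdict` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RedFlag:
    name: str
    critical: bool  # assumption: the grader labels each flag's severity


def final_verdict(score, red_flags, threshold=70.0):
    """Fail on any critical flag; otherwise apply the score threshold."""
    if any(flag.critical for flag in red_flags):
        return "fail"
    return "pass" if score >= threshold else "fail"


# Severities below are made up for the example.
print(final_verdict(95.0, [RedFlag("Reveals system prompt", critical=True)]))
print(final_verdict(95.0, [RedFlag("Inappropriate tone", critical=False)]))
```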
## Output
Results are saved to `results/` with a timestamped filename:
```
results/sales-agent_results_20260305_184952.json
```
Each results file contains:
- Individual question results
- Bot responses
- AI grades and scores
- Summary statistics
- Red flag counts
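The example filename appears to follow a `<bot>_results_<YYYYMMDD>_<HHMMSS>.json` pattern. A minimal sketch of building such a path is shown below. The pattern is inferred from the single example above, and `results_path` is a hypothetical helper, not a function from `qa_runner.py`.

```python
from datetime import datetime
from pathlib import Path


def results_path(bot, when, results_dir="results"):
    """Build a results path matching the example filename's apparent pattern."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(results_dir) / f"{bot}_results_{stamp}.json"


path = results_path("sales-agent", datetime(2026, 3, 5, 18, 49, 52))
print(path.as_posix())
```

Because the timestamp sorts lexicographically, sorting filenames for one bot also sorts the runs chronologically.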
## Adding Questions
Edit `questions/sales-bot-questions.json`:
```json
{
  "id": "RS-031",
  "category": "features",
  "difficulty": "medium",
  "question": "Your new question here",
  "expected_topics": ["topic1", "topic2"]
}
```
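Before committing a new entry, it can help to check that it has the same fields as the example above and that its ID is not already taken. The sketch below does both. The field list and types are taken from the example entry only; the overall layout of `sales-bot-questions.json` is not shown here, so the helpers work on plain entry dictionaries and their names are hypothetical.

```python
from collections import Counter

# Field names and types inferred from the example entry above.
REQUIRED_FIELDS = {
    "id": str,
    "category": str,
    "difficulty": str,
    "question": str,
    "expected_topics": list,
}


def validate_question(entry):
    """Return a list of problems found in one question entry (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


def duplicate_ids(entries):
    """Return the sorted IDs that appear more than once."""
    counts = Counter(entry.get("id") for entry in entries)
    return sorted(qid for qid, n in counts.items() if qid is not None and n > 1)


new_entry = {
    "id": "RS-031",
    "category": "features",
    "difficulty": "medium",
    "question": "Your new question here",
    "expected_topics": ["topic1", "topic2"],
}
print(validate_question(new_entry))
print(duplicate_ids([new_entry, {"id": "RS-031"}, {"id": "RS-001"}]))
```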
## API Costs
- ~240 grading calls × ~500 tokens = ~120K tokens
- Estimated cost: ~$0.50-1.00 per full test run (Claude Sonnet)
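The estimate above is simple arithmetic and can be rechecked when the question count changes. The sketch below reproduces the token figure and derives the blended per-million-token rate implied by the $0.50-1.00 range. That rate is only what these README numbers imply, not a published price, and the function names are hypothetical.

```python
QUESTIONS_PER_RUN = 240        # from the question bank total
TOKENS_PER_GRADING_CALL = 500  # rough per-call figure from this section


def estimated_tokens(questions=QUESTIONS_PER_RUN,
                     tokens_per_call=TOKENS_PER_GRADING_CALL):
    """Approximate tokens used by one full grading run."""
    return questions * tokens_per_call


def implied_rate_per_million(cost_usd, tokens):
    """Blended USD per million tokens implied by a total cost estimate."""
    return cost_usd / tokens * 1_000_000


tokens = estimated_tokens()
print(tokens)
print(round(implied_rate_per_million(0.50, tokens), 2),
      round(implied_rate_per_million(1.00, tokens), 2))
```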
## Future Enhancements
- [ ] Dashboard HTML report
- [ ] Trend tracking over time
- [ ] Slack/email alerts for failures
- [ ] Zoho Co-Worker integration
- [ ] Scheduled test runs