outline-sync/AGENTS.md
2026-03-07 20:54:59 +01:00

# AGENTS.md
This file provides guidance to AI agents (Claude, GPT-4, etc.) when working with this Outline export/import tool repository.
> **This file is the single reference for technical details, architecture, common pitfalls, and code patterns.**
## Quick Reference
**Primary Language**: Python 3.11 + Bash
**Key Dependencies**: `requests`, `tqdm`
**Runtime**: Docker containers on `domnet` network
**API Base**: `http://outline:3000` (internal, bypasses SSO)
**Key Features**:
- Export all collections with full document hierarchy
- Import back to Outline preserving structure
- Automatic backups with 90%+ compression
- Dry-run mode for safe testing
- Retry logic for API reliability
## Usage
### Export (Backup)
```bash
# Run the export with tree visualization
./export_with_trees.sh
# Preview without exporting (dry run)
./export_with_trees.sh --dry-run
# Run with verbose output
./export_with_trees.sh -v
```
**Export CLI Options**:
```
--dry-run, -n         Preview what would be exported without writing files
--output, -o DIR      Output directory (overrides settings.json)
--verbose, -v         Increase verbosity (-vv for debug)
--skip-verify         Skip post-export verification
--skip-health-check   Skip pre-export health check
--settings FILE       Path to settings file (default: settings.json)
```
### Import (Restore)
```bash
# Import all collections from outline_export/
./import_to_outline.sh
# Preview what would be imported (no changes made)
./import_to_outline.sh --dry-run
# Import into a single timestamped collection
./import_to_outline.sh --single
# Import from a different directory
./import_to_outline.sh -d exports/
# Overwrite existing collections
./import_to_outline.sh --force
```
**Import CLI Options**:
```
-s, --single          Import all into single timestamped collection
-n, --dry-run         Preview operations without making changes
-d, --source DIR      Source directory (default: outline_export)
-v, --verbose         Increase verbosity (-vv for debug)
-f, --force           Overwrite existing collections (instead of skip)
--settings FILE       Path to settings file (default: settings.json)
-h, --help            Show help message
```
### Running Python Scripts Directly
If you need to run the Python scripts directly (e.g., for debugging):
```bash
# Export
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"
# Import
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_import.py --dry-run"
```
**Note**: The shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`) provide better UX with tree visualization and colored output.
## Agent Operating Guidelines
### 1. Configuration
Settings are in `settings.json`:
```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "your-api-token-here"
  },
  "export": {
    "output_directory": "outline_export"
  },
  "advanced": {
    "max_hierarchy_depth": 100
  }
}
```
**Important**: `settings.json` contains secrets (API token) and should **never be committed to git**.
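As an illustration of how a script might read this file, here is a minimal sketch. The function name `load_settings` and the `DEFAULTS` table are hypothetical and not taken from the repository; only the key names come from the sample above.

```python
import json
from pathlib import Path

# Defaults mirror the sample settings.json above; treat them as assumptions.
DEFAULTS = {
    "export": {"output_directory": "outline_export"},
    "advanced": {"max_hierarchy_depth": 100},
}


def load_settings(path="settings.json"):
    """Read settings.json and fail early if the API URL or token is missing."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    source = settings.get("source", {})
    for key in ("url", "token"):
        if not source.get(key):
            # Name the missing key, but never echo the token itself.
            raise ValueError(f"settings.json: missing source.{key}")
    # Fill optional sections without overwriting user-provided values.
    for section, values in DEFAULTS.items():
        merged = dict(values)
        merged.update(settings.get(section, {}))
        settings[section] = merged
    return settings
```

Validating up front keeps a missing token from surfacing later as a confusing 401 in the middle of an export.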
### 2. Architecture Understanding
This tool operates in a **Docker-isolated environment** to bypass Authentik SSO:
- All Python scripts run inside ephemeral Docker containers
- Network: `domnet` bridge allows direct access to Outline's internal API
- No persistent container state - dependencies installed on each run
**Critical Context**:
- The `http://outline:3000` URL only works inside the Docker network
- External access would require SSO authentication through Authentik
- This design is intentional for automated backup/restore operations
#### Export Flow
1. **Health Check**: Verify API connectivity
2. **Fetch Collections**: Via `/api/collections.list`
3. **Build Tree**: Get navigation tree via `/api/collections.documents` (source of truth for hierarchy)
4. **Fetch Content**: Full document content via `/api/documents.info` (with caching)
5. **Export Recursively**: Maintain parent-child structure
6. **Save Metadata**: `_collection_metadata.json` per collection
7. **Create Backup**: Archive previous export to `outline_backup_*.tar.gz`
8. **Verify**: Generate manifest with checksums
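Step 8 could be sketched roughly as follows. The manifest layout (relative path mapped to a SHA-256 digest) and both function names are assumptions for illustration; the actual format written by `outline_export_fixed.py` may differ.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(export_dir):
    """Map each exported file (relative path) to its SHA-256 checksum."""
    root = Path(export_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip the manifest itself so it can live inside the export directory.
        if path.is_file() and path.name != "manifest.json":
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_manifest(export_dir, manifest):
    """Return the relative paths whose content is missing or has changed."""
    current = build_manifest(export_dir)
    return sorted(p for p, digest in manifest.items() if current.get(p) != digest)
```

A manifest built this way can be saved with `json.dumps(manifest, indent=2)` and checked again before an import.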
#### Import Flow
1. **Health Check**: Verify API connectivity
2. **Load Metadata**: Read `_collection_metadata.json` from each collection directory
3. **Build Tree**: Reconstruct document hierarchy from metadata
4. **Create Collections**: Via `/api/collections.create`
5. **Create Documents**: Via `/api/documents.create` with proper `parentDocumentId`
6. **Map IDs**: Track old IDs → new IDs to maintain hierarchy
7. **Display Progress**: Tree-style output with status indicators
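Steps 5 and 6 are the part most likely to go wrong, so here is a small sketch of the ID mapping idea. `import_tree` and the injected `create_document` callable are hypothetical stand-ins for whatever `outline_import.py` does around `/api/documents.create`, and the `title` field name is an assumption about the metadata.

```python
def import_tree(nodes, create_document, parent_new_id=None, id_mapping=None):
    """Create documents depth-first, recording old ID -> new ID."""
    if id_mapping is None:
        id_mapping = {}
    for node in nodes:
        # create_document(title, parent_id) is assumed to return the ID that
        # Outline assigned to the newly created document.
        new_id = create_document(node["title"], parent_new_id)
        id_mapping[node["id"]] = new_id
        # Children must reference the NEW parent ID, never the exported one.
        import_tree(node.get("children", []), create_document, new_id, id_mapping)
    return id_mapping
```

Creating parents before children guarantees that the new parent ID exists by the time a child needs it.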
#### Core Component Pipelines
**Export Pipeline:**
```
export_with_trees.sh → Docker container → outline_export_fixed.py
Fetches collections → Builds document tree → Exports markdown + metadata
Creates backup → Verifies integrity → Displays summary
```
**Import Pipeline:**
```
import_to_outline.sh → Docker container → outline_import.py
Reads metadata → Validates structure → Creates collections
Uploads documents → Maintains hierarchy → Reports status
```
### 3. Import Modes
#### Separate Collections (default)
Each subdirectory becomes a separate collection:
```
outline_export/
├── Bewerbungen/ → Creates "Bewerbungen" collection
├── Projekte/ → Creates "Projekte" collection
└── Privat/ → Creates "Privat" collection
```
#### Single Collection (`--single`)
All content goes into one timestamped collection:
```
outline_export/
├── Bewerbungen/ → Becomes parent doc "Bewerbungen"
├── Projekte/ → Becomes parent doc "Projekte"
└── Privat/ → Becomes parent doc "Privat"
All imported into: "import_20260119_143052" collection
```
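The difference between the two modes can be expressed as a small planning function. This is a sketch only: `plan_import` is a hypothetical name, and the timestamp format is inferred from the `import_20260119_143052` example above.

```python
from datetime import datetime
from pathlib import Path


def plan_import(source_dir, single=False, now=None):
    """Return (collection name, wrapper document title or None) per subdirectory."""
    subdirs = sorted(p.name for p in Path(source_dir).iterdir() if p.is_dir())
    if not single:
        # Default mode: one collection per subdirectory, no wrapper document.
        return [(name, None) for name in subdirs]
    # --single mode: one timestamped collection, one parent doc per subdirectory.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return [(f"import_{stamp}", name) for name in subdirs]
```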
### 4. Behavior & Duplicate Handling
#### Duplicate Handling
| Scenario | Default Behavior | With `--force` |
|----------|------------------|----------------|
| Collection exists | Skip entire collection | Delete and recreate |
| Document exists | Skip document | Update document |
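The table reduces to a small decision function. The name `resolve_duplicate` and the action strings are illustrative assumptions, not identifiers from `outline_import.py`.

```python
def resolve_duplicate(kind, exists, force=False):
    """Return the action from the table above for a collection or document."""
    if kind not in ("collection", "document"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not exists:
        return "create"
    if not force:
        return "skip"
    # With --force, collections are deleted and recreated; documents are updated.
    return "recreate" if kind == "collection" else "update"
```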
#### Error Handling
**Import Errors**:
- **API connection failure**: Abort with error message
- **Collection creation fails**: Abort that collection, continue others
- **Document creation fails**: Log error, continue with siblings
- **Missing markdown file**: Log warning, skip document
- **Parent not found**: Create as root-level document
**Export Errors**:
- **API connection failure**: Abort before starting
- **Collection fetch fails**: Skip that collection, continue
- **Document fetch fails**: Retry 3x with backoff, then skip
- **Disk write fails**: Abort with error message
#### Rate Limiting
If Outline API returns 429 errors:
- Automatic retry with exponential backoff
- Up to 3 retry attempts per request
- Configurable delay between retries
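A minimal sketch of this retry behavior, assuming the request is wrapped in a callable that raises a marker exception on HTTP 429. `RateLimited`, `call_with_backoff`, and the parameter names are hypothetical; the real scripts may structure this differently.

```python
import time


class RateLimited(Exception):
    """Hypothetical marker raised by the caller when the API answers 429."""


def call_with_backoff(request_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimited with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller decide what to skip
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
```

In practice `request_fn` could wrap `requests.post(...)` and raise `RateLimited` when `response.status_code == 429`. Injecting `sleep` makes the function testable without real waiting.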
#### Important Features & Behaviors
**Backup System**:
- Each export automatically backs up previous exports to `outline_backup_YYYYMMDD_HHMMSS.tar.gz`
- Old uncompressed export directory is deleted after backup
- Backups achieve **90%+ compression** on markdown content
- Safe to re-run exports - previous data is always preserved
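The backup step can be pictured with the standard library alone. This is a sketch under assumptions: `backup_previous_export` is a hypothetical name, and only the archive naming pattern comes from the text above.

```python
import shutil
import tarfile
from datetime import datetime
from pathlib import Path


def backup_previous_export(export_dir, backup_root=".", now=None):
    """Archive export_dir to outline_backup_<timestamp>.tar.gz, then delete it."""
    export_path = Path(export_dir)
    if not export_path.is_dir():
        return None  # nothing to back up on the first run
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive = Path(backup_root) / f"outline_backup_{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(export_path), arcname=export_path.name)
    # Remove the uncompressed directory only after the archive closed cleanly.
    shutil.rmtree(export_path)
    return archive
```

Deleting the directory after the `with` block exits means a failed archive write leaves the previous export untouched.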
**Reliability Features**:
- **Health check**: Verifies API connectivity before operations
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Caching**: Document content cached during single run to reduce API calls
- **Logging**: Structured logging with configurable verbosity levels (-v, -vv)
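The per-run caching mentioned above can be as simple as a dictionary in front of the fetch call. `DocumentCache` and its injected `fetch` callable are hypothetical names used only for this sketch.

```python
class DocumentCache:
    """Remember document payloads for the duration of one run."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: doc_id -> document dict
        self._docs = {}

    def get(self, doc_id):
        # Only the first request for an ID reaches the API.
        if doc_id not in self._docs:
            self._docs[doc_id] = self._fetch(doc_id)
        return self._docs[doc_id]
```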
**Hierarchy Integrity**:
- The navigation tree (`/api/collections.documents`) is the **source of truth** for document hierarchy
- Import maintains parent-child relationships via `parentDocumentId` mapping
- Document counting is recursive to include all nested children
- Maximum depth limit (default: 100) prevents infinite recursion
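Recursive counting with a depth guard might look like the sketch below. Raising an error when the limit is hit is an assumption; the actual scripts may log and stop descending instead. `count_documents` is a hypothetical name.

```python
def count_documents(nodes, max_depth=100, _depth=0):
    """Count documents in a navigation tree, including nested children."""
    if nodes and _depth >= max_depth:
        # Assumed behavior: treat an over-deep tree as an error rather than
        # recursing forever on a malformed or cyclic structure.
        raise ValueError(f"hierarchy deeper than {max_depth} levels")
    total = 0
    for node in nodes:
        total += 1 + count_documents(node.get("children", []), max_depth, _depth + 1)
    return total
```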
### 5. File Structure Knowledge
```
outline-tools/
├── export_with_trees.sh # Main export entrypoint
```
#### Dry Run Testing
```bash
# Test export without writing files
./export_with_trees.sh --dry-run
# Test import without creating collections
./import_to_outline.sh --dry-run
```
#### Verification Checklist
- [ ] Health check passes before export/import
- [ ] Document count matches (compare tree output)
- [ ] Hierarchy preserved (check parent-child relationships)
- [ ] Metadata files valid JSON
- [ ] No API errors in logs
- [ ] Backup created successfully (export only)
### 8. Troubleshooting & Debug Mode
#### Common Issues
**"Connection refused" or "Name resolution failed"**
- **Cause**: Not running inside `domnet` Docker network
- **Solution**: Always use shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`)
**"Authorization failed" or 401/403 errors**
- **Cause**: Invalid or expired API token
- **Solution**: Update token in `settings.json`
**Documents appear at wrong hierarchy level after import**
- **Cause**: Metadata corruption or `parentDocumentId` mapping issue
- **Solution**: Re-export, verify `_collection_metadata.json` integrity, check `id_mapping` dictionary
**Import creates duplicate collections**
- **Cause**: Collection names differ (case, spaces, special chars)
- **Solution**: Use `--force` to replace, or manually delete old collections
**API returns 429 errors**
- **Cause**: Rate limiting from too many API requests
- **Solution**: Built-in retry logic handles this - increase `RETRY_DELAY` if persistent
#### Debug Mode
Run with `-vv` for detailed debug output:
```bash
./export_with_trees.sh -vv
./import_to_outline.sh -vv
```
This shows:
- Full API requests and responses
- Document ID mappings
- File operations
- Retry attempts
#### Quick Diagnostics
```bash
# Test API connectivity (run from a container on domnet; Outline's API is called with POST)
curl -X POST -H "Authorization: Bearer $TOKEN" http://outline:3000/api/collections.list
# Check Docker network
docker network inspect domnet
# Run with verbose logging
./export_with_trees.sh -vv
```
### 9. Extending the Tool
#### Adding New CLI Options
**Bash wrapper** (`export_with_trees.sh`):
```bash
# Keep a copy of the arguments; the parsing loop below consumes "$@"
ALL_ARGS="$*"
# Add option parsing
while [[ $# -gt 0 ]]; do
  case $1 in
    --my-option)
      MY_OPTION="$2"
      shift 2
      ;;
    *)
      shift
      ;;
  esac
done
# Pass to Docker
docker_cmd="... python3 outline_export_fixed.py $ALL_ARGS"
```
**Python script** (`outline_export_fixed.py`):
```python
# Add argument parser
parser.add_argument('--my-option', help='Description')
```
#### Adding New Export Formats
1. Create format converter function in `outline_export_fixed.py`
2. Add format option to CLI
3. Modify `write_document_to_file()` to call converter
4. Update metadata to track format
#### Custom Filtering
Add filter configuration to `settings.json`:
```json
{
  "export": {
    "filters": {
      "exclude_tags": ["draft", "private"],
      "include_collections": ["Public", "Docs"]
    }
  }
}
```
Then implement in `OutlineExporter.should_export_document()`.
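One possible shape for that method, written here as a standalone function so it can be tried in isolation. It assumes each document dict carries a `tags` list, which is a hypothetical field for this sketch and may not match what the Outline API returns.

```python
def should_export_document(document, collection_name, filters):
    """Apply the include_collections and exclude_tags filters shown above."""
    include = filters.get("include_collections")
    if include and collection_name not in include:
        return False
    excluded = set(filters.get("exclude_tags", []))
    tags = set(document.get("tags", []))  # assumed field, see note above
    # Export only when the document has none of the excluded tags.
    return not (excluded & tags)
```

An empty or missing `filters` section exports everything, which keeps the default behavior unchanged.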
### 10. Error Recovery
#### Partial Export Recovery
If export crashes mid-run:
1. The previous export is already backed up (if one existed)
2. Partial export in `outline_export/` may be incomplete
3. Safe to re-run - will overwrite partial data
4. Check `manifest.json` to see what completed
#### Failed Import Recovery
If import fails partway:
1. Successfully created collections remain in Outline
2. Use `--force` to delete and retry, OR
3. Manually delete collections from Outline UI
4. Check logs for document ID where failure occurred
### 11. Performance Optimization
#### Reducing API Calls
- **Caching**: Document content cached during single run
- **Batching**: Not currently implemented (future enhancement)
- **Parallelization**: Not safe due to Outline API rate limits
#### Faster Exports
- Skip verification: `--skip-verify`
- Skip health check: `--skip-health-check` (risky)
- Reduce hierarchy depth: Adjust `max_hierarchy_depth` in settings
#### Faster Imports
- Single collection mode: `--single` (fewer collection creates)
- Disable verbose logging (default)
### 12. Security Considerations
#### Secrets Management
- `settings.json` contains API token
- **Never log** the token value
- **Never commit** `settings.json` to git
- Backups may contain sensitive content
#### Safe Practices
```bash
# Check git status before committing
git status
# Verify settings.json is ignored
grep settings.json .gitignore
# Sanitize logs before sharing
sed 's/Bearer [A-Za-z0-9_-]*/Bearer [REDACTED]/g' logs.txt
```
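The same redaction can be applied inside Python before anything reaches a log file. This is a sketch: `redact` and `RedactTokenFilter` are hypothetical names, and the pattern mirrors the `sed` expression above.

```python
import logging
import re

_BEARER = re.compile(r"Bearer [A-Za-z0-9_-]+")


def redact(text):
    """Replace any bearer token in text with a fixed placeholder."""
    return _BEARER.sub("Bearer [REDACTED]", text)


class RedactTokenFilter(logging.Filter):
    """Logging filter that scrubs bearer tokens from formatted messages."""

    def filter(self, record):
        # Format first so tokens passed as %-style arguments are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter with `handler.addFilter(RedactTokenFilter())` would cover `-vv` debug output that prints request headers.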
### 13. Common Agent Mistakes to Avoid
1. **Don't suggest running Python directly** - Always use Docker wrappers
2. **Don't hardcode the API URL** - It's environment-specific (use settings.json)
3. **Don't assume external API access** - Only works inside `domnet`
4. **Don't ignore dry-run mode** - Always test changes with `--dry-run` first
5. **Don't modify hierarchy logic lightly** - Parent-child relationships are fragile
6. **Don't skip error handling** - API can fail intermittently
7. **Don't forget to update both export and import** - Changes often affect both sides
### 14. Useful Code Patterns
#### Making Authenticated API Calls
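```python
import requests

# Inside a client class that holds self.api_url and self.api_token
headers = {
    "Authorization": f"Bearer {self.api_token}",
    "Content-Type": "application/json",
}
response = requests.post(
    f"{self.api_url}/api/endpoint",
    json=payload,
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()
```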
#### Recursive Tree Traversal
```python
def process_tree(node, parent_id=None):
    # Handle the parent first so its ID is available to the children
    doc_id = node["id"]
    process_document(doc_id, parent_id)
    for child in node.get("children", []):
        process_tree(child, doc_id)
```
#### Progress Display with tqdm
```python
from tqdm import tqdm

with tqdm(total=total_docs, desc="Exporting") as pbar:
    for doc in documents:
        process(doc)
        pbar.update(1)
```
### 15. When to Ask for Clarification
Ask the user if:
- They want to modify API authentication method
- They need to export to a different Outline instance
- They want to filter by specific criteria not in settings
- They experience persistent API errors (might be Outline-specific issue)
- They need to handle very large wikis (>10,000 documents)
- They want to schedule automated backups (needs cron/systemd setup)
### 16. Recommended Improvements (Future)
Ideas for enhancing the tool:
- **Incremental exports**: Only export changed documents
- **Parallel imports**: Speed up large imports (carefully!)
- **Format converters**: Export to Notion, Confluence, etc.
- **Diff tool**: Compare exported versions
- **Search index**: Build searchable archive
- **Version history**: Track document changes over time
---
## Quick Decision Tree
```
User wants to modify the tool:
├─ Change export filtering? → Edit outline_export_fixed.py
├─ Change import behavior? → Edit outline_import.py
├─ Add CLI option? → Edit .sh wrapper + .py script
├─ Change output format? → Edit write_document_to_file()
├─ Fix API error? → Check retry logic and error handling
└─ Add new feature? → Review both export and import sides
User reports an error:
├─ Connection refused? → Check Docker network
├─ Auth error? → Verify API token in settings.json
├─ Hierarchy wrong? → Check id_mapping in import
├─ Missing documents? → Compare counts, check filters
└─ JSON error? → Validate metadata files
User wants to understand:
├─ How it works? → Refer to CLAUDE.md
├─ How to use? → Show CLI examples
├─ How to extend? → Point to sections 9-10 above
└─ How to troubleshoot? → Use section 8 checklist
```
---
## Additional Resources
- **Outline API Docs**: https://www.getoutline.com/developers
- **Python requests**: https://requests.readthedocs.io/
- **Docker networks**: https://docs.docker.com/network/
- **tqdm progress bars**: https://tqdm.github.io/
## Agent Self-Check
Before suggesting changes:
- [ ] Have I read the architecture section?
- [ ] Do I understand the Docker network requirement?
- [ ] Have I considered both export and import sides?
- [ ] Will my change maintain hierarchy integrity?
- [ ] Have I suggested testing with --dry-run?
- [ ] Have I checked for security implications?
- [ ] Is my suggestion compatible with Docker execution?