# AGENTS.md

This file provides guidance to AI agents (Claude, GPT-4, etc.) when working with this Outline export/import tool repository.

> **This file is the reference for all technical details, architecture information, common pitfalls, and code patterns.**

## Quick Reference

**Primary Language**: Python 3.11 + Bash
**Key Dependencies**: `requests`, `tqdm`
**Runtime**: Docker containers on `domnet` network
**API Base**: `http://outline:3000` (internal, bypasses SSO)

**Key Features**:
- Export all collections with full document hierarchy
- Import back to Outline preserving structure
- Automatic backups with 90%+ compression
- Dry-run mode for safe testing
- Retry logic for API reliability

## Usage

### Export (Backup)

```bash
# Run the export with tree visualization
./export_with_trees.sh

# Preview without exporting (dry run)
./export_with_trees.sh --dry-run

# Run with verbose output
./export_with_trees.sh -v
```

**Export CLI Options**:
```
--dry-run, -n          Preview what would be exported without writing files
--output, -o DIR       Output directory (overrides settings.json)
--verbose, -v          Increase verbosity (-vv for debug)
--skip-verify          Skip post-export verification
--skip-health-check    Skip pre-export health check
--settings FILE        Path to settings file (default: settings.json)
```

### Import (Restore)

```bash
# Import all collections from outline_export/
./import_to_outline.sh

# Preview what would be imported (no changes made)
./import_to_outline.sh --dry-run

# Import into a single timestamped collection
./import_to_outline.sh --single

# Import from a different directory
./import_to_outline.sh -d exports/

# Overwrite existing collections
./import_to_outline.sh --force
```

**Import CLI Options**:
```
-s, --single          Import all into single timestamped collection
-n, --dry-run         Preview operations without making changes
-d, --source DIR      Source directory (default: outline_export)
-v, --verbose         Increase verbosity (-vv for debug)
-f, --force           Overwrite existing collections (instead of skip)
--settings FILE       Path to settings file (default: settings.json)
-h, --help            Show help message
```

### Running Python Scripts Directly

If you need to bypass the shell wrappers (e.g., for debugging), run the Python scripts in a throwaway container on the `domnet` network:

```bash
# Export
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"

# Import
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_import.py --dry-run"
```

**Note**: The shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`) provide better UX with tree visualization and colored output.

## Agent Operating Guidelines

### 1. Configuration

Settings are in `settings.json`:

```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "your-api-token-here"
  },
  "export": {
    "output_directory": "outline_export"
  },
  "advanced": {
    "max_hierarchy_depth": 100
  }
}
```

**Important**: `settings.json` contains secrets (API token) and should **never be committed to git**.

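A minimal sketch of how a script could load this file, assuming only the keys shown above. `DEFAULTS`, `load_settings()` and its `output_override` parameter are hypothetical names for illustration, not the tool's actual code; the override mimics how `--output DIR` is described as taking precedence over `settings.json`.

```python
import json
from pathlib import Path

# Hypothetical defaults mirroring the keys shown in settings.json above.
DEFAULTS = {
    "source": {"url": "http://outline:3000", "token": ""},
    "export": {"output_directory": "outline_export"},
    "advanced": {"max_hierarchy_depth": 100},
}


def load_settings(path="settings.json", output_override=None):
    """Load settings.json, fill missing keys from DEFAULTS, apply CLI overrides."""
    with Path(path).open(encoding="utf-8") as fh:
        user = json.load(fh)

    # Merge per section into fresh dicts, so DEFAULTS itself is never mutated.
    settings = {
        section: {**values, **user.get(section, {})}
        for section, values in DEFAULTS.items()
    }

    # A CLI value such as --output DIR wins over the file.
    if output_override:
        settings["export"]["output_directory"] = output_override

    # Fail fast instead of sending unauthenticated requests later.
    if not settings["source"]["token"]:
        raise ValueError("source.token is missing from settings")
    return settings
```

Merging per section keeps a partial `settings.json` usable, and failing fast on a missing token surfaces the 401/403 case from Troubleshooting before any API call is made.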
### 2. Architecture Understanding

This tool operates in a **Docker-isolated environment** to bypass Authentik SSO:
- All Python scripts run inside ephemeral Docker containers
- Network: the `domnet` bridge network allows direct access to Outline's internal API
- No persistent container state: dependencies are installed on each run

**Critical Context**:
- The `http://outline:3000` URL only works inside the Docker network
- External access would require SSO authentication through Authentik
- This design is intentional for automated backup/restore operations

#### Export Flow

1. **Health Check**: Verify API connectivity
2. **Fetch Collections**: Via `/api/collections.list`
3. **Build Tree**: Get navigation tree via `/api/collections.documents` (source of truth for hierarchy)
4. **Fetch Content**: Full document content via `/api/documents.info` (with caching)
5. **Export Recursively**: Maintain parent-child structure
6. **Save Metadata**: `_collection_metadata.json` per collection
7. **Create Backup**: Archive previous export to `outline_backup_*.tar.gz`
8. **Verify**: Generate manifest with checksums

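Step 8 can be pictured with the sketch below, which records a SHA-256 checksum for every exported file and later reports files that changed or went missing. The manifest layout (`file_count`, `files`) and the helper names are assumptions; the real `manifest.json` written by `outline_export_fixed.py` may use a different structure.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so large exports are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(export_dir, manifest_name="manifest.json"):
    """Write {relative path: checksum} for every exported file (hypothetical layout)."""
    root = Path(export_dir)
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != manifest_name
    }
    manifest = {"file_count": len(files), "files": files}
    (root / manifest_name).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def verify_manifest(export_dir, manifest_name="manifest.json"):
    """Return relative paths that are missing or whose checksum no longer matches."""
    root = Path(export_dir)
    manifest = json.loads((root / manifest_name).read_text(encoding="utf-8"))
    return [
        rel
        for rel, expected in manifest["files"].items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]
```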
#### Import Flow

1. **Health Check**: Verify API connectivity
2. **Load Metadata**: Read `_collection_metadata.json` from each collection directory
3. **Build Tree**: Reconstruct document hierarchy from metadata
4. **Create Collections**: Via `/api/collections.create`
5. **Create Documents**: Via `/api/documents.create` with proper `parentDocumentId`
6. **Map IDs**: Track old IDs → new IDs to maintain hierarchy
7. **Display Progress**: Tree-style output with status indicators

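Steps 5 and 6 hinge on creating each parent before its children, so the parent's new ID is already known. A hypothetical sketch of that ordering: `import_documents()`, the `create_document` callback and the dict keys (`id`, `title`, `parentDocumentId`) are illustrative assumptions, and the fallback for a missing parent follows the "Parent not found" rule under Error Handling.

```python
def import_documents(docs, create_document):
    """Create documents parent-first and translate parent IDs from old to new.

    docs: dicts with hypothetical keys "id", "title", "parentDocumentId"
          (old IDs as read from _collection_metadata.json).
    create_document: callable(title, parent_id) -> new ID; in the real tool
          this would wrap POST /api/documents.create.
    """
    id_mapping = {}  # old document ID -> new document ID
    children = {}
    for doc in docs:
        children.setdefault(doc.get("parentDocumentId"), []).append(doc)

    known_ids = {doc["id"] for doc in docs}
    # Roots: no parent, or a parent that is not part of this export.
    roots = [d for d in docs if d.get("parentDocumentId") not in known_ids]

    stack = list(reversed(roots))  # reversed so roots are created in file order
    while stack:
        doc = stack.pop()
        # None (no mapping) means "create as a root-level document".
        new_parent = id_mapping.get(doc.get("parentDocumentId"))
        id_mapping[doc["id"]] = create_document(doc["title"], new_parent)
        # Children are queued only after their parent has a new ID.
        stack.extend(reversed(children.get(doc["id"], [])))
    return id_mapping
```

An explicit stack instead of recursion avoids Python's recursion limit on very deep trees; documents caught in a parent cycle are never reached, which a real implementation would want to log.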
#### Core Component Pipelines

**Export Pipeline:**
```
export_with_trees.sh → Docker container → outline_export_fixed.py
                              ↓
Fetches collections → Builds document tree → Exports markdown + metadata
                              ↓
Creates backup → Verifies integrity → Displays summary
```

**Import Pipeline:**
```
import_to_outline.sh → Docker container → outline_import.py
                              ↓
Reads metadata → Validates structure → Creates collections
                              ↓
Uploads documents → Maintains hierarchy → Reports status
```

### 3. Import Modes

#### Separate Collections (default)

Each subdirectory becomes a separate collection:

```
outline_export/
├── Bewerbungen/    → Creates "Bewerbungen" collection
├── Projekte/       → Creates "Projekte" collection
└── Privat/         → Creates "Privat" collection
```

#### Single Collection (`--single`)

All content goes into one timestamped collection:

```
outline_export/
├── Bewerbungen/    → Becomes parent doc "Bewerbungen"
├── Projekte/       → Becomes parent doc "Projekte"
└── Privat/         → Becomes parent doc "Privat"

All imported into: "import_20260119_143052" collection
```

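The two modes differ only in how top-level directories map to target collections. A sketch under assumptions: `plan_import()` is hypothetical, and the `%Y%m%d_%H%M%S` timestamp format is inferred from the example name `import_20260119_143052`.

```python
from datetime import datetime
from pathlib import Path


def plan_import(source_dir, single=False, now=None):
    """Return {target collection name: [top-level directory names]}.

    Default mode: one collection per subdirectory.
    --single mode: all subdirectories become parent documents inside one
    timestamped collection such as "import_20260119_143052".
    """
    # Only directories count; loose files like manifest.json are ignored.
    subdirs = sorted(p.name for p in Path(source_dir).iterdir() if p.is_dir())
    if single:
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        return {f"import_{stamp}": subdirs}
    return {name: [name] for name in subdirs}
```

Passing `now` explicitly keeps the timestamp deterministic in tests; the real import would go on to create the collections and documents rather than return a plan.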
### 4. Behavior & Duplicate Handling

#### Duplicate Handling

| Scenario | Default Behavior | With `--force` |
|----------|------------------|----------------|
| Collection exists | Skip entire collection | Delete and recreate |
| Document exists | Skip document | Update document |

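The table reduces to a small decision function. This is an illustrative sketch: the `Action` enum and `decide()` are hypothetical, and how the real import determines that a collection "exists" is an assumption here.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    SKIP = "skip"
    RECREATE = "delete and recreate"
    UPDATE = "update"


def decide(kind, exists, force=False):
    """Translate the duplicate-handling table into an action.

    kind: "collection" or "document" (hypothetical labels)
    exists: whether a target with the same name is already in Outline
    force: True when --force was passed
    """
    if not exists:
        return Action.CREATE
    if not force:
        # Default behavior: leave existing content untouched.
        return Action.SKIP
    # --force: collections are replaced wholesale, documents updated in place.
    return Action.RECREATE if kind == "collection" else Action.UPDATE
```

If a caller computed `exists` as `name in existing_names` (an exact, case-sensitive match), names differing only in case or spacing would count as new, which is consistent with the "Import creates duplicate collections" entry under Troubleshooting.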
#### Error Handling

**Import Errors**:
- **API connection failure**: Abort with error message
- **Collection creation fails**: Abort that collection, continue others
- **Document creation fails**: Log error, continue with siblings
- **Missing markdown file**: Log warning, skip document
- **Parent not found**: Create as root-level document

**Export Errors**:
- **API connection failure**: Abort before starting
- **Collection fetch fails**: Skip that collection, continue
- **Document fetch fails**: Retry 3x with backoff, then skip
- **Disk write fails**: Abort with error message

#### Rate Limiting

If the Outline API returns 429 errors:
- Automatic retry with exponential backoff
- Up to 3 retry attempts per request
- Configurable delay between retries

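A minimal retry loop matching this description, assuming that "up to 3 retry attempts" means three retries after the initial request and that 5xx responses are retried alongside 429. `request_with_retry()` is hypothetical; `retry_delay` stands in for the `RETRY_DELAY` setting mentioned under Troubleshooting, and injecting `sleep` keeps the backoff testable.

```python
import time


def request_with_retry(send, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429 or 5xx with exponential backoff.

    send: zero-argument callable returning an object with .status_code.
    Delays grow as retry_delay, 2 * retry_delay, 4 * retry_delay, ...
    """
    for attempt in range(max_retries + 1):
        response = send()
        retryable = response.status_code == 429 or response.status_code >= 500
        # Success, a non-retryable error, or retries exhausted: hand it back.
        if not retryable or attempt == max_retries:
            return response
        sleep(retry_delay * 2 ** attempt)
```

With `requests`, `send` could be `lambda: requests.post(url, json=payload, headers=headers, timeout=30)`. Connection errors raise `requests.RequestException` instead of returning a response, so they would need their own `try`/`except` around `send()`.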
#### Important Features & Behaviors

**Backup System**:
- Each export automatically backs up previous exports to `outline_backup_YYYYMMDD_HHMMSS.tar.gz`
- Old uncompressed export directory is deleted after backup
- Backups achieve **90%+ compression** on markdown content
- Safe to re-run exports - previous data is always preserved

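The backup step can be sketched with the standard library's `tarfile` and `shutil`. `backup_previous_export()` is a hypothetical helper, not the tool's implementation; it deletes the uncompressed directory only after the archive has been closed, so a failed write leaves the previous export untouched.

```python
import shutil
import tarfile
from datetime import datetime
from pathlib import Path


def backup_previous_export(export_dir="outline_export", now=None):
    """Archive an existing export to outline_backup_YYYYMMDD_HHMMSS.tar.gz, then delete it.

    Returns the archive path, or None when there is nothing to back up.
    """
    source = Path(export_dir)
    if not source.is_dir():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive = source.parent / f"outline_backup_{stamp}.tar.gz"
    # "w:gz" writes a gzip-compressed tarball; arcname keeps paths relative.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    # Reached only if archiving succeeded (an exception would skip this).
    shutil.rmtree(source)
    return archive
```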
**Reliability Features**:
- **Health check**: Verifies API connectivity before operations
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Caching**: Document content is cached during a single run to reduce API calls
- **Logging**: Structured logging with configurable verbosity levels (`-v`, `-vv`)

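A per-run cache like the one described can be as small as a dictionary in front of the fetch call. `DocumentCache` and its `fetch` callback are hypothetical; in the real exporter `fetch` would presumably wrap `/api/documents.info` with the document ID.

```python
class DocumentCache:
    """Per-run cache so each document's content is fetched at most once.

    fetch: callable(doc_id) -> dict, e.g. a wrapper around /api/documents.info.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, doc_id):
        if doc_id in self._cache:
            self.hits += 1
        else:
            # First request for this ID: go to the API and remember the result.
            self.misses += 1
            self._cache[doc_id] = self._fetch(doc_id)
        return self._cache[doc_id]
```

`functools.lru_cache` on a fetch function would achieve the same; an explicit class makes the hit/miss counts easy to report at `-vv`.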
**Hierarchy Integrity**:
- The navigation tree (`/api/collections.documents`) is the **source of truth** for document hierarchy
- Import maintains parent-child relationships via `parentDocumentId` mapping
- Document counting is recursive to include all nested children
- Maximum depth limit (default: 100) prevents infinite recursion

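Recursive counting with a depth guard might look like this. It assumes each navigation node carries a `children` list, as in the tree from `/api/collections.documents`, and treats `max_depth` as the number of allowed levels; `count_documents()` is hypothetical, and raising (rather than silently truncating) is a design choice of this sketch.

```python
def count_documents(nodes, depth=0, max_depth=100):
    """Recursively count documents in a navigation tree.

    nodes: list of dicts, each with an optional "children" list.
    max_depth: mirrors advanced.max_hierarchy_depth (levels allowed).
    """
    if nodes and depth >= max_depth:
        raise RecursionError(f"hierarchy deeper than max_hierarchy_depth={max_depth}")
    # Each node counts once, plus everything nested beneath it.
    return sum(
        1 + count_documents(node.get("children", []), depth + 1, max_depth)
        for node in nodes
    )
```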
### 5. File Structure Knowledge

```
outline-tools/
├── export_with_trees.sh        # Main export entrypoint
```

#### Dry Run Testing

```bash
# Test export without writing files
./export_with_trees.sh --dry-run

# Test import without creating collections
./import_to_outline.sh --dry-run
```

#### Verification Checklist

- [ ] Health check passes before export/import
- [ ] Document count matches (compare tree output)
- [ ] Hierarchy preserved (check parent-child relationships)
- [ ] Metadata files valid JSON
- [ ] No API errors in logs
- [ ] Backup created successfully (export only)

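The "Metadata files valid JSON" item can be checked mechanically. This sketch assumes metadata lives at `<export_dir>/<collection>/_collection_metadata.json`, as the import flow describes; `find_invalid_metadata()` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_invalid_metadata(export_dir="outline_export"):
    """Return {path: error} for each _collection_metadata.json that fails to parse.

    An empty dict means every metadata file is valid JSON.
    """
    problems = {}
    for path in sorted(Path(export_dir).glob("*/_collection_metadata.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems[str(path)] = str(exc)
    return problems
```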
### 8. Troubleshooting & Debug Mode

#### Common Issues

**"Connection refused" or "Name resolution failed"**
- **Cause**: Not running inside `domnet` Docker network
- **Solution**: Always use shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`)

**"Authorization failed" or 401/403 errors**
- **Cause**: Invalid or expired API token
- **Solution**: Update token in `settings.json`

**Documents appear at wrong hierarchy level after import**
- **Cause**: Metadata corruption or `parentDocumentId` mapping issue
- **Solution**: Re-export, verify `_collection_metadata.json` integrity, check `id_mapping` dictionary

**Import creates duplicate collections**
- **Cause**: Collection names differ (case, spaces, special chars)
- **Solution**: Use `--force` to replace, or manually delete old collections

**API returns 429 errors**
- **Cause**: Rate limiting from too many API requests
- **Solution**: Built-in retry logic handles this - increase `RETRY_DELAY` if persistent

#### Debug Mode

Run with `-vv` for detailed debug output:

```bash
./export_with_trees.sh -vv
./import_to_outline.sh -vv
```

This shows:
- Full API requests and responses
- Document ID mappings
- File operations
- Retry attempts

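One common way to wire `-v`/`-vv` to log levels is `argparse`'s `count` action. The mapping below (no flag → WARNING, `-v` → INFO, `-vv` → DEBUG) is an assumption about the scripts' behavior, and `configure_logging()` is hypothetical.

```python
import argparse
import logging


def configure_logging(argv=None):
    """Map repeated -v flags to a logging level and configure the root logger."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Increase verbosity (-vv for debug)")
    # parse_known_args lets other options (e.g. --dry-run) pass through untouched.
    args, _unknown = parser.parse_known_args(argv)
    level = {0: logging.WARNING, 1: logging.INFO}.get(args.verbose, logging.DEBUG)
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return level
```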
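#### Quick Diagnostics

```bash
# Test API connectivity (Outline API endpoints expect POST; outline:3000 only
# resolves inside domnet, so run curl from a container on that network)
docker run --rm --network domnet curlimages/curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' \
  http://outline:3000/api/collections.list

# Check Docker network
docker network inspect domnet

# Run with verbose logging
./export_with_trees.sh -vv
```
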
### 9. Extending the Tool

#### Adding New CLI Options

**Bash wrapper** (`export_with_trees.sh`):
```bash
# Add option parsing
while [[ $# -gt 0 ]]; do
    case $1 in
        --my-option)
            MY_OPTION="$2"
            shift 2
            ;;
        # ... existing options ...
    esac
done

# Pass the parsed value through to the Python script inside Docker
docker_cmd="... python3 outline_export_fixed.py --my-option \"$MY_OPTION\""
```

**Python script** (`outline_export_fixed.py`):
```python
import argparse

parser = argparse.ArgumentParser()
# Add the new argument; argparse exposes it as args.my_option
parser.add_argument('--my-option', help='Description')
args = parser.parse_args()
```

#### Adding New Export Formats

1. Create a format converter function in `outline_export_fixed.py`
2. Add a format option to the CLI
3. Modify `write_document_to_file()` to call the converter
4. Update metadata to track the format

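Steps 1 and 3 suggest a lookup from format name to converter. A hypothetical sketch: `CONVERTERS`, `render_document()` and the `json` format are illustrative, and the document keys (`id`, `title`, `text`) are assumptions about what the exporter holds after `/api/documents.info`.

```python
import json


def to_markdown(doc):
    # Assumes doc["text"] already holds the Markdown body returned by the API.
    return doc["text"]


def to_json(doc):
    # ensure_ascii=False keeps umlauts and other non-ASCII text readable.
    return json.dumps(
        {"id": doc["id"], "title": doc["title"], "text": doc["text"]},
        ensure_ascii=False,
        indent=2,
    )


# Hypothetical registry: format option value -> (file extension, converter).
CONVERTERS = {
    "markdown": (".md", to_markdown),
    "json": (".json", to_json),
}


def render_document(doc, fmt="markdown"):
    """Return (extension, content) for the requested export format."""
    if fmt not in CONVERTERS:
        raise ValueError(f"unknown export format {fmt!r}; choose from {sorted(CONVERTERS)}")
    extension, convert = CONVERTERS[fmt]
    return extension, convert(doc)
```

`write_document_to_file()` could then call `render_document()` and use the returned extension when building the file name; recording the chosen format in `_collection_metadata.json` covers step 4.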
#### Custom Filtering

Add filter configuration to `settings.json`:
```json
{
  "export": {
    "filters": {
      "exclude_tags": ["draft", "private"],
      "include_collections": ["Public", "Docs"]
    }
  }
}
```

Then implement in `OutlineExporter.should_export_document()`.

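A possible shape for the filter, written as a free function so it runs standalone; as `OutlineExporter.should_export_document()` it would read `filters` from `self`. The `tags` field on documents is an assumption, as are the rules that an empty `include_collections` means "all collections" and that any excluded tag drops the document.

```python
def should_export_document(doc, collection_name, filters):
    """Return True if a document passes the (hypothetical) export filters.

    doc: dict with an optional "tags" list (assumed field).
    filters: the "export.filters" object from settings.json.
    """
    include = filters.get("include_collections")
    # A missing or empty include list means every collection is allowed.
    if include and collection_name not in include:
        return False
    excluded = set(filters.get("exclude_tags", []))
    # Drop the document if it carries any excluded tag.
    return not excluded.intersection(doc.get("tags", []))
```

Excluding a parent while exporting its children would break the hierarchy guarantees above, so it is worth deciding whether a filtered parent should also drop its subtree.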
### 10. Error Recovery

#### Partial Export Recovery
If the export crashes mid-run:
1. The previous export is already backed up (if one existed)
2. The partial export in `outline_export/` may be incomplete
3. It is safe to re-run; the partial data will be overwritten
4. Check `manifest.json` to see what completed

#### Failed Import Recovery
If the import fails partway:
1. Successfully created collections remain in Outline
2. Use `--force` to delete and retry, OR
3. Manually delete collections from the Outline UI
4. Check the logs for the document ID where the failure occurred

### 11. Performance Optimization

#### Reducing API Calls
- **Caching**: Document content cached during a single run
- **Batching**: Not currently implemented (future enhancement)
- **Parallelization**: Not safe due to Outline API rate limits

#### Faster Exports
- Skip verification: `--skip-verify`
- Skip health check: `--skip-health-check` (risky)
- Reduce hierarchy depth: Adjust `max_hierarchy_depth` in settings

#### Faster Imports
- Single collection mode: `--single` (fewer collection creates)
- Keep verbose logging disabled (the default)

### 12. Security Considerations

#### Secrets Management
- `settings.json` contains API token
- **Never log** the token value
- **Never commit** `settings.json` to git
- Backups may contain sensitive content

#### Safe Practices
```bash
# Check git status before committing
git status

# Verify settings.json is ignored
grep settings.json .gitignore

# Sanitize logs before sharing
sed 's/Bearer [A-Za-z0-9_-]*/Bearer [REDACTED]/g' logs.txt
```

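To enforce "Never log the token value" inside Python rather than cleaning logs afterwards, a `logging.Filter` attached to each handler can apply a substitution similar to the `sed` command. `RedactTokenFilter` is a hypothetical helper; it only catches tokens written as `Bearer <token>`, so logging the raw settings dict would still leak the token.

```python
import logging
import re

# Same character class as the sed command above.
BEARER_RE = re.compile(r"Bearer [A-Za-z0-9_-]+")


class RedactTokenFilter(logging.Filter):
    """Rewrite log records so bearer tokens never reach the output."""

    def filter(self, record):
        # Render %-style args into the message first, then redact the result.
        record.msg = BEARER_RE.sub("Bearer [REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the (now sanitized) record
```

Attach it with `handler.addFilter(RedactTokenFilter())`; filters on a logger do not apply to records propagated from child loggers, which is why the handler is the safer place.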
### 13. Common Agent Mistakes to Avoid

1. **Don't suggest running Python on the host** - Always run it inside Docker, preferably via the shell wrappers
2. **Don't hardcode the API URL** - It's environment-specific (use settings.json)
3. **Don't assume external API access** - Only works inside `domnet`
4. **Don't ignore dry-run mode** - Always test changes with `--dry-run` first
5. **Don't modify hierarchy logic lightly** - Parent-child relationships are fragile
6. **Don't skip error handling** - API can fail intermittently
7. **Don't forget to update both export and import** - Changes often affect both sides

### 14. Useful Code Patterns

#### Making Authenticated API Calls
```python
import requests


def api_post(api_url, api_token, endpoint, payload=None, timeout=30):
    """POST to an Outline API endpoint such as "collections.list"."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    response = requests.post(
        f"{api_url}/api/{endpoint}",
        json=payload or {},
        headers=headers,
        timeout=timeout,
    )
    response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
    return response.json()
```

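#### Recursive Tree Traversal
```python
def process_tree(node, parent_id=None):
    # Handle this node first, so it exists before any child refers to it
    doc_id = node["id"]
    process_document(doc_id, parent_id)  # process_document(): your per-document handler

    # Then recurse into the children, passing this node as their parent
    for child in node.get("children", []):
        process_tree(child, doc_id)
```
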
#### Progress Display with tqdm
```python
from tqdm import tqdm

with tqdm(total=total_docs, desc="Exporting") as pbar:
    for doc in documents:
        process(doc)
        pbar.update(1)
```

### 15. When to Ask for Clarification

Ask the user if:
- They want to modify API authentication method
- They need to export to a different Outline instance
- They want to filter by specific criteria not in settings
- They experience persistent API errors (might be Outline-specific issue)
- They need to handle very large wikis (>10,000 documents)
- They want to schedule automated backups (needs cron/systemd setup)

### 16. Recommended Improvements (Future)

Ideas for enhancing the tool:
- **Incremental exports**: Only export changed documents
- **Parallel imports**: Speed up large imports (carefully!)
- **Format converters**: Export to Notion, Confluence, etc.
- **Diff tool**: Compare exported versions
- **Search index**: Build searchable archive
- **Version history**: Track document changes over time

---

## Quick Decision Tree

```
User wants to modify the tool:
├─ Change export filtering? → Edit outline_export_fixed.py
├─ Change import behavior? → Edit outline_import.py
├─ Add CLI option? → Edit .sh wrapper + .py script
├─ Change output format? → Edit write_document_to_file()
├─ Fix API error? → Check retry logic and error handling
└─ Add new feature? → Review both export and import sides

User reports an error:
├─ Connection refused? → Check Docker network
├─ Auth error? → Verify API token in settings.json
├─ Hierarchy wrong? → Check id_mapping in import
├─ Missing documents? → Compare counts, check filters
└─ JSON error? → Validate metadata files

User wants to understand:
├─ How it works? → Refer to CLAUDE.md
├─ How to use? → Show CLI examples
├─ How to extend? → Point to sections 9-10 above
└─ How to troubleshoot? → Use section 8 checklist
```

---

## Additional Resources

- **Outline API Docs**: https://www.getoutline.com/developers
- **Python requests**: https://requests.readthedocs.io/
- **Docker networks**: https://docs.docker.com/network/
- **tqdm progress bars**: https://tqdm.github.io/

## Agent Self-Check

Before suggesting changes:
- [ ] Have I read the architecture section?
- [ ] Do I understand the Docker network requirement?
- [ ] Have I considered both export and import sides?
- [ ] Will my change maintain hierarchy integrity?
- [ ] Have I suggested testing with `--dry-run`?
- [ ] Have I checked for security implications?
- [ ] Is my suggestion compatible with Docker execution?