Add sync engine, web UI, Docker setup, and tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
536
AGENTS.md
Normal file
@@ -0,0 +1,536 @@
# AGENTS.md

This file provides guidance to AI agents (Claude, GPT-4, etc.) when working with this Outline export/import tool repository.

> **For all technical details, architecture info, common pitfalls, and code patterns, see this file.**

## Quick Reference

**Primary Language**: Python 3.11 + Bash
**Key Dependencies**: `requests`, `tqdm`
**Runtime**: Docker containers on `domnet` network
**API Base**: `http://outline:3000` (internal, bypasses SSO)

**Key Features**:
- Export all collections with full document hierarchy
- Import back to Outline preserving structure
- Automatic backups with 90%+ compression
- Dry-run mode for safe testing
- Retry logic for API reliability

## Usage

### Export (Backup)

```bash
# Run the export with tree visualization
./export_with_trees.sh

# Preview without exporting (dry run)
./export_with_trees.sh --dry-run

# Run with verbose output
./export_with_trees.sh -v
```

**Export CLI Options**:
```
--dry-run, -n          Preview what would be exported without writing files
--output, -o DIR       Output directory (overrides settings.json)
--verbose, -v          Increase verbosity (-vv for debug)
--skip-verify          Skip post-export verification
--skip-health-check    Skip pre-export health check
--settings FILE        Path to settings file (default: settings.json)
```

### Import (Restore)

```bash
# Import all collections from outline_export/
./import_to_outline.sh

# Preview what would be imported (no changes made)
./import_to_outline.sh --dry-run

# Import into a single timestamped collection
./import_to_outline.sh --single

# Import from a different directory
./import_to_outline.sh -d exports/

# Overwrite existing collections
./import_to_outline.sh --force
```

**Import CLI Options**:
```
-s, --single           Import all into single timestamped collection
-n, --dry-run          Preview operations without making changes
-d, --source DIR       Source directory (default: outline_export)
-v, --verbose          Increase verbosity (-vv for debug)
-f, --force            Overwrite existing collections (instead of skip)
--settings FILE        Path to settings file (default: settings.json)
-h, --help             Show help message
```

### Running Python Scripts Directly

If you need to run the Python scripts directly (e.g., for debugging):

```bash
# Export
docker run --rm --network domnet \
    -v "$(pwd):/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"

# Import
docker run --rm --network domnet \
    -v "$(pwd):/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -q requests tqdm && python3 outline_import.py --dry-run"
```

**Note**: The shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`) provide better UX with tree visualization and colored output.

## Agent Operating Guidelines

### 1. Configuration

Settings are in `settings.json`:

```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "your-api-token-here"
  },
  "export": {
    "output_directory": "outline_export"
  },
  "advanced": {
    "max_hierarchy_depth": 100
  }
}
```

**Important**: `settings.json` contains secrets (API token) and should **never be committed to git**.
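
As a rough illustration of how a script might read this file, the sketch below loads `settings.json`, checks the two `source` keys, and falls back to the defaults shown above. The function name `load_settings` and the error message are assumptions made for this example, not part of the tool.

```python
import json
from pathlib import Path


def load_settings(path="settings.json"):
    """Load settings and check the keys shown in the example above."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))

    # The API URL and token are required for every operation.
    source = settings.get("source", {})
    if not source.get("url") or not source.get("token"):
        raise ValueError("settings.json must define source.url and source.token")

    # Fall back to the documented defaults when optional keys are missing.
    settings.setdefault("export", {}).setdefault("output_directory", "outline_export")
    settings.setdefault("advanced", {}).setdefault("max_hierarchy_depth", 100)
    return settings
```

Failing early on a missing token keeps the error close to its cause instead of surfacing later as a 401 from the API.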

### 2. Architecture Understanding

This tool operates in a **Docker-isolated environment** to bypass Authentik SSO:
- All Python scripts run inside ephemeral Docker containers
- Network: `domnet` bridge allows direct access to Outline's internal API
- No persistent container state - dependencies installed on each run

**Critical Context**:
- The `http://outline:3000` URL only works inside the Docker network
- External access would require SSO authentication through Authentik
- This design is intentional for automated backup/restore operations

#### Export Flow

1. **Health Check**: Verify API connectivity
2. **Fetch Collections**: Via `/api/collections.list`
3. **Build Tree**: Get navigation tree via `/api/collections.documents` (source of truth for hierarchy)
4. **Fetch Content**: Full document content via `/api/documents.info` (with caching)
5. **Export Recursively**: Maintain parent-child structure
6. **Save Metadata**: `_collection_metadata.json` per collection
7. **Create Backup**: Archive previous export to `outline_backup_*.tar.gz`
8. **Verify**: Generate manifest with checksums
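
The verification step can be pictured with a minimal sketch like the one below. It assumes SHA-256 checksums and a flat `{relative_path: checksum}` layout in `manifest.json`; the real manifest format is not described here, and `build_manifest` and `write_manifest` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(export_dir):
    """Map each exported file (relative path) to its SHA-256 checksum."""
    root = Path(export_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path.name != "manifest.json":
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def write_manifest(export_dir):
    """Write the manifest next to the exported collections and return it."""
    manifest = build_manifest(export_dir)
    target = Path(export_dir) / "manifest.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Rebuilding the manifest later and comparing it with the stored one is one way to detect files that changed or went missing after an export.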

#### Import Flow

1. **Health Check**: Verify API connectivity
2. **Load Metadata**: Read `_collection_metadata.json` from each collection directory
3. **Build Tree**: Reconstruct document hierarchy from metadata
4. **Create Collections**: Via `/api/collections.create`
5. **Create Documents**: Via `/api/documents.create` with proper `parentDocumentId`
6. **Map IDs**: Track old IDs → new IDs to maintain hierarchy
7. **Display Progress**: Tree-style output with status indicators
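
The ID mapping in steps 5 and 6 can be sketched as follows. This is an illustration only: `import_tree` and the injected `create_document(title, parent_id)` callable are hypothetical, and the `title` key on each node is an assumption about the metadata layout.

```python
def import_tree(nodes, create_document, id_mapping=None, old_parent_id=None):
    """Create documents parent-first, translating old parent IDs to new ones.

    `create_document(title, parent_id)` is a hypothetical callable that
    performs the API call and returns the ID of the newly created document.
    """
    if id_mapping is None:
        id_mapping = {}
    for node in nodes:
        # A parent is created before its children, so its new ID is known here.
        # An unknown parent maps to None, i.e. a root-level document.
        new_parent_id = id_mapping.get(old_parent_id)
        id_mapping[node["id"]] = create_document(node["title"], new_parent_id)
        import_tree(node.get("children", []), create_document, id_mapping, node["id"])
    return id_mapping
```

Passing the API call in as a parameter keeps the traversal testable without a running Outline instance.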

#### Core Component Pipelines

**Export Pipeline:**
```
export_with_trees.sh → Docker container → outline_export_fixed.py
        ↓
Fetches collections → Builds document tree → Exports markdown + metadata
        ↓
Creates backup → Verifies integrity → Displays summary
```

**Import Pipeline:**
```
import_to_outline.sh → Docker container → outline_import.py
        ↓
Reads metadata → Validates structure → Creates collections
        ↓
Uploads documents → Maintains hierarchy → Reports status
```

### 3. Import Modes

#### Collection per Folder (Default)

Each subdirectory becomes a separate collection:

```
outline_export/
├── Bewerbungen/     → Creates "Bewerbungen" collection
├── Projekte/        → Creates "Projekte" collection
└── Privat/          → Creates "Privat" collection
```

#### Single Collection (`--single`)

All content goes into one timestamped collection:

```
outline_export/
├── Bewerbungen/     → Becomes parent doc "Bewerbungen"
├── Projekte/        → Becomes parent doc "Projekte"
└── Privat/          → Becomes parent doc "Privat"

All imported into: "import_20260119_143052" collection
```
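
The collection name in the `--single` example looks like `import_` followed by a `YYYYMMDD_HHMMSS` timestamp. Assuming that pattern, a name could be built as in this small sketch (`single_collection_name` is a hypothetical helper):

```python
from datetime import datetime


def single_collection_name(now=None):
    """Build a name such as "import_20260119_143052" from a timestamp."""
    now = now or datetime.now()
    return now.strftime("import_%Y%m%d_%H%M%S")
```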

### 4. Behavior & Duplicate Handling

#### Duplicate Handling

| Scenario | Default Behavior | With `--force` |
|----------|------------------|----------------|
| Collection exists | Skip entire collection | Delete and recreate |
| Document exists | Skip document | Update document |
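
The table can be read as a small decision function. The sketch below is one way to express it; `resolve_duplicate` and the action names it returns are invented for this example.

```python
def resolve_duplicate(kind, exists, force=False):
    """Return the action from the table above for a collection or document."""
    if kind not in ("collection", "document"):
        raise ValueError(f"unknown kind: {kind}")
    if not exists:
        return "create"
    if not force:
        return "skip"
    # With --force, collections are replaced wholesale, documents are updated.
    return "delete_and_recreate" if kind == "collection" else "update"
```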

#### Error Handling

**Import Errors**:
- **API connection failure**: Abort with error message
- **Collection creation fails**: Abort that collection, continue others
- **Document creation fails**: Log error, continue with siblings
- **Missing markdown file**: Log warning, skip document
- **Parent not found**: Create as root-level document

**Export Errors**:
- **API connection failure**: Abort before starting
- **Collection fetch fails**: Skip that collection, continue
- **Document fetch fails**: Retry 3x with backoff, then skip
- **Disk write fails**: Abort with error message

#### Rate Limiting

If Outline API returns 429 errors:
- Automatic retry with exponential backoff
- Up to 3 retry attempts per request
- Configurable delay between retries
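
A minimal sketch of this retry behavior is shown below. It is not the tool's implementation: `RateLimited`, `call_with_backoff`, and the doubling schedule starting at `base_delay` are assumptions chosen to illustrate "up to 3 retries with exponential backoff".

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API answers HTTP 429."""


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimited wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

With the defaults, the waits between attempts are 1, 2, and 4 seconds. The `sleep` parameter exists so tests can record delays instead of actually waiting.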

#### Important Features & Behaviors

**Backup System**:
- Each export automatically backs up previous exports to `outline_backup_YYYYMMDD_HHMMSS.tar.gz`
- Old uncompressed export directory is deleted after backup
- Backups achieve **90%+ compression** on markdown content
- Safe to re-run exports - previous data is always preserved
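
The backup behavior described above could look roughly like this sketch, which uses the standard-library `tarfile` module. The function name `backup_previous_export` and its parameters are hypothetical; only the archive naming pattern comes from this document.

```python
import shutil
import tarfile
from datetime import datetime
from pathlib import Path


def backup_previous_export(export_dir, backup_root=".", now=None):
    """Archive export_dir to outline_backup_<timestamp>.tar.gz, then remove it."""
    export_path = Path(export_dir)
    if not export_path.is_dir():
        return None  # nothing to back up on the first run

    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive = Path(backup_root) / f"outline_backup_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(export_path, arcname=export_path.name)

    # Delete the uncompressed directory only after the archive was written.
    shutil.rmtree(export_path)
    return archive
```

Removing the directory only after the `with` block closes the archive means a failed write leaves the previous export untouched.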

**Reliability Features**:
- **Health check**: Verifies API connectivity before operations
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Caching**: Document content cached during single run to reduce API calls
- **Logging**: Structured logging with configurable verbosity levels (-v, -vv)

**Hierarchy Integrity**:
- The navigation tree (`/api/collections.documents`) is the **source of truth** for document hierarchy
- Import maintains parent-child relationships via `parentDocumentId` mapping
- Document counting is recursive to include all nested children
- Maximum depth limit (default: 100) prevents infinite recursion
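
Recursive counting with a depth guard might be sketched as below. How the tool reacts when the limit is reached is not stated here, so raising an error is an assumption, as is the name `count_documents`.

```python
def count_documents(nodes, max_depth=100, depth=1):
    """Count all nodes in a navigation tree, including nested children.

    Top-level documents are at depth 1. Raising when the limit is exceeded
    is an assumption made for this sketch.
    """
    if nodes and depth > max_depth:
        raise RecursionError(f"hierarchy deeper than {max_depth} levels")
    return sum(
        1 + count_documents(node.get("children", []), max_depth, depth + 1)
        for node in nodes
    )
```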

### 5. File Structure Knowledge

```
outline-tools/
├── export_with_trees.sh       # Main export entrypoint
```

#### Dry Run Testing
```bash
# Test export without writing files
./export_with_trees.sh --dry-run

# Test import without creating collections
./import_to_outline.sh --dry-run
```

#### Verification Checklist
- [ ] Health check passes before export/import
- [ ] Document count matches (compare tree output)
- [ ] Hierarchy preserved (check parent-child relationships)
- [ ] Metadata files valid JSON
- [ ] No API errors in logs
- [ ] Backup created successfully (export only)

### 8. Troubleshooting & Debug Mode

#### Common Issues

**"Connection refused" or "Name resolution failed"**
- **Cause**: Not running inside `domnet` Docker network
- **Solution**: Always use shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`)

**"Authorization failed" or 401/403 errors**
- **Cause**: Invalid or expired API token
- **Solution**: Update token in `settings.json`

**Documents appear at wrong hierarchy level after import**
- **Cause**: Metadata corruption or `parentDocumentId` mapping issue
- **Solution**: Re-export, verify `_collection_metadata.json` integrity, check `id_mapping` dictionary

**Import creates duplicate collections**
- **Cause**: Collection names differ (case, spaces, special chars)
- **Solution**: Use `--force` to replace, or manually delete old collections

**API returns 429 errors**
- **Cause**: Rate limiting from too many API requests
- **Solution**: Built-in retry logic handles this - increase `RETRY_DELAY` if persistent

#### Debug Mode

Run with `-vv` for detailed debug output:

```bash
./export_with_trees.sh -vv
./import_to_outline.sh -vv
```

This shows:
- Full API requests and responses
- Document ID mappings
- File operations
- Retry attempts

#### Quick Diagnostics

```bash
# Test API connectivity (run from inside the domnet network; Outline API methods are POST endpoints)
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    http://outline:3000/api/collections.list

# Check Docker network
docker network inspect domnet

# Run with verbose logging
./export_with_trees.sh -vv
```

### 9. Extending the Tool

#### Adding New CLI Options

**Bash wrapper** (`export_with_trees.sh`):
```bash
# Keep the original arguments so they can be forwarded to Python
ORIGINAL_ARGS="$*"

# Add option parsing
while [[ $# -gt 0 ]]; do
    case $1 in
        --my-option)
            MY_OPTION="$2"
            shift 2
            ;;
        *) shift ;;
    esac
done

# Pass to Docker
docker_cmd="... python3 outline_export_fixed.py $ORIGINAL_ARGS"
```

**Python script** (`outline_export_fixed.py`):
```python
# Add argument parser
parser.add_argument('--my-option', help='Description')
```

#### Adding New Export Formats

1. Create format converter function in `outline_export_fixed.py`
2. Add format option to CLI
3. Modify `write_document_to_file()` to call converter
4. Update metadata to track format

#### Custom Filtering

Add filter configuration to `settings.json`:
```json
{
  "export": {
    "filters": {
      "exclude_tags": ["draft", "private"],
      "include_collections": ["Public", "Docs"]
    }
  }
}
```

Then implement in `OutlineExporter.should_export_document()`.
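
One possible shape for that check is sketched below as a standalone function rather than a method. It assumes each document dict carries a `tags` list and that the collection name is passed in separately; neither assumption is confirmed by this document.

```python
def should_export_document(doc, collection_name, filters):
    """Apply the `filters` block from settings.json to one document."""
    # An empty or missing include list means "export every collection".
    include = filters.get("include_collections")
    if include and collection_name not in include:
        return False

    # Assumption: each document dict carries a "tags" list of strings.
    excluded = set(filters.get("exclude_tags", []))
    return excluded.isdisjoint(doc.get("tags", []))
```

For example, with the configuration above a document tagged `draft` in the `Docs` collection would be skipped, while an untagged document in the same collection would be exported.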

### 10. Error Recovery

#### Partial Export Recovery
If export crashes mid-run:
1. Previous export is already backed up (if existed)
2. Partial export in `outline_export/` may be incomplete
3. Safe to re-run - will overwrite partial data
4. Check `manifest.json` to see what completed

#### Failed Import Recovery
If import fails partway:
1. Successfully created collections remain in Outline
2. Use `--force` to delete and retry, OR
3. Manually delete collections from Outline UI
4. Check logs for document ID where failure occurred

### 11. Performance Optimization

#### Reducing API Calls
- **Caching**: Document content cached during single run
- **Batching**: Not currently implemented (future enhancement)
- **Parallelization**: Not safe due to Outline API rate limits

#### Faster Exports
- Skip verification: `--skip-verify`
- Skip health check: `--skip-health-check` (risky)
- Reduce hierarchy depth: Adjust `max_hierarchy_depth` in settings

#### Faster Imports
- Single collection mode: `--single` (fewer collection creates)
- Disable verbose logging (default)

### 12. Security Considerations

#### Secrets Management
- `settings.json` contains API token
- **Never log** the token value
- **Never commit** `settings.json` to git
- Backups may contain sensitive content

#### Safe Practices
```bash
# Check git status before committing
git status

# Verify settings.json is ignored
grep settings.json .gitignore

# Sanitize logs before sharing
sed 's/Bearer [A-Za-z0-9_-]*/Bearer [REDACTED]/g' logs.txt
```
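
The same redaction can be applied from Python before a log line is written or shared. This sketch mirrors the `sed` pattern above but requires at least one token character, so already-redacted text is left alone; `redact_tokens` is a hypothetical helper name.

```python
import re

# Same character class as the sed example, requiring at least one character.
_BEARER = re.compile(r"Bearer [A-Za-z0-9_-]+")


def redact_tokens(text):
    """Replace any bearer token in text with a fixed placeholder."""
    return _BEARER.sub("Bearer [REDACTED]", text)
```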

### 13. Common Agent Mistakes to Avoid

1. **Don't suggest running Python directly** - Always use Docker wrappers
2. **Don't hardcode the API URL** - It's environment-specific (use settings.json)
3. **Don't assume external API access** - Only works inside `domnet`
4. **Don't ignore dry-run mode** - Always test changes with `--dry-run` first
5. **Don't modify hierarchy logic lightly** - Parent-child relationships are fragile
6. **Don't skip error handling** - API can fail intermittently
7. **Don't forget to update both export and import** - Changes often affect both sides

### 14. Useful Code Patterns

#### Making Authenticated API Calls
```python
import requests

# Inside an exporter/importer method; `payload` is the request body dict
headers = {
    "Authorization": f"Bearer {self.api_token}",
    "Content-Type": "application/json"
}
response = requests.post(
    f"{self.api_url}/api/endpoint",
    json=payload,
    headers=headers,
    timeout=30
)
response.raise_for_status()
data = response.json()
```

#### Recursive Tree Traversal
```python
def process_tree(node, parent_id=None):
    doc_id = node["id"]
    process_document(doc_id, parent_id)

    # Each child is processed with the current document as its parent
    for child in node.get("children", []):
        process_tree(child, doc_id)
```

#### Progress Display with tqdm
```python
from tqdm import tqdm

with tqdm(total=total_docs, desc="Exporting") as pbar:
    for doc in documents:
        process(doc)
        pbar.update(1)
```

### 15. When to Ask for Clarification

Ask the user if:
- They want to modify API authentication method
- They need to export to a different Outline instance
- They want to filter by specific criteria not in settings
- They experience persistent API errors (might be Outline-specific issue)
- They need to handle very large wikis (>10,000 documents)
- They want to schedule automated backups (needs cron/systemd setup)

### 16. Recommended Improvements (Future)

Ideas for enhancing the tool:
- **Incremental exports**: Only export changed documents
- **Parallel imports**: Speed up large imports (carefully!)
- **Format converters**: Export to Notion, Confluence, etc.
- **Diff tool**: Compare exported versions
- **Search index**: Build searchable archive
- **Version history**: Track document changes over time

---

## Quick Decision Tree

```
User wants to modify the tool:
├─ Change export filtering?  → Edit outline_export_fixed.py
├─ Change import behavior?   → Edit outline_import.py
├─ Add CLI option?           → Edit .sh wrapper + .py script
├─ Change output format?     → Edit write_document_to_file()
├─ Fix API error?            → Check retry logic and error handling
└─ Add new feature?          → Review both export and import sides

User reports an error:
├─ Connection refused?       → Check Docker network
├─ Auth error?               → Verify API token in settings.json
├─ Hierarchy wrong?          → Check id_mapping in import
├─ Missing documents?        → Compare counts, check filters
└─ JSON error?               → Validate metadata files

User wants to understand:
├─ How it works?             → Refer to CLAUDE.md
├─ How to use?               → Show CLI examples
├─ How to extend?            → Point to sections 9-10 above
└─ How to troubleshoot?      → Use section 8 checklist
```

---

## Additional Resources

- **Outline API Docs**: https://www.getoutline.com/developers
- **Python requests**: https://requests.readthedocs.io/
- **Docker networks**: https://docs.docker.com/network/
- **tqdm progress bars**: https://tqdm.github.io/

## Agent Self-Check

Before suggesting changes:
- [ ] Have I read the architecture section?
- [ ] Do I understand the Docker network requirement?
- [ ] Have I considered both export and import sides?
- [ ] Will my change maintain hierarchy integrity?
- [ ] Have I suggested testing with --dry-run?
- [ ] Have I checked for security implications?
- [ ] Is my suggestion compatible with Docker execution?