# AGENTS.md

This file provides guidance to AI agents (Claude, GPT-4, etc.) when working with this Outline export/import tool repository.

> **For all technical details, architecture info, common pitfalls, and code patterns, see this file.**

## Quick Reference

- **Primary Language**: Python 3.11 + Bash
- **Key Dependencies**: `requests`, `tqdm`
- **Runtime**: Docker containers on the `domnet` network
- **API Base**: `http://outline:3000` (internal, bypasses SSO)

**Key Features**:
- Export all collections with full document hierarchy
- Import back to Outline preserving structure
- Automatic backups with 90%+ compression
- Dry-run mode for safe testing
- Retry logic for API reliability

## Usage

### Export (Backup)

```bash
# Run the export with tree visualization
./export_with_trees.sh

# Preview without exporting (dry run)
./export_with_trees.sh --dry-run

# Run with verbose output
./export_with_trees.sh -v
```

**Export CLI Options**:
```
--dry-run, -n          Preview what would be exported without writing files
--output, -o DIR       Output directory (overrides settings.json)
--verbose, -v          Increase verbosity (-vv for debug)
--skip-verify          Skip post-export verification
--skip-health-check    Skip pre-export health check
--settings FILE        Path to settings file (default: settings.json)
```

### Import (Restore)

```bash
# Import all collections from outline_export/
./import_to_outline.sh

# Preview what would be imported (no changes made)
./import_to_outline.sh --dry-run

# Import into a single timestamped collection
./import_to_outline.sh --single

# Import from a different directory
./import_to_outline.sh -d exports/

# Overwrite existing collections
./import_to_outline.sh --force
```

**Import CLI Options**:
```
-s, --single           Import all into a single timestamped collection
-n, --dry-run          Preview operations without making changes
-d, --source DIR       Source directory (default: outline_export)
-v, --verbose          Increase verbosity (-vv for debug)
-f, --force            Overwrite existing collections (instead of skipping)
--settings FILE        Path to settings file (default: settings.json)
-h, --help             Show help message
```

### Running Python Scripts Directly

If you need to run the Python scripts without the shell wrappers (e.g., for debugging), start the container yourself:

```bash
# Export
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"

# Import
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_import.py --dry-run"
```

**Note**: The shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`) provide better UX with tree visualization and colored output.

## Agent Operating Guidelines

### 1. Configuration

Settings are in `settings.json`:

```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "your-api-token-here"
  },
  "export": {
    "output_directory": "outline_export"
  },
  "advanced": {
    "max_hierarchy_depth": 100
  }
}
```

**Important**: `settings.json` contains secrets (the API token) and must **never be committed to git**.

### 2. Architecture Understanding

This tool operates in a **Docker-isolated environment** to bypass Authentik SSO:

- All Python scripts run inside ephemeral Docker containers
- Network: the `domnet` bridge allows direct access to Outline's internal API
- No persistent container state: dependencies are installed on each run

**Critical Context**:
- The `http://outline:3000` URL only works inside the Docker network
- External access would require SSO authentication through Authentik
- This design is intentional for automated backup/restore operations

#### Export Flow

1. **Health Check**: Verify API connectivity
2. **Fetch Collections**: Via `/api/collections.list`
3. **Build Tree**: Get the navigation tree via `/api/collections.documents` (source of truth for hierarchy)
4. **Fetch Content**: Full document content via `/api/documents.info` (with caching)
5. **Export Recursively**: Maintain parent-child structure
6. **Save Metadata**: `_collection_metadata.json` per collection
7. **Create Backup**: Archive the previous export to `outline_backup_*.tar.gz`
8. **Verify**: Generate a manifest with checksums

#### Import Flow

1. **Health Check**: Verify API connectivity
2. **Load Metadata**: Read `_collection_metadata.json` from each collection directory
3. **Build Tree**: Reconstruct the document hierarchy from metadata
4. **Create Collections**: Via `/api/collections.create`
5. **Create Documents**: Via `/api/documents.create` with the proper `parentDocumentId`
6. **Map IDs**: Track old IDs → new IDs to maintain hierarchy
7. **Display Progress**: Tree-style output with status indicators

#### Core Component Pipelines

**Export Pipeline:**
```
export_with_trees.sh → Docker container → outline_export_fixed.py
                               ↓
Fetches collections → Builds document tree → Exports markdown + metadata
                               ↓
Creates backup → Verifies integrity → Displays summary
```

**Import Pipeline:**
```
import_to_outline.sh → Docker container → outline_import.py
                               ↓
Reads metadata → Validates structure → Creates collections
                               ↓
Uploads documents → Maintains hierarchy → Reports status
```

### 3. Import Modes

#### Multi-Collection (Default)

Each subdirectory becomes a separate collection:

```
outline_export/
├── Bewerbungen/   → Creates "Bewerbungen" collection
├── Projekte/      → Creates "Projekte" collection
└── Privat/        → Creates "Privat" collection
```

#### Single Collection (`--single`)

All content goes into one timestamped collection:

```
outline_export/
├── Bewerbungen/   → Becomes parent doc "Bewerbungen"
├── Projekte/      → Becomes parent doc "Projekte"
└── Privat/        → Becomes parent doc "Privat"

All imported into: "import_20260119_143052" collection
```

### 4. Behavior & Duplicate Handling

#### Duplicate Handling

| Scenario | Default Behavior | With `--force` |
|----------|------------------|----------------|
| Collection exists | Skip entire collection | Delete and recreate |
| Document exists | Skip document | Update document |

#### Error Handling

**Import Errors**:
- **API connection failure**: Abort with error message
- **Collection creation fails**: Abort that collection, continue with the others
- **Document creation fails**: Log error, continue with siblings
- **Missing markdown file**: Log warning, skip document
- **Parent not found**: Create as root-level document

**Export Errors**:
- **API connection failure**: Abort before starting
- **Collection fetch fails**: Skip that collection, continue
- **Document fetch fails**: Retry 3x with backoff, then skip
- **Disk write fails**: Abort with error message

#### Rate Limiting

If the Outline API returns 429 errors:
- Automatic retry with exponential backoff
- Up to 3 retry attempts per request
- Configurable delay between retries

#### Important Features & Behaviors

**Backup System**:
- Each export automatically backs up the previous export to `outline_backup_YYYYMMDD_HHMMSS.tar.gz`
- The old uncompressed export directory is deleted after backup
- Backups achieve **90%+ compression** on markdown content
- Safe to re-run exports: previous data is always preserved

**Reliability Features**:
- **Health check**: Verifies API connectivity before operations
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Caching**: Document content is cached during a single run to reduce API calls
- **Logging**: Structured logging with configurable verbosity levels (`-v`, `-vv`)

**Hierarchy Integrity**:
- The navigation tree (`/api/collections.documents`) is the **source of truth** for document hierarchy
- Import maintains parent-child relationships via `parentDocumentId` mapping
- Document counting is recursive to include all nested children
- A maximum depth limit (default: 100) prevents infinite recursion

### 5. File Structure Knowledge

```
outline-tools/
├── export_with_trees.sh        # Main export entrypoint
```

#### Dry Run Testing

```bash
# Test export without writing files
./export_with_trees.sh --dry-run

# Test import without creating collections
./import_to_outline.sh --dry-run
```

#### Verification Checklist

- [ ] Health check passes before export/import
- [ ] Document count matches (compare tree output)
- [ ] Hierarchy preserved (check parent-child relationships)
- [ ] Metadata files are valid JSON
- [ ] No API errors in logs
- [ ] Backup created successfully (export only)

### 8. Troubleshooting & Debug Mode

#### Common Issues

**"Connection refused" or "Name resolution failed"**
- **Cause**: Not running inside the `domnet` Docker network
- **Solution**: Always use the shell wrappers (`export_with_trees.sh`, `import_to_outline.sh`)

**"Authorization failed" or 401/403 errors**
- **Cause**: Invalid or expired API token
- **Solution**: Update the token in `settings.json`

**Documents appear at the wrong hierarchy level after import**
- **Cause**: Metadata corruption or a `parentDocumentId` mapping issue
- **Solution**: Re-export, verify `_collection_metadata.json` integrity, check the `id_mapping` dictionary

**Import creates duplicate collections**
- **Cause**: Collection names differ (case, spaces, special characters)
- **Solution**: Use `--force` to replace, or manually delete the old collections

**API returns 429 errors**
- **Cause**: Rate limiting from too many API requests
- **Solution**: Built-in retry logic handles this; increase `RETRY_DELAY` if the errors persist

#### Debug Mode

Run with `-vv` for detailed debug output:

```bash
./export_with_trees.sh -vv
./import_to_outline.sh -vv
```

This shows:
- Full API requests and responses
- Document ID mappings
- File operations
- Retry attempts

#### Quick Diagnostics

```bash
# Test API connectivity. outline:3000 only resolves inside domnet, so run curl
# in a container on that network; Outline API endpoints are called with POST.
docker run --rm --network domnet curlimages/curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' \
  http://outline:3000/api/collections.list

# Check Docker network
docker network inspect domnet

# Run with verbose logging
./export_with_trees.sh -vv
```

### 9. Extending the Tool

#### Adding New CLI Options

**Bash wrapper** (`export_with_trees.sh`):
```bash
# Add option parsing
while [[ $# -gt 0 ]]; do
    case $1 in
        --my-option)
            MY_OPTION="$2"
            shift 2
            ;;
        # ...existing options...
        *)
            echo "Unknown option: $1" >&2
            exit 1
            ;;
    esac
done
```

**Python script** (`outline_export_fixed.py`):
```python
import argparse

# Add argument parser (extend the script's existing ArgumentParser)
parser = argparse.ArgumentParser()
parser.add_argument('--my-option', help='Description')
args = parser.parse_args()

# Pass to Docker: the bash wrapper forwards the arguments to the container, e.g.
#   docker_cmd="... python3 outline_export_fixed.py $@"
# If the wrapper consumes --my-option while parsing, re-append it explicitly.
```

#### Adding New Export Formats

1. Create a format converter function in `outline_export_fixed.py`
2. Add a format option to the CLI
3. Modify `write_document_to_file()` to call the converter
4. Update metadata to track the format

#### Custom Filtering

Add filter configuration to `settings.json`:

```json
{
  "export": {
    "filters": {
      "exclude_tags": ["draft", "private"],
      "include_collections": ["Public", "Docs"]
    }
  }
}
```

Then implement it in `OutlineExporter.should_export_document()`.

### 10. Error Recovery

#### Partial Export Recovery

If an export crashes mid-run:
1. The previous export is already backed up (if one existed)
2. The partial export in `outline_export/` may be incomplete
3. It is safe to re-run: the partial data will be overwritten
4. Check `manifest.json` to see what completed

#### Failed Import Recovery

If an import fails partway:
1. Successfully created collections remain in Outline
2. Use `--force` to delete and retry, OR
3. Manually delete the collections from the Outline UI
4. Check the logs for the document ID where the failure occurred

### 11. Performance Optimization

#### Reducing API Calls

- **Caching**: Document content is cached during a single run
- **Batching**: Not currently implemented (future enhancement)
- **Parallelization**: Not safe due to Outline API rate limits

#### Faster Exports

- Skip verification: `--skip-verify`
- Skip health check: `--skip-health-check` (risky)
- Reduce hierarchy depth: adjust `max_hierarchy_depth` in settings

#### Faster Imports

- Single collection mode: `--single` (fewer collection creates)
- Disable verbose logging (the default)

### 12. Security Considerations

#### Secrets Management

- `settings.json` contains the API token
- **Never log** the token value
- **Never commit** `settings.json` to git
- Backups may contain sensitive content

#### Safe Practices

```bash
# Check git status before committing
git status

# Verify settings.json is ignored
grep settings.json .gitignore

# Sanitize logs before sharing
sed 's/Bearer [A-Za-z0-9_-]*/Bearer [REDACTED]/g' logs.txt
```

### 13. Common Agent Mistakes to Avoid

1. **Don't suggest running the Python scripts on the host**: they must run inside Docker on `domnet`; prefer the shell wrappers
2. **Don't hardcode the API URL**: it is environment-specific (use `settings.json`)
3. **Don't assume external API access**: it only works inside `domnet`
4. **Don't ignore dry-run mode**: always test changes with `--dry-run` first
5. **Don't modify hierarchy logic lightly**: parent-child relationships are fragile
6. **Don't skip error handling**: the API can fail intermittently
7. **Don't forget to update both export and import**: changes often affect both sides

### 14. Useful Code Patterns

#### Making Authenticated API Calls

```python
import requests

# Inside the exporter/importer classes: self.api_url and self.api_token are
# loaded from settings.json; payload is the endpoint's JSON request body.
headers = {
    "Authorization": f"Bearer {self.api_token}",
    "Content-Type": "application/json",
}
response = requests.post(
    f"{self.api_url}/api/endpoint",  # e.g. /api/collections.list
    json=payload,
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()
```

#### Recursive Tree Traversal

```python
def process_tree(node, parent_id=None):
    # process_document() handles a single document (export or create)
    doc_id = node["id"]
    process_document(doc_id, parent_id)
    # Recurse into children, passing this document as their parent
    for child in node.get("children", []):
        process_tree(child, doc_id)
```

#### Progress Display with tqdm

```python
from tqdm import tqdm

with tqdm(total=total_docs, desc="Exporting") as pbar:
    for doc in documents:
        process(doc)
        pbar.update(1)  # advance the bar by one document
```

### 15. When to Ask for Clarification

Ask the user if:
- They want to modify the API authentication method
- They need to export to a different Outline instance
- They want to filter by specific criteria not in settings
- They experience persistent API errors (might be an Outline-specific issue)
- They need to handle very large wikis (>10,000 documents)
- They want to schedule automated backups (needs cron/systemd setup)

### 16. Recommended Improvements (Future)

Ideas for enhancing the tool:
- **Incremental exports**: Only export changed documents
- **Parallel imports**: Speed up large imports (carefully!)
- **Format converters**: Export to Notion, Confluence, etc.
- **Diff tool**: Compare exported versions
- **Search index**: Build a searchable archive
- **Version history**: Track document changes over time

---

## Quick Decision Tree

```
User wants to modify the tool:
├─ Change export filtering? → Edit outline_export_fixed.py
├─ Change import behavior? → Edit outline_import.py
├─ Add CLI option? → Edit .sh wrapper + .py script
├─ Change output format? → Edit write_document_to_file()
├─ Fix API error? → Check retry logic and error handling
└─ Add new feature? → Review both export and import sides

User reports an error:
├─ Connection refused? → Check Docker network
├─ Auth error? → Verify API token in settings.json
├─ Hierarchy wrong? → Check id_mapping in import
├─ Missing documents? → Compare counts, check filters
└─ JSON error? → Validate metadata files

User wants to understand:
├─ How it works? → Refer to CLAUDE.md
├─ How to use? → Show CLI examples
├─ How to extend? → Point to sections 9-10 above
└─ How to troubleshoot? → Use section 8 checklist
```

---

## Additional Resources

- **Outline API Docs**: https://www.getoutline.com/developers
- **Python requests**: https://requests.readthedocs.io/
- **Docker networks**: https://docs.docker.com/network/
- **tqdm progress bars**: https://tqdm.github.io/

## Agent Self-Check

Before suggesting changes:
- [ ] Have I read the architecture section?
- [ ] Do I understand the Docker network requirement?
- [ ] Have I considered both export and import sides?
- [ ] Will my change maintain hierarchy integrity?
- [ ] Have I suggested testing with `--dry-run`?
- [ ] Have I checked for security implications?
- [ ] Is my suggestion compatible with Docker execution?
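
**Sketch: validating `settings.json`.** The configuration in section 1 is edited by hand, so a fail-fast check before any API call can save a confusing run. This is illustrative only: `validate_settings` and `load_settings` are hypothetical helpers, not functions from `outline_export_fixed.py`, and the required keys and defaults are assumed from the example file and the defaults stated in this document.

```python
import json

# Keys treated as required; assumed from the settings.json example in section 1.
REQUIRED = (("source", "url"), ("source", "token"))


def validate_settings(settings):
    """Check required keys and fill documented defaults (hypothetical helper)."""
    for section, key in REQUIRED:
        if not settings.get(section, {}).get(key):
            raise ValueError(f"settings.json is missing '{section}.{key}'")
    # Defaults as stated in this document (output directory, max depth).
    settings.setdefault("export", {}).setdefault("output_directory", "outline_export")
    settings.setdefault("advanced", {}).setdefault("max_hierarchy_depth", 100)
    return settings


def load_settings(path="settings.json"):
    """Read and validate settings; the error message never includes the token."""
    with open(path, encoding="utf-8") as fh:
        return validate_settings(json.load(fh))
```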
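
**Sketch: old-ID → new-ID mapping on import.** Import Flow steps 5 and 6 (section 2) depend on creating each parent before its children, so the parent's *new* ID is available as `parentDocumentId`. A minimal sketch, assuming each metadata node is a dict with `id`, `title` and optional `children` (the real `_collection_metadata.json` schema may differ) and that `create_document(title, parent_id)` is a hypothetical wrapper around `/api/documents.create` returning the new document ID:

```python
def import_tree(nodes, create_document, parent_id=None, id_mapping=None):
    """Create documents parent-first and record old ID -> new ID.

    nodes: list of {"id", "title", "children"} dicts (assumed metadata shape).
    create_document(title, parent_id) -> new ID (hypothetical API wrapper).
    """
    if id_mapping is None:
        id_mapping = {}
    for node in nodes:
        new_id = create_document(node["title"], parent_id)
        id_mapping[node["id"]] = new_id
        # Children are created only after their parent exists, so they
        # receive the parent's *new* ID, not the exported one.
        import_tree(node.get("children", []), create_document, new_id, id_mapping)
    return id_mapping
```

The returned dictionary plays the role of the `id_mapping` mentioned in the troubleshooting section: if a document lands at the wrong level, its entry shows which new parent it was attached to.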
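
**Sketch: a checksum manifest.** Export Flow step 8 and the recovery notes in section 10 rely on a manifest with checksums. The layout of the tool's actual `manifest.json` is not documented here, so this sketch assumes a flat `{relative_path: sha256_hex}` mapping; `build_manifest` and `verify_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def build_manifest(export_dir):
    """Map each exported file's relative path to its SHA-256 hex digest."""
    root = Path(export_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest file itself.
        if path.is_file() and path.name != "manifest.json":
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_manifest(export_dir, manifest):
    """Return relative paths that are missing or whose checksum changed."""
    current = build_manifest(export_dir)
    return sorted(p for p, digest in manifest.items() if current.get(p) != digest)
```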
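
**Sketch: retry with exponential backoff.** Sections 4 and 8 describe up to 3 retries with exponential backoff on 429 errors. The helper below shows that pattern in isolation: `with_retries` and `RetryableError` are hypothetical (the scripts' own retry code and `RETRY_DELAY` handling may differ), and an API wrapper would be expected to raise `RetryableError` for 429 or 5xx responses. `sleep` is injectable so the delays can be tested without waiting.

```python
import time


class RetryableError(Exception):
    """Raised for failures worth retrying, e.g. HTTP 429 or 5xx (hypothetical)."""


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying RetryableError with exponential backoff.

    attempts counts retries after the first call, so attempts=3 means
    up to 4 calls in total ("up to 3 retry attempts per request").
    """
    for attempt in range(attempts + 1):
        try:
            return func()
        except RetryableError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
```

With `base_delay=1.0` and three retries the waits are 1 s, 2 s and 4 s; in this sketch, raising `base_delay` is the analogue of increasing `RETRY_DELAY`.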
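
**Sketch: archiving the previous export.** The Backup System bullets in section 4 describe compressing the previous `outline_export/` into `outline_backup_YYYYMMDD_HHMMSS.tar.gz` and then deleting the uncompressed copy. This is a minimal standard-library illustration of that behavior, not the tool's actual implementation; `backup_and_remove` is a hypothetical name.

```python
import shutil
import tarfile
from datetime import datetime
from pathlib import Path


def backup_and_remove(export_dir="outline_export", dest_dir="."):
    """Archive an existing export as outline_backup_YYYYMMDD_HHMMSS.tar.gz, then delete it.

    Returns the archive path, or None if there was no previous export.
    """
    export_path = Path(export_dir)
    if not export_path.is_dir():
        return None  # first run: nothing to back up
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive = Path(dest_dir) / f"outline_backup_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:  # gzip compresses markdown well
        tar.add(export_path, arcname=export_path.name)
    # Reached only if the archive was written without an exception.
    shutil.rmtree(export_path)
    return archive
```

Deleting only after the `with` block completes means a failed archive write leaves the old export directory in place.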
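
**Sketch: recursive document counting with a depth limit.** The Hierarchy Integrity bullets (section 4) say counting is recursive and that `max_hierarchy_depth` (default 100) guards against runaway recursion. This sketch assumes navigation nodes are dicts with an optional `children` list, as in the traversal pattern in section 14; `count_documents` is a hypothetical helper.

```python
def count_documents(nodes, depth=0, max_depth=100):
    """Recursively count documents in a navigation tree.

    nodes: list of dicts with an optional "children" list (assumed shape).
    Raises ValueError if documents are nested deeper than max_depth.
    """
    # Only a non-empty level counts as depth; leaves have empty child lists.
    if nodes and depth > max_depth:
        raise ValueError(f"hierarchy deeper than max_hierarchy_depth={max_depth}")
    total = 0
    for node in nodes:
        total += 1 + count_documents(node.get("children", []), depth + 1, max_depth)
    return total
```

Comparing this count between the exported tree and the imported one is one way to check the "Document count matches" item of the verification checklist.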
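
**Sketch: a possible `should_export_document` filter.** Section 9 proposes `exclude_tags` and `include_collections` settings but leaves the implementation open. The free function below is one hypothetical shape for that logic: how a document's tags are obtained is an assumption (they are passed in as a plain list), and the real method on `OutlineExporter` may need a different signature.

```python
def should_export_document(collection_name, doc_tags, filters):
    """Decide whether a document passes the export filters.

    filters mirrors the proposed settings.json "export.filters" block:
    {"exclude_tags": [...], "include_collections": [...]}. An empty or
    missing include_collections means "export every collection".
    """
    include = filters.get("include_collections") or []
    if include and collection_name not in include:
        return False
    excluded = set(filters.get("exclude_tags") or [])
    # Export only if none of the document's tags are excluded.
    return not excluded.intersection(doc_tags)
```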
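
**Sketch: redacting tokens in Python logs.** Section 12 says the token must never be logged and shows a `sed` one-liner for cleaning log files afterwards. A `logging.Filter` can apply the same regular expression before anything is written. `RedactTokenFilter` is a hypothetical name; attach it to each handler, because filters on a logger are not consulted for records propagated from child loggers.

```python
import logging
import re

# Same pattern as the sed one-liner in section 12.
_BEARER = re.compile(r"Bearer [A-Za-z0-9_-]*")


class RedactTokenFilter(logging.Filter):
    """Mask bearer tokens in log records before they are emitted."""

    def filter(self, record):
        # Format the message first so tokens passed as %-args are caught too.
        record.msg = _BEARER.sub("Bearer [REDACTED]", record.getMessage())
        record.args = ()  # the message is now fully formatted
        return True
```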