outline-sync/AGENTS.md
2026-03-07 20:54:59 +01:00

AGENTS.md

This file provides guidance to AI agents (Claude, GPT-4, etc.) when working with this Outline export/import tool repository.

This file covers the technical details, architecture, common pitfalls, and code patterns for the tool.

Quick Reference

Primary Language: Python 3.11 + Bash
Key Dependencies: requests, tqdm
Runtime: Docker containers on domnet network
API Base: http://outline:3000 (internal, bypasses SSO)

Key Features:

  • Export all collections with full document hierarchy
  • Import back to Outline preserving structure
  • Automatic backups with 90%+ compression
  • Dry-run mode for safe testing
  • Retry logic for API reliability

Usage

Export (Backup)

# Run the export with tree visualization
./export_with_trees.sh

# Preview without exporting (dry run)
./export_with_trees.sh --dry-run

# Run with verbose output
./export_with_trees.sh -v

Export CLI Options:

--dry-run, -n       Preview what would be exported without writing files
--output, -o DIR    Output directory (overrides settings.json)
--verbose, -v       Increase verbosity (-vv for debug)
--skip-verify       Skip post-export verification
--skip-health-check Skip pre-export health check
--settings FILE     Path to settings file (default: settings.json)

Import (Restore)

# Import all collections from outline_export/
./import_to_outline.sh

# Preview what would be imported (no changes made)
./import_to_outline.sh --dry-run

# Import into a single timestamped collection
./import_to_outline.sh --single

# Import from a different directory
./import_to_outline.sh -d exports/

# Overwrite existing collections
./import_to_outline.sh --force

Import CLI Options:

-s, --single        Import all into single timestamped collection
-n, --dry-run       Preview operations without making changes
-d, --source DIR    Source directory (default: outline_export)
-v, --verbose       Increase verbosity (-vv for debug)
-f, --force         Overwrite existing collections (instead of skip)
--settings FILE     Path to settings file (default: settings.json)
-h, --help          Show help message

Running Python Scripts Directly

If you need to run the Python scripts directly (e.g., for debugging):

# Export
docker run --rm --network domnet \
    -v "$(pwd):/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"

# Import
docker run --rm --network domnet \
    -v "$(pwd):/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -q requests tqdm && python3 outline_import.py --dry-run"

Note: The shell wrappers (export_with_trees.sh, import_to_outline.sh) provide better UX with tree visualization and colored output.

Agent Operating Guidelines

1. Configuration

Settings are in settings.json:

{
  "source": {
    "url": "http://outline:3000",
    "token": "your-api-token-here"
  },
  "export": {
    "output_directory": "outline_export"
  },
  "advanced": {
    "max_hierarchy_depth": 100
  }
}

Important: settings.json contains secrets (API token) and should never be committed to git.
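
As an illustration of how a script might read this file, here is a minimal sketch. The function name `load_settings` and the validation rules are assumptions made for this example; the real scripts may load their configuration differently. The default values are the ones shown in the JSON above.

```python
import json
from pathlib import Path


def load_settings(path="settings.json"):
    """Read settings and fail early if the required source keys are missing.

    Hypothetical helper: not taken from outline_export_fixed.py.
    """
    settings = json.loads(Path(path).read_text(encoding="utf-8"))

    source = settings.get("source", {})
    for key in ("url", "token"):
        if not source.get(key):
            raise ValueError(f"settings.json is missing source.{key}")

    # Fall back to the values shown in this document when a key is absent.
    settings.setdefault("export", {}).setdefault("output_directory", "outline_export")
    settings.setdefault("advanced", {}).setdefault("max_hierarchy_depth", 100)
    return settings
```

Failing before any API call keeps a missing token from surfacing later as a confusing 401 error.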

2. Architecture Understanding

This tool operates in a Docker-isolated environment to bypass Authentik SSO:

  • All Python scripts run inside ephemeral Docker containers
  • Network: domnet bridge allows direct access to Outline's internal API
  • No persistent container state - dependencies installed on each run

Critical Context:

  • The http://outline:3000 URL only works inside the Docker network
  • External access would require SSO authentication through Authentik
  • This design is intentional for automated backup/restore operations

Export Flow

  1. Health Check: Verify API connectivity
  2. Fetch Collections: Via /api/collections.list
  3. Build Tree: Get navigation tree via /api/collections.documents (source of truth for hierarchy)
  4. Fetch Content: Full document content via /api/documents.info (with caching)
  5. Export Recursively: Maintain parent-child structure
  6. Save Metadata: _collection_metadata.json per collection
  7. Create Backup: Archive previous export to outline_backup_*.tar.gz
  8. Verify: Generate manifest with checksums
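
The verification step (step 8) can be pictured as building a map from file path to checksum. The sketch below assumes SHA-256 and a flat `{relative_path: digest}` layout; the actual manifest format and hash algorithm used by the exporter are not specified here, and `build_manifest` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def build_manifest(export_dir):
    """Return {relative_path: sha256_hex} for every file under export_dir.

    Assumption: SHA-256 and this dictionary layout are illustrative only.
    """
    root = Path(export_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Use forward slashes so the manifest is stable across platforms.
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest
```

Comparing two such manifests is one way to confirm that a re-export or a restored backup contains the same files.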

Import Flow

  1. Health Check: Verify API connectivity
  2. Load Metadata: Read _collection_metadata.json from each collection directory
  3. Build Tree: Reconstruct document hierarchy from metadata
  4. Create Collections: Via /api/collections.create
  5. Create Documents: Via /api/documents.create with proper parentDocumentId
  6. Map IDs: Track old IDs → new IDs to maintain hierarchy
  7. Display Progress: Tree-style output with status indicators

Core Component Pipelines

Export Pipeline:

export_with_trees.sh → Docker container → outline_export_fixed.py
    ↓
Fetches collections → Builds document tree → Exports markdown + metadata
    ↓
Creates backup → Verifies integrity → Displays summary

Import Pipeline:

import_to_outline.sh → Docker container → outline_import.py
    ↓
Reads metadata → Validates structure → Creates collections
    ↓
Uploads documents → Maintains hierarchy → Reports status

3. Import Modes

Collection per Directory (default)

Each subdirectory becomes a separate collection:

outline_export/
├── Bewerbungen/     → Creates "Bewerbungen" collection
├── Projekte/        → Creates "Projekte" collection
└── Privat/          → Creates "Privat" collection

Single Collection (--single)

All content goes into one timestamped collection:

outline_export/
├── Bewerbungen/     → Becomes parent doc "Bewerbungen"
├── Projekte/        → Becomes parent doc "Projekte"
└── Privat/          → Becomes parent doc "Privat"

All imported into: "import_20260119_143052" collection

4. Behavior & Duplicate Handling

Duplicate Handling

Scenario            Default Behavior          With --force
Collection exists   Skip entire collection    Delete and recreate
Document exists     Skip document             Update document
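
The duplicate-handling rules above reduce to a small decision function. This is only a sketch of the rules as documented; `resolve_action` and its return strings are hypothetical and do not come from outline_import.py.

```python
def resolve_action(kind, exists, force=False):
    """Decide what to do with a collection or document during import.

    kind is "collection" or "document". The returned labels are
    illustrative names for the documented behaviors.
    """
    if not exists:
        return "create"
    if not force:
        return "skip"
    # With --force, collections are deleted and recreated,
    # while documents are updated in place.
    return "recreate" if kind == "collection" else "update"
```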

Error Handling

Import Errors:

  • API connection failure: Abort with error message
  • Collection creation fails: Abort that collection, continue others
  • Document creation fails: Log error, continue with siblings
  • Missing markdown file: Log warning, skip document
  • Parent not found: Create as root-level document

Export Errors:

  • API connection failure: Abort before starting
  • Collection fetch fails: Skip that collection, continue
  • Document fetch fails: Retry 3x with backoff, then skip
  • Disk write fails: Abort with error message

Rate Limiting

If Outline API returns 429 errors:

  • Automatic retry with exponential backoff
  • Up to 3 retry attempts per request
  • Configurable delay between retries
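
The retry behavior described above can be sketched as follows. This is a generic illustration, not the tool's code: `call_with_backoff`, the `request_fn` callable returning a `(status_code, body)` pair, and the 1-second base delay are all assumptions.

```python
import time


def call_with_backoff(request_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying on HTTP 429 with exponential backoff.

    request_fn is a hypothetical callable returning (status_code, body).
    One initial attempt plus up to max_retries retries are made.
    """
    for attempt in range(max_retries + 1):
        status, body = request_fn()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

The `sleep` parameter exists so that the waiting can be replaced by a no-op or a recorder when testing.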

Important Features & Behaviors

Backup System:

  • Each export automatically backs up previous exports to outline_backup_YYYYMMDD_HHMMSS.tar.gz
  • Old uncompressed export directory is deleted after backup
  • Backups achieve 90%+ compression on markdown content
  • Safe to re-run exports - previous data is always preserved
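
A rough sketch of the archive-then-delete behavior, using only the standard library. The function name and its parameters are hypothetical; only the archive naming pattern is taken from the description above.

```python
import shutil
import tarfile
from datetime import datetime
from pathlib import Path


def backup_previous_export(export_dir, backup_root=".", now=None):
    """Archive export_dir to outline_backup_YYYYMMDD_HHMMSS.tar.gz, then remove it.

    Hypothetical helper. Returns the archive path, or None if there is
    no previous export to back up.
    """
    export_dir = Path(export_dir)
    if not export_dir.is_dir():
        return None

    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive = Path(backup_root) / f"outline_backup_{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(export_dir), arcname=export_dir.name)

    # Delete the uncompressed directory only after the archive is closed.
    shutil.rmtree(export_dir)
    return archive
```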

Reliability Features:

  • Health check: Verifies API connectivity before operations
  • Retry logic: Failed API requests retry up to 3 times with exponential backoff
  • Caching: Document content cached during single run to reduce API calls
  • Logging: Structured logging with configurable verbosity levels (-v, -vv)

Hierarchy Integrity:

  • The navigation tree (/api/collections.documents) is the source of truth for document hierarchy
  • Import maintains parent-child relationships via parentDocumentId mapping
  • Document counting is recursive to include all nested children
  • Maximum depth limit (default: 100) prevents infinite recursion
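
Recursive counting with a depth guard might look like the sketch below. The node shape (a `children` list on each node) mirrors the traversal pattern later in this file; `count_documents` and the choice to raise `ValueError` are assumptions for illustration.

```python
def count_documents(nodes, depth=0, max_depth=100):
    """Count every document in the tree, including all nested children.

    Root nodes are at depth 0. Raises ValueError if a document sits
    deeper than max_depth (hypothetical behavior for this sketch).
    """
    if not nodes:
        return 0
    if depth > max_depth:
        raise ValueError(f"hierarchy deeper than {max_depth} levels")

    total = 0
    for node in nodes:
        # Count the node itself plus everything below it.
        total += 1 + count_documents(node.get("children", []), depth + 1, max_depth)
    return total
```

Comparing this count before export and after import is a quick check for the "Document count matches" item in the verification checklist.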

5. File Structure Knowledge

outline-tools/
├── export_with_trees.sh          # Main export entrypoint

Dry Run Testing

# Test export without writing files
./export_with_trees.sh --dry-run

# Test import without creating collections
./import_to_outline.sh --dry-run

Verification Checklist

  • Health check passes before export/import
  • Document count matches (compare tree output)
  • Hierarchy preserved (check parent-child relationships)
  • Metadata files valid JSON
  • No API errors in logs
  • Backup created successfully (export only)

8. Troubleshooting & Debug Mode

Common Issues

"Connection refused" or "Name resolution failed"

  • Cause: Not running inside domnet Docker network
  • Solution: Always use shell wrappers (export_with_trees.sh, import_to_outline.sh)

"Authorization failed" or 401/403 errors

  • Cause: Invalid or expired API token
  • Solution: Update token in settings.json

Documents appear at wrong hierarchy level after import

  • Cause: Metadata corruption or parentDocumentId mapping issue
  • Solution: Re-export, verify _collection_metadata.json integrity, check id_mapping dictionary

Import creates duplicate collections

  • Cause: Collection names differ (case, spaces, special chars)
  • Solution: Use --force to replace, or manually delete old collections

API returns 429 errors

  • Cause: Rate limiting from too many API requests
  • Solution: Built-in retry logic handles this - increase RETRY_DELAY if persistent

Debug Mode

Run with -vv for detailed debug output:

./export_with_trees.sh -vv
./import_to_outline.sh -vv

This shows:

  • Full API requests and responses
  • Document ID mappings
  • File operations
  • Retry attempts

Quick Diagnostics

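# Test API connectivity (run from inside the domnet network; Outline's RPC-style API is normally called with POST)
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" http://outline:3000/api/collections.list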

# Check Docker network
docker network inspect domnet

# Run with verbose logging
./export_with_trees.sh -vv

9. Extending the Tool

Adding New CLI Options

Bash wrapper (export_with_trees.sh):

# Add option parsing
while [[ $# -gt 0 ]]; do
    case "$1" in
        --my-option)
            MY_OPTION="$2"
            shift 2
            ;;
        *)
            shift
            ;;
    esac
done

Python script (outline_export_fixed.py):

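# Add argument parser
parser.add_argument('--my-option', help='Description')

Back in the Bash wrapper, forward the arguments to the script inside the container:

# Pass to Docker ("$*" joins the arguments into the command string)
docker_cmd="... python3 outline_export_fixed.py $*"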

Adding New Export Formats

  1. Create format converter function in outline_export_fixed.py
  2. Add format option to CLI
  3. Modify write_document_to_file() to call converter
  4. Update metadata to track format

Custom Filtering

Add filter configuration to settings.json:

{
  "export": {
    "filters": {
      "exclude_tags": ["draft", "private"],
      "include_collections": ["Public", "Docs"]
    }
  }
}

Then implement in OutlineExporter.should_export_document().
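
A possible shape for that check is sketched below as a standalone function. It assumes each document dictionary carries a `tags` list, which is an assumption of this example and may not match what the Outline API returns. The real method would live on `OutlineExporter`.

```python
def should_export_document(document, collection_name, filters):
    """Apply the include_collections and exclude_tags filters.

    Assumption: document is a dict with an optional "tags" list.
    An empty or missing filter means "no restriction".
    """
    include = filters.get("include_collections")
    if include and collection_name not in include:
        return False

    excluded = set(filters.get("exclude_tags", []))
    # Skip the document if it carries any excluded tag.
    return not (excluded & set(document.get("tags", [])))
```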

10. Error Recovery

Partial Export Recovery

If export crashes mid-run:

  1. Previous export is already backed up (if existed)
  2. Partial export in outline_export/ may be incomplete
  3. Safe to re-run - will overwrite partial data
  4. Check manifest.json to see what completed

Failed Import Recovery

If import fails partway:

  1. Successfully created collections remain in Outline
  2. Use --force to delete and retry, OR
  3. Manually delete collections from Outline UI
  4. Check logs for document ID where failure occurred

11. Performance Optimization

Reducing API Calls

  • Caching: Document content cached during single run
  • Batching: Not currently implemented (future enhancement)
  • Parallelization: Not safe due to Outline API rate limits

Faster Exports

  • Skip verification: --skip-verify
  • Skip health check: --skip-health-check (risky)
  • Reduce hierarchy depth: Adjust max_hierarchy_depth in settings

Faster Imports

  • Single collection mode: --single (fewer collection creates)
  • Disable verbose logging (default)

12. Security Considerations

Secrets Management

  • settings.json contains API token
  • Never log the token value
  • Never commit settings.json to git
  • Backups may contain sensitive content
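
One way to enforce the "never log the token" rule is a logging filter that redacts bearer tokens before a record is emitted. This is a sketch using the standard `logging` module; the class name `RedactTokenFilter` is hypothetical and the tool may handle this differently.

```python
import logging
import re

# Same pattern idea as the sed command used for sanitizing shared logs.
_BEARER = re.compile(r"Bearer [A-Za-z0-9_-]+")


class RedactTokenFilter(logging.Filter):
    """Replace any 'Bearer <token>' text in a log record with a placeholder."""

    def filter(self, record):
        # Format the message first so tokens passed as arguments are caught too.
        record.msg = _BEARER.sub("Bearer [REDACTED]", record.getMessage())
        record.args = ()
        return True  # keep the record, now sanitized
```

Attaching it with `handler.addFilter(RedactTokenFilter())` would apply the redaction to everything that handler writes, including `-vv` debug output.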

Safe Practices

# Check git status before committing
git status

# Verify settings.json is ignored
grep settings.json .gitignore

# Sanitize logs before sharing
sed 's/Bearer [A-Za-z0-9_-]*/Bearer [REDACTED]/g' logs.txt

13. Common Agent Mistakes to Avoid

  1. Don't suggest running Python directly on the host - always run it through Docker, preferably via the shell wrappers
  2. Don't hardcode the API URL - It's environment-specific (use settings.json)
  3. Don't assume external API access - Only works inside domnet
  4. Don't ignore dry-run mode - Always test changes with --dry-run first
  5. Don't modify hierarchy logic lightly - Parent-child relationships are fragile
  6. Don't skip error handling - API can fail intermittently
  7. Don't forget to update both export and import - Changes often affect both sides

14. Useful Code Patterns

Making Authenticated API Calls

headers = {
    "Authorization": f"Bearer {self.api_token}",
    "Content-Type": "application/json"
}
response = requests.post(
    f"{self.api_url}/api/endpoint",
    json=payload,
    headers=headers,
    timeout=30
)
response.raise_for_status()
data = response.json()

Recursive Tree Traversal

def process_tree(node, parent_id=None):
    doc_id = node["id"]
    process_document(doc_id, parent_id)
    
    for child in node.get("children", []):
        process_tree(child, doc_id)

Progress Display with tqdm

from tqdm import tqdm

with tqdm(total=total_docs, desc="Exporting") as pbar:
    for doc in documents:
        process(doc)
        pbar.update(1)

15. When to Ask for Clarification

Ask the user if:

  • They want to modify API authentication method
  • They need to export to a different Outline instance
  • They want to filter by specific criteria not in settings
  • They experience persistent API errors (might be Outline-specific issue)
  • They need to handle very large wikis (>10,000 documents)
  • They want to schedule automated backups (needs cron/systemd setup)

Ideas for enhancing the tool:

  • Incremental exports: Only export changed documents
  • Parallel imports: Speed up large imports (carefully!)
  • Format converters: Export to Notion, Confluence, etc.
  • Diff tool: Compare exported versions
  • Search index: Build searchable archive
  • Version history: Track document changes over time

Quick Decision Tree

User wants to modify the tool:
├─ Change export filtering? → Edit outline_export_fixed.py
├─ Change import behavior? → Edit outline_import.py
├─ Add CLI option? → Edit .sh wrapper + .py script
├─ Change output format? → Edit write_document_to_file()
├─ Fix API error? → Check retry logic and error handling
└─ Add new feature? → Review both export and import sides

User reports an error:
├─ Connection refused? → Check Docker network
├─ Auth error? → Verify API token in settings.json
├─ Hierarchy wrong? → Check id_mapping in import
├─ Missing documents? → Compare counts, check filters
└─ JSON error? → Validate metadata files

User wants to understand:
├─ How it works? → Refer to CLAUDE.md
├─ How to use? → Show CLI examples
├─ How to extend? → Point to sections 9-10 above
└─ How to troubleshoot? → Use section 8 checklist

Additional Resources

Agent Self-Check

Before suggesting changes:

  • Have I read the architecture section?
  • Do I understand the Docker network requirement?
  • Have I considered both export and import sides?
  • Will my change maintain hierarchy integrity?
  • Have I suggested testing with --dry-run?
  • Have I checked for security implications?
  • Is my suggestion compatible with Docker execution?