Initial commit: Export tools and import script requirements

- export_with_trees.sh: Bash wrapper for Outline export
- outline_export_fixed.py: Python export implementation
- IMPORT_SCRIPT.MD: PRD for import script (to be built)
- RALPH_PROMPT.md: Ralph Loop prompt for building import script
- CLAUDE.md: Project documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit d9161f64f5 by Claude, 2026-01-19 22:33:55 +01:00. 7 changed files with 2608 additions and 0 deletions.

.gitignore vendored Normal file

@@ -0,0 +1,21 @@
# Secrets
settings.json
# Export data (may contain sensitive content)
outline_export/
# Backups
outline_backup_*.tar.gz
# Python
__pycache__/
*.pyc
*.pyo
.pytest_cache/
# Ralph Loop state
.claude/
# IDE
.vscode/
.idea/

CLAUDE.md Normal file

@@ -0,0 +1,94 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a tool for exporting Outline wiki data via API. The script runs inside a Docker container on the `domnet` network to bypass Authentik SSO authentication and access the internal Outline API directly (`http://outline:3000`).
## Usage
```bash
# Run the export with tree visualization
./export_with_trees.sh
# Preview without exporting (dry run)
./export_with_trees.sh --dry-run
# Run with verbose output
./export_with_trees.sh -v
```
### CLI Options
```
--dry-run, -n Preview what would be exported without writing files
--output, -o DIR Output directory (overrides settings.json)
--verbose, -v Increase verbosity (-vv for debug)
--skip-verify Skip post-export verification
--skip-health-check Skip pre-export health check
--settings FILE Path to settings file (default: settings.json)
```
### Running the Python Export Directly
```bash
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py"

# With options
docker run --rm --network domnet \
  -v "$(pwd):/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -q requests tqdm && python3 outline_export_fixed.py --dry-run"
```
## Architecture
### Docker Network Integration
- Script runs in Docker container attached to `domnet` bridge network
- Direct API access to `http://outline:3000` (internal) bypasses SSO
- Uses `python:3.11-slim` image with `requests` and `tqdm` dependencies
### Export Flow
1. Fetch collections via `/api/collections.list`
2. Get navigation tree via `/api/collections.documents` (source of truth for hierarchy)
3. Fetch full document content via `/api/documents.info` (with caching)
4. Export recursively maintaining parent-child structure
5. Save metadata (`_collection_metadata.json`) per collection
6. Generate manifest with checksums for verification
### Key Files
- `export_with_trees.sh` - Main export script with tree visualization
- `outline_export_fixed.py` - Core export logic with `OutlineExporter` class
- `settings.json` - API URL and token configuration (contains secrets)
- `outline_export/` - Output directory with markdown files and metadata
- `outline_backup_*.tar.gz` - Timestamped compressed backups
### Configuration
Settings are in `settings.json`:
- `source.url` - Internal Docker URL (`http://outline:3000`)
- `source.token` - Outline API token
- `export.output_directory` - Output path (default: `outline_export`)
- `advanced.max_hierarchy_depth` - Prevent infinite recursion (default: 100)
## Important Notes
### Security
- `settings.json` contains API token - never commit to git
- Backup files may contain sensitive wiki content
### Backup System
- Each export automatically backs up previous exports to `outline_backup_YYYYMMDD_HHMMSS.tar.gz`
- Old uncompressed export directory is deleted after backup
- Backups achieve 90%+ compression on markdown content
### Reliability Features
- **Health check**: Verifies API connectivity before export
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Logging**: Structured logging with configurable verbosity levels
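The retry behavior described above can be sketched as follows; this is a simplified stand-in, not the script's actual implementation (the injectable `sleep` parameter is an assumption added to make the helper testable):

```python
import time

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying on failure with exponential backoff (1s, 2s, 4s)."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
```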
### Document Counting
The navigation tree (`/api/collections.documents`) is the source of truth for document hierarchy. Document counting is recursive to include all nested children.
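Recursive counting over such a tree amounts to a one-liner (assuming each node carries a `children` list, as in the navigation tree response):

```python
def count_documents(nodes):
    """Count every document in a navigation tree, including nested children."""
    return sum(1 + count_documents(node.get("children", [])) for node in nodes)
```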

IMPORT_SCRIPT.MD Normal file

@@ -0,0 +1,514 @@
# Outline Import Script - Product Requirements Document
**Document Version:** 1.0
**Created:** 2026-01-17
**Last Updated:** 2026-01-19
**Status:** Draft
---
## 1. Executive Summary
Create `import_to_outline.sh` - a companion script to the existing export tool that imports markdown files back into Outline. The script restores documents with their full hierarchy using metadata preserved during export, enabling disaster recovery, migration between Outline instances, and content restoration workflows.
---
## 2. Problem Statement
### Current State
- Export functionality exists via `export_with_trees.sh` and `outline_export_fixed.py`
- Exports include markdown content and `_collection_metadata.json` with full hierarchy
- No automated way to restore or migrate exported content back into Outline
### Pain Points
1. **Disaster Recovery**: Manual recreation of collections and documents after data loss
2. **Migration**: No tooling to move content between Outline instances
3. **Restore Workflow**: Cannot selectively restore deleted documents or collections
4. **Testing**: No way to verify export integrity via round-trip import
### Business Impact
- Hours of manual work to rebuild wiki structure after incidents
- Risk of hierarchy/relationship loss during manual restoration
- No confidence in backup validity without restore testing
---
## 3. Goals & Success Criteria
### Primary Goals
1. Restore exported markdown files to Outline with preserved hierarchy
2. Support both full restore and selective import workflows
3. Provide clear feedback on import progress and results
### Success Criteria
| Metric | Target |
|--------|--------|
| Document import success rate | ≥99% |
| Hierarchy accuracy | 100% parent-child relationships preserved |
| Performance | ≥10 documents/second |
| Dry-run accuracy | 100% match between preview and actual import |
### Non-Goals
- Image/attachment import (future enhancement)
- Conflict resolution with existing content (skip or fail)
- Real-time sync between instances
- User/permission migration
---
## 4. User Stories
### US-1: Disaster Recovery
> As an **administrator**, I want to **restore all collections from a backup** so that **I can recover from data loss**.
**Acceptance Criteria:**
- Import all collections from `outline_export/` directory
- Recreate exact hierarchy as shown in metadata
- Report success/failure summary
### US-2: Selective Restoration
> As a **user**, I want to **import a single collection** so that **I can restore specific content without affecting other data**.
**Acceptance Criteria:**
- Specify source directory containing single collection
- Create collection if it doesn't exist
- Skip import if collection already exists (configurable)
### US-3: Migration to New Instance
> As an **administrator**, I want to **import all content into a fresh Outline instance** so that **I can migrate to new infrastructure**.
**Acceptance Criteria:**
- Works against empty Outline instance
- Creates all collections and documents
- Preserves document nesting structure
### US-4: Safe Preview
> As a **user**, I want to **preview what will be imported** so that **I can verify before making changes**.
**Acceptance Criteria:**
- `--dry-run` flag shows all planned operations
- No API calls that modify data during dry run
- Output matches actual import behavior
### US-5: Consolidated Import
> As a **user**, I want to **import multiple collections into a single new collection** so that **I can reorganize content during import**.
**Acceptance Criteria:**
- `--single` mode creates timestamped collection
- Original collection names become top-level documents
- All nested hierarchy preserved under these parents
---
## 5. Functional Requirements
### 5.1 Import Modes
#### Mode 1: Collection-per-Folder (Default)
```
outline_export/
├── Bewerbungen/ → Creates "Bewerbungen" collection
├── Projekte/ → Creates "Projekte" collection
└── Privat/ → Creates "Privat" collection
```
**Behavior:**
- Each subdirectory in source becomes a separate collection
- Collection names match folder names exactly
- If collection exists: skip entire collection (default) or error
#### Mode 2: Single Collection (`--single`)
```
outline_export/
├── Bewerbungen/ → Becomes parent doc "Bewerbungen"
├── Projekte/ → Becomes parent doc "Projekte"
└── Privat/ → Becomes parent doc "Privat"
All imported into: "import_20260119_143052" collection
```
**Behavior:**
- Creates one collection named `import_YYYYMMDD_HHMMSS`
- Each original collection folder becomes a top-level parent document
- Original document hierarchy nested under these parents
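Generating the timestamped collection name is straightforward; a hedged sketch (the `now` parameter is an assumption added for testability):

```python
from datetime import datetime

def single_collection_name(now=None):
    """Collection name for --single mode, e.g. import_20260119_143052."""
    return (now or datetime.now()).strftime("import_%Y%m%d_%H%M%S")
```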
### 5.2 Command-Line Interface
```bash
./import_to_outline.sh [OPTIONS]
Options:
-s, --single Import all into single timestamped collection
-n, --dry-run Preview operations without making changes
-d, --source DIR Source directory (default: outline_export)
-v, --verbose Increase output verbosity (-vv for debug)
-f, --force Overwrite existing collections (instead of skip)
--settings FILE Path to settings file (default: settings.json)
-h, --help Show help message
```
### 5.3 Document Creation Logic
#### Hierarchy Reconstruction Algorithm
```
1. Load _collection_metadata.json
2. Build document tree from `documents` array (using parent_id)
3. Topological sort: ensure parents created before children
4. For each document in sorted order:
a. Read markdown content from file
b. Map old parent_id → new parent_id (from creation responses)
c. Create document via API with parentDocumentId
d. Store id mapping: old_id → new_id
5. Verify: created count matches expected count
```
#### ID Mapping Example
```
Export metadata:
doc_A (id: abc-123, parent_id: null)
doc_B (id: def-456, parent_id: abc-123)
After creating doc_A:
id_map = { "abc-123": "new-789" }
Creating doc_B:
parent_id = id_map["abc-123"] = "new-789"
API call: create doc_B with parentDocumentId: "new-789"
```
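The ordering and ID-mapping steps above could be implemented along these lines. This is a sketch, not the actual `outline_import.py` code: `create_doc` stands in for the `/api/documents.create` call and returns the new document's ID.

```python
def import_order(documents):
    """Yield flat document records parent-before-child (step 3 above)."""
    by_parent = {}
    for doc in documents:
        by_parent.setdefault(doc.get("parent_id"), []).append(doc)
    stack = list(reversed(by_parent.get(None, [])))  # roots first
    while stack:
        doc = stack.pop()
        yield doc
        stack.extend(reversed(by_parent.get(doc["id"], [])))

def import_collection(documents, create_doc):
    """Create documents in safe order, remapping old parent IDs to new ones."""
    id_map = {}
    for doc in import_order(documents):
        new_parent = id_map.get(doc.get("parent_id"))  # None for root docs
        id_map[doc["id"]] = create_doc(doc, parent_document_id=new_parent)
    return id_map
```

Because parents are always created first, the `id_map` lookup for a child's `parent_id` can never miss for well-formed metadata; a miss indicates a dangling `parent_id`, which per the error-handling table falls back to a root-level document.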
### 5.4 Duplicate Handling
| Scenario | Default Behavior | With `--force` |
|----------|------------------|----------------|
| Collection exists | Skip entire collection | Delete and recreate |
| Document title exists in collection | Skip document | Update document |
### 5.5 Error Handling
| Error Type | Behavior |
|------------|----------|
| API connection failure | Abort with error message |
| Collection creation fails | Abort that collection, continue others |
| Document creation fails | Log error, continue with siblings |
| Missing markdown file | Log warning, skip document |
| Invalid metadata JSON | Abort that collection |
| Parent document not found | Create as root-level document |
---
## 6. Technical Design
### 6.1 Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                    import_to_outline.sh                     │
│        (Bash wrapper - Docker execution, backup, UI)        │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                     outline_import.py                       │
│         (Python core - API calls, hierarchy logic)          │
└─────────────────────────┬───────────────────────────────────┘
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
   ┌──────────────┐  ┌──────────┐  ┌──────────────┐
   │ settings.json│  │ metadata │  │ Outline API  │
   │ (API config) │  │ .json    │  │ (HTTP POST)  │
   └──────────────┘  └──────────┘  └──────────────┘
```
### 6.2 API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/collections.list` | POST | Check existing collections |
| `/api/collections.create` | POST | Create new collection |
| `/api/collections.delete` | POST | Delete collection (--force mode) |
| `/api/documents.create` | POST | Create document with content |
| `/api/documents.list` | POST | Check existing documents |
| `/api/documents.update` | POST | Update document (--force mode) |
### 6.3 API Request Examples
#### Create Collection
```json
POST /api/collections.create
{
  "name": "Bewerbungen",
  "permission": "read_write"
}
```
#### Create Document
```json
POST /api/documents.create
{
  "collectionId": "col-uuid-here",
  "title": "DORA Metrics (Top 4)",
  "text": "# DORA Metrics\n\nContent here...",
  "parentDocumentId": "parent-uuid-or-null",
  "publish": true
}
```
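A request helper matching these examples might look like the sketch below. It assumes a `requests.Session`-like object is passed in (which keeps the helper testable); this is not the actual `outline_import.py` code.

```python
def api_post(session, base_url, token, endpoint, payload, timeout=30):
    """POST to an Outline RPC-style endpoint (all Outline endpoints are POST)
    and return the response's `data` field."""
    resp = session.post(
        f"{base_url}{endpoint}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=timeout,
    )
    resp.raise_for_status()  # surface 4xx/5xx as exceptions
    return resp.json().get("data")
```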
### 6.4 Data Structures
#### Input: `_collection_metadata.json`
```json
{
  "id": "original-collection-uuid",
  "name": "Bewerbungen",
  "directory": "Bewerbungen",
  "expected_count": 11,
  "documents": [
    {
      "id": "doc-uuid",
      "title": "Document Title",
      "filename": "Document Title.md",
      "parent_id": "parent-uuid-or-null",
      "checksum": "sha256-hash",
      "children": [...]
    }
  ]
}
```
#### Runtime: ID Mapping
```python
id_map: Dict[str, str] = {
    "old-uuid-1": "new-uuid-1",
    "old-uuid-2": "new-uuid-2"
}
```
### 6.5 Docker Execution
```bash
docker run --rm --network domnet \
  --user "$(id -u):$(id -g)" \
  -e HOME=/tmp \
  -v "$WORK_DIR:/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -qqq requests 2>/dev/null && \
           python3 outline_import.py $CLI_ARGS"
```
---
## 7. User Interface
### 7.1 Progress Output
```
════════════════════════════════════════════════════════════
OUTLINE IMPORT
════════════════════════════════════════════════════════════
Source: outline_export/
Target: http://outline:3000
Mode: Collection per folder
Checking API connectivity... ✓
Bewerbungen/ (11 documents)
Creating collection... ✓ (id: 7f3a...)
├── CV.md ✓ created
├── DORA Metrics (Top 4).md ✓ created
├── Tipico.md ✓ created
│ ├── Pitch Tipico.md ✓ created
│ ├── Fragen 3. Runde.md ✓ created
│ ├── Tipico 3rd Party.md ✓ created
│ └── Tipico Top 10 Functions.md ✓ created
└── Ihre PVS.md ✓ created
├── Mobilepass.md ✓ created
├── PVS erster Call.md ✓ created
└── Fragen Dirk.md ✓ created
Projekte/ (8 documents)
Collection exists, skipping...
════════════════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════════════════
Collections: 1 created, 1 skipped, 0 errors
Documents: 11 created, 0 skipped, 0 errors
Duration: 2.3 seconds
════════════════════════════════════════════════════════════
```
### 7.2 Dry-Run Output
```
════════════════════════════════════════════════════════════
OUTLINE IMPORT (DRY RUN)
════════════════════════════════════════════════════════════
Source: outline_export/
Target: http://outline:3000
Mode: Collection per folder
[DRY RUN] No changes will be made
Bewerbungen/ (11 documents)
[DRY RUN] Would create collection "Bewerbungen"
[DRY RUN] Would create 11 documents:
├── CV.md
├── DORA Metrics (Top 4).md
├── Tipico.md
│ ├── Pitch Tipico.md
│ └── ...
└── ...
Projekte/ (8 documents)
[DRY RUN] Collection exists - would skip
════════════════════════════════════════════════════════════
DRY RUN SUMMARY
════════════════════════════════════════════════════════════
Would create: 1 collection, 11 documents
Would skip: 1 collection (exists)
════════════════════════════════════════════════════════════
```
### 7.3 Error Output
```
Bewerbungen/ (11 documents)
Creating collection... ✓
├── CV.md ✓ created
├── Missing Doc.md ✗ file not found
└── Tipico.md ✗ API error: 500
└── (children skipped due to parent failure)
```
---
## 8. Configuration
### 8.1 settings.json (shared with export)
```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "ol_api_xxxxxxxxxxxx"
  },
  "import": {
    "source_directory": "outline_export",
    "on_collection_exists": "skip",
    "on_document_exists": "skip",
    "default_permission": "read_write"
  },
  "advanced": {
    "request_timeout": 30,
    "retry_attempts": 3,
    "retry_delay": 1.0,
    "rate_limit_delay": 0.1
  }
}
```
### 8.2 Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `import.source_directory` | string | `outline_export` | Default source path |
| `import.on_collection_exists` | enum | `skip` | `skip`, `error`, `merge` |
| `import.on_document_exists` | enum | `skip` | `skip`, `error`, `update` |
| `import.default_permission` | enum | `read_write` | `read`, `read_write` |
| `advanced.request_timeout` | int | 30 | API timeout in seconds |
| `advanced.retry_attempts` | int | 3 | Retries on failure |
| `advanced.rate_limit_delay` | float | 0.1 | Delay between API calls |
---
## 9. Testing Strategy
### 9.1 Test Cases
| ID | Category | Test Case | Expected Result |
|----|----------|-----------|-----------------|
| T1 | Happy Path | Import single collection | Collection + docs created |
| T2 | Happy Path | Import multiple collections | All collections created |
| T3 | Happy Path | Import nested hierarchy (3+ levels) | All parent-child relationships preserved |
| T4 | Duplicate | Collection already exists | Skip collection |
| T5 | Duplicate | Document title exists | Skip document |
| T6 | Error | Missing markdown file | Log warning, continue |
| T7 | Error | Invalid metadata JSON | Abort collection |
| T8 | Error | API unreachable | Abort with clear error |
| T9 | Mode | --single flag | Single timestamped collection |
| T10 | Mode | --dry-run flag | No API mutations |
| T11 | Mode | --force flag | Overwrites existing |
| T12 | Edge | Empty collection | Create empty collection |
| T13 | Edge | Special chars in title | Handled correctly |
| T14 | Edge | Very large document | Imported successfully |
### 9.2 Verification Methods
1. **Round-trip Test**: Export → Import to test instance → Export again → Compare checksums
2. **API Verification**: Query created documents and verify parent relationships
3. **Manual Inspection**: Spot-check imported content in Outline UI
---
## 10. Rollback & Recovery
### Pre-Import Safety
- No automatic backup (user should have export as backup)
- `--dry-run` always available to preview
### Rollback Procedure
If import fails partway through:
1. Note which collections were created (from import log)
2. Manually delete partial collections via Outline UI or API
3. Fix issue and re-run import
### Future Enhancement
- `--backup` flag to export existing content before import
- Transaction-like behavior: delete partially-imported collection on failure
---
## 11. Future Enhancements
| Priority | Enhancement | Description |
|----------|-------------|-------------|
| P1 | Attachment support | Import images and file attachments |
| P1 | Merge mode | Add documents to existing collections |
| P2 | Selective import | Import specific documents by path/pattern |
| P2 | Update mode | Update existing documents with new content |
| P3 | User mapping | Preserve authorship via user email mapping |
| P3 | Permission sync | Restore document-level permissions |
| P3 | Incremental import | Only import new/changed documents |
---
## 12. Implementation Checklist
- [ ] Create `outline_import.py` with core import logic
- [ ] Create `import_to_outline.sh` bash wrapper
- [ ] Implement collection creation
- [ ] Implement document creation with hierarchy
- [ ] Implement ID mapping for parent references
- [ ] Add `--dry-run` mode
- [ ] Add `--single` mode
- [ ] Add `--force` mode
- [ ] Add progress visualization
- [ ] Add error handling and reporting
- [ ] Add retry logic for API failures
- [ ] Update settings.json schema
- [ ] Write tests
- [ ] Update CLAUDE.md documentation
---
## 13. References
- **Export Script**: `export_with_trees.sh`, `outline_export_fixed.py`
- **Outline API Docs**: https://www.getoutline.com/developers
- **Metadata Format**: See `outline_export/*/_collection_metadata.json`
- **Settings Format**: See `settings.json`

RALPH_PROMPT.md Normal file

@@ -0,0 +1,202 @@
# Outline Import Script - Ralph Loop Prompt
## Your Mission
Build `import_to_outline.sh` and `outline_import.py` - companion scripts to the existing export tools that import markdown files back into Outline wiki.
**Requirements Document:** Read `IMPORT_SCRIPT.MD` for full specifications.
**Reference Implementation:** Study `export_with_trees.sh` and `outline_export_fixed.py` for patterns.
---
## Iteration Protocol
Each iteration, follow this cycle:
### 1. Assess Current State
```bash
# Check what exists
ls -la *.sh *.py 2>/dev/null
git status
git log --oneline -5 2>/dev/null || echo "No git history"
```
### 2. Read Requirements (if needed)
- Review `IMPORT_SCRIPT.MD` for specifications
- Review `outline_export_fixed.py` for API patterns and settings.json structure
- Review `settings.json` for configuration format
### 3. Implement Next Phase
Work on the current phase until complete, then move to the next.
### 4. Test Your Work
- Run syntax checks: `python3 -m py_compile outline_import.py`
- Run bash checks: `bash -n import_to_outline.sh`
- Test `--help` output
- Test `--dry-run` mode against `outline_export/` directory
### 5. Commit Progress
```bash
git add -A && git commit -m "Phase X: description"
```
---
## Implementation Phases
### Phase 1: Core Python Structure
Create `outline_import.py` with:
- [ ] `OutlineImporter` class with settings loading (copy pattern from `outline_export_fixed.py`)
- [ ] API helper methods: `_api_request()`, `_get_collections()`, `_create_collection()`
- [ ] Argument parsing with all CLI options from spec
- [ ] Basic logging setup
**Verification:** `python3 -m py_compile outline_import.py` passes
### Phase 2: Metadata Loading
- [ ] Load `_collection_metadata.json` from each collection directory
- [ ] Build document tree from `documents` array
- [ ] Implement topological sort for parent-before-child ordering
- [ ] Handle missing/invalid metadata gracefully
**Verification:** Can parse metadata from `outline_export/*/`
### Phase 3: Collection Import Logic
- [ ] Check if collection exists via `/api/collections.list`
- [ ] Create collection via `/api/collections.create`
- [ ] Handle `--force` mode (delete and recreate)
- [ ] Skip existing collections by default
**Verification:** `--dry-run` shows correct collection operations
### Phase 4: Document Import with Hierarchy
- [ ] Read markdown content from files
- [ ] Create documents via `/api/documents.create`
- [ ] Maintain ID mapping: `old_id -> new_id`
- [ ] Set `parentDocumentId` using mapped IDs
- [ ] Handle missing parent (create as root-level)
**Verification:** `--dry-run` shows correct document hierarchy
### Phase 5: Single Collection Mode
- [ ] Implement `--single` flag
- [ ] Create timestamped collection name `import_YYYYMMDD_HHMMSS`
- [ ] Convert original collection folders to parent documents
- [ ] Preserve nested hierarchy under these parents
**Verification:** `--dry-run --single` shows consolidated structure
### Phase 6: Progress Visualization
- [ ] Tree-style output matching spec (├──, └──, │)
- [ ] Status indicators (✓ created, ✗ error, ○ skipped)
- [ ] Summary statistics (collections/documents created/skipped/errors)
- [ ] Duration tracking
**Verification:** Output matches examples in IMPORT_SCRIPT.MD Section 7
### Phase 7: Bash Wrapper Script
Create `import_to_outline.sh` with:
- [ ] Docker execution (matching `export_with_trees.sh` pattern)
- [ ] CLI argument passthrough
- [ ] Help text
- [ ] Pre-flight checks (settings.json exists, source directory exists)
**Verification:** `./import_to_outline.sh --help` works
### Phase 8: Error Handling & Polish
- [ ] Retry logic for API failures (3 attempts, exponential backoff)
- [ ] Proper error messages for all failure modes
- [ ] Rate limiting delay between API calls
- [ ] Verbose/debug output levels
**Verification:** All error scenarios from spec handled
---
## Success Criteria
All of the following must be true:
1. **Files exist:** `import_to_outline.sh` and `outline_import.py`
2. **Syntax valid:** Both pass syntax checks without errors
3. **Help works:** `./import_to_outline.sh --help` shows usage
4. **Dry-run works:** `./import_to_outline.sh --dry-run` parses `outline_export/` and shows planned operations
5. **Single mode:** `./import_to_outline.sh --dry-run --single` shows consolidated import plan
6. **Matches spec:** Output format matches IMPORT_SCRIPT.MD Section 7 examples
---
## Completion Signal
When ALL success criteria are met, output:
```
<promise>IMPORT SCRIPT COMPLETE</promise>
```
**Do not output this promise until:**
- Both files exist and pass syntax checks
- `--help` displays properly
- `--dry-run` successfully parses metadata and shows planned operations
- Output format matches the specification
---
## Anti-Patterns to Avoid
1. **Don't skip phases** - Complete each phase before moving on
2. **Don't forget commits** - Commit after each successful phase
3. **Don't ignore errors** - Fix syntax/import errors before proceeding
4. **Don't deviate from spec** - Follow IMPORT_SCRIPT.MD precisely
5. **Don't over-engineer** - Implement exactly what's specified, no more
---
## Helpful Context
### API Endpoint Examples (from spec)
```python
# Create collection
POST /api/collections.create
{"name": "Bewerbungen", "permission": "read_write"}

# Create document
POST /api/documents.create
{
  "collectionId": "col-uuid",
  "title": "Document Title",
  "text": "# Content\n\nMarkdown here...",
  "parentDocumentId": "parent-uuid-or-null",
  "publish": true
}
```
### Docker Execution Pattern
```bash
docker run --rm --network domnet \
  --user "$(id -u):$(id -g)" \
  -e HOME=/tmp \
  -v "$WORK_DIR:/work" \
  -w /work \
  python:3.11-slim \
  bash -c "pip install -qqq requests 2>/dev/null && python3 outline_import.py $CLI_ARGS"
```
### Settings Structure (existing in settings.json)
```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "ol_api_xxx"
  }
}
```
---
## Current Iteration
Read the files, check git history, determine which phase you're on, and continue from there.
If starting fresh: Begin with Phase 1.

README.md Normal file

@@ -0,0 +1,217 @@
# Outline Export Tool
Export Outline wiki data with full hierarchy and tree visualization.
## Quick Start
### 1. Configure Settings
Ensure `settings.json` contains your Outline API token:
```bash
cat settings.json
```
### 2. Run Export
```bash
./export_with_trees.sh
```
## Command Line Options
```bash
# Standard export with tree visualization
./export_with_trees.sh
# Preview without exporting (dry run)
./export_with_trees.sh --dry-run
# Verbose output
./export_with_trees.sh -v
# Debug output
./export_with_trees.sh -vv
# Skip verification step
./export_with_trees.sh --skip-verify
# Custom output directory
./export_with_trees.sh -o /path/to/output
```
### All CLI Options
| Option | Short | Description |
|--------|-------|-------------|
| `--dry-run` | `-n` | Preview without writing files |
| `--output` | `-o` | Output directory (overrides settings) |
| `--verbose` | `-v` | Increase verbosity (-vv for debug) |
| `--skip-verify` | | Skip post-export verification |
| `--skip-health-check` | | Skip pre-export health check |
| `--settings` | | Path to settings file |
## What It Does
1. **Health check** - Verifies API connectivity and authentication
2. **Shows current structure** - Tree view from Outline API
3. **Backs up previous exports** - Timestamped `.tar.gz` archives
4. **Exports all documents** - With full hierarchy preserved
5. **Shows exported structure** - Tree view of files
6. **Verifies counts** - Compares API vs exported documents
## Features
- **Retry logic**: Failed API requests retry up to 3 times with exponential backoff
- **Health check**: Verifies API before starting export
- **Dry-run mode**: Preview what would be exported
- **Structured logging**: Configurable verbosity levels
- **Document caching**: Prevents duplicate API fetches
- **Checksum verification**: Ensures export integrity
## File Structure
```
outline-tools/
├── export_with_trees.sh # Main export script
├── outline_export_fixed.py # Python export logic
├── settings.json # API configuration
├── CLAUDE.md # AI assistant docs
├── README.md # This file
└── Output (created after export):
    ├── exports/                  # Exported documents
    └── outline_backup_*.tar.gz   # Previous backups
```
## Configuration
### settings.json
```json
{
  "source": {
    "url": "http://outline:3000",
    "token": "ol_api_..."
  },
  "export": {
    "output_directory": "exports",
    "verify_after_export": true
  },
  "advanced": {
    "max_hierarchy_depth": 100,
    "api_timeout_seconds": 30,
    "progress_bar": true,
    "max_retries": 3,
    "retry_backoff": 1.0
  }
}
```
### Configuration Options
| Section | Key | Default | Description |
|---------|-----|---------|-------------|
| source | url | - | Outline API URL |
| source | token | - | API authentication token |
| export | output_directory | exports | Where to save files |
| export | verify_after_export | true | Run verification after export |
| advanced | max_hierarchy_depth | 100 | Prevent infinite recursion |
| advanced | progress_bar | true | Show progress bars |
| advanced | max_retries | 3 | API retry attempts |
| advanced | retry_backoff | 1.0 | Retry backoff multiplier |
## How It Works
### Docker Network Access
The script runs in a Docker container on `domnet` to access Outline internally:
```
export_with_trees.sh → Docker Container (domnet) → outline:3000
```
This bypasses the Authentik SSO layer that would otherwise block external HTTPS requests.
### Export Process
1. **Health check** - Verify API connectivity
2. **Fetch collections** from Outline API
3. **Build hierarchy** from navigation tree (source of truth)
4. **Export recursively** maintaining parent-child structure
5. **Save metadata** per collection
6. **Verify** document counts and checksums
## Output Format
### Collection Structure
```
exports/
├── Collection_Name/
│ ├── _collection_metadata.json
│ ├── Document.md
│ └── Child_Document.md
├── export_metadata.json
└── manifest.json
```
### Document Format
```markdown
# Document Title
<!-- Document ID: abc123 -->
<!-- Created: 2025-01-13T10:00:00Z -->
<!-- Updated: 2025-01-13T14:30:00Z -->
<!-- URL: https://outline.domverse.de/doc/... -->
---
Document content here...
```
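Assuming the header layout shown above, the metadata comments can be parsed with a few lines (an illustrative helper, not part of the export tool):

```python
import re

def parse_document_header(markdown):
    """Extract the HTML-comment metadata lines from an exported document."""
    return dict(re.findall(r"<!-- ([\w ]+): (.*?) -->", markdown))
```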
## Troubleshooting
### Health Check Fails
```bash
# Check if Outline is accessible
docker exec -it outline curl -s http://localhost:3000/api/auth.info
# Verify API token
docker run --rm --network domnet python:3.11-slim \
  python3 -c "import requests; r=requests.post('http://outline:3000/api/auth.info', headers={'Authorization': 'Bearer YOUR_TOKEN'}); print(r.status_code)"
```
### Docker Permission Denied
```bash
sudo usermod -aG docker $USER
newgrp docker
```
### Container Not Found
```bash
# Verify Outline is running
docker ps | grep outline
```
### Verification Fails
```bash
# Clean start
rm -rf exports/
./export_with_trees.sh
```
### API Errors
Check `exports/export_errors.json` for details on failed documents.
## Security
- `settings.json` contains API token - never commit to git
- Backup files may contain sensitive wiki content
- Consider restricting file permissions:
```bash
chmod 600 settings.json
chmod 700 exports/
```
---
**Last Updated:** 2026-01-14

export_with_trees.sh Executable file

@@ -0,0 +1,529 @@
#!/bin/bash
#
# Outline Export Script with Tree Visualization
# Exports all Outline documents with full hierarchy and shows side-by-side tree comparison
#
# Usage: ./export_with_trees.sh [OPTIONS]
# Options are passed through to the Python script (--dry-run, -v, etc.)
#
set -e # Exit on error
# Capture CLI arguments to pass to Python
CLI_ARGS="$@"
# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color
# Configuration
WORK_DIR="$(pwd)"
SETTINGS_FILE="$WORK_DIR/settings.json"
EXPORT_DIR="$WORK_DIR/outline_export"
echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
echo -e "${BLUE} OUTLINE EXPORT${NC}"
echo -e "${BLUE}════════════════════════════════════════════════════════════${NC}"
echo ""
# Check if settings.json exists
if [ ! -f "$SETTINGS_FILE" ]; then
    echo -e "${RED}Error: settings.json not found${NC}"
    exit 1
fi
# Extract API details from settings.json
API_URL=$(jq -r '.source.url' "$SETTINGS_FILE")
API_TOKEN=$(jq -r '.source.token' "$SETTINGS_FILE")
# Backup old export if it exists
if [ -d "$EXPORT_DIR" ]; then
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    BACKUP_FILE="$WORK_DIR/outline_backup_${TIMESTAMP}.tar.gz"
    echo -e "${YELLOW}Backing up previous export...${NC}"
    tar -czf "$BACKUP_FILE" -C "$WORK_DIR" "outline_export" 2>/dev/null
    echo -e "${GREEN}✓ Backup: $BACKUP_FILE ($(du -sh "$BACKUP_FILE" | cut -f1))${NC}"
    rm -rf "$EXPORT_DIR"
fi
echo -e "${GREEN}Exporting documents...${NC}"
echo ""
# Run the export with CLI arguments (as current user to avoid root-owned files)
docker run --rm --network domnet \
    --user "$(id -u):$(id -g)" \
    -e HOME=/tmp \
    -v "$WORK_DIR:/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -qqq requests tqdm 2>/dev/null && python3 outline_export_fixed.py $CLI_ARGS"
echo ""
# Create Python script for side-by-side tree comparison
cat > "$WORK_DIR/.tree_compare.py" << 'PYTHON_SCRIPT'
#!/usr/bin/env python3
"""
Side-by-side comparison of Outline online vs exported files.
Matches documents row by row and highlights differences.
"""
import sys
import re
import shutil
import requests
from pathlib import Path
# Colors
GREEN = '\033[0;32m'
RED = '\033[0;31m'
YELLOW = '\033[1;33m'
BLUE = '\033[0;34m'
CYAN = '\033[0;36m'
BOLD = '\033[1m'
DIM = '\033[2m'
RESET = '\033[0m'
def get_terminal_width():
    try:
        return shutil.get_terminal_size().columns
    except Exception:
        return 120
def normalize_filename(name):
    """Normalize a name for comparison (handles / -> _ conversion etc)."""
    # Replace characters that filesystems don't allow
    normalized = name.replace('/', '_').replace('\\', '_')
    normalized = normalized.replace(':', '_').replace('*', '_')
    normalized = normalized.replace('?', '_').replace('"', '_')
    normalized = normalized.replace('<', '_').replace('>', '_')
    normalized = normalized.replace('|', '_')
    return normalized.strip()
def get_online_docs(api_url, api_token):
    """Fetch all documents from Outline API, organized by collection."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    response = requests.post(f"{api_url}/api/collections.list", headers=headers, json={})
    collections = response.json().get("data", [])
    collections = sorted(collections, key=lambda c: c.get('name', ''))
    # Build collection ID to name mapping
    coll_id_to_name = {c['id']: c['name'] for c in collections}
    # Fetch all documents with timestamps using documents.list
    all_docs_response = requests.post(
        f"{api_url}/api/documents.list",
        headers=headers,
        json={"limit": 1000}  # single page; documents beyond this limit would be missed
    )
    all_docs = all_docs_response.json().get("data", [])
    # Create timestamp lookup by (collection_name, normalized_title)
    timestamp_lookup = {}
    for doc in all_docs:
        coll_id = doc.get("collectionId")
        coll_name = coll_id_to_name.get(coll_id, "Unknown")
        title = doc.get("title", "Untitled")
        norm_title = normalize_filename(title)
        timestamp_lookup[(coll_name, norm_title)] = doc.get("updatedAt")
    result = {}
    for coll in collections:
        coll_name = coll['name']
        # Get navigation tree
        nav_response = requests.post(
            f"{api_url}/api/collections.documents",
            headers=headers,
            json={"id": coll["id"]}
        )
        nav_tree = nav_response.json().get("data", [])

        def collect_docs(nodes):
            docs = []
            for node in nodes:
                title = node.get("title", "Untitled")
                norm_title = normalize_filename(title)
                has_children = len(node.get("children", [])) > 0
                updated_at = timestamp_lookup.get((coll_name, norm_title))
                docs.append({
                    'title': title,
                    'normalized': norm_title,
                    'has_children': has_children,
                    'updatedAt': updated_at
                })
                if has_children:
                    docs.extend(collect_docs(node.get("children", [])))
            return docs

        result[coll_name] = collect_docs(nav_tree)
    return result
def get_export_docs(export_dir):
    """Get all exported documents, organized by collection."""
    import os
    export_path = Path(export_dir)
    result = {}
    if not export_path.exists():
        return result
    for coll_dir in sorted(export_path.iterdir()):
        if coll_dir.is_dir():
            coll_name = coll_dir.name
            docs = []
            for md_file in sorted(coll_dir.glob("*.md")):
                title = md_file.stem
                if title:  # Skip empty filenames
                    mtime = os.path.getmtime(md_file)
                    docs.append({
                        'title': title,
                        'normalized': normalize_filename(title),
                        'path': md_file,
                        'mtime': mtime
                    })
            result[coll_name] = docs
    return result
def match_and_compare(online_docs, export_docs):
    """Match online and export docs, return comparison data per collection."""
    from datetime import datetime
    all_collections = sorted(set(online_docs.keys()) | set(export_docs.keys()))
    comparison = []
    for coll_name in all_collections:
        online_list = online_docs.get(coll_name, [])
        export_list = export_docs.get(coll_name, [])
        # Create lookup by normalized name
        export_lookup = {d['normalized']: d for d in export_list}
        rows = []
        matched_export = set()
        # First pass: match online docs to export
        for doc in sorted(online_list, key=lambda d: d['title'].lower()):
            norm = doc['normalized']
            if norm in export_lookup:
                export_doc = export_lookup[norm]
                # Check freshness
                freshness = 'current'  # default
                if doc.get('updatedAt') and export_doc.get('mtime'):
                    online_dt = datetime.fromisoformat(doc['updatedAt'].replace('Z', '+00:00'))
                    online_ts = online_dt.timestamp()
                    export_ts = export_doc['mtime']
                    # Allow 60s tolerance
                    if export_ts < online_ts - 60:
                        freshness = 'stale'
                rows.append({
                    'online': doc['title'],
                    'export': export_doc['title'],
                    'status': 'match',
                    'is_folder': doc['has_children'],
                    'freshness': freshness
                })
                matched_export.add(norm)
            else:
                rows.append({
                    'online': doc['title'],
                    'export': None,
                    'status': 'missing',
                    'is_folder': doc['has_children'],
                    'freshness': None
                })
        # Second pass: find extra export docs
        for doc in sorted(export_list, key=lambda d: d['title'].lower()):
            if doc['normalized'] not in matched_export:
                rows.append({
                    'online': None,
                    'export': doc['title'],
                    'status': 'extra',
                    'is_folder': False,
                    'freshness': None
                })
        # Sort rows: matched first, then missing, then extra
        rows.sort(key=lambda r: (
            0 if r['status'] == 'match' else (1 if r['status'] == 'missing' else 2),
            (r['online'] or r['export'] or '').lower()
        ))
        comparison.append({
            'collection': coll_name,
            'rows': rows,
            'online_count': len(online_list),
            'export_count': len(export_list)
        })
    return comparison
def print_comparison(comparison):
    """Print the side-by-side comparison with status indicators."""
    term_width = get_terminal_width()
    col_width = (term_width - 10) // 2  # -10 for separators and status icons
    total_online = 0
    total_export = 0
    total_matched = 0
    total_missing = 0
    total_extra = 0
    total_stale = 0

    def visible_len(s):
        """Length of a string without its ANSI escape codes."""
        return len(re.sub(r'\033\[[0-9;]*m', '', s))

    print(f"\n{BLUE}{'═' * term_width}{RESET}")
    print(f"{BOLD}{CYAN}{'ONLINE':<{col_width}} {'':5} {'EXPORTED':<{col_width}}{RESET}")
    print(f"{BLUE}{'═' * term_width}{RESET}")
    for coll in comparison:
        total_online += coll['online_count']
        total_export += coll['export_count']
        # Collection header
        coll_matched = sum(1 for r in coll['rows'] if r['status'] == 'match')
        coll_missing = sum(1 for r in coll['rows'] if r['status'] == 'missing')
        coll_extra = sum(1 for r in coll['rows'] if r['status'] == 'extra')
        coll_stale = sum(1 for r in coll['rows'] if r.get('freshness') == 'stale')
        total_matched += coll_matched
        total_missing += coll_missing
        total_extra += coll_extra
        total_stale += coll_stale
        if coll_missing == 0 and coll_extra == 0:
            coll_status = f"{GREEN}✓{RESET}"
        else:
            coll_status = f"{RED}✗{RESET}"
        header = f"{coll['collection']}/ ({coll['online_count']} → {coll['export_count']})"
        print(f"\n{BOLD}{YELLOW}{header}{RESET} {coll_status}")
        print(f"{BLUE}{'─' * term_width}{RESET}")
        for row in coll['rows']:
            online_name = row['online'] or ''
            export_name = row['export'] or ''
            # Add folder indicator
            if row['is_folder'] and online_name:
                online_name = f"📁 {online_name}"
            # Truncate if needed
            if len(online_name) > col_width - 1:
                online_name = online_name[:col_width - 4] + '...'
            if len(export_name) > col_width - 1:
                export_name = export_name[:col_width - 4] + '...'
            # Status and colors
            if row['status'] == 'match':
                # Freshness indicator
                if row.get('freshness') == 'stale':
                    freshness = f"{YELLOW}●{RESET}"
                else:
                    freshness = f"{GREEN}●{RESET}"
                status = f"{GREEN}✓{RESET}{freshness}"
                left = f"{online_name}"
                right = f"{export_name}"
            elif row['status'] == 'missing':
                status = f"{RED}✗{RESET} "
                left = f"{RED}{online_name}{RESET}"
                right = f"{DIM}---{RESET}"
            else:  # extra
                status = f"{YELLOW}+{RESET} "
                left = f"{DIM}---{RESET}"
                right = f"{YELLOW}{export_name}{RESET}"
            # Pad using visible width (without ANSI codes)
            left_pad = col_width - visible_len(left)
            print(f" {left}{' ' * max(0, left_pad)} {status} {right}")
    # Summary
    print(f"\n{BLUE}{'═' * term_width}{RESET}")
    print(f"{BOLD}SUMMARY:{RESET}")
    print(f" Online: {total_online} documents")
    print(f" Exported: {total_export} documents")
    print(f" {GREEN}✓● Matched & current: {total_matched - total_stale}{RESET}")
    if total_stale > 0:
        print(f" {YELLOW}✓● Matched but stale: {total_stale} (export older than online){RESET}")
    if total_missing > 0:
        print(f" {RED}✗ Missing: {total_missing} (online but not exported){RESET}")
    if total_extra > 0:
        print(f" {YELLOW}+ Extra: {total_extra} (exported but not online){RESET}")
    if total_missing == 0 and total_extra == 0 and total_stale == 0:
        print(f"\n{GREEN}✓ All documents exported and current!{RESET}")
    elif total_missing == 0 and total_extra == 0:
        print(f"\n{YELLOW}⚠ All documents exported but {total_stale} are stale{RESET}")
    print()
def get_latest_changes(api_url, api_token, limit=3):
    """Fetch the most recently updated documents."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    response = requests.post(
        f"{api_url}/api/documents.list",
        headers=headers,
        json={
            "sort": "updatedAt",
            "direction": "DESC",
            "limit": limit
        }
    )
    docs = response.json().get("data", [])
    result = []
    for doc in docs:
        # Get collection name
        coll_id = doc.get("collectionId")
        coll_name = "Unknown"
        if coll_id:
            coll_response = requests.post(
                f"{api_url}/api/collections.info",
                headers=headers,
                json={"id": coll_id}
            )
            coll_data = coll_response.json().get("data", {})
            coll_name = coll_data.get("name", "Unknown")
        result.append({
            'title': doc.get("title", "Untitled"),
            'collection': coll_name,
            'updatedAt': doc.get("updatedAt"),
            'normalized': normalize_filename(doc.get("title", "Untitled"))
        })
    return result
def find_export_file(export_dir, collection, normalized_title):
    """Find the exported file matching the document."""
    export_path = Path(export_dir)
    # Try exact collection match first
    coll_dir = export_path / collection
    if coll_dir.exists():
        for md_file in coll_dir.glob("*.md"):
            if normalize_filename(md_file.stem) == normalized_title:
                return md_file
    # Try all collections (in case of name mismatch)
    for coll_dir in export_path.iterdir():
        if coll_dir.is_dir():
            for md_file in coll_dir.glob("*.md"):
                if normalize_filename(md_file.stem) == normalized_title:
                    return md_file
    return None
def print_latest_changes(latest_docs, export_dir):
    """Print the latest changes section."""
    from datetime import datetime
    import os
    term_width = get_terminal_width()
    print(f"\n{BLUE}{'═' * term_width}{RESET}")
    print(f"{BOLD}{CYAN}LATEST CHANGES (verify actuality){RESET}")
    print(f"{BLUE}{'─' * term_width}{RESET}")
    for i, doc in enumerate(latest_docs, 1):
        title = doc['title']
        collection = doc['collection']
        updated_at = doc['updatedAt']
        # Parse online timestamp
        if updated_at:
            # Handle ISO format with timezone
            online_dt = datetime.fromisoformat(updated_at.replace('Z', '+00:00'))
            online_str = online_dt.strftime("%Y-%m-%d %H:%M:%S")
        else:
            online_str = "Unknown"
        # Find export file
        export_file = find_export_file(export_dir, collection, doc['normalized'])
        if export_file and export_file.exists():
            export_mtime = os.path.getmtime(export_file)
            export_dt = datetime.fromtimestamp(export_mtime)
            export_str = export_dt.strftime("%Y-%m-%d %H:%M:%S")
            # Compare (export should be same time or newer)
            if updated_at:
                # Convert online to local timestamp for comparison
                online_ts = online_dt.timestamp()
                if export_mtime >= online_ts - 60:  # Allow 60s tolerance
                    status = f"{GREEN}✓{RESET}"
                else:
                    status = f"{YELLOW}⚠ older{RESET}"
            else:
                status = f"{GREEN}✓{RESET}"
        else:
            export_str = "NOT FOUND"
            status = f"{RED}✗{RESET}"
        # Print entry
        print(f"\n {BOLD}{i}. {title}{RESET}")
        print(f" {DIM}Collection:{RESET} {collection}")
        print(f" {DIM}Online:{RESET} {online_str}")
        print(f" {DIM}Exported:{RESET} {export_str} {status}")
    print(f"\n{BLUE}{'═' * term_width}{RESET}")
def main():
    if len(sys.argv) != 4:
        print("Usage: script.py <API_URL> <API_TOKEN> <EXPORT_DIR>")
        sys.exit(1)
    api_url = sys.argv[1]
    api_token = sys.argv[2]
    export_dir = sys.argv[3]
    # Get documents from both sources
    online_docs = get_online_docs(api_url, api_token)
    export_docs = get_export_docs(export_dir)
    # Match and compare
    comparison = match_and_compare(online_docs, export_docs)
    # Print results
    print_comparison(comparison)
    # Get and print latest changes
    latest_docs = get_latest_changes(api_url, api_token, limit=3)
    print_latest_changes(latest_docs, export_dir)


if __name__ == "__main__":
    main()
PYTHON_SCRIPT
# Run the side-by-side tree comparison (use /work/outline_export as container path)
docker run --rm --network domnet \
    --user "$(id -u):$(id -g)" \
    -e HOME=/tmp \
    -v "$WORK_DIR:/work" \
    -w /work \
    python:3.11-slim \
    bash -c "pip install -qqq requests 2>/dev/null && python3 /work/.tree_compare.py '$API_URL' '$API_TOKEN' '/work/outline_export'"
# Cleanup
rm -f "$WORK_DIR/.tree_compare.py"
echo ""

outline_export_fixed.py (1031 lines, executable): diff suppressed because the file is too large.