Documentation
From zero to processing audio in five steps. No signup, no API keys — just HTTP requests and stablecoins.
POST your audio file to /v1/assets. You get back an asset_id and asset_token that you use to reference the file in all subsequent operations. Assets are stored securely in R2 and expire after 30 days.
$ curl -X POST https://api.soundhalo.com/v1/assets \
    -F "file=@podcast.wav"

{
  "asset_id": "ast_a1b2c3d4",
  "asset_token": "atok_x9y8z7",
  "mime_type": "audio/wav",
  "duration_sec": 1847.3,
  "size_bytes": 58312960
}
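If your agent is written in Python rather than shelling out to curl, the upload can be sketched with the standard library alone. This is a minimal sketch, not an official client: `build_upload_request` and `parse_asset_ref` are hypothetical helper names, and the only details taken from the example above are the `/v1/assets` path, the `file` form field, and the `asset_id` / `asset_token` response fields.

```python
import json
import urllib.request
import uuid

API_BASE = "https://api.soundhalo.com"


def build_upload_request(filename: str, data: bytes,
                         mime_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /v1/assets."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{API_BASE}/v1/assets",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_asset_ref(response_body: str) -> dict:
    """Keep only the two fields that later calls need."""
    body = json.loads(response_body)
    return {"asset_id": body["asset_id"], "asset_token": body["asset_token"]}
```

To actually send the request, pass the result to `urllib.request.urlopen` and feed the decoded response body to `parse_asset_ref`.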
Run /v1/inspect/preflight to get a full quality report — noise levels, clipping, silence, transcription readiness score, and a recommended processing pipeline. All inspect operations reference your asset by ID.
$ curl -X POST https://api.soundhalo.com/v1/inspect/preflight \
    -H "Content-Type: application/json" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_a1b2c3d4","asset_token":"atok_x9y8z7"}}}'

{
  "file_info": { "duration_sec": 1847.3, "format": "wav" },
  "issues": [
    { "type": "background_noise", "severity": "high" }
  ],
  "transcription_readiness": { "score": 42, "level": "poor" },
  "recommended_pipelines": [
    { "goal": "transcription", "steps": ["denoise", "normalize", "transcode"] }
  ]
}
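An agent will usually act on the report rather than just print it. The sketch below builds the request body and pulls out the two pieces most useful for deciding what to do next. It assumes the response always has the shape shown in the example above; the helper names are hypothetical. How step names such as `normalize` map to endpoint paths such as `/v1/transform/normalize-loudness` is not specified here, so the sketch returns the names untouched.

```python
def preflight_body(asset_id: str, asset_token: str) -> dict:
    """Request body for /v1/inspect/preflight, as in the curl example."""
    return {"input": {"asset_ref": {"asset_id": asset_id,
                                    "asset_token": asset_token}}}


def steps_for_goal(report: dict, goal: str) -> list:
    """Return the recommended step names for a goal, or [] if none is listed."""
    for pipeline in report.get("recommended_pipelines", []):
        if pipeline.get("goal") == goal:
            return list(pipeline.get("steps", []))
    return []


def high_severity_issues(report: dict) -> list:
    """Return the types of all issues flagged with severity 'high'."""
    return [issue["type"] for issue in report.get("issues", [])
            if issue.get("severity") == "high"]
```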
Each transform is a separate endpoint. Call /v1/transform/denoise, /v1/transform/normalize-loudness, or any other operation individually. Each returns a new derived asset.
$ curl -X POST https://api.soundhalo.com/v1/transform/denoise \
    -H "Content-Type: application/json" \
    -H "X-Payment-Proof: <signed-tx>" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_a1b2c3d4","asset_token":"atok_x9y8z7"}},"options":{"strength":"medium"}}'

{
  "output_asset": { "asset_id": "ast_e5f6g7h8", "kind": "derived" },
  "summary": { "input_duration_sec": 1847.3 }
}
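Because each transform returns a new derived asset, chaining means feeding one call's `output_asset.asset_id` into the next call's `asset_ref`. The sketch below shows that threading with the HTTP transport injected as a `post(path, body)` callable, so it makes no assumptions about how you send requests or attach payment. Reusing the original `asset_token` with a derived `asset_id` is an inference from the examples on this page, not a documented guarantee, and `run_transforms` is a hypothetical helper.

```python
from typing import Callable, List, Optional, Tuple

Post = Callable[[str, dict], dict]  # (path, JSON body) -> parsed JSON response


def transform_body(asset_id: str, asset_token: str,
                   options: Optional[dict] = None) -> dict:
    """Request body for a /v1/transform/* call, as in the denoise example."""
    body = {"input": {"asset_ref": {"asset_id": asset_id,
                                    "asset_token": asset_token}}}
    if options:
        body["options"] = options
    return body


def run_transforms(post: Post, asset_id: str, asset_token: str,
                   steps: List[Tuple[str, Optional[dict]]]) -> str:
    """Run transforms in order; return the asset_id of the last derived asset."""
    current_id = asset_id
    for operation, options in steps:
        response = post(f"/v1/transform/{operation}",
                        transform_body(current_id, asset_token, options))
        # Each transform returns a new derived asset; use it as the next input.
        current_id = response["output_asset"]["asset_id"]
    return current_id
```

A call such as `run_transforms(post, "ast_a1b2c3d4", "atok_x9y8z7", [("denoise", {"strength": "medium"}), ("normalize-loudness", None)])` would denoise first and then normalize the denoised asset.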
Call /v1/extract/transcribe for speech-to-text. Three providers are available (OpenAI Whisper, ElevenLabs Scribe v2, and Gemini Flash), with automatic fallback between them. For long files, set the X-Execution-Mode: async header to run the job asynchronously.
$ curl -X POST https://api.soundhalo.com/v1/extract/transcribe \
    -H "Content-Type: application/json" \
    -H "X-Payment-Proof: <signed-tx>" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_e5f6g7h8","asset_token":"atok_x9y8z7"}}}'

{
  "text": "Welcome to the show...",
  "segments": [
    { "start": 0.0, "end": 4.2, "text": "Welcome to the show..." }
  ],
  "detected_language": "en",
  "duration_sec": 1847.3
}
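Two things an agent typically does around this call are choosing the headers and turning `segments` into something readable. The sketch below covers both. It assumes `start` and `end` are in seconds, as the sample values suggest; when to set `run_async` is left to the caller because no length threshold is stated here. The function names are hypothetical.

```python
def transcribe_headers(payment_proof: str, run_async: bool = False) -> dict:
    """Headers for /v1/extract/transcribe; opt in to async mode for long files."""
    headers = {"Content-Type": "application/json",
               "X-Payment-Proof": payment_proof}
    if run_async:
        headers["X-Execution-Mode"] = "async"
    return headers


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.s, working in whole tenths to avoid float drift."""
    tenths = round(seconds * 10)
    minutes, remainder = divmod(tenths, 600)
    return f"{minutes:02d}:{remainder / 10:04.1f}"


def format_segments(transcript: dict) -> list:
    """One '[start - end] text' line per segment of a transcribe response."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] {seg['text']}"
        for seg in transcript.get("segments", [])
    ]
```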
Skip the manual steps. Workflows chain operations for you — /v1/workflows/prepare-for-transcription runs denoise, trim, normalize, and transcode in one call. Also available: prepare-for-podcast and run-recommended-pipeline.
# Agent workflow pseudocode
asset  = POST /v1/assets                                  ← free (upload)
report = POST /v1/inspect/preflight                       ← $0.005/req
quote  = POST /v1/quotes                                  ← free (estimate)
clean  = POST /v1/workflows/prepare-for-transcription
text   = POST /v1/extract/transcribe                      ← $0.05/min

# Total for 30 min audio: ~$1.85 USDC
# No API keys. No accounts. Just HTTP.
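The two published rates in the pseudocode account for most of the quoted total, and the arithmetic is easy to check. The sketch below adds up only the prices stated above ($0.005 per preflight request, $0.05 per transcribed minute) and reports what is left over from a quoted total. The leftover is presumably the workflow call's share, but that is an inference from the approximate total, not a published rate; the response from POST /v1/quotes is the authoritative figure.

```python
from decimal import Decimal

# Rates copied from the pseudocode above.
PREFLIGHT_PER_REQUEST = Decimal("0.005")
TRANSCRIBE_PER_MINUTE = Decimal("0.05")


def known_cost_usdc(audio_minutes: int, preflight_calls: int = 1) -> Decimal:
    """Cost of the steps whose per-unit price is stated on this page."""
    return (PREFLIGHT_PER_REQUEST * preflight_calls
            + TRANSCRIBE_PER_MINUTE * audio_minutes)


def implied_remainder_usdc(quoted_total: str, audio_minutes: int) -> Decimal:
    """Part of a quoted total not explained by the two published rates."""
    return Decimal(quoted_total) - known_cost_usdc(audio_minutes)
```

For 30 minutes of audio, `known_cost_usdc(30)` comes to 1.505 USDC, which leaves about 0.345 USDC of the ~$1.85 total unaccounted for by the listed rates.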
Payment
Every paid endpoint follows the same three-step flow. Your agent handles it automatically — no human interaction required.
1. Agent sends a standard HTTP request to the endpoint. No auth headers needed.
2. Server responds with 402 and an exact USDC price. Agent evaluates and decides whether to proceed.
3. Agent signs payment and re-sends request. Server verifies, processes audio, returns result.
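The three steps above can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not the service's client library: the transport (`send`) and the wallet signer (`sign`) are injected callables, and `price_usdc` is a hypothetical name for the price field in the 402 body, whose actual schema is not shown here. In a real client, `send` would attach the proof as the `X-Payment-Proof` header used in the curl examples.

```python
from typing import Callable, Optional, Tuple

Response = Tuple[int, dict]                 # (HTTP status, parsed JSON body)
Send = Callable[[Optional[str]], Response]  # argument: payment proof or None
Sign = Callable[[dict], str]                # 402 body -> signed payment proof


def call_with_payment(send: Send, sign: Sign, max_price_usdc: float) -> Response:
    """Send once without payment; on 402, check the price, sign, and re-send."""
    status, body = send(None)               # step 1: plain request, no auth
    if status != 402:
        return status, body                 # free endpoint or an error
    price = float(body["price_usdc"])       # step 2: hypothetical field name
    if price > max_price_usdc:
        raise ValueError(f"price {price} USDC exceeds budget {max_price_usdc}")
    return send(sign(body))                 # step 3: re-send with signed proof
```

The budget check is where the agent "evaluates and decides whether to proceed"; a stricter agent might also compare the price against an earlier /v1/quotes estimate before signing.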
Reference
The full API reference is available at api.soundhalo.com/v1/openapi.json