Documentation

Getting started

From zero to processing audio in five steps. No signup, no API keys — just HTTP requests and stablecoins.

Upload your audio

POST your audio file to /v1/assets. You get back an asset_id and asset_token that you use to reference the file in all subsequent operations. Assets are stored securely in R2 and expire after 30 days.

upload asset

$ curl -X POST https://api.soundhalo.com/v1/assets \
    -F "file=@podcast.wav"

{
  "asset_id": "ast_a1b2c3d4",
  "asset_token": "atok_x9y8z7",
  "mime_type": "audio/wav",
  "duration_sec": 1847.3,
  "size_bytes": 58312960
}
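
If you are scripting this rather than using curl, the two identifiers from the upload response are the part you carry forward. Here is a minimal Python sketch of turning that response into a request body for later calls. The helper name asset_ref_body is made up for illustration and is not part of the API; the body layout is simply copied from the curl examples in this guide.

```python
import json
from typing import Optional

def asset_ref_body(upload_response: dict, options: Optional[dict] = None) -> str:
    """Build a JSON body that references an uploaded asset.

    Hypothetical helper: the layout mirrors the curl examples in this guide.
    """
    body = {
        "input": {
            "asset_ref": {
                "asset_id": upload_response["asset_id"],
                "asset_token": upload_response["asset_token"],
            }
        }
    }
    if options:
        body["options"] = options
    return json.dumps(body)

# Sample response copied from the upload example above.
upload_response = {
    "asset_id": "ast_a1b2c3d4",
    "asset_token": "atok_x9y8z7",
    "mime_type": "audio/wav",
    "duration_sec": 1847.3,
    "size_bytes": 58312960,
}
print(asset_ref_body(upload_response))
```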

Inspect your audio

Run /v1/inspect/preflight to get a full quality report covering noise levels, clipping, silence, a transcription readiness score, and a recommended processing pipeline. All inspect operations reference your asset by its asset_id and asset_token.

inspect / preflight

$ curl -X POST https://api.soundhalo.com/v1/inspect/preflight \
    -H "Content-Type: application/json" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_a1b2c3d4","asset_token":"atok_x9y8z7"}}}'

{
  "file_info": { "duration_sec": 1847.3, "format": "wav" },
  "issues": [
    { "type": "background_noise", "severity": "high" }
  ],
  "transcription_readiness": { "score": 42, "level": "poor" },
  "recommended_pipelines": [
    { "goal": "transcription", "steps": ["denoise", "normalize", "transcode"] }
  ]
}
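
An agent would typically read this report and decide what to run next. The sketch below shows one way to do that in Python, using the sample report above. The function names and the min_score threshold of 70 are assumptions for illustration; the API does not document a cutoff.

```python
def steps_for_goal(report: dict, goal: str) -> list:
    """Return the recommended steps for a goal, or an empty list if none match."""
    for pipeline in report.get("recommended_pipelines", []):
        if pipeline.get("goal") == goal:
            return list(pipeline.get("steps", []))
    return []

def needs_cleanup(report: dict, min_score: int = 70) -> bool:
    """True when the readiness score is below min_score (an assumed threshold)."""
    return report["transcription_readiness"]["score"] < min_score

# Sample report copied from the preflight example above.
report = {
    "file_info": {"duration_sec": 1847.3, "format": "wav"},
    "issues": [{"type": "background_noise", "severity": "high"}],
    "transcription_readiness": {"score": 42, "level": "poor"},
    "recommended_pipelines": [
        {"goal": "transcription", "steps": ["denoise", "normalize", "transcode"]}
    ],
}

if needs_cleanup(report):
    print("run before transcribing:", steps_for_goal(report, "transcription"))
```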

Transform the audio

Each transform is a separate endpoint. Call /v1/transform/denoise, /v1/transform/normalize-loudness, or any other operation individually. Each returns a new derived asset.

transform / denoise

$ curl -X POST https://api.soundhalo.com/v1/transform/denoise \
    -H "Content-Type: application/json" \
    -H "X-Payment-Proof: <signed-tx>" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_a1b2c3d4","asset_token":"atok_x9y8z7"}},"options":{"strength":"medium"}}'

{
  "output_asset": {
    "asset_id": "ast_e5f6g7h8",
    "kind": "derived"
  },
  "summary": { "input_duration_sec": 1847.3 }
}
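
Because each transform returns a new derived asset, chaining steps means feeding one response into the next request. A rough Python sketch of that hand-off follows. It only prepares the request and does not send it. Both helper names are hypothetical, and reusing the original asset_token for the derived asset is an assumption based on the transcribe example in this guide.

```python
import json
import urllib.request

API_BASE = "https://api.soundhalo.com"

def build_transform_request(operation, asset_ref, options=None, payment_proof=None):
    """Prepare (but do not send) a POST to /v1/transform/<operation>."""
    body = {"input": {"asset_ref": asset_ref}}
    if options:
        body["options"] = options
    headers = {"Content-Type": "application/json"}
    if payment_proof is not None:
        headers["X-Payment-Proof"] = payment_proof
    return urllib.request.Request(
        f"{API_BASE}/v1/transform/{operation}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def derived_asset_ref(previous_ref, transform_response):
    """Reference the derived asset in the next call.

    Assumption: the derived asset is addressed with the original asset_token.
    """
    return {
        "asset_id": transform_response["output_asset"]["asset_id"],
        "asset_token": previous_ref["asset_token"],
    }

source_ref = {"asset_id": "ast_a1b2c3d4", "asset_token": "atok_x9y8z7"}
req = build_transform_request("denoise", source_ref, {"strength": "medium"}, "<signed-tx>")
# urllib.request.urlopen(req) would send it; left out because it needs a real payment proof.

# Sample response copied from the denoise example above.
denoise_response = {"output_asset": {"asset_id": "ast_e5f6g7h8", "kind": "derived"}}
next_ref = derived_asset_ref(source_ref, denoise_response)
print(req.full_url, next_ref["asset_id"])
```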

Transcribe to text

Call /v1/extract/transcribe for speech-to-text. Three providers are available (OpenAI Whisper, ElevenLabs Scribe v2, and Gemini Flash), with automatic fallback between them. For long files, request async mode by sending the X-Execution-Mode: async header, then poll GET /v1/jobs/:id for the job status.

extract / transcribe

$ curl -X POST https://api.soundhalo.com/v1/extract/transcribe \
    -H "Content-Type: application/json" \
    -H "X-Payment-Proof: <signed-tx>" \
    -d '{"input":{"asset_ref":{"asset_id":"ast_e5f6g7h8","asset_token":"atok_x9y8z7"}}}'

{
  "text": "Welcome to the show...",
  "segments": [
    { "start": 0.0, "end": 4.2, "text": "Welcome to the show..." }
  ],
  "detected_language": "en",
  "duration_sec": 1847.3
}
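
The segments array is the piece most callers want for captions or review. As an illustration, this Python sketch renders each segment as a timestamped line. The function names and the MM:SS.t format are choices made for this example, not something the API prescribes.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.t (minutes keep counting past 59)."""
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"

def render_segments(response: dict) -> list:
    """Return one '[start - end] text' line per transcript segment."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text']}"
        for s in response["segments"]
    ]

# Sample response copied from the transcribe example above.
response = {
    "text": "Welcome to the show...",
    "segments": [{"start": 0.0, "end": 4.2, "text": "Welcome to the show..."}],
    "detected_language": "en",
    "duration_sec": 1847.3,
}

for line in render_segments(response):
    print(line)
```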

Or use a workflow

Skip the manual steps. Workflows chain operations for you — /v1/workflows/prepare-for-transcription runs denoise, trim, normalize, and transcode in one call. Also available: prepare-for-podcast and run-recommended-pipeline.

workflows

# Agent workflow pseudocode

asset  = POST /v1/assets           ← free (upload)
report = POST /v1/inspect/preflight ← $0.005/req
quote  = POST /v1/quotes           ← free (estimate)
clean  = POST /v1/workflows/prepare-for-transcription ← varies
text   = POST /v1/extract/transcribe ← $0.05/min

# Total for 30 min audio: ~$1.85 USDC
# No API keys. No accounts. Just HTTP.
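
To see where the total comes from, you can work out the fixed-rate parts yourself. The sketch below uses the two rates shown in the pseudocode and treats the ~$1.85 figure as given. Since workflow pricing varies and the total is approximate, the remainder (about $0.345 for 30 minutes) is only what those figures imply, not a published workflow price.

```python
from decimal import Decimal

PREFLIGHT_PER_REQUEST = Decimal("0.005")  # $/req, from the pseudocode above
TRANSCRIBE_PER_MINUTE = Decimal("0.05")   # $/min, from the pseudocode above
QUOTED_TOTAL = Decimal("1.85")            # approximate total stated above

def fixed_rate_cost(minutes: int) -> Decimal:
    """One preflight request plus transcription; the workflow is excluded."""
    return PREFLIGHT_PER_REQUEST + TRANSCRIBE_PER_MINUTE * minutes

known = fixed_rate_cost(30)
implied_workflow_share = QUOTED_TOTAL - known
print("preflight + transcription:", known)
print("implied workflow share:", implied_workflow_share)
```

In practice, POST /v1/quotes is the place to get an exact estimate before processing.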

Payment

x402 payment flow

Every paid endpoint follows the same three-step flow. Your agent handles it automatically — no human interaction required.

Request

The agent sends a standard HTTP request to the endpoint. No auth headers are needed.

Quote

The server responds with 402 (Payment Required) and an exact USDC price. The agent evaluates the price and decides whether to proceed.

Pay & Process

The agent signs the payment and re-sends the request with the proof in the X-Payment-Proof header. The server verifies the payment, processes the audio, and returns the result.
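
The three steps can be sketched as a small client-side loop. This is a minimal illustration under stated assumptions, not the official client. The send and sign callables are supplied by the caller, the price_usdc field name in the 402 body is a guess (check the OpenAPI reference for the real shape), and fake_send stands in for the server so the example runs offline.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]

def call_with_payment(
    send: Callable[[Dict[str, str]], Response],
    sign: Callable[[dict], str],
    max_price_usdc: float,
) -> dict:
    """Run the request / quote / pay-and-process flow once.

    send(extra_headers) performs the HTTP call and returns (status, json_body).
    sign(quote) returns a signed payment proof for that quote.
    """
    status, body = send({})                    # 1. Request, no auth headers
    if status != 402:
        return body                            # free endpoint, nothing to pay
    price = float(body["price_usdc"])          # 2. Quote (assumed field name)
    if price > max_price_usdc:
        raise RuntimeError(f"quote of {price} USDC exceeds budget {max_price_usdc}")
    status, body = send({"X-Payment-Proof": sign(body)})  # 3. Pay & process
    if status >= 400:
        raise RuntimeError(f"paid request failed with status {status}")
    return body

def fake_send(extra_headers: Dict[str, str]) -> Response:
    """Stand-in for the server: quotes first, then returns a result once paid."""
    if "X-Payment-Proof" not in extra_headers:
        return 402, {"price_usdc": "0.0185"}
    return 200, {"output_asset": {"asset_id": "ast_e5f6g7h8", "kind": "derived"}}

result = call_with_payment(fake_send, lambda quote: "<signed-tx>", max_price_usdc=0.05)
print(result["output_asset"]["asset_id"])
```

Passing a budget keeps the "agent evaluates and decides" step explicit: a quote above max_price_usdc raises instead of paying.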

Reference

Endpoint overview

Method  Path                    Description                                                               Price
GET     /healthz                Service status check                                                      FREE
GET     /v1/catalog             All available operations with pricing                                     FREE
GET     /v1/pricing             Current rates for all operations                                          FREE
POST    /v1/quotes              Get cost estimate before processing                                       FREE
POST    /v1/assets              Upload, manage, download audio assets                                     FREE
GET     /v1/jobs/:id            Poll async job status                                                     FREE
POST    /v1/inspect/*           file, silence, clipping, noise, preflight, speakers, readiness            $0.001–0.005/req
POST    /v1/transform/*         denoise, trim-edges, remove-silences, change-speed, normalize, transcode  $0.003–0.01/min
POST    /v1/structure/*         Voice activity detection, speaker diarization                             $0.01–0.02/min
POST    /v1/extract/transcribe  Speech-to-text (OpenAI, ElevenLabs, Gemini)                               $0.05/min
POST    /v1/workflows/*         prepare-for-transcription, prepare-for-podcast, run-recommended           varies

The full API reference is available at api.soundhalo.com/v1/openapi.json.