Composable Audio Infrastructure

THE AUDIO UTILITY API

FOR AI AGENTS

Inspect, transform, transcribe, and structure audio with composable HTTP endpoints. Pay per minute via x402 — no API keys, no accounts required.


inspect / preflight

$ curl -X POST \
    api.soundhalo.com/v1/inspect/preflight \
    -d '{"input":{"asset_ref":{...}}}'

{
  "file_info": { "duration_sec": 324.7 },
  "issues": [
    {"type":"background_noise","severity":"high"}
  ],
  "transcription_readiness": 42,
  "recommended_pipelines":
    ["denoise","normalize","transcode"]
}

Built for agents.

Not audio editors.

The old way ▶ SoundHalo

Timeline UI editors ▶ HTTP endpoints
Monthly subscriptions ▶ Pay per minute
API keys & accounts ▶ x402, no account needed
Manual operations ▶ Fully autonomous
Monolithic tools ▶ Composable pipelines
Hidden costs ▶ Quoted pricing upfront

Endpoints

Five composable APIs

POST /v1/inspect/*

Inspect

Preflight analysis, noise estimation, clipping detection, silence mapping, transcription readiness scoring, and pipeline recommendation.

Pre-flight check before processing

POST /v1/transform/*

Transform

Denoise, normalize loudness, trim edges, remove long silences, change speed, and transcode for ASR.

Clean noisy recordings for transcription

POST /v1/structure/*

Structure

Voice Activity Detection and speaker diarization. Know who spoke and when.

Segment multi-speaker meetings

POST /v1/extract/*

Extract

Transcribe with OpenAI Whisper, ElevenLabs Scribe, or Gemini Flash. Timestamps, speaker labels, multi-language.

Generate transcripts from voice notes

POST /v1/workflows/*

Workflows

Pre-built multi-step pipelines. Prepare for transcription, optimize for podcast, or run the recommended pipeline from a preflight inspection.

Inspect → Transform → Structure → Extract
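The chaining idea above can be sketched as plain function composition. This is a minimal illustration, not SoundHalo's client: each step is modeled as a callable that takes an asset reference and returns a new one, the `fake_step` helper is hypothetical and makes no HTTP calls, and strings stand in for whatever an asset reference really looks like.

```python
from functools import reduce
from typing import Callable

# A step takes an asset reference and returns a new one. How SoundHalo
# represents asset references is not specified here, so plain strings
# stand in for them (assumption).
Step = Callable[[str], str]


def compose(*steps: Step) -> Step:
    """Chain steps left to right into a single callable."""
    def run(asset: str) -> str:
        return reduce(lambda current, step: step(current), steps, asset)
    return run


def fake_step(name: str) -> Step:
    """Hypothetical stand-in for one HTTP call; it only tags the asset."""
    return lambda asset: f"{asset}+{name}"


# Mirrors the recommended_pipelines order from the preflight sample.
prepare = compose(fake_step("denoise"), fake_step("normalize"), fake_step("transcode"))
print(prepare("asset"))
```

In a real agent, each step would wrap one endpoint call and pass the returned asset on to the next, so a pipeline stays a list of small, swappable pieces.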

How it works

Four steps. Zero config.

Upload

POST your audio file to /v1/assets. Get back an asset_id and token. Supports WAV, MP3, FLAC, OGG, and more.
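A minimal sketch of the upload step, using only the Python standard library. The host comes from the curl example on this page; the `https` scheme, the raw-bytes request body, and the `Content-Type` value are assumptions, since the page does not say whether `/v1/assets` expects a raw body or `multipart/form-data`. The request is built but not sent.

```python
import urllib.request

# Host taken from the curl example; the https scheme is an assumption.
BASE_URL = "https://api.soundhalo.com"


def build_upload_request(audio: bytes, content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/assets.

    Assumption: the endpoint accepts the raw file bytes as the body.
    It may instead expect multipart/form-data; check the OpenAPI spec.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/assets",
        data=audio,
        method="POST",
        headers={"Content-Type": content_type},
    )


req = build_upload_request(b"RIFF....WAVE")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the upload. The response is described as carrying an asset_id and token, but its exact field names are not shown on this page.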

Inspect

Run /v1/inspect/preflight to get a quality report, detected issues, readiness score, and recommended pipeline with cost quote.
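To make the report easier for an agent to act on, it can be parsed into typed records. The sketch below uses only the fields visible in the sample response near the top of this page; the cost quote is omitted because its field name is not shown, and the class names are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    type: str
    severity: str


@dataclass(frozen=True)
class PreflightReport:
    duration_sec: float
    issues: tuple[Issue, ...]
    transcription_readiness: int
    recommended_pipelines: tuple[str, ...]


def parse_preflight(raw: str) -> PreflightReport:
    """Parse a preflight JSON body into a typed report."""
    data = json.loads(raw)
    return PreflightReport(
        duration_sec=float(data["file_info"]["duration_sec"]),
        issues=tuple(Issue(i["type"], i["severity"]) for i in data.get("issues", [])),
        transcription_readiness=int(data["transcription_readiness"]),
        recommended_pipelines=tuple(data.get("recommended_pipelines", [])),
    )


# Sample body copied from the preflight example on this page.
SAMPLE = """
{
  "file_info": { "duration_sec": 324.7 },
  "issues": [ {"type": "background_noise", "severity": "high"} ],
  "transcription_readiness": 42,
  "recommended_pipelines": ["denoise", "normalize", "transcode"]
}
"""

report = parse_preflight(SAMPLE)
print(report.recommended_pipelines)
```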

Pay & Run

x402 handles payment inline. Your agent pays per minute or per request depending on the operation. No accounts.

Get Results

Receive structured JSON: clean audio assets, transcripts, diarization segments, or full pipeline output.

Use Cases

Built for real workflows

Support call processing

Ingest recorded support calls, diarize speakers, transcribe, and pipe to your CRM — fully automated.

Voice note cleanup

Denoise and normalize field recordings or voice memos before archiving or transcription.

Podcast production

Remove background noise, normalize levels, and generate timestamped transcripts for show notes.

Interview transcription

Diarize speakers, transcribe with timestamps, and deliver structured interview data to your pipeline.

Quality pre-check

Inspect uploaded audio for issues before expensive processing. Reject or flag bad recordings early.
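One way an agent might turn a preflight report into a reject/flag/accept decision is sketched below. The policy, the threshold of 50, and the assumption that readiness is scored from 0 to 100 are all illustrative; only the field names come from the sample response on this page.

```python
def triage(report: dict, min_readiness: int = 50) -> str:
    """Return "accept", "flag", or "reject" for a preflight report.

    The threshold and policy are illustrative assumptions, not
    SoundHalo recommendations.
    """
    severities = {issue.get("severity") for issue in report.get("issues", [])}
    readiness = report.get("transcription_readiness", 0)
    if readiness < min_readiness and not report.get("recommended_pipelines"):
        return "reject"  # poor audio and no suggested fix
    if readiness < min_readiness or "high" in severities:
        return "flag"  # clean up before running expensive steps
    return "accept"


sample = {
    "issues": [{"type": "background_noise", "severity": "high"}],
    "transcription_readiness": 42,
    "recommended_pipelines": ["denoise", "normalize", "transcode"],
}
print(triage(sample))
```

Running the cheap per-request inspection first means the per-minute steps are only paid for on recordings that pass this gate.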

Pricing

Pay per minute, or per request for Inspect. That's it.

Inspect

$0.001 / req

Quality analysis, noise estimation, clipping, silence, pipeline recommendation

Transform

$0.003 / min

Denoise, normalize, trim, speed change, transcode

Structure

$0.01 / min

Voice activity detection and speaker diarization

Transcribe

$0.05 / min

Speech-to-text with timestamps, multi-provider fallback

x402 protocol · no accounts required · cost quoted before execution
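The rates above make it possible to estimate a bill before asking for a quote. The sketch below assumes fractional minutes are billed pro rata and that each listed operation is charged once at its tier rate; neither rule is stated on this page, so the quote returned by the API remains the authoritative figure.

```python
from decimal import Decimal

# Rates copied from the pricing list above (USD).
INSPECT_PER_REQUEST = Decimal("0.001")
PER_MINUTE = {
    "transform": Decimal("0.003"),
    "structure": Decimal("0.01"),
    "transcribe": Decimal("0.05"),
}


def estimate_cost(duration_sec: float, operations: list[str], inspections: int = 1) -> Decimal:
    """Estimate USD cost, assuming fractional minutes are billed pro rata."""
    minutes = Decimal(str(duration_sec)) / Decimal(60)
    per_minute_total = sum((PER_MINUTE[op] for op in operations), Decimal(0))
    return INSPECT_PER_REQUEST * inspections + per_minute_total * minutes


# One minute of audio through one inspection plus all three per-minute tiers.
print(estimate_cost(60, ["transform", "structure", "transcribe"]))
```

`Decimal` is used instead of `float` so that sub-cent amounts add up exactly.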

Edge Network: Global low-latency

Auto-scaling: Heavy processing

Secure Storage: Encrypted at rest

99.98% Uptime SLA

OpenAPI: Full spec

x402 Payments: No accounts needed

FAQ

SoundHalo is designed as HTTP-native infrastructure that AI agents can call autonomously. No UI, no manual steps — just clean endpoints that return structured JSON. Agents can inspect audio, decide on a processing pipeline, pay via x402, and get results without human intervention.

SoundHalo also works without an agent. Any HTTP client will do: cURL, Postman, or your backend code. The API is designed for agents but built on standard HTTP. If you can make a POST request, you can use SoundHalo.

We use the x402 payment protocol. Your agent pays per request with stablecoins, inline with the HTTP call. No signup, no billing dashboard, no key management. Payment is the authentication.

x402 extends HTTP with a native payment layer. When your agent hits a paid endpoint, it receives a 402 response with a price quote. The agent pays and re-sends the request. It's like HTTP auth, but with money — purpose-built for autonomous agent commerce.
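The request, 402, pay, re-send cycle described above can be sketched as a small retry loop. This is an illustration under stated assumptions, not SoundHalo's or x402's reference client: the transport and the payment step are injected as callables, the `X-PAYMENT` header name should be confirmed against the x402 specification, and `fake_send` is a hypothetical in-memory server used only to exercise the loop.

```python
from typing import Callable, Mapping, NamedTuple


class Response(NamedTuple):
    status: int
    headers: Mapping[str, str]
    body: bytes


# send(extra_headers) performs one HTTP request and returns the response.
Send = Callable[[dict], Response]
# pay(quote_body) settles the quote and returns a payment proof string.
Pay = Callable[[bytes], str]


def call_with_payment(send: Send, pay: Pay, payment_header: str = "X-PAYMENT") -> Response:
    """Send a request; on 402, pay the quoted price and re-send once."""
    first = send({})
    if first.status != 402:
        return first
    proof = pay(first.body)  # the 402 body is assumed to carry the quote
    second = send({payment_header: proof})
    if second.status == 402:
        raise RuntimeError("payment was not accepted")
    return second


def fake_send(headers: dict) -> Response:
    """Hypothetical server: answers 402 until a matching proof arrives."""
    if headers.get("X-PAYMENT") == "proof-for:quote":
        return Response(200, {}, b'{"ok": true}')
    return Response(402, {}, b"quote")


result = call_with_payment(fake_send, lambda quote: "proof-for:" + quote.decode())
print(result.status)
```

Retrying only once keeps a misbehaving endpoint from draining funds; an agent would typically also compare the quote against a budget before calling `pay`.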

SoundHalo is not an audio editor. It is infrastructure, not a creative tool. There's no timeline, no waveform view, no drag-and-drop. It's a set of composable HTTP endpoints for programmatic audio processing: inspect, transform, transcribe, and structure.