Composable Audio Infrastructure
Inspect, transform, transcribe, and structure audio with composable HTTP endpoints. Pay per minute via x402 — no API keys, no accounts required.
$ curl -X POST \
    api.soundhalo.com/v1/inspect/preflight \
    -d '{"input":{"asset_ref":{...}}}'

{
  "file_info": { "duration_sec": 324.7 },
  "issues": [
    {"type":"background_noise","severity":"high"}
  ],
  "transcription_readiness": 42,
  "recommended_pipelines": ["denoise","normalize","transcode"]
}

Endpoints
POST /v1/inspect/*
Preflight analysis, noise estimation, clipping detection, silence mapping, transcription readiness scoring, and pipeline recommendation.
Pre-flight check before processing
POST /v1/transform/*
Denoise, normalize loudness, trim edges, remove long silences, change speed, and transcode for ASR.
Clean noisy recordings for transcription
POST /v1/structure/*
Voice Activity Detection and speaker diarization. Know who spoke and when.
Segment multi-speaker meetings
POST /v1/extract/*
Transcribe with OpenAI Whisper, ElevenLabs Scribe, or Gemini Flash. Timestamps, speaker labels, multi-language.
Generate transcripts from voice notes
POST /v1/workflows/*
Pre-built multi-step pipelines. Prepare for transcription, optimize for podcast, or run the recommended pipeline from a preflight inspection.
How it works
POST your audio file to /v1/assets. Get back an asset_id and token. Supports WAV, MP3, FLAC, OGG, and more.
Run /v1/inspect/preflight to get a quality report, detected issues, readiness score, and recommended pipeline with cost quote.
x402 handles payment inline. Your agent pays per minute or per request depending on the operation. No accounts.
Receive structured JSON: clean audio assets, transcripts, diarization segments, or full pipeline output.
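The first two steps above (upload, then preflight) can be sketched as a small Python client. This is an illustration under stated assumptions, not SoundHalo's official SDK: the https scheme, the raw-bytes upload body, and the idea that asset_ref is built from the returned asset_id and token are all guesses, since the page elides the contents of asset_ref. The helper names (upload_asset, preflight, inspect_upload, urllib_post) are hypothetical. Payment handling is left out here.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Host taken from the curl example; the https scheme is an assumption.
BASE_URL = "https://api.soundhalo.com"

# A transport POSTs `body` to `url` with `headers` and returns the parsed JSON reply.
# Injecting it keeps the sketch independent of any particular HTTP library.
Transport = Callable[[str, bytes, Dict[str, str]], Dict[str, Any]]


def urllib_post(url: str, body: bytes, headers: Dict[str, str]) -> Dict[str, Any]:
    """One possible transport, built on the standard library."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def upload_asset(post: Transport, audio: bytes, content_type: str = "audio/wav") -> Dict[str, Any]:
    """Step 1: POST the audio to /v1/assets and keep the returned reference."""
    # Assumption: the file is sent as the raw request body (it may be multipart instead).
    reply = post(f"{BASE_URL}/v1/assets", audio, {"Content-Type": content_type})
    # Assumption: the reply carries fields literally named asset_id and token.
    return {"asset_id": reply["asset_id"], "token": reply["token"]}


def preflight(post: Transport, asset_ref: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: request the quality report for an uploaded asset."""
    body = json.dumps({"input": {"asset_ref": asset_ref}}).encode("utf-8")
    return post(f"{BASE_URL}/v1/inspect/preflight", body, {"Content-Type": "application/json"})


def inspect_upload(post: Transport, audio: bytes) -> Dict[str, Any]:
    """Run both steps and return the fields an agent would act on."""
    asset_ref = upload_asset(post, audio)
    report = preflight(post, asset_ref)
    return {
        "asset_ref": asset_ref,
        "readiness": report["transcription_readiness"],
        "pipelines": report["recommended_pipelines"],
    }
```

Calling inspect_upload(urllib_post, audio_bytes) would issue the two requests in order. Passing a stub transport instead makes the flow easy to exercise offline.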
Use Cases
Ingest recorded support calls, diarize speakers, transcribe, and pipe to your CRM — fully automated.
Denoise and normalize field recordings or voice memos before archiving or transcription.
Remove background noise, normalize levels, and generate timestamped transcripts for show notes.
Diarize speakers, transcribe with timestamps, and deliver structured interview data to your pipeline.
Inspect uploaded audio for issues before expensive processing. Reject or flag bad recordings early.
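The early-rejection use case can be sketched as a small triage function over a preflight report. The field names come from the sample response at the top of the page. Everything else is an assumption made for illustration: the 0 to 100 readiness scale, the threshold of 60, the severity values other than "high", and the three decision labels.

```python
from typing import Any, Dict, List

# Only "high" appears in the sample response; "low" and "medium" are assumed values.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def triage(report: Dict[str, Any], min_readiness: int = 60) -> Dict[str, Any]:
    """Decide whether to accept, clean, or reject a recording before paid processing."""
    readiness = report.get("transcription_readiness", 0)
    issues: List[Dict[str, Any]] = report.get("issues", [])
    pipelines: List[str] = report.get("recommended_pipelines", [])

    # Collect the issue types whose severity is "high".
    blocking = [
        issue["type"]
        for issue in issues
        if SEVERITY_RANK.get(issue.get("severity"), 0) >= SEVERITY_RANK["high"]
    ]

    if readiness >= min_readiness and not blocking:
        decision = "accept"        # good enough to transcribe as-is
    elif pipelines:
        decision = "clean_first"   # run the recommended pipelines, then re-inspect
    else:
        decision = "reject"        # poor quality and nothing recommended to fix it

    return {"decision": decision, "blocking_issues": blocking, "pipelines": pipelines}
```

With the sample response shown earlier (readiness 42, one high-severity background_noise issue, three recommended pipelines), this sketch returns the decision "clean_first".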
Pricing
Inspect: quality analysis, noise estimation, clipping, silence, pipeline recommendation
Transform: denoise, normalize, trim, speed change, transcode
Structure: voice activity detection and speaker diarization
Extract: speech-to-text with timestamps, multi-provider fallback
x402 protocol · no accounts required · cost quoted before execution
SoundHalo is designed as HTTP-native infrastructure that AI agents can call autonomously. No UI, no manual steps — just clean endpoints that return structured JSON. Agents can inspect audio, decide on a processing pipeline, pay via x402, and get results without human intervention.
SoundHalo also works without an agent. Any HTTP client will do: cURL, Postman, your backend code. The API is designed for agents but built on standard HTTP. If you can make a POST request, you can use SoundHalo.
We use the x402 payment protocol. Your agent pays per request with stablecoins, inline with the HTTP call. No signup, no billing dashboard, no key management. Payment is the authentication.
x402 extends HTTP with a native payment layer. When your agent hits a paid endpoint, it receives a 402 response with a price quote. The agent pays and re-sends the request. It's like HTTP auth, but with money — purpose-built for autonomous agent commerce.
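The quote, pay, and re-send loop described above can be sketched in Python as follows. This is a minimal illustration, not the x402 reference client: the X-PAYMENT header name is an assumption that should be checked against the x402 specification, and the send and pay callables are hypothetical stand-ins for a real HTTP client and a real wallet.

```python
from typing import Callable, Dict, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Dict[str, str], bytes]

# send(extra_headers) performs the request and returns the response.
Send = Callable[[Dict[str, str]], Response]

# pay(quote_body) settles the quoted price and returns a proof string for the retry.
Pay = Callable[[bytes], str]

# Assumption: name of the header that carries the payment proof.
PAYMENT_HEADER = "X-PAYMENT"


def call_with_payment(send: Send, pay: Pay, max_payments: int = 1) -> Response:
    """Send a request; on 402, pay the quoted price and re-send with the proof."""
    status, headers, body = send({})
    payments = 0
    # Cap the number of payments so a misbehaving server cannot drain the wallet.
    while status == 402 and payments < max_payments:
        proof = pay(body)  # the 402 body is treated as the price quote
        status, headers, body = send({PAYMENT_HEADER: proof})
        payments += 1
    return status, headers, body
```

If the server still answers 402 after the allowed number of payments, the function returns that 402 response so the caller can decide what to do.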
SoundHalo is not an audio editor. It is infrastructure, not a creative tool. There's no timeline, no waveform view, no drag-and-drop. It's a set of composable HTTP endpoints for programmatic audio processing: inspect, transform, transcribe, and structure.