# Vocametrix API Reference

_Last updated: 2026-05-08_

Base URL: `https://platform.vocametrix.com`

This document is the canonical AI-readable mirror of https://www.vocametrix.com/api-docs.
It is generated at build time from the same source data the interactive page consumes.

## Authentication

All endpoints require an API key passed in the `X-API-Key` header:

```
X-API-Key: your-api-key
```

Get a key from https://www.vocametrix.com/registration. See https://www.vocametrix.com/pricing for tier limits.

## Audio Upload Pattern

Most analysis endpoints (Pronunciation, Speech-to-Text, etc.) follow a three-step pattern:

1. `POST /api/get-blob-url` — receive a temporary signed URL (`uploadURL`) and a reference URL (`blobURL`), valid 1 hour.
2. `PUT <uploadURL>` — upload the audio file directly to Azure Blob Storage with headers `x-ms-blob-type: BlockBlob` and `Content-Type: audio/wav`.
3. `POST /api/<analysis-endpoint>` — pass the `blobURL` plus per-endpoint parameters to run the analysis.

AVQI uses an alternative pattern via `POST /api/assignFileId` returning a `fileId` consumed by `GET /api/calculate-avqi`.
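
The three-step pattern above can be sketched in Python using only the standard library. The function names here are illustrative, not part of an official SDK; only the endpoints, headers, and JSON fields come from this document:

```python
import json
import urllib.request

API_BASE = "https://platform.vocametrix.com"

def blob_put_headers(content_type: str = "audio/wav") -> dict:
    """Headers Azure Blob Storage requires for the direct PUT in step 2."""
    return {"x-ms-blob-type": "BlockBlob", "Content-Type": content_type}

def upload_and_analyze(api_key: str, wav_path: str, endpoint: str, params: dict) -> dict:
    """Run the three-step pattern: get-blob-url -> PUT audio -> analysis POST."""
    # Step 1: request a signed upload URL and a reference blob URL (valid 1 hour).
    req = urllib.request.Request(f"{API_BASE}/api/get-blob-url", method="POST",
                                 headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        urls = json.load(resp)
    # Step 2: PUT the raw audio bytes directly to Azure Blob Storage.
    with open(wav_path, "rb") as f:
        put = urllib.request.Request(urls["uploadURL"], data=f.read(),
                                     method="PUT", headers=blob_put_headers())
        urllib.request.urlopen(put)
    # Step 3: call the analysis endpoint with the blobURL plus its parameters.
    body = json.dumps({"blobURL": urls["blobURL"], **params}).encode()
    post = urllib.request.Request(f"{API_BASE}/api/{endpoint}", data=body, method="POST",
                                  headers={"X-API-Key": api_key,
                                           "Content-Type": "application/json"})
    with urllib.request.urlopen(post) as resp:
        return json.load(resp)
```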

## Table of Contents

1. Pronunciation Assessment API
2. Speech to Text API
3. Text to Speech API
4. Acoustic Voice Quality Index (AVQI) Calculator API
5. Dysphonia Severity Index (DSI) Calculator API
6. Cepstral Peak Prominence (CPP) Calculator API
7. Multi-band Harmonics-to-Noise Ratio (HNR) Calculator API
8. Advanced Spectral Measures Calculator API
9. Formant Statistics Analysis (F1-F4) Calculator API
10. S/Z Ratio Analysis Calculator API
11. Glottal-to-Noise Excitation Ratio (GNE) Calculator API
12. H1*-H2* Voice Source Analysis API
13. Voice Range Profile (VRP) Analysis
14. Jitter & Shimmer Perturbation Analysis API v.03 (Scientifically Validated)
15. Extended Acoustic Calculators (ABI, Voice Dynamics, Prosody Similarity)
16. Audio Measures (Sound Level, eGeMAPS)
17. Phoneme & Audio Classification (Phonemes-Live, Stuttering, Estonian Vowel)
18. Speech Coaching Analysis API
19. Therapy Plan Generator API (Python workflow)
20. AI Agents

---
## Pronunciation Assessment API

**Group:** Core Speech Analysis

Advanced speech analysis that provides comprehensive pronunciation scoring with multiple recognition fields and detailed scoring metrics. Supports 30+ language locales with detailed phoneme-level feedback.

### Endpoints

#### POST https://platform.vocametrix.com/api/get-blob-url

Get a secure URL for uploading audio files to Azure Blob Storage

**Parameters:**

_(none)_

**Response:**

- `uploadURL` — Secure URL for uploading the audio file (1 hour validity)
- `blobURL` — Secure URL for accessing the uploaded file (1 hour validity)

#### POST https://platform.vocametrix.com/api/pronunciation-assessment

Perform pronunciation assessment on uploaded audio

**Parameters:**

- `blobURL` — The secure URL received from get-blob-url endpoint
- `referenceText` — The text to compare against the audio
- `locale` — The language locale (e.g., en-US, fr-FR)

**Response:**

- `accuracyScore` — Phoneme-level pronunciation accuracy score (0-100)
- `fluencyScore` — Natural speech flow and timing score (0-100)
- `completenessScore` — Coverage of reference text score (0-100)
- `prosodyScore` — Speech qualities assessment score (0-100)
- `pronScore` — Overall weighted pronunciation score
- `error` — Error details with specific rejection reasons if applicable

#### POST https://platform.vocametrix.com/api/pronunciation-assessment-with-pitch

Pronunciation assessment enriched with per-word pitch contours. Uses the same audio upload and Azure pronunciation evaluation as the standard endpoint, plus a parselmouth pass that samples F0 over each word's timespan. Audio files longer than 30 s are processed via a continuous chunked path. The blob is auto-deleted from Azure after the response, and audio usage is recorded against the API key.

**Parameters:**

- `blobURL` — The secure URL received from /api/get-blob-url
- `referenceText` — The text to compare against the audio
- `locale` — The language locale (e.g., en-US, fr-FR)

**Response:**

- `RecognitionStatus` — Azure recognition status (e.g., "Success")
- `Offset` — Recognition start offset in 100-nanosecond units
- `Duration` — Recognition duration in 100-nanosecond units
- `DisplayText` — The recognized text with proper formatting
- `NBest` — Array of N-best recognition hypotheses. Each NBest[i] contains the standard Azure pronunciation fields (Confidence, Lexical, ITN, MaskedITN, Display, PronunciationAssessment, Words[]). Each NBest[i].Words[j] carries the standard Azure fields (Word, Offset, Duration, PronunciationAssessment {AccuracyScore, ErrorType}, Phonemes[], Syllables[]) and is augmented with an additional `Pitch` array containing F0 samples (Hz) measured over the word's timespan via parselmouth.
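
A sketch of consuming the augmented response, reducing each word's `Pitch` array to a mean F0. The response shape follows the field description above; the helper name is illustrative:

```python
def mean_pitch_per_word(response: dict) -> dict:
    """Map each recognized word to its mean F0 (Hz) from the augmented Pitch array.
    Words with an empty Pitch array (e.g. unvoiced stretches) map to None."""
    out = {}
    best = response["NBest"][0]  # top hypothesis
    for word in best["Words"]:
        pitch = word.get("Pitch") or []
        out[word["Word"]] = sum(pitch) / len(pitch) if pitch else None
    return out
```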

### Capabilities

**Text Recognition Fields:**
- Raw recognized words
- Inverse-text-normalized form with number/abbreviation transformations
- ITN form with optional profanity masking
- Enhanced text with punctuation and capitalization

**Comprehensive Scoring:**
- AccuracyScore: Phoneme-level pronunciation accuracy (0-100)
- FluencyScore: Natural use of silent breaks and timing (0-100)
- CompletenessScore: Pronunciation coverage of reference text (0-100)
- ProsodyScore: Speech qualities - stress, intonation, rhythm (0-100)
- PronScore: Weighted combination overall score

**Language Support:**
- 30+ locales across multiple languages
- Region-specific accent recognition
- Detailed phoneme-level analysis (en-US only)
- Language-specific pronunciation rules

**Error Handling & Logging:**
- Comprehensive error tracking and reporting
- Detailed API usage logs with duration tracking
- Automatic file cleanup after processing
- Built-in validation for file duration and API limits

### Example (cURL)

```bash
# Step 1: Get upload URL
curl -X POST "https://platform.vocametrix.com/api/get-blob-url" \
  -H "X-API-Key: your-api-key"

# Step 2: Upload audio file
curl -X PUT "<UPLOAD_URL>" \
  -H "x-ms-blob-type: BlockBlob" \
  -H "Content-Type: audio/wav" \
  --data-binary "@./speech.wav"

# Step 3: Get assessment
curl -X POST "https://platform.vocametrix.com/api/pronunciation-assessment" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "blobURL": "received-blob-url",
    "referenceText": "Hello world",
    "locale": "en-US"
  }'
```

---

## Speech to Text API

**Group:** Core Speech Analysis

High-accuracy asynchronous speech recognition with detailed word-level confidence scores and timestamps. Supports batch processing of large audio files and 30+ language locales.

### Endpoints

#### POST https://platform.vocametrix.com/api/get-blob-url

Get a secure URL for uploading audio files to Azure Blob Storage

**Parameters:**

_(none)_

**Response:**

- `uploadURL` — Secure URL for uploading the audio file (1 hour validity)
- `blobURL` — Secure URL for accessing the uploaded file (1 hour validity)

#### POST https://platform.vocametrix.com/api/offline-speech-to-text

Submit an uploaded audio file for asynchronous transcription. Returns immediately with a transcription job ID — the actual transcript is delivered later via Server-Sent Events on /api/transcription-progress/:transcriptionId.

**Parameters:**

- `blobURL` — The secure URL received from get-blob-url endpoint
- `locale` — The language locale (e.g., en-US, fr-FR)

**Response:**

- `success` — Boolean indicating the job was accepted
- `transcriptionId` — Job identifier — pass this to /api/transcription-progress/:transcriptionId to subscribe to progress and final result
- `message` — Human-readable status message confirming the job was queued

#### GET https://platform.vocametrix.com/api/transcription-progress/:transcriptionId

Server-Sent Events (SSE) stream delivering transcription progress events and the final transcript. The terminal event has status "Succeeded" (note the capitalization) with a contentUrl pointing to the JSON transcript, or status "failed" with an error message.

- **Authentication:** this endpoint does NOT accept the X-API-Key header — the browser EventSource API cannot send custom headers, so authentication is via query string: pass `?apiKey=YOUR_KEY`. Sending an X-API-Key header alone returns 401.
- **Processing time:** server-side transcription runs at ≈0.3–0.65× realtime (a 60-minute audio file takes 20–40 minutes of server processing; a 4-hour file takes 75–155 minutes). Use this rule of thumb for client/proxy timeouts: timeout_minutes = audio_duration_minutes × 0.7 + 30 min buffer (so a 1 h file ≥ 70 min, a 4 h file ≥ 220 min). Reverse proxies (nginx/App Service/CloudFront) typically have 60–300 s idle timeouts by default — raise yours to ≥ 4 hours for hour-long content.
- **Audio limits:** Azure Speech (the underlying engine) caps each transcription at ~4 hours per file. For longer recordings, split the audio before upload.

**Parameters:**

- `transcriptionId` — Path parameter — job identifier returned by /api/offline-speech-to-text.
- `apiKey` — REQUIRED query string parameter (e.g. ?apiKey=YOUR_KEY). The X-API-Key header is ignored on this endpoint due to EventSource limitations.

**Response:**

- `status` — Job state. Values are mixed-case and not normalized — handle case-insensitively. Observed in production: "polling", "Running", "Succeeded" (terminal success — note the capitalization), "failed" (terminal failure). Anything other than Succeeded/failed (case-insensitive) means in-progress; keep listening.
- `progress` — Optional integer 0–100 indicating processing progress (not always emitted).
- `contentUrl` — Present only on the terminal "Succeeded" event — URL to fetch the JSON transcript (word-level timestamps and confidence).
- `error` — Present only on the terminal "failed" event — error message describing why processing aborted.
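
The timeout rule of thumb and the case-insensitive status handling described above can be captured in two small helpers (names are illustrative):

```python
def recommended_timeout_minutes(audio_minutes: float) -> float:
    """Client/proxy timeout rule of thumb: 0.7x the audio length plus a 30-minute buffer."""
    return audio_minutes * 0.7 + 30

def classify_status(status: str) -> str:
    """Normalize the mixed-case status values ("polling", "Running", "Succeeded",
    "failed") into a stable state. Anything non-terminal means keep listening."""
    s = status.lower()
    if s == "succeeded":
        return "success"
    if s == "failed":
        return "failure"
    return "in_progress"
```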

### Capabilities

**Text Recognition Fields:**
- Raw recognized words with confidence scores
- Inverse-text-normalized form with number/abbreviation transformations
- Cleaned display text with proper capitalization and punctuation
- Detailed word-level timestamp information

**Analysis Capabilities:**
- Word-level confidence scoring (0-100)
- Precise word timing with millisecond accuracy
- High accuracy across multiple accents
- Noise-resistant recognition algorithms

**Language Support:**
- 30+ locales across multiple languages
- Region-specific dialect recognition
- Support for mixed-language content

### Example (cURL)

```bash
# Step 1: Get upload URL
curl -X POST "https://platform.vocametrix.com/api/get-blob-url" \
  -H "X-API-Key: your-api-key"
# Response: {"uploadURL":"<UPLOAD_URL>","blobURL":"<BLOB_URL>"}

# Step 2: Upload audio file directly to Azure Blob Storage
curl -X PUT "<UPLOAD_URL>" \
  -H "x-ms-blob-type: BlockBlob" \
  -H "Content-Type: audio/wav" \
  --data-binary "@./speech.wav"

# Step 3: Submit transcription job (returns immediately with a transcriptionId)
curl -X POST "https://platform.vocametrix.com/api/offline-speech-to-text" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "blobURL": "<BLOB_URL>", "locale": "en-US" }'
# Response: {"success":true,"transcriptionId":"abc123","message":"Transcription queued"}

# Step 4: Subscribe to Server-Sent Events for progress + final result.
# IMPORTANT: this endpoint authenticates via QUERY STRING (?apiKey=...).
# The X-API-Key header is ignored here due to browser EventSource limitations.
# Status values are case-mixed in practice ("polling", "Running", "Succeeded", "failed").
# Terminal SUCCESS = "Succeeded" (NOT "completed"). Compare case-insensitively.
# Processing rate: ≈0.3-0.65× realtime. Rule: timeout = audio_minutes × 0.7 + 30 min buffer.
#   1 h audio  → keep open ≥ 70 min   (--max-time 4200)
#   4 h audio  → keep open ≥ 220 min  (--max-time 13200)
# Azure caps each file at ~4 h — split longer recordings before upload.
curl -N --max-time 13200 -X GET \
  "https://platform.vocametrix.com/api/transcription-progress/abc123?apiKey=your-api-key" \
  -H "Accept: text/event-stream"

# Step 5: Once a "Succeeded" event arrives, fetch the JSON transcript from contentUrl
curl -X GET "<CONTENT_URL>" -H "X-API-Key: your-api-key"
```
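
For non-browser clients, the SSE stream can be fetched with a plain HTTP request and parsed with a few lines of code. This sketch assumes each `data:` line carries one JSON object with the fields described above; the helper names are illustrative:

```python
import json

def parse_sse_events(stream_text: str) -> list:
    """Parse a text/event-stream payload into a list of JSON event dicts.
    Events are separated by blank lines; each 'data:' line carries one JSON object."""
    events = []
    for block in stream_text.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                payload = line[len("data:"):].strip()
                if payload:
                    events.append(json.loads(payload))
    return events

def final_event(events: list):
    """Return the first terminal event (Succeeded/failed, case-insensitive), or None."""
    for ev in events:
        if str(ev.get("status", "")).lower() in ("succeeded", "failed"):
            return ev
    return None
```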

---

## Text to Speech API

**Group:** Core Speech Analysis

High-quality neural voice synthesis with customizable voice styles, speaking rates, and natural prosody. Supports 30+ language locales with premium neural voices.

### Endpoints

#### POST https://platform.vocametrix.com/api/text-to-speech

Convert text to high-quality speech using neural voice synthesis

**Parameters:**

- `text` — The text to convert to speech (max 1000 characters)
- `voice` — Voice name (optional, default: en-US-AriaNeural)
- `style` — Speaking style (optional, default: friendly)
- `rate` — Speaking rate (optional, default: 1.0)
- `language` — Language locale (optional, default: en-US)

**Response:**

- `success` — Boolean indicating if synthesis was successful
- `audio` — Base64 encoded audio data
- `format` — Audio format (wav)
- `voice` — Voice used for synthesis
- `style` — Speaking style applied
- `textLength` — Length of input text processed
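
Since the `audio` field is base64-encoded, a typical client decodes it and writes the raw bytes to a file. A minimal sketch (the helper name is illustrative):

```python
import base64

def save_tts_audio(response: dict, path: str) -> int:
    """Decode the base64 'audio' field from /api/text-to-speech and write it to disk.
    Returns the number of bytes written."""
    audio_bytes = base64.b64decode(response["audio"])
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```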

#### POST https://platform.vocametrix.com/api/text-to-speech/generate-with-timing

Synthesize speech via ElevenLabs (model: eleven_multilingual_v2) and return per-character timing alongside the audio. Ideal for karaoke-style highlighting, lip-sync, or precise word-by-word playback control. Voice and provider key are server-side.

**Parameters:**

- `text` — Text to synthesize (1–2500 characters). REQUIRED.
- `isSSML` — Boolean (optional, default false). Currently accepted but not applied to the request body — flag is reserved for future SSML support.

**Response:**

- `success` — Boolean — true on a successful synthesis
- `audio_base64` — Base64-encoded MP3 audio (ElevenLabs default format)
- `alignment` — Object with arrays of equal length: `{ characters: string[], character_start_times_seconds: number[], character_end_times_seconds: number[] }`. Each i-th entry gives the start/end time in seconds of the i-th character of the synthesized audio.
- `normalized_alignment` — Same shape as `alignment`, but computed against the post-text-normalization sequence (numbers expanded, abbreviations expanded, etc.) — use this when your highlighter must follow what was actually pronounced, not the literal input text.
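
A common use of `alignment` is collapsing per-character times into word-level timings for highlighting. A sketch that splits on whitespace in the aligned character sequence (the helper name is illustrative):

```python
def word_timings(alignment: dict) -> list:
    """Collapse per-character timing into (word, start_s, end_s) tuples by
    splitting on whitespace characters in the aligned character sequence."""
    words, current, start, prev_end = [], "", None, None
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]
    for ch, s, e in zip(chars, starts, ends):
        if ch.isspace():
            if current:  # close out the word we were building
                words.append((current, start, prev_end))
            current, start = "", None
        else:
            if not current:
                start = s  # first character of a new word
            current += ch
            prev_end = e
    if current:  # flush the trailing word
        words.append((current, start, prev_end))
    return words
```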

### Capabilities

**Voice Options:**
- Premium neural voices with natural intonation
- Multiple voice styles (friendly, cheerful, sad, angry, etc.)
- Adjustable speaking rate and pitch control
- Gender and age-specific voice selection

**Audio Output:**
- High-quality 24kHz 16-bit mono PCM format
- Base64 encoded audio for easy integration
- Instant audio generation and delivery
- Crystal clear speech with natural prosody

**Language Support:**
- 30+ locales across multiple languages
- Native pronunciation for each locale
- Region-specific accent variations

### Example (cURL)

```bash
# Convert text to speech
curl -X POST "https://platform.vocametrix.com/api/text-to-speech" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of the text-to-speech API.",
    "voice": "en-US-AriaNeural",
    "style": "friendly",
    "rate": "1.0",
    "language": "en-US"
  }'
```

---

## Acoustic Voice Quality Index (AVQI) Calculator API

**Group:** Voice Quality Metrics

AVQI yields a composite score from 0 (normal voice) to 10 (severely dysphonic voice), with pathological thresholds validated for multiple languages. The index combines sustained vowel /a/ and continuous speech recordings, using six acoustic parameters to provide clinically validated voice quality assessment for speech pathology and voice disorder evaluation.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-avqi

Calculate AVQI metrics from connected speech and sustained vowel audio recordings

**Parameters:**

- `csFileId` — ID of the connected speech audio file (previously uploaded) - REQUIRED
- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `version` — AVQI algorithm version: "v02.03" or "v03.01" - REQUIRED

**Response:**

- `AVQI` — Acoustic Voice Quality Index - final composite score (0-10 scale, dimensionless). Values above language-specific threshold indicate potential voice pathology
- `CPPS` — Smoothed Cepstral Peak Prominence measuring harmonic structure (dB). Higher values indicate better periodicity and voice quality
- `HNR` — Mean Harmonics-to-Noise Ratio across concatenated CS+SV signal (dB). Higher values indicate less noise relative to harmonic content
- `Shimmer` — Period-to-period amplitude variation expressed as percentage (%). Lower values indicate more stable amplitude control
- `Shimmer_dB` — Period-to-period amplitude variation in logarithmic scale (dB). Lower values indicate better amplitude stability
- `LTAS_slope` — Spectral slope between 0-1kHz and 1-10kHz frequency ranges (dB/octave). More negative values indicate steeper spectral roll-off
- `LTAS_tilt` — Spectral tilt of trend line through long-term average spectrum 1-10kHz (dB/octave). Indicates overall spectral balance and voice quality

#### POST https://platform.vocametrix.com/api/assignFileId

Upload audio files for AVQI analysis

**Parameters:**

- `audio` — file - Audio file for analysis (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-avqi endpoint

### Capabilities

**Comprehensive Voice Quality Metrics:**
- AVQI Score: Final composite index (0-10 scale) - higher values indicate more severe dysphonia
- CPPS: Smoothed Cepstral Peak Prominence measuring harmonic structure (dB)
- HNR: Harmonics-to-Noise Ratio across concatenated signal (dB)
- Shimmer Local: Period-to-period amplitude variation (%)
- Shimmer Local dB: Amplitude variation in logarithmic scale (dB)
- LTAS Slope: Spectral slope between 0-1kHz and 1-10kHz (dB/octave)
- LTAS Tilt: Spectral tilt trend line from 1-10kHz (dB/octave)

**Validated Algorithm Versions & Thresholds:**
- v02.03 (Maryn et al. 2010): English (3.29), Dutch (2.95), German (2.70), Turkish (2.89)
- v03.01 (Barsties & Maryn 2015): French (2.33), Dutch (2.43), Spanish (2.28), German (1.85), Italian (2.35)
- Additional validated languages: Japanese, Korean, Portuguese, Lithuanian, Persian, Finnish, Malayalam, Kannada
- Language-specific pathological thresholds established through clinical validation studies
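
The version/language thresholds listed above can be encoded in a small lookup for client-side interpretation. The interpretation strings below are illustrative, not API output:

```python
# Language-specific pathological thresholds listed above, keyed by algorithm version.
AVQI_THRESHOLDS = {
    "v02.03": {"English": 3.29, "Dutch": 2.95, "German": 2.70, "Turkish": 2.89},
    "v03.01": {"French": 2.33, "Dutch": 2.43, "Spanish": 2.28, "German": 1.85,
               "Italian": 2.35},
}

def interpret_avqi(score: float, version: str, language: str) -> str:
    """Compare an AVQI score against the validated threshold for the given
    version/language; scores above the threshold suggest possible pathology."""
    threshold = AVQI_THRESHOLDS[version][language]
    if score > threshold:
        return "above threshold (possible voice pathology)"
    return "within normal range"
```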

**Standardized Recording Protocol:**
- Connected Speech (CS): Phonetically balanced 3-second passages in target language
- English: "When the sunlight strikes raindrops in the air, they act like a prism and form a rainbow"
- Sustained Vowel (SV): /a/ vowel sustained for minimum 5 seconds at comfortable pitch/loudness
- Automatic processing: High-pass filtering (34Hz), voiced segment detection, 3-second stable portion extraction
- Clinical-grade recording quality essential for reliable acoustic analysis

**Clinical Applications & Validation:**
- Voice quality assessment and pathology screening in clinical settings
- Pre/post voice therapy treatment outcome monitoring
- Diagnostic support for voice disorders with validated sensitivity/specificity
- Research applications in voice science with standardized methodology
- Multi-language clinical validation for international voice assessment

### Example (cURL)

```bash
# Step 1: Upload connected speech file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./connected_speech.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"cs12345"}

# Step 2: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv67890"}

# Step 3: Calculate AVQI with clinical interpretation
curl -X GET "https://platform.vocametrix.com/api/calculate-avqi?csFileId=cs12345&svFileId=sv67890&version=v03.01" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "AVQI": 4.32,
#   "CPPS": 10.78,
#   "HNR": 15.43,
#   "Shimmer": 6.72,
#   "Shimmer_dB": 0.59,
#   "LTAS_slope": -25.36,
#   "LTAS_tilt": -10.84
# }
```

---

## Dysphonia Severity Index (DSI) Calculator API

**Group:** Voice Quality Metrics

Calculate the Dysphonia Severity Index (DSI), a multivariate measure combining four weighted acoustic measures: Maximum Phonation Time (MPT), highest fundamental frequency (F₀-High), softest intensity (I_low), and jitter PPQ5. This comprehensive index provides objective assessment of voice quality with strong correlation to perceptual dysphonia ratings.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-dsi

Calculate DSI from a previously uploaded sustained vowel (SV) recording and required voice-range parameters. All parameters are passed as query string parameters (GET).

**Parameters:**

- `svFileId` — string - ID of the sustained vowel audio file (returned by /api/assignFileId) - REQUIRED
- `mpt` — number - Maximum Phonation Time in seconds (MPT). Must be a positive number > 0 - REQUIRED
- `maximumF0` — number - Highest fundamental frequency in Hz (F0-High). Must be a positive number > 0 - REQUIRED
- `minimumIntensity` — number - Softest intensity level in dB SPL (I_low). Numeric value - REQUIRED
- `age` — integer - Patient age in years (1-120). Used for clinical interpretation - REQUIRED
- `gender` — integer - Patient gender code: 1=Male, 2=Female, 3=Other/Unknown. Used for F0 validation/context - REQUIRED
- `version` — string - Optional DSI algorithm version (example: "v02.01") - OPTIONAL

**Response:**

- `DSI` — number - Dysphonia Severity Index score (approx range -5 to +5). Example: 1.85
- `JITTER` — number - Period Perturbation Quotient (PPQ5) as a percentage or fraction (parsed as number). Example: 0.45
- `MPT` — number - Maximum Phonation Time in seconds (echoes input MPT or measured). Example: 15.2
- `MINLEVEL` — number - Minimum intensity level in dB SPL. Example: 45.5
- `MAXF0` — number - Maximum fundamental frequency in Hz. Example: 440.0
- `PATIENT_AGE` — number - Patient age (integer). Example: 35
- `PATIENT_GENDER` — string - Human-readable gender label (e.g. "Female")
- `AGE_GROUP` — string - Age group label used for interpretation (e.g. "Young adult / Middle-aged")
- `PRESBYLARYNGIS_RISK` — number - Age-related voice change risk flag (0 or 1). Example: 0
- `EXPECTED_F0_RANGE` — string - Gender-specific expected F0 range. Example: "150-350 Hz"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for DSI analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-dsi endpoint

### Capabilities

**DSI Calculation:**
- DSI Score ranging from +5 (excellent voice) to -5 (severely dysphonic)
- Automatic presbylaryngis risk assessment for patients ≥65 years
- Gender-specific F0 validation (Male: 80-250 Hz, Female: 150-350 Hz)
- Age-adjusted clinical interpretation

**Required Parameters:**
- Maximum Phonation Time (MPT) - longest sustainable /a/ phonation
- Maximum F0 (F₀-High) - highest achievable fundamental frequency
- Minimum Intensity (I_low) - softest intensity level in dB SPL
- Patient age and gender for clinical context

**Algorithm Implementation:**
- Formula: DSI = 1.127 + 0.164×MPT - 0.038×I_low + 0.0053×F₀-High - 5.30×Jitter(PPQ5)
- Jitter PPQ5 calculated from most stable portion of sustained /a/
- Cross-correlation method for precise fundamental frequency detection
- Final 3 seconds extracted for optimal voice stability analysis
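
The stated formula is straightforward to verify client-side. This is a direct transcription, with jitter passed in the same units the API's JITTER field reports (0.45 in the endpoint's worked example); the function name is illustrative:

```python
def dsi(mpt_s: float, i_low_db: float, f0_high_hz: float, jitter_ppq5: float) -> float:
    """Dysphonia Severity Index per the weighted formula above:
    DSI = 1.127 + 0.164*MPT - 0.038*I_low + 0.0053*F0-High - 5.30*Jitter(PPQ5)."""
    return (1.127 + 0.164 * mpt_s - 0.038 * i_low_db
            + 0.0053 * f0_high_hz - 5.30 * jitter_ppq5)
```

With the example parameters from this section (MPT 15.2 s, I_low 45.5 dB SPL, F0-High 440 Hz, jitter 0.45), the formula evaluates to ≈1.84.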

**Clinical Applications:**
- Objective dysphonia severity assessment
- Voice therapy outcome monitoring
- Research applications in voice science
- Clinical correlation with perceptual voice quality ratings

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate DSI with voice range profile parameters
curl -X GET "https://platform.vocametrix.com/api/calculate-dsi?svFileId=sv12345&mpt=15.2&maximumF0=440&minimumIntensity=45.5&age=35&gender=2&version=v02.01" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "DSI": 2.85,
#   "JITTER": 0.45,
#   "MPT": 15.2,
#   "MINLEVEL": 45.5,
#   "MAXF0": 440.0,
#   "PATIENT_AGE": 35,
#   "PATIENT_GENDER": "Female",
#   "AGE_GROUP": "Young adult / Middle-aged",
#   "PRESBYLARYNGIS_RISK": 0,
#   "EXPECTED_F0_RANGE": "150-350 Hz"
# }
```

---

## Cepstral Peak Prominence (CPP) Calculator API

**Group:** Voice Quality Metrics

Calculate Cepstral Peak Prominence (CPP), a robust measure of voice quality that quantifies the relative prominence of the cepstral peak. CPP correlates strongly with perceived voice quality and is particularly valuable for assessing dysphonia severity across various voice disorders.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-cpp

Calculate CPP and complementary voice measures from sustained vowel recording

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `version` — CPP algorithm version (default: "v01") - OPTIONAL

**Response:**

- `CPP` — Cepstral Peak Prominence value in dB. Example: 12.45
- `VOICE_QUALITY` — Clinical voice quality assessment. Example: "Moderately dysphonic"
- `SEVERITY` — Dysphonia severity classification. Example: "Moderate"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for CPP analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-cpp endpoint

### Capabilities

**Voice Quality Assessment:**
- CPP value in dB (higher values indicate better voice quality)
- Complementary F0 statistics with pitch tracking validation
- Harmonics-to-Noise Ratio (HNR) for comprehensive analysis
- Jitter measurements for periodicity assessment

**Algorithm Implementation:**
- High-pass filtering (34 Hz cutoff) for noise reduction
- Middle 3-second extraction for optimal spectral stability
- PowerCepstrogram analysis with standard Praat parameters
- Peak search range 60-330 Hz with parabolic interpolation

**Clinical Thresholds:**
- Normal voice: CPP ≥ 20 dB
- Mild dysphonia: CPP 15-20 dB
- Moderate dysphonia: CPP 10-15 dB
- Severe dysphonia: CPP < 10 dB
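
The threshold bands above map directly onto a classifier; the label strings here are illustrative, not the API's exact output:

```python
def classify_cpp(cpp_db: float) -> str:
    """Map a CPP value (dB) onto the clinical severity bands listed above."""
    if cpp_db >= 20:
        return "Normal"
    if cpp_db >= 15:
        return "Mild dysphonia"
    if cpp_db >= 10:
        return "Moderate dysphonia"
    return "Severe dysphonia"
```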

**Recording Requirements:**
- Sustained vowel /a/ for at least 3 seconds (preferably 5 seconds)
- Comfortable pitch and loudness level
- Minimum 1 second duration (recordings <1s rejected)
- High-quality recording with minimal background noise

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate CPP
curl -X GET "https://platform.vocametrix.com/api/calculate-cpp?svFileId=sv12345" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "CPP": 18.45,
#   "VOICE_QUALITY": "Mildly dysphonic",
#   "SEVERITY": "Mild"
# }
```

---

## Multi-band Harmonics-to-Noise Ratio (HNR) Calculator API

**Group:** Voice Quality Metrics

Analyze voice quality using multi-band HNR analysis that provides frequency-specific assessment across different spectral regions with age and gender-specific adjustments. This analysis reveals how noise components are distributed across the frequency spectrum, offering insights into specific laryngeal pathophysiology.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-hnr-multiband

Calculate multi-band HNR with age and gender-specific clinical interpretation

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `age` — Patient age in years (integer) - REQUIRED for age-specific analysis
- `gender` — Patient gender: 1=Male, 2=Female, 3=Other/Unknown (integer) - REQUIRED
- `version` — HNR algorithm version (default: "v01") - OPTIONAL

**Response:**

- `HNR_FULL` — Full-spectrum HNR (80-8000 Hz) in dB. Example: 15.4
- `HNR_LOW` — Low-frequency HNR (80-500 Hz) in dB. Example: 18.2
- `HNR_MID` — Mid-frequency HNR (500-1500 Hz) in dB. Example: 16.8
- `HNR_HIGH` — High-frequency HNR (1500-3500 Hz) in dB. Example: 12.3
- `MEAN_F0` — Mean fundamental frequency in Hz. Example: 195.4
- `F0_STD` — Standard deviation of F0. Example: 8.5
- `GENDER` — Gender classification result. Example: "Female"
- `NOISE_RATIO_LOW` — Low-frequency noise percentage. Example: 12.5
- `NOISE_RATIO_MID` — Mid-frequency noise percentage. Example: 18.2
- `NOISE_RATIO_HIGH` — High-frequency noise percentage. Example: 25.7
- `HNR_SLOPE` — Spectral tilt across frequency bands. Example: -0.68
- `OVERALL_QUALITY` — Clinical voice quality assessment. Example: "Good voice quality"
- `SEVERITY` — Voice quality severity classification. Example: "Normal"
- `NOISE_PATTERN` — Dominant noise pattern classification. Example: "Low-frequency emphasis"
- `BREATHINESS` — Breathiness assessment. Example: "Minimal breathiness"
- `PATIENT_AGE` — Patient age used in analysis. Example: 35
- `PATIENT_GENDER` — Patient gender classification. Example: "Female"
- `AGE_GROUP` — Age group classification. Example: "Young Adult"
- `AGE_NOTE` — Age-related clinical note. Example: "Age within normal vocal range"
- `F0_NOTE` — F0 validation note. Example: "F0 within expected range for gender/age"
- `GENDER_NOTE` — Gender-specific note. Example: "Female vocal characteristics confirmed"
- `EXPECTED_F0_MIN` — Expected minimum F0 for age/gender. Example: 165.4
- `EXPECTED_F0_MAX` — Expected maximum F0 for age/gender. Example: 294.3
- `HNR_ADJUSTMENT` — Age-based threshold adjustment applied. Example: 0.8
- `ADJUSTED_HNR_FULL` — Age-adjusted full-spectrum HNR. Example: 16.2

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for HNR analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-hnr-multiband endpoint

### Capabilities

**Multi-band Analysis:**
- Full-spectrum HNR (80-8000 Hz) for overall harmonicity assessment
- Low-frequency HNR (80-500 Hz) for subglottal turbulence detection
- Mid-frequency HNR (500-1500 Hz) for glottal closure assessment
- High-frequency HNR (1500-3500 Hz) for supraglottal turbulence

**Age-Specific Adjustments:**
- Young adults (18-44): Standard thresholds (0 dB adjustment)
- Middle-aged (45-64): -0.5 dB adjustment for early aging
- Elderly (≥65): -1.0 dB adjustment for presbylaryngis
- Gender-specific F0 validation and expected ranges
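
The age bands above can be sketched as a lookup. Whether the production service applies the adjustment to the measured HNR or to its thresholds is not specified here, so this helper only returns the band's dB offset (name is illustrative):

```python
def hnr_age_adjustment_db(age: int) -> float:
    """dB adjustment for the age bands listed above."""
    if age >= 65:
        return -1.0  # elderly: presbylaryngis compensation
    if age >= 45:
        return -0.5  # middle-aged: early aging
    return 0.0       # young adults (18-44): standard thresholds
```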

**Clinical Assessment:**
- Noise pattern analysis (Low/Mid/High-frequency dominant)
- Breathiness assessment with age-adjusted thresholds
- Overall voice quality classification
- HNR slope analysis for spectral tilt assessment

**Recording Requirements:**
- Sustained vowel /a/ for at least 3 seconds
- Patient age and gender required for accurate interpretation
- High-quality recording with minimal background noise
- Comfortable pitch and loudness level

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate multi-band HNR with age/gender
curl -X GET "https://platform.vocametrix.com/api/calculate-hnr-multiband?svFileId=sv12345&age=45&gender=2" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "HNR_FULL": 15.4,
#   "HNR_LOW": 18.2,
#   "HNR_MID": 16.8,
#   "HNR_HIGH": 12.3,
#   "MEAN_F0": 195.4,
#   "F0_STD": 8.5,
#   "GENDER": "Female",
#   "NOISE_RATIO_LOW": 12.5,
#   "NOISE_RATIO_MID": 18.2,
#   "NOISE_RATIO_HIGH": 25.7,
#   "HNR_SLOPE": -0.68,
#   "OVERALL_QUALITY": "Good voice quality",
#   "SEVERITY": "Normal",
#   "NOISE_PATTERN": "Low-frequency emphasis",
#   "BREATHINESS": "Minimal breathiness",
#   "PATIENT_AGE": 45,
#   "PATIENT_GENDER": "Female",
#   "AGE_GROUP": "Middle-aged Adult",
#   "AGE_NOTE": "Age within normal vocal range",
#   "F0_NOTE": "F0 within expected range for gender/age",
#   "GENDER_NOTE": "Female vocal characteristics confirmed",
#   "EXPECTED_F0_MIN": 165.4,
#   "EXPECTED_F0_MAX": 294.3,
#   "HNR_ADJUSTMENT": 0.8,
#   "ADJUSTED_HNR_FULL": 16.2
# }
```

---

## Advanced Spectral Measures Calculator API

**Group:** Advanced Voice Analysis

Comprehensive spectral analysis providing advanced characterization of voice timbre and resonance beyond basic LTAS measures. This analysis calculates spectral moments, clinical spectral indices, and source-filter interaction measures with age and gender-specific adjustments for accurate clinical interpretation.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-spectral-advanced

Calculate advanced spectral measures with age and gender-specific clinical interpretation

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `age` — Patient age in years (integer) - REQUIRED for age-specific adjustments
- `gender` — Patient gender: 1=Male, 2=Female, 3=Other/Unknown (integer) - REQUIRED
- `version` — Spectral analysis algorithm version (default: "v01") - OPTIONAL

**Response:**

- `SPECTRAL_MEAN` — Weighted average frequency (center of gravity) in Hz. Example: 1847.3
- `SPECTRAL_SD` — Standard deviation of spectral distribution in Hz. Example: 623.8
- `SPECTRAL_SKEWNESS` — Third moment indicating spectral asymmetry. Example: 0.85
- `SPECTRAL_KURTOSIS` — Fourth moment indicating spectral peakedness. Example: 2.34
- `ALPHA_RATIO` — Energy ratio below/above 1 kHz in dB. Example: 4.2
- `L1_L0` — First formant to fundamental energy difference in dB. Example: -8.5
- `H1_H2` — First to second harmonic amplitude difference in dB. Example: 3.8
- `H1_A1` — First harmonic to first formant amplitude difference in dB. Example: -5.2
- `H1_A3` — First harmonic to third formant amplitude difference in dB. Example: -15.7
- `SPECTRAL_FLUX` — Rate of spectral change in Hz. Example: 234.6
- `LTAS_SLOPE` — Spectral slope 0-1 kHz vs 1-4 kHz in dB/octave. Example: -12.4
- `LTAS_TILT` — Trend line slope across 1-4 kHz in dB/octave. Example: -8.9
- `MEAN_F0` — Mean fundamental frequency in Hz. Example: 195.8
- `F1_MEAN` — Mean first formant frequency in Hz. Example: 685.4
- `F2_MEAN` — Mean second formant frequency in Hz. Example: 1247.8
- `F3_MEAN` — Mean third formant frequency in Hz. Example: 2856.1
- `GENDER` — Gender classification result. Example: "Female"
- `VOICE_PATTERN` — Clinical classification. Example: "Balanced voice pattern"
- `BREATHINESS_LEVEL` — Breathiness assessment. Example: "Minimal breathiness"
- `VOICE_STABILITY` — Stability classification. Example: "Good stability"
- `SPECTRAL_HEALTH` — Overall spectral assessment. Example: "Healthy spectral characteristics"
- `PATIENT_AGE` — Patient age used in analysis. Example: 35
- `PATIENT_GENDER` — Patient gender classification. Example: "Female"
- `AGE_GROUP` — Age group classification. Example: "Young Adult"
- `AGE_NOTE` — Age-related clinical note. Example: "Age within normal vocal range"
- `F0_NOTE` — F0 validation note. Example: "F0 within expected range"
- `F0_VALIDITY` — F0 validation status. Example: "Valid"
- `GENDER_SPECTRAL_NOTE` — Gender-specific spectral note. Example: "Female spectral characteristics confirmed"
- `EXPECTED_F0_MIN` — Expected minimum F0 for age/gender. Example: 165.4
- `EXPECTED_F0_MAX` — Expected maximum F0 for age/gender. Example: 294.3
- `SPECTRAL_MEAN_EXPECTED` — Expected spectral mean for demographics. Example: 1923.5

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for spectral analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-spectral-advanced endpoint

### Capabilities

**Spectral Moments Analysis:**
- Spectral Mean: Weighted average frequency (spectral center of gravity)
- Spectral Standard Deviation: Frequency spread indicating bandwidth
- Spectral Skewness: Tilt direction (positive = low-frequency emphasis)
- Spectral Kurtosis: Peakedness indicating voice focus/dispersion

**Clinical Spectral Indices:**
- Alpha Ratio: Energy below/above 1 kHz (breathiness indicator)
- L1-L0 Difference: F1 to F0 energy (vocal efficiency measure)
- H1-H2, H1-A1, H1-A3: Source-filter interaction measures
- Spectral Flux: Voice instability through spectral change rate

**Voice Pattern Classification:**
- Hyperfunctional vs Hypofunctional voice differentiation
- Breathiness level assessment (No significant/Mild/Significant)
- Voice stability classification (Stable/Unstable-rough/Overly focused)
- Overall spectral health assessment with composite scoring

**Age-Gender Adjustments:**
- Young adults (<45): Standard thresholds
- Middle-aged (45-64): -50 Hz spectral mean adjustment
- Elderly (≥65): -100 Hz spectral mean adjustment
- Gender-specific formant analysis with validation
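
The four spectral moments are weighted statistics over the magnitude spectrum. A minimal pure-Python sketch of the definitions above (illustrative; the service's windowing and spectrum estimation are not specified here):

```python
import math

def spectral_moments(freqs, mags):
    """Spectral mean (center of gravity), SD, skewness, and kurtosis,
    treating spectral magnitudes as weights over frequency bins."""
    total = sum(mags)
    w = [m / total for m in mags]          # normalize weights
    mean = sum(wi * f for wi, f in zip(w, freqs))
    sd = math.sqrt(sum(wi * (f - mean) ** 2 for wi, f in zip(w, freqs)))
    skew = sum(wi * ((f - mean) / sd) ** 3 for wi, f in zip(w, freqs))
    kurt = sum(wi * ((f - mean) / sd) ** 4 for wi, f in zip(w, freqs))
    return mean, sd, skew, kurt
```

A symmetric spectrum yields zero skewness; a more sharply peaked spectrum raises kurtosis.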

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate advanced spectral measures with demographics
curl -X GET "https://platform.vocametrix.com/api/calculate-spectral-advanced?svFileId=sv12345&age=35&gender=2" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "SPECTRAL_MEAN": 1850.2,
#   "SPECTRAL_SD": 1245.8,
#   "SPECTRAL_SKEWNESS": 0.65,
#   "SPECTRAL_KURTOSIS": 2.85,
#   "ALPHA_RATIO": 2.3,
#   "L1_L0": 15.4,
#   "H1_H2": 3.8,
#   "H1_A1": -8.2,
#   "H1_A3": -15.6,
#   "SPECTRAL_FLUX": 285.4,
#   "LTAS_SLOPE": -12.5,
#   "LTAS_TILT": -8.9,
#   "VOICE_PATTERN": "Balanced",
#   "BREATHINESS_LEVEL": "No significant breathiness",
#   "VOICE_STABILITY": "Stable",
#   "SPECTRAL_HEALTH": "Normal"
# }
```

---

## Formant Statistics Analysis (F1-F4) Calculator API

**Group:** Advanced Voice Analysis

Comprehensive formant analysis providing detailed assessment of articulatory precision, tongue/jaw positioning, and vocal tract resonance characteristics. This analysis calculates statistical measures for each formant frequency (F1-F4) with gender-specific validation and clinical interpretation for detecting dysarthria, hypernasality, and articulatory disorders.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-formant-statistics

Calculate comprehensive formant statistics with gender-specific validation and clinical interpretation

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `age` — Patient age in years (integer) - REQUIRED for clinical context
- `gender` — Patient gender: 1=Male, 2=Female, 3=Other/Unknown (integer) - REQUIRED
- `version` — Formant analysis algorithm version (default: "v01") - OPTIONAL

**Response:**

- `F1_MEAN` — Mean first formant frequency in Hz. Example: 685.4
- `F1_STD` — F1 standard deviation in Hz. Example: 45.2
- `F1_CV` — F1 coefficient of variation in %. Example: 6.6
- `F1_MEDIAN` — F1 median frequency in Hz. Example: 682.1
- `F1_IQR` — F1 interquartile range in Hz. Example: 38.7
- `F1_RANGE` — F1 frequency range in Hz. Example: 142.5
- `F2_MEAN` — Mean second formant frequency in Hz. Example: 1247.8
- `F2_STD` — F2 standard deviation in Hz. Example: 78.3
- `F2_CV` — F2 coefficient of variation in %. Example: 6.3
- `F2_MEDIAN` — F2 median frequency in Hz. Example: 1245.6
- `F2_IQR` — F2 interquartile range in Hz. Example: 65.1
- `F2_RANGE` — F2 frequency range in Hz. Example: 234.8
- `F3_MEAN` — Mean third formant frequency in Hz. Example: 2856.1
- `F3_STD` — F3 standard deviation in Hz. Example: 112.4
- `F3_CV` — F3 coefficient of variation in %. Example: 3.9
- `F3_MEDIAN` — F3 median frequency in Hz. Example: 2851.7
- `F3_IQR` — F3 interquartile range in Hz. Example: 89.3
- `F3_RANGE` — F3 frequency range in Hz. Example: 387.2
- `F4_MEAN` — Mean fourth formant frequency in Hz. Example: 3847.5
- `F4_STD` — F4 standard deviation in Hz. Example: 145.7
- `F4_CV` — F4 coefficient of variation in %. Example: 3.8
- `F4_MEDIAN` — F4 median frequency in Hz. Example: 3842.9
- `F4_IQR` — F4 interquartile range in Hz. Example: 123.8
- `F4_RANGE` — F4 frequency range in Hz. Example: 512.6
- `F2_F1_DIFF` — F2-F1 frequency difference in Hz. Example: 562.4
- `F3_F2_DIFF` — F3-F2 frequency difference in Hz. Example: 1608.3
- `F4_F3_DIFF` — F4-F3 frequency difference in Hz. Example: 991.4
- `VOWEL_SPACE_DISTANCE` — Distance from neutral vowel in Hz. Example: 234.6
- `FORMANT_STABILITY_INDEX` — Articulatory consistency (0-100). Example: 87.3
- `ARTICULATORY_PRECISION` — Clinical precision level. Example: "Good articulatory precision"
- `PRECISION_LEVEL` — Severity classification. Example: "Normal precision"
- `VOWEL_QUALITY` — Gender-specific assessment. Example: "Normal female vowel production"
- `HYPERNASALITY_RISK` — Hypernasality screening. Example: "Low risk"
- `GENDER` — Gender classification. Example: "Female"
- `VOICE_PATTERN` — Clinical pattern assessment. Example: "Normal formant pattern"
- `PATIENT_AGE` — Patient age in years. Example: 35
- `PATIENT_GENDER` — Patient gender classification. Example: "Female"
- `AGE_GROUP` — Age-based grouping. Example: "Young Adult"
- `AGE_NOTE` — Age-related clinical note. Example: "Age within normal vocal range"
- `GENDER_FORMANT_NOTE` — Gender-specific formant note. Example: "Formant frequencies consistent with female anatomy"
- `EXPECTED_F1_RANGE` — Expected F1 range for demographics. Example: "650-720 Hz"
- `EXPECTED_F2_RANGE` — Expected F2 range for demographics. Example: "1200-1300 Hz"
- `EXPECTED_F3_RANGE` — Expected F3 range for demographics. Example: "2800-2900 Hz"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for formant analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-formant-statistics endpoint

### Capabilities

**Statistical Measures per Formant:**
- Mean frequency: Average formant frequency indicating articulation
- Standard Deviation: Variability indicating articulatory stability
- Coefficient of Variation: Normalized variability for comparison
- Median and Quartiles: Robust central tendency measures
- Range Analysis: Extreme variation assessment

**Formant Relationships:**
- F2-F1 Difference: Critical for vowel identification and tongue advancement
- F3-F2 Difference: Related to vocal tract length and lip rounding
- Vowel Space Distance: Euclidean distance from neutral position
- Formant Stability Index: Average CV across F1-F3 for consistency

**Clinical Assessments:**
- Articulatory Precision: High/Moderate/Low based on variability
- Vowel Quality: Gender-specific /a/ vowel production assessment
- Hypernasality Risk: F1 elevation and F2-F1 compression screening
- Precision Level: Normal/Mild/Significant impairment classification

**Gender-Specific Analysis:**
- Male parameters: 5000 Hz max frequency, expected ranges validated
- Female parameters: 5500 Hz max frequency, higher formant ranges
- Burg method with gender-optimized LPC analysis
- Expected formant ranges with clinical alerts for deviations
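
The derived formant relationships follow directly from the per-formant statistics. A minimal sketch under the definitions above (the neutral-vowel reference frequencies below are illustrative placeholders, not the service's constants):

```python
import math

def formant_derived_measures(f1_mean, f2_mean, f1_cv, f2_cv, f3_cv,
                             neutral_f1=500.0, neutral_f2=1500.0):
    """F2-F1 difference, Euclidean vowel space distance from a neutral
    reference, and the stability index (mean CV across F1-F3)."""
    f2_f1_diff = f2_mean - f1_mean
    vowel_space = math.hypot(f1_mean - neutral_f1, f2_mean - neutral_f2)
    stability = (f1_cv + f2_cv + f3_cv) / 3.0
    return f2_f1_diff, vowel_space, stability
```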

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate formant statistics with demographics
curl -X GET "https://platform.vocametrix.com/api/calculate-formant-statistics?svFileId=sv12345&age=28&gender=2" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "F1_MEAN": 685.2,
#   "F1_STD": 45.8,
#   "F1_CV": 6.7,
#   "F2_MEAN": 1247.5,
#   "F2_STD": 78.3,
#   "F2_CV": 6.3,
#   "F3_MEAN": 2850.1,
#   "F3_STD": 125.4,
#   "F3_CV": 4.4,
#   "F2_F1_DIFF": 562.3,
#   "VOWEL_SPACE_DISTANCE": 385.7,
#   "FORMANT_STABILITY_INDEX": 5.8,
#   "ARTICULATORY_PRECISION": "High precision",
#   "PRECISION_LEVEL": "Normal",
#   "VOWEL_QUALITY": "Within expected range",
#   "HYPERNASALITY_RISK": "Low risk"
# }
```

---

## S/Z Ratio Analysis Calculator API

**Group:** Advanced Voice Analysis

The S/Z ratio is a simple but powerful clinical tool for detecting glottal insufficiency and vocal fold paralysis by comparing maximum phonation time for voiceless /s/ versus voiced /z/ sounds. This analysis provides rapid screening for laryngeal pathology with age-specific interpretation and comprehensive quality assessment.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-sz-ratio

Calculate S/Z ratio with age-specific interpretation and quality assessment

**Parameters:**

- `sFileId` — string - ID of the /s/ sound audio file (previously uploaded) - REQUIRED
- `zFileId` — string - ID of the /z/ sound audio file (previously uploaded) - REQUIRED
- `patientAge` — number - Patient age in years (for presbylaryngis assessment) - REQUIRED
- `gender` — string - Patient gender ("male" or "female") - REQUIRED
- `clinicalContext` — string - Clinical context ("screening", "therapy", "neurological", "post-surgical") - default: "screening"
- `silenceThreshold` — number - Silence detection threshold in dB - default: -30
- `minVoicedDuration` — number - Minimum voiced segment duration in seconds - default: 0.1
- `minVoicelessDuration` — number - Minimum voiceless segment duration in seconds - default: 0.1
- `edgePadding` — number - Edge padding to avoid artifacts in seconds - default: 0.05
- `maxRecordingDuration` — number - Maximum recording duration in seconds - default: 30
- `chunkDuration` — number - Analysis chunk duration in seconds - default: 10
- `version` — string - S/Z analysis algorithm version - default: "v01"

**Response:**

- `S_PHONATION_TIME` — Total sustained /s/ phonation duration in seconds. Example: 18.7
- `Z_PHONATION_TIME` — Total sustained /z/ phonation duration in seconds. Example: 19.2
- `SZ_RATIO` — S/Z phonation time ratio (primary clinical measure). Example: 0.97
- `S_EFFICIENCY` — Percentage of /s/ recording with actual phonation. Example: 85.4
- `Z_EFFICIENCY` — Percentage of /z/ recording with actual phonation. Example: 87.2
- `S_PHONATION_SEGMENTS` — Number of continuous /s/ phonation segments. Example: 3
- `Z_PHONATION_SEGMENTS` — Number of continuous /z/ phonation segments. Example: 2
- `S_LONGEST_SEGMENT` — /s/ longest continuous segment duration in seconds. Example: 12.5
- `Z_LONGEST_SEGMENT` — /z/ longest continuous segment duration in seconds. Example: 15.8
- `MEAN_F0_Z` — Mean fundamental frequency during /z/ phonation in Hz. Example: 195.8
- `F0_STD_Z` — Standard deviation of F0 during /z/ phonation in Hz. Example: 8.3
- `F0_CV_Z` — Coefficient of variation of F0 during /z/ in %. Example: 4.2
- `RELIABILITY_SCORE` — Composite reliability score (0-4). Example: 3.8
- `PRESBYLARYNGIS_RISK` — Age-related voice changes risk (0/1). Example: 0
- `SZ_INTERPRETATION` — Clinical classification with age adjustment. Example: "Normal S/Z ratio"
- `CLINICAL_SIGNIFICANCE` — Pathophysiology interpretation. Example: "No evidence of vocal fold mass lesions"
- `RISK_LEVEL` — Stratified risk assessment. Example: "Low"
- `RELIABILITY` — Test reliability classification. Example: "High reliability"
- `AGE_ADJUSTMENT` — Age-specific interpretation note. Example: "No age-related adjustments needed"
- `GENDER` — Gender classification. Example: "Female"
- `PATIENT_AGE` — Patient age in years. Example: 35
- `PATIENT_GENDER` — Patient gender classification. Example: "Female"
- `AGE_GROUP` — Age-based clinical grouping. Example: "Young Adult"
- `AGE_NOTE` — Age-related clinical note. Example: "Age within normal vocal range"
- `GENDER_NOTE` — Gender-specific clinical note. Example: "Female-typical phonation patterns"
- `EXPECTED_SZ_RANGE` — Expected S/Z ratio range for demographics. Example: "0.85-1.15"
- `PHONATION_QUALITY` — Overall phonation quality assessment. Example: "Good phonatory control"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload /s/ and /z/ sound audio files for S/Z ratio analysis

**Parameters:**

- `audio` — file - /s/ or /z/ sound audio file (WAV format preferred) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-sz-ratio endpoint

### Capabilities

**S/Z Ratio Calculation:**
- S/Z Ratio: Primary measure comparing phonation times
- Age-specific thresholds: Male ≥70 years (1.5), Female ≥65 years (1.5), younger (1.4)
- Presbylaryngis risk assessment with automatic age adjustment
- Multi-parameter reliability scoring for test quality validation

**Phonation Analysis:**
- /s/ phonation time: Voiceless airflow control assessment
- /z/ phonation time: Voiced glottal closure efficiency
- Phonation efficiency: Percentage of recording with actual phonation
- Segment analysis: Number and duration of phonation segments

**Voice Analysis Integration:**
- F0 analysis from /z/ recordings with gender-specific validation
- Pitch stability assessment during voicing
- Gender-specific pitch ranges (Male: 75-500 Hz, Female: 100-600 Hz)
- Clinical context integration for specialized assessments

**Quality Control:**
- Automatic segment extraction for long recordings
- Silence detection with configurable thresholds
- Edge padding to avoid artifacts
- Reliability scoring based on duration and efficiency
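
The ratio calculation and its age-specific thresholds can be sketched as a small classifier (function names and the coarse "Normal"/"Elevated" labels are illustrative assumptions; the API returns richer interpretation strings):

```python
def sz_threshold(age: int, gender: str) -> float:
    """Upper bound of a normal S/Z ratio per the age/gender bands above."""
    if (gender == "male" and age >= 70) or (gender == "female" and age >= 65):
        return 1.5   # presbylaryngis-adjusted threshold
    return 1.4       # standard threshold for younger speakers

def interpret_sz(s_time: float, z_time: float, age: int, gender: str):
    """S/Z ratio from phonation times, with a coarse normal/elevated call."""
    ratio = s_time / z_time
    status = "Normal" if ratio <= sz_threshold(age, gender) else "Elevated"
    return round(ratio, 2), status
```

For the example below (s = 12.5 s, z = 8.2 s, 68-year-old female) this yields a ratio of 1.52, just above the age-adjusted 1.5 threshold.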

### Example (cURL)

```bash
# Step 1: Upload /s/ sound file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./s_sound.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"s12345"}

# Step 2: Upload /z/ sound file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./z_sound.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"z67890"}

# Step 3: Calculate S/Z ratio with demographics and clinical context
curl -X GET "https://platform.vocametrix.com/api/calculate-sz-ratio?sFileId=s12345&zFileId=z67890&patientAge=68&gender=female&clinicalContext=screening&silenceThreshold=-30&version=v01" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "S_PHONATION_TIME": 12.5,
#   "Z_PHONATION_TIME": 8.2,
#   "SZ_RATIO": 1.52,
#   "S_EFFICIENCY": 78.5,
#   "Z_EFFICIENCY": 72.3,
#   "MEAN_F0_Z": 185.4,
#   "F0_STD_Z": 12.8,
#   "RELIABILITY_SCORE": 3,
#   "PRESBYLARYNGIS_RISK": 1,
#   "SZ_INTERPRETATION": "Mildly elevated (age-adjusted)",
#   "CLINICAL_SIGNIFICANCE": "Possible mild glottal insufficiency with presbylaryngis consideration",
#   "RISK_LEVEL": "Moderate",
#   "RELIABILITY": "High"
# }
```

---

## Glottal-to-Noise Excitation Ratio (GNE) Calculator API

**Group:** Advanced Voice Analysis

The Glottal-to-Noise Excitation Ratio (GNE) quantifies the amount of noise versus voicing in the glottal excitation source. GNE is particularly sensitive to vocal roughness and breathiness, providing complementary information to traditional voice quality measures. This calculator implements the validated Michaelis et al. (1997) algorithm using Praat's native GNE function, which is the reference standard method used in voice research worldwide.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-gne

Calculate GNE using the validated Michaelis algorithm implemented in Praat

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED

**Response:**

- `GNE_VALUE` — Glottal-to-Noise Excitation Ratio on 0-1 scale. Example: 0.425
- `GNE_DB` — GNE expressed in decibel scale for clinical interpretation. Example: -7.4
- `MEAN_F0` — Average fundamental frequency in Hz. Example: 185.3
- `F0_STD` — F0 standard deviation indicating pitch stability in Hz. Example: 8.3
- `HNR` — Harmonics-to-noise ratio in dB (for comparison with GNE). Example: 12.8
- `GENDER` — Gender classification based on F0 range. Example: "Female"
- `VOICE_QUALITY` — Qualitative assessment of voice quality. Example: "Borderline normal"
- `CLINICAL_RECOMMENDATION` — Contextual clinical guidance based on GNE value. Example: "Monitor voice quality. Consider vocal hygiene if symptoms persist."
- `METHOD` — Calculation method identifier. Example: "Native_Praat_GNE"
- `ANALYSIS_TYPE` — Analysis type descriptor. Example: "Native_Praat_GNE"
- `FREQUENCY_RANGE` — Frequency range analyzed. Example: "500-4500 Hz"
- `BANDWIDTH` — Bandwidth used in analysis. Example: "1000 Hz"
- `STEP_SIZE` — Step size for frequency analysis. Example: "80 Hz"
- `ALGORITHM` — Algorithm reference. Example: "Michaelis_et_al_1997"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for GNE analysis

**Parameters:**

- `audio` — file - Sustained vowel /a/ audio file (WAV format preferred, minimum 0.5 seconds) - REQUIRED
- `email` — string - Email associated with the user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with the calculate-gne endpoint

### Capabilities

**GNE Calculation Method:**
- Validated Algorithm: Uses the original Michaelis et al. (1997) method based on maximum correlation between Hilbert envelopes
- Praat Implementation: Leverages Praat's scientifically validated "To Harmonicity (gne)" function
- Robust Measurement: Independent of jitter and shimmer variations, unlike alternative noise measures
- Research-Grade: Results directly comparable to published scientific literature

**Audio Processing:**
- Automatic Validation: Checks sampling frequency (minimum 8000 Hz) and duration (minimum 0.5 seconds)
- Smart Segmentation: Extracts stable middle portion of recording, removing onset and offset artifacts
- Optimal Analysis: Uses 1.5-second segment from longer recordings for consistent measurement
- Frequency Range: Analyzes 500-4500 Hz with 1000 Hz bandwidth and 80 Hz step size

**Clinical Interpretation:**
- Normal Voice: GNE ≥ 0.5 (evidence-based threshold from research literature)
- Borderline Normal: GNE 0.4-0.5 (monitor for symptoms)
- Mild Dysfunction: GNE 0.3-0.4 (voice therapy evaluation recommended)
- Moderate Dysfunction: GNE 0.25-0.3 (therapy strongly recommended; consider ENT evaluation)
- Severe Dysfunction: GNE < 0.25 (urgent ENT/voice specialist referral recommended)

**Integration with Voice Dashboard:**
- Complementary Analysis: Designed to work alongside separate calculators for HNR, jitter, shimmer, and other acoustic measures
- Focused Assessment: Provides specialized GNE measurement without duplicating other voice metrics
- F0 Analysis: Includes fundamental frequency extraction and gender classification
- Comprehensive Reports: Combine GNE results with other voice parameters for complete voice assessment
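
The five clinical bands above map directly from the 0-1 GNE value. A minimal sketch (the label strings are paraphrased from the bands above, not the API's exact wording):

```python
def classify_gne(gne: float) -> str:
    """Map a GNE value (0-1 scale) to the clinical bands documented above."""
    if gne >= 0.5:
        return "Normal voice"
    if gne >= 0.4:
        return "Borderline normal"
    if gne >= 0.3:
        return "Mild dysfunction"
    if gne >= 0.25:
        return "Moderate dysfunction"
    return "Severe dysfunction"
```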

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate GNE
curl -X GET "https://platform.vocametrix.com/api/calculate-gne?svFileId=sv12345" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "GNE_VALUE": 0.425,
#   "GNE_DB": -7.4,
#   "MEAN_F0": 185.3,
#   "F0_STD": 8.3,
#   "HNR": 12.8,
#   "GENDER": "Female",
#   "VOICE_QUALITY": "Borderline normal",
#   "CLINICAL_RECOMMENDATION": "Monitor voice quality. Consider vocal hygiene if symptoms persist.",
#   "METHOD": "Native_Praat_GNE",
#   "ANALYSIS_TYPE": "Native_Praat_GNE",
#   "FREQUENCY_RANGE": "500-4500 Hz",
#   "BANDWIDTH": "1000 Hz",
#   "STEP_SIZE": "80 Hz",
#   "ALGORITHM": "Michaelis_et_al_1997"
# }
```

---

## H1*-H2* Voice Source Analysis API

**Group:** Advanced Voice Analysis

Professional voice source analysis providing the formant-corrected H1*-H2* measure using scientifically validated methods. Features frame-by-frame analysis with 3-period windows, Praat Burg formant extraction (F1, F2), and Iseli & Alwan (2004) formant correction. Suitable for voice quality assessment and research applications.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-h1-h2

Calculate formant-corrected H1*-H2* voice source measure

**Parameters:**

- `svFileId` — ID of the sustained vowel audio file (previously uploaded) - REQUIRED
- `gender` — Speaker gender for optimal formant analysis: "male"/"1", "female"/"2", or "other"/"3" - REQUIRED. Determines formant ceiling: Female=5500Hz, Male=5000Hz, Other=F0-adaptive

**Response:**

- `H1_H2` — Raw (uncorrected) H1-H2 in dB. Amplitude difference between first and second harmonics without formant correction. Use for comparison with corrected value. Example: 5.88
- `H1_H2_CORRECTED` — Formant-corrected H1*-H2* in dB. Primary measure accounting for vocal tract resonance effects using Iseli & Alwan (2004) algorithm. Higher values may indicate breathier voice quality. Example: 10.93
- `MEAN_F0` — Mean fundamental frequency in Hz. Averaged across all valid frames in the stable vowel portion. Example: 134.76
- `F1_MEAN` — Mean first formant frequency in Hz. Extracted using Praat Burg algorithm. Vowel height correlate. Example: 473.00
- `F2_MEAN` — Mean second formant frequency in Hz. Extracted using Praat Burg algorithm. Vowel frontness/backness correlate. Example: 1183.97
- `B1_MEAN` — Estimated first formant bandwidth in Hz. Calculated using Hawk & Miller (1995) formula: B1 = 50 + (F1/10). Used for formant correction. Example: 97.30
- `B2_MEAN` — Estimated second formant bandwidth in Hz. Calculated using Hawk & Miller (1995) formula: B2 = 70 + (F2/50). Used for formant correction. Example: 93.68
- `FRAMES_ANALYZED` — Number of valid frames analyzed. Frame shift = 1ms, window = 3 pitch periods. Higher counts indicate longer recordings or more stable voicing. Example: 2977
- `GENDER` — Gender classification used for formant analysis. Determines formant ceiling frequency. Example: "Male" or "Female (F0-based)"

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for H1*-H2* analysis

**Parameters:**

- `audio` — file - Sustained vowel audio (WAV format preferred, /a/ vowel recommended, 1-3 seconds duration) - REQUIRED
- `email` — string - Email associated with user account - REQUIRED

**Response:**

- `fileId` — ID of the uploaded file for use with calculate-h1-h2 endpoint

### Capabilities

**Core H1*-H2* Measurement:**
- H1*-H2*: Formant-corrected spectral tilt using Iseli & Alwan (2004) algorithm
- H1-H2: Raw (uncorrected) spectral tilt for comparison
- Frame-by-frame analysis with 3-period windows
- Peak finding for accurate harmonic amplitude measurement
- Gender-adaptive formant analysis parameters

**Formant Analysis (F1 and F2 Only):**
- F1 and F2 extraction using Praat Burg algorithm
- Automatic bandwidth estimation (Hawk & Miller 1995 formula)
- 25ms window, 50Hz pre-emphasis for formant tracking
- Optimized for H1*-H2* correction accuracy

**F0 Extraction:**
- Praat cross-correlation method for pitch tracking
- Robust F0 estimation across voice types (75-600 Hz range)
- Mean F0 calculation across stable vowel portion

**Scientific Methodology:**
- Implements Iseli & Alwan (2004) formant correction using F1, F2
- Frame-by-frame averaging (1ms frame shift)
- Arithmetic mean for dB value averaging
- Scientifically validated harmonic amplitude measurement
- Gender-adaptive formant ceiling (Female: 5500 Hz, Male: 5000 Hz)

**Implementation Notes:**
- Uses Praat Burg algorithm for formant extraction
- High-pass filtering (34 Hz) for noise reduction
- Extracts stable middle portion (1-3 seconds)
- Professional-grade harmonic analysis
- Scientifically valid for research and clinical applications
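
The Hawk & Miller (1995) bandwidth estimates used in the formant correction reduce to two one-line formulas; this sketch reproduces the example B1_MEAN/B2_MEAN values from the mean formants:

```python
def estimate_bandwidths(f1_mean: float, f2_mean: float):
    """Formant bandwidth estimates per the formulas cited above:
    B1 = 50 + (F1 / 10), B2 = 70 + (F2 / 50), all in Hz."""
    b1 = 50.0 + f1_mean / 10.0
    b2 = 70.0 + f2_mean / 50.0
    return b1, b2
```

With F1 = 473.00 Hz and F2 = 1183.97 Hz this gives B1 ≈ 97.30 Hz and B2 ≈ 93.68 Hz, matching the example response.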

### Example (cURL)

```bash
# Step 1: Upload sustained vowel file (/a/ vowel recommended)
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel_a.wav" \
  -F "email=user@example.com"
# Response: {"fileId":"sv12345"}

# Step 2: Calculate H1*-H2* with formant correction
curl -X GET "https://platform.vocametrix.com/api/calculate-h1-h2?svFileId=sv12345&gender=male" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "H1_H2": 5.88,
#   "H1_H2_CORRECTED": 10.93,
#   "MEAN_F0": 134.76,
#   "F1_MEAN": 473.00,
#   "F2_MEAN": 1183.97,
#   "B1_MEAN": 97.30,
#   "B2_MEAN": 93.68,
#   "FRAMES_ANALYZED": 2977,
#   "GENDER": "Male"
# }
```

---

## Voice Range Profile (VRP) Analysis

**Group:** Advanced Voice Analysis

Comprehensive voice range measurement through sustained vowel glissando analysis. Measures vocal range in semitones and octaves from lowest to highest comfortable pitch, providing age and gender-specific assessments of vocal capabilities and limitations.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-ambitus

Analyzes voice range profile from sustained vowel glissando recording

**Parameters:**

- `svFileId` — File ID of the uploaded glissando recording (sustained vowel from low to high pitch)
- `age` — Representative age from age group (Child: 8, Adolescent: 15, Adult: 30). Affects pitch tracking parameters and range expectations
- `gender` — Patient gender (1=Male, 2=Female). Influences expected range and analysis settings

**Response:**

- `AMBITUS_SEMITONES` — Voice range in semitones (12 × log₂(F0_max/F0_min))
- `AMBITUS_OCTAVES` — Voice range in octaves (semitones ÷ 12)
- `FREQUENCY_RATIO` — Ratio of highest to lowest frequency (F0_max/F0_min)
- `RANGE_CLASSIFICATION` — Age-appropriate range assessment (Excellent/Normal/Reduced/Severely reduced)
- `F0_MIN` — Minimum fundamental frequency in Hz
- `F0_MAX` — Maximum fundamental frequency in Hz
- `F0_MEAN` — Mean fundamental frequency in Hz
- `F0_MEDIAN` — Median fundamental frequency in Hz
- `PATIENT_AGE` — Patient age used in analysis
- `PATIENT_GENDER` — Patient gender (Male/Female)
- `AGE_GROUP` — Age category (Child/Adolescent/Adult)
- `PITCH_FLOOR` — Lower limit for pitch tracking in Hz
- `PITCH_CEILING` — Upper limit for pitch tracking in Hz
- `ANALYSIS_METHOD` — Analysis technique used (Voice Range Profile)

### Capabilities

**Voice Range Measurements:**
- Ambitus in semitones and octaves
- Fundamental frequency range (F0 min/max)
- Frequency ratio calculation
- Age and gender-specific range classification

**Statistical Analysis:**
- F0 mean and median values
- Age-appropriate pitch tracking parameters
- Automated range assessment
- Clinical severity classification

**Clinical Applications:**
- Vocal capability assessment
- Voice disorder screening
- Pre/post therapy comparison
- Professional voice evaluation
- Pediatric voice assessment

**Age-Specific Analysis:**
- Pediatric ranges (prepubescent children)
- Adolescent voice considerations
- Adult gender-specific norms
- Age-appropriate pitch tracking limits
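
The range measures derive from the F0 extremes via the formulas given in the response fields. A minimal sketch of the conversion (rounding to one decimal is an assumption for display):

```python
import math

def voice_range(f0_min: float, f0_max: float):
    """Ambitus in semitones (12 * log2(F0_max / F0_min)), octaves
    (semitones / 12), and the raw frequency ratio."""
    ratio = f0_max / f0_min
    semitones = 12.0 * math.log2(ratio)
    return round(semitones, 1), round(semitones / 12.0, 1), round(ratio, 1)
```

A doubling of frequency (ratio 2.0) is one octave, i.e. 12 semitones.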

### Example (cURL)

```bash
# Voice Range Profile (Ambitus) Analysis with cURL

# Step 1: Upload glissando recording
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./patient_glissando.wav" -F "email=user@example.com"

# Response: {"fileId": "abc123def456"}

# Step 2: Calculate voice range profile
# Age groups: Child (8), Adolescent (15), Adult (30)
curl -X GET "https://platform.vocametrix.com/api/calculate-ambitus?svFileId=abc123def456&age=30&gender=1" \
  -H "X-API-Key: your-api-key"

# Expected Response (illustrative values):
# {
#   "AMBITUS_SEMITONES": 30.3,
#   "AMBITUS_OCTAVES": 2.5,
#   "FREQUENCY_RATIO": 5.74,
#   "RANGE_CLASSIFICATION": "Normal (2-3 octaves)",
#   "F0_MIN": 85.2,
#   "F0_MAX": 489.4,
#   "F0_MEAN": 198.7,
#   "F0_MEDIAN": 175.3,
#   "PATIENT_AGE": 30,
#   "PATIENT_GENDER": "Male",
#   "AGE_GROUP": "Adult",
#   "PITCH_FLOOR": 50,
#   "PITCH_CEILING": 500,
#   "ANALYSIS_METHOD": "Voice Range Profile (VRP)"
# }
```

---

## Jitter & Shimmer Perturbation Analysis API v.03 (Scientifically Validated)

**Group:** Advanced Voice Analysis

Research-grade acoustic analysis of vocal fold vibration regularity. Measures cycle-to-cycle frequency (jitter) and amplitude (shimmer) perturbations with automatic recording quality assessment. Uses scientifically validated thresholds (Teixeira & Gonçalves 2014) and proper dBFS-based intensity measurement (Brockmann et al. 2011). Inclusive pitch range (75-500 Hz) suitable for all voice types.

### Endpoints

#### GET https://platform.vocametrix.com/api/jitter-shimmer

Calculate jitter, shimmer, and recording quality metrics with clinical interpretation

**Parameters:**

- `svFileId` — REQUIRED. fileId of the sustained vowel audio file (previously uploaded via POST /api/assignFileId). Minimum 1 second; 3+ seconds recommended for reliability

**Response:**

- `RECORDING_QUALITY` — Quality assessment: "Good" (-25 to -10 dBFS), "Suboptimal" (-30 to -25 or -10 to -6 dBFS), "Poor" (<-30 or >-6 dBFS). Example: "Good"
- `RECORDING_QUALITY_WARNING` — Detailed warning message if quality is suboptimal or poor. Empty string if good. Example: "CAUTION: Recording level is low (<-25 dBFS). Consider re-recording with slightly higher volume for more reliable results."
- `MEAN_INTENSITY_DBFS` — Mean recording intensity in dBFS (decibels relative to full scale). Target range: -25 to -10 dBFS. Example: -18.3
- `RMS_AMPLITUDE` — Root-mean-square amplitude (0-1 scale). Used to calculate dBFS. Example: 0.122
- `JITTER_LOCAL_PERCENT` — Local jitter (period-to-period variation) in percent. Primary jitter measure. Normal: ≤1.04%. Example: 0.85
- `JITTER_PPQ5_PERCENT` — 5-period perturbation quotient in percent. Smoothed jitter over 5 periods. Example: 0.78
- `JITTER_RAP_PERCENT` — Relative average perturbation in percent. Average over 3 consecutive periods. Example: 0.82
- `JITTER_DDP_PERCENT` — Difference of differences of periods in percent. Sensitive to small perturbations. Example: 1.23
- `SHIMMER_LOCAL_PERCENT` — Local shimmer (peak-to-peak amplitude variation) in percent. Primary shimmer measure. Normal: ≤3.81%. Example: 2.45
- `SHIMMER_LOCAL_DB` — Local shimmer in decibels. Logarithmic amplitude perturbation. Example: 0.215
- `SHIMMER_APQ3_PERCENT` — 3-period amplitude perturbation quotient in percent. Short-term shimmer. Example: 2.31
- `SHIMMER_APQ5_PERCENT` — 5-period amplitude perturbation quotient in percent. Medium-term shimmer. Example: 2.38
- `SHIMMER_APQ11_PERCENT` — 11-period amplitude perturbation quotient in percent. Long-term shimmer trends. Example: 2.52
- `SHIMMER_DDA_PERCENT` — Difference of differences of amplitudes in percent. Sensitive to amplitude changes. Example: 3.47
- `MEAN_F0` — Mean fundamental frequency in Hz. Averaged across voiced segments. Example: 185.7
- `F0_STD` — Standard deviation of F0 in Hz. Pitch variability indicator. Example: 7.2
- `F0_CV` — Coefficient of variation of F0 in percent. (F0_STD / MEAN_F0) × 100. Pitch stability metric. Example: 3.88
- `VOICELESS_PERCENT` — Percentage of voiceless (unvoiced) frames. Indicates phonation breaks. Example: 5.2
- `NUMBER_OF_PERIODS` — Number of vocal fold vibration periods analyzed. Minimum 20 required, 100+ optimal. Example: 245
- `INSUFFICIENT_PERIODS` — Flag indicating insufficient periods: 1 if <100 periods (reduced reliability), 0 if ≥100 periods. Example: 0
- `JITTER_SEVERITY` — Jitter classification: "Normal" (≤1.04%) or "Elevated" (>1.04%). Example: "Normal"
- `SHIMMER_SEVERITY` — Shimmer classification: "Normal" (≤3.81%) or "Elevated" (>3.81%). Example: "Normal"
- `JITTER_INTERPRETATION` — Clinical interpretation text for jitter results. Example: "Within normal limits (≤1.04%)"
- `SHIMMER_INTERPRETATION` — Clinical interpretation text for shimmer results. Example: "Within normal limits (≤3.81%)"
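
The severity and interpretation fields apply the fixed thresholds above. A sketch of that classification plus the `F0_CV` formula, assuming plain threshold comparisons (the server's rounding and exact wording may differ):

```python
JITTER_THRESHOLD = 1.04   # percent (Teixeira & Goncalves 2014)
SHIMMER_THRESHOLD = 3.81  # percent

def classify(jitter_pct: float, shimmer_pct: float) -> dict:
    """Binary Normal/Elevated classification used by v.03."""
    return {
        "JITTER_SEVERITY": "Normal" if jitter_pct <= JITTER_THRESHOLD else "Elevated",
        "SHIMMER_SEVERITY": "Normal" if shimmer_pct <= SHIMMER_THRESHOLD else "Elevated",
    }

def f0_cv(mean_f0: float, f0_std: float) -> float:
    """F0 coefficient of variation in percent: (F0_STD / MEAN_F0) * 100."""
    return round(f0_std / mean_f0 * 100, 2)

print(classify(0.85, 2.45))  # {'JITTER_SEVERITY': 'Normal', 'SHIMMER_SEVERITY': 'Normal'}
print(f0_cv(185.7, 7.2))     # 3.88
```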

#### POST https://platform.vocametrix.com/api/assignFileId

Upload sustained vowel audio file for jitter/shimmer analysis

**Parameters:**

- `audio` — REQUIRED. Sustained vowel audio file (WAV format preferred, /a/ vowel recommended; minimum 1 second, 3+ seconds optimal)
- `email` — REQUIRED. Email address associated with the user account

**Response:**

- `fileId` — ID of the uploaded file for use with jitter-shimmer endpoint

### Capabilities

**Recording Quality Assessment (NEW in v.03):**
- Automatic dBFS-based recording level assessment
- Quality warnings for suboptimal recording conditions
- RMS amplitude measurement for signal-to-noise evaluation
- Target range: -25 to -10 dBFS for optimal analysis
- Warnings for too-low (<-30 dBFS) or too-high (>-6 dBFS) recordings
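
The dBFS bands above follow from the RMS amplitude via dBFS = 20 × log₁₀(RMS). A sketch of the mapping; boundary handling at the band edges is assumed inclusive on the "Good" side, which the server may implement differently:

```python
import math

def recording_quality(rms: float):
    """Map RMS amplitude (0-1 scale) to dBFS and the v.03 quality band."""
    dbfs = 20 * math.log10(rms)
    if -25 <= dbfs <= -10:
        band = "Good"
    elif -30 <= dbfs < -25 or -10 < dbfs <= -6:
        band = "Suboptimal"
    else:
        band = "Poor"
    return round(dbfs, 1), band

print(recording_quality(0.122))  # (-18.3, 'Good')
```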

**Jitter Measurements (Frequency Perturbation):**
- Local Jitter: Cycle-to-cycle period variability (primary measure)
- PPQ5: 5-period perturbation quotient for smoothed analysis
- RAP: Relative average perturbation across 3 consecutive periods
- DDP: Difference of differences of periods for sensitivity
- Threshold: 1.04% (Teixeira & Gonçalves 2014)

**Shimmer Measurements (Amplitude Perturbation):**
- Local Shimmer (%): Cycle-to-cycle amplitude variability (primary measure)
- Local Shimmer (dB): Logarithmic amplitude perturbation
- APQ3: 3-period amplitude perturbation quotient
- APQ5: 5-period amplitude perturbation quotient
- APQ11: 11-period amplitude perturbation quotient for long-term trends
- DDA: Difference of differences of amplitudes
- Threshold: 3.81% (Teixeira & Gonçalves 2014)

**Fundamental Frequency Analysis:**
- Mean F0: Average fundamental frequency
- F0 Standard Deviation: Pitch variability indicator
- F0 Coefficient of Variation: Normalized pitch stability (SD/Mean × 100)
- Inclusive pitch range: 75-500 Hz (all voice types)

**Voice Quality Metrics:**
- Voiceless frame percentage: Proportion of unvoiced segments
- Period count: Number of vocal periods analyzed
- Insufficient periods warning: Reliability indicator (<100 periods)

**Clinical Interpretation (NEW in v.03):**
- Automated severity classification: Normal vs. Elevated
- Clinical interpretation text for jitter and shimmer
- Based on scientifically validated thresholds
- Simplified binary classification for clarity

**Scientific Validity:**
- Peer-reviewed normative thresholds (Teixeira & Gonçalves 2014)
- Proper digital audio level measurement (Brockmann et al. 2011)
- Validated for clinical and research applications
- Focus on scientifically supported measures only

**Changes from v.02 to v.03:**
- REMOVED: HNR measurement (now separate endpoint)
- REMOVED: Voice breaks, voice break degree (unvalidated metrics)
- REMOVED: Automated quality assessments (clinical judgment recommended)
- ADDED: dBFS-based recording quality with warnings
- ADDED: RMS amplitude measurement
- ADDED: Clinical severity classifications
- ADDED: Clinical interpretation text
- SIMPLIFIED: Binary classification (Normal/Elevated only)

---

## Extended Acoustic Calculators (ABI, Voice Dynamics, Prosody Similarity)

**Group:** Advanced Voice Analysis

Three Praat-driven calculators that complement the core voice quality APIs: Acoustic Breathiness Index (ABI) for breathy-voice quantification, Voice Dynamics for intensity stability and projection scoring, and Prosody Similarity for comparing two recordings (model vs. learner) with visualization-ready curves.

### Endpoints

#### GET https://platform.vocametrix.com/api/calculate-abi

Calculate the Acoustic Breathiness Index (ABI) from a connected speech recording and a sustained vowel recording. The ABI is a multi-component composite score that quantifies breathy voice quality. Both files must be pre-uploaded via POST /api/assignFileId.

**Parameters:**

- `csFileId` — REQUIRED. fileId of the CONNECTED SPEECH recording (e.g., reading a passage).
- `svFileId` — REQUIRED. fileId of the SUSTAINED VOWEL /a/ recording (≥ 5 s).

**Response:**

- `ABI_SCORE` — Acoustic Breathiness Index — composite multi-component score (number).
- `CPPS` — Cepstral Peak Prominence (Smoothed) in dB — harmonic-to-aperiodic structure indicator.
- `JITTER_PERCENT` — Local jitter as a percentage — period perturbation.
- `GNE_APPROXIMATION` — Approximate Glottal-to-Noise Excitation ratio.
- `HNR_6KHZ` — Harmonics-to-Noise Ratio computed in the 0–6 kHz band (dB).
- `HNR_DEJONCKERE` — HNR computed using De Jonckere's method (dB).
- `H1_H2_DIFF` — Difference between the first two harmonic amplitudes (H1 − H2) in dB — voice source quality indicator.
- `SHIMMER_DB` — Shimmer in logarithmic scale (dB).
- `SHIMMER_PERCENT` — Shimmer as a percentage.
- `PERIOD_STD` — Standard deviation of glottal period (period stability).
- `CS_DURATION` — Connected speech recording duration in seconds.
- `SV_DURATION` — Sustained vowel recording duration in seconds.
- `CS_DURATION_USED` — Portion of CS actually analyzed (after silence trimming).
- `SV_DURATION_USED` — Portion of SV actually analyzed.
- `TOTAL_ANALYSIS_DURATION` — Sum of CS_DURATION_USED + SV_DURATION_USED.
- `ABI_REFERENCE` — Clinical reference text emitted by the analysis script (interpretive guidance).
- `ANALYSIS_VERSION` — Algorithm version, e.g. "ABI_v01".

#### GET https://platform.vocametrix.com/api/calculate-voice-dynamics

Compute intensity dynamics, pitch-intensity correlation, and projection/stability/effort/control/monotonicity scores from a sustained vowel recording. Demographics-aware: age and gender drive expected ranges and presbylaryngis assessment.

**Parameters:**

- `svFileId` — REQUIRED. fileId of the sustained vowel recording.
- `age` — REQUIRED. Patient age in years (1–120).
- `gender` — REQUIRED. "male" | "female" | "other" (or numeric "1" | "2" | "3").
- `timeStep` — Optional, default 0.01 (must be 0 < x ≤ 0.1). Praat analysis frame step in seconds.
- `minPitch` — Optional, default 75. Lower limit of pitch tracking in Hz (range 50–200).
- `subtractMean` — Optional, default 1. 0 or 1 — whether to subtract DC offset before analysis.
- `windowLength` — Optional, default 0.03 (must be 0 < x ≤ 1). Analysis window in seconds.
- `fatigueThreshold` — Optional, default 1.0. Threshold for severe fatigue indicator (positive number).
- `mildFatigueThreshold` — Optional, default 0.5. Threshold for mild fatigue indicator.
- `monotonicityCV` — Optional, default 5.0. Coefficient-of-variation threshold below which voice is flagged monotonous.
- `monotonicityRange` — Optional, default 10.0. Pitch range threshold (semitones) below which voice is flagged monotonous.

**Response:**

- `INTENSITY_MEAN` — Mean intensity in dB.
- `INTENSITY_STD` — Intensity standard deviation in dB.
- `INTENSITY_CV` — Intensity coefficient of variation (%).
- `INTENSITY_MIN` — Minimum intensity in dB.
- `INTENSITY_MAX` — Maximum intensity in dB.
- `INTENSITY_RANGE` — Intensity range (max − min) in dB.
- `INTENSITY_MEDIAN` — Median intensity in dB.
- `INTENSITY_Q25` — 25th percentile of intensity in dB.
- `INTENSITY_Q75` — 75th percentile of intensity in dB.
- `INTENSITY_IQR` — Inter-quartile range of intensity in dB.
- `INTENSITY_SLOPE` — Slope of intensity over time (dB/sec) — fatigue indicator if strongly negative.
- `PITCH_SLOPE` — Slope of pitch over time (Hz/sec).
- `PITCH_INTENSITY_CORRELATION` — Correlation between pitch and intensity contours.
- `MEAN_F0` — Mean fundamental frequency in Hz.
- `F0_STD` — F0 standard deviation in Hz.
- `ANALYSIS_DURATION` — Analyzed duration in seconds.
- `N_FRAMES` — Number of frames analyzed.
- `TIME_STEP` — Echo of the timeStep parameter used.
- `WINDOW_LENGTH` — Echo of the windowLength parameter used.
- `FATIGUE_THRESHOLD` — Echo of the fatigueThreshold parameter used.
- `MILD_FATIGUE_THRESHOLD` — Echo of the mildFatigueThreshold parameter used.
- `MONOTONICITY_CV_THRESHOLD` — Echo of monotonicityCV.
- `MONOTONICITY_RANGE_THRESHOLD` — Echo of monotonicityRange.
- `PROJECTION_SCORE` — Integer score (0–N) — vocal projection capability.
- `STABILITY_SCORE` — Integer score — intensity stability.
- `EFFORT_SCORE` — Integer score — vocal effort level.
- `CONTROL_SCORE` — Integer score — pitch/intensity coordination.
- `MONOTONICITY_SCORE` — Integer score — monotonicity (lower is better).
- `RELIABILITY_FLAG` — Integer flag indicating whether the analysis is reliable.
- `RELATIVE_MAX_THRESHOLD_HIGH` — Computed reference threshold (high).
- `RELATIVE_MAX_THRESHOLD_MODERATE` — Computed reference threshold (moderate).
- `GENDER_DETECTED` — Gender inferred from F0 statistics.
- `GENDER_PROVIDED` — Echo of the gender parameter as supplied.
- `PROJECTION_CAPABILITY` — Categorical: e.g. "Strong", "Adequate", "Reduced".
- `INTENSITY_STABILITY` — Categorical clinical label for intensity stability.
- `VOCAL_EFFORT` — Categorical: e.g. "Normal", "Increased", "Reduced".
- `COORDINATION` — Categorical: how well pitch and intensity co-vary.
- `FATIGUE_INDICATOR` — Categorical: "None" | "Mild" | "Severe".
- `FATIGUE_RISK` — Categorical risk level.
- `MONOTONICITY` — Categorical: "Monotonous" | "Normal variation" | etc.
- `MONOTONICITY_LEVEL` — Categorical severity of monotonicity.
- `MONOTONICITY_CONFIDENCE` — Categorical confidence: "High" | "Moderate" | "Low".
- `INTENSITY_CONTROL` — Categorical assessment of intensity control.
- `CLINICAL_INTERPRETATION` — Free-form clinical text summary.
- `CLINICAL_RECOMMENDATION` — Free-form clinical recommendation text.
- `PROFESSIONAL_VOICE` — Categorical: assessment of professional-voice readiness.
- `CALIBRATION_NOTE` — Note about absolute dB calibration (recordings are typically uncalibrated).
- `VISUALIZATION_NOTE` — Hint for downstream visualizers.
- `PATIENT_AGE` — Echo of the input age.
- `AGE_GROUP` — Categorical age group (Child / Adolescent / Adult / Older Adult).
- `PRESBYLARYNGIS_RISK` — Categorical: presbylaryngis (age-related laryngeal change) risk.
- `PRESBYLARYNGIS_AGE_THRESHOLD` — Threshold age above which presbylaryngis is suspected.
- `EXPECTED_INTENSITY_RANGE` — Expected intensity range string for this demographic.
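
The fatigue and monotonicity thresholds are configurable parameters, but the docs do not spell out how the server combines them, so this sketch makes explicit assumptions: a decline steeper than the threshold (slope below its negative) triggers the fatigue flag, and the two monotonicity conditions are combined with AND.

```python
def fatigue_indicator(intensity_slope_db_per_s: float,
                      severe: float = 1.0, mild: float = 0.5) -> str:
    """Interpret INTENSITY_SLOPE using the documented default thresholds.

    Assumption: slope <= -threshold triggers the flag.
    """
    if intensity_slope_db_per_s <= -severe:
        return "Severe"
    if intensity_slope_db_per_s <= -mild:
        return "Mild"
    return "None"

def is_monotonous(f0_cv_pct: float, pitch_range_semitones: float,
                  cv_thresh: float = 5.0, range_thresh: float = 10.0) -> bool:
    """Flag monotony when BOTH variability measures fall below the defaults
    (AND combination is an assumption about the server's rule)."""
    return f0_cv_pct < cv_thresh and pitch_range_semitones < range_thresh

print(fatigue_indicator(-1.2))    # Severe
print(is_monotonous(3.2, 6.0))    # True
```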

#### GET https://platform.vocametrix.com/api/calculate-prosody-similarity

Compare two recordings (a reference "model" and a learner "user") on prosodic dimensions. Returns per-dimension scores plus visualization-ready time-aligned curves for pitch, intensity, and rhythm. Designed for pronunciation coaching games. Both files must be pre-uploaded via POST /api/assignFileId.

**Parameters:**

- `modelFileId` — REQUIRED. fileId of the REFERENCE recording (the model the learner is trying to imitate).
- `userFileId` — REQUIRED. fileId of the LEARNER recording.
- `modelStartTime` — Optional, default 0. Seconds into the model file from which to start aligning.
- `userStartTime` — Optional, default 0. Seconds into the user file from which to start aligning.

**Response:**

- `OVERALL_SCORE` — Composite similarity score on a 0–100 scale.
- `PITCH_SCORE` — Pitch-similarity sub-score (0–100).
- `RHYTHM_SCORE` — Rhythm-similarity sub-score (0–100).
- `INTENSITY_SCORE` — Intensity-similarity sub-score (0–100).
- `PERFORMANCE_LEVEL` — Categorical label (e.g. "Excellent", "Good", "Needs work").
- `OVERALL_FEEDBACK` — Free-form coaching text (English).
- `BEST_MATCH` — Free-form description of the best-matching dimension.
- `NEEDS_WORK` — Free-form description of the dimension that needs the most work.
- `SPEECH_RATE_SIMILARITY` — Speech-rate similarity (0–1 or 0–100 depending on metric).
- `DURATION_SIMILARITY` — Total-duration similarity.
- `PITCH_CONTOUR_SIMILARITY` — Pitch-contour shape similarity.
- `INTENSITY_CONTOUR_SIMILARITY` — Intensity-contour shape similarity.
- `DYNAMIC_RANGE_SIMILARITY` — Dynamic-range similarity.
- `MODEL_F0_MEAN` — Mean F0 of the model recording (Hz).
- `USER_F0_MEAN` — Mean F0 of the learner recording (Hz).
- `MODEL_SPEECH_RATE` — Speech rate of the model (e.g., syllables/sec).
- `USER_SPEECH_RATE` — Speech rate of the learner.
- `MODEL_DURATION` — Duration of the model recording (sec).
- `USER_DURATION` — Duration of the learner recording (sec).
- `CURVE_DATA` — Visualization-ready curves: `{ pitch: [{time, model, user}, …], intensity: [...], rhythm: [...] }`. Each array is sampled at common time points and includes both signals so the UI can overlay them directly.
- `VISUALIZATION_METADATA` — Object: `{ model_duration, user_duration, total_samples, pitch_range: { model_median, user_median } }`.
- `ANALYSIS_TYPE` — String — currently "Prosody_Similarity_Game".
- `ALGORITHM_VERSION` — String — currently "v01_with_curves".
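
`CURVE_DATA` is designed for direct overlay. A small sketch of how a client might split one dimension into plottable series, using the field names documented above:

```python
def overlay_series(curve_data: dict, dimension: str):
    """Split one CURVE_DATA dimension into (times, model, user) lists
    ready to hand to a plotting library."""
    points = curve_data[dimension]
    times = [p["time"] for p in points]
    return times, [p["model"] for p in points], [p["user"] for p in points]

sample = {"pitch": [{"time": 0.0, "model": 180.0, "user": 172.0},
                    {"time": 0.1, "model": 185.0, "user": 176.0}]}
t, model, user = overlay_series(sample, "pitch")
print(t)      # [0.0, 0.1]
print(model)  # [180.0, 185.0]
```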

### Capabilities

**ABI — Acoustic Breathiness Index:**
- Multi-component breathiness score combining CPPS, jitter, GNE approximation, multi-band HNR, H1-H2, shimmer, and period stability
- Requires both connected speech (CS) and sustained vowel (SV) recordings
- Algorithm version returned (`ANALYSIS_VERSION = ABI_v01`)

**Voice Dynamics — Intensity stability and projection:**
- Intensity statistics: mean, std, CV, min, max, range, median, IQR, slope (dB/sec)
- Pitch-intensity correlation, monotonicity scoring, vocal-effort assessment
- Demographics-aware: age + gender drive expected ranges and presbylaryngis flags
- Configurable thresholds for fatigue, monotonicity, projection, and reliability
- Returns ~53 fields (numeric scores + categorical clinical strings)

**Prosody Similarity — Game-grade pronunciation coaching:**
- Compares a learner recording against a reference (model) recording
- Per-dimension scores: pitch, rhythm, intensity, dynamic range, contour shape
- Returns visualization-ready curves: pitch[], intensity[], rhythm[] each with {time, model, user} triplets
- Optional time offsets to align two recordings that don't start together
- Designed for real-time pronunciation training games and feedback UIs

### Example (cURL)

```bash
# Upload connected speech and sustained vowel for ABI
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./connected_speech.wav" -F "email=user@example.com"
# {"fileId":"cs12345"}
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./sustained_vowel.wav" -F "email=user@example.com"
# {"fileId":"sv67890"}

# ABI
curl -X GET "https://platform.vocametrix.com/api/calculate-abi?csFileId=cs12345&svFileId=sv67890" \
  -H "X-API-Key: your-api-key"

# Voice dynamics (sustained vowel + demographics)
curl -X GET "https://platform.vocametrix.com/api/calculate-voice-dynamics?svFileId=sv67890&age=42&gender=female" \
  -H "X-API-Key: your-api-key"

# Prosody similarity — upload model + user files first, then compare
curl -X GET "https://platform.vocametrix.com/api/calculate-prosody-similarity?modelFileId=mdl001&userFileId=usr001&modelStartTime=0&userStartTime=0.2" \
  -H "X-API-Key: your-api-key"
```

---

## Audio Measures (Sound Level, eGeMAPS)

**Group:** Audio Measures

Simple acoustic measurement primitives complementing the clinical calculators. /api/soundLevel returns a calibrated-style dB SPL reading over a specified time window. /api/gemaps-extract returns the full openSMILE eGeMAPSv02 88-feature acoustic vector — the de facto standard feature set used in voice ML research and computational paralinguistics.

### Endpoints

#### POST https://platform.vocametrix.com/api/soundLevel

Measure the sound level (dB SPL) of an audio recording over a specified time window. Useful for environmental noise checks, vocal effort estimation, or sound-pressure documentation.

**Parameters:**

- `blobUrl` — REQUIRED. URL of the audio file (typically obtained via /api/get-blob-url + upload).
- `start_sec` — REQUIRED. Start of the analysis window in seconds. CAVEAT: The server uses a falsy check (`!start_sec`), so the literal value 0 is rejected — pass a tiny positive value like 0.001 if you mean 'from the beginning'.
- `stop_sec` — REQUIRED. End of the analysis window in seconds. Must be > start_sec.

**Response:**

- `soundLevel` — Number — dB value rounded to 2 decimals.
- `unit` — String — currently "dB SPL".
- `frequencyRange` — String — currently "20-8000 Hz".
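
The `start_sec` caveat is easy to trip over, so a client can clamp before sending. A sketch of payload construction; the 0.001 workaround comes from the caveat above:

```python
def sound_level_payload(blob_url: str, start_sec: float, stop_sec: float) -> dict:
    """Build a /api/soundLevel request body, working around the server's
    falsy check: a literal start_sec of 0 is rejected, so 'from the
    beginning' is encoded as a tiny positive offset (0.001 s)."""
    start = max(start_sec, 0.001)
    if stop_sec <= start:
        raise ValueError("stop_sec must be greater than start_sec")
    return {"blobUrl": blob_url, "start_sec": start, "stop_sec": stop_sec}

print(sound_level_payload("https://example.com/a.wav", 0, 5.0))
# {'blobUrl': 'https://example.com/a.wav', 'start_sec': 0.001, 'stop_sec': 5.0}
```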

#### GET https://platform.vocametrix.com/api/gemaps-extract

Extract the full openSMILE eGeMAPSv02 feature set (88 features) from a previously uploaded audio file. Files longer than 8 seconds are processed in 8-second chunks and per-chunk results are combined into a single feature vector.

**Parameters:**

- `fileId` — REQUIRED. fileId from a prior /api/assignFileId upload.

**Response:**

- `<eGeMAPSv02 features>` — The full eGeMAPSv02 feature object (88 features) as emitted by openSMILE — feature names follow the openSMILE convention (e.g. F0semitoneFrom27.5Hz_sma3nz_amean, loudness_sma3_amean, jitterLocal_sma3nz_amean, shimmerLocaldB_sma3nz_amean, HNRdBACF_sma3nz_amean, F1frequency_sma3nz_amean, etc.). Refer to the openSMILE eGeMAPSv02 configuration for the canonical key list — this server is a transparent wrapper around it.
- `chunk_info` — Object describing chunking: `{ total_chunks: number, current_chunk: number, start_time: number, duration: number, is_chunked: boolean }`. For files ≤ 8 s, is_chunked is false and the whole file produces one result.
- `metadata` — Object: `{ file_id: string, processing_duration_seconds: number, analysis_type: "single_chunk" | "chunked" }`.
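
Chunking behavior can be predicted client-side. A sketch assuming plain ceiling division with no overlap, matching the 8-second chunking described above (the server's exact boundary rule is an assumption):

```python
import math

def chunk_plan(duration_s: float, chunk_s: float = 8.0) -> dict:
    """Predict the chunk_info fields for a file of the given length."""
    total = max(1, math.ceil(duration_s / chunk_s))
    return {"total_chunks": total, "is_chunked": total > 1}

print(chunk_plan(4.2))   # {'total_chunks': 1, 'is_chunked': False}
print(chunk_plan(20.0))  # {'total_chunks': 3, 'is_chunked': True}
```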

### Capabilities

**Sound Level (dB SPL):**
- Computes a single dB value over a user-specified [start_sec, stop_sec] window
- Frequency range 20–8000 Hz (broadband)
- Uploads via blobUrl (different pattern than the Praat calculators which use fileId)
- WARNING: start_sec=0 is rejected by a falsy check on the server — pass a tiny positive value (e.g., 0.001) instead

**eGeMAPS Feature Extraction:**
- Full openSMILE eGeMAPSv02 spec — 88 acoustic features
- Standard feature set used in voice ML and paralinguistics research
- Includes F0 (semitones), loudness, jitter, shimmer, HNR, spectral and formant features
- Files longer than 8 seconds are split into 8-second chunks; per-chunk results are combined
- Returns the full feature object plus chunking metadata (chunk_info, metadata)

### Example (cURL)

```bash
# Sound level (dB SPL) over a window — note start_sec must be > 0
curl -X POST "https://platform.vocametrix.com/api/soundLevel" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "blobUrl": "<BLOB_URL>", "start_sec": 0.001, "stop_sec": 5.0 }'
# Response: {"soundLevel": -12.45, "unit": "dB SPL", "frequencyRange": "20-8000 Hz"}

# eGeMAPS feature extraction — file must be uploaded first via /api/assignFileId
curl -X GET "https://platform.vocametrix.com/api/gemaps-extract?fileId=sv67890" \
  -H "X-API-Key: your-api-key"
# Response: {
#   "F0semitoneFrom27.5Hz_sma3nz_amean": 30.4,
#   "loudness_sma3_amean": 0.42,
#   "jitterLocal_sma3nz_amean": 0.012,
#   "shimmerLocaldB_sma3nz_amean": 0.31,
#   "HNRdBACF_sma3nz_amean": 14.2,
#   ... (88 features total)
#   "chunk_info": {"total_chunks": 1, "current_chunk": 1, "start_time": 0, "duration": 4.2, "is_chunked": false},
#   "metadata": {"file_id": "sv67890", "processing_duration_seconds": 1.3, "analysis_type": "single_chunk"}
# }
```

---

## Phoneme & Audio Classification (Phonemes-Live, Stuttering, Estonian Vowel)

**Group:** Phoneme & Classification

Three speech-recognition-adjacent services: live phoneme detection (French and Estonian), stuttering classification (async, polled via the therapy session store), and Estonian vowel classification. All three are Python-backed and return lowercase JSON keys (a different convention from the SCREAMING_SNAKE_CASE Praat calculators).

### Endpoints

#### POST https://platform.vocametrix.com/api/analyze-phonemes-live

Run a phoneme-recognition pass on a recording. Provide either a previously-uploaded fileId (preferred for performance) or a publicly accessible blobUrl. Supports French (fr-FR) and Estonian (et-EE).

**Parameters:**

- `fileId` — Optional (use this OR blobUrl). fileId from a prior /api/assignFileId upload.
- `blobUrl` — Optional (use this OR fileId). HTTPS URL of the audio file. Will be downloaded server-side. If the host is `vocametrixstorageaccount.blob.core.windows.net` and `keepBlob` is false, the blob is deleted after processing.
- `referenceWord` — Optional. Currently accepted by the API but not used by the JavaScript layer (passed through to the Python script which may use it).
- `language` — Optional, default "fr-FR". Accepted: "fr-FR" | "et-EE".
- `model` — Optional. Alias for an alternative ASR model (must pass server-side `isKnownAlias`). Resolved to a model URL internally.
- `keepBlob` — Optional, default false. If false and `blobUrl` was used, the source blob is deleted after success (only when hosted on the Vocametrix storage account).

**Response:**

- `<python output>` — JSON shape produced by the underlying phoneme client. Typical fields: `phonemes: [{ label, start_ms, end_ms, confidence }, ...]`, `language`, `model_used`. Exact shape is not enumerated by the JS layer and is determined by the per-language Python client (see python/phonemes/french/phoneme_client.py and python/phonemes/estonian/phoneme_client.py).
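
A client will usually flatten the phoneme list for display. A sketch using the typical field names noted above; the per-language Python client defines the exact keys, so treat these names as assumptions:

```python
def phonemes_to_text(result: dict) -> str:
    """Join the typical phoneme payload into a readable timeline string.
    Field names (label/start_ms/end_ms) follow the typical shape described
    in the docs, not a guaranteed schema."""
    return " ".join(
        f"{p['label']}[{p['start_ms']}-{p['end_ms']}ms]"
        for p in result.get("phonemes", [])
    )

sample = {"phonemes": [
    {"label": "b", "start_ms": 0, "end_ms": 80, "confidence": 0.97},
    {"label": "o", "start_ms": 80, "end_ms": 210, "confidence": 0.92},
]}
print(phonemes_to_text(sample))  # b[0-80ms] o[80-210ms]
```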

#### POST https://platform.vocametrix.com/api/classify-stuttering

Kick off an asynchronous stuttering classification job on a previously-uploaded recording. Returns a session_id immediately; clients poll /api/therapy-status/:sessionId for progress and /api/therapy-result/:sessionId for the final classification. The session store is shared with the therapy planner — same endpoints, different result shape.

**Parameters:**

- `fileId` — REQUIRED. fileId from a prior /api/assignFileId upload.
- `patientId` — Optional, default "unknown". Patient identifier — the literal value "PT-UNKNOWN" is explicitly rejected (use "unknown" or a real ID).
- `chunkOverlap` — Optional, default 0.25. Overlap fraction between analysis chunks.
- `chunkSize` — Optional, default 4. Chunk size in seconds.
- `locale` — Optional, default "en-US". Locale used by the Azure STT step within the classification pipeline.

**Response:**

- `success` — Boolean — true when the job has been queued.
- `session_id` — String of the form "cls-<uuid>". Pass to /api/therapy-status/:sessionId and /api/therapy-result/:sessionId to poll and retrieve.
- `message` — Human-readable confirmation, e.g. "Classification started. Use session_id to poll for progress."
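
A typical client polls /api/therapy-status/:sessionId until the job settles. A generic sketch with an injected status callable so it can be exercised without the API; the terminal status values ("complete"/"failed") are assumptions about the shared session store, and 600 s mirrors the server-side CLASSIFY_TIMEOUT_MS:

```python
import time

def poll_classification(get_status, interval_s: float = 2.0,
                        timeout_s: float = 600.0, sleep=time.sleep) -> dict:
    """Poll a zero-argument status callable until it reports a terminal
    state or the client-side timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") in ("complete", "failed"):
            return status
        sleep(interval_s)
    raise TimeoutError("classification did not finish within the timeout")

# Fake status source for illustration: job completes on the third check.
states = iter([{"status": "queued"}, {"status": "running"}, {"status": "complete"}])
print(poll_classification(lambda: next(states), sleep=lambda _: None))
# {'status': 'complete'}
```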

#### POST https://platform.vocametrix.com/api/classify-estonian-vowel

Synchronous Estonian vowel classifier. Submit a fileId, receive the predicted vowel and confidence in the response.

**Parameters:**

- `fileId` — REQUIRED. fileId from a prior /api/assignFileId upload.

**Response:**

- `<python output>` — JSON shape produced by python/phonemes/estonian_vowels/estonian_vowel_client.py — typical fields: `success`, `predicted_vowel`, `confidence`, `probabilities: { <vowel>: <prob>, ... }`. The exact key set is determined by the Python client and is not enumerated by the JS layer.

### Capabilities

**Phoneme Detection:**
- Supports two languages: French (fr-FR) and Estonian (et-EE)
- Accepts either a pre-uploaded fileId or an arbitrary HTTPS blobUrl
- Optional model alias for swapping ASR backends
- Auto-cleans temporary blobs after processing (Vocametrix-hosted blobs only)

**Stuttering Classification (async):**
- Forks a Python ML script (local HuggingFace inference); a typical job takes minutes
- Returns a session_id immediately; clients poll /api/therapy-status/:sessionId for progress
- Final result fetched via /api/therapy-result/:sessionId — note this is the SAME endpoint used by the therapy planner family (the session store is shared)
- Hard 600 s timeout server-side (CLASSIFY_TIMEOUT_MS)
- Per-chunk classification + overall classification + metadata in result payload

**Estonian Vowel Classification:**
- Synchronous (one-shot) classification of Estonian vowel recordings
- 150 s exec timeout server-side
- Returns predicted vowel + confidence + per-vowel probabilities (exact shape determined by the underlying Python client)

### Example (cURL)

```bash
# Phoneme detection (French)
curl -X POST "https://platform.vocametrix.com/api/analyze-phonemes-live" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "fileId": "sv67890", "language": "fr-FR" }'

# Phoneme detection from a public blob URL (Estonian)
curl -X POST "https://platform.vocametrix.com/api/analyze-phonemes-live" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "blobUrl": "https://example.com/audio.wav", "language": "et-EE" }'

# Stuttering classification — async, returns a session_id
curl -X POST "https://platform.vocametrix.com/api/classify-stuttering" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "fileId": "sv67890", "patientId": "P12345" }'
# Response: {"success": true, "session_id": "cls-abc123", "message": "Classification started..."}

# Poll progress (shared session store with therapy planner)
curl -X GET "https://platform.vocametrix.com/api/therapy-status/cls-abc123" \
  -H "X-API-Key: your-api-key"

# Fetch final result
curl -X GET "https://platform.vocametrix.com/api/therapy-result/cls-abc123" \
  -H "X-API-Key: your-api-key"

# Estonian vowel classification (synchronous)
curl -X POST "https://platform.vocametrix.com/api/classify-estonian-vowel" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{ "fileId": "sv67890" }'
```

---

## Speech Coaching Analysis API

**Group:** Speech Coaching

Unified orchestrator that runs a speech recording through multiple analyses in a single call — pitch, intensity, speech segments, eGeMAPS features, stuttering classification, Azure speech-to-text, and Azure pronunciation assessment — and returns one consolidated JSON. Supports both single-file (multipart upload) and ZIP-based batch modes. Currently supports French and English only.

### Endpoints

#### POST https://platform.vocametrix.com/api/coaching-analysis

Single-file orchestrator. Accepts a multipart upload with one audio file and runs the full coaching pipeline. Returns consolidated JSON with per-sub-analysis results, errors, and warnings.

**Parameters:**

- `audio` — REQUIRED (multipart file field). Any audio format ffmpeg can decode (the server normalizes to 16 kHz mono internally).
- `language` — REQUIRED (form field). Either "fr" or "en". Other values are rejected.
- `mode` — Optional (form field, default "free_speech"). "free_speech" or "known_text".
- `reference_text` — Required when mode="known_text" — the text the speaker is supposed to read. Ignored in free_speech mode.
- `label` — Optional (form field). Free-form classification tag passed through into the response (useful for batch ergonomics).
- `filename` — Optional (form field). Overrides the multipart originalname when echoing the result.

**Response:**

- `status` — "ok" if all sub-calls succeeded, "partial" if at least one failed (the response is still returned with whatever succeeded).
- `filename` — Echo of the input filename.
- `label` — Echo of the input label.
- `language` — Echo of the input language.
- `mode` — Echo of the input mode.
- `reference_text` — Echo of the reference text (null in free_speech mode).
- `transcript_used` — The transcript actually used downstream — either the user-provided reference_text, or the Azure STT result, or null.
- `transcript_source` — "user_provided" | "azure_stt" | null.
- `audio` — Object describing the normalized audio: `{ duration_seconds, sample_rate: 16000, channels: 1 }`.
- `results` — Object with one entry per successful sub-call. Possible keys: `pitch`, `intensity`, `speech_segments`, `speech_percentage`, `gemaps`, `stuttering`, `azure_stt`, `azure_pronunciation`. Each value is the raw payload of the corresponding sub-endpoint.
- `errors` — Object with one entry per FAILED sub-call (same key set as `results`). Each value is `{ message, status }`.
- `warnings` — Array of strings — non-fatal advisories (e.g., "voice quality not yet wired into orchestrator").
- `pipeline_version` — String identifying the orchestrator pipeline version. Use this to detect behavior changes across deploys.
- `processed_at` — ISO 8601 timestamp.
- `processing_time_ms` — Total wall-clock time in milliseconds.
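Because of the soft-fail design, a client should branch on `status` and inspect both `results` and `errors` rather than treating any sub-call failure as fatal. A minimal client-side handling sketch in Python (the `example` payload below is illustrative, not a real server response):

```python
def summarize_coaching_response(resp: dict) -> dict:
    """Collect which sub-analyses succeeded and which failed."""
    succeeded = sorted(resp.get("results", {}).keys())
    failed = sorted(resp.get("errors", {}).keys())
    return {
        "partial": resp.get("status") == "partial",
        "succeeded": succeeded,
        "failed": failed,
        "warnings": resp.get("warnings", []),
    }

# Illustrative payload: pitch succeeded, stuttering failed.
example = {
    "status": "partial",
    "results": {"pitch": {"mean_f0_hz": 180.0}},
    "errors": {"stuttering": {"message": "model timeout", "status": 504}},
    "warnings": ["voice quality not yet wired into orchestrator"],
}
summary = summarize_coaching_response(example)
```

A `"partial"` status still carries every successful sub-analysis, so logging `failed` and retrying only those endpoints individually is usually cheaper than re-running the whole pipeline.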

#### POST https://platform.vocametrix.com/api/coaching-analysis/batch

Submit a ZIP archive for batch processing. The ZIP must contain a top-level manifest.csv plus all the audio files referenced by the manifest. Returns immediately (202) with a job_id; poll /api/coaching-analysis/batch/:job_id for progress and results. Job state is in-memory only — do not assume jobs persist across server restarts.

**Parameters:**

- `archive` — REQUIRED (multipart file field, ZIP). Must contain manifest.csv at the root plus the audio files referenced by manifest rows. Manifest schema: filename, label, language, mode, reference_text (one row per audio).

**Response:**

- `job_id` — String — opaque job identifier (format: "ca_batch_<uuid>"). Pass this to /api/coaching-analysis/batch/:job_id to poll.
- `status` — "queued" — the initial state. The job transitions to "running" / "complete" / "failed" as it progresses.
- `n_files` — Number of manifest rows the server will process.
- `poll_url` — Convenience URL to poll the job state.
- `created_at` — ISO 8601 timestamp.
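The batch ZIP can be assembled programmatically with the Python standard library. A sketch following the manifest schema above (the row values and audio bytes are placeholders):

```python
import csv
import io
import zipfile

MANIFEST_COLUMNS = ["filename", "label", "language", "mode", "reference_text"]

def build_batch_zip(rows: list[dict], audio_files: dict[str, bytes]) -> bytes:
    """rows: one dict per manifest row; audio_files: {filename: raw bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # manifest.csv must sit at the root of the archive.
        manifest = io.StringIO()
        writer = csv.DictWriter(manifest, fieldnames=MANIFEST_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
        zf.writestr("manifest.csv", manifest.getvalue())
        for name, data in audio_files.items():
            zf.writestr(name, data)
    return buf.getvalue()

rows = [
    {"filename": "a.wav", "label": "s1", "language": "en",
     "mode": "free_speech", "reference_text": ""},
]
archive = build_batch_zip(rows, {"a.wav": b"RIFF\x00placeholder"})
```

The resulting bytes can be written to disk and posted as the `archive` multipart field, as in the cURL example below. Remember the server-configured BATCH_MAX_FILES cap when sizing the archive.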

#### GET https://platform.vocametrix.com/api/coaching-analysis/batch/:job_id

Poll the state of a batch job. Returns 404 if the job_id is unknown or the in-memory job entry has been evicted (e.g. server restart). Per-file results accumulate in `results` as the batch progresses, so the same response shape is returned during "running" as at "complete".

**Parameters:**

- `job_id` — Path parameter — the job identifier returned by /api/coaching-analysis/batch.

**Response:**

- `job_id` — Echo of the job identifier.
- `status` — "queued" | "running" | "complete" | "failed".
- `created_at` — ISO 8601 timestamp.
- `updated_at` — ISO 8601 timestamp of the last state change.
- `n_files` — Total number of manifest rows.
- `n_processed` — How many rows have been processed (succeeded or failed).
- `n_succeeded` — How many rows succeeded.
- `n_failed` — How many rows failed.
- `results` — Array of per-file orchestrator outputs (same shape as POST /api/coaching-analysis), in manifest order. Populated incrementally as the batch progresses.
- `downloads` — Object with download URLs once the batch is complete (e.g., dataset.csv). Null while running.
- `error` — String describing why the job failed, or null.
- `pipeline_version` — Echo of the orchestrator pipeline version.
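Since `results` accumulates while the job is `"running"`, a poll loop can surface incremental progress before the terminal state. A sketch with an injectable `fetch` callable (here stubbed; in practice it would issue the GET above and parse the JSON):

```python
import time

TERMINAL_STATES = {"complete", "failed"}

def poll_batch_job(fetch, job_id: str, interval_s: float = 5.0,
                   timeout_s: float = 600.0) -> dict:
    """fetch(job_id) -> parsed JSON of GET /api/coaching-analysis/batch/:job_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch(job_id)
        # Per-file results accumulate while "running"; callers can read
        # state.get("results", []) here for incremental progress.
        if state["status"] in TERMINAL_STATES:
            return state
        if time.monotonic() > deadline:
            raise TimeoutError(f"batch job {job_id} still {state['status']}")
        time.sleep(interval_s)

# Stub fetch simulating queued -> running -> complete.
_states = iter([
    {"status": "queued"},
    {"status": "running", "results": []},
    {"status": "complete", "results": [{"status": "ok"}]},
])
final = poll_batch_job(lambda _id: next(_states), "ca_batch_abc123", interval_s=0.0)
```

Treat a 404 from the poll endpoint as job loss (in-memory state evicted, e.g. by a restart) and resubmit the archive rather than retrying the poll.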

### Capabilities

**Unified Orchestration:**
- One multipart POST replaces ~7 separate API calls
- Per-sub-call success/failure tracking via `results` and `errors` objects
- Soft-fail design: partial results are returned even if some sub-calls fail (status: "partial")
- Two modes: free_speech (no reference text) or known_text (reference for pronunciation scoring)
- Pipeline version returned to track behavior across deploys

**Batch Processing (ZIP + manifest):**
- Submit a ZIP archive containing audio files plus a top-level manifest.csv
- Manifest drives per-row configuration (label, language, mode, reference_text)
- Async job model — returns 202 with a job_id and poll URL
- Per-row results accumulate as the batch progresses (poll repeatedly)
- Hard cap on number of files per ZIP (BATCH_MAX_FILES, server-configured)

**Important constraints:**
- Languages supported: "fr" or "en" only (rejected otherwise)
- reference_text is REQUIRED when mode="known_text"; ignored otherwise
- Emotion estimation is intentionally disabled (model not deployed) — not in `results`
- Voice quality, formants, and FR phoneme analysis are NOT yet wired into the orchestrator (see `warnings`)
- Batch job state is IN-MEMORY ONLY — jobs disappear on server restart, return 404 from poll

### Example (cURL)

```bash
# Single-file coaching analysis (free speech)
curl -X POST "https://platform.vocametrix.com/api/coaching-analysis" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./speech.wav" \
  -F "language=en" \
  -F "mode=free_speech" \
  -F "label=session-001"

# Single-file with known reference text (enables pronunciation scoring)
curl -X POST "https://platform.vocametrix.com/api/coaching-analysis" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./reading.wav" \
  -F "language=fr" \
  -F "mode=known_text" \
  -F "reference_text=Le petit chat dort sur le tapis."

# Batch — ZIP must contain manifest.csv + audio files
curl -X POST "https://platform.vocametrix.com/api/coaching-analysis/batch" \
  -H "X-API-Key: your-api-key" \
  -F "archive=@./batch.zip"
# Response: {"job_id":"ca_batch_abc123","status":"queued","n_files":42,"poll_url":"/api/coaching-analysis/batch/ca_batch_abc123",...}

# Poll a batch job
curl -X GET "https://platform.vocametrix.com/api/coaching-analysis/batch/ca_batch_abc123" \
  -H "X-API-Key: your-api-key"
```

---

## Therapy Plan Generator API (Python workflow)

**Group:** Therapy Planning

Generate a clinical therapy plan from an audio recording via a multi-step async workflow: kick off, poll for status, retrieve result, then approve / modify / reject (human-in-the-loop). Backed by a LangGraph Python workflow using Google Gemini. Distinct from the Azure-AI-Foundry /api/therapy-planning-agent endpoint, which is a single-shot agent — this one is a multi-stage workflow with explicit approval gates.

### Endpoints

#### POST https://platform.vocametrix.com/api/generate-therapy-plan

Kick off therapy plan generation from an uploaded audio recording. Returns immediately (202) with a therapy_session_id; clients must poll /api/therapy-status/:sessionId for progress, then read the result via /api/therapy-result/:sessionId. Decrements credits before kickoff (returns 429 on insufficient credits).

**Parameters:**

- `fileId` — REQUIRED. fileId from a prior /api/assignFileId upload.
- `patientId` — REQUIRED. Patient identifier matching ^[a-zA-Z0-9_-]{1,100}$. The literal "PT-UNKNOWN" is explicitly rejected.
- `patientMetadata` — Optional. Object describing the patient. Max 10,000 bytes when JSON-serialized. Optional sub-fields: demographics.age (0–120), clinical_history.severity ("Mild"|"Moderate"|"Severe"|"Very Severe"), current_goals (max 20 entries), recent_progress_notes (max 50), previous_exercises (max 30).
- `maxIterations` — Optional integer 1–5, default 2. Max critic-agent iterations within the workflow.
- `therapyAgentTemperature` — Optional number, default 0.3. Generation temperature for the therapy-planner LLM.
- `criticAgentTemperature` — Optional number, default 0. Generation temperature for the critic LLM.
- `classificationSessionId` — Optional string. If present, links the new therapy session to a prior /api/classify-stuttering session so the workflow can use those classification results.
- `useAzureML` — Optional boolean, default true. Whether to use Azure ML inside the workflow.
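Pre-validating `patientId` on the client avoids burning a round trip (and a credit decrement attempt) on a 4xx. A sketch mirroring the documented pattern and the PT-UNKNOWN rejection:

```python
import re

# Documented pattern for patientId: 1-100 chars of [a-zA-Z0-9_-].
PATIENT_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,100}$")

def is_valid_patient_id(patient_id: str) -> bool:
    # The literal placeholder "PT-UNKNOWN" is explicitly rejected server-side.
    if patient_id == "PT-UNKNOWN":
        return False
    return bool(PATIENT_ID_RE.match(patient_id))
```

Note that "PT-UNKNOWN" matches the character pattern; it is rejected as a special case, so a regex check alone is not sufficient.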

**Response:**

- `success` — Boolean — true when the job has been queued.
- `message` — Human-readable status message.
- `therapy_session_id` — Session ID of the form "THERAPY-<patientId>-<uuid>".
- `status` — "pending" — the initial state.
- `status_url` — Convenience URL to poll status (/api/therapy-status/<id>).
- `result_url` — Convenience URL to fetch result once complete (/api/therapy-result/<id>).
- `estimated_time_seconds` — Approximate wall-clock estimate (e.g., 120 s).
- `timestamp` — ISO 8601 timestamp.

#### GET https://platform.vocametrix.com/api/therapy-status/:sessionId

Poll the status of a therapy (or stuttering-classification) session. Returns 404 if the session is unknown or has expired (15-minute in-memory TTL), 403 if the session belongs to a different user.

**Parameters:**

- `sessionId` — Path parameter — the session ID.

**Response:**

- `success` — Boolean.
- `session_id` — String — may be normalized from the input (the server does flexible lookup).
- `status` — "pending" | "processing" | "complete" | "pending_approval" | "failed".
- `progress_percent` — Number 0–100.
- `progress` — Number — alias for progress_percent (kept for back-compat).
- `status_message` — Free-form progress description.
- `error_message` — String, or null.
- `created_at` — ISO 8601 timestamp.
- `updated_at` — ISO 8601 timestamp.
- `completed_at` — ISO 8601 timestamp, or null until complete.
- `result_available` — Boolean — true when status is "complete" or "pending_approval".
- `approval_required` — Boolean — true when status is "pending_approval".
- `generated_prompts` — Array, or null. Workflow-internal prompts (mostly for debugging).
- `timestamp` — ISO 8601 server time.
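A status-polling loop should treat both `"complete"` and `"pending_approval"` as terminal (that is when `result_available` flips to true) and keep its total wait under the 15-minute session TTL. A sketch with an injectable `fetch` callable (stubbed here; in practice it would issue the GET above):

```python
import time

def wait_for_therapy_result(fetch, session_id: str, interval_s: float = 5.0,
                            timeout_s: float = 840.0) -> dict:
    """fetch(session_id) -> parsed JSON of GET /api/therapy-status/:sessionId.

    Default timeout is 14 min, deliberately under the 15-min in-memory TTL.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(session_id)
        if status["status"] in ("complete", "pending_approval"):
            return status  # result_available is true in both states
        if status["status"] == "failed":
            raise RuntimeError(status.get("error_message") or "workflow failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"session {session_id} still {status['status']}")
        time.sleep(interval_s)

# Stub fetch simulating pending -> processing -> pending_approval.
_seq = iter([
    {"status": "pending"},
    {"status": "processing", "progress_percent": 60},
    {"status": "pending_approval", "result_available": True},
])
done = wait_for_therapy_result(lambda _id: next(_seq),
                               "THERAPY-P-12345-abc123", interval_s=0.0)
```

A 404 mid-poll means the session expired or the server restarted; the only recovery is to kick off a fresh generation.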

#### GET https://platform.vocametrix.com/api/therapy-result/:sessionId

Fetch the final result of a therapy or classification session. Returns 400 "Result not ready" if status is not "complete" or "pending_approval". Result shape branches by session type. Same endpoint is used to read stuttering-classification results.

**Parameters:**

- `sessionId` — Path parameter — the session ID.

**Response:**

- `<therapy session>` — { success, message, nodejs_session_id, ...result } — `result` is whatever the LangGraph workflow saved. Typical fields include `therapySession.sessionMetadata`, exercise plans, generated HTML paths (`html_clinician_final`, `output_file`), and free-form workflow output. The full keyset is determined by the Python workflow, not the JS layer.
- `<classification session>` — { success, session_id, patient_id, classification, classificationMetadata, overallClassification, timestamp } — see /api/classify-stuttering documentation.

#### POST https://platform.vocametrix.com/api/therapy-approve/:sessionId

Approve, modify, or reject a generated therapy plan. The action determines the next step: "approve" locks the plan as final; "reject" discards it; "modify" + feedback re-runs the workflow with the feedback as additional context.

**Parameters:**

- `sessionId` — Path parameter — the session ID. Status must be "complete" or "pending_approval".
- `action` — REQUIRED in body. "approve" | "modify" | "reject" (case-insensitive).
- `feedback` — REQUIRED when action="modify". Free-form clinician notes describing what to change. Optional otherwise.
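Since `feedback` is mandatory only for `"modify"`, a small client-side payload builder can enforce the rule before the request leaves the client. A hypothetical helper (not part of the API surface):

```python
from typing import Optional

VALID_ACTIONS = ("approve", "modify", "reject")

def build_approval_payload(action: str, feedback: Optional[str] = None) -> dict:
    """Build the JSON body for POST /api/therapy-approve/:sessionId."""
    action = action.lower()  # the endpoint accepts actions case-insensitively
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "modify" and not feedback:
        raise ValueError('feedback is required when action="modify"')
    payload = {"action": action}
    if feedback:
        payload["feedback"] = feedback
    return payload
```

For example, `build_approval_payload("MODIFY", "Add more /r/ drills.")` yields a body suitable for the cURL examples below, while `build_approval_payload("modify")` fails fast instead of earning a server-side 400.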

**Response:**

- `success` — Boolean.
- `therapy_session_id` — Echo of the session ID.
- `action` — Echo of the action.
- `feedback` — Echo of the feedback (or null).
- `delivery_status` — On approve: "approved_pending_delivery". On reject: "rejected".
- `status` — On modify: "processing" — the workflow re-runs and the client must poll again.
- `status_message` — Human-readable description.
- `status_url` — (modify only) URL to poll the new run.
- `result_url` — (modify only) URL to fetch the new result.
- `timestamp` — ISO 8601.
- `message` — Human-readable confirmation.

#### GET https://platform.vocametrix.com/api/therapy-plan-html/:sessionId

Download the clinician-facing HTML version of the generated therapy plan. Returns text/html with Content-Disposition: attachment so browsers download it directly. Returns 400 if the plan is not yet complete, 404 if the session or HTML file is missing.

**Parameters:**

- `sessionId` — Path parameter — the session ID.

**Response:**

- `<binary>` — text/html document. NOT a JSON response. Browsers receive it as `Content-Disposition: attachment; filename="therapy-plan-<patientId>-<YYYY-MM-DD>.html"`.

#### POST https://platform.vocametrix.com/api/therapy-revise/:sessionId

DEPRECATED. Always returns 410 Gone. Migrate to POST /api/therapy-approve/:sessionId with action="modify" + feedback. The deprecation response includes a `recommended_endpoint` and `recommended_payload` field to guide migration.

**Parameters:**

- `sessionId` — Path parameter — the session ID.
- `feedback` — Body — required for the deprecation handler to confirm intent (returns 400 if missing).
- `maxIterations` — Body, optional. Accepted but unused.

**Response:**

- `success` — Always false.
- `error` — "Endpoint deprecated".
- `recommended_endpoint` — "POST /api/therapy-approve/:sessionId".
- `recommended_payload` — { action: "modify", feedback }.

### Capabilities

**Multi-step async workflow:**
- Step 1: POST /api/generate-therapy-plan to kick off — returns therapy_session_id immediately (202)
- Step 2: Poll GET /api/therapy-status/:sessionId until status is "complete" or "pending_approval"
- Step 3: GET /api/therapy-result/:sessionId to read the generated plan
- Step 4: POST /api/therapy-approve/:sessionId with action="approve"|"modify"|"reject"
- Step 5: GET /api/therapy-plan-html/:sessionId to download the clinician HTML

**Session lifetime and isolation:**
- Sessions live in an in-memory store with a 15-minute TTL — long polls past 15 min will 404
- Sessions are scoped to the API key's user_id; cross-user reads return 403
- Stuttering-classification sessions live in the same store and use the same status/result endpoints

**Human-in-the-loop approval:**
- /api/therapy-approve/:sessionId with action="modify" + feedback re-runs the workflow
- Each approval round is appended to session.human_approval_history
- Use action="approve" to lock the plan as final, action="reject" to discard
- (/api/therapy-revise/:sessionId is hard-deprecated and always returns 410 — use approve+modify instead)

### Example (cURL)

```bash
# Step 1: Upload audio
curl -X POST "https://platform.vocametrix.com/api/assignFileId" \
  -H "X-API-Key: your-api-key" \
  -F "audio=@./session.wav" -F "email=clinician@example.com"
# {"fileId":"f12345"}

# Step 2: Kick off therapy plan generation
curl -X POST "https://platform.vocametrix.com/api/generate-therapy-plan" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "fileId": "f12345",
    "patientId": "P-12345",
    "patientMetadata": {
      "demographics": {"age": 8},
      "clinical_history": {"severity": "Moderate"},
      "current_goals": ["Reduce stuttering on /s/"]
    },
    "maxIterations": 2
  }'
# {"success": true, "therapy_session_id": "THERAPY-P-12345-abc123", "status": "pending", ...}

# Step 3: Poll status (loop until status is "complete" or "pending_approval")
curl -X GET "https://platform.vocametrix.com/api/therapy-status/THERAPY-P-12345-abc123" \
  -H "X-API-Key: your-api-key"

# Step 4: Read the generated plan
curl -X GET "https://platform.vocametrix.com/api/therapy-result/THERAPY-P-12345-abc123" \
  -H "X-API-Key: your-api-key"

# Step 5a: Approve as final
curl -X POST "https://platform.vocametrix.com/api/therapy-approve/THERAPY-P-12345-abc123" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"action": "approve"}'

# Step 5b: Or request modifications (workflow re-runs, poll again)
curl -X POST "https://platform.vocametrix.com/api/therapy-approve/THERAPY-P-12345-abc123" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"action": "modify", "feedback": "Add more articulation drills for /r/."}'

# Step 6: Download the HTML report (binary download — pipe to a file)
curl -X GET "https://platform.vocametrix.com/api/therapy-plan-html/THERAPY-P-12345-abc123" \
  -H "X-API-Key: your-api-key" \
  --output therapy-plan.html
```

---

## AI Agents

**Group:** AI Agents

Specialized AI agents for speech therapy and educational content generation.

### Endpoints

#### POST https://platform.vocametrix.com/api/speech-exercise-generator

Generate personalized speech therapy exercises. The agent returns a single conversational reply containing the exercises as structured text — parse the `response` string to extract individual exercises.

**Parameters:**

- `message` — REQUIRED. Instructions or context for the exercise generation (e.g., "Generate 5 articulation exercises for /r/").
- `ageLevel` — Patient age range (e.g., "3-6 years", "7-12 years", "adults")
- `speechChallenge` — The specific speech challenge to address (e.g., "R sounds", "S sounds")
- `language` — Target language for exercises (English, French, Spanish)

**Response:**

- `response` — The agent reply containing the generated exercises as structured text
- `threadId` — Conversation thread ID — pass it back to continue the same session
- `agentName` — Identifier of the agent that produced the reply
- `runStatus` — Internal run state (e.g., "completed")
- `remaining_credits` — Number of API credits remaining in your account
- `isAnonymousSession` — Boolean — true if the call used an anonymous_<sessionId> key
- `parameters` — Echo of the parameters used to produce the reply (ageLevel, speechChallenge, language)

#### POST https://platform.vocametrix.com/api/word-list-generator

Generate word lists tailored to specific phonemes with helpful hints.

**Parameters:**

- `language` — Target language for words (english, french, spanish, german)
- `age` — Patient age range (e.g., "3-6 years", "7-12 years", "adults")
- `selectedSound` — Object containing symbol (IPA phoneme) and position (beginning/middle/end/random)

**Response:**

- `words` — String containing 20 words separated by semicolons
- `hints` — String containing 20 corresponding hints separated by semicolons
- `wordHintPairs` — Array of objects with word and hint properties
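Clients working from the `words` and `hints` strings (rather than the pre-built `wordHintPairs`) can rebuild the pairs themselves. A sketch, assuming well-formed semicolon-separated output of equal length:

```python
def parse_word_hints(words: str, hints: str) -> list[dict]:
    """Rebuild word/hint pairs from the semicolon-separated strings,
    mirroring the wordHintPairs response field."""
    word_list = [w.strip() for w in words.split(";") if w.strip()]
    hint_list = [h.strip() for h in hints.split(";") if h.strip()]
    if len(word_list) != len(hint_list):
        raise ValueError("words and hints are misaligned")
    return [{"word": w, "hint": h} for w, h in zip(word_list, hint_list)]

pairs = parse_word_hints(
    "rabbit; river",
    "an animal with long ears; flows to the sea",
)
```

Stripping whitespace around each item guards against variable spacing after the semicolons in the agent's output.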

#### POST https://platform.vocametrix.com/api/spell-agent

Interpret raw speech-to-text transcriptions of spelling attempts and provide intelligent feedback.

**Parameters:**

- `text` — The speech-to-text transcription of user's spoken spelling attempt
- `word` — The target word to compare against (with proper accents if applicable)
- `language` — Language code (e.g., "en-US", "fr-FR", "es-ES") for language-appropriate feedback

**Response:**

- `output` — Spelled word in uppercase with any recognized accents
- `match` — Boolean indicating whether the spelling matches the target word
- `explanation` — Detailed explanation in the target language providing educational feedback
- `remaining_credits` — Number of remaining API credits

#### POST https://platform.vocametrix.com/api/speech-therapist-assistant

Expert speech therapy assistant providing role-based guidance for therapists, patients, and caregivers. Text-only — file uploads are not currently supported on this endpoint.

**Parameters:**

- `input` — The user's query or question about speech therapy
- `accountType` — User role: "slt" (therapist), "patient", or "parent" for role-based responses
- `threadId` — Conversation thread ID for maintaining context between messages

**Response:**

- `response` — The AI assistant's detailed answer with evidence-based recommendations and resources
- `threadId` — Conversation thread ID for continuing the conversation
- `remaining_credits` — Number of API credits remaining in your account

#### POST https://platform.vocametrix.com/api/language-chat-vocabulary

Interactive conversational AI that helps users practice language skills with real-time vocabulary enhancement.

**Parameters:**

- `message` — REQUIRED. The user's message in their target language
- `language` — REQUIRED. Target language the user is learning (en-US, fr-FR, es-ES, de-DE, etc.)
- `nativeLanguage` — REQUIRED. The user's native language code — used to localize vocabulary hints and corrections
- `ageLevel` — Age/level (child-beginner, teen-intermediate, adult-advanced, etc.)
- `topic` — Conversation topic (family, travel, food, work, hobbies, etc.)
- `threadId` — Optional thread ID to continue previous conversation
- `email` — Optional user email for tracking

**Response:**

- `response` — The AI assistant's conversational reply with vocabulary enhancements
- `threadId` — Thread ID for continuing the conversation
- `remaining_credits` — Number of API credits remaining in your account
- `configuration` — Object containing the current language, age level, and topic

#### POST https://platform.vocametrix.com/api/voice-metrics-interpreter

Translate raw voice metrics (jitter, shimmer, HNR, CPPS, etc.) into a clinical-language interpretation with severity, problematic metrics, recommended actions, and next steps. Backed by an Azure AI Foundry agent. Useful as a follow-up call after AVQI/DSI/CPP/HNR calculators to surface human-readable findings.

**Parameters:**

- `metrics` — REQUIRED. Object of voice metrics — typical keys: jitter, shimmer, hnr, cpps, etc. (validated server-side).
- `age` — REQUIRED. Patient age in years (1–120).
- `gender` — REQUIRED. "male" | "female" | "other".
- `languageCode` — Optional. Output language code — server applies a default via validateLanguageCode if omitted.
- `threadId` — Optional. Azure AI Foundry thread ID for multi-turn continuity.
- `praatResults` — Optional. Full Praat output object — when present, the server merges F0 / expected-range fields into the metrics for richer interpretation.
- `email` — Optional. Used by anonymous-key validation flow.

**Response:**

- `success` — Boolean.
- `overallScore` — Number 0–100, or -1 if the agent's response could not be parsed.
- `pathologyLevel` — Categorical string describing pathology severity.
- `problematicMetrics` — Array of metric names flagged as problematic.
- `interpretation` — Object: { summary, overallAssessment, metrics: [...], additionalNotes: [...], recommendedActions: [{category, recommendation}], riskLevel, nextSteps }.
- `metadata` — Object: { languageCode, age, gender, metricsProcessed, threadId, agentName, processingTimeSeconds, timestamp, genderValidation: {detectedF0, expectedRange, f0Validity, note} | null }.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean — true if the call used an anonymous_<sessionId> key.

#### POST https://platform.vocametrix.com/api/syntax-checker-agent

Linguistic syntax/grammar checker. Returns a structured analysis with overall score, per-issue breakdown by severity and type (grammar/spelling/punctuation/style/clarity), suggestions, corrected text, and readability stats.

**Parameters:**

- `text` — REQUIRED. Text to analyze (≤ 5000 characters).
- `locale` — REQUIRED. Language code (validated via validateLanguageCode).
- `threadId` — Optional. Thread ID for multi-turn continuity.
- `email` — Optional. Used by anonymous-key validation flow.

**Response:**

- `success` — Boolean.
- `analysis` — Object: { overall_score, language_detected, text_length, analysis_timestamp, issues: [...], suggestions: [{category, type, description, examples}], statistics: {total_issues, by_severity: {high, medium, low}, by_type: {grammar, spelling, punctuation, style, clarity}}, corrected_text, readability: {grade_level, reading_ease, avg_sentence_length} }.
- `metadata` — Object: { locale, textLength, threadId, agentName, processingTimeSeconds, timestamp }.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean.

#### POST https://platform.vocametrix.com/api/vocabulary-tutor-agent

Conversational language-tutor agent that adapts vocabulary to the learner's native language, target language, age group, and topic. WARNING: This endpoint does not currently enforce X-API-Key validation or credit decrement at the controller level (unlike other agents) — treat this as undocumented internal behavior subject to change.

**Parameters:**

- `message` — REQUIRED. The learner's message in the target language.
- `nativeLanguage` — REQUIRED. The learner's native language code.
- `targetLanguage` — REQUIRED. The language the learner is studying.
- `ageGroup` — REQUIRED. Categorical age group label.
- `topic` — REQUIRED. Conversation topic.
- `threadId` — Optional. Thread ID for multi-turn continuity.

**Response:**

- `success` — Boolean.
- `response` — String — the agent's free-form text reply.
- `threadId` — Thread ID for the conversation (use to continue).
- `metadata` — Object: { nativeLanguage, targetLanguage, ageGroup, topic, runId, runStatus }.

#### POST https://platform.vocametrix.com/api/adaptive-exercise-agent

Adapts a speech-therapy exercise to a learner profile (ADHD, dyslexia, dysgraphia, dyspraxia, Tourette syndrome, autism). Returns the exercise as adapted HTML ready to render.

**Parameters:**

- `exerciseText` — REQUIRED. The original exercise text.
- `profile` — REQUIRED. One of "adhd" | "dyslexia" | "dysgraphia" | "dyspraxia" | "tourette" | "autism" (case-insensitive). Other values return 400 with a `validProfiles` field.
- `includeTips` — Optional, default false. If true, the agent includes practitioner tips alongside the adapted exercise.
- `email` — Optional. Used by anonymous-key validation flow.

**Response:**

- `success` — Boolean.
- `adaptedHTML` — String — HTML version of the exercise adapted for the chosen profile.
- `metadata` — Object: { profile, includeTips, threadId, agentName, processingTimeSeconds, timestamp }.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean.

#### POST https://platform.vocametrix.com/api/french-to-ipa-agent

Convert French words to IPA (International Phonetic Alphabet) transcription. Accepts either a single word string or a JSON-stringified array of up to 20 words. Returns the same shape (single object vs. array) that was supplied.

**Parameters:**

- `phoneticInput` — REQUIRED. Either a single French word as a string, OR a JSON-stringified array of up to 20 strings. The server detects which by attempting JSON.parse.
- `threadId` — Optional. Thread ID for multi-turn continuity.
- `email` — Optional. Used by anonymous-key validation flow.

**Response:**

- `success` — Boolean.
- `result` — Object OR array of objects, mirroring the input shape. Each entry has the form { word, ipa: [...], ... }. The server checks `result.ipa.length` to count transcriptions; the rest of the shape is determined by the agent.
- `threadId` — Thread ID.
- `agentName` — Agent identifier.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean.
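Because the server detects batch mode by attempting `JSON.parse` on `phoneticInput`, a client should serialize a word list explicitly rather than sending Python/JS repr output. A sketch of the two input shapes:

```python
import json

def make_phonetic_input(words) -> str:
    """Serialize phoneticInput: a bare string for one word, or a
    JSON-stringified array (max 20 entries) for a batch."""
    if isinstance(words, str):
        return words
    words = list(words)
    if len(words) > 20:
        raise ValueError("french-to-ipa-agent accepts at most 20 words per call")
    return json.dumps(words)

single = make_phonetic_input("bonjour")         # sent as-is
batch = make_phonetic_input(["chat", "chien"])  # JSON-stringified array
```

The response mirrors whichever shape was sent: a single object for `single`, an array of objects for `batch`.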

#### POST https://platform.vocametrix.com/api/therapy-planning-agent

Single-shot Azure AI Foundry agent that generates a therapy recommendation from session metadata + wav2vec output. NOTE: This is DIFFERENT from /api/generate-therapy-plan (Python LangGraph workflow) — this one is a single agent call returning a recommendation directly, without the multi-step approve/modify lifecycle. Polls up to 120 seconds.

**Parameters:**

- `session_metadata` — REQUIRED. Object — must include `patient_id` (string). May also include `session_id`, `timestamp`, `audio_file`.
- `wav2vec_output` — REQUIRED. Object — must include `summary_statistics` (typically `{disfluency_types_detected, overall_fluency_rate, total_segments, ...}`).
- `patient_anamnesis` — Optional. Patient context — `demographics: {age}`, `clinical_history: {diagnosis, severity}`, `therapy_information: {current_treatment_approach, total_sessions_completed}`, etc.
- `email` — Optional. Used by anonymous-key validation flow.

**Response:**

- `success` — Boolean.
- `threadId` — Azure AI Foundry thread ID (always a new thread for this endpoint).
- `runId` — Run ID.
- `agentName` — Agent identifier.
- `status` — Run status string.
- `timestamp` — ISO 8601 timestamp.
- `recommendation` — Object — preferred shape contains `primary_recommendation` and related structured fields when the agent's JSON parses successfully. Fallback shape on parse failure: { raw_response, structured_content: bool, analysis_summary }.
- `allMessages` — Array — full thread message history.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean.
- `metadata` — Object: { patient_id, session_id, processing_time_seconds, disfluency_types, fluency_rate, total_segments_analyzed, response_metadata: {responseLength, containsStructuredData, parseSuccess} }.

#### POST https://platform.vocametrix.com/api/language-chat-pronunciation

Conversational language-coach agent specialized in pronunciation feedback. Similar to /api/language-chat-vocabulary but with a pronunciation-coaching focus and optional `assessmentHistory` for grounding the conversation in prior pronunciation-assessment results.

**Parameters:**

- `message` — REQUIRED. The learner's message in the target language.
- `language` — REQUIRED. Target language being learned.
- `ageLevel` — REQUIRED. Categorical age/level label.
- `topic` — REQUIRED. Conversation topic.
- `nativeLanguage` — Optional. The learner's native language code.
- `threadId` — Optional. Thread ID for multi-turn continuity.
- `email` — Optional. Used by anonymous-key validation flow.
- `assessmentHistory` — Optional. Array or object of prior pronunciation-assessment results to feed into the coach for grounded feedback.

**Response:**

- `response` — String — the coach's text reply.
- `threadId` — Thread ID.
- `runStatus` — Run status string.
- `remaining_credits` — Number.
- `isAnonymousSession` — Boolean.
- `configuration` — Object — passthrough of resolved language, nativeLanguage, ageLevel, topic.

### Capabilities

**Available Agents:**
- Speech Exercise Generator Agent - Creates personalized speech therapy exercises
- Word List Generator Agent - Generates therapy-focused word lists with hints
- Spelling Interpretation Agent - Analyzes spoken spelling attempts and provides detailed feedback
- Speech Therapist Assistant - Expert AI assistant for therapists, patients, and caregivers
- Language Learning Vocabulary Coach - Interactive conversation practice with vocabulary enhancement
- Voice Metrics Interpreter - Translates raw voice metrics into clinical-language interpretations
- Syntax Checker Agent - Structured grammar, spelling, and style analysis with corrected text
- Vocabulary Tutor Agent - Conversational tutoring adapted to native/target language, age, and topic
- Adaptive Exercise Agent - Adapts exercises to learner profiles (ADHD, dyslexia, dysgraphia, dyspraxia, Tourette, autism)
- French-to-IPA Agent - IPA transcription for single French words or batches of up to 20
- Therapy Planning Agent - Single-shot therapy recommendation from session metadata and wav2vec output
- Language Pronunciation Coach - Conversation practice grounded in prior pronunciation assessments
- More specialized agents coming soon

**Integration Features:**
- Simple REST API integration with any platform or language
- Consistent response formats with structured JSON
- Built-in language support for English, French, Spanish, and German
- Age-appropriate content generation based on patient needs
- Easy to understand error messages and validation
