Metadata-Version: 2.4
Name: azure-ai-voicelive
Version: 1.2.0b5
Summary: Microsoft Corporation Azure Ai Voicelive Client Library for Python
Author-email: Microsoft Corporation <azpysdkhelp@microsoft.com>
License-Expression: MIT
Project-URL: repository, https://github.com/Azure/azure-sdk-for-python
Keywords: azure,azure sdk
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: isodate>=0.6.1
Requires-Dist: azure-core>=1.37.0
Requires-Dist: typing-extensions>=4.6.0
Provides-Extra: aiohttp
Requires-Dist: aiohttp<4.0.0,>=3.9.0; extra == "aiohttp"
Provides-Extra: test
Requires-Dist: aiohttp<4.0.0,>=3.9.0; extra == "test"
Requires-Dist: azure-identity; extra == "test"
Requires-Dist: pyaudio; (platform_python_implementation == "CPython" and python_version < "3.13" and sys_platform == "win32") and extra == "test"
Requires-Dist: pytest-asyncio>=0.23; extra == "test"
Requires-Dist: pytest-rerunfailures>=13.0; extra == "test"
Requires-Dist: pytest>=8.0; extra == "test"
Requires-Dist: python-dotenv; extra == "test"
Requires-Dist: soundfile; extra == "test"
Dynamic: license-file

Azure AI VoiceLive client library for Python
============================================

This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.

> **Status:** Beta (preview). This build, `1.2.0b5`, is a pre-release; the most recent stable (GA) release listed in the release history is `1.1.0`.

> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.

---

Getting started
---------------

### Prerequisites

- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples

### Install

Install the stable GA version:

```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
# First install PyAudio dependencies for your platform:
#   Linux: sudo apt-get install -y portaudio19-dev libasound2-dev
#   macOS: brew install portaudio
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
```

All WebSocket connections in the SDK are asynchronous and use `aiohttp` as the transport, so install the `aiohttp` extra before connecting.

### Authenticate

You can authenticate with an **API key** or a **Microsoft Entra ID** token (formerly Azure Active Directory, AAD).

#### API Key Authentication (Quick Start)

Set environment variables in a `.env` file or directly in your environment:

```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```

Then, use the key in your code:

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```
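
Rather than hard-coding the placeholder strings, the values exported above can be read at startup. The following is a minimal sketch that assumes only the two variable names shown earlier; the `VoiceLiveSettings` dataclass and the `load_settings` helper are hypothetical names, not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceLiveSettings:
    """Hypothetical container for connection settings (not an SDK type)."""

    endpoint: str
    api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> VoiceLiveSettings:
    """Read the endpoint and API key, failing early if either one is missing or empty."""
    env = os.environ if env is None else env
    names = ("AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return VoiceLiveSettings(endpoint=env[names[0]], api_key=env[names[1]])
```

The resulting `settings.endpoint` and `AzureKeyCredential(settings.api_key)` can then be passed to `connect(...)`. If the values live in a `.env` file, load it first, for example with `load_dotenv()` from `python-dotenv`.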

#### AAD Token Authentication

For production applications, AAD authentication is recommended (requires the `azure-identity` package):

```python
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    # The async credential holds network resources; close it when done.
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint="your-endpoint",
            credential=credential,
            model="gpt-4o-realtime-preview"
        ) as connection:
            # Your async code here
            pass

asyncio.run(main())
```

---

Key concepts
------------

- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service with async methods
  - **OutputAudioBufferResource** – Control audio output from the service with async methods
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses with async methods
  - **ConversationResource** – Manage conversation items with async methods
- **Error Handling**: 
  - **ConnectionError** – Base exception for WebSocket connection errors
  - **ConnectionClosed** – Raised when WebSocket connection is closed
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more

---

Examples
--------

### Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

- Real-time speech streaming
- Server-side voice activity detection  
- Interruption handling
- High-quality audio processing

```bash
# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```

### Minimal example

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5, 
                prefix_padding_ms=300, 
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```
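
The `ServerVad` values above are expressed in milliseconds, while PCM16 audio is handled as raw bytes. The arithmetic below shows how a duration maps to a byte count. It is only a sketch: the 24 kHz mono sample rate is an assumption (this README does not state the rate), and `pcm16_bytes_for_ms` is a hypothetical helper, not an SDK function.

```python
BYTES_PER_SAMPLE = 2  # PCM16 stores each sample as a 16-bit signed integer


def pcm16_bytes_for_ms(
    duration_ms: int,
    sample_rate_hz: int = 24_000,  # assumed rate; adjust to your audio format
    channels: int = 1,             # assumed mono
) -> int:
    """Return how many bytes of PCM16 audio cover `duration_ms` milliseconds."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * channels * BYTES_PER_SAMPLE


prefix_bytes = pcm16_bytes_for_ms(300)   # matches prefix_padding_ms=300
silence_bytes = pcm16_bytes_for_ms(500)  # matches silence_duration_ms=500
```

Under these assumptions, 300 ms of prefix padding corresponds to 14,400 bytes and 500 ms of silence to 24,000 bytes, which can help when choosing a microphone chunk size.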

Available Voice Options
-----------------------

### Azure Neural Voices

```python
# Use Azure Neural voices
from azure.ai.voicelive.models import AzureStandardVoice

voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)
```

Popular voices include:

- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional

### OpenAI Voices

```python
# Use OpenAI voices (as string)
voice_config = "alloy"  # Or another OpenAI voice
```

Available OpenAI voices:

- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly

---

Handling Events
---------------

```python
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture
        
    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response
        
    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta
        
    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```
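
As the number of handled event types grows, the `if`/`elif` chain above can be replaced by a dispatch table. The sketch below is self-contained and uses stand-ins: `FakeEvent` and the string event names are placeholders for the SDK's typed events and `ServerEventType` members, and any async iterable can take the place of `connection`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Dict, List


@dataclass
class FakeEvent:
    """Placeholder for a typed server event (not an SDK class)."""

    type: str
    delta: bytes = b""


Handler = Callable[[FakeEvent], Awaitable[None]]


async def dispatch(
    events: AsyncIterator[FakeEvent],
    handlers: Dict[str, Handler],
    stop_on: str,
) -> None:
    """Route each event to its handler; unknown types are ignored."""
    async for event in events:
        handler = handlers.get(event.type)
        if handler is not None:
            await handler(event)
        if event.type == stop_on:
            break


async def demo() -> List[bytes]:
    played: List[bytes] = []

    async def on_audio(event: FakeEvent) -> None:
        played.append(event.delta)  # a real handler would queue this for playback

    async def fake_stream() -> AsyncIterator[FakeEvent]:
        yield FakeEvent("response.audio.delta", b"\x00\x01")
        yield FakeEvent("unhandled.event")
        yield FakeEvent("response.audio.delta", b"\x02\x03")
        yield FakeEvent("response.done")
        yield FakeEvent("response.audio.delta", b"not reached")

    await dispatch(fake_stream(), {"response.audio.delta": on_audio}, stop_on="response.done")
    return played


chunks = asyncio.run(demo())
```

Keeping handlers in a dictionary makes it easy to register or remove behavior per event type without touching the loop itself.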

---

Troubleshooting
---------------

### Connection Issues

- **WebSocket connection errors (1006/timeout):**  
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

- **Missing WebSocket dependencies:**  
  If you see import errors, make sure the package is installed with the `aiohttp` extra: `python -m pip install "azure-ai-voicelive[aiohttp]"`

- **Auth failures:**  
  For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.
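
Transient failures such as timeouts are often handled by retrying with a delay that grows between attempts. The sketch below shows one possible wrapper; it does not describe SDK behavior. `open_session` is a hypothetical coroutine function that you supply (for example, one that enters `connect(...)` and runs your session), and the default exception types are assumptions that should be narrowed to what your code actually raises.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    open_session: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError, asyncio.TimeoutError),
) -> T:
    """Call `open_session` until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return await open_session()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

Authentication failures are usually not transient, so they are best left out of `retry_on` and reported immediately.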

### Audio Device Issues

- **No microphone/speaker detected:**  
  Check device connections and permissions. In headless CI environments, audio samples can't run.

- **Audio library installation problems:**  
  On Linux/macOS you may need PortAudio:

  ```bash
  # Debian/Ubuntu
  sudo apt-get install -y portaudio19-dev libasound2-dev
  # macOS (Homebrew)
  brew install portaudio
  ```

### Enable Verbose Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
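
`basicConfig(level=logging.DEBUG)` enables debug output for every library in the process. Azure SDK for Python libraries conventionally log under the `azure` logger namespace, so a narrower setup is possible. A minimal sketch using only the standard library (the handler and format string are illustrative choices):

```python
import logging
import sys

# Send Azure SDK log records to stderr with a timestamped format.
handler = logging.StreamHandler(stream=sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))

# Raise verbosity only for loggers under the "azure" namespace.
azure_logger = logging.getLogger("azure")
azure_logger.setLevel(logging.DEBUG)
azure_logger.addHandler(handler)
```

Loggers for other packages keep their existing levels, which keeps the output focused while you diagnose a connection problem.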

---

Next steps
----------

1. **Run the featured sample:**
   - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation

2. **Customize your implementation:**
   - Experiment with different voices and parameters
   - Add custom instructions for specialized assistants
   - Integrate with your own audio capture/playback systems

3. **Advanced scenarios:**
   - Add function calling support
   - Implement tool usage
   - Create multi-turn conversations with history

4. **Explore other samples:**
   - Check the `samples/` directory for specialized examples
   - See `samples/README.md` for a full list of samples

---

Contributing
------------

This project follows the Azure SDK guidelines. If you'd like to contribute:

1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change

---

Release notes
-------------

Changelogs are available in the package directory.

---

License
-------

This project is released under the **MIT License**.

# Release History

## 1.2.0b5 (Unreleased)

### Features Added

### Breaking Changes

### Bugs Fixed

### Other Changes

- Updated default API version to `2026-01-01-preview`.

## 1.2.0b4 (2026-02-12)

### Features Added

- **Agent Session Configuration**: Added `AgentSessionConfig` TypedDict for configuring Azure AI Foundry agents at connection time:
  - `agent_name`: The name of the agent (required)
  - `project_name`: The Foundry project containing the agent (required)
  - `agent_version`: Optional version specification
  - `conversation_id`: Optional existing conversation ID to continue
  - `authentication_identity_client_id`: Optional client ID for authentication
  - `foundry_resource_override`: Optional Foundry resource override
- **Server Warning Events**: Added `ServerEventWarning` and `ServerEventWarningDetails` for handling non-fatal warnings from the service
- **New Event Type**: Added `ServerEventType.WARNING` for warning event handling

### Breaking Changes

- **Removed Foundry Agent Tools**: The following classes and enums related to Foundry agent tools have been removed:
  - `FoundryAgentTool` - Use `AgentSessionConfig` with `connect()` instead
  - `ResponseFoundryAgentCallItem`
  - `FoundryAgentContextType` enum
  - `ToolType.FOUNDRY_AGENT` enum value
  - `ItemType.FOUNDRY_AGENT_CALL` enum value
  - `ServerEventResponseFoundryAgentCallArgumentsDelta`
  - `ServerEventResponseFoundryAgentCallArgumentsDone`
  - `ServerEventResponseFoundryAgentCallInProgress`
  - `ServerEventResponseFoundryAgentCallCompleted`
  - `ServerEventResponseFoundryAgentCallFailed`
  - Related `ServerEventType` enum values for Foundry agent events

## 1.2.0b3 (2026-02-02)

### Features Added

- **Support for Explicit Null Values**: Enhanced `RequestSession` to properly serialize explicitly set `None` values (e.g., `turn_detection=None` now correctly sends `"turn_detection": null` in the WebSocket message)
- **Interim Response Configuration**: Added support for interim response generation during latency or tool calls:
  - `StaticInterimResponseConfig` for static interim response texts that are randomly selected
  - `LlmInterimResponseConfig` for LLM-generated context-aware interim responses
  - `InterimResponseTrigger` enum with `latency` and `tool` triggers
  - `interim_response` field in `RequestSession` and `ResponseSession`
- **Foundry Agent Integration**: Added support for Azure AI Foundry agents:
  - `FoundryAgentTool` for defining Foundry agent configurations
  - `ResponseFoundryAgentCallItem` for Foundry agent call responses
  - `FoundryAgentContextType` enum for context management (`no_context`, `agent_context`)
  - Server events for Foundry agent call lifecycle: `ServerEventResponseFoundryAgentCallArgumentsDelta`, `ServerEventResponseFoundryAgentCallArgumentsDone`, `ServerEventResponseFoundryAgentCallInProgress`, `ServerEventResponseFoundryAgentCallCompleted`, `ServerEventResponseFoundryAgentCallFailed`
- **Reasoning Effort Control**: Added `reasoning_effort` field to `RequestSession`, `ResponseSession`, and `ResponseCreateParams` for controlling the effort level of reasoning models with the `ReasoningEffort` enum (`none`, `minimal`, `low`, `medium`, `high`, `xhigh`)
- **Response Metadata**: Added `metadata` field to `Response` and `ResponseCreateParams` for attaching up to 16 key-value pairs (max 64 chars for keys, 512 chars for values)
- **Array Encoding Support**: Enhanced serialization to support pipe, space, comma, and newline-delimited array encoding formats
- **Custom Text Normalization**: Added `custom_text_normalization_url` field to `AzureStandardVoice`, `AzureCustomVoice`, and `AzurePersonalVoice` for custom text normalization configurations
- **Avatar Scene Configuration**: Added `Scene` model for controlling avatar's zoom level, position (x/y), rotation (x/y/z pitch/yaw/roll), and movement amplitude in the video frame
- **Enhanced Avatar Configuration**: Added `scene` and `output_audit_audio` fields to `AvatarConfig` for scene control and audit audio forwarding via WebSocket
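
The metadata limits listed above (at most 16 pairs, keys up to 64 characters, values up to 512 characters) can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not part of the SDK, and it assumes that both keys and values are strings.

```python
from typing import Mapping

MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def validate_response_metadata(metadata: Mapping[str, str]) -> None:
    """Raise if `metadata` exceeds the limits described in the release notes."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"metadata has {len(metadata)} pairs; the limit is {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("metadata keys and values must be strings")
        if len(key) > MAX_KEY_CHARS:
            raise ValueError(f"metadata key {key[:16]!r}... exceeds {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"metadata value for {key!r} exceeds {MAX_VALUE_CHARS} characters")


validate_response_metadata({"ticket": "12345", "channel": "phone"})  # passes silently
```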

### Other Changes

- **Dependency Update**: Updated minimum `azure-core` version from 1.36.0 to 1.37.0
- **Security Enhancement**: Removed `eval()` usage in serialization utilities, replaced with explicit type checking for improved security
- **Serialization Improvements**: Enhanced model_base deserialization for mutable types and array-encoded strings

### Bugs Fixed

- **Audio Format Values**: Fixed `OutputAudioFormat` enum values to use underscore format (`pcm16_8000hz`, `pcm16_16000hz`) instead of hyphenated format for consistency with wire protocol and backward compatibility

## 1.2.0b2 (2025-11-20)

### Features Added

- **Enhanced Avatar Configuration**: Expanded avatar functionality with new configuration options:
  - Added `AvatarConfigTypes` enum with support for `video-avatar` and `photo-avatar` types
  - Added `PhotoAvatarBaseModes` enum for photo avatar base models (e.g., `vasa-1`)
  - Added `AvatarOutputProtocol` enum for avatar streaming protocols (`webrtc`, `websocket`)
  - Enhanced `AvatarConfig` model with new properties: `type`, `model`, and `output_protocol`
- **Image Content Support**: Added support for image inputs in conversations:
  - New `RequestImageContentPart` model for including images in requests
  - New `RequestImageContentPartDetail` enum for controlling image detail levels (`auto`, `low`, `high`)
  - Added `INPUT_IMAGE` to `ContentPartType` enum
  - Enhanced token details models (`InputTokenDetails`, `CachedTokenDetails`) with `image_tokens` tracking
- **Enhanced OpenAI Voices**: Added new OpenAI voice options:
  - Added `marin` and `cedar` voices to `OpenAIVoiceName` enum
- **Extended Azure Personal Voice Configuration**: Enhanced `AzurePersonalVoice` with additional customization options:
  - Added support for custom lexicon via `custom_lexicon_url`
  - Added `prefer_locales` for locale preferences
  - Added `locale`, `style`, `pitch`, `rate`, and `volume` properties for fine-tuned voice control
- **Enhanced MCP Server Events**: Added completion status events for MCP tool calls:
  - `ServerEventResponseMcpCallInProgress` for tracking in-progress MCP calls
  - `ServerEventResponseMcpCallCompleted` for successful MCP call completion
  - `ServerEventResponseMcpCallFailed` for failed MCP calls
- **Pre-generated Assistant Messages**: Added support for pre-generated assistant messages in `ResponseCreateParams` via the `pre_generated_assistant_message` property

## 1.2.0b1 (2025-11-14)

### Features Added

- **MCP (Model Context Protocol) Support**: Added comprehensive support for Model Context Protocol integration:
  - `MCPServer` tool type for defining MCP server configurations with authorization, headers, and approval requirements
  - `MCPTool` model for representing MCP tool definitions with input schemas and annotations
  - `MCPApprovalType` enum for controlling approval workflows (`never`, `always`, or tool-specific)
  - New item types: `MCPApprovalResponseRequestItem`, `ResponseMCPApprovalRequestItem`, `ResponseMCPApprovalResponseItem`, `ResponseMCPCallItem`, and `ResponseMCPListToolItem`
  - New server events: `ServerEventMcpListToolsInProgress`, `ServerEventMcpListToolsCompleted`, `ServerEventMcpListToolsFailed`, `ServerEventResponseMcpCallArgumentsDelta`, and `ServerEventResponseMcpCallArgumentsDone`
  - Client event `MCP_APPROVAL_RESPONSE` for responding to approval requests
  - Enhanced `ItemType` enum with MCP-related types: `mcp_list_tools`, `mcp_call`, `mcp_approval_request`, and `mcp_approval_response`

## 1.1.0 (2025-11-03)

### Features Added

- Added support for Agent configuration through the new `AgentConfig` model
- Added `agent` field to `ResponseSession` model to support agent-based conversations
- The `AgentConfig` model includes properties for agent type, name, description, agent_id, and thread_id

## 1.1.0b1 (2025-10-06)

### Features Added

- **AgentConfig Support**: Re-introduced `AgentConfig` functionality with enhanced capabilities:
  - `AgentConfig` model added back to public API with full import and export support
  - `agent` field re-added to `ResponseSession` model for session-level agent configuration
  - Updated cross-language package mappings to include `AgentConfig` support
  - Provides foundation for advanced agent configuration scenarios

## 1.0.0 (2025-10-01)

### Features Added

- **Enhanced WebSocket Connection Options**: Significantly improved WebSocket connection configuration with transport-agnostic design:
  - Added new timeout configuration options: `receive_timeout`, `close_timeout`, and `handshake_timeout` for fine-grained control
  - Enhanced `compression` parameter to support both boolean and integer types for advanced zlib window configuration
  - Added `vendor_options` parameter for implementation-specific options passthrough (escape hatch for advanced users)
  - Improved documentation with clearer descriptions for all connection parameters
  - Better support for common aliases from other WebSocket ecosystems (`max_size`, `ping_interval`, etc.)
  - More robust option mapping with proper type conversion and safety checks
- **Enhanced Type Safety**: Improved type safety for content parts with proper enum usage:
  - `InputAudioContentPart`, `InputTextContentPart`, and `OutputTextContentPart` now use `ContentPartType` enum values instead of string literals
  - Better IntelliSense support and compile-time type checking for content part discriminators

### Breaking Changes

- **Improved Naming Conventions**: Updated model and enum names for better clarity and consistency:
  - `OAIVoice` enum renamed to `OpenAIVoiceName` for more descriptive naming
  - `ToolChoiceObject` model renamed to `ToolChoiceSelection` for better semantic meaning
  - `ToolChoiceFunctionObject` model renamed to `ToolChoiceFunctionSelection` for consistency
  - Updated type unions and imports to reflect the new naming conventions
  - Cross-language package mappings updated to maintain compatibility across SDKs
- **Session Model Architecture**: Separated `ResponseSession` and `RequestSession` models for better design clarity:
  - `ResponseSession` no longer inherits from `RequestSession` and now inherits directly from `_Model`
  - All session configuration fields are now explicitly defined in `ResponseSession` instead of being inherited
  - This provides clearer separation of concerns between request and response session configurations
  - May affect type checking and code that relied on the previous inheritance relationship
- **Model Cleanup**: Removed unused `AgentConfig` model and related fields from the public API:
  - `AgentConfig` class has been completely removed from imports and exports
  - `agent` field removed from `ResponseSession` model (including constructor parameter)
  - Updated cross-language package mappings to reflect the removal
- **Model Naming Convention Update**: Renamed `EOUDetection` to `EouDetection` for better naming consistency:
  - Class name changed from `EOUDetection` to `EouDetection` 
  - All inheritance relationships updated: `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual` now inherit from `EouDetection`
  - Type annotations updated in `AzureSemanticVad`, `AzureSemanticVadEn`, `AzureSemanticVadMultilingual`, and `ServerVad` classes
  - Import statements and exports updated to reflect the new naming
- **Enhanced Content Part Type Safety**: Content part discriminators now use enum values instead of string literals:
  - `InputAudioContentPart.type` now uses `ContentPartType.INPUT_AUDIO` instead of `"input_audio"`
  - `InputTextContentPart.type` now uses `ContentPartType.INPUT_TEXT` instead of `"input_text"`  
  - `OutputTextContentPart.type` now uses `ContentPartType.TEXT` instead of `"text"`

### Other Changes

- Initial GA release

## 1.0.0b5 (2025-09-26)

### Features Added

- **Enhanced Semantic Detection Type Safety**: Added new `EouThresholdLevel` enum for better type safety in end-of-utterance detection:
  - `LOW` for low sensitivity threshold level
  - `MEDIUM` for medium sensitivity threshold level  
  - `HIGH` for high sensitivity threshold level
  - `DEFAULT` for default sensitivity threshold level
- **Improved Semantic Detection Configuration**: Enhanced semantic detection classes with better type annotations:
  - `threshold_level` parameter now supports both string values and `EouThresholdLevel` enum
  - Cleaner type definitions for `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual`
  - Improved documentation for threshold level parameters
- **Comprehensive Unit Test Suite**: Added extensive unit test coverage with 200+ test cases covering:
  - All enum types and their functionality
  - Model creation, validation, and serialization
  - Async connection functionality with proper mocking
  - Client event handling and workflows
  - Voice configuration across all supported types
  - Message handling with content part hierarchy
  - Integration scenarios and real-world usage patterns
  - Recent changes validation and backwards compatibility
- **API Version Update**: Updated to API version `2025-10-01` (from `2025-05-01-preview`)
- **Enhanced Type Safety**: Added new `AzureVoiceType` enum with values for better Azure voice type categorization:
  - `AZURE_CUSTOM` for custom voice configurations
  - `AZURE_STANDARD` for standard voice configurations  
  - `AZURE_PERSONAL` for personal voice configurations
- **Improved Message Handling**: Added `MessageRole` enum for better role type safety in message items
- **Enhanced Model Documentation**: Comprehensive documentation improvements across all models:
  - Added detailed docstrings for model classes and their parameters
  - Enhanced enum value documentation with descriptions
  - Improved type annotations and parameter descriptions
- **Enhanced Semantic Detection**: Added improved configuration options for all semantic detection classes:
  - Added `threshold_level` parameter with options: `"low"`, `"medium"`, `"high"`, `"default"` (recommended over deprecated `threshold`)
  - Added `timeout_ms` parameter for timeout configuration in milliseconds (recommended over deprecated `timeout`)
- **Video Background Support**: Added new `Background` model for video background customization:
  - Support for solid color backgrounds in hex format (e.g., `#00FF00FF`)
  - Support for image URL backgrounds
  - Mutually exclusive color and image URL options
- **Enhanced Video Parameters**: Extended `VideoParams` model with:
  - `background` parameter for configuring video backgrounds using the new `Background` model
  - `gop_size` parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
- **Improved Type Safety**: Added `TurnDetectionType` enum for better type safety and IntelliSense support
- **Package Structure Modernization**: Simplified package initialization with namespace package support
- **Enhanced Error Handling**: Added `ConnectionError` and `ConnectionClosed` exception classes to the async API for better WebSocket error management

### Breaking Changes

- **Cross-Language Package Identity Update**: Updated package ID from `VoiceLive` to `VoiceLive.WebSocket` for better cross-language consistency
- **Model Refactoring**: 
  - Renamed `UserContentPart` to `MessageContentPart` for clearer content part hierarchy
  - All message items now require a `content` field with list of `MessageContentPart` objects
  - `OutputTextContentPart` now inherits from `MessageContentPart` instead of being standalone
- **Enhanced Type Safety**: 
  - Azure voice classes now use `AzureVoiceType` enum discriminators instead of string literals
  - Message role discriminators now use `MessageRole` enum values for better type safety
- **Removed Deprecated Parameters**: Completely removed deprecated parameters from semantic detection classes:
  - Removed `threshold` parameter from all semantic detection classes (`AzureSemanticDetection`, `AzureSemanticDetectionEn`, `AzureSemanticDetectionMultilingual`)
  - Removed `timeout` parameter from all semantic detection classes
  - Users must now use `threshold_level` and `timeout_ms` parameters respectively
- **Removed Synchronous API**: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
  - Removed sync `connect()` function and sync `VoiceLiveConnection` class from main patch implementation
  - Removed sync `basic_voice_assistant.py` sample (only async version remains)
  - Simplified sync patch to minimal structure with empty exports
  - All functionality now available only through async patterns
- **Updated Dependencies**: Modified package dependencies to reflect async-only architecture:
  - Moved `aiohttp>=3.9.0,<4.0.0` from optional to required dependency
  - Removed `websockets` optional dependency as sync API no longer exists
  - Removed optional dependency groups `websockets`, `aiohttp`, and `all-websockets`
- **Model Rename**:
  - Renamed `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions` for consistency with naming conventions
  - Renamed `AzureMultilingualSemanticVad` to `AzureSemanticVadMultilingual` for naming consistency with other multilingual variants
- **Enhanced Type Safety**: Turn detection discriminator types now use enum values instead of string literals for better type safety

### Bugs Fixed

- **Serialization Improvements**: Fixed type casting issue in serialization utilities for better enum handling and type safety

### Other Changes

- **Testing Infrastructure**: Added comprehensive unit test suite with extensive coverage:
  - 8 main test files with 200+ individual test methods
  - Tests for all enums, models, async operations, client events, voice configurations, and message handling
  - Integration tests covering real-world scenarios and recent changes
  - Proper mocking for async WebSocket connections
  - Backwards compatibility validation
  - Test coverage for all recent changes and enhancements
- **API Documentation**: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity
- **Documentation Updates**: Comprehensive updates to all markdown documentation:
  - Updated README.md to reflect async-only nature with updated examples and installation instructions
  - Updated samples README.md to remove sync sample references
  - Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide
  - Added MIGRATION_GUIDE.md for users upgrading from previous versions

## 1.0.0b4 (2025-09-19)

### Features Added

- **Personal Voice Models**: Added `PersonalVoiceModels` enum with support for `DragonLatestNeural`, `PhoenixLatestNeural`, and `PhoenixV2Neural` models
- **Enhanced Animation Support**: Added comprehensive server event classes for animation blendshapes and viseme handling:
  - `ServerEventResponseAnimationBlendshapeDelta` and `ServerEventResponseAnimationBlendshapeDone`
  - `ServerEventResponseAnimationVisemeDelta` and `ServerEventResponseAnimationVisemeDone`
- **Audio Timestamp Events**: Added `ServerEventResponseAudioTimestampDelta` and `ServerEventResponseAudioTimestampDone` for better audio timing control
- **Improved Error Handling**: Added `ErrorResponse` class for better error management
- **Enhanced Base Classes**: Added `ConversationItemBase` and `SessionBase` for better code organization and inheritance
- **Token Usage Improvements**: Renamed `Usage` to `TokenUsage` for better clarity
- **Audio Format Improvements**: Reorganized audio format enums with separate `InputAudioFormat` and `OutputAudioFormat` enums for better clarity
- **Enhanced Output Audio Format Support**: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16

### Breaking Changes

- **Model Cleanup**: Removed experimental classes `AzurePlatformVoice`, `LLMVoice`, `AzureSemanticVadServer`, `InputAudio`, `NoTurnDetection`, and `ToolChoiceFunctionObjectFunction`
- **Class Rename**: Renamed `Usage` class to `TokenUsage` for better clarity
- **Enum Reorganization**:
  - Replaced `AudioFormat` enum with separate `InputAudioFormat` and `OutputAudioFormat` enums
  - Removed `Phi4mmVoice` enum
  - Removed `EMOTION` value from `AnimationOutputType` enum
  - Removed `IN_PROGRESS` value from `ItemParamStatus` enum
- **Server Events**: Removed `RESPONSE_EMOTION_HYPOTHESIS` from `ServerEventType` enum

### Other Changes

- **Package Structure**: Simplified package initialization with namespace package support
- **Sample Updates**: Improved basic voice assistant samples
- **Code Optimization**: Streamlined model definitions with significant code reduction
- **API Configuration**: Updated API view properties for better tooling support

## 1.0.0b3 (2025-09-17)

### Features Added

- **Transcription improvement**: Added phrase list
- **New Voice Types**: Added `AzurePlatformVoice` and `LLMVoice` classes
- **Enhanced Speech Detection**: Added `AzureSemanticVadServer` class
- **Improved Function Calling**: Enhanced async function calling sample with better error handling
- **English-Specific Detection**: Added `AzureSemanticDetectionEn` class for optimized English-only semantic end-of-utterance detection
- **English-Specific Voice Activity Detection**: Added `AzureSemanticVadEn` class for enhanced English-only voice activity detection

### Breaking Changes

- **Transcription**: Removed `custom_model` and `enabled` from `AudioInputTranscriptionSettings`.
- **Async Authentication**: Fixed credential handling for async scenarios
- **Model Serialization**: Improved error handling and deserialization

### Other Changes

- **Code Modernization**: Updated type annotations throughout

## 1.0.0b2 (2025-09-10)

### Features Added

- Added support for async function calls.

### Bugs Fixed

- Fixed function calling: ensure `FunctionCallOutputItem.output` is properly serialized as a JSON string before sending to the service.
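
The idea behind this fix is that a tool result travels to the service as a JSON string, not as a Python object. If you prepare the output yourself, the conversion can be sketched with the standard library as follows; `to_function_output` is a hypothetical helper name, not an SDK function.

```python
import json
from typing import Any


def to_function_output(result: Any) -> str:
    """Return a string suitable for a function-call output field.

    Strings are assumed to be serialized already and are returned unchanged;
    anything else is encoded with json.dumps.
    """
    if isinstance(result, str):
        return result
    return json.dumps(result)


payload = to_function_output({"temperature_c": 21.5, "unit": "celsius"})
```

Here `payload` is a `str` that decodes back to the original dictionary with `json.loads`.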

## 1.0.0b1 (2025-08-28)

### Features Added

- Added WebSocket connection support through `connect()`.
- Added `VoiceLiveConnection` for managing WebSocket connections.
- Added models for the Voice Live preview.
- Added WebSocket-based examples in the samples directory.

### Other Changes

- Initial preview release.
