ASR model management

Intelligent Voice (IV) introduces new concepts when it comes to ASR models:

  • IV supports automatic language detection and the transcription of multi-language calls. When transcribing a call, language detection requires the definition of up to 3 languages (plus the language detection model). 

  • IV allows easy customization of ASR models (the lexical model), enabling customers to create new models. The process is called model adaptation; for more information, see https://support.intelligentvoice.com/hc/en-us/articles/360044447274-Model-adaptation. Both partners and customers can adapt models. Models adapted by customers should be private and must not be shared with any other customer. Models adapted by partners can potentially be shared with multiple customers.

  • IV license articles limit the number of languages used to transcribe calls for individual users. A basic license only allows the use of a single language per user (different users can have different languages configured); other licenses allow the use of multiple languages per user.

In order to support the new requirements, VFC introduces the concept of ASR model management.

ASR model administration

  • Create a new page under Data / Data Management / ASR Models. This menu item should only be available if one of the IV SKUs is present in the license.

  • The ASR models need to be in sync with the IV system:

    • Manual synchronization would allow quick implementation, but would be prone to errors and would generate unnecessary support issues

    • Automatic synchronization would be ideal, but requires more work. The Get ASR Model List API returns all the available models in the IV system: https://api-docs.intelligentvoice.com/?version=latest#6b681c45-dc74-4645-94f5-b929669f2c0e

    • Mapping:

      • "id": 1 → this field is an auto-increment ID in IV; we should use it as the primary means of identifying the model in IV

      • "createdBy": "IntelligentVoice" → this field can contain any text value and should be used only for information purposes

      • "languageCode": "en-GB" → this field identifies the language of the model; we need this information to enforce language restrictions per user (the basic license allows one language per user, but there can be multiple ASR models for the same language)

      • "sampleRate": "8kHz" → not sure what this is; I don't think it is relevant

      • "lexiconSize": 145000 → informational only, we should store and display it

      • "description": "general" → free text description of the model

      • "version": "V3_ASRv5" → TODO: ask IV what this is, and whether it is generated or should follow a pattern

    • The ASR Models page will list the configured models in the system:

      • The list will display the following fields:

        • Created By: IntelligentVoice or VerintPartner1 or CustomerTenant2

        • Language: en-GB

        • Lexicon Size: 145000

        • Description: Trader Voice ASR model

        • Version: V3_ASRv5

        • Type: Intelligent Voice (for future use, in case we have models for other integrations)

      • The list must be tenant aware, and only display the models available in the tenant

      • Permission required: ASR Model Read

    • The ASR model edit page allows:

      • Viewing ASR model data (see above). Editing is not allowed, because IV does not support updating any of the fields

      • Associating models with VFC tenants

        • As explained above, the models can be restricted to specific tenants. By default, models should only be available to the reference environment (0000).

        • Autocomplete field to select one or more VFC tenants, option to configure visibility to all tenants

        • Permission required: ASR Model Update

      • Deletion

        • Delete button

        • Permission required: ASR Model Delete

        • Confirmation required: "Are you sure you want to permanently delete the "description" ASR Model?"

        • Check required: if the model is currently configured for any of the policies, the deletion is refused with the message "The "description" ASR Model cannot be deleted, because it is currently used by the following transcription policies: ..."

        • When a model gets deleted, it has to be deleted in IV as well: https://api-docs.intelligentvoice.com/?version=latest#15f500ef-5b34-4ba5-bcf5-d69d73b6fe95

      • Adaptation

        • Adapt button

        • Permission required: ASR Model Create

        • A new page opens with a form to submit the data for model adaptation. TODO: design the adaptation page

        • When a model gets adapted, the following IV API must be used: https://api-docs.intelligentvoice.com/?version=latest#40d58fb1-e542-4024-b73f-eaaf726e5ddd

        • For now, administrators can adapt a model using the Intelligent Voice administration portal (/JumpToWeb/admin/config/jumptoweb/asr-models/adapt). Once the new model becomes available in the IV system, the models can be synchronized to VFC and the desired tenants can be selected. This approach favors system and tenant administrators doing the adaptation, rather than customers. The adaptation puts an extra load on the IV infrastructure, so tenant administrators will want to keep control and not allow customers to start model adaptation on their own.

      • Deactivation

      • Activation
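
The mapping, tenant visibility, and deletion check described in this section could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names (AsrModel, from_iv_payload, synchronize, deletion_blockers) are hypothetical, the IV list is assumed to arrive as already-parsed JSON objects with the fields shown in the mapping, and the handling of models that have disappeared from IV (reported, not removed) is a guess rather than a decided behavior.

```python
from dataclasses import dataclass, field

REFERENCE_TENANT = "0000"  # default visibility: reference environment only


@dataclass
class AsrModel:
    """Hypothetical VFC-side record of one IV ASR model."""
    iv_id: int            # "id": primary identifier of the model in IV
    created_by: str       # "createdBy": informational only
    language_code: str    # "languageCode": needed for per-user language limits
    lexicon_size: int     # "lexiconSize": informational only
    description: str      # "description": free text
    version: str          # "version": meaning still to be confirmed with IV
    model_type: str = "Intelligent Voice"
    tenants: set = field(default_factory=lambda: {REFERENCE_TENANT})
    all_tenants: bool = False  # the "visible to all tenants" option

    def visible_to(self, tenant_id: str) -> bool:
        return self.all_tenants or tenant_id in self.tenants


def from_iv_payload(item: dict) -> AsrModel:
    """Map one entry of the IV model list to a record (sampleRate is skipped)."""
    return AsrModel(
        iv_id=item["id"],
        created_by=item.get("createdBy", ""),
        language_code=item["languageCode"],
        lexicon_size=item.get("lexiconSize", 0),
        description=item.get("description", ""),
        version=item.get("version", ""),
    )


def synchronize(local: dict, iv_items: list) -> tuple:
    """Add models that are new in IV to the local store (keyed by IV id).

    Returns (added_ids, missing_ids). Existing records are left untouched so
    their tenant assignments survive. Models missing from IV are only
    reported here; what to do with them is an open design question.
    """
    added = []
    for item in iv_items:
        if item["id"] not in local:
            local[item["id"]] = from_iv_payload(item)
            added.append(item["id"])
    remote_ids = {item["id"] for item in iv_items}
    missing = sorted(set(local) - remote_ids)
    return added, missing


def deletion_blockers(iv_id: int, policies: dict) -> list:
    """Names of transcription policies that still reference the model.

    `policies` maps a policy name to the list of model ids it uses.
    An empty result means the model may be deleted.
    """
    return sorted(name for name, ids in policies.items() if iv_id in ids)
```

With this shape, the ASR Models list for a tenant would be the stored records for which visible_to(tenant_id) is true, and the Delete action would first call deletion_blockers and show the returned policy names in the error message if the list is not empty.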

ASR model selection in transcription policies

  • Rename Language to ASR Model (this appears to be more accurate)

  • If the processor is IV, allow specifying up to 4 models. IV recommends defining up to 3 languages plus the language detection model, but we don't differentiate between the two kinds of model, so we should enforce a limit of 4 for now

  • The list should contain all models configured in the system available in the given tenant

  • TODO: how to align the model selection with the license restrictions
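
The selection rules above (at most 4 models for IV, and only models available in the tenant) could be checked when a policy is saved, along these lines. This is a minimal sketch: validate_policy_models, MAX_IV_MODELS, and the "IV" processor label are hypothetical names, and the open TODO about license restrictions is deliberately not covered.

```python
MAX_IV_MODELS = 4  # up to 3 languages plus the language detection model


def validate_policy_models(selected_ids, tenant_model_ids, processor="IV"):
    """Return a list of validation errors for a policy's model selection.

    `selected_ids` are the model ids chosen in the policy and
    `tenant_model_ids` are the ids of the models available in the tenant.
    An empty list means the selection is acceptable.
    """
    errors = []
    unavailable = [m for m in selected_ids if m not in tenant_model_ids]
    if unavailable:
        errors.append(f"models not available in this tenant: {unavailable}")
    if processor == "IV" and len(selected_ids) > MAX_IV_MODELS:
        errors.append(f"at most {MAX_IV_MODELS} models can be selected")
    return errors
```

For example, validate_policy_models([1, 2], {1, 2, 3}) returns an empty list, while selecting five models, or a model the tenant cannot see, returns one error message per violated rule.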

License enforcement

  • The key difference from other speech integrations is the limit on the number of models a user can have. The basic license only allows a single ASR model to be used for transcribing the calls of a specific user.

  • A system can have a mix of licenses, so certain users will be limited to a single model, while others won't be limited.

  • The current enforcement logic, which uses a permission to assign a license to a user, is suitable for IV as well, but it has to be extended.

  • To enforce a single model, the model has to be selected when the permission is added to the role.

  • Only models available in the tenant can be listed for selection.

  • Normally a user should have a single transcription license, but since this is coming through the role system, configuration issues will occur. We need to ensure that the better license is always enforced, so the licenses need some kind of order or priority.
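
One possible way to resolve several licenses arriving through different roles is to give each license a numeric priority and enforce the highest one. The sketch below assumes this; LicenseGrant, its fields, and the license names and priority values used in the example are hypothetical and do not reflect actual SKUs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseGrant:
    """A transcription license a user receives through one role."""
    name: str
    priority: int                          # higher value = better license
    single_model_id: Optional[int] = None  # set only for single-model licenses


def effective_license(grants):
    """Pick the best license among all grants, or None if there are none."""
    if not grants:
        return None
    return max(grants, key=lambda g: g.priority)


def allowed_models(grants, tenant_model_ids):
    """Model ids that may be used to transcribe this user's calls."""
    best = effective_license(grants)
    if best is None:
        return set()
    if best.single_model_id is None:
        # Unrestricted license: every model available in the tenant.
        return set(tenant_model_ids)
    # Single-model license: only the model chosen on the role, and only
    # if that model is still available in the tenant.
    return {best.single_model_id} & set(tenant_model_ids)
```

For instance, a user holding both LicenseGrant("basic", 1, single_model_id=7) and LicenseGrant("multi", 2) would be treated as a "multi" user and could use every model in the tenant, while a user holding only the basic grant would be limited to model 7.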
