1. iOS Platform Concepts
1.1 The iOS Platform's Built-in AGC
You can check whether the iOS platform supports AGC, and turn it on, as follows (code excerpted from WebRTC):
```objectivec
// 1. Create the component description and instantiate the Voice Processing
// I/O audio unit (this enables the hardware audio processing path).
AudioComponentDescription vpio_unit_description;
vpio_unit_description.componentType = kAudioUnitType_Output;
// With this subtype, the system's built-in AEC, AGC, NS, etc. are enabled.
vpio_unit_description.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
vpio_unit_description.componentManufacturer = kAudioUnitManufacturer_Apple;
vpio_unit_description.componentFlags = 0;
vpio_unit_description.componentFlagsMask = 0;

// Obtain an audio unit instance given the description.
AudioComponent found_vpio_unit_ref =
    AudioComponentFindNext(nullptr, &vpio_unit_description);

// Create a Voice Processing IO audio unit.
OSStatus result = noErr;
result = AudioComponentInstanceNew(found_vpio_unit_ref, &vpio_unit_);
if (result != noErr) {
  vpio_unit_ = nullptr;
  RTCLogError(@"AudioComponentInstanceNew failed. Error=%ld.", (long)result);
  return false;
}

// 2. Query whether AGC is enabled; WebRTC wraps this in GetAGCState().
UInt32 size = sizeof(*enabled);
OSStatus result = AudioUnitGetProperty(vpio_unit_,
                                       kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                                       kAudioUnitScope_Global, kInputBus,
                                       enabled, &size);
RTCLog(@"VPIO unit AGC: %u", static_cast<unsigned int>(*enabled));

// Enable AGC.
UInt32 enable_agc = 1;
result =
    AudioUnitSetProperty(vpio_unit_,
                         kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                         kAudioUnitScope_Global, kInputBus, &enable_agc,
                         sizeof(enable_agc));
if (result != noErr) {
  RTCLogError(@"Failed to enable the built-in AGC. "
               "Error=%ld.",
              (long)result);
  RTC_HISTOGRAM_COUNTS_SPARSE_100000(
      "WebRTC.Audio.SetAGCStateErrorCode", (-1) * result);
}

// 3. Initialize the audio unit.
AudioUnitInitialize(vpio_unit_);
```
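The query and the toggle above can be folded into one helper. A minimal sketch, assuming a created Voice Processing I/O unit and WebRTC's input element constant kInputBus = 1 (the helper name is my own, not WebRTC's):

```objectivec
#include <AudioUnit/AudioUnit.h>

// Sketch: query, then (if needed) flip the VPIO built-in AGC.
// `unit` is a Voice Processing I/O audio unit; element 1 is the input bus,
// matching the kInputBus constant in the WebRTC code above.
static bool SetBuiltInAgcEnabled(AudioUnit unit, bool enable) {
  const AudioUnitElement kInputBus = 1;
  UInt32 current = 0;
  UInt32 size = sizeof(current);
  if (AudioUnitGetProperty(unit, kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                           kAudioUnitScope_Global, kInputBus, &current,
                           &size) != noErr) {
    return false;  // Query failed; AGC state unknown.
  }
  if ((current != 0) == enable) {
    return true;  // Already in the requested state.
  }
  UInt32 value = enable ? 1 : 0;
  return AudioUnitSetProperty(unit, kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                              kAudioUnitScope_Global, kInputBus, &value,
                              sizeof(value)) == noErr;
}
```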
1.2 Setting the Capture Callback
A key step in live-streaming capture is the render callback function (in WebRTC it is set to OnDeliverRecordedData). The Audio Unit processes audio one chunk at a time, and this callback fires once each time a chunk has been processed. Code excerpted from WebRTC:
```objectivec
// Disable AU buffer allocation for the recorder, we allocate our own.
// TODO(henrika): not sure that it actually saves resource to make this call.
UInt32 flag = 0;
result = AudioUnitSetProperty(
    vpio_unit_, kAudioUnitProperty_ShouldAllocateBuffer,
    kAudioUnitScope_Output, kInputBus, &flag, sizeof(flag));
if (result != noErr) {
  DisposeAudioUnit();
  RTCLogError(@"Failed to disable buffer allocation on the input bus. "
               "Error=%ld.",
              (long)result);
  return false;
}

// Specify the callback to be called by the I/O thread to us when input audio
// is available. The recorded samples can then be obtained by calling the
// AudioUnitRender() method.
AURenderCallbackStruct input_callback;
input_callback.inputProc = OnDeliverRecordedData;
input_callback.inputProcRefCon = this;
result = AudioUnitSetProperty(vpio_unit_,
                              kAudioOutputUnitProperty_SetInputCallback,
                              kAudioUnitScope_Global, kInputBus,
                              &input_callback, sizeof(input_callback));
if (result != noErr) {
  DisposeAudioUnit();
  RTCLogError(@"Failed to specify the input callback on the input bus. "
               "Error=%ld.",
              (long)result);
  return false;
}
```
1.3 Retrieving PCM Data
WebRTC disables the AudioUnit's own buffer allocation for captured data and manages its own buffers instead; after receiving the AudioUnit callback, it manually calls AudioUnitRender to pull out the captured samples. Via AudioUnitRender you can fetch a chunk of processed PCM audio from a node in the AUGraph. Also, if you need in-ear monitoring playback, this callback must copy the retrieved audio into the buffer referenced by the callback's last parameter, ioData. Code excerpted from WebRTC:
```objectivec
OSStatus VoiceProcessingAudioUnit::Render(AudioUnitRenderActionFlags* flags,
                                          const AudioTimeStamp* time_stamp,
                                          UInt32 output_bus_number,
                                          UInt32 num_frames,
                                          AudioBufferList* io_data) {
  RTC_DCHECK(vpio_unit_) << "Init() not called.";

  OSStatus result = AudioUnitRender(vpio_unit_, flags, time_stamp,
                                    output_bus_number, num_frames, io_data);
  if (result != noErr) {
    RTCLogError(@"Failed to render audio unit. Error=%ld", (long)result);
  }
  return result;
}
```
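To illustrate the in-ear monitoring point: in the code handling the recorded-data callback, one can render into a self-managed AudioBufferList and, when monitoring is wanted, copy the samples into the callback's ioData. A minimal sketch with hypothetical names (own_buffers and monitoring_enabled are mine, not WebRTC's); note that for a callback registered via kAudioOutputUnitProperty_SetInputCallback, io_data may be NULL, in which case monitoring is done in the playout render callback instead:

```objectivec
#include <AudioUnit/AudioUnit.h>
#include <algorithm>
#include <cstring>

// Illustrative only: pull captured PCM, then optionally feed it back into
// io_data for in-ear monitoring. `own_buffers` is a pre-allocated
// AudioBufferList owned by us (WebRTC disables the unit's own allocation,
// as shown in 1.2).
OSStatus PullAndMonitor(AudioUnit vpio_unit,
                        AudioUnitRenderActionFlags* flags,
                        const AudioTimeStamp* time_stamp,
                        UInt32 input_bus,
                        UInt32 num_frames,
                        AudioBufferList* own_buffers,
                        AudioBufferList* io_data,  // may be NULL
                        bool monitoring_enabled) {
  OSStatus result = AudioUnitRender(vpio_unit, flags, time_stamp, input_bus,
                                    num_frames, own_buffers);
  if (result != noErr) return result;
  if (monitoring_enabled && io_data != nullptr) {
    // Copy the captured samples into the output buffers for monitoring.
    for (UInt32 i = 0;
         i < std::min(io_data->mNumberBuffers, own_buffers->mNumberBuffers);
         ++i) {
      AudioBuffer& dst = io_data->mBuffers[i];
      const AudioBuffer& src = own_buffers->mBuffers[i];
      std::memcpy(dst.mData, src.mData,
                  std::min(dst.mDataByteSize, src.mDataByteSize));
    }
  }
  return noErr;
}
```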
2. AGC in WebRTC
iOS's VPIO already provides AEC, AGC, and NS itself, so the WebRTC software algorithms are not used on that platform. WebRTC on iOS does not implement a switch for AEC, but it does expose an ios_force_software_aec_HACK option for force-enabling the software AEC (echo_cancellation/extended_filter_aec) on devices whose hardware AEC does not work; even WebRTC takes a wait-and-see attitude toward how well this hack behaves. There is currently no such option for AGC or NS. Similarly, if an Android device has built-in AEC, AGC, and NS, the platform implementations are used instead of WebRTC's software algorithms. As an aside, I also looked at the Android code that uses the platform's built-in hardware AEC and NS; it lives in this directory:
![](https://img.haomeiwen.com/i1996279/02c3d31683c23121.png)
On the Java side, AudioEffect is used to detect whether hardware AEC and NS are available, and AcousticEchoCanceler and NoiseSuppressor are used to perform AEC and NS. AGC has no platform hardware support on Android, so WebRTC's own algorithm is used directly.
So how do we enable WebRTC's AGC? Through WebRtcVoiceEngine::ApplyOptions(const AudioOptions& options_in) in media/engine/webrtc_voice_engine.cc we can force the WebRTC AGC path. Test code:
```cpp
...
  // Set and adjust noise suppressor options.
#if defined(WEBRTC_IOS)
  // On iOS, VPIO provides built-in NS.
  // options.noise_suppression = false;
  // options.typing_detection = false;
  // options.experimental_ns = false;
  // lygtest: force WebRTC's software NS on for testing.
  options.noise_suppression = true;
  options.typing_detection = true;
  options.experimental_ns = true;
  RTC_LOG(LS_INFO) << "Always disable NS on iOS. Use built-in instead.";
#elif defined(WEBRTC_ANDROID)
  options.typing_detection = false;
  options.experimental_ns = false;
#endif

  // Set and adjust gain control options.
#if defined(WEBRTC_IOS)
  // On iOS, VPIO provides built-in AGC.
  // options.auto_gain_control = false;
  // options.experimental_agc = false;
  // lygtest: force WebRTC's software AGC on for testing.
  options.auto_gain_control = true;
  options.experimental_agc = true;
  RTC_LOG(LS_INFO) << "Always disable AGC on iOS. Use built-in instead.";
#elif defined(WEBRTC_ANDROID)
  options.experimental_agc = false;
#endif
```
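For completeness: these flags normally arrive from the application through cricket::AudioOptions (api/audio_options.h) rather than being hard-coded; the hack above simply overrides whatever was requested. A hedged sketch of the application-side request (field names follow that header and may differ across WebRTC versions):

```cpp
#include "api/audio_options.h"

// Application-side request. WebRtcVoiceEngine::ApplyOptions() may still
// override these per platform, exactly as the #if blocks above do.
cricket::AudioOptions options;
options.auto_gain_control = true;   // absl::optional<bool>
options.noise_suppression = true;
options.echo_cancellation = true;
```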
2.1 API Call Flow and Call Stacks (diagrams drawn by the author, shared here)
![](https://img.haomeiwen.com/i1996279/becf9748b56d3931.png)
![](https://img.haomeiwen.com/i1996279/1582d55b18ffef4c.png)
Key code:
```cpp
int AudioProcessingImpl::ProcessCaptureStreamLocked() {
  HandleCaptureRuntimeSettings();

  // Ensure that not both the AEC and AECM are active at the same time.
  // TODO(peah): Simplify once the public API Enable functions for these
  // are moved to APM.
  RTC_DCHECK_LE(!!private_submodules_->echo_controller +
                    !!private_submodules_->echo_cancellation +
                    !!private_submodules_->echo_control_mobile,
                1);

  AudioBuffer* capture_buffer = capture_.capture_audio.get();  // For brevity.

  if (private_submodules_->pre_amplifier) {
    private_submodules_->pre_amplifier->ApplyGain(AudioFrameView<float>(
        capture_buffer->channels(), capture_buffer->num_channels(),
        capture_buffer->num_frames()));
  }

  capture_input_rms_.Analyze(rtc::ArrayView<const float>(
      capture_buffer->channels_const()[0],
      capture_nonlocked_.capture_processing_format.num_frames()));
  const bool log_rms = ++capture_rms_interval_counter_ >= 1000;
  if (log_rms) {
    capture_rms_interval_counter_ = 0;
    RmsLevel::Levels levels = capture_input_rms_.AverageAndPeak();
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureInputLevelAverageRms",
                                levels.average, 1, RmsLevel::kMinLevelDb, 64);
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureInputLevelPeakRms",
                                levels.peak, 1, RmsLevel::kMinLevelDb, 64);
  }

  if (private_submodules_->echo_controller) {
    // Detect and flag any change in the analog gain.
    int analog_mic_level = agc1()->stream_analog_level();
    capture_.echo_path_gain_change =
        capture_.prev_analog_mic_level != analog_mic_level &&
        capture_.prev_analog_mic_level != -1;
    capture_.prev_analog_mic_level = analog_mic_level;

    // Detect and flag any change in the pre-amplifier gain.
    if (private_submodules_->pre_amplifier) {
      float pre_amp_gain = private_submodules_->pre_amplifier->GetGainFactor();
      capture_.echo_path_gain_change =
          capture_.echo_path_gain_change ||
          (capture_.prev_pre_amp_gain != pre_amp_gain &&
           capture_.prev_pre_amp_gain >= 0.f);
      capture_.prev_pre_amp_gain = pre_amp_gain;
    }

    // Detect volume change.
    capture_.echo_path_gain_change =
        capture_.echo_path_gain_change ||
        (capture_.prev_playout_volume != capture_.playout_volume &&
         capture_.prev_playout_volume >= 0);
    capture_.prev_playout_volume = capture_.playout_volume;

    private_submodules_->echo_controller->AnalyzeCapture(capture_buffer);
  }

  if (constants_.use_experimental_agc &&
      public_submodules_->gain_control->is_enabled()) {
    private_submodules_->agc_manager->AnalyzePreProcess(
        capture_buffer->channels_f()[0], capture_buffer->num_channels(),
        capture_nonlocked_.capture_processing_format.num_frames());

    if (constants_.use_experimental_agc_process_before_aec) {
      private_submodules_->agc_manager->Process(
          capture_buffer->channels_const()[0],
          capture_nonlocked_.capture_processing_format.num_frames(),
          capture_nonlocked_.capture_processing_format.sample_rate_hz());
    }
  }

  if (submodule_states_.CaptureMultiBandSubModulesActive() &&
      SampleRateSupportsMultiBand(
          capture_nonlocked_.capture_processing_format.sample_rate_hz())) {
    capture_buffer->SplitIntoFrequencyBands();
  }

  const bool experimental_multi_channel_capture =
      config_.pipeline.experimental_multi_channel &&
      constants_.experimental_multi_channel_capture_support;
  if (private_submodules_->echo_controller &&
      !experimental_multi_channel_capture) {
    // Force down-mixing of the number of channels after the detection of
    // capture signal saturation.
    // TODO(peah): Look into ensuring that this kind of tampering with the
    // AudioBuffer functionality should not be needed.
    capture_buffer->set_num_channels(1);
  }

  if (private_submodules_->high_pass_filter) {
    private_submodules_->high_pass_filter->Process(capture_buffer);
  }
  RETURN_ON_ERR(
      public_submodules_->gain_control->AnalyzeCaptureAudio(capture_buffer));
  public_submodules_->noise_suppression->AnalyzeCaptureAudio(capture_buffer);

  if (private_submodules_->echo_control_mobile) {
    // Ensure that the stream delay was set before the call to the
    // AECM ProcessCaptureAudio function.
    if (!was_stream_delay_set()) {
      return AudioProcessing::kStreamParameterNotSetError;
    }

    if (public_submodules_->noise_suppression->is_enabled()) {
      private_submodules_->echo_control_mobile->CopyLowPassReference(
          capture_buffer);
    }

    public_submodules_->noise_suppression->ProcessCaptureAudio(capture_buffer);

    RETURN_ON_ERR(private_submodules_->echo_control_mobile->ProcessCaptureAudio(
        capture_buffer, stream_delay_ms()));
  } else {
    if (private_submodules_->echo_controller) {
      data_dumper_->DumpRaw("stream_delay", stream_delay_ms());

      if (was_stream_delay_set()) {
        private_submodules_->echo_controller->SetAudioBufferDelay(
            stream_delay_ms());
      }

      private_submodules_->echo_controller->ProcessCapture(
          capture_buffer, capture_.echo_path_gain_change);
    } else if (private_submodules_->echo_cancellation) {
      // Ensure that the stream delay was set before the call to the
      // AEC ProcessCaptureAudio function.
      if (!was_stream_delay_set()) {
        return AudioProcessing::kStreamParameterNotSetError;
      }

      RETURN_ON_ERR(private_submodules_->echo_cancellation->ProcessCaptureAudio(
          capture_buffer, stream_delay_ms()));
    }

    public_submodules_->noise_suppression->ProcessCaptureAudio(capture_buffer);
  }

  if (config_.voice_detection.enabled) {
    capture_.stats.voice_detected =
        private_submodules_->voice_detector->ProcessCaptureAudio(
            capture_buffer);
  } else {
    capture_.stats.voice_detected = absl::nullopt;
  }

  if (constants_.use_experimental_agc &&
      public_submodules_->gain_control->is_enabled() &&
      !constants_.use_experimental_agc_process_before_aec) {
    private_submodules_->agc_manager->Process(
        capture_buffer->split_bands_const_f(0)[kBand0To8kHz],
        capture_buffer->num_frames_per_band(), capture_nonlocked_.split_rate);
  }
  // TODO(peah): Add reporting from AEC3 whether there is echo.
  RETURN_ON_ERR(public_submodules_->gain_control->ProcessCaptureAudio(
      capture_buffer,
      private_submodules_->echo_cancellation &&
          private_submodules_->echo_cancellation->stream_has_echo()));

  if (submodule_states_.CaptureMultiBandProcessingPresent() &&
      SampleRateSupportsMultiBand(
          capture_nonlocked_.capture_processing_format.sample_rate_hz())) {
    capture_buffer->MergeFrequencyBands();
  }

  if (capture_.capture_fullband_audio) {
    const auto& ec = private_submodules_->echo_controller;
    bool ec_active = ec ? ec->ActiveProcessing() : false;
    // Only update the fullband buffer if the multiband processing has changed
    // the signal. Keep the original signal otherwise.
    if (submodule_states_.CaptureMultiBandProcessingActive(ec_active)) {
      capture_buffer->CopyTo(capture_.capture_fullband_audio.get());
    }
    capture_buffer = capture_.capture_fullband_audio.get();
  }

  if (config_.residual_echo_detector.enabled) {
    RTC_DCHECK(private_submodules_->echo_detector);
    private_submodules_->echo_detector->AnalyzeCaptureAudio(
        rtc::ArrayView<const float>(capture_buffer->channels()[0],
                                    capture_buffer->num_frames()));
  }

  // TODO(aluebs): Investigate if the transient suppression placement should be
  // before or after the AGC.
  if (capture_.transient_suppressor_enabled) {
    float voice_probability =
        private_submodules_->agc_manager.get()
            ? private_submodules_->agc_manager->voice_probability()
            : 1.f;

    public_submodules_->transient_suppressor->Suppress(
        capture_buffer->channels()[0], capture_buffer->num_frames(),
        capture_buffer->num_channels(),
        capture_buffer->split_bands_const(0)[kBand0To8kHz],
        capture_buffer->num_frames_per_band(),
        capture_.keyboard_info.keyboard_data,
        capture_.keyboard_info.num_keyboard_frames, voice_probability,
        capture_.key_pressed);
  }

  // Experimental APM sub-module that analyzes |capture_buffer|.
  if (private_submodules_->capture_analyzer) {
    private_submodules_->capture_analyzer->Analyze(capture_buffer);
  }

  if (config_.gain_controller2.enabled) {
    private_submodules_->gain_controller2->NotifyAnalogLevel(
        agc1()->stream_analog_level());
    private_submodules_->gain_controller2->Process(capture_buffer);
  }

  if (private_submodules_->capture_post_processor) {
    private_submodules_->capture_post_processor->Process(capture_buffer);
  }

  // The level estimator operates on the recombined data.
  if (config_.level_estimation.enabled) {
    private_submodules_->output_level_estimator->ProcessStream(*capture_buffer);
    capture_.stats.output_rms_dbfs =
        private_submodules_->output_level_estimator->RMS();
  } else {
    capture_.stats.output_rms_dbfs = absl::nullopt;
  }

  capture_output_rms_.Analyze(rtc::ArrayView<const float>(
      capture_buffer->channels_const()[0],
      capture_nonlocked_.capture_processing_format.num_frames()));
  if (log_rms) {
    RmsLevel::Levels levels = capture_output_rms_.AverageAndPeak();
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureOutputLevelAverageRms",
                                levels.average, 1, RmsLevel::kMinLevelDb, 64);
    RTC_HISTOGRAM_COUNTS_LINEAR("WebRTC.Audio.ApmCaptureOutputLevelPeakRms",
                                levels.peak, 1, RmsLevel::kMinLevelDb, 64);
  }

  capture_.was_stream_delay_set = false;
  return kNoError;
}
```
Here, ProcessRenderAudio is invoked on the far-end (render) side; its main purpose is to analyze the VAD properties of the far-end signal. It calls the AGC's WebRtcAgc_AddFarend function, which in fact only validates its parameters and then calls WebRtcAgc_AddFarendToDigital, which in turn reaches WebRtcAgc_ProcessVad. In the input-check step, only the following sample rates and frame lengths are accepted (a small check mirroring these constraints is sketched after the list):
1. 8 kHz sample rate, 10 or 20 ms of data, subFrames = 80;
2. 16 kHz sample rate, 10 or 20 ms of data, subFrames = 160;
3. 32 kHz sample rate, 5 or 10 ms of data, subFrames = 160;
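Expressed as code, the constraints above amount to the following (illustrative only; the real validation lives inside WebRtcAgc_AddFarend's parameter checks):

```cpp
#include <cstddef>
#include <cstdint>

// Accept only the sample-rate / frame-length combinations listed above.
bool IsValidAgcFarendInput(uint32_t sample_rate_hz, size_t samples) {
  size_t sub_frames = 0;
  switch (sample_rate_hz) {
    case 8000:  sub_frames = 80;  break;  // 10 ms or 20 ms frames.
    case 16000: sub_frames = 160; break;  // 10 ms or 20 ms frames.
    case 32000: sub_frames = 160; break;  // 5 ms or 10 ms frames.
    default:    return false;
  }
  // The frame must be exactly one or two sub-frames long.
  return samples == sub_frames || samples == 2 * sub_frames;
}
```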
The AnalyzeCaptureAudio flow analyzes the still-unprocessed capture signal and dispatches to different handling depending on the AGC mode. If the kFixedDigital mode is selected, AnalyzeCaptureAudio does nothing.
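For reference, the mode here is GainControl::Mode from modules/audio_processing/include/audio_processing.h; the annotations are my own summary of the behavior described in this section:

```cpp
// GainControl::Mode (audio_processing.h).
enum Mode {
  kAdaptiveAnalog,   // Adjusts a real analog mic volume; needs level feedback.
  kAdaptiveDigital,  // No analog volume; WebRtcAgc_VirtualMic simulates one.
  kFixedDigital      // Fixed digital gain; AnalyzeCaptureAudio is a no-op.
};
```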
ProcessCaptureAudio is the core of the processing and contains the key AGC call, WebRtcAgc_Process. Besides the usual parameter checks it also carries the volume-adjustment flow. capture_levels_ is computed by WebRtcAgc_Process, after which analog_capture_level_ = capture_levels_. analog_capture_level_ is in turn fed into WebRtcAgc_VirtualMic (inside AnalyzeCaptureAudio) as the volume input, which recomputes capture_levels_. Which of these paths runs is controlled by the mode: in digital mode, the capture_levels_ obtained by feeding analog_capture_level_ into WebRtcAgc_VirtualMic is used as the input to WebRtcAgc_Process, but the level that WebRtcAgc_Process writes back into capture_levels_ is then discarded. In analog mode, WebRtcAgc_VirtualMic is not executed; WebRtcAgc_Process takes capture_levels_ as input, updates it on every call, and stores the result into analog_capture_level_.
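To make that dance between capture_levels_ and analog_capture_level_ easier to follow, here is a compressed sketch of the flow in gain_control_impl.cc; argument lists are simplified and variable names abbreviated, so treat it as pseudocode rather than the literal WebRTC code:

```cpp
// Compressed sketch of the level flow (simplified, not the literal code).
// AnalyzeCaptureAudio, kAdaptiveDigital only: derive a virtual mic level
// from analog_capture_level_.
int32_t capture_level = capture_levels_[ch];
if (mode_ == kAdaptiveDigital) {
  WebRtcAgc_VirtualMic(state, bands, num_bands, samples,
                       analog_capture_level_, &capture_level);
  capture_levels_[ch] = capture_level;
}

// ProcessCaptureAudio: run the core AGC.
int32_t new_level = 0;
WebRtcAgc_Process(state, bands, num_bands, samples, out_bands,
                  capture_levels_[ch], &new_level, stream_has_echo,
                  &saturation_warning);
capture_levels_[ch] = new_level;
if (mode_ == kAdaptiveAnalog) {
  // Analog mode: the returned level drives the real mic volume.
  analog_capture_level_ = new_level;
}
// In digital mode the returned level is effectively discarded; the next
// round derives it again from analog_capture_level_ via the virtual mic.
```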
2.2 AGC-Related APIs
The typical call sequence is as follows (a minimal code sketch follows the list):
> 1. Create the AGC: WebRtcAgc_Create
> 2. Initialize the AGC: WebRtcAgc_Init
> 3. Set the configuration: WebRtcAgc_set_config
> 4. Initialize capture_level = 0
> 5. For kAdaptiveDigital, call the virtual mic: WebRtcAgc_VirtualMic
> 6. Process the buffer with capture_level: WebRtcAgc_Process
> 7. Take the capture level returned by WebRtcAgc_Process and assign it back to capture_level
> 8. Repeat steps 5 through 7 for each audio buffer
> 9. Destroy the AGC: WebRtcAgc_Free
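Tying the steps together, a minimal usage sketch; the signatures are simplified and vary across WebRTC versions, so check modules/audio_processing/agc/legacy/gain_control.h in your tree for the exact prototypes:

```cpp
#include <cstdint>

int main() {
  void* agc = WebRtcAgc_Create();                         // 1. create
  WebRtcAgc_Init(agc, /*minLevel=*/0, /*maxLevel=*/255,
                 kAgcModeAdaptiveDigital, /*fs=*/16000);  // 2. init
  WebRtcAgcConfig config;
  config.targetLevelDbfs = 3;    // target level, in -dBFS
  config.compressionGaindB = 9;  // maximum digital gain
  config.limiterEnable = 1;
  WebRtcAgc_set_config(agc, config);                      // 3. configure

  int32_t capture_level = 0;                              // 4. start at 0
  int16_t near_frame[160] = {0};  // 10 ms @ 16 kHz, single band
  int16_t out_frame[160] = {0};
  int16_t* const bands[] = {near_frame};
  int16_t* const out_bands[] = {out_frame};
  for (int frame = 0; frame < 100; ++frame) {             // 8. per buffer
    // ... fill near_frame with the next 10 ms of captured audio ...
    int32_t mic_level = 0;
    WebRtcAgc_VirtualMic(agc, bands, /*num_bands=*/1, /*samples=*/160,
                         capture_level, &mic_level);      // 5. kAdaptiveDigital
    uint8_t saturation_warning = 0;
    WebRtcAgc_Process(agc, bands, /*num_bands=*/1, /*samples=*/160,
                      out_bands, mic_level, &capture_level,  // 6.+7. process,
                      /*echo=*/0, &saturation_warning);      // feed level back
  }
  WebRtcAgc_Free(agc);                                    // 9. destroy
  return 0;
}
```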