iOS Real-Time Recording


Core Audio

Core Audio is a collection of audio processing frameworks on iOS and OS X, with the advantages of high performance and low latency. The frameworks within Core Audio include Audio Toolbox, Audio Unit, AVFoundation, and OpenAL.
(Figure: the Core Audio architecture in iOS)

Recording scheme

AVFoundation: AVAudioPlayer and AVAudioRecorder provide a simple Objective-C interface. Recording writes an audio file and playback plays an audio file, which is suitable for non-real-time scenarios.
Audio Unit: the lowest layer of audio development. You can capture and play PCM data in real time, with the advantages of fast response and low latency, suitable for low-latency real-time scenarios.
Audio Toolbox: built on top of Audio Unit, it provides Core Audio's mid- and high-level service interfaces, including Audio Session Services and Audio Queue Services. The audio queue is another recording scheme: recorded audio is placed into a queue of buffers and then taken out for playback.
OpenAL: built on top of Audio Unit, mainly providing a cross-platform interface.
As you can see, there are two real-time recording schemes; this article describes the characteristics of both.


Audio Queue

There are many good summaries of Audio Queue on the Internet; if English is not a barrier, you can read the official document Audio Queue Services Programming Guide for details. The recording and playback schematics are as follows:

(Figure: recording with an audio queue)
(Figure: playing back with an audio queue)

The general principle is to use a queue of buffers to achieve real-time processing. Taking recording as an example: the PCM data captured by the microphone first fills the first buffer in the queue; when that buffer is full, a callback function is triggered, in which you can process the audio, for example write it to a file or play it back. After the callback, the buffer is cleared and added back to the queue, waiting to be filled again, and this process keeps cycling. Playback works similarly. A minimal sketch of such an input (recording) queue follows.
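The sketch below assumes the 44100 Hz / 16-bit / mono format used throughout this article; the buffer size, the number of buffers (three), and the processing done in the callback are placeholders, not the article's own code:

#import <AudioToolbox/AudioToolbox.h>

// Called every time one buffer has been filled with recorded PCM.
static void InputCallback(void *inUserData,
                          AudioQueueRef inAQ,
                          AudioQueueBufferRef inBuffer,
                          const AudioTimeStamp *inStartTime,
                          UInt32 inNumberPacketDescriptions,
                          const AudioStreamPacketDescription *inPacketDescs)
{
    // inBuffer->mAudioData / mAudioDataByteSize hold the recorded PCM:
    // process it here (write to a file, play it, etc.), then re-enqueue
    // the buffer so it can be filled again.
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}

static void StartRecordingQueue(void)
{
    AudioStreamBasicDescription format = {0};
    format.mSampleRate       = 44100;
    format.mFormatID         = kAudioFormatLinearPCM;
    format.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    format.mChannelsPerFrame = 1;
    format.mBitsPerChannel   = 16;
    format.mBytesPerFrame    = 2;
    format.mBytesPerPacket   = 2;
    format.mFramesPerPacket  = 1;

    AudioQueueRef queue;
    AudioQueueNewInput(&format, InputCallback, NULL, NULL, NULL, 0, &queue);

    // Three buffers of 8820 bytes (about 100 ms each, see the formula below).
    for (int i = 0; i < 3; i++) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 8820, &buffer);
        AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
    }
    AudioQueueStart(queue, NULL);
}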
Note: we can control the callback interval by setting the buffer size, and thereby process audio in (approximately) real time. The calculation is as follows:

callback interval ≈ buffer size / (sample rate × sample size in bytes)   (note: this is only an approximation!)
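For example, plugging in the numbers used below (44100 Hz, 16-bit samples, an 8820-byte buffer):

double sampleRate     = 44100.0;     // Hz
double bytesPerSample = 16.0 / 8.0;  // 16-bit samples
double bufferSize     = 8820.0;      // bytes
double interval = bufferSize / (sampleRate * bytesPerSample);  // = 0.1 s, i.e. about 100 ms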

The Audio Queue recording scheme is relatively simple and can process audio in real time, but it has its limitations: it is not accurate enough and there is some latency, that is, the callback interval is not stable. With a sample rate of 44100, 16-bit samples, and a buffer size of 8820 bytes, the formula gives a callback interval of 100ms (the exact value is 92.9ms, explained later). The measured callback times are as follows:

(Figure: callback intervals with a buffer size of 8820)

You can see that most callback intervals are 93ms, with some fluctuation; for example, the interval from the third to the fourth callback is 105ms. The smaller the callback interval, the greater the jitter. For instance, with the buffer size set to 4410, the callback times are as follows:

(Figure: callback intervals with a buffer size of 4410)

This time the jitter is very obvious; the interval from the second to the third callback is as short as 7ms. In a real-time scenario where each callback represents one frame, this is hard to accept when fine-grained frames are required.

Question: why does this jitter occur, and how can it be solved?

The jitter is generated in the layers underneath Audio Queue. As mentioned earlier, Audio Toolbox is built on top of Audio Unit, so the callback is ultimately resolved in that lower layer.

(Figure: audio frameworks in iOS)

We can guess that there is probably a concurrent thread at the bottom, so the timing of the callback is somewhat random and fluctuates, even to the point of two callbacks within 7ms. On this point, you can refer to the answer in the Stack Overflow discussion AudioQueueNewInput callback latency:

The Audio Queue API looks like it is built on top of the Audio Unit RemoteIO API. Small Audio Queue buffers are probably being used to fill a larger RemoteIO buffer behind the scenes. Perhaps even some rate resampling might be taking place (on the original 2G phone). For lower latency, try using the RemoteIO Audio Unit API directly, and then requesting the audio session to provide your app a smaller lower latency buffer size.

As you can see, a low-latency recording mode requires using the lower-level Audio Unit.


Audio Unit

As for an introduction to Audio Unit, the official document Audio Unit Guide for iOS explains it in great detail. Audio units usually work in a closed context called an audio processing graph, as shown below:

(Figure: an audio processing graph)

The microphone captures audio and delivers it into the audio processing graph; the audio data passes through two EQ units (equalization) and a Mixer unit (mixing), and finally reaches the I/O unit that is connected directly to the output device. From this process you can see that Audio Unit processes the audio directly and can even output from the unit to peripherals. Compared with the audio queue, configuring Audio Unit is more complicated; the following details an example of real-time recording with Audio Unit.
There are two ways to build an Audio Unit: one is to use the Audio Unit API directly, the other is to use an audio processing graph (AUGraph). The following takes the first approach.
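For reference, the graph-based alternative mentioned above would look roughly like this (a sketch only; the rest of this article does not use it):

AUGraph graph;
NewAUGraph(&graph);

AudioComponentDescription ioDesc = {0};
ioDesc.componentType         = kAudioUnitType_Output;
ioDesc.componentSubType      = kAudioUnitSubType_RemoteIO;
ioDesc.componentManufacturer = kAudioUnitManufacturer_Apple;

AUNode ioNode;
AUGraphAddNode(graph, &ioDesc, &ioNode);
AUGraphOpen(graph);

// Pull the underlying AudioUnit out of the node; it is then configured
// just like the directly created unit below.
AudioUnit ioUnit;
AUGraphNodeInfo(graph, ioNode, NULL, &ioUnit);

AUGraphInitialize(graph);
AUGraphStart(graph);

Back to the first approach: declare the unit directly.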

AudioUnit audioUnit;

Explanation of AudioUnit:

The type used to represent an instance of a particular audio component

The structure shown is as follows:

(Figure: audio unit scopes and elements)

Next, build the structure of the unit. Different audio applications can build a variety of structures; a simple one is as follows:

(Figure: the architecture of an I/O unit)

Once the structure is determined, we can start building it.

Configure AudioSession

As with any other recording and playback code, you need to configure the recording/playback environment: the category, sample rate, response to headphones, and so on.

NSError *error;
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
[audioSession setPreferredSampleRate:44100 error:&error];
[audioSession setPreferredInputNumberOfChannels:1 error:&error];
[audioSession setPreferredIOBufferDuration:0.05 error:&error];
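The snippet above only sets preferences; presumably the session also has to be activated at some point, for example:

[audioSession setActive:YES error:&error];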

Configure AudioComponentDescription

AudioComponentDescription describes the type of unit, such as equalizers, 3D mixers, multichannel mixers, remote I/O, VoIP I/O, and generic output. Remote I/O is used here.

AudioComponentDescription audioDesc;
audioDesc.componentType = kAudioUnitType_Output;
audioDesc.componentSubType = kAudioUnitSubType_RemoteIO;
audioDesc.componentManufacturer = kAudioUnitManufacturer_Apple;
audioDesc.componentFlags = 0;
audioDesc.componentFlagsMask = 0;

AudioComponent inputComponent = AudioComponentFindNext(NULL, &audioDesc);
AudioComponentInstanceNew(inputComponent, &audioUnit);

Data format for input and output

Set the sample rate to 44100, a single channel, and 16-bit samples. Note that both the input side and the output side need to be set.

AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 44100;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 1;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = 2;
audioFormat.mBytesPerFrame = 2;

AudioUnitSetProperty(audioUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output,
                     INPUT_BUS,
                     &audioFormat,
                     sizeof(audioFormat));
AudioUnitSetProperty(audioUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Input,
                     OUTPUT_BUS,
                     &audioFormat,
                     sizeof(audioFormat));

Open the input and output ports

By default, input is disabled and output is enabled. Among the unit's elements, input is represented by "1" (which looks like the letter I) and output by "0" (which looks like the letter O).
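The INPUT_BUS and OUTPUT_BUS macros used in these snippets are not defined by the article; given the explanation above, they are presumably just:

#define INPUT_BUS  1   // element 1: input ("1" looks like I)
#define OUTPUT_BUS 0   // element 0: output ("0" looks like O)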

UInt32 flag = 1;
AudioUnitSetProperty(audioUnit,
                     kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input,
                     INPUT_BUS,
                     &flag,
                     sizeof(flag));
AudioUnitSetProperty(audioUnit,
                     kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Output,
                     OUTPUT_BUS,
                     &flag,
                     sizeof(flag));

Configure callback

Depending on the requirements of the application, you can set a callback on the input, the output, or both. Taking the input callback as an example:

AURenderCallbackStruct recordCallback;
recordCallback.inputProc = RecordCallback;
recordCallback.inputProcRefCon = (__bridge void *)self;
AudioUnitSetProperty(audioUnit,
                     kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global,
                     INPUT_BUS,
                     &recordCallback,
                     sizeof(recordCallback));

You need to define the callback function, which is of type AURenderCallback. Following the parameter types defined in AUComponent.h, the input callback function is defined as:

static OSStatus RecordCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData)
{
    // Pull the newly recorded samples into buffList (allocated below).
    AudioUnitRender(audioUnit, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, buffList);
    return noErr;
}
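The article only wires up the input callback. For the real-time playback mentioned later (both input and output open), a matching render callback on the output bus might look roughly like this; the straight copy from buffList (which is allocated in the next step) is only an illustration, not the article's code:

static OSStatus PlayCallback(void *inRefCon,
                             AudioUnitRenderActionFlags *ioActionFlags,
                             const AudioTimeStamp *inTimeStamp,
                             UInt32 inBusNumber,
                             UInt32 inNumberFrames,
                             AudioBufferList *ioData)
{
    // Feed the most recently recorded samples straight back to the output.
    memcpy(ioData->mBuffers[0].mData,
           buffList->mBuffers[0].mData,
           MIN(ioData->mBuffers[0].mDataByteSize, buffList->mBuffers[0].mDataByteSize));
    return noErr;
}

AURenderCallbackStruct playCallback;
playCallback.inputProc = PlayCallback;
playCallback.inputProcRefCon = (__bridge void *)self;
AudioUnitSetProperty(audioUnit,
                     kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input,
                     OUTPUT_BUS,
                     &playCallback,
                     sizeof(playCallback));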

Allocate the buffer

This is a very important step for obtaining the recorded data: you need to allocate a buffer to hold the real-time recording data. If you skip it, the recorded data can still be obtained in the output callback, but that is not the same thing; acquiring the recorded data should happen in the input callback, not the output callback.

UInt32 flag = 0;
AudioUnitSetProperty(audioUnit,
                     kAudioUnitProperty_ShouldAllocateBuffer,
                     kAudioUnitScope_Output,
                     INPUT_BUS,
                     &flag,
                     sizeof(flag));

buffList = (AudioBufferList *)malloc(sizeof(AudioBufferList));
buffList->mNumberBuffers = 1;
buffList->mBuffers[0].mNumberChannels = 1;
buffList->mBuffers[0].mDataByteSize = 2048 * sizeof(short);
buffList->mBuffers[0].mData = (short *)malloc(sizeof(short) * 2048);

With the above settings, you can record in real time and also play back in real time (in this example both input and output are enabled).
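The article does not show the final step of actually starting the unit; a minimal sketch of starting and tearing it down would be:

AudioUnitInitialize(audioUnit);
AudioOutputUnitStart(audioUnit);

// ... record / play ...

AudioOutputUnitStop(audioUnit);
AudioUnitUninitialize(audioUnit);
AudioComponentInstanceDispose(audioUnit);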


Several problems

  1. When running on a real device, an error occurs; the error message is as follows:
(Figure: the runtime error message)

This is because the app does not have microphone permission. Open the Info.plist file as source code and add the following entry inside the dict tag:

<key>NSMicrophoneUsageDescription</key>
<string>microphoneDescription</string>

Run again and it works.

2. The callback interval.
Audio Unit latency is very low and the callback interval is very stable, which makes it very suitable for strict real-time audio processing. Even if the duration is set to 0.000725623582766 seconds, the callback interval is still very accurate:

(Figure: the callback intervals are very short and stable)

In fact, Audio Unit does not have a callback-interval setting of its own, but we can configure it through the audio session:

[audioSession setPreferredIOBufferDuration:0.05 error:&error];

This sets the duration to 0.05 seconds, meaning the buffer is read every 0.05 seconds. With a 44100 sample rate and 16-bit samples, the buffer size should be 44100 × 0.05 × 16 / 8 = 4410 bytes. However, the Audio Unit buffer size is a power of 2, so a 4410-byte buffer is impossible; the actual size is 4096 bytes, which works out to 4096 / (44100 × 2) ≈ 0.0464 seconds. This also explains why the callback interval calculated for Audio Queue earlier was only approximate.
In addition, if the duration is not set through AVAudioSession, a default buffer size is used, and it differs between the simulator and real devices, so it is necessary to set it explicitly in code.
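Since the 0.05 seconds is only a preference, you can check what duration was actually granted:

NSTimeInterval granted = [[AVAudioSession sharedInstance] IOBufferDuration];
NSLog(@"actual IO buffer duration: %f s (about %.0f frames at 44100 Hz)", granted, granted * 44100);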

3. Noise.
Testing found that the result is better with headphones; without headphones there is noise during playback. If you want a clean result, you can write the PCM data to a file each time and play it back afterwards. Lame is recommended for converting the PCM to MP3.
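A minimal sketch of dumping the raw PCM to a file from the record callback; the pcmFile handle, its path, and the AppendPCM helper are hypothetical, not part of the article's demo:

static FILE *pcmFile = NULL;   // hypothetical; opened once before recording starts

static void OpenPCMFile(void)
{
    NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"record.pcm"];
    pcmFile = fopen(path.UTF8String, "wb");
}

// Call this inside RecordCallback, after AudioUnitRender has filled buffList.
static void AppendPCM(AudioBufferList *list)
{
    if (pcmFile) {
        fwrite(list->mBuffers[0].mData, 1, list->mBuffers[0].mDataByteSize, pcmFile);
    }
}

After recording, close the file and feed the raw PCM to Lame (or a similar encoder) to get an MP3.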

4. Reading the PCM data.
The PCM data is stored in an AudioBuffer structure; the audio data itself is of type void *:

/*!
    @struct         AudioBuffer
    @abstract       A structure to hold a buffer of audio data.
    @field          mNumberChannels
                        The number of interleaved channels in the buffer.
    @field          mDataByteSize
                        The number of bytes in the buffer pointed at by mData.
    @field          mData
                        A pointer to the buffer of audio data.
*/
struct AudioBuffer
{
    UInt32              mNumberChannels;
    UInt32              mDataByteSize;
    void* __nullable    mData;
};
typedef struct AudioBuffer AudioBuffer;

If the sample size is 16 bits, i.e. 2 bytes, then every 2 bytes in mData is one PCM sample. Taking the first sample as an example:

short *data = (short *)buffList->mBuffers[0].mData;
NSLog(@"%d", data[0]);

It is important to note that the cast type must be consistent with the sample format.
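For example, to walk over every sample in the buffer with a consistent cast (here just computing the peak amplitude as an illustration):

short *samples = (short *)buffList->mBuffers[0].mData;
UInt32 count = buffList->mBuffers[0].mDataByteSize / sizeof(short);
int peak = 0;
for (UInt32 i = 0; i < count; i++) {
    int value = abs(samples[i]);
    if (value > peak) peak = value;
}
NSLog(@"%u samples, peak amplitude %d", (unsigned int)count, peak);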

DEMO

Audio Unit real-time recording