aurum---logarithmic audio-spectrum analysis

One of the technical challenges in the VR/voice-recognition vocoder and music audio compression industry is the cost of detecting pitch and computing harmonic content and formant identities: We examine a lower-cost mathematical solution.

The audio (listening) spectrum is logarithmic---the steps in a tempered music "octave" (cf. piano key half-steps) progress by a multiplicative factor of 2^(1/12) ~ 1.06 (more exactly 1.0595, but both are more precise than is noticeable over the 6-7 octave range of music). [Nom. "octave" is a closed range by a factor of 2x]
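As a small worked illustration of the constant-ratio steps, the following sketch lists tempered frequencies over several octaves. The names `SEMITONE`, `QUARTER_TONE` and `step_frequencies`, and the 55 cps starting pitch, are assumptions of this example and do not come from the text.

```python
# Constant-ratio (logarithmic) frequency steps of the tempered scale.
SEMITONE = 2 ** (1 / 12)      # one tempered half-step, about 1.0595
QUARTER_TONE = 2 ** (1 / 24)  # half of a half-step, about 1.03


def step_frequencies(f_low, octaves, steps_per_octave=12):
    """Return every step frequency from f_low up through the given octaves."""
    ratio = 2 ** (1 / steps_per_octave)
    count = octaves * steps_per_octave + 1
    return [f_low * ratio ** k for k in range(count)]


# Seven octaves of half-steps starting at an assumed 55 cps.
freqs = step_frequencies(55.0, 7)
```

Each entry is the previous one multiplied by the same factor, so twelve steps always land on exactly double the starting frequency.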

It is usually estimated that music does not need further definition---indeed, adjacent mid-range audio tones "beat" rather than being well distinguished (e.g. A-440 beats with A#-466 at 26 cps: within the phoneme rate)---and harmonic correlations reveal more distinguishability than dissonant beating does (though slow beating is sometimes a musically desirable undulation, like the organ "Leslie"). For utility, quarter-tones, logarithmically 1.03x apart, are deemed indistinguishable beyond acute tuning requirements.
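The beating figure can be checked with a few lines of arithmetic. This is only a sketch: the function names and the 8000-per-second evaluation grid are assumptions of the example, and A# is taken as one tempered half-step above A-440.

```python
import math

A4 = 440.0
A_SHARP4 = A4 * 2 ** (1 / 12)  # one tempered half-step up, about 466 cps


def beat_frequency(f1, f2):
    """Rate at which the summed loudness swells: the frequency difference."""
    return abs(f2 - f1)


def two_tones(t, f1, f2):
    """Plain sum of two equal-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def carrier_times_envelope(t, f1, f2):
    """The same signal written with the sum-to-product identity."""
    envelope = 2 * math.cos(math.pi * (f2 - f1) * t)
    carrier = math.sin(math.pi * (f1 + f2) * t)
    return envelope * carrier


beat = beat_frequency(A4, A_SHARP4)

# Largest disagreement between the two forms over one second of samples.
max_diff = max(
    abs(two_tones(k / 8000, A4, A_SHARP4)
        - carrier_times_envelope(k / 8000, A4, A_SHARP4))
    for k in range(8000)
)
```

The envelope cosine itself runs at half the difference frequency, but its magnitude peaks twice per cycle, so the audible swell recurs at the full difference of about 26 cps.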

Elementary:

A simple standing-wave power-filter is a delay line with 2 or more taps per frequency: each original sample plus its delayed reflection(s). It discriminates spectrum peaks, or nulls, though not strongly---each frequency detector samples the signal vector of everything, on the average, and phase-correlates with it, fully or partially, except at its null. [A null is compared to that total average energy.] It requires fast power-square calculation---but single-parameter squaring is less computationally demanding than the two-parameter transform multiplication of the basic FFT.
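A minimal sketch of one such detector follows, assuming the simplest two-tap case: each sample is added to its reflection delayed by half a period of the detector frequency, the sum is squared and averaged, and the result is compared to the total average energy. The names, the 8000 sample rate and the 500 cps detector are assumptions of the example, not the author's figures.

```python
import math

FS = 8000                # assumed sample rate, samples per second
F0 = 500                 # assumed detector (null) frequency, cps
DELAY = FS // (2 * F0)   # half a period of F0: 8 samples


def tone(freq, count=8000):
    """Unit-amplitude sine test tone sampled at FS."""
    return [math.sin(2 * math.pi * freq * n / FS) for n in range(count)]


def two_tap_power_ratio(x, delay):
    """Power of x[n] + x[n - delay], relative to 2 * (average power of x).

    About 1 for a signal uncorrelated with its own reflection,
    and close to 0 at the detector's null.
    """
    summed = 0.0
    total = 0.0
    for n in range(delay, len(x)):
        y = x[n] + x[n - delay]   # original sample plus its reflection
        summed += y * y           # single-parameter squaring
        total += x[n] * x[n]
    return summed / (2.0 * total)


at_null = two_tap_power_ratio(tone(500.0), DELAY)   # tone at the null
off_null = two_tap_power_ratio(tone(250.0), DELAY)  # tone an octave below
```

Only additions and one squaring per sample are needed for each detector, which is the cost advantage the paragraph describes.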

It is however only slightly discriminating: The average amplitude is still 50% at a frequency ratio of about 1.5x. And the detection-points are harmonic-sensitive: High frequencies have repetitious nulls at lower sub-harmonic frequencies---but various further techniques can reduce these: E.g. add back the upper harmonics to their sub-harmonics' pre-outputs; or include simple parameters that adjust the reflection in the delay line (the low-frequency detectors are at multiplied wavelengths of the higher, and so are multiply desensitized by their slight smearing---but such smearing must increase successively for lower frequencies); or filter using octaval filters (simply efficient with single-bit multiply: power-of-2 fast-shifting and single addition).
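Both weaknesses can be read off the steady-state response of a two-tap detector. The sketch below assumes the detector adds each sample to a copy delayed by half a period of its frequency f0, which passes a tone at f with relative amplitude |cos(pi*f/(2*f0))|. That model and the function names are assumptions of the example.

```python
import math


def two_tap_amplitude(f, f0):
    """Relative amplitude (0 to 1) that a detector tuned to f0 passes at f."""
    return abs(math.cos(math.pi * f / (2.0 * f0)))


def nulling_detectors(f, count=3):
    """Detector frequencies that a tone at f nulls: f, f/3, f/5, ..."""
    return [f / (2 * k + 1) for k in range(count)]


# Weak discrimination: still half amplitude a factor of 1.5 below the null.
half_point = two_tap_amplitude(500.0 / 1.5, 500.0)

# Harmonic sensitivity: a 1500 cps tone also nulls the 500 and 300 cps detectors.
shared_nulls = nulling_detectors(1500.0)
residuals = [two_tap_amplitude(1500.0, f0) for f0 in shared_nulls]
```

Under this model the nulls repeat at odd multiples of the detector frequency, so a high tone is indistinguishable from its odd sub-harmonics until one of the listed remedies is applied.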

Or, both distinctness and harmonic reduction can be handled by cumulating more "taps" along the sample line: About 16, with the signal over-sampled about 8x, keep the filtering efficient within 3% amplitude and 3% aurum frequency spacings.

And this is efficiently applicable to logarithmic scales where frequency (sub)steps are ordered by a (constant) ratio: For example, the audio discriminator receives 5% of its 3%-shy adjacent neighbor frequencies (a 95%-null) at half the ordinary music-pitched half-steps (half of 5.95%)---needing a dozen cycles to resolve it. Discrimination has significantly faster sensitivity than ordinary transform detectors, but does take a longer dwell to resolve against the noisy average total. Also, slightly smearing the reflection to reduce sub-harmonic detection smears the high-end frequency spectrum as well, reducing its nulling depth---requiring a longer detection dwell, but overall detecting a thick (sub)step of close frequencies (which ordinary transform discriminators overly detail at the high end).
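The 5% neighbor figure can be reproduced with a small bank of two-tap detectors on a constant-ratio scale. This is a sketch under stated assumptions: two taps only, half-period delays realized by linear interpolation between samples, a 44100 sample rate, and a bank from 220 to 880 cps in steps of 2^(1/24). None of these choices come from the text.

```python
import math

FS = 44100                                       # assumed sample rate
STEPS = [220.0 * 2 ** (j / 24) for j in range(49)]  # 3%-spaced (sub)steps


def tone(freq, count):
    """Unit-amplitude sine test tone sampled at FS."""
    return [math.sin(2.0 * math.pi * freq * k / FS) for k in range(count)]


def delayed(x, k, delay):
    """Value of x at fractional index k - delay, by linear interpolation."""
    pos = k - delay
    i = int(math.floor(pos))
    frac = pos - i
    return (1.0 - frac) * x[i] + frac * x[i + 1]


def null_ratio(x, f0):
    """Power of sample-plus-reflection, relative to 2 * (power of x)."""
    delay = FS / (2.0 * f0)          # half a period of f0, in samples
    summed = 0.0
    total = 0.0
    for k in range(int(math.ceil(delay)), len(x)):
        y = x[k] + delayed(x, k, delay)
        summed += y * y
        total += x[k] * x[k]
    return summed / (2.0 * total)


signal = tone(440.0, 4410)                       # 0.1 s, about 44 cycles
ratios = [null_ratio(signal, f0) for f0 in STEPS]
best = min(range(len(STEPS)), key=lambda j: ratios[j])

# Amplitude fraction the next detector up still receives from the 440 tone.
neighbor_leak = math.sqrt(ratios[best + 1] / 2.0)
```

The deepest null lands on the 440 cps detector, while the detector one 3% step above it still passes roughly 5% of the tone's amplitude, which is why a longer dwell is needed to separate them.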

Advanced:

Potentially more effective is the aurum, aur(spectr)al/um, analysis by QNL Quadrature Nulling Loop, similar to PLL Phase Locked Loop but using the quadrature nulling instead of edge-transition detection of the so-called phase-lock which served well in simpler (clean) digital bi-phase and even narrowband FM detection (technically, phase and FM, discrimination), but not effectually in compounded signal AM aural analysis.

The QNL method, made simplistic in digital, tracks a key band of frequencies (which can be a full key step of 1.06x rather than a half-step) and finds the best tracking, yielding that amplitude---it also tends to find the central frequency when compounded signals are presented at the input, thus resulting in proper "audio beating", replacing the common Fourier/Hilbert/Maclaurin/Laplace transforms' pure-spectral analyses, which significantly lack the temporal responsivity component and so misrepresent "beating" as modulation (typically at half the beat-frequency).

The QNL tends to be insensitive to integral harmonics and, if computed peak-to-peak cyclic, insensitive to subharmonics. Thus it is ideal for power-aurum processing. Further, whatever harmonics do get through because of squarish sampling (in the simplest implementation) are further reduced by statistical dither in the time base as the QNL tracks.
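What follows is only one possible digital reading of a quadrature-nulling tracker, offered as a sketch and not as the author's implementation. A local sine/cosine pair is mixed with the input, the averaged quadrature product is steered to zero by trimming the local frequency, and the averaged products then yield the amplitude. The function name, the loop gains `kp` and `ki`, the smoothing factor `alpha`, the 8000 sample rate and the smooth (not squarish) reference are all assumptions of the example.

```python
import math


def quadrature_nulling_track(x, fs, f_start, kp=20.0, ki=0.04, alpha=0.05):
    """Track one tone in x; return (frequency estimate, amplitude estimate)."""
    two_pi = 2.0 * math.pi
    phase = 0.0     # local reference phase, radians
    trim = 0.0      # accumulated frequency correction, cps
    i_avg = 0.0     # smoothed in-phase product
    q_avg = 0.0     # smoothed quadrature product: the quantity to null
    amp_sum = 0.0
    amp_count = 0
    tail = len(x) - len(x) // 4   # average amplitude over the last quarter
    for n, sample in enumerate(x):
        i_avg += alpha * (sample * math.sin(phase) - i_avg)
        q_avg += alpha * (sample * math.cos(phase) - q_avg)
        trim += ki * q_avg                     # slow correction
        freq = f_start + trim + kp * q_avg     # plus a fast correction
        phase = (phase + two_pi * freq / fs) % two_pi
        if n >= tail:
            amp_sum += 2.0 * math.hypot(i_avg, q_avg)
            amp_count += 1
    return f_start + trim, amp_sum / amp_count


FS = 8000
# Two seconds of a unit tone at 446 cps, inside the 440-466 key band.
signal = [math.sin(2.0 * math.pi * 446.0 * n / FS) for n in range(2 * FS)]
f_est, amp_est = quadrature_nulling_track(signal, FS, 440.0)
```

Started at the bottom of the key band, the loop is intended to pull its local frequency onto the input tone and report that tone's amplitude. The gains here were chosen by hand for this one test signal and would need retuning for other bands or sample rates.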

Preternatural considerations:

The actual sensitivity of a given key-filter over a single cycle is merely 6%, or 1/17 samples---close enough that adjacent key-filters might be used (to reduce the computation load)...alternate keys might sample on the quadrature (sine-cosine). But the advantage of QNL is that it tracks out the phase, whereas fixed time-base filters only supply initial information, which must be tracked at a second level, interpolating between first-level sine-cosine filters. [under construction]

A premise discovery under the title,

Grand-Admiral Petry
'Majestic Service in a Solar System'
Nuclear Emergency Management

© 2002 GrandAdmiralPetry@Lanthus.net