IEEE P1918.1.1 Haptic Codecs for the Tactile Internet Task Group

Proposal for Tactile Codec: TUM Vibrotactile Perceptual Codec based on DWT and SPIHT (TUM-VPC-DS)

DCN: HC_NGS_19-1-r0_Proposal_for_Tactile_Codec
Date: 2019-03-29

Abstract
This document describes a proposal for a tactile codec for the IEEE P1918.1.1 standardization activity in response to the respective Call for Contributions. The proposed codec uses a perceptual approach with a DWT and subsequent quantization. The quantizer is adaptive and driven by a psychohaptic model. After quantization, we further compress the coefficients with the SPIHT algorithm and generate the bitstream. The whole process is modular, hence the encoder can work with any psychohaptic model. This allows for future enhancements.

Subclause 5.2.1 of the IEEE-SA Standards Board Bylaws states: "While participating in IEEE standards development activities, all participants ... shall act in accordance with all applicable laws (nation-based and international), the IEEE Code of Ethics, and with IEEE Standards policies and procedures."

The contributor acknowledges and accepts that this contribution is subject to
- The IEEE Standards copyright policy as stated in the IEEE-SA Standards Board Bylaws, section 7, http://standards.ieee.org/develop/policies/bylaws/sect6-7.html#7, and the IEEE-SA Standards Board Operations Manual, section 6.1, http://standards.ieee.org/develop/policies/opman/sect6.html
- The IEEE Standards patent policy as stated in the IEEE-SA Standards Board Bylaws, section 6, http://standards.ieee.org/develop/policies/bylaws/sect6-7.html#6, and the IEEE-SA Standards Board Operations Manual, section 6.3, http://standards.ieee.org/develop/policies/opman/sect6.html

1 Technical Description

The proposed compression scheme involves several operations in the encoder, as depicted in Figure 1.

Figure 1: Encoding structure of the proposed tactile codec.

The input signal is split into blocks. Each block is first decomposed by a Discrete Wavelet Transform (DWT) using CDF 9/7 filters. At the same time, each block is transformed by a DFT. The obtained spectrum is passed on to a psychohaptic model that computes masking and perception thresholds for the corresponding block. The results of the psychohaptic model are used to adapt the quantization and to allocate bits to the different DWT bands. After quantizing the wavelet coefficients, we further compress them with an adaptation of the SPIHT algorithm from [1]. The compressed bitstream is then multiplexed with side information from the quantization and stored or transmitted. In the following, we explain all steps in more detail.

1.1 DWT

The DWT operates on the blocks by applying the CDF 9/7 filters. These filters are chosen because they have a symmetric impulse response, which implies linear phase. Therefore, we obtain the same number of wavelet coefficients per block as there are input samples. In addition, the CDF 9/7 filters are nearly orthogonal, so signal energies can be computed in the wavelet domain with acceptable accuracy.
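To make the block processing concrete, the following minimal Python sketch splits a signal into blocks and applies the level-7 DWT used in Section 2. It assumes the PyWavelets package, in which the 'bior4.4' wavelet is the usual stand-in for the CDF 9/7 filter pair; the function name blockwise_dwt and the zero-padding of the last block are illustrative choices and not part of the proposal.

    import numpy as np
    import pywt

    def blockwise_dwt(signal, block_len=512, level=7, wavelet="bior4.4"):
        """Split the input into blocks and return the DWT coefficients of each block."""
        n_blocks = int(np.ceil(len(signal) / block_len))
        padded = np.zeros(n_blocks * block_len)
        padded[:len(signal)] = signal  # zero-pad the last, possibly incomplete block
        # 'periodization' keeps the transform non-expansive: 512 samples in, 512 coefficients out.
        # PyWavelets may warn that level 7 exceeds its recommended depth for 512 samples;
        # the decomposition is still computed.
        return [pywt.wavedec(block, wavelet, mode="periodization", level=level)
                for block in padded.reshape(n_blocks, block_len)]

    # Example: a level-7 DWT of a 512-sample block yields band sizes 4, 4, 8, 16, 32, 64, 128, 256.
    coeffs = blockwise_dwt(np.random.randn(1024))
    print([band.shape for band in coeffs[0]])

The non-expansive 'periodization' mode matches the statement above that each block yields as many wavelet coefficients as it has input samples.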
1.2 Psychohaptic Model

The psychohaptic model plays a crucial role in the codec, as it adapts the quantizer such that distortion is introduced where it is least perceivable. We start by taking the FFT of an input block and representing its magnitude spectrum in dB. We then determine dominant peaks in the spectrum. It has been shown in [2] that masking phenomena are observed for tonal signals. We therefore assume that masking also occurs in more complex signals, which means that dominant peaks raise the perception thresholds around them. To model this, we use quadratic spreading functions that constitute masking thresholds for peaks at different frequencies. All masking thresholds are then combined with the absolute threshold of perception by power-additive combination. This yields the so-called global masking threshold. The process is illustrated in Figure 2.

Figure 2: Magnitude spectrum of an exemplary block (blue), computed masking thresholds (red), absolute threshold of perception (green), and the resulting global masking threshold (black).

After obtaining the global masking threshold, we compute the so-called Signal-to-Mask Ratio (SMR) for each DWT band, i.e., the energy of the spectrum in each band divided by the energy of the global masking threshold in that band. The SMR values are passed on to the quantizer together with the signal energy of each band.

1.3 Quantization

The quantizer is the core component of our codec. It allocates a certain bit budget to the different DWT bands according to the psychohaptic model, to reduce the rate considerably without introducing perceivable distortion.

To accomplish this, the quantizer takes into account the values provided by the psychohaptic model. In a loop, a total of n bits is allocated to the bands, starting with 0 bits allocated to every band. In every iteration we calculate the SNR of each band in dB, using the signal energy values passed over by the psychohaptic model and the noise energy introduced by the quantization. From this we calculate the so-called Mask-to-Noise Ratio (MNR), i.e., the SNR minus the SMR in dB. We then allocate one bit to the band with the lowest MNR and repeat until all n bits are allocated.

Since the bands will in general receive different numbers of quantizer bits, we design the quantizer itself as an embedded deadzone quantizer adapted from [2]. We first determine the maximum absolute wavelet coefficient of the current block, c_max. This value is quantized by a ceiling operation to a fixed-point number ĉ_max with 3 integer bits and 4 fraction bits. The 7 bits representing this maximum are passed on to the bitstream encoding as side information. The quantizer then uses the bits allocated to each band and this maximum to determine the quantization interval as

    Δ = ĉ_max / 2^b,

where b is the number of bits allocated to the particular band. A wavelet coefficient c is then quantized according to

    q = sgn(c) · ⌊|c| / Δ⌋ · Δ.

Thus, the wavelet coefficients are quantized to values within their original range. This formula also implies the addition of one sign bit. After all bits have been allocated and therefore all wavelet coefficients have been quantized, we scale the quantized wavelet coefficients to integers by dividing each band by its quantization interval, i.e., c_q = q / Δ = sgn(c) · ⌊|c| / Δ⌋. These quantized integer wavelet coefficients are passed on to the SPIHT algorithm.
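The following Python sketch illustrates the deadzone quantizer and the greedy MNR-driven bit allocation described above. It follows the formulas Δ = ĉ_max / 2^b and q = sgn(c)·⌊|c|/Δ⌋·Δ. For simplicity, the band energies are computed here directly from the wavelet coefficients, whereas the codec uses the energies and SMR values supplied by the psychohaptic model; all function names are illustrative.

    import numpy as np

    def fixed_point_max(bands):
        """Ceil the block maximum to 3 integer + 4 fraction bits (the 7-bit side information)."""
        c_max = max(np.max(np.abs(band)) for band in bands)
        return np.ceil(c_max * 16) / 16  # fixed-point step size 2^-4

    def deadzone_quantize(band, b, c_max_hat):
        """Quantize one band with b allocated bits: Delta = c_max_hat / 2^b."""
        delta = c_max_hat / 2 ** b
        idx = np.sign(band) * np.floor(np.abs(band) / delta)  # integer indices for SPIHT
        return idx * delta, idx.astype(int)  # reconstructed values and integer coefficients

    def allocate_bits(bands, smr_db, n_total, c_max_hat):
        """Greedily give one bit at a time to the band with the lowest Mask-to-Noise Ratio."""
        bits = np.zeros(len(bands), dtype=int)
        for _ in range(n_total):
            mnr = []
            for k, band in enumerate(bands):
                rec, _ = deadzone_quantize(band, bits[k], c_max_hat)
                noise = np.sum((band - rec) ** 2) + 1e-12
                snr_db = 10 * np.log10((np.sum(band ** 2) + 1e-12) / noise)
                mnr.append(snr_db - smr_db[k])  # MNR = SNR - SMR, both in dB
            bits[np.argmin(mnr)] += 1
        return bits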
1.4 SPIHT

To efficiently compress the quantized wavelet coefficients, we employ a 1D version of the Set Partitioning in Hierarchical Trees (SPIHT) algorithm proposed in [1]. SPIHT is a zero-tree-based coding method that achieves better performance than Embedded Zero-tree Wavelet (EZW) coding. It utilizes two types of zero trees and encodes the significant coefficients and zero trees in successive significance and refinement passes. The details of the algorithm for coding 2D wavelet coefficients are provided in [1] and exemplified in [3]. We adapt it to the quantized 1D wavelet coefficients by constructing the parent-child relationship in only one dimension. The output of the SPIHT module is a losslessly compressed bitstream of the quantized 1D wavelet coefficients.

1.5 Bitstream Encoding

For the decoder to be able to decompress the signals correctly, we need to pass some side information in the bitstream. We therefore add a header in front of every compressed block. This header consists of 32 bits for a block of 512 samples and codes the following information:
- 14 bits: Length of the following bitstream segment belonging to one block
- 2 bits: Coding of the block length, chosen from 64, 128, 256, and 512
- 6 bits: Integer coding the maximum number of bits allocated to the DWT bands
- 3 bits: Integer coding the level of the DWT
- 7 bits: Fixed-point number with 3 integer and 4 fraction bits coding the maximum wavelet coefficient value of the current block
For smaller block lengths, the length of the header can be reduced accordingly.
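As an illustration of the header layout, the Python sketch below packs the five fields into a single 32-bit integer in the order listed above and unpacks them again. The field order, the most-significant-bit-first packing, and the fixed-point convention (value × 16) are assumptions for illustration; the reference bitstream may arrange the bits differently.

    def pack_header(segment_len, block_len, max_band_bits, dwt_level, c_max_hat):
        """Pack the 14 + 2 + 6 + 3 + 7 = 32 header bits into one integer."""
        block_len_code = {64: 0, 128: 1, 256: 2, 512: 3}[block_len]
        c_max_fixed = int(round(c_max_hat * 16))  # 3 integer + 4 fraction bits
        header = segment_len                      # 14 bits
        header = (header << 2) | block_len_code   #  2 bits
        header = (header << 6) | max_band_bits    #  6 bits
        header = (header << 3) | dwt_level        #  3 bits
        header = (header << 7) | c_max_fixed      #  7 bits
        return header

    def unpack_header(header):
        """Recover the five fields from a 32-bit header."""
        c_max_hat = (header & 0x7F) / 16.0
        dwt_level = (header >> 7) & 0x7
        max_band_bits = (header >> 10) & 0x3F
        block_len = (64, 128, 256, 512)[(header >> 16) & 0x3]
        segment_len = (header >> 18) & 0x3FFF
        return segment_len, block_len, max_band_bits, dwt_level, c_max_hat

    # Example: a 512-sample block, level-7 DWT, at most 5 bits per band, c_max_hat = 1.3125
    h = pack_header(segment_len=1000, block_len=512, max_band_bits=5, dwt_level=7, c_max_hat=1.3125)
    assert unpack_header(h) == (1000, 512, 5, 7, 1.3125)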
1.6 Decoding

The decoder consists of four operations. First, the blocks are separated out of the bitstream, followed by the inverse SPIHT algorithm. Then we dequantize and apply an inverse DWT to obtain the reconstructed signal suitable for playback.

2 Performance Evaluation

We demonstrate the performance of our compression scheme by examining its rate-distortion behavior. We use the provided test data set consisting of 280 vibrotactile signals recorded with an accelerometer. The test data set contains signals of various materials recorded at different exploration speeds. We compress the signals using a block length of 512 samples and a DWT of level 7.

All signals are encoded and decoded, and the resulting output is compared to the original. We vary the bit budget of the quantizer between 8 and 112 bits to achieve different rates and therefore quality levels. We define the compression ratio (CR) as the ratio between the original rate and the compressed rate. Then we compute SNR and PSNR for all 280 test signals at different CR values (a sketch of these metric computations is given at the end of this section). The respective scatter plots for all three metrics, together with their averages, are given in the following plots: in blue the values for all test signals at different rates, and in red the average over all test signals. It is clearly visible that the quality decreases with increasing compression. At a CR of 10, we obtain an SNR of about 10 dB and a PSNR of about 52 dB.

Additionally, the results for different bit budgets are given in the following table. Here we also measured the required runtime per block of our MATLAB implementation. Especially for low rates, this time is low enough to allow for a real-time scenario. In that case we would have to choose a significantly smaller block length, since 512 samples already account for a delay of about 180 ms (the test signals are sampled at roughly 2.8 kHz). A block length of 64 samples would reduce the delay to about 23 ms at the cost of slightly worse compression performance.

Bit budget n | CR    | MSE           | SNR (dB) | PSNR (dB) | Runtime per block (ms)
8            | 54.65 | 1.51 × 10^-G  | 2.56     | 45.12     | 4.3
10           | 41.62 | 1.38 × 10^-G  | 3.20     | 45.75     | 4.2
12           | 32.58 | 1.23 × 10^-G  | 3.81     | 46.36     | 4.7
14           | 26.74 | 1.10 × 10^-G  | 4.44     | 47.00     | 5.4
16           | 22.24 | 9.61 × 10^-H  | 5.02     | 47.58     | 5.9
20           | 15.90 | 6.68 × 10^-H  | 6.24     | 48.80     | 6.9
24           | 11.53 | 4.19 × 10^-H  | 7.78     | 50.34     | 8.5
28           | 8.73  | 2.46 × 10^-H  | 9.56     | 52.11     | 9.7
32           | 6.90  | 1.31 × 10^-H  | 11.50    | 54.06     | 11.2
40           | 4.98  | 4.00 × 10^-I  | 15.12    | 57.67     | 12.9
48           | 3.69  | 1.17 × 10^-I  | 19.22    | 61.78     | 15.0
56           | 2.77  | 3.26 × 10^-J  | 24.77    | 67.33     | 17.1
64           | 2.29  | 8.32 × 10^-K  | 30.65    | 73.20     | 18.7
80           | 1.78  | 5.93 × 10^-O  | 42.55    | 85.11     | 20.6
96           | 1.47  | 9.28 × 10^-P  | 54.41    | 96.97     | 23.4
112          | 1.26  | 6.21 × 10^-P  | 66.38    | 108.94    | 26.0
128          | 1.10  | 6.03 × 10^-P  | 78.26    | 120.81    | 28.4

To assess the behavior of our algorithm in more detail, we examine individual signals in terms of their PSNR-over-CR performance. The resulting plots are given in the following figures for eight example signals:
- Direct_-_1spike_Probe_-_cork_-_slower.mat (Signal #20)
- Direct_-_3x1spike_Probe_-_antiVibPad_-_fast.mat (Signal #84)
- Direct_-_3x1spike_Probe_-_polyesterPad_-_slow.mat (Signal #107)
- Direct_-_3x3small-round_Probe_-_felt_-_fast.mat (Signal #175)
- Direct_-_big-round_Probe_-_foam_-_fast.mat (Signal #255)
- Direct_-_big-round_Probe_-_foam_-_medium.mat (Signal #256)
- Direct_-_big-round_Probe_-_foam_-_tooSlow.mat (Signal #258)
- Direct_-_finger_Probe_-_foam_-_slower.mat (Signal #274)

We see that the quality decreases with increasing compression ratio for all signals.

Lastly, we exemplify the behavior of our method with respect to the signal shape. This helps to build further intuition about how perceivable the introduced distortions are. We take the first of the eight example signals (Direct_-_1spike_Probe_-_cork_-_slower.mat) and plot its first 200 samples together with the reconstructed signals for n = 8, 16, 32, 64. The results are given in Figure 3.

Figure 3: First 200 samples of the signal Direct_-_1spike_Probe_-_cork_-_slower.mat for various levels of compression determined by the bit budget n.

We can see that the general structure of the signal is preserved even for very high levels of compression (n = 8 corresponds here to CR ≈ 62). At n = 64 the two signals are so close that we assume no distortions should be perceivable. To assess the codec properly in terms of its transparency, we need to conduct extensive experiments and develop new metrics based on human haptic perception.
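For completeness, the following Python sketch shows how the objective metrics reported above (CR, SNR, PSNR) can be computed for a single test signal. The peak used for PSNR (the maximum absolute value of the original signal) and the 16 bit/sample original rate are assumptions for illustration; the document does not specify either.

    import numpy as np

    def rd_metrics(original, reconstructed, compressed_bits, bits_per_sample=16):
        """Compression ratio, SNR, and PSNR for one test signal (peak and bit depth assumed)."""
        mse = np.mean((original - reconstructed) ** 2)
        snr = 10 * np.log10(np.mean(original ** 2) / mse)
        psnr = 10 * np.log10(np.max(np.abs(original)) ** 2 / mse)
        cr = (len(original) * bits_per_sample) / compressed_bits  # original rate / compressed rate
        return cr, snr, psnr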
3 Conclusion

We have presented a novel method to compress and encode 1D tactile signals. The rate-distortion performance is good, and the algorithm allows for offline and online encoding with an appropriate choice of block length. The transparency of the codec should be evaluated in subjective experiments and with newly developed perceptual metrics. The presented codec works with any choice of perceptual model, which readily allows for future enhancements as better psychohaptic models are developed. In addition, it can fairly easily be extended to higher-dimensional signals to allow for more points of interaction.

4 References

[1] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.
[2] R. Chaudhari, C. Schuwerk, M. Danaei and E. Steinbach, "Perceptual and Bitrate-Scalable Coding of Haptic Surface Texture Signals," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 3, pp. 462-473, November 2014.
[3] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer Academic, 2002.

Annex A: Information form for the submission of contributions

Name of Contribution: TUM Vibrotactile Perceptual Codec based on DWT and SPIHT (TUM-VPC-DS)

Authors and Affiliation: Andreas Noll, Basak Gülecyüz, Eckehard Steinbach; Chair of Media Technology, Technical University of Munich

Addressed Requirements and Test Conditions (see Section 4.2.1): Test condition 1: test data traces

Summary of Proposal: The proposed codec uses a perceptual approach with a DWT and subsequent quantization. The quantizer is adaptive and driven by a psychohaptic model. After quantization, we further compress the coefficients with the SPIHT algorithm and generate the bitstream. The whole process is modular, hence the encoder can work with any psychohaptic model. This allows for future enhancements.

Comments on Relevance to CfC: Fully in line with the CfC.