• 検索結果がありません。

東北大学機関リポジトリTOUR

N/A
N/A
Protected

Academic year: 2021

シェア "東北大学機関リポジトリTOUR"

Copied!
7
0
0

読み込み中.... (全文を見る)

全文

(1)

A Systolic Memory Architecture for Fast

Codebook Design based on MMPDCL Algorithm

Kentaro Sano

Chiaki Takagi

Ryusuke Egawa

Kenichi Suzuki

Tadao Nakamura

Graduate School of Information Sciences, Tohoku University

E-mail:

{kentah,chiaki,egawa,suzuki,nakamura}@archi.is.tohoku.ac.jp

Abstract

Vector quantization with an adaptive codebook is at-tractive for lossy data compression. During the last few decades, architectures have been proposed to accelerate adaptive codebook design that requires a huge amount of computation. However, they are mainly based on Koho-nen competitive learning algorithm or LBG algorithm that have an essential problem, the under-utilization problem. This paper presents a systolic memory architecture for high-speed codebook design based on MMPDCL algorithm not suffering from the under-utilization problem. We modify MMPDCL algorithm to exploit parallelism and implement with simple hardware. Simulation results demonstrated that the modified MMPDCL algorithm can give codebooks with comparable MSEs to the original MMPDCL algorithm. Keywords – systolic memory architecture, vector quantiza-tion, codebook design, MMPDCL algorithm

1. Introduction

Lossy data compression of images[3][4], movies, and volumes[8]has recently been a key technology for such data to be efficiently stored in media and/or transmitted through networks. Vector quantization(VQ)[2] is one of the attrac-tive techniques for lossy data compression, which can the-oretically overcome scalar quantization(SQ). According to Shannon’s rate distortion theory, VQ can always provide a better compression performance than any other coding scheme using SQ. By binding scalars into a vector, VQ can exploit a larger amount of coherence in data than SQ.

A vector quantizer maps input vectors into one repre-sentative codeword in a codebook. This procedure causes distortion, which is an error of quantization. Large distor-tion seriously degrades the quantizadistor-tion quality. Since the distortion is usually given by the distance between an input vector and its corresponding codeword, there exists an opti-mal codebook that presents a minimized average distortion. An optimal codebook is utilized by adaptive VQ. Adap-tive VQ has two stages, a codebook design stage and an

en-†Corresponding author.

coding stage. In the codebook design stage, a codebook is generated adaptively for a given set of input vectors. Then the encoding stage performs vector quantization with this designed codebook. Adaptive VQ is important for many applications in terms of quantization errors, however, the huge amount of computation required by codebook design has limited practical use of adaptive VQ.

So far, various architectures have been proposed to accel-erate the encoding stage of VQ[5][9][11][12][14]. Some of them are based on a systolic array architecture to highly ex-ploit parallelism of encoding[5][11][14]. They parallelize iterations of nearest neighbor search within a codebook. Codebook design also involves the nearest neighbor search of the encoding stage, and hence the systolic architecture can be applicable to codebook design.

Panchanathan et al. proposed a systolic memory array architecture for adaptive VQ[10]. This architecture enabled both the codebook design stage and the encoding stage upon the same hardware without any overhead of updating a codebook. However, it is based on LBG algorithm that sometimes suffers from locally optimum codebooks due to the under-utilization problem[13], like Kohonen com-petitive learning algorithm. The under-utilization problem gives codewords that are not efficiently used for quantiza-tion, thus resulting in an increased quantization error.

Zhu et al. proposed MMPDCL (mini-max par-tial distortion competitive learning) algorithm to effi-ciently design an optimal codebook without the under-utilization problem[15]. In addition to the two known conditions, Voronoi partition and Generalized centroid assignment[1][2][7], this algorithm relies on the minimax partial distortion criterion to avoid the under-utilization problem. A partial distortion is introduced into a distor-tion between an input vector and a codeword, and the near-est neighbor search is performed so that partial distortions of codewords are equalized. We focus on MMPDCL algo-rithm as a base algoalgo-rithm for codebook design in terms of designed codebook quality.

This paper proposes a systolic memory architecture for codebook design based on MMPDCL algorithm to achieve high-speed and optimal codebook design simultaneously. We modify MMPDCL algorithm to exploit as much par-allelism as possible with simple hardware. The proposed

(2)

architecture consists of an array of codeword cells, partial distortion calculation units, and comparators. The organiza-tion locally interconnecting these components is expected to match the deep-submicron CMOS technology generations with high integration and much influences of wire delay.

We give a concept proof of our approach by software simulation and prototype design. We will demonstrate that the modified MMPDCL algorithm can give codebooks with comparable quantization errors to the original algorithm, despite of algorithmic modifications. We design a proto-type codeword cell to obtain its clock cycle time. Through these results, we estimate the performance of the proposed architecture for adaptive VQ.

This paper is organized as follows. Section 2 briefly reviews vector quantization and codebook design by us-ing MMPDCL algorithm. Section 3 describes our systolic memory architecture and modifications of MMPDCL al-gorithm for parallelism exploitation. Section 4 discusses potential performance of this architecture through software simulation and prototype design. Finally, Section 5 gives conclusions and future work.

2. Codebook Design based on MMPDCL

Algo-rithm

2.1. Codebook design for VQ

A vector quantizer maps vectors in the k-dimensional

Euclidean spaceRk into one of the finite number of rep-resentative vectors in Rk. Each representative vector is referred to as a codeword, and a set of codewords is re-ferred to as a codebook. Vector quantizer is specified by a codebookY = {y1, y2, ..., yN} and its associated

par-titionS = {S1, S2, ..., SN} that divides the input vector

space intoN disjoint subspaces. Given a set of input

vec-tors, it is quantized by replacing all input vectors inSiwith yi. This replacement is called reproduction, and expressed

as

Q(x) = yi if x ∈ Si for i = 1, 2, ..., N (1)

wherex = {x1, x2, ..., xk} ∈ Rkis ak-dimensional input

vector. Distortiond(x, y) caused by reproducing x with y

is commonly the Euclidean norm given by

d(x, y) = x − y. (2)

Letp(x) be probability distribution of x. The mean squared

error (MSE) between input vectors and their corresponding codewords is defined as M SE = N  i=1  Si d2(x, yi)p(x)dx = N  i=1 Di, (3) whereDi is the contribution ofyi to MSE, called partial

distortion.

To obtain an optimal vector quantizer, MSE should be minimized with respect to partitionS and codewords Y .

The partial distortion theorem[2] denotes that each partial distortion makes an equal contribution to the minimized MSE with sufficiently large N . Usually we have to solve

this minimization problem by a learning procedure with a given set of input vectors, because no analytical solution is generally available for most of applications wherep(x) is

generally unknown.

2.2. Kohonen competitive learning

Kohonen competitive learning algorithm is one of the representative neural vector quantizers based on a

winner-takes-all rule[6]. The competitive learning process of

Koho-nen algorithm solves the minimization problems as follows. Prior to learning, codebookY is initialized with some dis-tribution of vectors. Then the following learning steps are repeated. For input vectorxk, such codewordyi∗ is found

as a winner thatxk ∈ Si∗ whereSi∗ is the Voronoi region

ofyi. Codewordyihaving minimumd(xk, yi) satisfies

this condition. The process to find a winner is called

com-petition. Next, winneryi is updated based on:

yi := yi + α(xk− yi∗) (4)

whereα is a learning rate, which is a positive constant for

codebook adaptability to inputs.

As these learning steps proceed, codewords are approx-imating the probability distribution of input vectors. How-ever, the Kohonen learning has an inherent problem, i.e., the

under-utilization problem. Inadequate distribution of initial

codewords gives several codewords extremely distant from any inputs, which cannot be utilized for quantization. Such codewords have no opportunity to be moved by input vec-tors. As a result, distortion is not reduced due to the exis-tence of under/never-utilized codewords.

2.3. MMPDCL algorithm

In order to avoid the local optimality such as the under-utilization problem, MMPDCL algorithm[15] was proposed based on the equidistortion principle[13], which is to even out all of partial distortions. Codebook design based on this principle can avoid the local optimality, and thereby give a codebook with a smaller MSE than Kohonen algorithm.

For the equidistortion principle, competition of MM-PDCL algorithm is performed according to the minimax

partial distortion criterion that minimizes the maximum

partial distortion among codewords. Instead of a squared Euclidean norm, MMPDCL algorithm uses the following distortion biased with an online partial distortion:

d2M(x, yi) = pdi(t − 1)e−m/T + x − yi2 (5)

wheret denotes a time that is incremented every input, and m is the sweep index. Term pdi(t) is an online partial

dis-tortion at time t. A sweep is one full learning sequence

for the whole set of input vectors. T (> 0) is a constant to

(3)

exponential term, called an attenuation term, attenuates the effects of partial distortions as learning proceeds.

By using this biased distortion,yi is selected as a

ner in the similar way to Kohonen algorithm. Then the win-ner is updated based on:

yi := yi∗+ αi∗(t)(xk− yi∗). (6)

Learning rateαi∗(t) is 1/ui∗(t) where ui∗(t) is the number

of times that codewordyi has won until timet. Actually,

countersui∗(t) are reset to a gradually increasing number,

2 + m/10, when sweep index m equals 2I whereI

de-notes an integer. After the winner is moved, its online par-tial distortion is updated as:

pdi∗(t) = pdi(t − 1) + xk− yi∗2. (7)

Thanks to the online partial distortion term of Eq.(5), MM-PDCL algorithm prevents the codeword with the maximum partial distortion from winning competition, and indirectly reduces the variance among partial distortions by minimiz-ing the maximum partial distortion. Consequently, equidis-tortion is achieved. Note that the online partial disequidis-tortion term of Eq.(5) is attenuated as the learning steps proceed. This means that equidistortion is activated only in the early sweeps to escape from the local optimality.

This paper presents a dedicated processor to codebook design based on MMPDCL algorithm. To achieve high-performance codebook design, we exploit as much paral-lelism as possible. Within MMPDCL algorithm, compe-tition has to be performed after updating a codebook ac-cording to the result of the previous competition, and there-fore competitions for successive sweeps cannot originally be carried out in parallel. However, if we release this rig-orousness, we can exploit parallelism of competitions be-tween successive sweeps.

On the other hand, we have to consider limited hardware resources. Naive implementation of MMPDCL algorithm is likely to require complicated and large-scale circuits for online partial distortions and variable learning-rates. Since it is preferred to implement necessary functions by using as small resources as possible, we have to modify the algo-rithm for simple and smart implementation.

3. Systolic Memory Architecture for

MM-PDCL Algorithm

3.1. Modified MMPDCL algorithm

Based on the considerations described in the previous section, we modify MMPDCL algorithm. To exploit paral-lelism among sweeps, we alleviate the rigorousness of code-book update by introducing deferred update. Deferred up-date allows competition to be performed before a codebook is completely updated for the previous competition. Con-sequently, a part of competition computation can be carried out with a codebook that is not completely updated, while

0 1 00 200 Sweeps 10–4 10–3 10–2 10–1 100 Attenuation factor exponential (original) bit–shift (descrete impl.)

Figure 1. Exponential attenuation and discrete approximation by bit-shift.

the previous competition is performed simultaneously. We describe the details of deferred update in Section 3.4.

Next, we simplify the complicated learning-rates and the exponential attenuation of partial distortions for simple hardware implementation. As a learning rate, we simply use a constant that is empirically determined. For an atten-uation term, we use bit-shifting instead of an exponential function. We assume that fixed-point numbers are used for hardware implementation. Thex-bit right shift of a

fixed-point number is equal to multiplication with0.5x. Instead ofpdi(t−1)e−m/T in Eq.(5), we shift a fixed-point number ofpdi(t − 1) to right by x-bits, and increase x by one every i sweeps to approximate e−m/T, wherei is 1/16 of the

to-tal sweeps. Note thatT is a constant value which is 10% of

the total sweeps. Fig.1 shows the curves ofe−m/T and the bit-shifting attenuation for the total sweeps of 256. Section 4 will show that comparative MSEs are obtained with these modifications.

3.2. Systolic memory architecture

In order to exploit parallelism of competition, we adopt a systolic memory architecture that has the following advan-tages.

1. Scalable pipelining with processing elements 2. Memories localized to processing elements 3. No global structure

First, a systolic array of processing elements processes input data in parallel while they pass through the ar-ray. Second, memories localized to processing elements, called a functional memory, address the linchpin of high-performance processing; bandwidth of data supply. Pro-cessing elements closely coupled with memory cells can exploit wide bandwidth for data access, and essentially match exploiting data parallelism of competitive learning. The conventional architecture having separated memory and functional units suffers from narrow bandwidth, and therefore the performance improvement is limited. A

(4)

sys-cycle

y

11

y

12

y

13

y

14

y

21

y

22

y

23

y

24

y

31

y

32

y

33

y

34

y

41

y

42

y

43

y

44

x

1l

x

12

x

13

x

14

x

21

x

22

x

23

x

24

x

31

x

32

x

33

x

41

x

42

x

51 input vector

x

1 codeword

y

1

codebook memory of systolic array

d2 d2 d2 d2

delay

comp comp comp comp

pd1calc

unit pd2unitcalc pd3unitcalc pd4unitcalc

d

2 & winner ID

winner ID for update control

d2 & cw ID d2 d M2 dM 2 codeword cell

Figure 2. Overview of systolic memory architecture for MMPDCL algorithm (4 × 4 codeword cells).

tolic array structure also mitigates a bottleneck of data sup-ply from external memories by enabling re-use of input data passing through an array. Third, the structure with-out global resources is not influenced by wire delay that is expected to prevent performance improvement in deep sub-micron CMOS technology generations. According to these advantages, the systolic memory architecture has a potential to achieve high-performance codebook design.

3.3. MMPDCL processor overview

We propose an MMPDCL processor based on the sys-tolic memory architecture. Fig.2 shows an overview of this processor. The MMPDCL processor consists of three com-ponents; a codebook memory, partial distortion (pd) calcu-lation units and comparators.

The codebook memory computes distortionsd2between codewords and input vectors. The codebook memory is a systolic array of functional memory cells, called codeword

cells. Each codeword cell stores an element of a codeword,

and each column of the array corresponds to a codeword. Letm denote dimension of a codeword. Each column

con-tainsm codeword cells. The number of columns is equal to

the number of codewords, i.e., the size of a codebook. The MMPDCL processor in Fig.2 is an example with four 4-dimensional codewords. The elements of input vectors are input to the array in a skewed fashion as depicted in Fig.2. After a certain cycles delay, each column of the array out-puts a distortion between each codeword and an input vector every clock cycle.

Each column of the systolic array is connected to apd

calculation unit. This unit stores an online partial distortion of each codeword. The online partial distortion is added

to the distortion output from the corresponding column af-ter it is attenuated as shown in Eq.(5). Then the original distortiond2and the biased distortion d2M are output to a

comparator.

Comparators forming a 1-dimensional array search the ID of the codeword with minimumd2M. Each comparator

comparesd2M of its column with an output of the previous comparator. Ifd2M of the column is smaller, its ID,d2and d2M of the column are output to the next comparator. If

not, those of the previous comparator are output to the next. For the first comparator, the maximum value expressed in a given numeric format is used as a value ofd2M passed from

a previous comparator. Thus, the final comparator outputs the ID andd2of the winner.

The winner ID is returned to the codebook memory to control the deferred update of the winner based on Eq.(4). The winner ID and d2 are also used to update the online partial distortion of the winner according to Eq.(7). The se-ries of these operations are successively performed to input vectors in a pipelined fashion, and thereby learning for an input vector completes every clock cycle.

3.4. Systolic array of codeword cells

A codeword cell has three inputs: an element of an input vector, its delayed propagation and a cumulative distortion. The input vector’s element and its delayed propagation are input from the left cell. The cumulative distortion is input from the upper cell. A cell computes a squared difference between the input vector’s element and a codeword’s ele-ment stored in the cell. The squared difference is added to the cumulative distortion from the upper cell, and then output to the lower cell. Simultaneously, the input vector’s element and its delayed propagation themselves are sent to

(5)

x

kl SUB MUL

y

ij ADD MUL ADD

α

update control

x

(k-d)l SUB

d

ij2

Figure 3. Codeword cell.

d

2 MUL ADD

pd

i n-bit R-SHIFT shamt ADD ADD n

(

1

)

2 d2 of winner 1 28 22 di2 dM2i di2 22

Figure 4. Pd calculation unit.

IDi 28 22 di2 dMi2 MUX 1 0 MUX 1 0 MUX 1 0 a < b b a 22 28 1 IDi+1 di+21 dMi+2 1 28 22 IDi-1 d2 i-1 Mi-1 d2 Figure 5. Comparator.

the right cell. We will mention the delayed propagation of the input vector’s element later.

Operations of all the cells are synchronized. By inputing elements of input vectors in a skewed fashion as depicted in Fig.2, the lowest cell of each column outputs a distortion between the corresponding codeword and an input vector every clock cycle. For example, celly14of Fig.2 outputs the

distortion between codewordy1 and input vectorx1 four clock cycles after the first element ofx1is input to celly11. In the next cycle, cellsy14 andy24 outputd(x2, y1) and

d(x1, y2), respectively.

A block diagram of codeword cellyijis shown in Fig.3.

The cell has four registers for a codeword element(yij),

a cumulative distortion(d2ij), an input vector element(xkl),

and a delayed input vector element(x(k−d)l). There exist

two paths for squared difference computation and deferred update. The right path fromyijandxkltod2ijis for squared

difference computation and its addition to the cumulative distortion. The left path includingyijandx(k−d)lis for

de-ferred update. Here,α is a constant learning rate. Note that

the multiplication withα can be implemented as bit-shifting

ifα is a constant of 2−Iwhere I is an integer.

The update control in Fig.3 controls writing registeryij

for codeword update based on Eq.(4). As mentioned in the previous section, a winner ID is returned to the most upper-left cell, and propagated to the successive cells in row and column directions. When the update control circuit receives a winner ID that matches the codeword ID of the cell, reg-isteryijis written withyij+ α(x(k−d)l− yij).

Letn denote the number of columns. This update is

car-ried out for an input vector(n + m + 1) cycles after the first element of the input vector comes to the column of the up-dated codeword. This is the deferred update of the modified MMPCDL algorithm. Until deferred update begins, succes-sive(n + m) input vectors are input to the array and pro-cessed. This means that cells concurrently perform deferred update for thei-th input vector and squared difference

com-putation of the(i + n + m + 1)-th input vector. In Fig.2, the beginning of the delayed propagation line of an input vector element has a shift register withd = (n + m + 1) entries

Figure 6. Lena test image.

to delay the input forx(k−d)l. We will discuss the influence

of deferred update upon an MSE of a designed codebook in Section 4.

3.5. Partial distortion calculation units

Thepd calculation unit computes a biased distortion by

adding a distortion and an online partial distortion stored in this unit. Fig.4 shows a block diagram of the pd

calcula-tion unit. The unit has three registers for an online partial distortion(pdi), a shift amount(shamt), and a distortion of

a winner from the corresponding input vector(d2). A distor-tion between a codeword and an input vector is input asd2i.

Then biased distortiond2M is computed by addingd2i topdi

afterpdiis attenuated by bir-shifting. The shift amount is a counter to approximate the exponential attenuation term in Eq.(5) with bit-shifting ofpdi in a fixed-point format. Fi-nally,d2M andd2i are output to a comparator.

The distortion of the winner, d2, is used to update its online partial distortion based on Eq.(7). The winner ID andd2are sent from the end of comparators to the most left

pd calculation unit, and propagate through the units. When

(6)

0 100 200 Sweeps 0 500 1000 1500 2000 MS E

variable learning rate learning rate = 0.5 learning rate = 0.25 learning rate = 0.125 learning rate = 0.0625 0 100 200 Sweeps 0 500 1000 1500 2000 MS E MMPDCL Modified MMPDCL Kohonen

Figure 7. MSEs by different learning rates. Figure 8. Sweeps vs. MSE. as the unit’s ID,pdiis updated by usingd2. Sinced2is the

distortion of the winner that is not moved yet, it should be multiplied by(1 − α)2.

3.6. Comparators

Comparators search the winner with minimumd2M. Each

comparator is connected to the next one via pipeline regis-ters. Fig.5 shows a block diagram of the comparator. The comparator takes six inputs. They are a codeword ID, an original distortion and a biased distortion from the corre-sponding pd calculation unit, and those from the previous

comparator. To achieve the function described in Section 3.3, the comparator has three multiplexors controlled by a comparison result ofd2M and that from the previous

com-parator.

4. Discussions

This section discusses MSEs of codebooks designed by the MMPDCL processor and estimates its computational performance.

4.1. Software simulation

The modified MMPDCL algorithm implemented on the processor may introduce some negative influences upon MSEs of designed codebooks because of the deferred up-date, fixed-point computation, attenuation approximated by bit-shifting and a constant learning rate. We evaluate their influences by comparing results of simulations of the MM-PDCL processor.

We simulated the MMPDCL processor for image com-pression based on VQ. A set of input vectors was given by dividing a test image, and used to design a codebook. MSEs of quantization using the codebook were measured, and an average of ten results was evaluated. Fig.6 shows a test im-age, Lena, which is a 8-bit gray scale image with a reso-lution of256 × 256 pixels. This image was divided into

42-pixel blocks forming 4096 16-dimensional input

vec-tors. We used a codebook with 512 codewords to give compression rate of about 0.2 (original image;64KB, code-book;8KB, block indices for compressed image;4.5KB).

Fig.7 shows difference in MSEs for various learning rate,

α, of MMPDCL algorithm. Although decreasing speed of

MSEs is different, constant learning rates less than 0.5 give comparable MSEs to the original learning rate. Accord-ingly, the MMPDCL processor implementation with an em-pirically selected constant learning rate is a feasible choice if we calculate enough sweeps for applications. For the fol-lowing simulations, we used a learning rate of 0.125 giving the smallest MSE, which can be implemented as 3-bit right shifting.

Fig.8 shows MSE difference among codebook design al-gorithms. We designed codebooks for image compression of Lena, by using the following algorithms.

1. Kohonen algorithm 2. MMPDCL algorithm

3. Modified MMPDCL algorithm

Algorithms 1 and 2 are original algorithms themselves where all computations are done in a floating-point format. For the original Kohonen algorithm, we used a constant learning rate of 0.125. Algorithm 3 is the modified MM-PDCL algorithm including fixed-point calculation, deferred update, a constant learning rate, and attenuation factor ap-proximated by bit-shifting. Algorithm 3 corresponds to the MMPDCL processor. Considering fixed-point calculation with a sufficient precision for image compression, we made choices of 8 bits-integer and 4 bits-decimal, 18 bits-integer and 4 bits-decimal, and 24 bits-integer and 4 bits-decimal formats for codeword elements,d2, andd2M, respectively.

The result in Fig.8 shows that MMPDCL algorithms are definitely superior to the Kohonen algorithm in terms of MSEs of the final codebook. The MMPDCL algorithm with deferred update (Algorithm 3) has slower convergence than the original one. This is because deferred update causes wrong winner selection. However, their final MSEs are

(7)

comparable to that of the original MMPDCL algorithm, though this may depend on property of input vectors. We obtained similar results for other test images. As a result of these discussions, we conclude that the modified MM-PDCL algorithm for the processor can design codebooks with MSEs superior to Kohonen algorithm and comparable to the original MMPDCL algorithm.

4.2. Performance estimation

We have designed a codeword cell to estimate perfor-mance of a codebook memory. The Synopsys design com-pileroutput the critical path of this codeword cell. Under the ROHM 0.35µm CMOS technology, the critical path is

the right data path for the squared difference calculation of Fig.3, and its maximum delay is 15.64 nsec. This means that codeword cells can operate at 60MHz. The other units seem to be able to operate at more than 60MHz because their data paths are simpler than the critical path of a code-word cell. If this is true and we utilize16 × 512(= 8192) cells for 512 16-dimension codewords, 3.072 × 1010( 512 × 60 × 106) distance calculations are available in one

second under a steady pipelining.

Let’s suppose that42-pixel blocks of an image with2562 pixels are quantized with 512 codewords. The number of in-put vectors in one image is 8192. If we assume 256 sweeps to obtain the final codebook, the total number of distortion calculations is512×8192×256  1.073×109. Therefore, a frame rate is 3.072×101.073×10109  28.6. This processing speed can

be regarded as almost real-time codebook design. Data-path optimization of a codeword cell is our future work.

5. Conclusions

In this paper, we have proposed the MMPDCL proces-sor based on a systolic memory architecture to achieve high-speed and optimal codebook design. MMPDCL al-gorithm was modified to exploit parallelism and simplify hardware implementation. The software simulations have demonstrated that the modified MMPDCL algorithm can design codebooks with comparable MSEs to the original MMPDCL algorithm. The prototype design of a codeword cell resulted in possibility of operating at 60MHz. The es-timated codebook design performance was 28.6 frames per second for compression of 2562-pixel images, which can be regarded as real-time processing. Note that the MM-PDCL processor is completely scalable to the number of codewords due to the fully pipelined structure without cen-tralized resources.

Our future work is as follows. 1) Detailed evaluation with various types of input data, 2) Design ofpd calculation

units, comparators, and mechanisms for input vector supply, 3) Folding of the systolic memory array to give flexibility to the different number of codewords, and 4) Prototyping with FPGA or ASIC.

This work is supported by VLSI Design and Education

Cen-ter(VDEC), the University of Tokyo in collaboration with Synopsys, Inc.

Acknowledgments

This research was partially supported by Grant-in-Aid for Young Scientists(B) KAKENHI(#13780183) and Grant-in-Aid for Scientific Research(B) KAKENHI(#14380131).

References

[1] A.Gersho. Asymptotically optimal block quantization.

IEEE Transactions on Information Theory, 25(4):373–380, 1980.

[2] A.Gersho and R.M.Gray. Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992. [3] B.Ramamurthi and A.Gersho. Classified vector quantization

of images. IEEE Transactions on Communications, COM-34(11):1105–1115, November 1986.

[4] C.Amerijckx, M.Verleysen, P.Thissen, and J.D.Legat. Image compression by self-organized kohonen map. IEEE Trans-actions on Neural Networks, 9(3):503–507, May 1998. [5] G. A. Davidson, P. R. Cappello, and A. Gersho. Systolic

architectures for vector quantization. IEEE Transactions

on Acoustics, Speech, and Signal Processing, 36(10):1651– 1664, October 1988.

[6] T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, Berlin Heidelberg, 1989.

[7] Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1):84–95, January 1980.

[8] P. Ning and L. Hesselink. Vector quantization for volume rendering. Proceedings of 1992 Workshop on Volume Visu-alization, pages 69–74, October 1992.

[9] T. Nozawa, M. Konda, M. Fujibayashi, M. Imai, K. Kotani, and S. S. T. Ohmi. A parallel vector-quantization proces-sor eliminating redundant calculation for real-time motion picture compression. IEEE Journal of Solid-State Circuits, 35(11):1744–1751, November 2000.

[10] S. Panchanathan and M. Goldberg. A systolic array archi-tecture for image coding using adaptive vector quantization. IEEE Transactions on Circuits and Systems for Video Tech-nology, 1(2):222–229, June 1991.

[11] P. A. Ramamoorthy, B. Potu, and T. Tran. Bit-serial vlsi im-plementation of vector quantizer for real-time image coding. IEEE Transactions on Circuits and Systems, 36(10):1281– 1290, October 1989.

[12] K. Tsang and B. W. Y. Wei. A vlsi architecture for a real-time code book generator and encoder of a vector quantizer. IEEE Transactions on Very Large Scale Integration(VLSI) Systems, 2(3):360–364, September 1994.

[13] N. Ueda and R. Nakano. A new competitive learning ap-proach based on an equidistortion principle for designing optimal vector quantizers. Elsevier Journal on Neural Net-works, 7(8):1211–1227, 1994.

[14] C.-L. Wang and K.-M. Chen. A new vlsi architectures for full-search vector quantization. IEEE Transactions on Cir-cuits and Systems for Video Technology, 6(4):389–398, Au-gust 1996.

[15] C. Zhu and L.-M. Po. Minimax partial distortion competitive learning for optimal codebook design. IEEE Transactions on Image Processing, 7(10):1400–1409, 1998.

Figure 1. Exponential attenuation and discrete approximation by bit-shift.
Figure 2. Overview of systolic memory architecture for MMPDCL algorithm ( 4 × 4 codeword cells).
Figure 4. Pd calculation unit.
Figure 7. MSEs by different learning rates. Figure 8. Sweeps vs. MSE.

参照

関連したドキュメント

We consider a parametric Neumann problem driven by a nonlinear nonhomogeneous differential operator plus an indefinite potential term.. The reaction term is superlinear but does

This paper deals with the a design of an LPV controller with one scheduling parameter based on a simple nonlinear MR damper model, b design of a free-model controller based on

Oscillatory Integrals, Weighted and Mixed Norm Inequalities, Global Smoothing and Decay, Time-dependent Schr¨ odinger Equation, Bessel functions, Weighted inter- polation

In the study of dynamic equations on time scales we deal with certain dynamic inequalities which provide explicit bounds on the unknown functions and their derivatives.. Most of

In this paper, we we have illustrated how the modified recursive schemes 2.15 and 2.27 can be used to solve a class of doubly singular two-point boundary value problems 1.1 with Types

We have formulated and discussed our main results for scalar equations where the solutions remain of a single sign. This restriction has enabled us to achieve sharp results on

In this work, we have applied Feng’s first-integral method to the two-component generalization of the reduced Ostrovsky equation, and found some new traveling wave solutions,

Thus, we use the results both to prove existence and uniqueness of exponentially asymptotically stable periodic orbits and to determine a part of their basin of attraction.. Let