Expectation Propagation Decoding for Sparse Superposition Codes


1666 IEICE TRANS. FUNDAMENTALS, VOL.E103–A, NO.12 DECEMBER 2020

LETTER


Hiroki MAYUMI†, Nonmember and Keigo TAKEUCHI†a), Member

SUMMARY Expectation propagation (EP) decoding is proposed for sparse superposition coding in orthogonal frequency division multiplexing (OFDM) systems. When a randomized discrete Fourier transform (DFT) dictionary matrix is used, EP decoding has the same complexity as approximate message-passing (AMP) decoding, which is a low-complexity and powerful decoding algorithm for the additive white Gaussian noise (AWGN) channel. Numerical simulations show that EP decoding achieves performance comparable to AMP decoding for the AWGN channel. For OFDM systems, on the other hand, EP decoding is far superior to AMP decoding, which exhibits an error floor in the high signal-to-noise ratio regime.

key words: sparse superposition codes, orthogonal frequency division multiplexing (OFDM), discrete Fourier transform (DFT) dictionary, approximate message-passing, expectation propagation

1. Introduction

Sparse superposition (SS) codes [1]–[3] are error-correcting codes that achieve the Shannon capacity of the additive white Gaussian noise (AWGN) channel. A codeword of an SS code is generated as the product of a dense dictionary matrix and a sparse information vector, hence the name.

Approximate message-passing (AMP) decoding [4], [5] is a low-complexity and capacity-achieving algorithm for SS codes with zero-mean independent and identically distributed (i.i.d.) dictionary matrices. Numerical simulations in [4] showed that, when a randomized Hadamard dictionary matrix is used instead, AMP achieves performance comparable to the case of zero-mean i.i.d. Gaussian matrices. Hadamard dictionary matrices allow low-complexity encoding and decoding of SS codes.

A limitation of AMP is that it fails to converge when the dictionary matrix is ill-conditioned [6]. This convergence issue is practically important since fading in wireless communication systems [7] might convert the dictionary matrix into an ill-conditioned effective dictionary matrix. The purpose of this letter is to propose a novel decoding algorithm that converges in fading channels.

As an important example of fading channels, we consider orthogonal frequency division multiplexing (OFDM). When SS coding is performed across OFDM subcarriers, the effective dictionary matrix is the product of the original dictionary matrix and a diagonal matrix that consists of channel gains in the frequency domain. Fading makes the diagonal matrix ill-conditioned.

Manuscript received May 13, 2020.

Manuscript revised June 24, 2020.

Manuscript publicized July 6, 2020.

†The authors are with the Department of Electrical and Electronic Information Engineering, Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan.

a) E-mail: [email protected]

DOI: 10.1587/transfun.2020EAL2053

We extend expectation propagation (EP) [8] in compressed sensing to the decoding of SS codes. EP can be regarded as a Bayes-optimal version of orthogonal AMP [9] or, equivalently, vector AMP [10], which was originally proposed in [11]. The main advantage of EP is its Bayes-optimality for all unitarily invariant matrices [8], [10], including ill-conditioned effective dictionary matrices. Numerical simulations in this letter show that the proposed EP decoder has good convergence properties for ill-conditioned effective dictionary matrices, where AMP without damping fails to converge.

2. System Model

2.1 OFDM

We consider OFDM transmission of block length $N_b$ [7]. A complex codeword of length $N > N_b$ is sent over $K = N/N_b$ OFDM blocks. For simplicity, we assume that $N$ is divisible by $N_b$. The frequency-domain received vector $y_k \in \mathbb{C}^{N_b}$ in OFDM block $k \in \{0, \ldots, K-1\}$ is given by

$$y_k = \Lambda_k c_k + w_k, \quad w_k \sim \mathcal{CN}(0, N_0 I_{N_b}). \tag{1}$$

In (1), $c_k \in \mathbb{C}^{N_b}$ is part of a codeword $c \in \mathbb{C}^N$ in the frequency domain, i.e. $c = (c_0^T, \ldots, c_{K-1}^T)^T$. The vectors $\{w_k\}$ are independent AWGN vectors with variance $N_0$. The $n$th diagonal element $\lambda_{k,n}$ of the complex diagonal matrix $\Lambda_k = \mathrm{diag}(\lambda_{k,0}, \ldots, \lambda_{k,N_b-1})$ represents the channel gain of subcarrier $n \in \{0, \ldots, N_b-1\}$ in OFDM block $k$. Assuming the use of a cyclic prefix longer than the delay spread of the fading channels, we have

$$\lambda_{k,n} = \sum_{p=0}^{N_b-1} h_{k,p} e^{-2\pi j p n / N_b}, \tag{2}$$

where $h_{k,p} \in \mathbb{C}$ denotes the time-domain channel gain of the $p$th resolvable path in OFDM block $k$.
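Equation (2) is an unnormalized DFT of the path gains, so it can be evaluated with an FFT. The following is a minimal sketch under that reading; the function name `subcarrier_gains` and the array layout (one row of path gains per OFDM block) are assumptions made for illustration.

```python
import numpy as np

def subcarrier_gains(h):
    """Frequency-domain gains of (2): lam[k, n] = sum_p h[k, p] exp(-2j*pi*p*n/N_b).

    h has shape (K, N_b): one row of time-domain path gains per OFDM block.
    """
    h = np.asarray(h, dtype=complex)
    # np.fft.fft uses the same sign convention as (2) and applies no scaling.
    return np.fft.fft(h, axis=-1)
```

A single path at delay 0 yields a flat channel, whereas a single path at delay 1 yields unit-magnitude gains with a linear phase.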

2.2 SS Coding


The codeword $c \in \mathbb{C}^N$ is generated as the multiplication of a complex dictionary matrix $D \in \mathbb{C}^{N \times ML}$ by an information vector $\beta = (\beta^T[0], \ldots, \beta^T[L-1])^T$ that consists of $L$ sections of length $M$. We assume uniform power allocation and write the average power as $P > 0$. Let $e_m$ denote the $m$th column of the $M \times M$ identity matrix. The information vector $\beta[l] \in \mathbb{C}^M$ in section $l$ is a 1-sparse vector $\sqrt{PM} e_m$, in which the message $m$ is sampled from the index set $\{0, \ldots, M-1\}$ uniformly and randomly. Since we have $M$ different codewords per section, the transmission rate per complex channel use is defined as

$$R = \frac{L}{N} \log_2 M. \tag{3}$$

We use a randomized discrete Fourier transform (DFT) dictionary matrix $D$ obtained by selecting $N$ different rows from the rows of an $ML \times ML$ DFT matrix uniformly and randomly. The norm of each row is normalized to 1. Since $E[\beta\beta^H] = P I_{ML}$ holds, this normalization implies the average power constraint

$$\frac{1}{N} E_\beta\left[\|c\|^2\right] = \frac{1}{N} \mathrm{Tr}\left(D E\left[\beta\beta^H\right] D^H\right) = P. \tag{4}$$

The DFT dictionary matrix allows us to implement an efficient SS encoder. When the fast Fourier transform is used, the computational complexity of encoding is $O(ML \log ML)$ since $N$ is smaller than $ML$ in general.
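The encoder described in this subsection can be sketched as one FFT of length $ML$ followed by row selection. The sketch below is illustrative only: the names `draw_rows` and `ss_encode` are hypothetical, and the sign convention of the DFT matrix (that of `numpy.fft.fft`) is an assumption, since the letter does not fix it.

```python
import numpy as np

def draw_rows(N, L, M, rng):
    """Pick N distinct row indices of the ML x ML DFT matrix uniformly at random."""
    return rng.choice(L * M, size=N, replace=False)

def ss_encode(messages, M, P, rows):
    """Map L messages in {0, ..., M-1} to (beta, c) with c = D beta.

    D consists of the selected rows of the unitary DFT matrix, so every row
    has norm 1, as required in the text.
    """
    messages = np.asarray(messages)
    L = messages.size
    beta = np.zeros(L * M)
    # Section l carries sqrt(P M) at position messages[l] and zeros elsewhere.
    beta[np.arange(L) * M + messages] = np.sqrt(P * M)
    # Unitary DFT via FFT in O(ML log ML), then keep the N selected rows.
    c = np.fft.fft(beta)[rows] / np.sqrt(L * M)
    return beta, c
```

The row selection is drawn once and shared with the decoder, which needs the same index set to apply $D$ and $D^H$.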

3. Expectation Propagation

Let $y = (y_0^T, \ldots, y_{K-1}^T)^T$ and $w = (w_0^T, \ldots, w_{K-1}^T)^T$ in (1).

To propose EP decoding, we rewrite the SS-coded OFDM system (1) as

$$y = A\beta + w, \quad A = \Lambda D, \tag{5}$$

with $\Lambda = \mathrm{diag}(\Lambda_0, \ldots, \Lambda_{K-1})$. The purpose of the decoder is to estimate the information vector $\beta$ from the knowledge about the received vector $y$ and the effective dictionary matrix $A$.

As derived in the Appendix, the proposed EP decoder consists of two modules, called modules A and B. Suppose that the extrinsic mean $\beta_{B\to A,t} \in \mathbb{C}^{ML}$ and variance $v_{B\to A,t} > 0$ of the information vector $\beta$ have been passed from module B to module A. Module A uses the linear minimum mean-square error (LMMSE) filter to compute the extrinsic mean $\beta_{A\to B,t} \in \mathbb{C}^{ML}$ and variance $v_{A\to B,t} > 0$,

$$\beta_{A\to B,t} = \beta_{B\to A,t} + \gamma_t A^H \Xi_t^{-1} (y - A\beta_{B\to A,t}), \tag{6}$$

$$v_{A\to B,t} = \gamma_t - v_{B\to A,t}, \tag{7}$$

with

$$\Xi_t = N_0 I_N + v_{B\to A,t} A A^H, \tag{8}$$

$$\gamma_t^{-1} = \frac{1}{ML} \mathrm{Tr}\left(\Xi_t^{-1} A A^H\right). \tag{9}$$

In the initial iteration, $\beta_{B\to A,0} = 0$ and $v_{B\to A,0} = P$ are used.

Remark 1: Module A requires the matrix inversion $\Xi_t^{-1}$ in (6). While the singular-value decomposition (SVD) of $A$ allows us to circumvent this high-complexity matrix inversion [10], the SVD itself has high complexity in general. Fortunately, this complexity issue does not occur in OFDM systems because the SVD $A = \Lambda D$ is given explicitly.
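Remark 1 can be made concrete with a short sketch. It assumes that $D$ consists of rows of the unitary DFT matrix (with the sign convention of `numpy.fft.fft`), so that $DD^H = I_N$ and $\Xi_t$ in (8) is diagonal; the function name `module_a` and its argument layout are hypothetical.

```python
import numpy as np

def module_a(y, lam, rows, beta_ba, v_ba, N0):
    """LMMSE step (6)-(9) for A = diag(lam) D, D = selected rows of the unitary DFT.

    lam holds the N diagonal entries of Lambda; rows holds the N selected row indices.
    """
    n = beta_ba.size                          # n = M L
    g2 = np.abs(lam) ** 2                     # diagonal of A A^H, because D D^H = I_N
    xi = N0 + v_ba * g2                       # diagonal of Xi_t in (8)
    gamma = n / np.sum(g2 / xi)               # gamma_t from (9)
    resid = y - lam * np.fft.fft(beta_ba)[rows] / np.sqrt(n)   # y - A beta
    full = np.zeros(n, dtype=complex)
    full[rows] = np.conj(lam) * resid / xi    # Lambda^H Xi^{-1} resid, zero-padded
    # D^H is applied as a scaled inverse FFT of the zero-padded vector.
    beta_ab = beta_ba + gamma * np.fft.ifft(full) * np.sqrt(n)  # (6)
    return beta_ab, gamma - v_ba              # (7)
```

No matrix is formed or inverted: the cost is two FFTs of length $ML$ plus elementwise operations on vectors of length $N$.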

Module B uses $\beta_{A\to B,t} = (\beta_{A\to B,t}^T[0], \ldots, \beta_{A\to B,t}^T[L-1])^T \in \mathbb{C}^{ML}$ and $v_{A\to B,t}$ to compute the posterior mean $\beta_{B,t+1} = (\beta_{B,t+1}^T[0], \ldots, \beta_{B,t+1}^T[L-1])^T \in \mathbb{R}^{ML}$ and variance $v_{B,t+1} > 0$. Consider the virtual AWGN channel for section $l$

$$\beta_{A\to B,t}[l] = \beta[l] + z_t[l], \quad z_t[l] \sim \mathcal{CN}(0, v_{A\to B,t} I_M). \tag{10}$$

The posterior mean and variance are given by

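$$\beta_{B,t+1}[l] = E\left[\beta[l] \,\middle|\, \beta_{A\to B,t}[l]\right], \tag{11}$$

$$v_{B,t+1} = \frac{1}{LM} \sum_{l=0}^{L-1} E\left[\|\beta[l] - \beta_{B,t+1}[l]\|^2 \,\middle|\, \beta_{A\to B,t}[l]\right]. \tag{12}$$

Since $\beta[l]$ follows the uniform distribution on the discrete set $\{\sqrt{PM} e_0, \ldots, \sqrt{PM} e_{M-1}\}$, we have the explicit formulas

$$\beta_{B,t+1,m} = \frac{\sqrt{PM}\, e^{2\sqrt{PM}\,\Re[\beta_{A\to B,t,m}]/v_{A\to B,t}}}{\sum_{m'=0}^{M-1} e^{2\sqrt{PM}\,\Re[\beta_{A\to B,t,m'}]/v_{A\to B,t}}}, \tag{13}$$

$$v_{B,t+1} = P - \frac{1}{LM} \sum_{l=0}^{L-1} \|\beta_{B,t+1}[l]\|^2. \tag{14}$$

In (13) and (14), $\beta_{A\to B,t,m}$ and $\beta_{B,t+1,m}$ denote the $m$th elements of $\beta_{A\to B,t}$ and $\beta_{B,t+1}$, respectively. The notation $\Re[z]$ means the real part of a complex number $z \in \mathbb{C}$.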
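Formula (13) is a softmax over each section, scaled by $\sqrt{PM}$. A minimal sketch of module B follows; the name `module_b` is hypothetical, and subtracting the per-section maximum before exponentiation is a standard numerical safeguard added here, which cancels in the ratio and does not change (13).

```python
import numpy as np

def module_b(beta_ab, v_ab, L, M, P):
    """Posterior mean (13), variance (14), and hard decisions for 1-sparse sections."""
    amp = np.sqrt(P * M)
    # Exponents of (13), arranged as one row per section.
    s = 2.0 * amp * np.real(beta_ab).reshape(L, M) / v_ab
    s -= s.max(axis=1, keepdims=True)         # avoids overflow in exp()
    w = np.exp(s)
    post = amp * w / w.sum(axis=1, keepdims=True)      # (13), section by section
    v_b = P - np.sum(post ** 2) / (L * M)              # (14)
    hard = post.argmax(axis=1)                # most likely message in each section
    return post.reshape(-1), v_b, hard
```

With an uninformative input the posterior is uniform within each section and (14) gives $v_{B,t+1} = P(1 - 1/M)$; with a confident input the posterior mean approaches a valid 1-sparse section and the variance approaches zero.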

Estimation of the information vector $\beta[l]$ is based on the hard decision of $\beta_{B,t+1}[l]$. To improve the decoding performance, module B feeds the extrinsic messages $\beta_{B\to A,t+1}$ and $v_{B\to A,t+1}$ back to module A,

$$\beta_{B\to A,t+1} = v_{B\to A,t+1} \left( \frac{\beta_{B,t+1}}{v_{B,t+1}} - \frac{\beta_{A\to B,t}}{v_{A\to B,t}} \right), \tag{15}$$

$$\frac{1}{v_{B\to A,t+1}} = \frac{1}{v_{B,t+1}} - \frac{1}{v_{A\to B,t}}. \tag{16}$$

The EP decoder may converge poorly for finite-sized systems. To improve the convergence property, we replace the messages $\beta_{B\to A,t+1}$ and $v_{B\to A,t+1}$ with the damped messages $\theta\beta_{B\to A,t+1} + (1-\theta)\beta_{B\to A,t}$ and $\theta v_{B\to A,t+1} + (1-\theta)v_{B\to A,t}$, respectively, for a damping factor $\theta \in [0,1]$.
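The extrinsic update (15)–(16) and the damping rule can be sketched as two small functions. The names `extrinsic_b_to_a` and `damp` are hypothetical, and the sketch assumes $v_{B,t+1} < v_{A\to B,t}$ so that the right-hand side of (16) is positive; the letter does not discuss the opposite case.

```python
import numpy as np

def extrinsic_b_to_a(beta_b, v_b, beta_ab, v_ab):
    """Extrinsic mean and variance of (15) and (16); assumes v_b < v_ab."""
    v_new = 1.0 / (1.0 / v_b - 1.0 / v_ab)                  # (16)
    beta_new = v_new * (beta_b / v_b - beta_ab / v_ab)       # (15)
    return beta_new, v_new

def damp(new, old, theta):
    """Convex combination applied to both the mean and the variance."""
    return theta * new + (1.0 - theta) * old
```

Setting `theta = 1` recovers the undamped messages, while `theta = 0` freezes them at their previous values.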

The proposed EP decoding with $T$ iterations is presented in Algorithm 1. The computational complexity of the EP decoding is dominated by the updates in module A. For a DFT dictionary matrix, they can be computed in $O(ML \log ML)$ time. Thus, the proposed EP decoding has the same complexity per iteration as AMP decoding [4], [5].


Algorithm 1 EP Decoding

Require: Received vector $y$ and the effective dictionary matrix $A$.
1: Let $\beta_{B\to A,0} = 0$ and $v_{B\to A,0} = P$.
2: for $t = 0, \ldots, T-1$ do
3:   Compute $\gamma_t$ given in (9).
4:   Compute $\beta_{A\to B,t}$ and $v_{A\to B,t}$ given in (6) and (7).
5:   Compute $\beta_{B,t+1}$ and $v_{B,t+1}$ given in (13) and (14).
6:   Compute $\beta_{B\to A,t+1}$ and $v_{B\to A,t+1}$ given in (15) and (16).
7:   Update $\beta_{B\to A,t+1} \leftarrow \theta\beta_{B\to A,t+1} + (1-\theta)\beta_{B\to A,t}$.
8:   Update $v_{B\to A,t+1} \leftarrow \theta v_{B\to A,t+1} + (1-\theta)v_{B\to A,t}$.
9: end for
10: Output hard decision of $\beta_{B,T}$.
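Algorithm 1 can be sketched end to end as follows. This is an illustrative sketch rather than the authors' implementation: it assumes $D$ consists of rows of the unitary DFT matrix (sign convention of `numpy.fft.fft`), so that $\Xi_t$ is diagonal, and the name `ep_decode`, the floor `v_min`, and the early exit when (16) would give a non-positive precision are safeguards introduced here.

```python
import numpy as np

def ep_decode(y, lam, rows, L, M, P, N0, T=20, theta=0.9, v_min=1e-12):
    """Sketch of Algorithm 1 for A = diag(lam) D, D = selected rows of the unitary DFT.

    Returns the hard-decision message index of every section.
    """
    n, amp = L * M, np.sqrt(P * M)
    g2 = np.abs(lam) ** 2                               # diagonal of A A^H

    def A(x):                                           # x -> A x
        return lam * np.fft.fft(x)[rows] / np.sqrt(n)

    def AH(z):                                          # z -> A^H z
        full = np.zeros(n, dtype=complex)
        full[rows] = np.conj(lam) * z
        return np.fft.ifft(full) * np.sqrt(n)

    beta_ba, v_ba = np.zeros(n, dtype=complex), P       # step 1
    hard = np.zeros(L, dtype=int)
    for _ in range(T):
        # Module A: steps 3 and 4, equations (6)-(9).
        xi = N0 + v_ba * g2
        gamma = n / np.sum(g2 / xi)
        beta_ab = beta_ba + gamma * AH((y - A(beta_ba)) / xi)
        v_ab = gamma - v_ba
        # Module B: step 5, equations (13) and (14).
        s = 2.0 * amp * beta_ab.real.reshape(L, M) / v_ab
        w = np.exp(s - s.max(axis=1, keepdims=True))
        post = amp * w / w.sum(axis=1, keepdims=True)
        hard = post.argmax(axis=1)
        beta_b = post.reshape(-1)
        v_b = max(P - np.sum(beta_b ** 2) / n, v_min)   # floor added for this sketch
        # Step 6, equations (15) and (16).
        prec = 1.0 / v_b - 1.0 / v_ab
        if prec <= 0.0:                                 # safeguard: (16) undefined
            break
        v_new = 1.0 / prec
        beta_new = v_new * (beta_b / v_b - beta_ab / v_ab)
        # Steps 7 and 8: damping.
        beta_ba = theta * beta_new + (1.0 - theta) * beta_ba
        v_ba = theta * v_new + (1.0 - theta) * v_ba
    return hard                                         # step 10
```

For the AWGN channel, `lam` is the all-ones vector; for OFDM it is the concatenation of the subcarrier gains of all $K$ blocks.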

4. Numerical Simulation

The EP decoding is compared to conventional AMP decoding [4], [5] in terms of section error rate (SER). We simulated damped AMP decoding with two damping factors $\theta_A \in [0,1]$ and $\theta_B \in [0,1]$, shown in Algorithm 2. The average power $E_b$ per information bit is defined as $E_b = P/R$, with the rate $R$ given in (3).
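As a small worked example of these definitions, the rate (3) and the noise variance implied by a target $E_b/N_0$ can be computed as below. The helper names `rate` and `noise_variance` are hypothetical. For $L = 256$, $M = 16$, and $N = 512$, the rate is $R = 2$ bits per complex channel use.

```python
import math

def rate(L, M, N):
    """Rate (3) in bits per complex channel use."""
    return L * math.log2(M) / N

def noise_variance(P, R, ebn0_db):
    """N0 such that E_b / N0 equals the given value in dB, with E_b = P / R."""
    return (P / R) / 10.0 ** (ebn0_db / 10.0)
```

At $E_b/N_0 = 0$ dB with $P = 1$ and $R = 2$, this gives $N_0 = 0.5$.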

We first compare the EP decoding with the AMP decoding for artificial fading channels. Let $\lambda_n = [\Lambda]_{n,n}$ denote the $n$th diagonal element of $\Lambda$ in (5). For condition number $\kappa = \lambda_0/\lambda_{N-1} \in [1, \infty)$, we assume $\lambda_n = d\lambda_{n-1}$ for $d = \kappa^{-1/(N-1)}$ and $N^{-1}\sum_{n=0}^{N-1} \lambda_n^2 = 1$. In particular, $\kappa = 1$ implies the AWGN channel $\lambda_n = 1$.
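The artificial gains form a geometric sequence, which can be generated as in the following sketch; the function name `artificial_gains` is hypothetical.

```python
import numpy as np

def artificial_gains(N, kappa):
    """Gains with lambda_n = d * lambda_{n-1} and lambda_0 / lambda_{N-1} = kappa."""
    d = kappa ** (-1.0 / (N - 1))
    lam = d ** np.arange(N)                    # geometric decay starting from 1
    # Rescale so that (1/N) * sum_n lambda_n^2 = 1.
    return lam / np.sqrt(np.mean(lam ** 2))
```

The rescaling leaves the ratio $\lambda_0/\lambda_{N-1}$ unchanged, so the condition number is still $\kappa$.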

Figure 1 shows the SERs of the EP and AMP decoding for the artificial fading channels. The two algorithms are comparable to each other for the AWGN channel $\kappa = 1$. The AMP decoding has poor performance when the condition number is larger than 2. On the other hand, the EP decoding has an SER comparable to that of the AWGN channel $\kappa = 1$ when $\kappa$ is below 3. These results imply that the EP decoding is robust against ill-conditioned fading channels.

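Algorithm 2 AMP Decoding

Require: Received vector $y$ and the effective dictionary matrix $A$.
1: Let $\beta_{B,0} = 0$, $v_{B,0} = P$, $\beta_{A,-1} = 0$, $v_{A,-1} = P$, and $z_{-1} = 0$.
2: for $t = 0, \ldots, T-1$ do
3:   Compute $z_t = y - A\beta_{B,t} + v_{B,t} z_{t-1}/v_{A,t-1}$.
4:   Compute $\beta_{A,t} = N\beta_{B,t}/(ML) + A^H z_t$.
5:   Compute $v_{A,t} = N_0 + v_{B,t}$.
6:   Update $\beta_{A,t} \leftarrow \theta_A\beta_{A,t} + (1-\theta_A)\beta_{A,t-1}$.
7:   Update $v_{A,t} \leftarrow \theta_A v_{A,t} + (1-\theta_A)v_{A,t-1}$.
8:   Compute $\beta_{B,t+1}$ and $v_{B,t+1}$ given in (13) and (14).
9:   Update $\beta_{B,t+1} \leftarrow \theta_B\beta_{B,t+1} + (1-\theta_B)\beta_{B,t}$.
10:   Update $v_{B,t+1} \leftarrow \theta_B v_{B,t+1} + (1-\theta_B)v_{B,t}$.
11: end for
12: Output hard decision of $\beta_{B,T}$.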

Fig. 1 SER versus the condition number $\kappa$ for the artificial fading channels. $L = 256$, $M = 16$, $N = 512$, $E_b/N_0 = 4$ dB, 20 iterations, $\theta = 0.9$ in the EP decoding, $\theta_A = 1$ and $\theta_B = 0.9$ in the AMP decoding.

Fig. 2 SER versus $E_b/N_0$ for $L = 256$, $M = 16$, and $N = 512$. For OFDM systems with block length $N_b = 64$, 50 iterations, $\theta = 0.8$ in the EP decoding, and $\theta_A = \theta_B = 0.5$ in the AMP decoding were used. For the AWGN channel, 20 iterations, $\theta = 0.9$, $\theta_A = 1$, and $\theta_B = 0.9$ were used.

We next compare the EP decoding with the AMP decoding in OFDM systems. We assume independent Rayleigh fading $h_{k,p} \sim \mathcal{CN}(0, \sigma_p^2)$ in (2) with the exponential power decay $\sigma_p^2 = Ce^{-0.1p}$, in which $C$ is the normalization constant to impose $\sum_{p=0}^{N_b-1} \sigma_p^2 = 1$. The average condition number of this fading channel is very large, e.g. approximately 100 for $N = 512$ and $N_b = 64$.
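The Rayleigh fading model with exponential power decay can be sampled as in the sketch below. The names `rayleigh_ofdm_gains` and `condition_number` are hypothetical, and the condition number is taken here as the ratio of the largest to the smallest gain magnitude over all subcarriers of all blocks, which is one plausible reading of the text.

```python
import numpy as np

def rayleigh_ofdm_gains(K, N_b, rng, decay=0.1):
    """Draw subcarrier gains for K blocks with h[k, p] ~ CN(0, sigma_p^2).

    sigma_p^2 = C * exp(-decay * p), with C chosen so that the powers sum to 1.
    """
    sigma2 = np.exp(-decay * np.arange(N_b))
    sigma2 /= sigma2.sum()                                  # normalization constant C
    h = np.sqrt(sigma2 / 2) * (rng.standard_normal((K, N_b))
                               + 1j * rng.standard_normal((K, N_b)))
    lam = np.fft.fft(h, axis=1)                             # equation (2)
    return lam, sigma2

def condition_number(lam):
    """Largest over smallest gain magnitude across all subcarriers and blocks."""
    mag = np.abs(lam)
    return mag.max() / mag.min()
```

Averaging `condition_number` over many independent draws with `K = 8` and `N_b = 64` gives an empirical counterpart of the average condition number quoted in the text.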

Figure 2 shows the SERs of the EP and AMP decoding for both OFDM and AWGN channels. For the AWGN channel, the EP decoding is comparable to the AMP decoding for all signal-to-noise ratios (SNRs). For OFDM systems, the EP decoding is much superior to the AMP decoding while the AMP decoding has an error-floor in the high SNR regime. The poor performance of the AMP decoding is due to ill-conditioned effective dictionary matrices. The EP decoding has a good convergence property even for such an ill-conditioned case.

Acknowledgments

K. Takeuchi was in part supported by the Grant-in-Aid for Scientific Research (B) (JSPS KAKENHI Grant Number 18H01441), Japan.

References

[1] A. Joseph and A.R. Barron, “Least squares superposition codes of moderate dictionary size are reliable at rates up to capacity,” IEEE Trans. Inf. Theory, vol.58, no.5, pp.2541–2557, May 2012.

[2] A. Joseph and A.R. Barron, “Fast sparse superposition codes have near exponential error probability for R < C,” IEEE Trans. Inf. Theory, vol.60, no.2, pp.919–942, Feb. 2014.

[3] Y. Takeishi, M. Kawakita, and J. Takeuchi, “Least squares superposition codes with Bernoulli dictionary are still reliable at rates up to capacity,” IEEE Trans. Inf. Theory, vol.60, no.5, pp.2737–2750, May 2014.

[4] C. Rush, A. Greig, and R. Venkataramanan, “Capacity-achieving sparse superposition codes via approximate message passing decoding,” IEEE Trans. Inf. Theory, vol.63, no.3, pp.1476–1500, March 2017.

[5] J. Barbier and F. Krzakala, “Approximate message-passing decoder and capacity achieving sparse superposition codes,” IEEE Trans. Inf. Theory, vol.63, no.8, pp.4894–4927, Aug. 2017.

[6] S. Rangan, P. Schniter, A. Fletcher, and S. Sarkar, “On the convergence of approximate message passing with arbitrary matrices,” IEEE Trans. Inf. Theory, vol.65, no.9, pp.5339–5351, Sept. 2019.

[7] D.N.C. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, Cambridge, UK, 2005.

[8] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” IEEE Trans. Inf. Theory, vol.66, no.1, pp.368–386, Jan. 2020.

[9] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol.5, pp.2020–2033, Jan. 2017.

[10] S. Rangan, P. Schniter, and A.K. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol.65, no.10, pp.6664–6684, Oct. 2019.

[11] M. Opper and O. Winther, “Expectation consistent approximate inference,” J. Mach. Learn. Res., vol.6, pp.2177–2204, Dec. 2005.

Appendix: Derivation of EP Decoding

We follow [8] to derive the EP decoding. The purpose of EP is to compute the marginal posterior distribution p(β[l]|y,A) approximately. The marginal posterior distribution is approximated via a tractable distribution

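$$q_A(\beta) \propto p(y \mid A, \beta) \prod_{l=0}^{L-1} q_{B\to A}(\beta[l]), \tag{A$\cdot$1}$$

with

$$q_{B\to A}(\beta[l]) \propto \exp\left( -\frac{\|\beta[l] - \beta_{B\to A}[l]\|^2}{v_{B\to A}} \right), \tag{A$\cdot$2}$$

where $f(x) \propto g(x)$ means that there is an $x$-independent constant $C > 0$ such that $f(x) = Cg(x)$ holds. Let $\beta_{B\to A} = (\beta_{B\to A}[0]^T, \ldots, \beta_{B\to A}[L-1]^T)^T$ and $\beta_A = (\beta_A[0]^T, \ldots, \beta_A[L-1]^T)^T$. Following [8], we can evaluate the marginalization of (A·1) over $\{\beta[l'] : l' \neq l\}$ as

$$q_A(\beta[l]) \propto \exp\left( -\frac{\|\beta[l] - \beta_A[l]\|^2}{v_A} \right), \tag{A$\cdot$3}$$

with

$$\beta_A = \beta_{B\to A} + v_{B\to A} A^H \Xi^{-1} (y - A\beta_{B\to A}), \tag{A$\cdot$4}$$

$$v_A = v_{B\to A} - \gamma^{-1}(v_{B\to A})\, v_{B\to A}^2, \tag{A$\cdot$5}$$

$$\Xi = N_0 I_N + v_{B\to A} A A^H, \tag{A$\cdot$6}$$

$$\gamma^{-1}(v) = \frac{1}{LM} \mathrm{Tr}\left( \Xi^{-1} A A^H \right). \tag{A$\cdot$7}$$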

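The main difference between conventional [8] and proposed EP is in the update rule of $q_{B\to A}$. We define the extrinsic distribution of $\beta[l]$ as

$$q_{A\to B}(\beta[l]) \propto \frac{q_A(\beta[l])}{q_{B\to A}(\beta[l])}. \tag{A$\cdot$8}$$

We write the mean and variance of $\beta[l]$ with respect to $q_{A\to B}(\beta[l])p(\beta[l])$ as

$$\beta_B[l] = \frac{\sum_{\beta[l]} \beta[l]\, q_{A\to B}(\beta[l])\, p(\beta[l])}{\sum_{\beta[l]} q_{A\to B}(\beta[l])\, p(\beta[l])}, \tag{A$\cdot$9}$$

$$v_B[l] = \frac{1}{M} \frac{\sum_{\beta[l]} \|\beta[l]\|^2 q_{A\to B}(\beta[l])\, p(\beta[l])}{\sum_{\beta[l]} q_{A\to B}(\beta[l])\, p(\beta[l])} - \frac{1}{M} \|\beta_B[l]\|^2. \tag{A$\cdot$10}$$

Let $q_{B\to A}^{\mathrm{new}}(\beta[l])$ denote an updated message of $q_{B\to A}(\beta[l])$ given by

$$q_{B\to A}^{\mathrm{new}}(\beta[l]) \propto \exp\left( -\frac{\|\beta[l] - \beta_{B\to A}^{\mathrm{new}}[l]\|^2}{v_{B\to A}^{\mathrm{new}}} \right). \tag{A$\cdot$11}$$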

Then, the message $q_{B\to A}(\beta[l])$ is updated so as to satisfy the moment matching conditions

$$\beta_B[l] = \frac{\sum_{\beta[l]} \beta[l]\, q_{A\to B}(\beta[l])\, q_{B\to A}^{\mathrm{new}}(\beta[l])}{\sum_{\beta[l]} q_{A\to B}(\beta[l])\, q_{B\to A}^{\mathrm{new}}(\beta[l])}, \tag{A$\cdot$12}$$

$$M \sum_{l=0}^{L-1} v_B[l] = \sum_{l=0}^{L-1} \frac{\sum_{\beta[l]} \|\beta[l]\|^2 q_{A\to B}(\beta[l])\, q_{B\to A}^{\mathrm{new}}(\beta[l])}{\sum_{\beta[l]} q_{A\to B}(\beta[l])\, q_{B\to A}^{\mathrm{new}}(\beta[l])} - \sum_{l=0}^{L-1} \|\beta_B[l]\|^2. \tag{A$\cdot$13}$$

The remaining derivation is similar to that in [8], so we omit it. The extrinsic distribution $q_{A\to B}$ in conventional EP [8] was defined element-wise since i.i.d. signals were assumed. On the other hand, (A·8) has been defined for each section because $\beta[l]$ has dependent elements. This is the main difference in the derivations of the conventional and proposed EP algorithms.
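As a consistency check that is not part of the original derivation, the Gaussian division in (A·8) can be worked out explicitly from (A·2)–(A·7). The shorthand $\gamma = \gamma(v_{B\to A})$, $r = A^H \Xi^{-1}(y - A\beta_{B\to A})$, and the symbols $\beta_{A\to B}$, $v_{A\to B}$ for the mean and variance of $q_{A\to B}$ are introduced here for brevity. Under these definitions, the result appears to reproduce the module A updates (6) and (7):

```latex
\begin{align}
\frac{1}{v_{A\to B}} &= \frac{1}{v_A} - \frac{1}{v_{B\to A}}, \qquad
\frac{\beta_{A\to B}}{v_{A\to B}} = \frac{\beta_A}{v_A} - \frac{\beta_{B\to A}}{v_{B\to A}}, \\
v_A &= v_{B\to A} - \frac{v_{B\to A}^2}{\gamma}
     = \frac{v_{B\to A}(\gamma - v_{B\to A})}{\gamma}
\;\Longrightarrow\;
v_{A\to B} = \gamma - v_{B\to A}, \\
\beta_{A\to B} &= (\gamma - v_{B\to A})
\left[ \frac{\gamma\,(\beta_{B\to A} + v_{B\to A}\, r)}{v_{B\to A}(\gamma - v_{B\to A})}
     - \frac{\beta_{B\to A}}{v_{B\to A}} \right]
= \beta_{B\to A} + \gamma\, r .
\end{align}
```

The first line follows from dividing the exponents of (A·3) and (A·2); the second and third lines substitute (A·5) and (A·4), respectively.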