Simplified Carrier Recovery for Intradyne Optical PSK Receivers in udWDM-PON

Jeison Tabares, Saeed Ghasemi, Víctor Polo, and Josep Prat, Member, IEEE

Abstract—We present an optimized carrier recovery architecture based on differential detection for coherent optical receivers that substantially reduces the required DSP hardware resources, aimed at cost-effective transceivers for access network applications. The proposed architecture shares the 1-symbol complex correlation required for differential phase detection between the frequency estimation and the phase recovery blocks of the receiver DSP, thus lowering the energy consumption of the digital coherent receiver and increasing the tolerance against fast wavelength drifts of the lasers. We prototyped the proposed carrier recovery in a commercial FPGA for real-time evaluation with DPSK data at 1.25 Gb/s. The optical transmission system comprised direct phase modulation of commercial DFB lasers, 25 km of single-mode fiber, and a coherent intradyne receiver with a low-cost optical front-end based on a 3x3 coupler and three photodiodes providing phase-diversity operation. Results show high real-time performance for DPSK, achieving -55 dBm sensitivity at BER = 10^-3 in a 6.25 GHz spaced ultra-dense WDM grid, high tolerance to optical phase noise, and enhanced mitigation of the fast wavelength drifts of the lasers, enabled by feed-forward DSP correction and feed-back LO automatic tuning.

Index Terms—Carrier recovery, differential optical coherent detection, digital signal processing, frequency estimation, phase shift keying.

I. INTRODUCTION

Recent developments in fast and reconfigurable transceivers have emerged as a response to the continuously growing demand for connectivity. In this regard, the efficient hardware implementation of digital signal processing (DSP) techniques on field-programmable gate arrays (FPGAs) has demonstrated great potential, in terms of capacity and performance, for the deployment of future optical access networks [1], [2]. Although more complex than its analog counterpart, mostly because high-speed A/D converters are needed for mapping into the digital domain, the digital implementation enables effective mitigation of transmission impairments, as well as fast reconfiguration for different modulation formats and transmission rates, a valuable characteristic for the new concept of flexible optical networks with software-defined transceivers [3].

A key premise when implementing an access network is to maintain high performance while lowering the cost, which falls on the end users. This translates into minimizing the energy consumption and complexity of the optical network units (ONUs). To fulfill the requirements of access networks, research efforts are under way to develop the Gigabit-to-the-user concept towards user bandwidths in excess of 1 Gb/s, while enhancing the spectral efficiency by allocating hundreds of wavelengths in an ultra-dense grid configuration [4]-[6]. For instance, the research in [7] aimed to implement a cost-effective ultra-dense wavelength division multiplexing passive optical network (udWDM-PON) exploiting coherent detection, with user terminals based on commercial low-cost devices and low-complexity techniques. Fig. 1 depicts the architecture of the udWDM-PON, consisting of a standard tree-based PON with a dedicated wavelength (λ) per user in separate bands for the upstream (US) and the downstream (DS). The channel spacing in the ultra-dense WDM grid, denoted by Δλ, is as narrow as 6.25 GHz for symmetric 1.25 Gb/s per wavelength, serving up to 256 users, in contrast with the 32/64 users of typical PONs. The high sensitivity of the coherent receivers enables the large power splitting without the need for optical amplification, keeping the optical distribution network totally passive and compatible with the standards for PONs. The use of directly modulated lasers instead of external modulators helps reduce the form factor and power consumption of the transceivers.

Since the udWDM-PON is filterless, channel selection is done by wavelength tuning of the local oscillator (LO) laser at the coherent receiver (Rx) front-end. Therefore, accurate LO wavelength control is required for correct data detection and channel stability in such a closely spaced udWDM grid. The LO λ-control is driven by the carrier recovery (CR) subsystem of the Rx DSP (Fig. 1, highlighted in red), on which this article focuses.

One of the key points of the Rx DSP in [7] was the use of differential encoding and detection for optical phase recovery (PR), due to its straightforward implementation and high robustness against the phase noise of low-cost lasers. Although synchronous detection with phase estimation algorithms outperforms differential phase detection in high data-rate scenarios, for low transmission rates the performance of differential detection is comparable to that of synchronous detection, but with a remarkably lower implementation complexity [8]. Moreover, differential encoding is widely applied and recommended, even for synchronous detection instead of differential detection, to avoid the phase-noise-induced cycle slips that produce phase ambiguity and error propagation [9]. Alternatively, data-aided DSP (including PR) could be used to avoid the effect of cycle slips by transmitting pilot symbols for carrier synchronization and further channel equalization [10].

In this article, we propose a simpler CR architecture based on differential detection for optical PSK signals that shares the 1-symbol complex correlation needed by both the frequency estimation (FE) and the PR blocks of the Rx DSP, relaxing the required hardware resources. This optimizes the parallel hardware prototyping in the FPGA and reduces the overall process delay of the DSP, enhancing the performance of the proposed CR against the fast wavelength drifts of lasers. We evaluate our proposal in a real-time experiment with a digital coherent Rx implemented in a commercial FPGA, an optical front-end based on a low-cost 3x3 coupler, distributed feed-back (DFB) lasers, and direct DPSK modulation at 1.25 Gb/s. The operating principle of the proposed method and the first experimental results were reported in our previous work [11].

The article is organized as follows. Section II presents the proposed CR architecture. The FPGA-based coherent Rx and the experimental setup are introduced in Section III, followed by the experimental results in Section IV, which include real-time transmission tests.

II. CARRIER RECOVERY ARCHITECTURE

Conventional CR for digital PSK receivers comprises FE followed by PR in separated DSP blocks. Several schemes are suitable for FE, like the differential $m^{th}$-power estimator [12], the IQ correlation-based estimator [13], and the pre-decision angle estimator [14]. On the other hand, PR can be achieved by the well-known Viterbi & Viterbi phase estimator [15], or by differential demodulation [16], for instance.

This work focuses on differential detection schemes for CR. We first selected the differential $m^{th}$-power estimator for FE and the differential demodulator for PR. The complete CR architecture is depicted in Fig. 2(a), where $m$ is the number of constellation points of the PSK modulation (e.g., $m = 2$ for DPSK), $N$ is the block length for averaging, $k$ is the symbol index inside the estimation block, $D$ is the process delay of the FE algorithm, and $(\cdot)^*$ stands for the complex conjugate. In this CR scheme, the phase shift $\Delta \phi$ induced by the frequency detuning between the Tx and LO lasers is estimated from each sample of the received signal $r[n] = I[n] + jQ[n]$, then cancelled by the phase rotator $e^{-j\Delta \phi n}$ before PR. Afterwards, the corrected signal $r'[n]$ is differentially demodulated by a 1-symbol complex correlation (i.e., 1-symbol delay-and-multiply) to recover the phase-encoded data $d[n]$. Note that both the FE and PR stages implement a complex correlation, but with a different signal delay, to extract the phase difference between samples ($z^{-1}$) in FE or between symbols ($z^{-P}$) in PR, where $P = R_s / f_b$ represents the oversampling factor, defined as the ratio between the sample rate ($R_s$) and the symbol rate ($f_b$). Usually, $R_s \geq 2 f_b$ at the analog-to-digital converter (ADC) for mapping into the digital domain, to satisfy the Nyquist sampling criterion.
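The chain of Fig. 2(a) can be illustrated with a minimal Python sketch, assuming one sample per symbol ($P = 1$), a noise-free channel, and a single estimation block. The function names, the 100 MHz detuning and the 0.3 rad phase offset are illustrative choices, not values taken from the FPGA implementation.

```python
import cmath
import math
import random

F_B = 1.25e9       # symbol rate of the 1.25 Gb/s DPSK signal
DETUNING = 100e6   # illustrative Tx-LO frequency detuning (assumption)

def dpsk_symbols(bits):
    """Differentially encode bits: a 1 flips the optical phase by pi."""
    phase, out = 0.0, []
    for b in bits:
        phase += math.pi * b
        out.append(cmath.exp(1j * phase))
    return out

def estimate_phase_shift(r, m):
    """Differential m-th power FE: per-symbol phase shift, valid within +/- pi/m."""
    acc = sum((r[k] * r[k - 1].conjugate()) ** m for k in range(1, len(r)))
    return cmath.phase(acc) / m

def conventional_cr(r, m=2):
    dphi = estimate_phase_shift(r, m)                                # FE block
    rc = [x * cmath.exp(-1j * dphi * n) for n, x in enumerate(r)]    # phase rotator
    d = [rc[n] * rc[n - 1].conjugate() for n in range(1, len(rc))]   # PR block
    bits = [1 if x.real < 0 else 0 for x in d]                       # DPSK decision
    return dphi, bits

rng = random.Random(1)
tx_bits = [rng.randint(0, 1) for _ in range(256)]
rx = [s * cmath.exp(1j * (2 * math.pi * DETUNING / F_B * n + 0.3))
      for n, s in enumerate(dpsk_symbols(tx_bits))]
dphi, rx_bits = conventional_cr(rx)
estimated_detuning = dphi * F_B / (2 * math.pi)
```

In this sketch the first received symbol only serves as the phase reference, so `rx_bits` corresponds to `tx_bits[1:]`. Note that the complex correlation is computed twice, once inside the FE and once in the PR.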

Some FE algorithms based on differential detection were specifically designed to estimate the frequency error between samples, requiring a signal delay for the correlation shorter than the symbol duration to avoid the influence of the phase-modulated data (e.g., [13]). However, the differential $m^{th}$-power FE algorithm in this work removes the PSK data by raising to the power of $m$, so the frequency error can be correctly estimated either between samples or between symbols [12]. Therefore, if clock (CLK) recovery at the Rx DSP is performed before CR (i.e., the in-phase ($I$) and quadrature ($Q$) signals going to CR are already at one sample per symbol), then $P = 1$ and both the FE and PR blocks in Fig. 2(a) have the same 1-symbol complex correlation for differential phase detection; otherwise, for unrecovered CLK ($P \geq 2$), the signal delays for correlation are different.

As a consequence, for the case of an already recovered CLK ($P = 1$), we can modify the architecture in Fig. 2(a) to calculate the 1-symbol correlation for PR only once per symbol, then reuse it for the next feed-forward FE, as represented in Fig. 2(b).

Fig. 2. (a) Conventional CR based on differential $m^{th}$-power for FE and differential demodulation for PR. (b) Proposed CR that optimizes the architecture by reusing the 1-symbol complex correlation required for both the FE and the PR. (c) Evolution of the constellation for DPSK data along the proposed CR.
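The sharing idea of Fig. 2(b) can be sketched as follows. This is one reading of the caption and of the preceding paragraph, not the authors' implementation: it assumes that the shared correlation $c[n] = r[n]\, r^*[n-1]$ feeds the $m^{th}$-power estimator and is then rotated once by $e^{-j\Delta\phi}$, which is algebraically the same as rotating $r[n]$ by $e^{-j\Delta\phi n}$ before the differential demodulation. The encoder, the detuning value and the decision rule are illustrative.

```python
import cmath
import math
import random

F_B = 1.25e9       # symbol rate
DETUNING = 100e6   # illustrative detuning, inside +/- f_b/(2m) for m = 2 and m = 4

def dmpsk_symbols(symbols, m):
    """Differentially encode m-ary symbols as phase steps of 2*pi/m."""
    phase, out = 0.0, []
    for s in symbols:
        phase += 2 * math.pi * s / m
        out.append(cmath.exp(1j * phase))
    return out

def proposed_cr(r, m=2):
    # 1-symbol complex correlation, computed only once per symbol
    c = [r[n] * r[n - 1].conjugate() for n in range(1, len(r))]
    # FE reuses c: the m-th power removes the data, the sum averages the block
    dphi = cmath.phase(sum(x ** m for x in c)) / m
    # PR: one common rotation removes the detuning-induced phase shift
    rot = cmath.exp(-1j * dphi)
    d = [x * rot for x in c]
    # m-ary decision on the differential phase
    decided = [round(cmath.phase(x) * m / (2 * math.pi)) % m for x in d]
    return dphi, decided

results = {}
for m in (2, 4):
    rng = random.Random(m)
    tx = [rng.randrange(m) for _ in range(256)]
    rx = [s * cmath.exp(1j * (2 * math.pi * DETUNING / F_B * n + 0.3))
          for n, s in enumerate(dmpsk_symbols(tx, m))]
    dphi, decided = proposed_cr(rx, m)
    results[m] = (tx, decided, dphi * F_B / (2 * math.pi))
```

Compared with the conventional chain, one complex correlation and the per-sample phase rotator are replaced by a single shared correlation and a fixed rotation per block, which is consistent with the lower operation count listed in Table I.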

TABLE I

<table>
<thead>
<tr>
<th>CR architecture</th>
<th>Multipliers</th>
<th>Adders</th>
<th>Process delay [symbols]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional</td>
<td>$16 \times M$</td>
<td>$13 \times M - 2$</td>
<td>$4 \times M$</td>
</tr>
<tr>
<td>Proposed</td>
<td>$12 \times M$</td>
<td>$10 \times M - 2$</td>
<td>$3 \times M$</td>
</tr>
<tr>
<td>Reduction</td>
<td>25%</td>
<td>23%</td>
<td>25%</td>
</tr>
</tbody>
</table>
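The percentages in Table I follow directly from the operation counts. The short sketch below recomputes them, treating $M$ as a generic scaling factor because its definition does not appear in this part of the text; the value $M = 16$ is only an example.

```python
def cr_resources(M):
    """Operation counts of Table I as a function of M, with relative reductions in %."""
    conventional = {"multipliers": 16 * M, "adders": 13 * M - 2, "delay": 4 * M}
    proposed = {"multipliers": 12 * M, "adders": 10 * M - 2, "delay": 3 * M}
    reduction = {key: 100.0 * (conventional[key] - proposed[key]) / conventional[key]
                 for key in conventional}
    return conventional, proposed, reduction

conventional, proposed, reduction = cr_resources(16)  # M = 16 is an assumed example
```

The multiplier and delay savings are exactly 25% for any $M$, whereas the adder saving equals $3M / (13M - 2)$, which tends to $3/13 \approx 23\%$ as $M$ grows.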

To evaluate the proposed CR, a real-time experiment was conducted with DPSK data at 1.25 Gb/s. The experimental setup is depicted in Fig. 3. The transmitter (Tx) used a DFB laser ($\lambda = 1550$ nm) with spectral linewidth $\Delta \nu = 4$ MHz, emitting at 0 dBm, whose modulating signal was equalized by a high-pass RC network for direct 0-180° PSK modulation [4]. Data consisted of non-return-to-zero pseudo-random binary sequences (NRZ-PRBS) from a pulse-pattern generator (PPG) running at 1.25 Gb/s. Since PRBS data were used, differential encoding can be assumed at the Tx.

After optical modulation, the signal was transmitted through 25 km of single-mode fiber (SMF). A variable optical attenuator (VOA) emulated the losses of the optical distribution network. This experiment only considers one state of polarization, manually adjusted at the input of the Rx.

The coherent Rx included a free-running LO to operate in the intradyne regime. The LO was another DFB laser with $\Delta \nu = 4$ MHz, emitting at 3 dBm, and thermally tunable for automatic frequency control (AFC) from the DSP feed-back. The optical front-end, which beats the incoming optical signal with the LO, was based on a 3x3 coupler instead of the widespread 90° optical hybrid. This allows for phase diversity with only three photodiodes instead of four. Next, the three photodetected signals were linearly combined in passive hardware, according to the transfer matrix given in [19], to recover the $I$ and $Q$ signals. Although $IQ$ recovery can be performed by the DSP, this analog pre-processing saves one ADC channel, whose cost increases with its sample rate. The aforementioned front-end was integrated on an FR4 printed-circuit-board substrate, with commercial TO-CAN packaged PIN photodetectors including a trans-impedance amplifier (TIA), as shown in Fig. 3 (inset).
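The linear combination of the three photodetected signals can be illustrated with the textbook case of an ideal, symmetric 3x3 coupler, whose outputs carry beat terms shifted by 120° from each other. The transfer matrix of [19] is not reproduced in this text, so the coefficients below are an assumption for the ideal case rather than the values used in the hardware.

```python
import math

def photocurrents(theta, amp=1.0, dc=2.0):
    """Three photocurrents of an ideal 3x3 coupler: common DC plus beats 120 degrees apart."""
    return [dc + amp * math.cos(theta + k * 2 * math.pi / 3) for k in (-1, 0, 1)]

def iq_from_three(i_a, i_b, i_c):
    """Linear combination that cancels the DC term and yields orthogonal I and Q."""
    i_sig = i_b - 0.5 * (i_a + i_c)              # 1.5 * amp * cos(theta)
    q_sig = (math.sqrt(3) / 2) * (i_a - i_c)     # 1.5 * amp * sin(theta)
    return i_sig, q_sig

checks = []
for step in range(12):
    theta = 2 * math.pi * step / 12
    i_sig, q_sig = iq_from_three(*photocurrents(theta))
    checks.append((i_sig - 1.5 * math.cos(theta), q_sig - 1.5 * math.sin(theta)))
```

Both outputs have the same gain and no DC component, so only two ADC channels are needed after the combination.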

Both orthogonal $IQ$ signals were low-pass filtered by two standard 4th-order Bessel filters with 1 GHz cut-off frequency, for antialiasing and noise suppression, then mapped into the digital domain by two ADC channels sampling at 2.5 GSa/s.

All the subsequent DSP was carried out by an ML605 Xilinx Virtex-6 FPGA with an 8-bit architecture. Its process clock was set to 156.25 MHz for parallel processing of the real-time 1.25 Gb/s data stream. The main DSP subsystems consisted of well-known techniques for digital coherent receivers [20], [21]. First, the deskew block cross-correlated the $I$ and $Q$ signals to compensate for the temporal delays caused by non-symmetrical paths in the Rx front-end. The $IQ$ skew is estimated when the receiver starts up, then applied to the real-time data stream. Next, the CLK recovery block downsamples each $I$/$Q$ signal, converting from two samples to one sample per symbol. The implemented algorithm was derived from the early-late gate synchronizer, which uses an interpolator to find the optimal sampling point by looking for the maximum aperture of the eye diagram. The interpolation was carried out by a Farrow structure with two 3-tap FIR filters performing piecewise parabolic interpolation [22].
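The degree of parallelism implied by these rates can be checked with a few lines of arithmetic. The variable names are illustrative; the rates are the ones quoted above.

```python
SYMBOL_RATE = 1_250_000_000   # 1.25 GBd DPSK
ADC_RATE = 2_500_000_000      # 2.5 GSa/s per ADC channel
FPGA_CLOCK = 156_250_000      # 156.25 MHz process clock

# Samples that must be handled in parallel per FPGA clock cycle
samples_per_clock = ADC_RATE // FPGA_CLOCK      # before CLK recovery
symbols_per_clock = SYMBOL_RATE // FPGA_CLOCK   # after CLK recovery
oversampling = ADC_RATE // SYMBOL_RATE          # P = R_s / f_b at the ADC
```

Each clock cycle therefore carries 16 samples per ADC channel before CLK recovery and 8 symbols afterwards, which is why the number of operations per symbol in the CR block has a direct impact on the FPGA resources.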

In the CR block, both architectures analyzed in Section II were prototyped. As previously discussed, the estimated phase shift $\Delta \phi$ due to the frequency error is also fed back towards the LO for automatic tuning. To close the feed-back loop, the estimated $\Delta \phi$ controlled the duty cycle of a 20 kHz square wave to obtain a pulse-width modulation (PWM) signal, which was then filtered in the analog domain by a low-pass RC network with a time constant of 10 ms. This arrangement acts as a simple 1-bit digital-to-analog converter (DAC) for continuous thermal tuning of the DFB LO.
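The behavior of this 1-bit DAC can be approximated with a short simulation. The sketch assumes a linear mapping from $\Delta \phi \in [-\pi/m, +\pi/m]$ to a duty cycle in $[0, 1]$ and an ideal first-order RC filter; the mapping, the normalized output level and the simulation step are illustrative, and only the 20 kHz PWM frequency and the 10 ms time constant come from the text.

```python
import math

PWM_FREQ = 20e3   # PWM frequency (from the text)
TAU = 10e-3       # RC time constant (from the text)

def duty_from_phase(dphi, m=2):
    """Map dphi in [-pi/m, +pi/m] linearly onto a duty cycle in [0, 1] (assumed mapping)."""
    limit = math.pi / m
    clipped = max(-limit, min(limit, dphi))
    return 0.5 + clipped / (2 * limit)

def rc_filtered_pwm(duty, v_high=1.0, n_periods=2000, steps_per_period=100):
    """Forward-Euler simulation of a first-order RC low-pass driven by the PWM signal."""
    high_steps = round(duty * steps_per_period)
    dt = 1.0 / (PWM_FREQ * steps_per_period)
    alpha = dt / TAU
    v, out = 0.0, []
    for i in range(n_periods * steps_per_period):
        vin = v_high if (i % steps_per_period) < high_steps else 0.0
        v += alpha * (vin - v)
        out.append(v)
    return out

duty = duty_from_phase(math.pi / 4)       # 0.75 with the assumed mapping
vout = rc_filtered_pwm(duty)              # 0.1 s of simulated time (10 time constants)
last_period = vout[-100:]
mean_level = sum(last_period) / len(last_period)
ripple = max(last_period) - min(last_period)
```

Because the PWM period (50 µs) is 200 times shorter than the RC time constant, the filtered output settles at the duty cycle times the high level, with a residual ripple well below 1% of that level.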

Finally, binary data were extracted by comparing with a decision threshold, and the real-time error rate was measured by a bit-error-rate tester (BERT).

IV. RESULTS

Initially, the optical transmission system, comprising the DPSK Tx based on direct DFB modulation and the digital Rx prototyped in the FPGA with both CR algorithms, was switched on, achieving error-free transmission of DPSK data at 1.25 Gb/s in real time. Next, the CR algorithm was optimized by determining the optimal block length ($N$) for averaging. The received optical power was adjusted to -53 dBm to provide a bit error rate (BER) of $10^{-4}$. The LO detuning was first set to an arbitrary value of 300 MHz, and the estimation of the frequency error $\Delta f$ was evaluated 100 times for each value of $N$. Results are plotted in Fig. 4, in terms of the root mean square (RMS) of the estimated $\Delta f$ as a function of the block length $N$ in symbols, for two different PRBS data. As observed, the estimation converges to the real LO detuning (300 MHz) for $N$ larger than 200 symbols. In this work, we thus selected $N = 2^8$, in agreement with the test. The same optimal $N$ was observed for both PRBS data, as well as for both CR architectures.
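The effect of the block length can be reproduced qualitatively with a small Monte Carlo sketch. It assumes additive Gaussian noise only (no laser phase noise), one sample per symbol, and an arbitrary noise level of roughly 20 dB SNR, so the numbers it produces are not comparable with Fig. 4; it only illustrates that averaging over more symbols tightens the estimate. The conversion $\Delta f = \Delta \phi \, f_b / (2\pi)$ assumes that $\Delta \phi$ is the phase shift accumulated over one symbol.

```python
import cmath
import math
import random

F_B = 1.25e9       # symbol rate
DETUNING = 300e6   # LO detuning used in the test
M = 2              # DPSK

def estimate_detuning(n_symbols, noise_std, rng):
    """One block estimate of the detuning with the differential m-th power FE."""
    dphi_true = 2 * math.pi * DETUNING / F_B
    phase, prev, acc = 0.0, None, 0j
    for n in range(n_symbols + 1):
        phase += math.pi * rng.randint(0, 1)                       # random DPSK data
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        r = cmath.exp(1j * (phase + dphi_true * n)) + noise
        if prev is not None:
            acc += (r * prev.conjugate()) ** M                     # data removed by the power
        prev = r
    return cmath.phase(acc) / M * F_B / (2 * math.pi)

def rms_error(n_symbols, trials=100, noise_std=0.07, seed=0):
    """RMS deviation of the estimate from the true detuning over several trials."""
    rng = random.Random(seed)
    errors = [estimate_detuning(n_symbols, noise_std, rng) - DETUNING
              for _ in range(trials)]
    return math.sqrt(sum(e * e for e in errors) / trials)

rms_by_n = {n: rms_error(n) for n in (16, 64, 256)}
```

A detuning of 300 MHz corresponds to $\Delta \phi = 0.48\pi$ per symbol, close to the $\pm \pi / 2$ limit of the estimator for DPSK (equivalent to $\pm 312.5$ MHz at 1.25 GBd), so short blocks are more exposed to occasional wrap-around errors than long ones.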

Afterwards, the system was evaluated in terms of sensitivity and tolerance to the laser phase noise, which is related to the total spectral linewidth $\Delta \nu$. For the test, three different lasers were used as LO: an external cavity laser (ECL) with a narrow linewidth of 100 kHz, and two commercial DFB lasers with 4 MHz and 15 MHz linewidth, respectively. Hence, the total spectral linewidth $\Delta \nu$, including the 4 MHz linewidth of the DFB Tx, ranged from 4 MHz to 19 MHz. Results in Fig. 5 show that, for a forward error correction (FEC) threshold of BER $= 10^{-3}$, a high sensitivity of -55 dBm was achieved, with no apparent penalty for $\Delta \nu = 4$ MHz and $\Delta \nu = 8$ MHz. In the case of $\Delta \nu = 19$ MHz, the sensitivity penalty at the FEC level was about 4 dB, with an error floor close to BER $= 10^{-5}$. Both CR architectures show similar performance. We remark that the achieved sensitivity of -55 dBm, operating at a ratio $\Delta \nu / R_b = 6.4 \cdot 10^{-3}$ with total spectral linewidth $\Delta \nu = 8$ MHz, is the highest reported for a real-time experiment with optical coherent detection, to the best of the authors' knowledge. It validates the high performance and robustness of differential detection despite its simplicity, and makes the proposed CR feasible for commercial low-cost DFB lasers.

Fig. 5. BER versus received power for DPSK at 1.25 Gb/s, for the case of proposed (prop.) and conventional (conv.) CR, with different total spectral linewidth $\Delta \nu$.

In the next test, the frequency error estimation and correction were assessed by sweeping the optical frequency of the LO operating in open loop (without LO feed-back for automatic tuning). Fig. 7 shows the phase shift $\Delta \phi$ induced by the LO detuning and estimated by the FPGA within the $\pm 1$ GHz range. Larger LO detuning values led to failure of the CLK recovery algorithm and loss of synchronization, so we restricted the estimation to the $\pm 1$ GHz range. As expected, the estimated phase exhibits cycle slips when its value goes beyond $\pm \pi / m$ (or $\pm \pi / 2$ for DPSK), the theoretical limit due to the $m^{th}$-power in FE. By implementing a simple phase unwrap algorithm [23], the estimated phase becomes completely linear over the whole detuning range. In addition, Fig. 7 (right axis) also plots the analog signal measured at the output of the 1-bit DAC, going towards the LO for AFC.

The BER curves in Fig. 8 show the performance of CR without FE (i.e., only differential demodulation), and CR with the conventional and proposed algorithms, for two different values of total spectral linewidth. The received power was adjusted to obtain a reference BER $= 10^{-4}$. As observed, without FE the BER is highly degraded by the frequency detuning, with about $\pm 60$ MHz tolerance for 1 dB sensitivity penalty. On the other hand, FE can effectively correct the frequency error up to $\sim \pm 400$ MHz for 1 dB penalty even in the presence of strong phase noise, since there is no difference in the test between 4 MHz and 19 MHz total linewidth. Both CR architectures perform the same. Note that the reference BER $= 10^{-4}$ is roughly constant within $\sim \pm 300$ MHz detuning, but degrades for larger detuning values. This behavior is not related to incorrect estimation of the frequency error, which is linear over the whole range (see Fig. 7), but to the bandwidth of the Rx, which was adjusted to 1 GHz for the 1.25 Gb/s DPSK data. Thus, for large LO detuning the received spectrum falls beyond the Rx bandwidth, producing a penalty at the detection.

At this point, we can see the benefit of combining two strategies to correct the LO frequency detuning: the feed-forward DSP corrects the remaining frequency error of each symbol in real time, whereas the feed-back to the LO continuously adjusts its optical frequency by thermal tuning to keep the photodetected spectrum in baseband and matched to the electrical filtering of the Rx, which rejects out-of-band noise and optimizes the detection. Although thermal tuning of commercial DFB lasers does not provide fast frequency tuning, with time constants in the order of seconds, it suffices for the coarse correction of the LO detuning, taking into account that the rate of the frequency drifts of free-running DFB lasers due to environmental changes and laser fluctuations, as characterized in [18], is below 3.6 MHz/s.
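The cycle slips and the phase unwrapping discussed for the open-loop sweep of Fig. 7 can be sketched as follows. This is a generic unwrap with period $2\pi/m$, not necessarily the algorithm of [23]; it assumes that the detuning changes slowly between consecutive estimates and that the sweep starts inside the unambiguous range.

```python
import math

F_B = 1.25e9   # symbol rate
M = 2          # DPSK

def wrap(phase, m):
    """Fold a phase into the +/- pi/m ambiguity interval of the m-th power estimator."""
    period = 2 * math.pi / m
    return phase - period * round(phase / period)

def unwrap(estimates, m):
    """Remove jumps of multiples of 2*pi/m between consecutive estimates."""
    period = 2 * math.pi / m
    out = [estimates[0]]
    for x in estimates[1:]:
        jump = x - out[-1]
        out.append(x - period * round(jump / period))
    return out

# Sweep the detuning from 0 to 1 GHz in 10 MHz steps (illustrative grid)
detunings = [10e6 * k for k in range(101)]
true_phase = [2 * math.pi * df / F_B for df in detunings]
wrapped = [wrap(p, M) for p in true_phase]      # what the FE would report
unwrapped = unwrap(wrapped, M)                  # linear again after unwrapping
max_error = max(abs(u - t) for u, t in zip(unwrapped, true_phase))
```

With one sample per symbol, the limit $\pm \pi / 2$ for DPSK corresponds to $\pm f_b / 4 = \pm 312.5$ MHz, so a sweep up to 1 GHz crosses the limit several times and the unwrap is what restores the linear characteristic.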

The next test aimed to evaluate how fast the CR algorithm prototyped in the FPGA can effectively estimate and correct the frequency detuning. To this end, a triangle current waveform was added to the bias current of the DFB Tx to produce an optical frequency dithering. The amplitude of the dithering was set to ±250 MHz with respect to the central frequency, to match the constant-BER region of Fig. 8, and the frequency of the dithering was varied to determine the maximum tolerance. Fig. 9 shows the electrical spectra after photodetection for DPSK data at 1.25 Gb/s without dithering at the Tx (upper) and with ±250 MHz dithering amplitude (lower). The results plotted in Fig. 10, in terms of the BER as a function of the frequency of the optical dithering, show that the conventional CR can tolerate up to 70 kHz dithering frequency for 1 dB sensitivity penalty, whereas our proposed CR can tolerate up to 350 kHz. This is due to the substantial reduction in the number of operations and process delay of the CR algorithm, as analyzed in Table I, which results in faster tracking of the frequency error. Note that for these high dithering frequencies the LO feed-back no longer contributed, due to the limited speed of the thermal laser tuning. Thus, the frequency error correction was performed entirely by the DSP.
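To put these dithering frequencies in perspective, the frequency slope of the triangle waveform can be computed directly. The sketch interprets ±250 MHz as the peak deviation of an ideal triangle wave, which is an assumption, and uses $N = 2^8$ symbols per estimation block.

```python
F_B = 1.25e9        # symbol rate
PEAK_DEV = 250e6    # dithering amplitude, read as peak deviation (assumption)
N_BLOCK = 256       # estimation block length N = 2^8

def triangle_slew_rate(peak_dev_hz, dither_freq_hz):
    """Slope of a triangle wave spanning +/- peak_dev: 4 * A * f, in Hz/s."""
    return 4 * peak_dev_hz * dither_freq_hz

def drift_per_block(dither_freq_hz):
    """Optical frequency change accumulated over one estimation block, in Hz."""
    return triangle_slew_rate(PEAK_DEV, dither_freq_hz) * N_BLOCK / F_B

drift_conventional = drift_per_block(70e3)   # at the 70 kHz tolerance limit
drift_proposed = drift_per_block(350e3)      # at the 350 kHz tolerance limit
```

Under these assumptions the optical frequency moves by about 14 MHz per block at 70 kHz and about 72 MHz per block at 350 kHz, many orders of magnitude faster than the 3.6 MHz/s drift quoted for free-running DFB lasers.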

As a final test, the required channel spacing in the udWDM grid was assessed. The achieved high sensitivity of the coherent receiver (-55 dBm) enables power splitting among a large number of users, as many as 256, equivalent to $10 \cdot \log_{10} \left( \frac{1}{256} \right) = -24$ dB splitting losses. For the test, a second user (hereafter User 2) with another directly DPSK-modulated laser at 1.25 Gb/s with uncorrelated data was added. The received power of User 1 was adjusted to obtain BER $= 10^{-4}$ for the single-user case, then the optical frequency of User 2 was swept ±10 GHz with respect to User 1 to compute the power penalty at the reference BER. Fig. 11 shows the electrical spectra after photodetection for the two considered scenarios: both users emitting at the same optical power (upper), and User 1 emitting 15 dB lower than the interferer User 2 (lower). The latter emulated the maximum allowed differential optical path loss, as specified in the ITU-T standard for NG-PON2 [24]. The signal-to-interference ratio (SIR) denotes the optical power ratio between User 1 and the interferer User 2. Results in Fig. 12, in terms of the sensitivity penalty at BER $= 10^{-4}$ as a function of the spectral separation between users, indicate that for a maximum sensitivity penalty of 0.5 dB, the 6.25 GHz channel spacing can be implemented in the udWDM-PON even for 15 dB differential link loss with adjacent channels. We also remark that the results in Fig. 12 show the impact of a single adjacent channel interfering with the measured channel; however, the conclusions for the minimum channel separation at 0.5 dB penalty can be extended to the complete udWDM scenario, where the users are interfered by two adjacent channels at each side, by adding $\sim 0.6$ dB extra penalty, yielding 1.1 dB total power penalty at BER $= 10^{-4}$. These results were obtained with direct laser modulation and band-limited electronics, both of which influence the spectral width of the DPSK data. Further DSP at the transmitter for spectral shaping may lower the required channel spacing.
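The figures above can be combined into a rough power budget. The sketch uses only numbers quoted in this article (0 dBm Tx power, -55 dBm sensitivity, 256 users, 1.1 dB multi-channel penalty, 6.25 GHz spacing); it ignores splitter excess loss and mixes the sensitivity at BER $= 10^{-3}$ with the penalty measured at BER $= 10^{-4}$, so it should be read as an order-of-magnitude check rather than a link design.

```python
import math

TX_POWER_DBM = 0.0          # DFB Tx output power
SENSITIVITY_DBM = -55.0     # Rx sensitivity at the FEC threshold
USERS = 256                 # power-splitting ratio
CHANNEL_PENALTY_DB = 1.1    # total penalty with adjacent channels
CHANNEL_SPACING_GHZ = 6.25  # udWDM grid spacing

splitting_loss_db = 10 * math.log10(USERS)
power_budget_db = TX_POWER_DBM - SENSITIVITY_DBM
remaining_margin_db = power_budget_db - splitting_loss_db - CHANNEL_PENALTY_DB
occupied_band_ghz = USERS * CHANNEL_SPACING_GHZ   # per transmission direction
```

The ideal 1:256 split takes about 24.1 dB of the 55 dB budget, leaving close to 30 dB for fiber attenuation, connectors and other losses, while the 256 channels occupy 1.6 THz per direction.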

Although not experimentally tested in this work, the proposed CR also allows for the detection of higher-order differential modulation formats, like DQPSK or 8-PSK, to increase the spectral efficiency of the PON; these formats have already been demonstrated with direct phase modulation and differential detection [25]. Hence, a software-defined transceiver scenario can be envisioned with this CR architecture, where the modulation format is selected simply by changing the value of $m$ “on the fly”.

V. CONCLUSION

A simplified CR architecture based on differential detection for optical PSK intradyne receivers has been proposed and successfully tested in real time. The proposed method shares the 1-symbol complex correlation required for both the FE and the PR stages of the CR, thus lowering the required hardware resources, the power consumption, and the process delay of the whole CR algorithm. Notably, it increases by a factor of five the tolerance against fast wavelength drifts of the lasers compared with the conventional CR architecture.

The proposed CR was assessed in real-time with DPSK data at 1.25 Gb/s. Results show high sensitivity and tolerance to the phase noise, as well as robustness against the wavelength drifts of lasers, owing to simultaneous feed-forward DSP and feedback LO tuning strategies for frequency error correction. The achieved sensitivity of $-55$ dBm in a 6.25 GHz spaced udWDM grid, with commercial DFB lasers, direct phase modulation, and low-cost optical front-end, demonstrates that a cost-effective PON can be implemented with user terminals based on low-cost devices, low-speed ADCs and low-complexity DSP techniques, but still achieving high performance.

Furthermore, the proposed CR allows for fast reconfiguration to potentially detect multilevel modulation formats with the same digital receiver, intended for the future flexible optical networks with software-defined transceivers.

REFERENCES


