Grant Moulton headed up the OC-192 hardware team for Cerent after being hired by Hui Liu in late 1998. With Dyke Shaffer as his mentor, Grant quickly readied an OC-192 prototype transceiver in early 1999. However, the optical engineers had to wait for two ASICs – the BTC192 and SXC192 – to arrive to allow them to verify module functionality. Martin Roberts and Phu Le, an unparalleled wizard of hardware coding in the digital realm, were busy writing code for these two ASICs.
Although some within the engineering community believed no one would ever buy OC-192 capacity for the metro, these doubters were in the minority. Marketing projections showed the next bandwidth bottleneck would soon occur at OC-48 rates. On top of that, OC-192 capacity needs were beginning to appear in large metropolitan centers around the United States.
Industry analysts, including Ryan Hankin Kent (r.h.k.) at its STARTRAX ’98 conference, presented a graph of projected SONET spending. It showed that while capital spending on OC-3, OC-12, and OC-48 speeds would hold steady, OC-192 investment would grow by an average of 31 percent a year in the years ahead. Lucent missed the boat on this capability while Nortel cleaned up in the OC-192 segment, especially for the long-haul market. For Cerent, OC-192 orders for metro applications flooded in once this high bit rate feature on the company’s ‘454’ was announced.
But all was not rosy with Cerent’s (now Cisco’s) OC-192 development. The first hint of technical trouble occurred in January 2000 when bits coming from Phu’s OC-192 ASIC – the SXC192 – were “swizzled.” Bits were permuted or inadvertently rearranged in the data stream as the ASIC processed them. While mixing cream in your coffee is permissible, mixing up digital ones and zeroes is not. Meanwhile, Martin Roberts made the trek from England to Petaluma, and verified that the BTC192 ASIC performed correctly.
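To make the “swizzling” described above concrete, here is a minimal Python sketch of a fixed bit permutation and of one way a test engineer might expose it with one-hot patterns. The 8-bit width, the permutation table, and the function names are all hypothetical, chosen only for illustration; the source does not say which bits the SXC192 actually rearranged or how the fault was diagnosed.

```python
# Illustrative only: the width and permutation below are assumptions,
# not the actual SXC192 behavior.

def swizzle(word: int, perm: list, width: int = 8) -> int:
    """Return `word` with bit i moved to bit position perm[i]."""
    out = 0
    for i in range(width):
        if (word >> i) & 1:
            out |= 1 << perm[i]
    return out


def recover_permutation(device, width: int = 8) -> list:
    """Feed one-hot test words through `device` and record where each bit lands."""
    landed = []
    for i in range(width):
        result = device(1 << i)
        # A pure permutation maps a one-hot word to another one-hot word,
        # so the position of the single set bit is bit_length() - 1.
        landed.append(result.bit_length() - 1)
    return landed


# Hypothetical fault: adjacent bit pairs are swapped.
HYPOTHETICAL_PERM = [1, 0, 3, 2, 5, 4, 7, 6]


def faulty_device(word: int) -> int:
    """Stand-in for a data path that permutes bits."""
    return swizzle(word, HYPOTHETICAL_PERM)
```

In this toy model every bit survives (the count of ones is unchanged), yet the word that comes out differs from the word that went in, which is why a permuted stream is as unusable as a corrupted one.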
The schedule for the feature introduction allowed some wiggle room to fix this anomaly, unless a re-spin of the SXC192 ASIC was required. Investigations into the movement of bad bits commenced even as Cisco’s operations team balked at making this complex OC-192 board.
“This is too complicated for manufacturing,” Kevin Smith, Cisco’s operations leader, declared at the conclusion of a Feb 10, 2000 meeting held between engineering and manufacturing to discuss how the OC-192 board would be built. This contentious session marked the last time the Cerent 454 engineering team dealt with this group in San Jose. Carl Russo swooped in and became the OC-192 development team’s “concrete umbrella.” He shielded Grant, Hui, Phu, and other engineers in Petaluma from the Cisco naysayers. Carl trusted his engineering team. The vision was set: OC-192 was needed by customers, and so Carl had Tom Fallon assigned to manage the relationship with the Cisco operations folks in San Jose, with the objective of making 10G operation on the ‘454’ a reality.
Little progress was made by March 2000. Only a single useful frame of data emerged from the SXC192 ASIC. A band-aid was needed to avert an impending roll-out disaster – a missed product introduction of the biggest ‘454’ capability to date.
Things went from bad to worse in July 2000 as the sixth revision of the OC-192 boards using an assortment of ASIC-related band-aids failed. Pressure mounted as pending customer orders worth more than $100 million had to be filled, with QWEST leading the growing backlog for 10G capability. Customers were also clamoring for the Cisco (Cerent) solution that would give them an alternative to Nortel as a 10 Gbps supplier.
By September 2000, signal loopback on the OC-192 board was achieved, but connections through the ‘454’ backplane from the BTC192 to the SXC192 failed. This meant that traffic could not move into or out of the ‘454’ chassis.
What was going on?
Phu argued the ASICs were performing.
The optical engineers countered, disagreeing with Phu and the ASIC designers.
Both groups within engineering were culpable for the stalemate. A lack of communication on two fronts produced a string of failures: physical connectivity of the 10G bandwidth was non-existent and, for want of collaboration, the ASIC and hardware teams had never defined the system-level design of the OC-192 feature.
First, to get to the root of the problem, Phu was challenged over the algorithm he chose for the ASIC design. Decisions as to whether a digital bit was a “1” or a “0” were made at the edge of the data stream’s eye pattern, not at the center of the eye, as is typically done in transmission systems. This problem, as viewed by the hardware engineers, was not discovered until Phu was compelled to explain to them how he had designed the ASICs. This misstep would have been caught if an early architectural review of the 10G feature had been conducted at the outset.
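The difference between deciding at the center of the eye and deciding at its edge can be shown with a toy timing model. The Python sketch below is not a model of the actual ASIC decision circuit: the uniform jitter distribution, the peak jitter of 0.2 bit periods, the sampling phases, and all function names are assumptions made for illustration.

```python
import random


def make_stream(n_bits: int = 10_000, peak_jitter: float = 0.2, seed: int = 1):
    """Build random bits plus a timing offset for the edge at the start of each bit.

    Offsets are in bit periods and bounded by +/- peak_jitter (an assumed value).
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    jitter = [rng.uniform(-peak_jitter, peak_jitter) for _ in range(n_bits)]
    return bits, jitter


def count_errors(bits, jitter, phase: float) -> int:
    """Count wrong decisions when each bit is sampled at `phase` (0..1) of its period.

    Bit k nominally occupies the interval [k, k + 1); jitter shifts its edges.
    """
    errors = 0
    for k in range(1, len(bits) - 1):
        t = k + phase
        if t < k + jitter[k]:
            seen = bits[k - 1]   # leading edge arrived late: still the previous bit
        elif t >= k + 1 + jitter[k + 1]:
            seen = bits[k + 1]   # trailing edge arrived early: already the next bit
        else:
            seen = bits[k]
        if seen != bits[k]:
            errors += 1
    return errors


bits, jitter = make_stream()
center_errors = count_errors(bits, jitter, phase=0.5)   # middle of the eye
edge_errors = count_errors(bits, jitter, phase=0.05)    # near the eye crossing
```

Under these assumptions, sampling at mid-period can never land on the wrong bit because the jitter is smaller than half a bit period, while sampling close to the crossing misreads a bit whenever its leading edge arrives late and the previous bit had the opposite value.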
Hui had placed blind faith in Phu and, consequently, earlier attempts at sharing ASIC design methodology were spurned. After all, ASIC simulation “rules,” or so the ASIC designers believed. This thinking was, in part, like the previous mindset held by Fiberlane’s early ASIC designers – if simulation works, the ASIC will work. Designers are not infallible, however, especially when they are working on a systems-based product like the ‘454.’
By December 2000, Raghu Belur, one of Cerent’s OC-48 designers, and Martin Fornage, an experienced telecom engineer, were drafted to work full-time on the OC-192 problem. They supported the contention that the ASICs were problematic and a re-spin was needed, in spite of Phu’s protestations. On top of the internal engineering battles of the “Cerent” team, other technical problems plaguing the OC-192 board were solved one by one. A frustrating voltage stability problem was solved by taming the “evil regulator” and the excessive heat produced by the board was dealt with by employing a heat sink the size of the board itself while still keeping the cooling element within the width demands of a single shelf slot.
Jeff Hamilton-Gahart soon became an integral part of the OC-192 team, hired to develop the means to pull the heat out of the “hot ASICs” as well as find a way to route the fiber around the board without bending it too tightly. His heatsink was part of the controversy with Cisco’s operations team, who believed it was much too hard for them to build as designed. The heatsink was “actually a brilliant mechanical design able to solve all of the thermal problems faced.”
A major step forward occurred with the introduction of the BackPlane Interface ASIC (BPIA), a third (and new) ASIC placed between the BTC192 and the SXC192 ASICs. BPIA mitigated the noise of the other two chips and steadied the jitter of the high-speed data streams in order for an eye pattern to be characterized. Probe points were added to the latest iteration of the OC-192 board, a sign that prudent measurements replaced blind faith to validate the expectations of designers. On April 27, 2001, the first “awful” eye patterns were detected and by September 2001, the ultimate band-aid was applied. A second BPIA appeared and a second ‘454’ backplane was introduced to support the product’s 10G capability. OC-192 came to market almost two years after it had been announced.
The 18-month delay hurt Cisco’s credibility, cost the company more than $100 million in missed revenue, and added over $22 million in costs for scrapped OC-192 boards. Despite these setbacks in cost and time, not a single ‘454’ customer was lost to a competitor. Customer faith in the product remained strong during 2001 and 2002, in spite of the dot-com bust and the telecom meltdown.
Chalk it up to lessons learned, lessons made possible by the financial strength of Cisco to stick with Carl Russo, Tom Fallon, and its Cerent-acquired team members.
Had Cerent still been a private startup, this miscue on OC-192 would likely have sunk the company. Fortunately, the Cisco cocoon allowed the Cerent 454 to support its OC-192 capability. Phu and Grant persevered through thick and thin, and ultimately, with supportive colleagues in Petaluma, adopted the BPIA solution. In terms of achieving the fastest time to market, Grant says, “It was the correct thing to do.”
The 10 Gbps capability quickly became a big hit for Cisco and it spurred on the optical development that made wavelength division multiplexing in the metropolitan network mainstream.