Sean Murphy has the best one-sentence description of DAC that I have ever read:
The emotional ambience at DAC is what you get when you pour the excitement of a high school science fair, the sense of the recurring wheel of life from the movie Groundhog Day, and the auld lang syne of a high school re-union, and hit frappe.
That perfectly describes my visit with Oasys Design Systems at DAC.
Auld Lang Syne
When I joined Synopsys in June of 1992, the company had already gone public, but still felt like a startup. Logic synthesis was going mainstream, challenging schematic entry for market dominance. ASICs (they were actually called gate arrays back then) were heading towards 50K gates capacity using 0.35 µm technology. And we were aiming to change the world by knocking off Joe Costello’s Cadence as the #1 EDA company.
As I walked through the Oasys booth at DAC, I recognized familiar faces. A former Synopsys sales manager, now a sales consultant for Oasys. A former Synopsys AE, now managing business development for Oasys. And not to be forgotten, Joe Costello, ever the Synopsys nemesis, now an Oasys board member. Even the company’s tag line “the chip synthesis company” is a takeoff on Synopsys’ original tag line “the synthesis company”. It seemed like 1992 all over again … only 17 years later.
Groundhog Day
In the movie Groundhog Day, Bill Murray portrays Phil, a smug, self-centered, yet popular TV reporter who is consigned by the spirits of Groundhog Day to relive Feb 2nd over and over. After many tries, Phil is finally able to live a “perfect day” that pleases the spirits and he is able to move on, as a better person, to Feb 3rd.
As I mentioned in a previous post, I’ve seen this movie before. In the synthesis market, there was Autologic on Groundhog Day #1. Then Ambit on Groundhog Day #2. Then Get2chip on Groundhog Day #3. Compass had a synthesis tool in there somewhere as well. (I’m sure Paul McLellan could tell me when that was.) None of these tools, some of which had significant initial performance advantages, were able to knock off Design Compiler as market leader. This Groundhog Day it’s Oasys’ turn. Will this be the day they finally “get it right”?
Science Fair
A good science fair project is part technology and part showmanship. Oasys had the showmanship with a pre-recorded 7-minute rock medley featuring “Bass ‘n’ Vocal Monster” Joe Costello, Sanjiv “Tropic Thunder” Kaul, and Paul “Van Halen” Besouw. Does anyone know if this has been posted on YouTube yet?
On the technology side, I had one main mission at the Oasys booth … to find out enough about the RealTime Designer product to make my own judgment whether it was “too good to be true”. In order to do this, I needed to get a better explanation of the algorithms working “under the hood”, which I was able to get from founder Paul van Besouw.
For the demo, Paul ran on a Dell laptop with a 2.2 GHz Core Duo processor, although he claims that only 1 CPU was used. The demo design was a 1.6M instance design based on multiple instantiations of the open-source SPARC T1 processor. The target technology was the open-source 45nm Nangate library. Parts of the design flow ran in real time as we spoke about the tool, but unfortunately we did not run through the entire chip synthesis on his laptop in the 30 minutes I was there, so I cannot confirm the actual performance of the tool. Bummer.
Paul did describe, though, in some detail, the methods that enable their tool to achieve such fast turnaround time and high capacity. For some context, you need to go back in time to the origins and evolution of logic synthesis.
At 0.35 µm, gate delays were 80%+ of the path delay and the relatively small wire delays could be estimated accurately enough using statistical wire load models. At 0.25 µm, wire delays grew as a percentage of the path delay. The Synopsys Floorplan Manager tool allowed front-end designers to create custom wire load models from an initial floorplan. This helped maintain some accuracy for a while, but eventually was also too inaccurate. At 180 nm and 130 nm, Physical Compiler (now part of IC Compiler) came along to do actual cell placement and estimate wire lengths based on a global route. At 90 nm and 65 nm came DC-Topographic and DC-Graphical, further addressing the issues of wire delay accuracy and also layout congestion.
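To make the "statistical wire load model" idea concrete, here is a minimal sketch of how such a model estimates net capacitance from fanout alone, with no knowledge of where the cells actually sit. The table values, the extrapolation slope, and the simple linear delay formula are all made-up assumptions for illustration, not numbers from any real library or tool.

```python
# Hypothetical wire load table: fanout -> estimated net capacitance (pF).
# The numbers are invented for illustration, not taken from any real library.
WIRE_LOAD_PF = {1: 0.010, 2: 0.018, 3: 0.027, 4: 0.037}
SLOPE_PF_PER_FANOUT = 0.011  # assumed extrapolation slope past the last entry


def estimated_wire_cap(fanout: int) -> float:
    """Estimate net capacitance from fanout alone, with no placement data."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fanout = max(WIRE_LOAD_PF)
    if fanout <= max_fanout:
        return WIRE_LOAD_PF[fanout]
    # Beyond the table, extend linearly from the last entry.
    return WIRE_LOAD_PF[max_fanout] + SLOPE_PF_PER_FANOUT * (fanout - max_fanout)


def stage_delay_ns(intrinsic_ns: float, drive_res_kohm: float,
                   fanout: int, pin_cap_pf: float) -> float:
    """Simplified linear delay: intrinsic + R_drive * (wire cap + pin caps)."""
    load_pf = estimated_wire_cap(fanout) + fanout * pin_cap_pf
    return intrinsic_ns + drive_res_kohm * load_pf  # kOhm * pF = ns


if __name__ == "__main__":
    for fanout in (1, 2, 4, 8):
        print(fanout, round(stage_delay_ns(0.1, 2.0, fanout, 0.005), 4))
```

Every net with the same fanout gets the same estimate, whether its pins end up next to each other or on opposite sides of the die. That is fine while wires are a small slice of the path delay, and it is exactly what breaks down as they grow.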
These approaches seem to work well, but certain drawbacks are starting to appear:
- Much of the initial logic optimization takes place prior to placement, so the real delays (now heavily dependent on placement) are not available yet.
- The capacity is limited because the logic optimization problem scales faster than O(n). Although Synopsys has come out with methods to address the turnaround time issue, such as automatic chip synthesis, these approaches amount to not much more than divide and conquer (i.e., budget and compile).
- The placement developed by the front-end synthesis tool (e.g. DC-Topographic) is not passed on to the place and route tool. As a result, once you place the design again in the place and route tool, the timing has changed.
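A bit of back-of-the-envelope arithmetic shows why divide and conquer helps with turnaround time when runtime grows faster than linearly. The sketch below assumes runtime is proportional to n^k for some exponent k > 1. The value k = 1.5 is purely an assumption for illustration (all we know is "faster than O(n)"), and the model ignores the cost of budgeting and any optimization lost across block boundaries.

```python
def monolithic_runtime(num_instances: float, exponent: float) -> float:
    """Relative runtime of optimizing the whole design at once (n ** k)."""
    return num_instances ** exponent


def partitioned_runtime(num_instances: float, num_blocks: int,
                        exponent: float) -> float:
    """Relative runtime of optimizing equal-sized blocks one after another."""
    block_size = num_instances / num_blocks
    return num_blocks * block_size ** exponent


def speedup(num_instances: float, num_blocks: int, exponent: float) -> float:
    """Ratio of monolithic to partitioned runtime; equals m ** (k - 1)."""
    return (monolithic_runtime(num_instances, exponent)
            / partitioned_runtime(num_instances, num_blocks, exponent))


if __name__ == "__main__":
    # 1.6M instances split into 16 blocks, with an assumed exponent of 1.5.
    print(speedup(1_600_000, 16, 1.5))
```

Under these assumptions the speedup depends only on the number of blocks and the exponent, not on the design size. The catch is that each block is optimized against budgeted constraints rather than the real chip-level picture.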
According to Paul van Besouw, Oasys decided to take an approach they call “place first”. That is, rather than spend a lot of cycles in logic optimization before even getting to placement, they do an initial placement of the design as soon as possible so they are working with real interconnect delays from the start. Because of this approach, RealTime Designer can make meaningful optimizations almost immediately, in its first stage.
A second key strategy, according to van Besouw, is the RTL partitioning, which chops the design up into RTL blocks that are floorplanned and placed on the chip. The partitions are fluid, sometimes splitting apart, sometimes merging with other partitions during the optimization process as the design demands. The RTL can be revisited and changed for a new structure during the optimization as well. Since the RTL partitions are higher-level than gates, the number of design objects is much smaller, leading to faster runtime with a lower memory footprint, according to van Besouw. Exactly how Oasys does the RTL partitioning and optimizations is the “secret sauce”, so don’t expect to hear a lot of detail.
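To see why having a placement early matters, consider half-perimeter wirelength (HPWL), a common textbook estimate of net length from pin locations. To be clear, this is not a description of what RealTime Designer does internally (Oasys did not share that); it is just a minimal sketch, with made-up coordinates, of the kind of information a placement gives you that fanout alone cannot.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (x, y) pin location, e.g. in microns


def half_perimeter_wirelength(pins: Iterable[Point]) -> float:
    """Half the perimeter of the bounding box enclosing all pins of a net."""
    pts = list(pins)
    if len(pts) < 2:
        return 0.0  # a net with fewer than two pins has no wire to estimate
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


if __name__ == "__main__":
    # Two hypothetical placements of the same 3-pin net (same fanout).
    clustered = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]
    spread = [(0.0, 0.0), (400.0, 50.0), (120.0, 300.0)]
    print(half_perimeter_wirelength(clustered))
    print(half_perimeter_wirelength(spread))
```

Both nets have identical fanout, so a fanout-based wire load model would assign them the same capacitance, yet the placed estimates differ by more than two orders of magnitude. An optimizer that sees those lengths from the start can spend its effort on the paths that actually need it.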
Besides this initial RTL optimization and placement, there are 2 more phases of synthesis in which the design is further optimized and refined to a legal placement. That final placement can be taken into any place and route tool and give you better results than the starting point netlist from another tool, says van Besouw.
In summary, Oasys claims that they achieve faster turnaround time and higher capacity by using a higher level of abstraction (RTL vs. gate). They claim that they can achieve a better starting point for place and route, and better timing correlation with it, because they use actual placement from the start and feed that placement on to the place and route tool. And the better placement also runs faster because it converges faster.
What Does Harry Think?
Given the description that I got from Oasys at DAC, I am now convinced that it is “plausible” that Oasys can do what they claim. Although gory detail is still missing, the technical approach described above sounds exactly right, almost obvious when you think about it. Add to that the advantage of starting from scratch with modern coding languages and methods and not being tied to a 20-year-old code base, and you can achieve quite a bit of improvement.
However, until I see the actual tool running for myself in a neutral environment on a variety of designs and able to demonstrate faster timing closure through the place and route flow, I remain a skeptic. I’m not saying it is not real, just that I need to see it.
There are several pieces of the solution that were not addressed adequately, in my opinion:
- Clock tree synthesis - How can you claim to have a netlist and placement optimized to meet timing until you have a clock tree with its unique slew and skew? CTS is not addressed in this solution. (To be fair, it’s not addressed directly in Design Compiler either).
- A robust interface to the backend - Oasys has no backend tools in-house, which means that the work they have done integrating with 3rd party place and route has been at customer sites, either by them or by the customer. How robust could those flows be unless they have the tools in-house (and join the respective partner programs)?
- Bells and whistles - RealTime Designer can support multi-voltage, but not multi-mode optimization. Support for low power design is not complete. What about UPF? CPF? All of these are important in a real flow and it is not clear what support Oasys has.
- Tapeouts - This is probably the key question. For as long as EDA has existed, tapeouts have been the gold standard by which to evaluate a tool and its adoption. When I asked Paul if there are any tapeouts to date, he said “probably”. That seems odd to me. He should know.
However, if Oasys can address these issues, this might actually be the game changer that gets us out of the Groundhog Day rut and onto a new day.
harry the ASIC guy