DAC Theme #2 – “Oasys Frappe”

Sean Murphy has the best one sentence description of DAC that I have ever read:

The emotional ambience at DAC is what you get when you pour the excitement of a high school science fair, the sense of the recurring wheel of life from the movie Groundhog Day, and the auld lang syne of a high school re-union, and hit frappe.

That perfectly describes my visit with Oasys Design Systems at DAC.

Auld Lang Syne

When I joined Synopsys in June of 1992, the company had already gone public, but still felt like a startup. Logic synthesis was going mainstream, challenging schematic entry for market dominance. ASICs (they were actually called gate arrays back then) were heading towards 50K gates capacity using 0.35 µm technology. And we were aiming to change the world by knocking off Joe Costello’s Cadence as the #1 EDA company.

As I walked through the Oasys booth at DAC, I recognized familiar faces. A former Synopsys sales manager, now a sales consultant for Oasys. A former Synopsys AE, now managing business development for Oasys. And not to be forgotten, Joe Costello, ever the Synopsys nemesis, now an Oasys board member. Even the company’s tag line “the chip synthesis company” is a takeoff on Synopsys’ original tag line “the synthesis company”. It seemed like 1992 all over again … only 17 years later.

Groundhog Day

In the movie Groundhog Day, Bill Murray portrays Phil, a smug, self-centered, yet popular TV reporter who is consigned by the spirits of Groundhog Day to relive Feb 2nd over and over. After many tries, Phil is finally able to live a “perfect day” that pleases the spirits and he is able to move on, as a better person, to Feb 3rd.

As I mentioned in a previous post, I’ve seen this movie before. In the synthesis market, there was Autologic on Groundhog Day #1. Then Ambit on Groundhog Day #2. Then Get2chip on Groundhog Day #3. Compass had a synthesis tool in there somewhere as well. (I’m sure Paul McLellan could tell me when that was.) None of these tools, some of which had significant initial performance advantages, were able to knock off Design Compiler as market leader. This Groundhog Day it’s Oasys’ turn. Will this be the day they finally “get it right”?

Science Fair

A good science fair project is part technology and part showmanship. Oasys had the showmanship with a pre-recorded 7-minute rock medley featuring “Bass ‘n’ Vocal Monster” Joe Costello, Sanjiv “Tropic Thunder” Kaul, and Paul “Van Halen” Besouw. Does anyone know if this has been posted on YouTube yet?

On the technology side, I had one main mission at the Oasys booth … to find out enough about the RealTime Designer product to make my own judgment whether it was “too good to be true”. In order to do this, I needed a better explanation of the algorithms working “under the hood”, which I was able to get from founder Paul van Besouw.

For the demo, Paul ran on a Dell laptop with a 2.2 GHz Core Duo processor, although he claims that only 1 CPU was used. The demo design was a 1.6M instance design based on multiple instantiations of the open source Sparc T1 processor. The target technology was the open source 45nm Nangate library. Parts of the design flow ran in real time as we spoke about the tool, but unfortunately we did not run through the entire chip synthesis on his laptop in the 30 minutes I was there, so I cannot confirm the actual performance of the tool. Bummer.

Paul did describe, though, in some detail, the methods that enable their tool to achieve such fast turnaround time and high capacity. For some context, you need to go back in time to the origins and evolution of logic synthesis.

At 0.35 µm, gate delays were 80%+ of the path delay and the relatively small wire delays could be estimated accurately enough using statistical wire load models. At 0.25 µm, wire delays grew as a percentage of the path delay. The Synopsys Floorplan Manager tool allowed front-end designers to create custom wire load models from an initial floorplan. This helped maintain some accuracy for a while, but eventually was also too inaccurate. At 180 nm and 130 nm, Physical Compiler (now part of IC Compiler) came along to do actual cell placement and estimate wire lengths based on a global route. At 90 nm and 65 nm came DC-Topographic and DC-Graphical, further addressing the issues of wire delay accuracy and also layout congestion.

These approaches seem to work well, but certain drawbacks are starting to appear:

  1. Much of the initial logic optimization takes place prior to placement, so the real delays (now heavily dependent on placement) are not available yet.
  2. The capacity is limited because the runtime of logic optimization grows faster than linearly, i.e. faster than O(n), with design size. Although Synopsys has come out with methods to address the turnaround time issue, such as automatic chip synthesis, these approaches amount to not much more than divide and conquer (i.e., budget and compile).
  3. The placement developed by the front-end synthesis tool (e.g. DC-Topographic) is not passed on to the place and route tool. As a result, once you place the design again in the place and route tool, the timing has changed.
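
The second drawback can be made concrete with some toy arithmetic. The sketch below assumes, purely for illustration, that optimization cost grows as n^1.5 with instance count n; the exponent and the function names are hypothetical, not figures from Synopsys or Oasys. Under that assumption, splitting a design into k equal blocks cuts the cost by a factor of k^0.5, which is why budget-and-compile helps turnaround time.

```python
def flat_cost(n, exponent=1.5):
    """Relative cost of optimizing n instances in one flat run (toy model)."""
    return n ** exponent


def partitioned_cost(n, k, exponent=1.5):
    """Relative cost when the design is split into k equal blocks,
    each compiled separately (budgeting overhead is ignored)."""
    return k * (n / k) ** exponent


if __name__ == "__main__":
    n = 1_600_000  # instance count of the demo design mentioned in the post
    for k in (1, 4, 16, 64):
        speedup = flat_cost(n) / partitioned_cost(n, k)
        print(f"{k:>3} blocks: about {speedup:.1f}x cheaper than a flat run")
```

What the model leaves out is the price of that speedup: the cost of generating the budgets, and the optimizations lost because no single run ever sees a path that crosses a block boundary.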

According to Paul van Besouw, Oasys decided to take an approach they call “place first”. That is, rather than spend a lot of cycles in logic optimization before even getting to placement, they do an initial placement of the design as soon as possible so they are working with real interconnect delays from the start. Because of this approach, RealTime Designer can get to meaningful optimizations almost immediately in the first stage of optimization.
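
To see why placement matters so much for delay estimation, here is a minimal sketch contrasting a statistical wire load model with a placement-based estimate. The table values, coordinates, and function names are all made up for illustration, and half-perimeter wire length (HPWL) is used only because it is a common textbook estimator; nothing here is claimed to be what RealTime Designer actually does.

```python
# Hypothetical wire load table: fanout -> estimated wire length in microns.
WIRE_LOAD_TABLE = {1: 10.0, 2: 20.0, 3: 32.0, 4: 45.0}


def wlm_length(fanout):
    """Statistical estimate: depends only on fanout, not on where cells sit."""
    max_fanout = max(WIRE_LOAD_TABLE)
    return WIRE_LOAD_TABLE[min(fanout, max_fanout)]


def hpwl_length(pins):
    """Half-perimeter of the bounding box around placed pins, given as (x, y) in microns."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Two nets, each with one driver and two loads (fanout 2), placed very differently.
local_net = [(0.0, 0.0), (5.0, 2.0), (3.0, 6.0)]
cross_chip_net = [(0.0, 0.0), (900.0, 40.0), (450.0, 700.0)]

if __name__ == "__main__":
    for name, pins in (("local", local_net), ("cross-chip", cross_chip_net)):
        fanout = len(pins) - 1
        print(f"{name}: wire load model {wlm_length(fanout)} um, "
              f"placement-based {hpwl_length(pins)} um")
```

Both nets have the same fanout, so the wire load model gives them the same length, while the placement-based estimate separates a short local net from one that spans the chip. An optimizer working from the first number has no way to know which of the two paths actually needs help.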

A second key strategy according to van Besouw is the RTL partitioning, which chops the design up into RTL blocks that are floorplanned and placed on the chip. The partitions are fluid, sometimes splitting apart, sometimes merging with other partitions during the optimization process as the design demands. The RTL can be revisited and changed for a new structure during the optimization as well. Since the RTL partitions are higher-level than gates, the number of design objects is much smaller, leading to faster runtime with a lower memory footprint according to van Besouw. Exactly how Oasys does the RTL partitioning and optimizations is the “secret sauce”, so don’t expect to hear a lot of detail.

Besides this initial RTL optimization and placement, there are 2 more phases of synthesis in which the design is further optimized and refined to a legal placement. That final placement can be taken into any place and route tool and give you better results than the starting point netlist from another tool, says van Besouw.

In summary, Oasys claims that they achieve faster turnaround time and higher capacity by using a higher level of abstraction (RTL vs. gate). They claim that they can achieve a better starting point for and timing correlation with place and route because they use actual placement from the start and feed that placement on to the place and route tool. And the better placement also runs faster because it converges faster.

What Does Harry Think?

Given the description that I got from Oasys at DAC, I am now convinced that it is “plausible” that Oasys can do what they claim. Although gory detail is still missing, the technical approach described above sounds exactly right, almost obvious when you think about it. Add to that the advantage of starting from scratch with modern coding languages and methods and not being tied to a 20-year-old code base, and you can achieve quite a bit of improvement.

However, until I see the actual tool running for myself in a neutral environment on a variety of designs and able to demonstrate faster timing closure through the place and route flow, I remain a skeptic. I’m not saying it is not real, just that I need to see it.

There are several pieces of the solution that were not addressed adequately, in my opinion:

  1. Clock tree synthesis – How can you claim to have a netlist and placement optimized to meet timing until you have a clock tree with its unique slew and skew? CTS is not addressed in this solution. (To be fair, it’s not addressed directly in Design Compiler either.)
  2. A robust interface to the backend – Oasys has no backend tools in-house, which means that the work they have done integrating with 3rd party place and route has been at customer sites, either by them or by the customer. How robust could those flows be unless they have the tools in-house (and join the respective partner programs)?
  3. Bells and whistles – RealTime Designer can support multi-voltage, but not multi-mode optimization. Support for low power design is not complete. What about UPF? CPF? All of these are important in a real flow and it is not clear what support Oasys has.
  4. Tapeouts – This is probably the key question. For as long as EDA has existed, tapeouts have been the gold standard by which to evaluate a tool and its adoption. When I asked Paul if there are any tapeouts to date, he said “probably”. That seems odd to me. He should know.

However, if Oasys can address these issues, this might actually be the game changer that gets us out of the Groundhog Day rut and onto a new day.

harry the ASIC guy


5 Responses to “DAC Theme #2 – “Oasys Frappe””

  1. […] Oasys Design: Harry Gries on “DAC Theme #2: Oasys Frappe“ […]

  2. Jeff Flieder says:

    Thanks for posting this information about Oasys, it was very informative. As you know, I’m a long time synthesis guy as well, having worked at Synopsys for many years as an AE and Consulting Manager. What you probably don’t know is that for the past 7 years I’ve been at Cadence working with RTL-Compiler and associated products. I’d like to comment on some of your statements in this posting.
    First of all, comparing RTL-Compiler to all the other “Groundhog Day” products is a completely unfair comparison. The other synthesis tools you mentioned are all based on the same Berkeley Algorithms as Design-Compiler and you quite rightly refer to them as “me too products”. RTL-Compiler is very different in that it is based on an entirely new set of Global Synthesis algorithms that are unrelated to Berkeley Based synthesis tools. RTL-Compiler was architected from the ground up by a group of very experienced synthesis engineers, many of whom started their careers as software developers at Synopsys. They understood the shortcomings of single objective, rule based synthesis, and set out to create an entirely new type of synthesis tool. After many years of work they created RTL-Compiler, which is a multi-objective global synthesis tool. While I don’t have the time to go into all the details of this groundbreaking technology, or its impact on modern chip design, suffice it to say that we consistently see better results than any other synthesis tool on the market. I would be happy to discuss this in gory detail with you (and anyone else for that matter) at your convenience.
    I would like to address some of the technology that you mentioned in your blog and explain why I believe that the RTL-Compiler approach is better than what you describe.

    1. My first comment has to do with getting to meaningful optimization almost immediately in the synthesis flow and for this I could not agree more. In the RTL-Compiler world, we have something called Physical Layout Estimators that are used to accurately estimate 80 – 90% of the wires in the design during the initial synthesis. We have found this to give the best results while leaving the estimation of the last 10-20% of wires to our RC-Spatial or RC-Physical technology. These technologies use real placement to estimate these outlying wires.
    2. The RTL partitioning idea, while interesting, seems a bit impractical in today’s world of global design teams and IP reuse. It is very unlikely that any designer has the ability to let a tool repartition the design and move RTL from one layout block to another automatically. Most of these decisions are made based on the chip level architecture and data flow and are defined very early in a project. I would also imagine that this would make any post-synthesis verification a nightmare at best, and impossible at worst. And trying to perform an automated ECO after this would most likely be impossible.
    3. As for your comment on physical optimization after the initial synthesis, this is one place where RTL-Compiler really shines. Over the last few years we have developed significant advances in physical synthesis. Not only can we optimize based on legal placement, but we have also added the ability to modify the design in order to reduce congestion, improve timing, and reduce area and power, all based on detailed physical information. This technology also allows us to write out a full legal placement to feed forward to any P&R tool.

    So, in conclusion, I would like to say to Oasys welcome to the Logic Synthesis world. It has always been the case that more awareness of the issues surrounding synthesis is good for the tools and good for our customers. We are confident that we have the absolute best synthesis on the market and, with continued research and development, will continue to have it for the foreseeable future. If you would like to discuss this further, please feel free to contact me at jflieder@cadence.com. If nothing else, it would be good to catch up, it’s been a long time.

  3. harry says:

    Hi Jeff,

    Good to hear from you after all these years. I should have posted something about synthesis sooner 🙂

    Someone came up with an idea on Twitter that we really need to objectively benchmark the tools to see which one really performs the best. Just like with the BCS in college football, we really need a playoff to settle this thing on the field of play.

    If such a benchmark could be arranged, and if it were fair, would Cadence participate?

    Oasys and Synopsys: If you are reading this, would you participate?

    Moment of truth.


  4. Marc Swinnen says:

    I want to applaud Oasys for introducing new technology and trying to move synthesis forward. Taking risks and tackling the hard problems are what startups are supposed to do.

    From a technical perspective, I think you are absolutely correct with your first concern where you say it is not really possible to close timing without considering the real, propagated clock tree.

    You are also correct in pointing out that the other synthesis vendors do not optimize with the clock either.

    The best technical solution for your concern is a combination of Oasys’ product and a clock concurrent optimization tool like Rubix: Oasys does a very fast synthesis and placement to get timing into the ballpark. Rubix then does a detailed physical optimization while building the clock trees at the same time. Rubix essentially builds a skewed clock network specifically tailored to match the logic while at the same time optimizing the logic to match the clock. This offers a fundamental solution to the problem you highlight. There is really no reason to view physical synthesis and CTS as two different steps. And you can forget about clock balancing – that has become an increasingly irrelevant holdover from the past.

    Clock concurrent optimization closes design timing in propagated clock mode with full visibility of OCV derates, clock-gate timing, actual insertion delays, etc.

    I’m glad that you recognized that clocks are a major missing piece in traditional timing optimization approaches.

  5. Nick says:

    While the concepts fundamental to any logic synthesis system remain the same (even with global synthesis algorithms), all being derived from Berkeley synthesis in one form or another, Synopsys or any other CAD vendor would have made significant changes to their algorithms to accommodate growing design sizes and Moore’s law and, more importantly, to stay ahead of the competition.

    And any innovation in synthesis has been very incremental over a long period of time, with a lot of hard work. More recently (past 10 years!) there has been a push for physical synthesis, which feeds the placement information back into the logic synthesis environment. Almost every major CAD vendor has physical synthesis in the flow.

    In the present-day synthesis market, jargon and marketing claims just won’t cut it; only benchmarks will prove which synthesis tool is better.
