When I was in college, like most CS-type majors,[1] I took the algorithms class. Sorting, binary trees, big-O notation… all of that.
One of the things I didn’t understand was why such a big deal was made about some of them, like the graph algorithms. Why would I ever want a minimum spanning tree of a million nodes?
A bunch of years ago, I did need to compute the MST of a couple hundred thousand nodes for my job. Later on, I needed to do it in three dimensions. Something made me think of it recently, and I have a hankering to write about it.
I’ve also recorded a series of videos to explain the programming aspects; see the how-to sections below.
What problem was I trying to solve?
At the time, I was working on code to generate a clock buffer tree for large seas of logic gates. Say you have a million gates in your logic design and that 10% of those are clocked elements.[2] So we have 100k gates that need a clock signal.
Clock buffers can’t drive the capacitance of 100k gates, at least not if you want the design to run at more than 1 cycle per second. The solution is a clock tree: the input clock drives 5 buffers, which each drive 5 more buffers, and so on until everyone has their clock. With a fanout of 5, that works out to 8 levels of buffers for 100k sinks (5^8 = 390,625).
How do you decide which buffer drives which cell? That’s where the MST comes in. Once I had computed the MST of all of the clocked elements, I picked a random end (leaf) node of the MST[3] and collapsed it into its adjacent node. When a collapsed node passes a capacitance threshold (wiring capacitance needs to be counted too), that’s where I add a buffer. Keep doing this until you have your first level of buffers, then recursively do the same up the tree.
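To make that concrete, here’s a minimal sketch of the collapse step. This isn’t the code I wrote back then; the names (cluster_sinks, cap_threshold, buffer_input_cap) are made up, wiring capacitance is ignored, and the recursion to the upper levels is left out.

```cpp
#include <queue>
#include <vector>

// Hedged sketch: collapse MST leaves into their neighbors, cutting a
// cluster (i.e. placing a buffer) whenever the accumulated capacitance
// crosses a threshold. adj holds the MST as adjacency lists; cap[i] is
// sink i's input capacitance.
std::vector<std::vector<int>> cluster_sinks(const std::vector<std::vector<int>>& adj,
                                            std::vector<double> cap,
                                            double cap_threshold,
                                            double buffer_input_cap) {
    const int n = static_cast<int>(adj.size());
    std::vector<std::vector<int>> clusters, members(n);
    std::vector<int> degree(n);
    std::vector<bool> gone(n, false);
    std::queue<int> leaves;
    for (int i = 0; i < n; ++i) {
        members[i] = {i};
        degree[i] = static_cast<int>(adj[i].size());
        if (degree[i] == 1) leaves.push(i);
    }
    int remaining = n;
    while (remaining > 1 && !leaves.empty()) {
        int u = leaves.front();
        leaves.pop();
        if (gone[u] || degree[u] != 1) continue;       // stale queue entry
        int v = -1;                                    // u's one live neighbor
        for (int w : adj[u]) if (!gone[w]) { v = w; break; }
        gone[u] = true;
        --remaining;
        cap[v] += cap[u];                              // collapse u into v
        members[v].insert(members[v].end(), members[u].begin(), members[u].end());
        if (cap[v] > cap_threshold) {
            clusters.push_back(members[v]);            // a buffer goes here
            members[v].clear();
            cap[v] = buffer_input_cap;                 // only the buffer's load remains
        }
        if (--degree[v] == 1) leaves.push(v);
    }
    for (int i = 0; i < n; ++i)                        // leftovers at the root
        if (!gone[i] && !members[i].empty()) clusters.push_back(members[i]);
    return clusters;
}
```

Each returned cluster is the set of sinks one first-level buffer drives; the next level would treat those buffers as the new sinks.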
But some cells are more timing-critical than others.
In a design of 1M gates and 100k clocked elements, most of those gates will easily meet timing, no matter how badly you design things. Most. If memory serves me, we’re talking 90% (99%?) of the design meets timing, easy peasy.
So of those 100k gates, some are timing-critical, many are not. An easy solution is to have two subtrees: one for the easy stuff and the other for the hard. The problem there is that you can have one little cell surrounded by critical stuff. Do I really need to segregate it? How can I allow the user to control the thresholds for such things?
The solution I wanted to implement, but never did (though I’m convinced it would work), is to turn the problem into a 3D MST problem. The X and Y axes are the normal X/Y of the design. The third axis represents timing. I simply need to devise a timing-to-distance coefficient.
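As a sketch of what I mean, assuming each sink carries a timing slack number and k is the user-tunable coefficient (both names are mine, not from any real tool):

```cpp
#include <vector>

struct Sink   { double x, y, slack; };  // slack: how much timing margin a sink has
struct Point3 { double x, y, z; };

// Hypothetical sketch: lift 2D sinks into 3D by mapping slack onto the
// z axis. Two sinks with very different slacks end up far apart in z,
// so the 3D MST naturally steers them into different subtrees; k
// controls how strongly timing outweighs physical distance.
std::vector<Point3> lift_to_3d(const std::vector<Sink>& sinks, double k) {
    std::vector<Point3> out;
    out.reserve(sinks.size());
    for (const Sink& s : sinks)
        out.push_back({s.x, s.y, k * s.slack});
    return out;
}
```

With k = 0 you get the plain 2D behavior back, which makes it a natural knob to expose to the user.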
MST basics
First off, the MST algorithm is pretty easy. To quote wikipedia.org on Prim’s algorithm:
Initially, T contains an arbitrary vertex. In each step, T is augmented with a least-weight edge (x,y) such that x is in T and y is not yet in T. By the Cut property, all edges added to T are in the MST. Its run-time is either O(m log n) or O(m + n log n), depending on the data-structures used.
Sounds pretty easy. The problem is you need a graph to operate on, and I only had a set of points. The easy solution is to create a complete graph, where all nodes are connected to all other nodes. That’s O(n^2) edges in the number of nodes, and 100k*100k can take up some space.
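For scale, the quoted algorithm itself fits in a few lines. Here’s a minimal dense Prim sketch that runs directly on the points, so the complete graph never has to be stored, though you still pay for all O(n^2) distance computations:

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// Textbook dense Prim over the implicit complete graph of a point set.
// Returns parent[i] for each vertex; parent[0] == -1 marks the root.
std::vector<int> prim_mst(const std::vector<Pt>& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> parent(n, -1);
    std::vector<double> best(n, std::numeric_limits<double>::infinity());
    std::vector<bool> in_tree(n, false);
    best[0] = 0.0;
    for (int iter = 0; iter < n; ++iter) {
        int u = -1;                               // cheapest vertex not yet in T
        for (int i = 0; i < n; ++i)
            if (!in_tree[i] && (u < 0 || best[i] < best[u])) u = i;
        in_tree[u] = true;
        for (int v = 0; v < n; ++v) {             // relax edges (u, v)
            if (in_tree[v]) continue;
            double d = std::hypot(p[u].x - p[v].x, p[u].y - p[v].y);
            if (d < best[v]) { best[v] = d; parent[v] = u; }
        }
    }
    return parent;
}
```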
Triangulation to the rescue
Again, I refer to wikipedia.org:
every edge not in a Delaunay triangulation is also not in any EMST
I don’t really understand the math behind it, but the consequence is clear: the Euclidean MST is a subgraph of the Delaunay triangulation, so you can first compute the triangulation of your points and then run MST on that. The triangulation has only O(n) edges, which is what makes the MST step cheap.
The CGAL library will give you the Delaunay triangulation.
The Boost Graph Library will give you the MST.
The problem is that these libraries were difficult for me to really understand at first. Now that I’ve spent some time with them, they’re actually pretty simple. I’ve created some videos that will perhaps help you see them as simple too.
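To give a feel for how little code it ends up being, here’s a condensed sketch of the whole pipeline: CGAL’s 2D Delaunay triangulation feeding its edges into Boost’s Kruskal MST (Prim would work just as well). I’m glossing over details the videos cover, and the function name delaunay_mst is mine:

```cpp
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Delaunay_triangulation_2.h>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/kruskal_min_spanning_tree.hpp>
#include <cmath>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Delaunay_triangulation_2<K> Delaunay;
typedef K::Point_2 Point;
typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
                              boost::no_property,
                              boost::property<boost::edge_weight_t, double>> Graph;

// Triangulate the points, turn the triangulation's edges into a Boost
// graph weighted by length, and return the MST as index pairs into pts.
std::vector<std::pair<int, int>> delaunay_mst(const std::vector<Point>& pts) {
    Delaunay dt;
    dt.insert(pts.begin(), pts.end());

    // Number the triangulation vertices so they map to Boost vertex ids.
    std::map<Delaunay::Vertex_handle, int> id;
    int next = 0;
    for (auto v = dt.finite_vertices_begin(); v != dt.finite_vertices_end(); ++v)
        id[v] = next++;

    Graph g(next);
    for (auto e = dt.finite_edges_begin(); e != dt.finite_edges_end(); ++e) {
        // A CGAL edge is (face, index); its endpoints are the face's
        // two vertices other than vertex(index).
        auto u = e->first->vertex(dt.cw(e->second));
        auto w = e->first->vertex(dt.ccw(e->second));
        double len = std::hypot(u->point().x() - w->point().x(),
                                u->point().y() - w->point().y());
        boost::add_edge(id[u], id[w], len, g);
    }

    std::vector<boost::graph_traits<Graph>::edge_descriptor> mst;
    boost::kruskal_minimum_spanning_tree(g, std::back_inserter(mst));

    std::vector<std::pair<int, int>> out;
    for (auto e : mst)
        out.emplace_back(static_cast<int>(boost::source(e, g)),
                         static_cast<int>(boost::target(e, g)));
    return out;
}
```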
Visualization with ParaView
First, it’s important to be able to see the data. I’ve found that ParaView works well and is easy to use (once you know how to use it).
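One easy route in, and just my own habit rather than the only way: dump the points and MST edges as a legacy-VTK polydata file, which ParaView opens directly.

```cpp
#include <cstdio>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// Minimal legacy-VTK writer: points plus line cells. ParaView (File >
// Open) renders this as a wireframe of the MST. The z coordinate is 0
// here; the 3D timing variant would write the lifted z instead.
void write_vtk(const char* path, const std::vector<Pt>& pts,
               const std::vector<std::pair<int, int>>& edges) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return;
    std::fprintf(f, "# vtk DataFile Version 3.0\nmst\nASCII\nDATASET POLYDATA\n");
    std::fprintf(f, "POINTS %zu float\n", pts.size());
    for (const Pt& p : pts)
        std::fprintf(f, "%g %g 0\n", p.x, p.y);
    // Each line record is: point count (2), then the two point ids.
    std::fprintf(f, "LINES %zu %zu\n", edges.size(), 3 * edges.size());
    for (const auto& e : edges)
        std::fprintf(f, "2 %d %d\n", e.first, e.second);
    std::fclose(f);
}
```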
CGAL how-to
Boost Graph Library how-to
[1] My major was basically computer science, except it was in the math department. Instead of the normal 150 students per year, there were 18 of us doing the Math-with-CS variant.
[2] The design size was actually smaller, 100k–500k, and the percentage of clocked elements was higher. I’m going by memory, but I think the final number of nodes is in the same ballpark.
[3] I think I actually picked the bottom-right one, but it doesn’t really make a difference. The main benefit of the bottom right is that you get the same or similar results from one run to the next.