Japan Advanced Institute of Science and Technology
JAIST Repository
https://dspace.jaist.ac.jp/
Title Timing Design for Ultra-Fine LSI : Timing Issues in Nanotechnology LSI
Author(s) Kaneko, Mineo Citation
Issue Date 2007-03-07 Type Presentation Text version publisher
URL http://hdl.handle.net/10119/8303 Rights
Description
Presentation material from the 4th VERITE : JAIST/TRUST-AIST/CVS joint workshop on VERIfication TEchnology, held March 6 to 7, 2007, at the Japan Advanced Institute of Science and Technology, Knowledge Science lecture building, 2nd floor, medium lecture room
Timing Issues in Nanotechnology LSI
(Timing Design for Ultra-Fine LSI)
Mineo Kaneko
School of Information Science, Japan Advanced Institute of Science and Technology
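VLSI in the Year 2007
・Chip size: 1 cm × 1 cm
・Minimum width: 60 nm
・Clock frequency: 3 GHz (period 0.33 ns)
Arranging 1 mm φ wire in a baseball ground (167 m × 167 m):
$$\frac{60\,[\mathrm{nm}]}{1\,[\mathrm{cm}]} = \frac{1\,[\mathrm{mm}]}{167\,[\mathrm{m}]}$$
Light propagates 10 cm in 0.33 ns:
$$(3 \times 10^{8}\,[\mathrm{m/s}]) \times (0.33 \times 10^{-9}\,[\mathrm{s}]) = 0.1\,[\mathrm{m}]$$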
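The scale analogy and the light-travel figure can be checked with a few lines of arithmetic. This is only a sketch; the variable names are illustrative and do not come from the slides, and the speed of light is rounded to 3 × 10^8 m/s.

```python
# Scale analogy: 60 nm features on a 1 cm chip vs. a 1 mm wire on a field.
min_width = 60e-9       # 60 nm, in metres
chip_side = 1e-2        # 1 cm, in metres
wire_diameter = 1e-3    # 1 mm, in metres

# Field side that gives the same width-to-side ratio as the chip.
field_side = wire_diameter * chip_side / min_width

# Distance light travels in one clock period at 3 GHz.
speed_of_light = 3e8    # m/s (rounded)
clock_period = 1 / 3e9  # s, about 0.33 ns
light_distance = speed_of_light * clock_period

print(f"field side: {field_side:.0f} m")
print(f"light travel per cycle: {light_distance * 100:.0f} cm")
```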
Moore's Law (Gordon E. Moore)
Complexity doubles every 18 to 24 months
"Cramming more components onto integrated circuits", Electronics Magazine 19 April 1965
History of IC = History of Shrinking
Shrink → High Space-Density → More Transistors in a chip
Shrink → Improved Tr. performance → High Speed IC
Further shrinking → Large propagation delay, Inaccuracy in delay estimation
Electrical Aspect of VLSI
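MOS Transistor (dimensions L, W, oxide thickness d_ox)
Current I, Voltage V, Capacitance C_g:
$$I \propto d_{ox}^{-1} \cdot W \cdot L^{-1} \cdot V^{2}, \qquad C_g \propto d_{ox}^{-1} \cdot W \cdot L$$
Switching delay:
$$\mathit{delay} \propto \frac{C_g \cdot V}{I} = \frac{d_{ox}^{-1} \cdot W \cdot L \cdot V}{d_{ox}^{-1} \cdot W \cdot L^{-1} \cdot V^{2}} = L^{2} \cdot V^{-1}$$
Wire (dimensions L, H, W, oxide thickness d_OX)
Resistance R_W, Capacitance C_W:
$$R_W \propto L \cdot H^{-1} \cdot W^{-1}, \qquad C_W \propto L \cdot W \cdot d_{OX}^{-1}$$
$$\mathit{delay} \propto \frac{C \cdot V}{I} = C_W \cdot R_W = \frac{L \cdot W}{d_{OX}} \cdot \frac{L}{H \cdot W} = \frac{L^{2}}{H \cdot d_{OX}}$$
[Figure: transistor 1 (L1, W1, dox1) and transistor 2 (L2, W2, dox2) connected by wire r (Lr, Wr, Hr, dOXr)]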
Delay = Switching delay + Propagation delay
Elmore delay
(First Moment Model)
Difficulty in delay estimation: Need higher-order model
Various parasitic effects
Static/Dynamic delay fluctuation:
・Fluctuations of chemical density and physical size in the fabrication process
・Noise on supply voltage
・Cross-talk noise
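As an illustration of the Elmore (first-moment) model named above: for an unbranched RC ladder, the delay at the far end is the sum, over every series resistance, of that resistance times all capacitance downstream of it. The sketch below assumes such a simple chain; the function name and the element values are hypothetical and are not taken from the slides.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an unbranched RC ladder.

    resistances[k] is the series resistance of stage k, and
    capacitances[k] is the capacitance to ground at the node after it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream  # this resistor charges everything after it
        downstream -= c          # the next resistor no longer sees this node
    return delay


# Hypothetical values: a 100 ohm driver followed by two 50 ohm wire segments,
# with 1 fF, 1 fF and 2 fF at the three nodes.
print(elmore_delay([100.0, 50.0, 50.0], [1e-15, 1e-15, 2e-15]))
```

Because only the first moment of the response is used, the result is a quick estimate; the slide's point is that nanometre-scale wires need higher-order models than this.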
Electrical Aspect of VLSI
$$\mathit{delay} = \kappa \, d_{ox1} L_1 W_1^{-1} V^{-1} \left( d_{ox2}^{-1} W_2 L_2 + d_{OXr}^{-1} W_r L_r \right) + \lambda \, \frac{L_r}{W_r H_r} \left( d_{ox2}^{-1} W_2 L_2 + \frac{W_r L_r}{2\, d_{OXr}} \right)$$
Propagation delay is not improved by shrinking
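The claim can be illustrated with the proportionalities for the two delay components: switching delay proportional to L²/V and wire delay proportional to L²/(H·d_OX). The sketch below assumes, purely for illustration, that every dimension (including the wire length) shrinks by the same factor `s` while the voltage stays fixed; the function names and the factor are hypothetical.

```python
def switching_delay(L, V):
    """Relative switching delay, proportional to L**2 / V."""
    return L**2 / V


def wire_delay(L, H, d_ox):
    """Relative wire delay, proportional to R_W * C_W = L**2 / (H * d_ox)."""
    return L**2 / (H * d_ox)


s = 0.5  # assumed shrink factor applied to every dimension
print(switching_delay(s, 1.0) / switching_delay(1.0, 1.0))  # 0.25
print(wire_delay(s, s, s) / wire_delay(1.0, 1.0, 1.0))      # 1.0
```

Under these assumptions the switching delay drops to a quarter while the wire delay stays unchanged; for a wire whose length does not shrink, the same formula gives a delay that grows.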
Top-down Hierarchical Design of VLSI
Specification
・System-Level Design: HW/SW partitioning, Memory architecture
・Algorithm-Level Design: Data flow, Control flow
・Register Transfer-Level Design: Scheduling, Binding; Bus, multiplexer, net
・Logic-Level Design: Two-stage, Multi-stage; State assignment; Technology mapping
・Physical-Level Design: Transistor circuit, Module generation, Mask pattern design
Mask pattern
Timing issues in Data-path
Execution of c = a + b
[Figure: data path with registers R1, R2, R3 and functional unit FU1; timing chart of Register R1, Input of FU, Output of FU, and Register R3 over time]
・Control signal to latch a, b
・Functional delay + Propagation delay
・Arrival of the result c
・Control signal to latch c
Timing of the control signals determines data-path behavior
・Many operations share the same FU, many data share the same register.
・Various different delay values
ー Synchronous System ー
Clock signal
・Design needs "Delay estimation" + "Timing margin"
・Easy to implement as a circuit
・Worst-case estimation + Sufficient margin = Low performance
ー Asynchronous System ー
[Figure: data path with registers R1, R2, R3, functional unit FU1, and a controller; timing chart of Register R1, Input of FU, Output of FU, and Register R3 over time]
Controller: detect the arrival of the result c, then send out the latch control signal
・No delay estimation, no timing margin
・Tolerance to a large range of delay fluctuation
・Large area (circuit) overhead for the detection circuitry
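The performance contrast between the two styles can be sketched numerically. In the toy model below every number is hypothetical, and the per-operation detection overhead in time is an assumption added for illustration (the slides only mention an area overhead): the synchronous design pays the worst-case delay plus margin for every operation, while the asynchronous design pays each operation's actual delay.

```python
def synchronous_time(delays, worst_case, margin):
    """Every operation is given one clock period of worst_case + margin."""
    return len(delays) * (worst_case + margin)


def asynchronous_time(delays, detect_overhead):
    """Every operation takes its actual delay plus completion-detection time."""
    return sum(d + detect_overhead for d in delays)


actual = [5, 7, 4, 6]  # hypothetical actual delays of four operations
print(synchronous_time(actual, worst_case=8, margin=1))  # 36
print(asynchronous_time(actual, detect_overhead=1))      # 26
```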
What is High-Level Synthesis?
・Scheduling: determines the start time of each operation
・Resource Binding: assigns each operation to one of the available functional units, and assigns each data item to one of the available registers
・Result: Data-path part + Control part
[Figure: example with data labels u, v, w, x, y, dx, du, dv, dw, scheduled along the time axis and bound to functional units FU1, FU2 and registers R1, R2, R3]
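The two steps of high-level synthesis, scheduling and resource binding, can be sketched on a tiny dependence graph. The code below is an illustration under simple assumptions, not the method of the slides: scheduling is plain as-soon-as-possible (ASAP) with no resource limit, binding greedily reuses the first functional unit that is free, and the operation names, delays, and dependencies are hypothetical.

```python
def asap_schedule(delay, deps):
    """Start each operation as soon as all its predecessors have finished."""
    start = {}

    def visit(op):
        if op not in start:
            start[op] = max(
                (visit(p) + delay[p] for p in deps.get(op, [])), default=0
            )
        return start[op]

    for op in delay:
        visit(op)
    return start


def bind(start, delay):
    """Greedy binding: reuse the first functional unit that is already free."""
    free_at = []  # free_at[k] = time at which FU k becomes idle
    binding = {}
    for op in sorted(start, key=lambda o: (start[o], o)):
        for k, t in enumerate(free_at):
            if t <= start[op]:
                break
        else:
            free_at.append(0)
            k = len(free_at) - 1
        free_at[k] = start[op] + delay[op]
        binding[op] = f"FU{k + 1}"
    return binding


delay = {"o1": 2, "o2": 1, "o3": 2, "o4": 1}
deps = {"o3": ["o1", "o2"], "o4": ["o3"]}  # o3 needs o1 and o2; o4 needs o3
start = asap_schedule(delay, deps)
print(start)
print(bind(start, delay))
```

Register binding for data lifetimes could follow the same greedy pattern, with lifetimes in place of execution intervals.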
Behavioral description of an application algorithm
Wire-delay aware high-level synthesis: Scheduling, Resource binding
Layout (Floorplan), Connection
Delay information, Timing constraints
・Constraints on parallel execution
・Constraints on resource sharing
High-level synthesis under FU-delay dominant situation
A. New Approach to High-Level Synthesis
・Scheduling-Centric: Scheduling → Binding → Layout → Delay Extract → Evaluation
・Binding-Centric: Binding → Layout → Delay Extract → Scheduling → Evaluation
・3D-Approach: Binding, Layout, Scheduling → Delay Extract → Evaluation
A. 3D-Approach High-Level Synthesis
Computation algorithm to be implemented (Dependence Graph): $DG = (O, D, A)$
・$O$, $D$: Set of operations, and set of data
・$A$: Dependency
・$e: O \to N$; Operation delay
Sizes of functional units and registers
・Execution: Reconfiguration of a FU
・Data lifetime: Reconfiguration of a register
[Figure: each execution or data lifetime $i$ drawn as a box in x-y-t space, from $p_x(i)$ to $p_x(i) + w_i$ in x, from $p_y(i)$ to $p_y(i) + h_i$ in y, and from $\sigma(i)$ to $\sigma(i) + e(i)$ in t]
For each operation/data $i$: $p_x(i) \in N$, $p_y(i) \in N$, $\sigma(i) \in N$
・Need to check conflicts
・Need to check timing constraints
Naive solution space: $N^{3(|O| + |D|)}$
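The conflict check in the naive formulation can be sketched as a box-overlap test in x-y-t space. The code below is an illustration only: the `Box` type and its field names are hypothetical, and treating the intervals as half-open (so that a box ending at time 3 does not conflict with one starting at time 3) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Box:
    px: int     # x position
    py: int     # y position
    w: int      # width
    h: int      # height
    sigma: int  # start time
    e: int      # duration (execution time or data lifetime)


def overlaps(a0, a1, b0, b1):
    """True if the half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def conflict(a, b):
    """Two boxes conflict only if they overlap in x, y and t at once."""
    return (
        overlaps(a.px, a.px + a.w, b.px, b.px + b.w)
        and overlaps(a.py, a.py + a.h, b.py, b.py + b.h)
        and overlaps(a.sigma, a.sigma + a.e, b.sigma, b.sigma + b.e)
    )


first = Box(px=0, py=0, w=2, h=2, sigma=0, e=3)
later = Box(px=1, py=1, w=2, h=2, sigma=3, e=2)    # same area, later in time
clashing = Box(px=1, py=1, w=2, h=2, sigma=2, e=2) # same area, overlapping time
print(conflict(first, later), conflict(first, clashing))
```

Checking all pairs this way costs a quadratic number of tests per candidate placement, which is one reason a coded solution space is attractive.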
Constrained Sequence-Quintuple: 5-tuple $(\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4, \Gamma_5)$
[Figure: each execution or data lifetime $i$ drawn as a box in x-y-t space, from $p_x(i)$ to $p_x(i) + w_i$ in x, from $p_y(i)$ to $p_y(i) + h_i$ in y, and from $\sigma(i)$ to $\sigma(i) + e(i)$ in t]
・Each of $\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4$ is a permutation of the elements of $O \cup D$
・$\Gamma_5$ is a permutation of the elements of $O$ (a topological order w.r.t. DG)
・$(\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4, \Gamma_5)$ represents the relative spatial relation in x-y-t space.
・An $O((|O| + |D|)^2)$ computation-time algorithm computes $\{ p_x(i), p_y(i), \sigma(i) \mid i \in O \cup D \}$ which has the minimum layout area and the minimum makespan among all solutions satisfying the spatial relation specified by the code.
・The size of the solution space: $\left( (|O| + |D|)! \right)^{5}$
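The requirement that Γ5 be a topological order with respect to DG can be checked directly. The sketch below assumes the dependencies have already been reduced to (before, after) pairs between operations; that reduction, the function name, and the example data are hypothetical.

```python
def is_topological_order(gamma5, dependencies):
    """Check that every operation appears after all operations it depends on.

    gamma5 is a sequence of operation names; dependencies is a list of
    (before, after) pairs between operations.
    """
    position = {op: k for k, op in enumerate(gamma5)}
    return all(position[u] < position[v] for u, v in dependencies)


deps = [("o1", "o3"), ("o2", "o3"), ("o3", "o4")]
print(is_topological_order(["o2", "o1", "o3", "o4"], deps))
print(is_topological_order(["o3", "o1", "o2", "o4"], deps))
```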
A. New Approach to High-Level Synthesis
Basic Theory
Elementary Technology
・ Condition for feasible binding
・ Efficient solution space for 3D-Approach to High-Level Synthesis
・ Binding constrained scheduling
・ Data-path layout, performance estimation
Synthesis System
・ Synthesis system considering wire delay
・ Synthesis system for reconfigurable systems
・ Synthesis system considering control skew
・ Synthesis system for asynchronous systems
B. Design Considering Skew
[Figure: clocked data path with registers R1, R2, R3 and functional unit FU1; timing charts of R1 and R3 showing Max. delay, Min. delay, and Skew (timing difference)]
B. Schedule and Skew Optimization
Optimum schedule under zero-skew
・Minimum clock period = 8
・Schedule length = 3
・Total computation time = 8×3+0 = 24
[Figure: schedule with delay values 8, 6, 8, 3, 6, 4 and registers R1, R2, R3]
Applying skew optimization
・Skew values: $\{\tau_{r1}, \tau_{r2}, \tau_{r3}\} = \{0, 1, -1\}$
・Minimum clock period = 7
・Schedule length = 3
・Total computation time = 7×3+1 = 22
Simultaneous schedule and skew optimization
・Skew values: $\{\tau_{r1}, \tau_{r2}, \tau_{r3}\} = \{0, 1, 3\}$
・Minimum clock period = 5
・Schedule length = 3
・Total computation time = 5×3+3 = 18
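The three totals follow one pattern: clock period times schedule length, plus a constant term. A minimal check of the arithmetic, where `offset` is simply a name chosen here for the added term in the slide's sums:

```python
def total_time(clock_period, schedule_length, offset):
    """Total computation time as written on the slide: period * length + offset."""
    return clock_period * schedule_length + offset


print(total_time(8, 3, 0))  # zero skew: 24
print(total_time(7, 3, 1))  # skew optimization only: 22
print(total_time(5, 3, 3))  # simultaneous schedule and skew optimization: 18
```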
B. Skew-aware High-Level Synthesis
Basic Theory
Elementary Technology
・ Computational Complexity:
Fixed Schedule, Optimize Skew → P
Simultaneous Schedule and Skew Optimization (even if the execution order is fixed) → NP-hard
・ Exact algorithm to compute optimum skew
・ Heuristic algorithm for simultaneous schedule and skew optimization
Synthesis System
・ Binding-centric approach/3D approach to skew-aware data-path synthesis
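Why skew optimization under a fixed schedule is polynomial can be illustrated with the textbook difference-constraint formulation. This is a sketch under stated assumptions and not necessarily the exact algorithm of the talk: for each register-to-register path from i to j, setup requires τ_i + d_max ≤ τ_j + T and hold requires τ_i + d_min ≥ τ_j; register setup and hold times are ignored, periods are integers, and the function names and example paths are hypothetical. Feasibility for a given period T is then decided by Bellman-Ford relaxation.

```python
def feasible_skews(num_regs, paths, period):
    """Return skews meeting setup/hold for the period, or None if impossible.

    paths is a list of (src, dst, d_min, d_max) register-to-register paths.
    """
    # Each constraint (u, v, c) means tau[u] - tau[v] <= c.
    constraints = []
    for i, j, d_min, d_max in paths:
        constraints.append((i, j, period - d_max))  # setup
        constraints.append((j, i, d_min))           # hold
    tau = [0] * num_regs
    for _ in range(num_regs):
        changed = False
        for u, v, c in constraints:
            if tau[v] + c < tau[u]:
                tau[u] = tau[v] + c
                changed = True
        if not changed:
            lowest = min(tau)
            return [t - lowest for t in tau]  # shift so the smallest skew is 0
    return None  # still changing after num_regs passes: negative cycle


def min_period(num_regs, paths):
    """Smallest integer period for which some skew assignment is feasible."""
    upper = max(d_max for _, _, _, d_max in paths)
    for period in range(upper + 1):
        if feasible_skews(num_regs, paths, period) is not None:
            return period
    return None


# Hypothetical two-register loop: 0 -> 1 (min 6, max 8) and 1 -> 0 (min 3, max 4).
paths = [(0, 1, 6, 8), (1, 0, 3, 4)]
print(min_period(2, paths))         # 6, versus 8 with zero skew
print(feasible_skews(2, paths, 6))  # [0, 2]
```

Once the schedule itself becomes a variable, which paths are active in which cycle changes with the schedule, which is consistent with the NP-hardness noted above.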
C. Delay Fluctuation
[Figure: clocked data path with registers R1, R2, R3 and functional unit FU1; timing charts of R1 and R3 under Max. delay and Min. delay, showing a violation of the hold condition]
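The setup and hold conditions behind this picture can be written as a small checker. The inequalities below are the usual textbook forms and are an assumption here, as are the function name, parameter names, and numbers: with `skew` defined as the clock arrival at the destination register minus that at the source, setup needs d_max + t_setup ≤ period + skew, and hold needs d_min ≥ skew + t_hold.

```python
def check_timing(d_min, d_max, period, skew, t_setup=0, t_hold=0):
    """Return (setup_ok, hold_ok) for one register-to-register path.

    skew is the clock arrival time at the destination minus that at the source.
    """
    setup_ok = d_max + t_setup <= period + skew
    hold_ok = d_min >= skew + t_hold
    return setup_ok, hold_ok


print(check_timing(d_min=6, d_max=8, period=8, skew=0))  # both conditions hold
print(check_timing(d_min=1, d_max=8, period=7, skew=2))  # hold is violated
```

In the second call the minimum delay has dropped below the skew, so new data can reach the destination before the old data has been captured.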
C. Delay Fluctuation
[Figure: data path with registers R1, R2, R3 and functional unit FU1, comparing a double-latch scheme (clocks φ1, φ2) with the proposed method (MSS: master M with slaves S1, S2, shown for R1 and R2). Timing-chart labels: effective computation time, margin, setup/hold, through, input data, output data, overwrite, output timing, read timing, read/output timing]
High-Level, Logic-Level, Circuit-Level
Synthesis for VLSI which has the
D. Asynchronous System
[Figure: data path with registers R1, R2, R3, functional unit FU1, and controllers; timing charts of R1 and R3 under Max. delay and Min. delay]
Controller:
・Detect the output arrival
・Detect the latch completion
・Send out latch control signal
Setup and hold conditions are always satisfied.
[Figure: example asynchronous design. Left: schedule of operations o1 to o4 on FU_1 and FU_2 and of data d1 to d4 on Reg_1 to Reg_3 over time 0 to 9, with working and idle phases. Right: datapath (FU_1, FU_2, Reg_1 to Reg_6, MUX, DEMUX) and controller (C elements), with data d1 to d7 and control signals c1 to c4]
High-Level, Logic-Level, Circuit-Level
Synthesis for Asynchronous System
High performance/Low power/Reliable System on Chip
VLSI: a core device for reliable e-society
High speed, low power
・Propagation delay and power consumption on signal/clock wires
・Static/dynamic delay fluctuation
Large scale, system on chip
・Huge size of optimization problems
・Complex design constraints, a large number of design variables
Reliability
・Reliable chip: VLSI test, fault-tolerance
・Reliable design: Reliable EDA tools, 100% automation
Design methodologies to break through the design crisis
・Considering layout in high-level design
・Robustness, tolerance, insensitivity to delay fluctuation
・Efficient algorithms for huge problem sizes