
JAIST Repository: 極微細LSIのタイミング設計 : Timing Issues in Nanotechnology LSI




(1)

Japan Advanced Institute of Science and Technology

JAIST Repository

https://dspace.jaist.ac.jp/

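Title: 極微細LSIのタイミング設計 : Timing Issues in Nanotechnology LSI

Author(s): 金子, 峰雄 (Kaneko, Mineo)
Citation:

Issue Date: 2007-03-07
Type: Presentation
Text version: publisher

URL: http://hdl.handle.net/10119/8303
Rights:

Description: Presentation material from the 4th VERITE: JAIST/TRUST-AIST/CVS joint workshop on VERIfication TEchnology, held March 6 to 7, 2007, in the medium lecture room on the 2nd floor of the Knowledge Science lecture building, Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学)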

(2)

Timing Issues in Nanotechnology LSI (極微細LSIのタイミング設計)

Mineo Kaneko (金子峰雄)

School of Information Science, Japan Advanced Institute of Science and Technology

(3)

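VLSI in the Year 2007

Minimum width: 60 nm
Chip size: 1 cm × 1 cm
Clock frequency: 3 GHz (clock period 0.33 ns)

Scale analogy: arranging 1 mm φ wire in a baseball ground of 167 m × 167 m

$60\,[\mathrm{nm}] : 1\,[\mathrm{cm}] = 1\,[\mathrm{mm}] : 167\,[\mathrm{m}]$

Light propagates 10 cm in 0.33 ns.

$(3 \times 10^{8}\,[\mathrm{m/s}]) \times (0.33 \times 10^{-9}\,[\mathrm{s}]) = 0.1\,[\mathrm{m}]$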
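The two figures on this slide can be checked with a few lines of arithmetic. The sketch below only reuses the numbers quoted on the slide (60 nm, 1 cm, 1 mm, 3 GHz, and the rounded speed of light 3 × 10^8 m/s); the variable names are illustrative.

```python
# Sanity check of the slide's scale figures (all values taken from the slide).
min_width_m = 60e-9       # minimum feature width: 60 nm
chip_side_m = 1e-2        # chip side: 1 cm
wire_diameter_m = 1e-3    # analogy: a wire of 1 mm diameter

# Magnify the chip so that a 60 nm feature becomes a 1 mm wire.
scale = wire_diameter_m / min_width_m
ground_side_m = chip_side_m * scale
print(f"magnified chip side: {ground_side_m:.0f} m")

clock_hz = 3e9                   # 3 GHz clock
period_s = 1 / clock_hz          # about 0.33 ns
speed_of_light_m_per_s = 3e8     # rounded, as on the slide
distance_m = speed_of_light_m_per_s * period_s
print(f"clock period: {period_s * 1e9:.2f} ns, "
      f"light travels {distance_m * 100:.0f} cm per period")
```

Even at the speed of light, a signal covers only about ten chip widths per clock period, which is why wire propagation delay becomes a first-order design concern.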

(4)

Moore's Law (Gordon E. Moore)

Complexity doubles every 18 to 24 months

"Cramming more components onto integrated circuits", Electronics Magazine 19 April 1965

History of IC = History of Shrinking

Shrink → High Space-Density → More Transistors in a chip
Shrink → Improved Tr. performance → High Speed IC

Further shrinking → Large propagation delay, Inaccuracy in delay estimation,

(5)

Electrical Aspect of VLSI

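MOS Transistor: channel length L, channel width W, oxide thickness dox

Current I, Voltage V, Capacitance Cg

$I \propto d_{ox}^{-1} \cdot W \cdot L^{-1} \cdot V^{2}, \qquad C_g \propto d_{ox}^{-1} \cdot W \cdot L$

Switching delay:

$\mathrm{delay} \propto \dfrac{C_g \cdot V}{I} = \dfrac{d_{ox}^{-1} \cdot W \cdot L \cdot V}{d_{ox}^{-1} \cdot W \cdot L^{-1} \cdot V^{2}} = L^{2} \cdot V^{-1}$

Wire: length L, height H, width W, oxide thickness dOX

Resistance Rw, Capacitance Cw

$R_W \propto L \cdot H^{-1} \cdot W^{-1}, \qquad C_W \propto W \cdot L \cdot d_{OX}^{-1}$

$\mathrm{delay} \propto \dfrac{C_W \cdot V}{I} = C_W \cdot R_W = \dfrac{L^{2}}{H \cdot d_{OX}}$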
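To see what these proportionalities imply for shrinking, the following sketch evaluates the switching delay (∝ L²/V) and the wire delay (∝ L²/(H·dOX)) before and after scaling. The shrink factor `s = 0.5`, the constant supply voltage, and the function names are assumptions made for illustration; proportionality constants are omitted.

```python
def switching_delay(L, V):
    """Relative gate switching delay, delay ∝ L^2 / V (constant factor omitted)."""
    return L**2 / V

def wire_delay(L, H, d_ox):
    """Relative wire RC delay, delay ∝ L^2 / (H * d_OX) (constant factor omitted)."""
    return L**2 / (H * d_ox)

s = 0.5  # hypothetical shrink factor applied to every dimension

# Transistor: L shrinks by s, supply voltage assumed unchanged.
gate_ratio = switching_delay(s, 1.0) / switching_delay(1.0, 1.0)

# Local wire: length, height and oxide thickness all shrink by s.
local_wire_ratio = wire_delay(s, s, s) / wire_delay(1.0, 1.0, 1.0)

# Global wire: cross-section shrinks by s, but the length stays the same.
global_wire_ratio = wire_delay(1.0, s, s) / wire_delay(1.0, 1.0, 1.0)

print(gate_ratio, local_wire_ratio, global_wire_ratio)
```

Under these assumptions the gate gets four times faster, a fully scaled wire is no faster at all, and a wire whose length does not shrink becomes four times slower.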

(6)

Electrical Aspect of VLSI

[Figure: transistor 1 (L1, W1, dox1), wire r (Lr, Wr, Hr, dOXr), transistor 2 (L2, W2, dox2)]

Delay = Switching delay + Propagation delay

Elmore delay (First Moment Model):

$\mathrm{delay} = \kappa \cdot \dfrac{d_{ox1} L_1}{W_1 V} \left( \dfrac{W_r L_r}{d_{OXr}} + \dfrac{W_2 L_2}{d_{ox2}} \right) + \lambda \cdot \dfrac{L_r}{W_r H_r} \left( \dfrac{W_r L_r}{2\, d_{OXr}} + \dfrac{W_2 L_2}{d_{ox2}} \right)$

Propagation delay is not improved by shrinking

Difficulty in delay estimation:
・ Need higher-order model
・ Various parasitic effects
・ Static/Dynamic delay fluctuation

Static/Dynamic delay fluctuation:
・ Fluctuations of chemical density and physical size in the fabrication process
・ Noise on supply voltage
・ Cross-talk noise
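As a rough illustration of the Elmore (first-moment) model named on this slide, the sketch below computes the Elmore delay of an RC ladder and approximates a distributed wire by many small RC segments. The function names, the segment count, and the unit-less example values are assumptions for illustration, not values from the talk.

```python
def elmore_delay(stages, load_c=0.0):
    """Elmore delay from the source to the last node of an RC ladder.

    stages: non-empty list of (R, C) pairs ordered from driver to load;
            each C is attached to the node downstream of its R.
    load_c: extra capacitance attached to the final node.
    """
    caps = [c for _, c in stages]
    caps[-1] += load_c
    delay = 0.0
    upstream_r = 0.0
    for (r, _), c in zip(stages, caps):
        upstream_r += r            # total resistance between source and this node
        delay += upstream_r * c    # each capacitor is charged through that resistance
    return delay

def wire_stages(r_total, c_total, n=100):
    """Approximate a distributed wire by n equal RC segments."""
    return [(r_total / n, c_total / n)] * n

# Doubling the wire length doubles both its total R and its total C.
d1 = elmore_delay(wire_stages(1.0, 1.0))
d2 = elmore_delay(wire_stages(2.0, 2.0))
print(d1, d2, d2 / d1)
```

The wire term grows with the square of the wire length (the ratio printed above is close to 4), which is the quadratic dependence behind the statement that propagation delay does not benefit from shrinking.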

(7)

Top-down Hierarchical Design of VLSI

Specification
System-Level Design: HW/SW partitioning, Memory architecture
Algorithm-Level Design: Data flow, Control flow
Register Transfer-Level Design: Scheduling, Binding; Bus, multiplexer, net
Logic-Level Design: Two-stage, Multi-stage; State assignment; Technology mapping
Physical-Level Design: Transistor circuit; Module generation; Mask pattern design
Mask pattern

(8)

Timing issues in Data-path

Execution of c = a + b

[Figure: registers R1 and R2 feed functional unit FU1, whose result is latched into register R3. Timing chart along the time axis: control signal to latch a, b; register R1 (input of FU); functional delay + propagation delay; arrival of the result c (output of FU); control signal to latch c; register R3.]

Timing of control signals determines data-path behavior

・ Many operations share the same FU, many data share the same register.
・ Various different delay values

(9)

ー Synchronous System ー

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3. Timing chart along the time axis: clock signal; register R1 (input of FU); functional delay + propagation delay; arrival of the result (output of FU); register R3.]

・ Design needs "Delay estimation" + "Timing margin"
・ Easy to implement as a circuit
・ Worst-case estimation + Sufficient margin = Low performance

(10)

Timing issues in Data-path

ー Asynchronous System ー

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3, plus a controller. Timing chart along the time axis: register R1 (input of FU); functional delay + propagation delay; the controller detects the arrival of the result c (output of FU) and sends out the latch control signal; register R3.]

・ No delay-estimation, no timing-margin
・ Tolerance to a large range of delay fluctuation
・ Large area (circuit) overhead in detecting-circuitry
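The trade-off between the synchronous and asynchronous styles can be made concrete with a toy model. In the sketch below, a synchronous design pays the worst-case delay plus a margin on every step, while an asynchronous design pays the actual delay of each step plus a completion-detection overhead. The delay values, the margin, and the overhead are hypothetical numbers chosen for illustration, and operations are assumed to run strictly one after another.

```python
def synchronous_time(delays, margin):
    """Every step takes one clock period = worst-case delay + timing margin."""
    period = max(delays) + margin
    return period * len(delays)

def asynchronous_time(delays, detect_overhead):
    """Every step takes its actual delay plus a completion-detection overhead."""
    return sum(d + detect_overhead for d in delays)

delays = [5, 9, 4, 7]   # hypothetical actual delays of four sequential operations

sync_total = synchronous_time(delays, margin=1)
async_total = asynchronous_time(delays, detect_overhead=1)
print(sync_total, async_total)
```

The asynchronous total tracks the sum of the real delays, so it also absorbs delay fluctuation, at the price of the detection circuitry that the overhead term stands for.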

(11)

What is High-Level Synthesis?

Behavioral description of an application algorithm → Data-path part + Control part

Scheduling: determines the start time of each operation

Resource Binding: assigns each operation to one of the available functional units, and assigns each data to one of the available registers

[Figure: dependence graph over data u, v, w, x, y, dx, du, dv, dw; time chart of functional units FU1, FU2 and registers R1, R2, R3; resulting data-path with R1, R2, R3, FU1, FU2]
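A minimal sketch of the two tasks, scheduling and resource binding, is given below. It uses as-soon-as-possible (ASAP) scheduling and a greedy first-free binding, which are common textbook choices and not necessarily the methods used in the talk; the operation names, the delays, and the assumption that every operation can run on any functional unit are illustrative.

```python
def asap_schedule(delay, deps):
    """Earliest start time of each operation.

    delay: {op: execution time}
    deps:  {op: list of operations whose results op consumes}
    """
    start = {}

    def visit(op):
        if op not in start:
            start[op] = max((visit(p) + delay[p] for p in deps.get(op, [])),
                            default=0)
        return start[op]

    for op in delay:
        visit(op)
    return start

def bind_greedy(start, delay):
    """Assign each operation to the first functional unit that is idle."""
    free_at = []   # free_at[k] = time at which functional unit k becomes idle
    binding = {}
    for op in sorted(start, key=lambda o: (start[o], o)):
        for k, t in enumerate(free_at):
            if t <= start[op]:
                break
        else:                      # no idle unit found: allocate a new one
            k = len(free_at)
            free_at.append(0)
        free_at[k] = start[op] + delay[op]
        binding[op] = k
    return binding

# Hypothetical dependence graph: o3 needs o1 and o2, o4 needs o3.
delay = {"o1": 1, "o2": 1, "o3": 1, "o4": 1}
deps = {"o3": ["o1", "o2"], "o4": ["o3"]}

start = asap_schedule(delay, deps)
binding = bind_greedy(start, delay)
print(start, binding)
```

Here o1 and o2 start together and therefore need two functional units, while o3 and o4 reuse the first unit once it is idle.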

(12)

Layout (Floorplan) Connection Delay information Timing constraints

Wire-delay aware high-level synthesis: Scheduling Resource binding

Constraints on parallel execution

Constraints on resource sharing

High-level synthesis under FU-delay dominant situation

(13)

A. New Approach to High-Level Synthesis

Scheduling-Centric: Scheduling → Binding → Layout → Delay Extract → Evaluation
Binding-Centric: Binding → Layout → Delay Extract → Scheduling → Evaluation
3D-Approach: Binding, Layout, Scheduling (handled together) → Delay Extract → Evaluation

(14)

A. 3D-Approach High-Level Synthesis

[Figure labels: execution; reconfiguration of a FU; data lifetime; reconfiguration of a register]

(15)

Computation algorithm to be implemented (Dependence Graph): $DG = (O, D, A)$

$O$, $D$: set of operations, and set of data
$A$: dependency
$e : O \to N$; operation delay

Sizes of functional units and registers

[Figure: each operation or data $i$ occupies a box in x-y-t space, from $p_x(i)$ to $p_x(i) + w_i$ in x, from $p_y(i)$ to $p_y(i) + h_i$ in y, and from $\sigma(i)$ to $\sigma(i) + e(i)$ in t (execution or data lifetime)]

For each operation/data $i$: $p_x(i) \in N$, $p_y(i) \in N$, $\sigma(i) \in N$

Need to check conflicts

Need to check timing constraints

Naive solution space: $N^{3(|O| + |D|)}$
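The conflict check mentioned on this slide can be read as a box-intersection test in x-y-t space: two operations or data conflict when their boxes overlap in x, in y and in t at the same time. The sketch below follows that reading; the tuple layout `(px, py, sigma, w, h, e)`, the half-open interval convention, and the example items are assumptions for illustration.

```python
from itertools import combinations

def overlaps(a_lo, a_hi, b_lo, b_hi):
    """True if half-open intervals [a_lo, a_hi) and [b_lo, b_hi) intersect."""
    return a_lo < b_hi and b_lo < a_hi

def conflicts(items):
    """Return the pairs of items whose boxes intersect in x, y and t at once.

    items: {name: (px, py, sigma, w, h, e)} with position (px, py),
           start time sigma, width w, height h and duration e.
    """
    bad = []
    for (a, A), (b, B) in combinations(items.items(), 2):
        if all(overlaps(A[k], A[k] + A[k + 3], B[k], B[k] + B[k + 3])
               for k in range(3)):
            bad.append((a, b))
    return bad

# Hypothetical placement: o1 and o2 overlap in space and time;
# d1 reuses the area of o1, but only after o1 has finished.
items = {
    "o1": (0, 0, 0, 2, 2, 3),
    "o2": (1, 1, 1, 2, 2, 2),
    "d1": (0, 0, 3, 2, 2, 1),
}
print(conflicts(items))
```

Items that occupy the same area at disjoint times are not reported, which corresponds to sharing (or reconfiguring) the same physical resource over time.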

(16)

Constrained Sequence-Quintuple: 5-tuple $(\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4, \Gamma_5)$

[Figure: box in x-y-t space for an operation or data $i$, from $p_x(i)$ to $p_x(i) + w_i$ in x, from $p_y(i)$ to $p_y(i) + h_i$ in y, and from $\sigma(i)$ to $\sigma(i) + e(i)$ in t (execution or data lifetime)]

・ Each of $\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4$ is a permutation of elements in $O \cup D$
・ $\Gamma_5$ is a permutation of elements in $O$ (a topological order w.r.t. DG)
・ $(\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4, \Gamma_5)$ represents relative spatial relation in x-y-t space.
・ An $O((|O| + |D|)^2)$ computation-time algorithm computes $\{ (p_x(i), p_y(i), \sigma(i)) \mid i \in O \cup D \}$, which has the minimum layout area and the minimum makespan among all solutions satisfying the spatial relation specified by the code.
・ The size of the solution space: $((|O| + |D|)!)^5$
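The constraint that Γ5 must be a topological order of the operations with respect to the dependence graph is easy to check mechanically. The sketch below is one way to do so; the function name, the dictionary encoding of dependencies, and the example graph are assumptions for illustration, and every operation named in `deps` is assumed to appear in `order`.

```python
def is_topological_order(order, deps):
    """True if every operation appears after all operations it depends on.

    order: sequence of operations (a candidate for Γ5)
    deps:  {op: list of operations whose results op consumes}
    """
    position = {op: k for k, op in enumerate(order)}
    return all(position[p] < position[op]
               for op in order
               for p in deps.get(op, []))

# Hypothetical dependence graph: o3 needs o1 and o2, o4 needs o3.
deps = {"o3": ["o1", "o2"], "o4": ["o3"]}

print(is_topological_order(["o1", "o2", "o3", "o4"], deps))  # valid candidate
print(is_topological_order(["o3", "o1", "o2", "o4"], deps))  # o3 placed too early
```

Restricting Γ5 in this way removes codes whose time ordering would contradict a data dependency.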

(17)

A. New Approach to High-Level Synthesis

Basic Theory

Elementary Technology

・ Condition for feasible binding

・ Efficient solution space for 3D-Approach to High-Level Synthesis

・ Binding constrained scheduling

・ Data-path layout, performance estimation

Synthesis System

・ Synthesis system considering wire delay
・ Synthesis system for reconfigurable systems
・ Synthesis system considering control skew
・ Synthesis system for asynchronous systems

(18)

B. Design Considering Skew

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3, driven by a clock; timing charts of R1 and R3 showing max. delay, min. delay, and skew (timing difference)]

(19)

B. Schedule and Skew Optimization

Optimum schedule under zero-skew
Minimum clock period = 8
Schedule length = 3
Total computation time = 8×3+0 = 24

[Figure: data-path with registers R1, R2, R3 and path delays 8, 6, 8, 3, 6, 4]

Applying skew optimization → skew values
Minimum clock period = 7
Schedule length = 3
Total computation time = 7×3+1 = 22

) 1 , 1 , 0 ( ) , , ( 2 3 1 r r = − r τ τ τ 8 6 8 3 6 4 R1 R2 R3

Simultaneous schedule and skew optimization → skew value
Minimum clock period = 5
Schedule length = 3
Total computation time = 5×3+3 = 18

$(\tau_{r_1}, \tau_{r_2}, \tau_{r_3}) = (0, 1, 3)$

[Figure: data-path with registers R1, R2, R3 and path delays 8, 6, 8, 3, 6, 4]

(20)

B. Skew-aware High-Level Synthesis

Basic Theory

Elementary Technology

・ Computational Complexity:

Fixed Schedule, Optimize Skew → P

Simultaneous Schedule and Skew Optimization (even if the execution order is fixed) → NP-hard

・ Exact algorithm to compute optimum skew

・ Heuristic algorithm for simultaneous schedule and skew optimization

Synthesis System

・ Binding-centric approach/3D approach to skew-aware data-path synthesis
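One standard way to see why skew optimization under a fixed schedule is solvable in polynomial time is to write the setup and hold conditions as difference constraints and test them with Bellman-Ford. The sketch below does this for single-cycle register-to-register paths; it is a generic formulation, not necessarily the exact algorithm referred to on the slide. Register setup and hold times are ignored, path delays are assumed non-negative, and the path data are hypothetical.

```python
def feasible_skews(registers, paths, period):
    """Return skews satisfying setup/hold for the given period, or None.

    paths: list of (src, dst, d_min, d_max) register-to-register paths.
    Setup: t_src + d_max <= t_dst + period.  Hold: t_src + d_min >= t_dst.
    """
    # A constraint x_u - x_v <= c becomes an edge v -> u of weight c.
    edges = []
    for src, dst, d_min, d_max in paths:
        edges.append((dst, src, period - d_max))  # setup: t_src - t_dst <= T - d_max
        edges.append((src, dst, d_min))           # hold:  t_dst - t_src <= d_min
    dist = {r: 0 for r in registers}              # virtual source reaches all at 0
    for _ in range(len(registers)):
        changed = False
        for v, u, c in edges:
            if dist[v] + c < dist[u]:
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            return dist                           # converged: dist is a valid skew
    return None                                   # negative cycle: infeasible

def min_period(registers, paths):
    """Smallest integer clock period for which some skew assignment exists."""
    upper = max(p[3] for p in paths)              # zero skew works at the max delay
    for period in range(upper + 1):
        if feasible_skews(registers, paths, period) is not None:
            return period
    return None

# Hypothetical paths: R1 -> R2 (min 2, max 8) and R2 -> R3 (min 2, max 4).
registers = ["R1", "R2", "R3"]
paths = [("R1", "R2", 2, 8), ("R2", "R3", 2, 4)]

best = min_period(registers, paths)
print(best, feasible_skews(registers, paths, best))
```

With zero skew this example needs a period of 8 (the largest maximum delay), whereas letting R1 be clocked earlier than R2 brings the period down to 6, the bound imposed by the hold condition on the same path.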

(21)

C. Delay Fluctuation

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3, driven by a clock; timing charts of R1 and R3 showing max. delay and min. delay]

Violation of hold condition

(22)

C. Delay Fluctuation

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3. Register schemes compared by effective computation time and margin: double latch (φ1, φ2); master-slave (M, S; clock φ); proposed method MSS (M, S1, S2; clock φ). Timing annotations: setup, hold, read timing, output timing, read/output timing, input data, output data, overwrite, through.]

High-Level, Logic-Level, Circuit-Level

Synthesis for VLSI which has the

(23)

D. Asynchronous System

[Figure: data-path with registers R1, R2, functional unit FU1 and register R3, with controllers attached to R1 and R3; timing chart showing max. delay and min. delay]

・ Detect the output arrival
・ Detect the latch completion
・ Send out latch control signal

Setup and hold conditions are always satisfied.

(24)

[Figure: asynchronous synthesis example. Time chart (time 0 to 9, working phase and idle phase) of operations o1 to o4 on FU_1, FU_2 and data d1 to d7 on Reg_1 to Reg_3, with control signals c1 to c4; resulting datapath (Reg_1 to Reg_6, FU_1, FU_2, MUX, DEMUX, C) and controller, connected by data and control signal lines]

High-Level, Logic-Level, Circuit-Level

Synthesis for Asynchronous System

(25)

High performance/Low power/Reliable System on Chip

High speed, low power

・Propagation delay and power consumption on signal/clock wires
・Static/dynamic delay fluctuation

Large scale, system on chip

・Huge size of optimization problems

VLSI: a core device for reliable e-society

Efficient algorithms for huge size of problems

Design methodologies to break through the design crisis

Reliable chip: VLSI test, fault-tolerance

Reliability

Reliable design: Reliable EDA tools, 100% automation

Considering layout in high-level design

Robustness, tolerance, insensitiveness to delay fluctuation

・Complex design constraints, a large number of design variables
