

# Compositional Timing in Concurrent, Parallel, and Distributed Real-Time Systems

### Edward A. Lee

Robert S. Pepper Distinguished Professor UC Berkeley

### Invited Keynote Talk

3rd Workshop on Compositional Theory and Technology for Real-Time Embedded Systems

November 30, 2010, San Diego, CA, USA (co-located with RTSS 2010)

#### Thanks to:

- Danica Chang
- David Culler
- Gage Eads
- Stephen Edwards
- John Eidson
- Jeff Jensen
- Sungjun Kim
- Isaac Liu
- Slobodan Matic
- Jan Reineke

Hiren Patel

- Alberto Sangiovanni-Vincentelli
- Jia Zou

#### Abstract

Edward A. Lee Professor, UC Berkeley

of PREcision Timed (PRET) machines, which introduce temporal semantics timing can facilitate efficient parallel execution in suitably-designed concurrent tasks on a processor, enabling the sharing of resources coarse-grained time multiplexing of shared resources. In this talk, multicore architectures and in distributed real-time systems. with repeatable timing behavior. The approach builds on the concept In general-purpose computing, concurrency is achieved by relatively into an instruction-set architecture. I then explore how repeatable l explore alternatives that provide temporal isolation between

#### resources with physical systems Cyber-Physical Systems (CPS): Orchestrating networked computational



**Automotive** 



E-Corner, Siemens

**Telecommunications** 

(Soleil Synchrotron) *Instrumentation* 

generation and Power distribution



Military systems:



General Electric



Courtesy of Kuka Robotics Corp.

Lee, Berkeley 3

# Approaching the CPS Challenge

Physicalizing the cyber (PtC): to endow software and network components with abstractions and interfaces that represent their physical properties, such as dynamics in time



Cyberizing the Physical (CtP): to endow physical subsystems with cyber-like abstractions and interfaces

## The Problem We Address

```
// Perform the convolution.
                                            for (int i=0; i<10; i++) {
 x[i] = a[i]*b[j-i];
notify(x[i]);
                       // Notify listeners
```

Today, correct execution of a program has nothing to do with

how long it takes to do anything. We are correcting this by introducing precise and repeatable

timing into core computing infrastructure.



- Timing becomes a property of a hardware executing the program. *program*, not just a property of particular
- Timing of programs becomes predictable, repeatable, and portable
- Behavior of systems becomes predictable, repeatable, and portable

## Problems that Arise Today

- Execution time analysis requires very detailed models of processors.
- Execution time analysis is intractable for many real programs and processors.



- Multicore interactions disrupt timing.
- Network interactions disrupt timing. Pipeline stalls and speculation make timing of programs hard to predict and hard to control
- Memory hierarchy (caches) introduce timing variability and timing interference across tasks

Consequence: System behavior is hard to predict and control.

## These problems are not intrinsic to the technology!



Electronics technology delivers highly reliable and precise timing...



20.000 MHz (± 100 ppm)

... and the overlaying software abstractions discard it.

```
// Perform the convolution.
for (int i=0; i<10; i++) {
   x[i] = a[i]*b[j-i];
   // Notify listeners.
   notify(x[i]);
}</pre>
```

### **PRET Machines**

- **PRE**cision-Timed processors = **PRET**
- Predictable, REpeatable Timing = PRET
- Performance with REpeatable Timing = PRET

```
// Perform the convolution.
                                      for (int i=0; i<10; i++) { x[i] = a[i]*b[j-i];
notify(x[i]);
                   // Notify listeners
                             = PRET
```

Computing

With time

Lee, Berkeley 8

## Make Timing a Semantic Property of Computers PRET Machines:

Make temporal behavior as important as logical function.

Timing precision with performance: Challenges:

- ISAs with timing (repeatable instr. timing? deadline instructions?)
- Deep pipelines (interleaving?)
- Memory hierarchy (scratchpads? DRAM banks?)
- Predictable memory management (Metronome?)
- Languages with timing (discrete events? Giotto?)
- Predictable concurrency (synchronous languages?)
- Composable timed components (actor-oriented?)
- Multicore PRET (conflict-free routing?)
- Precision networks (TTA? Time synchronization?)

Edwards and Lee, "The Case for the Precision Timed (PRET) Machine," Wild and Crazy Ideas Track, Design Automation Conference (DAC), June 2007.

### **Pipelining**



Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2007.

## Pipeline Hazards



Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2007.

# An Alternative: Pipeline Interleaving



T0: cmp %g2, 9

T0: bg, a 40011b8

T0: add %i1, %i2, %l3



#### Stall pipeline

Dependencies result in complex timing behaviors

## Thread-interleaved pipeline:

T0: cmp %g2, 9 T1: add %o0, %g1, %g2

T3:bn 430011a0

T2: sub %g1, %g2, %g1

T4: ld [ %fp + -12 ], %g1

T5: cmp %g1, 4

T0: bg, a 40011b8

T1:cmp %g1,4

П t+1 t+2 t+3 t+4 t+5 O  $\mathcal{D}$ Ш R **≤** ≥ R 0+1 \$  $\leq$ Ш П R 8 t+7 t+8 t+9 t+10 t+1" 3 \$ ≥  $\mathbb{R}$ ≥ 8 ≥

Repeatable timing behavior of instructions

## Pipeline Interleaving

#### An old idea:

- o 1960s:
- CDC 6600
- Denelcore HEP
- C
- o 2000s
- Sandbridge Sandblaster
   (John Glossner, et al.)
- XMOS (David May, et al.)



Lee and Messerschmitt, Pipeline Interleaved Programmable DSPs ASSP-35(9), 1987.

processors with explicit multithreading." Computing Surveys 35(1): 29-63. There are various detractors. See Ungerer, T., B. Robic and J. Silc (2003). "A survey of

### **Compositional Timing** Precise, Regular Timing, and Pipeline Interleaving Enables

- If interrupts are always handled by thread 1, affected by I/O. then the timing of threads 2 through *n* is not
- XMOS approach: Activate thread n+1 to standard methods other threads, but much more smoothly than handle interrupts. This does affect the timing of

### Our solution: PRET Machines

Make temporal behavior as important as logical function.

Timing precision with performance: Challenges:

- ISAs with timing (repeatable instr. timing? deadline instructions?)
- Deep pipelines (interleaving?)
- Memory hierarchy (scratchpads? DRAM banks?)
- Predictable memory management (Metronome?)
- Languages with timing (discrete events? Giotto?)
- Predictable concurrency (synchronous languages?)
- Composable timed components (actor-oriented?)
- Multicore PRET (conflict-free routing?)
- Precision networks (TTA? Time synchronization?)

Edwards and Lee, "The Case for the Precision Timed (PRET) Machine," Wild and Crazy Ideas Track, Design Automation Conference (DAC), June 2007.

## Memory Hierarchy



Hennessey and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2007

- Register file is a temporary memory under program control.
- Why is it so small? Instruction word size
- Cache is a temporary memory under hardware control.
- Why is replacement strategy is application independent? Separation of concerns.

PRET principle: all temporary memory is under program control.

# Dynamic RAM vs Static RAM

- Static RAM (SRAM) is
- low capacity: each cell has six transistors
- low access latency
- typically on-chip in lower levels of memory hierarchy
- Dynamic RAM (DRAM) is
- high capacity: one transistor and one capacitor
- higher access latency
- typically off-chip in higher levels of memory hierarchy (with some recent exceptions)

# One Possible PRET Architecture



# What about Main Memory? Modern DRAMs:



atch

DDRn: 2<sup>n</sup> pipelined banks?

DDR3: Eight pipelined banks

# General-Purpose DRAM Controllers

- Timing is hard to predict even for single client:
- Timing of request depends on past requests:
- Request to same/different bank?
- Request to open/closed row within bank?
- Controller might reorder requests to minimize latency
- Controllers dynamically schedule refreshes

The result is non-composable timing.

## Predator (TU Eindhoven) and AMC (Barcelona) Predictable DRAM Controllers:

- Schedule predefined patterns:
- Decouple access timing from execution history
- Still schedule refreshes dynamically
- Allow to determine worst-case access timing (pessimistic regarding refreshes)
- o Composable arbitration:
- Round robin (AMC)
- Latency-rate servers + Buffers to hide interaction between clients (Predator)

## PRET DRAM Controller

- [Kim and Reineke]
- Exploit internal structure of DRAM module:
- Consists of 4-8 banks in 1-2 ranks
- Partition into four groups of banks in alternating ranks

Share only command and data bus, otherwise independent

Cycle through groups in a time-triggered fashion



- Successive accesses to timing constraints same group obey
- Reads/Writes to intertere different groups do not

predictable resources independent and Provides four

# Resulting PRET Architecture



## Make Timing a Semantic Property of Computers PRET Machines:

Make temporal behavior as important as logical function.

Timing precision with performance: Challenges:

- ISAs with timing (repeatable instr. timing? deadline instructions?)
- Deep pipelines (interleaving?)
- Memory hierarchy (scratchpads? DRAM banks?)
- Predictable memory management (Metronome?)
- Languages with timing (discrete events? Giotto?)
- Predictable concurrency (synchronous languages?)
- Composable timed components (actor-oriented?)
- Multicore PRET (conflict-free routing?)
- Precision networks (TTA? Time synchronization?)

Edwards and Lee, "The Case for the Precision Timed (PRET) Machine," Wild and Crazy Ideas Track, Design Automation Conference (DAC), June 2007.

# Extending an ISA with Timing Instructions

### [V1] Best effort:

```
set_time r1, 1s
// Code block
delay_until r1
```

## [V2] Late miss detection

```
set_time r1, 1s
// Code block
branch_expired r1, <target>
delay_until r1
```

## [V3] Immediate miss detection

```
set_time r1, 1s
exception_on_expire r1, 1
// Code block
deactivate_exception 1
delay_until r1
```

## [V4] Exact execution:

```
set_time r1, 1s
// Code block
MTFD r1
```

### Example: Exporting the Timed Semantics to a \_ow-Level Language (like C)

```
tryin (500ms) {
  // Code block
} catch {
  panic();
}
```

non zero when invoked by longjmp(). setjmp() returns 0 when directly invoked, and returns using setjmp and longjmp to handle timing exceptions. This realizes variant 3, "immediate miss detection,"

run, causing the panic procedure to be called. but returning non zero. The else statement will then be run longjmp, which will return control flow to setjmp, exception 0 will be thrown, and the handler code will If the code block takes longer than 500ms to run, then

```
jmp_buf buf;
if (!setjmp(buf)){
    set_time r1, 500ms
    exception_on_expire r1, 0
    // Code block
    deactivate_exception 0
} else {
    panic();
}
exception_handler_0 () {
    longjmp(buf)
}
```

This pseudocode is neither C-level nor assembly, but is meant to explain an assembly-level implementation.

# Summary of ISA extensions

- [V1] Execute a block of code taking at least a specified time [lp & Edwards, 2006]
- [V2] Do [V1], and then conditionally branch if the specified time was exceeded
- [V3] Do [V1], but if the specified time is exceeded to an exception handler. during execution of the block, branch immediately
- [V4] Execute a block of code taking exactly the specified time. MTFD

#### Variants:

- •For V2 V4, may not impose minimum execution time.
- Time may be literal (seconds) or abstract (cycles).

# A Brief Word on Multicore PRET Machines

Make temporal behavior as important as logical function.

Timing precision with performance: Challenges:

- ISAs with timing (repeatable instr. timing? deadline instructions?)
- Deep pipelines (interleaving?)
- Memory hierarchy (scratchpads? DRAM banks?)
- Predictable memory management (Metronome?)
- Languages with timing (discrete events? Giotto?)
- Predictable concurrency (synchronous languages?)
- Composable timed components (actor-oriented?)
- Multicore PRET (conflict-free routing?)
- Precision networks (TTA? Time synchronization?)

Edwards and Lee, "The Case for the Precision Timed (PRET) Machine," Wild and Crazy Ideas Track, Design Automation Conference (DAC), June 2007.

### Multicore PRET

A preliminary project by Dai Bui shows that control over timing enables conflict-free routing of messages in a network on chip.

This means that it becomes possible to have programs on a multicore PRET with compositional timing!



## **Opportunities** A Few of the (Many) Remaining Challenges and

- DRAM designs today compromise efficiency even with private banks (e.g. write-after-read latencies)
- Interleaved pipelines may not be the best choice for power optimization
- Exposing timing properties in programming models (completely absent in today's languages)
- I/O mechanisms that do not disrupt repeatable timing
- More work needed on multicore.
- .

## A Top Down Approach:

Make Timing a Semantic Property of Software Components

"model time" and "real time" bound at sensors and actuators. PTIDES: Distributed execution under discrete-event semantics, with



## PTIDES: Programming Temporally Integrated Distributed Embedded Systems

events can be safely processed (preserving DE semantics) PTIDES uses static causality analysis to determine when



## PTIDES: Programming Temporally Integrated Distributed Embedded Systems

analyze control system dynamics... ... and being explicit about time delays means that we can



Lee, Berkeley 33

### First Test Case



This device was designed by Jeff Jensen, now at National Instruments.

- Tunneling Ball Device
- sense ball
- track disk
- adjust trajectory



Lee, Berkeley 34

# Tunneling Ball Device in Action





# Tunneling Ball Device

Mixed event sequences





# Synchrophasor Measurement & Control Second Test Case: Distributed



#### Experiment Diagram

Grid emulator built with
National Instruments PXI
Primary Measurement
Unit (PMU) built with
Renesas demo boards

Renesas demo boards
with DP83640

Ethernet bridge or 1588
boundary/transparent clock

Synchrophasor Vector
Processing unit (SVP) built
with Renesas demo board with
DP83640



# The question addressed by the PTIDES project:

software? some bounded error, how does this change If you assume that computers on a network how we develop distributed real-time can agree on the current time of day within

Our answer: It changes everything!

with synthesis of embedded software. on distributed discrete-event (DE) models Our approach: Model-based design based

# Synchronization with Bounded Error Distributed PTIDES Relies on Network Time

Press Release October 1, 2007



#### NEWS RELEASE

#### National Semiconductor Naomi Mitchell Media Contact naomi.mitchell@nsc.com 408) 721-2142

Reader Information

Design Support Group (800) 272-9959 www.national.com

### Industry's First Ethernet

Outstanding Clock Accuracy Hardware Support from National Semiconductor Delivers Transceiver with IEEE 1588 PTP

Achieve 8- Nanosecond Precision with Maximum System Flexibility Using DP83640, Designers May Choose Any Microcontroller, FPGA or ASIC to

#### This may become routine!

on a LAN agree on the techniques like NTP within 8ns, far more current time of day to precise than older With this PHY, clocks

A question we are this change how we addressing at CPS software? develop distributed Berkeley: How does



# An Extreme Example: The Large Hadron Collider

and synchronous ethernet. 10 km apart to within about 80 psec using a combination of IEEE 1588 PTP The WhiteRabbit project at CERN is synchronizing the clocks of computers



# More Generally than PTIDES: Rethinking Software Compone

Object Oriented vs. Actor Oriented Rethinking Software Components to Admit Time.

The established: Object-oriented:



What flows through an object is sequential control

Things happen to objects

### The alternative: Actor oriented:



Actors make things happen

What flows through an object is evolving data

Input data

Output data

# Examples of Actor-Oriented Systems

- UML 2 and SysML (activity diagrams)
- ASCET (time periods, interrupts, priorities, preemption, shared variables )
- Autosar (software components w/ sender/receiver interfaces)
- Simulink (continuous time, The MathWorks)
- LabVIEW (structured dataflow, National Instruments)
- SCADE (synchronous, based on Lustre and Esterel)
- CORBA event service (distributed push-pull)
- ROOM and UML-2 (dataflow, Rational, IBM)
- VHDL, Verilog (discrete events, Cadence, Synopsys, ...)
- Modelica (continuous time, constraint-based, Linkoping)
- **OPNET** (discrete events, Opnet Technologies)
- SDL (process networks)
- Occam (rendezvous)
- SPW (synchronous dataflow, Cadence, CoWare)

0

The semantics of these differ considerably in their approaches to concurrency and time. Some are loose (ambiguous) and some rigorous. Some are strongly actororiented, while some retain much of the flavor (and flaws) of threads.

## **Actor-Oriented Design** Ptolemy II: Our Laboratory for Experiments with



#### Conclusions

computational systems. Today, timing is a property only of realizations of

computational models Tomorrow, timing will be a semantic property of



## The Ptolemy Pteam





New Text: Lee & Seshia: Introduction to Embedded Systems - A Cyber-Physical Systems Approach

http://LeeSeshia.org/

This book strives to identify and introduce the durable intellectual ideas of embedded systems as a technology and as a subject of study. The emphasis is on modeling, design, and analysis of cyber-physical systems, which integrate computing, networking, and physical processes.