
Loops

for, while, do-while, repeat, forever, foreach — synthesisable vs simulation-only.

Module 5 · Page 5.4

for — The Workhorse Loop

The for loop is identical in form to C. It is the most common loop in SystemVerilog RTL because synthesis tools can unroll it: when the bounds are known at compile time, the tool simply replicates the loop body N times in hardware — no actual loop exists in the synthesised netlist.

SystemVerilog — for loop
// ── Basic for loop ──────────────────────────────────────────────
always_comb begin
parity = 1'b0;
for (int i = 0; i < 8; i++) // unrolls to an 8-input XOR (seven 2-input gates after optimisation)
parity ^= data[i];
end
 
// ── for with parameter (synthesis-friendly) ─────────────────────
parameter WIDTH = 16;
 
always_comb begin
result = '0;
for (int i = 0; i < WIDTH; i++)
if (mask[i]) result[i] = data[i];
end
 
// ── Nested for (2D array init) ───────────────────────────────────
initial begin
for (int r = 0; r < 4; r++)
for (int c = 0; c < 4; c++)
mem[r][c] = r * 4 + c;
end
 
// ── Declare loop variable inside for (SV style) ─────────────────
always_comb begin
popcount = '0;         // clear first so the block does not read its own previous value
for (int unsigned k = 0; k < 32; k++)
popcount += data[k];   // k is local to this for block
end

🏗 Synthesis Insight: What Loop Unrolling Actually Produces

When synthesis unrolls for(int i=0; i<8; i++) parity ^= data[i];, it does not build eight sequential XOR operations. It builds one 8-input XOR function, which a typical tool maps to seven 2-input XOR gates arranged as a balanced tree (the XOR with the initial 1'b0 is optimised away). The synthesized netlist is functionally identical to writing parity = data[0]^data[1]^data[2]^data[3]^data[4]^data[5]^data[6]^data[7]; directly. The critical path through the tree is 3 gate levels (log₂8); the absolute delay depends on the cell library and loading. The loop variable i does not exist in the netlist: it is purely a code-generation device that the synthesizer uses once and discards. This is why for loops in RTL are powerful: they let you describe N-way parallel hardware concisely without manually writing N statements.
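The equivalence described above can be checked in simulation. The following is a minimal sketch, not taken from the original page: the module name tb_parity_equiv and the signals parity_loop and parity_red are made up for illustration. It compares the loop form against the reduction-XOR operator over all 256 input values.

```systemverilog
// Sketch: compare the for-loop parity with the reduction operator ^data.
// All names in this module are illustrative assumptions.
module tb_parity_equiv;
  logic [7:0] data;
  logic       parity_loop, parity_red;

  // Loop form: the pattern discussed above
  always_comb begin
    parity_loop = 1'b0;
    for (int i = 0; i < 8; i++)
      parity_loop ^= data[i];
  end

  // Reduction form: the same 8-input XOR written as a single operator
  assign parity_red = ^data;

  initial begin
    for (int v = 0; v < 256; v++) begin
      data = v[7:0];
      #1;  // let both combinational processes settle
      if (parity_loop !== parity_red)
        $error("parity mismatch for data=%b", data);
    end
    $display("parity equivalence check finished");
    $finish;
  end
endmodule
```

If the two forms really describe the same function, the run should finish without any $error message. In RTL, the reduction operator is often the shorter way to write this particular loop.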

🧠 How the Simulator Executes for in always_comb

In simulation, a for loop inside always_comb executes sequentially — iteration 0 completes before iteration 1 starts. But since the entire always_comb block runs in the Active region (zero simulation time), all 8 iterations finish within a single delta cycle. From the waveform viewer's perspective, the output parity changes atomically when any input changes — it appears combinational. This is simulation-accurate: sequential execution within zero time is equivalent to parallel evaluation.

while — Condition-First Loop

A while loop evaluates its condition before executing the body. If the condition is false from the start, the body never runs. In simulation it runs as long as the condition remains true. In synthesis it is only valid if the tool can prove the iteration count is bounded and constant.

SystemVerilog — while loop
// ── Simulation: while loop with dynamic condition ───────────────
initial begin
count = 0;
while (count < 100) begin  // runs 100 times
@(posedge clk);
count++;
end
end
 
// ── RTL: while synthesises only with statically bounded count ───
always_comb begin
logic [7:0] tmp;           // scratch copy, assigned below on every evaluation
tmp = data;
lz = 0;
while (tmp[7] == 0 && lz < 8) begin  // at most 8 iterations; tool support for this form varies
tmp = tmp << 1;
lz++;
end
end
 
// ── Testbench: wait for handshake ───────────────────────────────
initial begin
while (!dut.ready) @(posedge clk); // spin until DUT asserts ready
$display("DUT ready at t=%0t", $time);
end

do-while — Body Executes at Least Once

The do-while loop evaluates its condition after the body. This guarantees the body runs at least once, regardless of the initial condition value. It is less common in RTL but useful in testbenches where you need at least one cycle of stimulus.

SystemVerilog — do-while loop
// ── Body runs at least once even if condition is false initially ─
initial begin
do begin
@(posedge clk);          // always samples at least one edge
data_in = $random;
end while (data_in != 8'hFF); // keep driving until 0xFF
end
 
// ── Compare: while vs do-while ──────────────────────────────────
initial begin
count = 10;
 
// while — condition false immediately, body never runs
while (count < 5) begin count++; end  // body skipped
 
count = 10;
 
// do-while — body runs once, then condition checked
do begin count++; end while (count < 5);  // body runs once, count=11
end

repeat — Fixed Iteration Count

repeat(N) executes its body exactly N times. N can be any expression — it is evaluated once at the start and then held fixed. Unlike for, there is no loop variable. It is frequently used in testbenches to drive a fixed number of clock cycles or transactions.

SystemVerilog — repeat loop
// ── Drive 10 clock cycles ───────────────────────────────────────
initial begin
repeat(10) @(posedge clk);
$display("10 cycles done at t=%0t", $time);
end
 
// ── Send N transactions (N from parameter) ──────────────────────
initial begin
repeat(NUM_PKTS) begin
@(posedge clk);
valid = 1;
data  = $random;
@(posedge clk);
valid = 0;
end
end
 
// ── Synthesis: repeat with constant N synthesises ───────────────
always_comb begin
shifted = data;
repeat(4) shifted = {shifted[6:0], 1'b0};  // left-shift 4 times = <<4
end
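
The statement that repeat evaluates its count once on entry can be probed with a small experiment. This is a sketch under that assumption; the module name tb_repeat_once and the variables n and iterations are hypothetical.

```systemverilog
// Sketch: modify the count expression inside the body and see how often the body ran.
// Names are illustrative assumptions, not part of the original examples.
module tb_repeat_once;
  int n          = 3;
  int iterations = 0;

  initial begin
    repeat (n) begin
      n = 10;        // written inside the body, after the count was already evaluated
      iterations++;
    end
    $display("iterations=%0d n=%0d", iterations, n);
    $finish;
  end
endmodule
```

If the count is fixed on entry, as described above, the body runs three times even though n is 10 afterwards. A for loop with the condition i < n would behave differently, because its condition is re-evaluated on every iteration.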

forever — Runs Without End

forever has no condition and no counter — it runs indefinitely. It is simulation-only: synthesis tools will reject it because they cannot unroll an infinite loop into hardware. It is the correct way to write clock generators, bus monitors, and any testbench process that must run for the duration of simulation.

SystemVerilog — forever loop (simulation only)
// ── Clock generator ─────────────────────────────────────────────
initial begin
clk = 0;
forever #5 clk = ~clk;   // 10-ns period
end
 
// ── Bus monitor ─────────────────────────────────────────────────
initial begin
forever begin
@(posedge clk);
if (valid && ready)
$display("[%0t] Transfer: data=%0h", $time, data);
end
end
 
// ── Watchdog (ends simulation if timeout) ───────────────────────
initial begin
forever begin
@(posedge clk);
if (cycle_count > 10_000) begin
$error("TIMEOUT: simulation exceeded 10000 cycles");
$finish;
end
end
end

🔍 Debugging Insight: How to Detect a Zero-Time Forever Loop

Symptom: the simulation starts, prints nothing, consumes 100% CPU, and never terminates. No VCD output is created. The simulator is stuck executing your forever body over and over in zero simulation time, and because the process never suspends, the scheduler never regains control. Diagnostic steps: (1) Check all forever blocks for a missing @, #, or wait. (2) Search for always begin (without a sensitivity list or timing control), which carries the same zero-time loop risk. (3) Interrupt the run from the simulator's interactive prompt or debugger; the location where it stops usually points at the offending loop. (4) Check your simulator's documentation for loop-detection and iteration-limit options. Keep in mind that a delta-cycle iteration limit is aimed at oscillation between processes and may not fire for a single process that never yields.

💡 Senior Verification Engineer Tip: forever vs always for Clock Generation

Both always #5 clk = ~clk; and initial begin clk=0; forever #5 clk=~clk; end are meant to generate a 10 ns clock. The difference is the starting value. If clk is a 4-state logic with no initial value, it starts as X, and because ~X is still X the always form never produces a usable clock unless clk is initialised elsewhere (or is declared as a 2-state bit, which starts at 0). The initial + forever form sets clk=0 at T=0 and toggles at T=5, T=10, and so on. An X on the clock can also cause spurious timing-check violations. Production testbenches should use the initial + forever pattern with an explicit initial clock value.
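
The two clock styles can be compared side by side. The sketch below uses hypothetical names (tb_clk_forms, clk_always, clk_init) and relies on the fact that inverting an X in a 4-state variable yields X.

```systemverilog
// Sketch: always-based clock without initialisation vs initial + forever.
// Names are illustrative assumptions.
module tb_clk_forms;
  logic clk_always;   // 4-state and never initialised: starts as X
  logic clk_init;

  // ~X evaluates to X, so this assignment keeps writing X
  always #5 clk_always = ~clk_always;

  // Explicit starting value, then toggle every 5 time units
  initial begin
    clk_init = 1'b0;
    forever #5 clk_init = ~clk_init;
  end

  initial begin
    #23;
    $display("clk_always=%b clk_init=%b", clk_always, clk_init);
    $finish;
  end
endmodule
```

At the sampling point clk_always is still x, while clk_init has toggled at 5, 10, 15 and 20 and is back at 0. Declaring clk_always as bit, or giving it an initial value, removes the X.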

foreach — Iterate Over Arrays Automatically

foreach is SystemVerilog's array-aware loop. It automatically declares the loop variable and derives the bounds from the array declaration, so you never have to write the size by hand. It works on fixed-size, dynamic, and associative arrays, on queues, and on packed dimensions, and it handles multi-dimensional arrays elegantly.

SystemVerilog — foreach loop
// ── 1D array ────────────────────────────────────────────────────
int arr[8];
 
initial begin
foreach (arr[i])         // i automatically declared, range 0..7
arr[i] = i * 2;
end
 
// ── 2D array ────────────────────────────────────────────────────
int matrix[4][4];
 
initial begin
foreach (matrix[r, c])   // r iterates rows, c iterates columns
matrix[r][c] = r + c;
end
 
// ── Dynamic array ───────────────────────────────────────────────
int dyn[] = new[16];     // allocate in the declaration: a bare assignment is not legal at module scope
 
initial begin
foreach (dyn[k])
dyn[k] = $random;
end
 
// ── Queue ───────────────────────────────────────────────────────
string q[$] = {"alpha", "beta", "gamma"};
 
initial begin
foreach (q[j])
$display("q[%0d] = %s", j, q[j]);
end
 
// ── Partial dimension (outer loop only) ─────────────────────────
int cube[4][4][4];
 
initial begin
foreach (cube[x])        // iterate only the first dimension
cube[x][0][0] = x;
end

🚀 RTL Design Insight: foreach in Synthesis vs Simulation

foreach on a static unpacked array (fixed size, declared with logic arr[8]) is synthesizable: the tool knows the size at compile time and unrolls it exactly like a for loop. foreach on a dynamic array (logic arr[]) or queue (logic arr[$]) is simulation-only, because the iteration count cannot be determined at synthesis time. This is a common mistake when porting verification code to RTL: a foreach that worked in the testbench fails in synthesis because the array was dynamic.
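
A minimal sketch of the synthesizable case follows. The module name byte_invert, its ports, and the parameter N are assumptions made for this illustration, and support for unpacked array ports should be confirmed for the synthesis tool in use.

```systemverilog
// Sketch: foreach over a fixed-size unpacked array in RTL.
// byte_invert, din, dout and N are hypothetical names.
module byte_invert #(parameter int N = 4) (
  input  logic [7:0] din  [N],
  output logic [7:0] dout [N]
);
  always_comb begin
    foreach (din[i])        // N is fixed at elaboration, so this unrolls N times
      dout[i] = ~din[i];    // every element is assigned on every evaluation
  end
endmodule
```

Because the array size comes from a parameter, the iteration count is known at elaboration, which is the condition the paragraph above describes. Replacing din with a dynamic array or queue would remove that guarantee.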

Synthesis vs. Simulation — Which Loops Are Synthesisable?

The golden rule is simple: a loop synthesises if and only if the synthesiser can statically determine the number of iterations at compile time. If it cannot, the tool will error out.

| Loop type | Synthesisable? | Condition | Typical use |
|---|---|---|---|
| for | Yes | Bounds must be compile-time constants | Bit manipulation, parallel operations, RTL |
| repeat(N) | Yes | N must be a constant expression | Fixed replication, RTL and testbench |
| foreach | Yes | Array size must be fixed (packed/unpacked static arrays) | Array initialisation and transformation |
| while | Conditional | Only if tool can prove bounded iteration count | RTL if bounded; simulation freely |
| do-while | Conditional | Only if tool can prove bounded iteration count | Testbenches, one-or-more semantics |
| forever | No | Infinite, cannot unroll | Clock generators, monitors, testbench processes |

🏗 RTL Loop Patterns — What Real Hardware Engineers Write

Every production RTL design contains loops. They appear in every data-path block — ALUs, encoders, decoders, shifters, CRC generators, memory controllers. Here are the canonical patterns that show up repeatedly in code reviews.

SystemVerilog — Production RTL Patterns Using for Loops
// ── Pattern 1: Parameterized parity generator ─────────────────────
module parity_gen #(parameter int WIDTH = 32) (
input  logic [WIDTH-1:0] data,
output logic             parity_even, parity_odd
);
always_comb begin
parity_even = 1'b0;
for (int i = 0; i < WIDTH; i++)
parity_even ^= data[i];  // XOR tree: log₂(WIDTH) gate levels
parity_odd = ~parity_even;
end
endmodule
// Synthesis: 32-input XOR tree → 5 gate levels (log₂32); absolute delay depends on the cell library
 
// ── Pattern 2: Population count (popcount / Hamming weight) ──────
module popcount #(parameter int W = 32) (
input  logic [W-1:0]        data,
output logic [$clog2(W):0]  count
);
always_comb begin
count = '0;
for (int i = 0; i < W; i++)
count += data[i];   // W single-bit adders → adder tree
end
endmodule
// Synthesis: W=32 → 5-level carry-save adder tree
// Used in: network packet header processing, LDPC decoders, fault injection
 
// ── Pattern 3: Parameterized priority encoder ─────────────────────
module priority_enc #(parameter int N = 8) (
input  logic [N-1:0]     req,
output logic [$clog2(N)-1:0] grant_id,
output logic              valid
);
always_comb begin
grant_id = '0;
valid    = 1'b0;
for (int i = N-1; i >= 0; i--) begin  // scan high→low: lowest index wins
if (req[i]) begin
grant_id = i[$clog2(N)-1:0];
valid = 1'b1;
end
end
end
endmodule
// for loop scans MSB to LSB → last overwrite wins = lowest-index priority
 
// ── Pattern 4: 8-bit CRC-8 (serial) using for in always_ff ───────
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
crc <= 8'hFF;
end else if (valid) begin
automatic logic [7:0] tmp = crc;
for (int i = 0; i < 8; i++) begin
if (tmp[7] ^ data_in[7-i])
tmp = (tmp << 1) ^ 8'h07;  // CRC-8/SMBUS polynomial
else
tmp = tmp << 1;
end
crc <= tmp;
end
end
// for loop inside always_ff: synthesizes to an 8-stage combinational chain
// All 8 iterations are evaluated within one clock cycle; only the final tmp is registered
 
// ── Pattern 5: Find-first-set-bit using for ───────────────────────
always_comb begin
first_set = '0;
found     = 1'b0;
for (int i = 0; i < 32; i++) begin
if (!found && data[i]) begin
first_set = i[4:0];
found = 1'b1;
end
end
end
// The 'found' flag gates later iterations — synthesis: priority encoder

🔬 Verification Loop Patterns — Drivers, Monitors, and Checkers

Every UVM component and every directed testbench is built on loops. The pattern you choose determines simulation performance, debuggability, and correctness. Here are the canonical patterns used in production verification environments.

SystemVerilog — Verification Loops: Clock, Driver, Monitor, Scoreboard
module tb_full_verif_pattern;
logic       clk = 0;
logic       rst_n;
logic [7:0] data_in, data_out;
logic       valid_in, valid_out;
int         pass_cnt = 0, fail_cnt = 0;
 
// ── ① Clock generator: forever in initial ────────────────────
initial forever #5 clk = ~clk;  // clk starts 0, toggles at 5,10,15...
 
// ── ② Reset generator: repeat for fixed-cycle hold ───────────
initial begin
rst_n = 0;
repeat(5) @(posedge clk);  // hold reset for 5 cycles
rst_n = 1;
end
 
// ── ③ Directed driver: for loop drives N test vectors ─────────
initial begin
@(posedge rst_n);               // wait for reset deassertion
for (int i = 0; i < 256; i++) begin
@(posedge clk);
data_in  = i[7:0];         // sweep all 8-bit values
valid_in = 1'b1;
@(posedge clk);
valid_in = 1'b0;
end
$display("Directed test done. Pass:%0d Fail:%0d", pass_cnt, fail_cnt);
$finish;
end
 
// ── ④ Random stimulus driver: while loop with coverage goal ──
// NOTE: ③ and ④ both write data_in/valid_in (and ③ calls $finish); enable only one per run
initial begin
@(posedge rst_n);
while (pass_cnt < 1000) begin  // run until 1000 passing transactions
@(posedge clk);
data_in  = $urandom_range(0, 255);
valid_in = $urandom_range(0, 1);
end
end
 
// ── ⑤ Output monitor: forever sampling on posedge ────────────
initial forever begin
@(posedge clk);
if (valid_out) begin
automatic logic [7:0] expected = ~data_in; // reference model
if (data_out === expected) pass_cnt++;
else begin
$error("FAIL: got %h exp %h", data_out, expected);
fail_cnt++;
end
end
end
 
// ── ⑥ Watchdog: forever with cycle counter ────────────────────
initial begin
forever begin
@(posedge clk);
if ($time > 1_000_000) begin
$fatal(1, "WATCHDOG: simulation timeout");
end
end
end
 
// ── ⑦ Memory scoreboard: foreach iterates result array ────────
logic [7:0] results[256];
initial begin
@(posedge rst_n); #1000;
foreach (results[i]) begin   // check all collected results
if (results[i] !== ~i[7:0])
$error("results[%0d]=%h expected %h", i, results[i], ~i[7:0]);
end
end
endmodule

| TB Component | Loop Used | Why That Loop |
|---|---|---|
| Clock generator | initial + forever | Starts with known clk=0, runs for entire simulation |
| Reset sequencer | repeat(N) | Exactly N cycles of reset; readable, no counter variable needed |
| Directed driver | for | Known count, index used for data value; natural fit |
| Random driver | while | Runs until coverage/count goal met; dynamic termination |
| Monitor | forever | Must run for entire simulation, no termination condition |
| Watchdog | forever | Continuous check, ends simulation on timeout |
| Array checker | foreach | Iterates collected results array; clean, automatic bounds |

⚙ Loop Unrolling — What the Synthesis Tool Actually Does

Understanding how synthesis unrolls a loop is the key to writing RTL that produces efficient hardware. The unrolling decision is binary: either the tool can determine the exact iteration count at elaboration time, or it cannot.

Synthesis Unrolling: for(int i=0; i<8; i++) parity ^= data[i]

Before unrolling (what you write):

for(int i=0; i<8; i++) parity ^= data[i];

After unrolling (what synthesis sees, identical to writing this directly):

parity  = data[0];   // step 0: initial value (1'b0 ^ data[0])
parity ^= data[1];   // step 1: XOR gate #1
parity ^= data[2];   // step 2: XOR gate #2
parity ^= data[3];   // step 3: XOR gate #3
parity ^= data[4];   // step 4: XOR gate #4
parity ^= data[5];   // step 5: XOR gate #5
parity ^= data[6];   // step 6: XOR gate #6
parity ^= data[7];   // step 7: XOR gate #7

Technology mapping result (balanced tree of 2-input XOR cells):

Level 1: xor2(data[0],data[1]) → t01    xor2(data[2],data[3]) → t23
         xor2(data[4],data[5]) → t45    xor2(data[6],data[7]) → t67
Level 2: xor2(t01,t23) → t0123          xor2(t45,t67) → t4567
Level 3: xor2(t0123,t4567) → parity

Critical path: 3 gate delays (the absolute figure depends on the cell library). All 8 data bits are evaluated in parallel.

❌ Variable bound: many synthesis tools reject this

// shift_amt is a port/signal, so its value is known only at run time
module bad_shift(
  input  logic [7:0] data,
  input  logic [2:0] shift_amt,
  output logic [7:0] out
);
  always_comb begin
    out = data;
    for (int i = 0; i < shift_amt; i++)   // ❌ bound is a signal
      out = {out[6:0], 1'b0};
  end
  // Loop count unknown at compile time: typical result is a "cannot unroll" error
endmodule

✅ Barrel shifter: synthesis works

// Unroll to the maximum count and gate each step with a compare
module barrel_shift(
  input  logic [7:0] data,
  input  logic [2:0] shift_amt,
  output logic [7:0] out
);
  always_comb begin
    out = data;
    for (int i = 0; i < 8; i++)           // ✅ constant bound of 8
      if (shift_amt > i)
        out = {out[6:0], 1'b0};
  end
  // Always 8 iterations; the compares select the actual shift count
endmodule
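
The constant-bound loop can be sanity-checked against the built-in shift operator. The testbench below is a sketch with hypothetical names (tb_barrel_equiv, out_loop, out_op); it restates the gated loop locally so that it is self-contained.

```systemverilog
// Sketch: exhaustively compare the gated 8-iteration loop with data << shift_amt.
// Names are illustrative assumptions.
module tb_barrel_equiv;
  logic [7:0] data, out_loop, out_op;
  logic [2:0] shift_amt;

  // Constant-bound loop: one conditional shift per iteration
  always_comb begin
    out_loop = data;
    for (int i = 0; i < 8; i++)
      if (shift_amt > i)
        out_loop = {out_loop[6:0], 1'b0};
  end

  // Reference: the shift operator with a variable amount
  assign out_op = data << shift_amt;

  initial begin
    for (int d = 0; d < 256; d++)
      for (int s = 0; s < 8; s++) begin
        data      = d[7:0];
        shift_amt = s[2:0];
        #1;  // let combinational logic settle
        if (out_loop !== out_op)
          $error("mismatch: data=%h shift_amt=%0d loop=%h op=%h",
                 data, shift_amt, out_loop, out_op);
      end
    $display("barrel shifter check finished");
    $finish;
  end
endmodule
```

In practice, data << shift_amt is usually the simplest way to describe this hardware. The loop form is mainly useful when each stage needs custom logic.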

🏗 Synthesis Concern: Large Loops Create Large Netlists

A for loop with 1024 iterations and a complex body creates 1024 copies of that logic in the netlist. A 32-bit adder inside a 1024-iteration loop produces 1024 32-bit adders — ~32,768 full-adder cells. This is intentional for parallel hardware, but if you accidentally put a large loop in RTL, synthesis will run for a very long time and produce an enormous (and probably wrong) netlist. Always review loop bounds in RTL: for(int i=0; i<32; i++) with a simple body (XOR, AND) is fine. for(int i=0; i<1024; i++) with a 64-bit multiplier is probably a mistake.

⚠ Infinite Loop Debugging — Finding Simulation Hangs

A zero-time infinite loop is one of the most disorienting simulation failures. The process hangs, no output appears, and the VCD file is empty. Here is a systematic approach to finding and fixing it.

  1. Check CPU usage. If one simulation core is pegged at 100% with no VCD output and no console messages, a zero-time loop is almost certain.
  2. Use the simulator's loop-detection or iteration-limit option, if it has one, so the run aborts with a diagnostic instead of hanging. Consult the tool documentation for the exact flag; delta-cycle limits catch oscillation between processes but may not fire for a single process that never yields.
  3. Search for loops without timing controls. Find every forever, while, always, and for loop in the file. Check each one for at least one @(event), #delay, or wait(condition) inside the body, or for a loop condition that is guaranteed to become false.
  4. Check for delta-cycle oscillation. Even with timing controls, an always_comb chain that feeds back onto itself can cause unlimited delta cycles. Look for combinational feedback loops.
  5. Add $display at the loop entry as a quick diagnostic. If you see millions of prints in microseconds, the loop is the culprit.
SystemVerilog — Common Infinite Loop Patterns and Fixes
// ── Zero-time loop pattern 1: forever without timing ─────────────
// ❌ BUG: simulation hangs at T=0
initial forever begin
data = $random;  // no @, no #, no wait → infinite loop in Active region
end
 
// ✅ FIX: add timing control
initial forever begin
@(posedge clk);  // yields time — simulation advances every iteration
data = $random;
end
 
// ── Zero-time loop pattern 2: while with condition never false ────
// ❌ BUG: count is never incremented, so the condition stays true forever
initial begin
count = 0;
while (count < 10) begin
$display("count=%0d", count);  // no timing, no increment → prints count=0 endlessly
end
end
 
// ✅ FIX: either add timing OR increment inside loop
initial begin
count = 0;
while (count < 10) begin
@(posedge clk);
$display("count=%0d", count);
count++;   // ✅ count changes → loop eventually terminates
end
end
 
// ── Delta cycle oscillation: two always_comb blocks feed each other
// ❌ BUG: combinational feedback loop → unlimited deltas
always_comb a = b ^ input1;  // a depends on b
always_comb b = a & input2;  // b depends on a → loop!
// Simulator: when input1 = input2 = 1, b = ~b, so the two blocks retrigger each other without end
// Most simulators abort with an iteration-limit error once the delta-cycle limit is exceeded
 
// ✅ FIX: break the loop with a register
always_comb a = b_reg ^ input1;  // reads registered b
always_ff @(posedge clk) b_reg <= a & input2;  // register breaks the loop

⚙ Advanced Code Examples — Real Project Patterns

Example A — Parametrized Shift Register with Tap Output (for in RTL)

SystemVerilog — Parameterized Shift Register with Configurable Tap
// ── Parameterized N-stage shift register ─────────────────────────
// Used in: SDR/DDR clock domain buffering, pipeline delay lines,
// spread-spectrum clock generators, digital delay chains
module shift_reg #(
parameter int DEPTH = 8,
parameter int WIDTH = 8,
parameter int TAP   = 4    // tap output at stage N
) (
input  logic             clk, rst_n,
input  logic [WIDTH-1:0] d,
output logic [WIDTH-1:0] q,      // last stage output
output logic [WIDTH-1:0] q_tap   // tap at stage TAP
);
logic [WIDTH-1:0] pipe [DEPTH];  // array of flip-flops
 
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (int i = 0; i < DEPTH; i++)
pipe[i] <= '0;
end else begin
pipe[0] <= d;
for (int i = 1; i < DEPTH; i++)
pipe[i] <= pipe[i-1];  // each stage captures previous stage
end
end
 
assign q     = pipe[DEPTH-1];
assign q_tap = pipe[TAP-1];
 
// Synthesis: DEPTH flip-flop stages, no combinational logic between stages
// For DEPTH=8,WIDTH=8: 64 flip-flops, 8-cycle pipeline delay
endmodule

Example B: AXI Burst Write Driver (while + for in testbench)

SystemVerilog: AXI Write Burst Driver using while + for
// ── AXI4 write burst task (AXI4-Lite has no bursts: no AWLEN, no WLAST) ──
task automatic axi_write_burst(
input logic [31:0] base_addr,
input logic [7:0]  burst_len,
input logic [31:0] data[]
);
// Phase 1: Write address channel handshake
awaddr  = base_addr;
awvalid = 1'b1;
awlen   = burst_len - 1;
while (!awready) @(posedge clk);  // wait for slave ready
@(posedge clk);
awvalid = 1'b0;
 
// Phase 2: Write data channel — drive all beats
for (int beat = 0; beat < burst_len; beat++) begin
wdata  = data[beat];
wstrb  = 4'hF;
wvalid = 1'b1;
wlast  = (beat == burst_len - 1);  // assert LAST on final beat
while (!wready) @(posedge clk);    // wait for slave to accept
@(posedge clk);
end
wvalid = 1'b0; wlast = 1'b0;
 
// Phase 3: Write response — wait for BRESP
bready = 1'b1;
while (!bvalid) @(posedge clk);    // wait for response
@(posedge clk);
bready = 1'b0;
assert (bresp == 2'b00) else $error("AXI write error resp: %b", bresp);
endtask

Example C: Coverage-Driven Loop (foreach on a hit-tracking array)

SystemVerilog: Coverage-Driven Verification using foreach
// ── Coverage-driven test: run until all opcodes covered ───────────
typedef enum logic [2:0] {ADD,SUB,AND,OR,XOR,NOT,SHL,SHR} op_t;
logic [7:0] a, b;
op_t         op;
bit          opcode_hit[8];   // track which opcodes have been tested
 
initial begin
forever begin
@(posedge clk);
a  = $urandom;
b  = $urandom;
op = op_t'($urandom_range(0, 7));
opcode_hit[op] = 1'b1;    // mark this opcode as hit
 
// Check if all opcodes have been exercised
begin
automatic bit all_hit = 1;
foreach (opcode_hit[i])
if (!opcode_hit[i]) all_hit = 0;
if (all_hit) begin
$display("All opcodes covered — stopping test");
break;   // exit forever loop (see 5.5 for break/continue)
end
end
end
end

🔬 Debugging Academy — 8 Real Loop Bugs from the Field

1. forever Without Timing: Simulation Hangs at Time 0 (Zero-Time Hang)

Buggy Code

Bug 1 — forever Without Timing Hangs Simulation
// ❌ BUG: stimulus driver with no timing control
initial begin
forever begin
data = $random;     // no @, no #, no wait
valid = 1;          // simulator loops here infinitely at T=0
end
end
// Symptom: simulation runs, CPU=100%, no output, no VCD, no $finish
// Most simulators simply hang here: the process never yields, so a
// delta-cycle iteration limit may never get the chance to fire
 
// ✅ FIX: add clock-edge timing control
initial begin
forever begin
@(posedge clk);  // ✅ yields time — simulation advances
data  = $urandom_range(0, 255);
valid = 1'b1;
end
end

Root Cause / Diagnosis

Root Cause: The forever body runs in the Active simulation region. Without a timing control the process never suspends, so the Active region never completes and the simulator loops at T=0 without ever advancing time.

Simulator Differences: Behaviour is tool-dependent. Some simulators hang indefinitely by default and offer an opt-in loop-detection option (for VCS, +vcs+loopdetect). Delta-cycle iteration limits, where available, are aimed at oscillation between processes and may not trigger for a single process that never yields. Check your simulator's documentation for the exact options and defaults.

2. for Loop with Signal Bound: Synthesis Error "Cannot Unroll" (Synthesis Error)

Buggy Code

Bug 2 — for Loop Bound Depends on Runtime Signal
// ❌ BUG: n is an input port — not a compile-time constant
module bad_rotate(
input  logic [7:0] data,
input  logic [2:0] n,       // rotation amount — runtime signal
output logic [7:0] rotated
);
always_comb begin
rotated = data;
for(int i = 0; i < n; i++)   // ❌ n is a signal — cannot unroll!
rotated = {rotated[6:0], rotated[7]};
end
endmodule
// Simulation: works fine — runs n iterations at runtime
// Synthesis: many tools reject this because the loop bound 'n' is not a compile-time constant
 
// ✅ FIX: always unroll max, select with mux
always_comb begin
logic [7:0] stages[8];
stages[0] = data;
for(int i = 1; i < 8; i++)             // constant bound ✅ (stages has indices 0..7)
stages[i] = {stages[i-1][6:0], stages[i-1][7]};
rotated = stages[n];                       // mux selects correct rotation
end

3. Loop Variable Not Localized: Outer Loop Variable Corrupted (Scope Bug)

Buggy Code

Bug 3 — Shared Loop Variable Causes Nested Loop Corruption
// ❌ BUG: outer loop uses i, inner loop also uses i
integer i;   // declared at module scope — shared variable!
 
initial begin
for (i = 0; i < 4; i++) begin   // outer loop: row 0..3
for (i = 0; i < 4; i++)   // ❌ inner loop reuses i → clobbers outer!
mem[i][i] = i;          // only diagonal set (both i are same)
// After inner loop: i=4. Outer loop increments to 5 → loop ends!
// Only ONE outer iteration actually runs
end
end
 
// ✅ FIX: declare loop variables inside for (SV feature)
initial begin
for (int r = 0; r < 4; r++) begin   // ✅ r is local to this for block
for (int c = 0; c < 4; c++)     // ✅ c is local — no interference
mem[r][c] = r * 4 + c;
end
end
// Rule: ALWAYS declare loop variables inside the for() declaration in SV.
// Never use module-scope integer variables as loop counters.

4. repeat(0): Silent No-Op That Causes Test to Pass Without Running (Silent Failure)

Buggy Code

Bug 4 — repeat(0) Silently Skips All Stimulus
// ❌ BUG: NUM_PKTS parameter accidentally set to 0
parameter NUM_PKTS = 0;   // ← should be 100, but typo or misconfiguration
 
initial begin
@(posedge rst_n);
repeat(NUM_PKTS) begin   // repeat(0) → body NEVER runs
send_packet();
end
$display("Test complete: sent %0d packets", NUM_PKTS);
$finish;
end
// Output: "Test complete: sent 0 packets"
// Simulation completes instantly. All checks pass (vacuously).
// Coverage: 0%. Regression: "PASS" — but no test was actually run!
 
// ✅ FIX: add assertion that loop count is non-zero
initial begin
assert (NUM_PKTS > 0) else $fatal(1, "NUM_PKTS must be > 0");
@(posedge rst_n);
repeat(NUM_PKTS) send_packet();
$finish;
end

Why This Is Dangerous

Impact: repeat(0) is one of the most insidious verification bugs. The simulation reports PASS, the regression suite shows green, and coverage appears clean (or zero, which might be ignored). This exact scenario has caused tape-out escapes when a parameter misconfiguration in a regression script caused all directed tests to run zero iterations while reporting success. Always assert non-zero counts before repeat loops in testbenches.

5. for Loop Variable Overflow: Unsigned Variable Wraps Around → Infinite Loop (Integer Overflow)

Buggy Code

Bug 5 — Unsigned Loop Variable Wraps and Creates Infinite Loop
// ❌ BUG: bit-width loop variable wraps instead of terminating
initial begin
for (logic [3:0] i = 15; i >= 0; i--) begin
// ❌ PROBLEM: i is 4-bit unsigned
// When i=0 and we decrement: i wraps to 15 (0-1 = 4'b1111)
// Condition i >= 0 is ALWAYS true for unsigned!
// Loop runs forever: 15,14,13...0,15,14,13...0,15...
$display("%0d", i);
end
end
 
// ✅ FIX: use int (signed) for count-down loops
initial begin
for (int i = 15; i >= 0; i--) begin   // ✅ int is signed — i goes to -1
$display("%0d", i);              // -1 < 0 → loop terminates
end
end
// Rule: Always use signed int for loop variables. Use int, not logic, not bit.
// logic and bit are unsigned — count-down termination never works.

6. foreach on Packed Array: Iterates Bit Positions, Not Elements (foreach Misuse)

Buggy Code

Bug 6: foreach on a Packed Vector Visits Bits, Not Elements
// ❌ BUG: expecting foreach on a packed vector to visit "elements"
logic [7:0] packed_byte;    // packed vector: one packed dimension of 8 bits
logic [7:0] unpacked[4];   // unpacked array: 4 elements of 8 bits

// foreach on a packed vector is legal, but each iteration is ONE BIT
foreach (packed_byte[i])   // i = 7,6,...,0 (follows the declared range [7:0])
$display("%b", packed_byte[i]);   // prints 8 single bits, not one byte

// foreach on the unpacked dimension: iterates 4 elements
foreach (unpacked[i])      // ✅ iterates i=0,1,2,3
$display("%h", unpacked[i]);

// ✅ When bit order matters, an explicit for loop states it directly
for (int i = 0; i < 8; i++)
$display("%b", packed_byte[i]);

// ✅ For unpacked elements accessing packed contents:
foreach (unpacked[i])      // foreach for the unpacked dimension
for (int b = 0; b < 8; b++)  // for loop for packed bits
parity ^= unpacked[i][b];

7. while in always_comb Without Constant Bound: Synthesis Failure (Synthesis Error)

Buggy Code

Bug 7 — while in always_comb with Dynamic Condition
// ❌ BUG: while condition depends on input signal, so synthesis cannot unroll
always_comb begin
int unsigned tmp;            // scratch copy, assigned below on every evaluation
tmp = data_in;
result = '0;
while (tmp != 0) begin       // ❌ termination depends on data_in (runtime)
result = result + tmp[0];  // synthesis: cannot determine iterations
tmp = tmp >> 1;
end
end
// Synthesis: many tools reject this because the iteration count is not known at elaboration
// Note: simulation works perfectly, so for testbench use this is fine!

// ✅ FIX: convert to for loop with max bound
always_comb begin
int unsigned tmp;
tmp = data_in;
result = '0;
for(int i = 0; i < 32; i++) begin  // ✅ constant 32, always unrollable
result += tmp[0];
tmp = tmp >> 1;
end
end

8. for Loop in always_comb Reads Its Own Output: Stale Accumulation (Combinational Feedback)

Buggy Code

Bug 8: Loop Accumulates Into a Signal That Is Never Cleared
// Baseline: the for loop writes 'accum', which another always_comb reads
logic [7:0] accum;

always_comb begin    // Block A: computes partial sums
accum = '0;
for (int i = 0; i < 8; i++)
accum += data[i];
end

always_comb begin    // Block B: uses accum
result = accum * scale;
end

// This is actually CORRECT: accum settles once per Block A evaluation,
// triggers Block B once (delta 1), and Block B's output settles. Fine.

// ❌ REAL BUG: Block A reads AND writes accum without clearing it first
// Simulation: always_comb is not sensitive to variables it writes, so there is no
// re-trigger, but accum carries over between evaluations and the sum keeps growing.
// Synthesis: accum feeds its own adder, giving combinational feedback or a latch warning.
always_comb begin
for (int i = 0; i < 8; i++) begin
if (data[i]) accum += data[i];   // reads the previous accum, then overwrites it
end
end

// ✅ FIX: use a local automatic variable inside always_comb
always_comb begin
automatic logic [7:0] tmp = '0;   // local: starts from zero on every evaluation
for (int i = 0; i < 8; i++)
if (data[i]) tmp += data[i];
accum = tmp;                          // write once to the actual signal
end

💡 Senior Verification Engineer Tip: Always Declare Loop Variables with int, Not integer or bit

In SystemVerilog, int is a signed 32-bit 2-state variable, and it is the correct type for loop counters. integer is the Verilog legacy 32-bit 4-state type (it can hold X/Z); it works as a loop counter, but the 4-state storage buys nothing there and may cost some simulation speed. bit and logic vectors are unsigned by default and cause wrap-around bugs in count-down loops. The modern rule: declare loop variables inside the for declaration as int. They are automatically scoped to the loop and cannot pollute the outer scope.

🎯 Interview Q&A — Loops in RTL and Verification

Beginner Level

Q: Which SystemVerilog loops are synthesizable and which are simulation-only?
A: Synthesizable (when bounds are compile-time constants): for, repeat(N), foreach on static arrays. Simulation-only: forever (infinite, so it cannot be unrolled), while with a dynamic condition, do-while with a dynamic condition. The key rule: synthesis requires knowing the exact iteration count at compile/elaboration time. forever is infinite by definition, so it is never synthesizable. while and do-while may synthesize if the tool can prove the bound is static (e.g., while(count < 8) where count increments by 1 each iteration), but this is tool-dependent and risky. Prefer for with an explicit constant bound in RTL.

Q: What happens if you write a forever loop without a timing control?
A: The simulator enters an infinite zero-time loop: the body executes over and over at the same simulation time, and simulation time never advances. The simulator process consumes 100% CPU and never terminates. Nothing scheduled for a later time ever runs, so no further waveform (VCD) activity is recorded and no $display messages from other processes appear. This is one of the most common testbench bugs. The fix is to add at least one timing control inside the forever body: @(posedge clk), #10, or wait(condition). Note that wait(condition) only blocks while the condition is false, so the condition must actually become false at some point. Each timing control that blocks yields to the simulation scheduler, allowing time to advance.

Q: What is the difference between for and foreach? When should you use each?
A: foreach is designed for array iteration. The loop variable and bounds are derived automatically from the array declaration, so you do not specify them. It works on static unpacked arrays, dynamic arrays, and queues. It is the best choice when you need to visit every element of an array without manual index management.
for is general-purpose. You explicitly control the initial value, condition, and increment. It is required when you need a non-zero starting index, a non-unit step, a count-down, or an index that is used for calculations beyond array access.
In RTL: use for with a constant bound for synthesizable parallel hardware. Use foreach only on statically-sized arrays, where it synthesizes identically to for.
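
To make the for versus foreach distinction concrete, here is a minimal simulation-only sketch. The module name for_vs_foreach_demo and the array names fixed and q are made up for illustration. foreach visits every element of a fixed-size array and of a queue without naming any bounds, while the for loop walks the fixed array backwards in steps of two, something foreach cannot express.

```systemverilog
// Hypothetical demo module: simulation-only, not intended for synthesis.
module for_vs_foreach_demo;
  int fixed [4];   // statically sized unpacked array
  int q [$];       // queue: its size can change at run time

  initial begin
    fixed = '{10, 20, 30, 40};
    q     = {1, 2, 3};
    q.push_back(4);

    // foreach: index range comes from the array itself
    foreach (fixed[i])
      $display("fixed[%0d] = %0d", i, fixed[i]);

    // foreach also follows the current size of a queue
    foreach (q[i])
      $display("q[%0d] = %0d", i, q[i]);

    // for: explicit start, condition and step (count down by 2)
    for (int i = $size(fixed) - 1; i >= 0; i -= 2)
      $display("reverse, step 2: fixed[%0d] = %0d", i, fixed[i]);

    $finish;
  end
endmodule
```

The count-down loop uses int for the index so that the i >= 0 test can become false once i goes negative.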

Intermediate Level

Q: Explain what synthesis does when it encounters a for loop. What is the hardware result?
A: Synthesis unrolls a for loop: it effectively executes the loop at elaboration time, producing N copies of the loop body as N separate pieces of hardware. The loop variable is a compile-time integer that the tool substitutes into each copy. For example, for(int i=0; i<8; i++) parity ^= data[i]; unrolls to 8 XOR operations that synthesis maps to an 8-input XOR gate tree (3 logic levels). There is no "loop" in the synthesized netlist; it becomes flat parallel logic. Critical implication: a loop with 1024 iterations creates 1024 copies of the body in the netlist. A complex body inside a large loop produces a very large (and slow) netlist. Always think about the physical hardware a loop implies before writing it in RTL.

Q: You have a loop that works in simulation but fails synthesis with "cannot evaluate loop bound." What are the possible root causes?
A: The synthesis tool cannot determine the exact number of iterations at elaboration time. Root causes:
1. Loop bound depends on an input port or signal: for(int i=0; i<n; i++) where n is an input. It is a runtime value, unknowable at compile time.
2. while with a dynamic condition: while(data != 0) where data is a signal. The number of iterations depends on the signal value.
3. foreach on a dynamic array: dynamic arrays have variable size, so the size is unknown at elaboration.
Fix: replace the loop with a for loop that has a constant bound (parameter or localparam), then use an if inside the loop to gate the body on the dynamic condition.

Q: What is the correct way to generate a clock in a testbench? Compare always #5 and initial + forever.
A: always #5 clk = ~clk; relies on clk being initialised somewhere else. A 4-state logic or reg clk starts as X at T=0, and because ~X is still X, the clock never starts toggling unless something gives it a known value (for example logic clk = 0; or a 2-state bit clk). While the clock is X, flip-flops see X on their clock input, which can cause spurious timing violations under strict setup-hold checking.
initial begin clk=0; forever #5 clk=~clk; end starts with clk=0 at T=0 and toggles at T=5, T=10, etc. The clock is always a valid 0 or 1 value.
Production testbenches typically use the initial + forever form. The assignment clk=0 at T=0 ensures the clock starts in a known state. This is important for waveform viewers, protocol monitors, and assertion monitors that start sampling immediately at T=0.
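
The "constant bound plus if" fix for a data-dependent loop bound can be sketched as follows. This is an illustrative module, not taken from the original page: the name sum_first_n, the parameter MAX_N, and the port names are assumptions. The loop always runs MAX_N times, so the tool can unroll it, and the runtime value n only decides which iterations contribute.

```systemverilog
// Hypothetical module: sums the first n entries of data, with n known only at run time.
module sum_first_n #(
  parameter int MAX_N = 8                      // compile-time upper bound
) (
  input  logic [7:0]                 data [MAX_N],
  input  logic [$clog2(MAX_N+1)-1:0] n,        // runtime count, 0..MAX_N
  output logic [8+$clog2(MAX_N)-1:0] sum       // wide enough for MAX_N * 255
);
  always_comb begin
    sum = '0;
    // Bound is the parameter, not the signal n, so the loop is unrollable
    for (int i = 0; i < MAX_N; i++)
      if (i < n)                               // dynamic condition gates the body
        sum += data[i];
  end
endmodule
```

After unrolling, each of the MAX_N copies becomes an adder input gated by its own comparison against n, so the hardware cost is set by MAX_N regardless of the value n takes at run time.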

Debugging / Advanced Level

Q: A regression suite shows all tests PASS but functional coverage is zero. You suspect a loop-related issue. What do you investigate?
A: Zero functional coverage with all tests passing is the signature of stimulus not actually being driven: the tests are completing vacuously. First investigation: look for repeat(N), for, or while loops where N might be zero or the condition might be immediately false. Specifically:
1. Check all repeat(N) calls where N is a parameter. Has any parameter been inadvertently set to 0 via a command-line override?
2. Check while(condition). Is the condition false from the start (e.g., the count goal was already met)?
3. Check for loops where the initial value exceeds the bound.
Fix: add assert(N > 0) or $display at the start of each major stimulus loop to verify it actually runs. Use plusarg-driven checks in the testbench to detect zero-iteration loops as errors.

Q: Explain why a count-down for loop using logic [3:0] as the loop variable creates an infinite loop in simulation.
A: A 4-bit logic [3:0] variable is unsigned. When you write for(logic [3:0] i = 7; i >= 0; i--), the loop is meant to stop once i drops below 0. But an unsigned value cannot be negative: when i decrements from 0, it wraps around to 4'b1111 = 15. The condition i >= 0 is true for every unsigned value, so the loop can never terminate.
Fix: always use int (signed 32-bit) for loop variables. int can represent negative values, so decrementing from i = 0 gives i = -1, i >= 0 becomes false, and the loop terminates correctly. In synthesis, the signedness of int doesn't change the unrolled hardware; it only matters for the compile-time loop termination check.
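
The wrap-around behind the count-down bug can be observed safely without writing the infinite loop itself. The sketch below is an assumed demo, with made-up names countdown_demo, visited and u. It runs the correct int-based count-down, then decrements a 4-bit unsigned value from 0 once to show why i >= 0 can never become false.

```systemverilog
// Hypothetical demo module: simulation-only.
module countdown_demo;
  int         visited;   // number of loop iterations executed
  logic [3:0] u;         // unsigned 4-bit value, used only to show the wrap

  initial begin
    // Signed int counter: i reaches -1, the condition fails, the loop ends
    visited = 0;
    for (int i = 7; i >= 0; i--)
      visited++;
    $display("int counter ran %0d iterations", visited);

    // Unsigned 4-bit value: decrementing 0 wraps instead of going negative
    u = 4'd0;
    u--;
    $display("4'd0 - 1 = %0d, (u >= 0) = %0b", u, (u >= 0));

    $finish;
  end
endmodule
```

The int loop runs 8 iterations (i = 7 down to 0). The second $display reports 15 for the wrapped value and 1 for the comparison, which is the condition that keeps a logic [3:0] count-down loop running forever.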