Blocking vs Non-Blocking Assignments

= vs <=, the NBA scheduler, the golden rule, race conditions.

Module 5 · Page 5.6

The Two Assignment Operators

= Blocking Assignment — Executes in order, immediately. The next statement in the same block sees the updated value. Think of it like a regular C assignment.
<= Non-Blocking Assignment (NBA) — Evaluates the RHS now but schedules the LHS update for the end of the current time step. All NBA RHS expressions in a time step are sampled before any LHS is written.

SystemVerilog — blocking vs. non-blocking behaviour

// ── Blocking (=): sequential execution ──────────────────────────
always_comb begin
a = b + 1;     // a is immediately b+1
c = a * 2;     // c uses the NEW value of a → c = (b+1)*2
end
 
// ── Non-blocking (<=): all RHS captured first ───────────────────
always_ff @(posedge clk) begin
a <= b + 1;    // schedules a = b+1 for end of time step
c <= a * 2;    // uses OLD value of a (before posedge) → c = a_old*2
end
// Both a and c update simultaneously at end of time step.
// This is exactly how flip-flops work in real hardware.
 
// ── The classic swap: only non-blocking works without a temp ────
always_ff @(posedge clk) begin
a <= b;         // schedule: a ← old b
b <= a;         // schedule: b ← old a
end
// Result: a and b are swapped. No temp variable needed.
 
// ── With blocking = the swap fails ──────────────────────────────
always_ff @(posedge clk) begin
a = b;          // a is immediately b — old a is LOST
b = a;          // b = b (the same value!) — swap broken
end

🧠 The Most Important Mental Model: = Sees New Values, <= Sees Old Values

When reading code that uses <=, ask yourself: "What did this signal look like before this time step started?" That is what the RHS captures. With =, ask: "What did the previous statement in this block just set this to?" That is what you see. This distinction drives everything: why pipelines need <=, why combinational chains need =, and why mixing them in the same block is flagged by lint. The block itself still executes in order, but any other process that reads the blocking-assigned variable at the same edge races with it.
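The mental model can be checked with a small self-contained testbench. This is a minimal sketch, not part of the original page: the module name `tb_old_vs_new` and the `_blk`/`_nba` signal names are hypothetical, and the blocking half uses a plain `always` so that lint-clean `always_ff` code is not implied.

```systemverilog
// Minimal sketch: the same two statements, once with = and once with <=.
// All names here are illustrative assumptions, not from the original text.
module tb_old_vs_new;
  logic       clk = 0;
  logic [7:0] b = 8'd3;
  logic [7:0] a_blk = 0, c_blk = 0;  // written with =
  logic [7:0] a_nba = 0, c_nba = 0;  // written with <=

  always #5 clk = ~clk;

  // Blocking: the second statement reads the value the first one just wrote.
  always @(posedge clk) begin
    a_blk = b + 1;
    c_blk = a_blk * 2;
  end

  // Non-blocking: the second statement reads a_nba as it was before the edge.
  always_ff @(posedge clk) begin
    a_nba <= b + 1;
    c_nba <= a_nba * 2;
  end

  initial begin
    repeat (2) begin
      @(posedge clk);
      // $strobe prints after the NBA region, so all four values are final.
      $strobe("t=%0t a_blk=%0d c_blk=%0d a_nba=%0d c_nba=%0d",
              $time, a_blk, c_blk, a_nba, c_nba);
    end
    #1 $finish;
  end
endmodule
// t=5   a_blk=4 c_blk=8 a_nba=4 c_nba=0   (c_nba used the OLD a_nba = 0)
// t=15  a_blk=4 c_blk=8 a_nba=4 c_nba=8   (c_nba catches up one edge later)
```

The one-edge lag on `c_nba` is the "old value" rule in action: it is the same delay a second flip-flop would add in hardware.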

The NBA Scheduler — How It Works

SystemVerilog's event scheduler divides each time step into ordered regions, and two of them matter most here. Understanding them is the key to understanding why <= behaves the way it does.

The Two Phases of a Simulator Time Step

Phase 1: Active Region
• Blocking (=) assignments execute
• RHS of <= assignments are evaluated (RHS values captured, LHS not written yet)
• $display executes ($monitor and $strobe wait for the Postponed region at the end of the time step)
• Continuous assignments propagate

then

Phase 2: NBA Region
• LHS of <= assignments update (all at once, using values from Phase 1)
• Signals updated simultaneously
• No race between flip-flop outputs regardless of statement order

The NBA region is why <= is the right tool for flip-flops: all outputs update together, exactly like real hardware.

Figure 1: Each simulator time step has two phases. Blocking assignments happen in Phase 1. Non-blocking LHS updates all happen together in Phase 2, after all RHS values have been sampled.

🚀 RTL Design Insight: Why NBA Makes Multi-Block Designs Deterministic

Without the NBA scheduler, two always_ff blocks at the same posedge clk would race to read and write the same signals: the result would depend on which block the simulator processes first (non-deterministic). The NBA scheduler eliminates this by separating READ and WRITE. All RHS expressions are evaluated in the Active region (everyone reads the pre-clock-edge values), then all LHS assignments happen in the NBA region (everyone writes together). The order of always_ff block processing becomes irrelevant, and the result is always the same. The language does not forbid = in always_ff, but this is why every serious coding guideline requires <= there.
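The read/write separation described above can be seen with two separate clocked blocks that exchange values. This is a hedged sketch with hypothetical names (`tb_two_block_swap`, `x`, `y`); whichever block a simulator happens to run first, the printed result should be the same.

```systemverilog
// Sketch: two independent always_ff blocks swap x and y with <=.
// Names are illustrative assumptions.
module tb_two_block_swap;
  logic       clk = 0;
  logic [7:0] x = 8'h11, y = 8'h22;

  always #5 clk = ~clk;

  always_ff @(posedge clk) x <= y;  // block 1: RHS reads the pre-edge y
  always_ff @(posedge clk) y <= x;  // block 2: RHS reads the pre-edge x

  initial begin
    repeat (2) begin
      @(posedge clk);
      $strobe("t=%0t x=%h y=%h", $time, x, y);  // printed after the NBA region
    end
    #1 $finish;
  end
endmodule
// t=5  x=22 y=11   (swapped)
// t=15 x=11 y=22   (swapped back)
```

Replacing both `<=` with `=` would make the outcome depend on block execution order, which is exactly the race the NBA region removes.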

The Golden Rule

always_comb → use = — Combinational logic has no memory. Statements execute in order and you want each result to be available immediately to the next statement. Use =.
always_ff → use <= — Flip-flops sample their input and hold the value until the next clock edge. Non-blocking exactly models this behaviour — all outputs update simultaneously. Use <=.

SystemVerilog — the golden rule in practice

// ── CORRECT: = in always_comb ───────────────────────────────────
always_comb begin
sum    = a + b;         // purely combinational
carry  = sum[8];        // uses updated sum immediately
result = sum[7:0];
end
 
// ── CORRECT: <= in always_ff ─────────────────────────────────────
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
q     <= '0;
count <= '0;
end else begin
q     <= d;
count <= count + 1;
end
end
 
// ── WRONG: = in always_ff ────────────────────────────────────────
always_ff @(posedge clk) begin
q = d;      // ⚠ blocking in clocked block — race condition risk
end
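// Simulates as expected only while no other process reads q at this clock edge.
// Breaks silently when another always_ff block samples q on the same edge:
// the value it sees depends on which block the simulator runs first.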
 
// ── WRONG: <= in always_comb ─────────────────────────────────────
always_comb begin
out <= a | b;  // ⚠ NBA in comb: out updates at end of delta, not now
// subsequent uses of out see the OLD value
end

🏗 Synthesis Concern: = in always_ff Creates Tool-Dependent RTL

When synthesis sees = inside always_ff, it still infers flip-flops: tools such as Synopsys DC and Cadence Genus build registers from the clocked block and follow the statement order within it. The trouble starts when a second clocked block reads the blocking-assigned variable. In the netlist that reader always captures the pre-edge flip-flop output, but the simulation model updates the variable immediately, so the reader may see the new value instead. This is the simulation-synthesis mismatch: the netlist behaves like real flip-flops, while your verification environment may have validated different behavior. The discrepancy often surfaces only during gate-level simulation or integration testing.

Race Conditions — When = in always_ff Goes Wrong

A race condition between two always_ff blocks occurs when one block writes a signal with a blocking assignment and another block reads that signal at the same clock edge. Which block executes first is determined by the simulator's event scheduler, not by the code order and not by any rule you can rely on. The result is simulator-dependent and may not match synthesised hardware.

SystemVerilog — race condition caused by blocking in always_ff

// ── Classic race: shift register with blocking ──────────────────
// WRONG — outcome depends on which always_ff the simulator runs first
always_ff @(posedge clk) a = d_in;   // ⚠ blocking
always_ff @(posedge clk) b = a;      // ⚠ which 'a'? old or new?
 
// If 'a' block runs first: b captures the NEW a (= d_in) — acts like wire
// If 'b' block runs first: b captures the OLD a — acts like flip-flop
// Synthesis always infers a flip-flop. Simulation may not match.
 
// ── CORRECT: non-blocking eliminates the race ───────────────────
always_ff @(posedge clk) a <= d_in;   // schedules NBA
always_ff @(posedge clk) b <= a;      // RHS sampled before NBA region
 
// Both RHS values captured in Phase 1 (d_in, a_old).
// Both LHS updated in Phase 2: a = d_in, b = a_old.
// Deterministic regardless of block evaluation order. Matches hardware.

🔍 Debugging Insight: The Two-Simulator Test for Blocking Races

A practical way to expose blocking assignment races in always_ff is to simulate on two different tools, for example VCS and Questa (or Xcelium). If the results differ between tools, you very likely have a race condition. IEEE 1800 does not fix the order in which processes triggered by the same event run, so different simulators (and different versions or optimisation settings of one simulator) may process always_ff blocks in different orders within the same time step. Non-blocking (<=) code produces identical results regardless of order. Blocking (=) code may produce different results depending on which block the tool processes first. Matching results do not prove the absence of a race, since two tools can happen to pick the same order, so keep lint checks in place as well. This is a build-your-CI-pipeline-now insight: always run regressions on two simulators.

Common Mistakes at a Glance

SystemVerilog — mistakes and their correct forms

// ── Mistake 1: mixing = and <= in the same always_ff ────────────
always_ff @(posedge clk) begin
temp = a + b;    // ⚠ blocking in a clocked block: flagged by lint
out  <= temp;    // ⚠ reads the NEW temp; any other block reading temp at this edge races
end
 
// FIX: compute the intermediate in always_comb, register it in always_ff
// (making both assignments <= would add a register stage: out would get the OLD temp)
always_comb temp = a + b;           // combinational intermediate
always_ff @(posedge clk) out <= temp;
 
// ── Mistake 2: NBA in always_comb for intermediate ──────────────
always_comb begin
mid <= a ^ b;   // ⚠ NBA in comb: mid hasn't updated when next line runs
out <= mid | c;  // ⚠ reads OLD mid
end
 
// FIX: use blocking in always_comb
always_comb begin
mid = a ^ b;    // mid updates immediately
out = mid | c;  // uses NEW mid
end
 
// ── Mistake 3: using = for reset in always_ff ───────────────────
always_ff @(posedge clk) begin
if (!rst_n) q = '0;   // ⚠ blocking reset — inconsistent with NBA data path
else        q <= d;
end
 
// FIX: use <= for reset too
always_ff @(posedge clk) begin
if (!rst_n) q <= '0;   // consistent NBA throughout
else        q <= d;
end

Quick Reference

Operator	Name	When LHS updates	Use in	Models
`=`	Blocking	Immediately, in statement order	`always_comb`, functions, tasks	Combinational logic, sequential calculations
`<=`	Non-blocking (NBA)	End of current time step (NBA region)	`always_ff`, `always_latch`	Flip-flops, registers, state elements

🧠 Delta Cycle Deep Dive — The Full Event Scheduler

The simplified "Phase 1 / Phase 2" model is useful but incomplete. IEEE 1800 defines many more ordered event regions within each simulation time step (17 in the 2009 and later revisions of the standard); the six that matter most for RTL and testbench work are listed below. Understanding them explains why $display sometimes shows stale values, why assertions sample differently from RTL, and why testbench code in program blocks sees different values than RTL.

Region	What Executes	Assignment Type	Relevant To
Active	Blocking (=) assignments, continuous assigns, $display, non-blocking RHS evaluation	= executes; <= RHS captured	RTL (always_comb, always_ff), combinational logic
Inactive	#0 delay events. Avoid in RTL — used for special ordering tricks.	Rare	Legacy workarounds
NBA	Non-blocking LHS updates. All <= assignments commit simultaneously.	<= LHS written	always_ff, all flip-flop outputs
Observed	SVA concurrent assertions are evaluated here, after NBA, using values sampled in the Preponed region at the very start of the time step (pre-edge values).	Read-only	assert property(...)
Reactive	Program blocks, clocking block driven outputs.	Testbench = or <=	UVM drivers via clocking block
Postponed	$strobe and $monitor. Always shows final settled values for the time step.	Read-only	$strobe — correct post-NBA display

SystemVerilog — $display vs $strobe: Active vs Postponed Region

// ── Why $display shows wrong value after posedge ─────────────────
always_ff @(posedge clk) q <= d;  // NBA: q updates in NBA region
 
initial begin
forever begin
@(posedge clk);
// $display fires in ACTIVE region — BEFORE NBA updates q
$display("$display: q=%h (may be old value)", q);   // ← sees q_old
 
// $strobe fires in POSTPONED region — AFTER NBA updates q
$strobe(" $strobe: q=%h (final correct value)", q);  // ← sees q_new
end
end
 
// ── Expected Output (d=0x55, q was 0x00) ─────────────────────────
// $display: q=00  ← Active region: NBA not yet applied
// $strobe:  q=55  ← Postponed region: q updated by NBA
 
// ── Rule: Always use $strobe when printing flip-flop outputs.
// Use $display only for combinational values or testbench signals
// that are driven with blocking (=) assignments.

Delta Cycle Timeline: posedge clk at T=10ns

Step	Active Δ0	NBA Δ0	Active Δ1	Postponed
Event	always_ff evaluates RHS	q ← d_old	comb logic sees the new q	$strobe
q	q_old (unchanged)	d_old (new)	visible (d_old)	final
comb_out	f(q_old)	unchanged	f(d_old)	final
Print	$display fires here (sees q_old, may be WRONG)			$strobe fires here ✅

If comb_out changes in Δ1, it triggers another Active cycle (Δ2). This process repeats until no new events remain; simulation time then advances.
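The timeline above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `tb_delta`, the inverter used for `comb_out`, and the initial values are all illustrative choices, not taken from the original page.

```systemverilog
// Sketch: one flip-flop feeding combinational logic, observed from the
// Active region ($display) and from the Postponed region ($strobe).
module tb_delta;
  logic       clk = 0;
  logic [7:0] d = 8'h55;
  logic [7:0] q = 8'h00;
  logic [7:0] comb_out;

  always #5 clk = ~clk;

  always_ff @(posedge clk) q <= d;   // q is written in the NBA region
  always_comb comb_out = ~q;         // re-evaluates in the next Active pass (delta 1)

  initial begin
    @(posedge clk);
    // Active region of delta 0: the NBA has not been applied yet.
    $display("display: q=%h comb_out=%h", q, comb_out);
    // Postponed region: every delta of this time step has settled.
    $strobe ("strobe : q=%h comb_out=%h", q, comb_out);
    #1 $finish;
  end
endmodule
// display: q=00 comb_out=ff
// strobe : q=55 comb_out=aa
```

Note that `comb_out` also changes between the two prints: the NBA update of `q` wakes `always_comb` in a second Active pass of the same time step.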

📊 Waveform Analysis — Reading NBA Behavior

When you understand what the NBA scheduler does, waveforms become instantly readable. The key signature of correct non-blocking behavior is: all flip-flop outputs change at the same clock edge, simultaneously, using pre-edge values. Any deviation from this pattern indicates a problem.

Waveform: Non-Blocking (Correct), 2-stage Pipeline

Signal	Cycle 1	Cycle 2	Cycle 3	Cycle 4	Cycle 5	Note
d	A	B	C	D	E	
stage1	X	A	B	C	D	1-cycle latency (= d from previous cycle)
stage2	X	X	A	B	C	2-cycle latency (= stage1 from previous cycle)

Both update simultaneously at posedge using PRE-EDGE values. Correct pipeline.

Code
always_ff @(posedge clk) begin
stage1 <= d;
stage2 <= stage1;
end

Waveform: Blocking (WRONG), Pipeline Collapses to 1 Stage

Signal	Cycle 1	Cycle 2	Cycle 3	Cycle 4	Cycle 5	Note
d	A	B	C	D	E	
stage1	X	A	B	C	D	1-cycle latency (same as correct)
stage2	X	A	B	C	D	WRONG: same as stage1 (NOT 2-cycle)

stage2 = stage1 immediately because = (blocking) updates stage1 first. stage2 reads the NEW stage1, so they both hold d from the current clock cycle. Expected 2-stage pipeline is gone. Two registers collapsed to one.

Code
always_ff @(posedge clk) begin
stage1 = d;
stage2 = stage1;
end

🔍 How to Spot Blocking Race in a Waveform

In the waveform viewer, look for two registered signals that should have a 1-cycle latency between them but appear to track each other identically on every clock edge. If stage2 and stage1 change together at every posedge instead of stage2 being one cycle behind stage1, you have a blocking assignment collapse. The fix: replace = with <= inside always_ff. The waveform will immediately show the correct 1-cycle offset.
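Rather than spotting the missing offset by eye, the same check can be written as a concurrent assertion. The checker below is a hedged sketch: the module name `pipe2_latency_check`, its port list, and the active-low `rst_n` are assumptions, and it would need to be instantiated (or bound) next to the registers being checked.

```systemverilog
// Sketch: fail whenever stage2 is not a one-cycle-delayed copy of stage1.
// Module and port names are illustrative assumptions.
module pipe2_latency_check (
  input logic       clk,
  input logic       rst_n,
  input logic [7:0] stage1,
  input logic [7:0] stage2
);
  // At every posedge the sampled (pre-edge) stage2 must equal the value
  // stage1 had at the previous posedge. With a blocking collapse the two
  // signals track each other, so this fails as soon as the data changes.
  a_latency: assert property (
    @(posedge clk) disable iff (!rst_n)
      stage2 === $past(stage1)
  ) else $error("stage2 does not lag stage1 by one cycle: possible blocking collapse");
endmodule
```

Because concurrent assertions use sampled values, the check is itself immune to the Active-region ordering that causes the bug.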

⚙ Pipeline Modeling — Why <= Is Mandatory for Registers

Pipeline modeling is where the difference between = and <= has the most profound practical impact. Every N-stage pipeline in a design depends on non-blocking assignments to correctly model the 1-cycle latency between stages.

SystemVerilog — 4-Stage Pipeline: Non-Blocking vs Blocking

// ── CORRECT: 4-stage pipeline using non-blocking ─────────────────
module pipeline4_correct (
input  logic       clk, rst_n,
input  logic [7:0] d,
output logic [7:0] q4
);
logic [7:0] s1, s2, s3;
 
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
s1 <= '0; s2 <= '0; s3 <= '0; q4 <= '0;
end else begin
s1 <= d;   // RHS: d_old
s2 <= s1;  // RHS: s1_old (= d from 1 cycle ago)
s3 <= s2;  // RHS: s2_old (= d from 2 cycles ago)
q4 <= s3;  // RHS: s3_old (= d from 3 cycles ago)
end                // All LHS update simultaneously in NBA → 4-cycle latency
end
endmodule
 
// ── WRONG: 4-stage pipeline using blocking — all collapse to 1 stage
module pipeline4_broken (
input  logic       clk, rst_n,
input  logic [7:0] d,
output logic [7:0] q4
);
logic [7:0] s1, s2, s3;
 
always_ff @(posedge clk) begin
s1 = d;    // s1 immediately = d (new value)
s2 = s1;   // s2 immediately = d (new s1) — 1 cycle latency GONE
s3 = s2;   // s3 immediately = d (new s2) — 2 cycle latency GONE
q4 = s3;   // q4 immediately = d (new s3) — 3 cycle latency GONE
end            // Synthesis: q4 = d with 1 clock latency only (blocking order is honoured)
// Simulation: q4 = d with 1 clock latency (same result)
// This is a functional bug, not a mismatch: s1, s2, s3 and q4 all hold
// the same value every cycle and the intended 4-cycle latency is gone
endmodule
 
// ── Test: verify the 4-cycle latency ─────────────────────────────
module tb;
logic clk=0, rst_n;
logic [7:0] d, q4_correct, q4_broken;
 
always #5 clk = ~clk;
 
pipeline4_correct u_c(.clk,.rst_n,.d,.q4(q4_correct));
pipeline4_broken  u_b(.clk,.rst_n,.d,.q4(q4_broken));
 
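initial begin
rst_n = 0; d = 8'h00;
repeat(3) @(posedge clk); rst_n = 1;
@(negedge clk); d = 8'hAA;   // push 0xAA into pipeline, away from the sampling edge
repeat(6) begin
@(posedge clk);              // print every cycle so the latency is visible
$strobe("t=%0t correct: %h  broken: %h", $time, q4_correct, q4_broken);
end
// correct: aa appears at q4 after 4 clock edges
// broken: aa appears at q4 after 1 clock edge
#1 $finish;                  // delay so the last $strobe can print
end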
endmodule

🏗 Synthesis Impact — What Each Form Generates in Hardware

Code Form	Simulation Behavior	Synthesis Result	Match?	Risk Level
`=` in `always_comb`	Sequential combinational evaluation: next line sees new value	Combinational gates — correct	✅ Match	None — this is correct
`<=` in `always_ff`	All RHS captured pre-clock, LHS updated in NBA	D flip-flops — correct	✅ Match	None — this is correct
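`=` in `always_ff`	Sequential: next statement sees new value (race-prone across blocks)	D flip-flops; statement order inside a block is honoured, while readers in other blocks always get the pre-edge value	❌ Mismatch risk	HIGH: sim/synth mismatch possible when another block reads the signal
`<=` in `always_comb`	Intermediate value NOT updated until the NBA region: stale reads	Plain combinational gates (the scheduling delay does not exist in hardware); tools usually warn	❌ Mismatch	HIGH: always wrong, tool warns
Mixed `=` and `<=` in same `always_ff`	Deterministic inside the block (the `<=` reads what the `=` just wrote), but any other process reading the blocking-assigned variable at the same edge races	Tools follow the in-block order; lint usually flags it	⚠ Matches only if the temporary is never read elsewhere	HIGH: fragile, flagged by lint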

⚠ Common Industry Mistake: "It Works in Simulation, Must Be Fine"

The most dangerous class of blocking/non-blocking bugs is the one where simulation appears correct. This happens because the simulator happens to process the always_ff blocks in a favorable order, the order that produces the expected result. Change simulator, change version, add more logic, and the ordering may change, giving different results. In silicon, a flip-flop described in one block always captures the pre-edge output of a flip-flop described in another block, regardless of whether you used = or <=. The only guaranteed-correct code is: always use <= in always_ff, no exceptions.

🔬 Testbench Usage — Which Assignment to Use Where

The blocking vs non-blocking choice in testbenches is just as important as in RTL — but the rules are different. Incorrect assignment types in testbenches produce checks that fire with stale values or miss functional bugs entirely.

SystemVerilog — Testbench: When to Use = vs <= and Why

// ── Rule 1: Use = in initial blocks for immediate signal driving ──
initial begin
rst_n = 0;   // ✅ drives rst_n=0 IMMEDIATELY at T=0
data  = 8'hAA;
@(posedge clk);
rst_n = 1;   // ✅ drives rst_n=1 immediately after posedge
end
 
// ── Wrong: <= in initial block, then reading the value back ──────
initial begin
data <= 8'hAA;   // ❌ NBA: data doesn't change until NBA region
check(data);       // runs in the Active region first: checks the OLD data value
@(posedge clk);   // by this later edge the NBA has committed and the DUT sees 0xAA
end
 
// ── Rule 2: Use $strobe to print flip-flop outputs ───────────────
initial begin
forever begin
@(posedge clk);
$display("q=%h", q);   // ❌ Active region — q still old value!
$strobe( "q=%h", q);   // ✅ Postponed region — q has new value
end
end
 
// ── Rule 3: For clocking-block driven signals, use <= via cb ─────
clocking drv_cb @(posedge clk);
output #1 valid, data;   // output skew: drives 1 time unit after posedge
endclocking
 
initial begin
@(drv_cb);
drv_cb.data <= 8'hBB;   // ✅ clocking block assignment: use <=
drv_cb.valid <= 1'b1;    // driven 1 time unit after the clocking event it is issued on
end
 
// ── Rule 4: Checking outputs: sample through a clocking block ────
clocking mon_cb @(posedge clk);
input #1 data_out, valid_out;  // input skew: samples 1 time unit before posedge
endclocking
// Input skew is written as a positive delay; a negative value such as #-1 is not legal
// Clocking block input samples are taken BEFORE the clock edge (like setup time)
// This is why UVM monitors use clocking blocks for accurate timing

💡 Senior Verification Engineer Tip: Use Clocking Blocks for All DUT Interface Interactions

The cleanest way to avoid blocking/non-blocking confusion in testbenches is to use clocking blocks for all DUT interface interactions. Clocking blocks handle the timing for you: driven outputs are assigned with <= and applied after the output skew, and inputs are sampled before the clock edge according to the input skew. The driver writes cb.signal <= value and the monitor reads cb.signal; the tool handles the scheduling. This is why UVM drivers built on clocking blocks largely avoid blocking/non-blocking issues.
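A compact sketch of that pattern follows. Everything in it is an assumption made for illustration: the interface name `bus_if`, the single clocking block `cb`, the 1 ns output skew, and the one-register stand-in DUT. A real UVM environment would wrap the same drives and samples in driver and monitor classes.

```systemverilog
`timescale 1ns/1ps
// Sketch only: bus_if, cb and the stand-in DUT are hypothetical.
interface bus_if (input logic clk);
  logic       valid;
  logic [7:0] data;
  logic [7:0] data_out;

  clocking cb @(posedge clk);
    default input #1step output #1;  // sample just before the edge, drive 1 ns after it
    output valid, data;              // testbench to DUT
    input  data_out;                 // DUT to testbench
  endclocking
endinterface

module tb_clocking;
  logic clk = 0;
  always #5 clk = ~clk;

  bus_if bus (clk);

  // Stand-in DUT: registers data when valid is high.
  always_ff @(posedge clk)
    if (bus.valid) bus.data_out <= bus.data;

  initial begin
    @(bus.cb);                 // align to a clocking event
    bus.cb.data  <= 8'hBB;     // synchronous drive: always <= through the clocking block
    bus.cb.valid <= 1'b1;
    repeat (2) @(bus.cb);      // first edge: DUT captures; second edge: testbench samples
    $display("monitor sampled data_out=%h", bus.cb.data_out);
    $finish;
  end
endmodule
```

The testbench never touches `bus.data` directly and never chooses between = and <= for DUT pins; the skews in the clocking block decide when values change and when they are read.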

📋 RTL Coding Style Guide — The Complete Rule Set

These are more than preferences: they are rules enforced by RTL sign-off lint checks at many semiconductor companies. In such flows, breaking any of these rules is a blocker that prevents RTL from proceeding to synthesis.

Context	Use	Never Use	Reason	Lint Rule
`always_comb`	`=` (blocking)	`<=` (NBA)	Combinational logic needs immediate value propagation	STARC-2.1.4.1
`always_ff`	`<=` (NBA)	`=` (blocking)	Registers model simultaneous capture; blocking is race-prone	STARC-2.1.4.2
`always_latch`	`<=` (NBA)	`=` (blocking)	A latch is a storage element: NBA keeps readers in other processes race-free, as with flip-flops	Style guide
Functions	`=` (blocking)	`<=` (NBA)	Functions cannot have NBA semantics — blocking only	Language rule
Tasks (synthesis)	`=` for local; output via calling block	Mixed freely	Same rules as the calling block apply to synthesizable tasks	Tool-specific
Testbench `initial`	`=` for setup values; `<=` via clocking block for clocked stimulus	`<=` for a value that later statements read in the same time step	A value assigned with `<=` is not visible until the NBA region of the current time step	Methodology

⚡ Advanced Race Condition Analysis

SystemVerilog — Three Flavors of Race Condition

// ── Race Type 1: one always_ff writes with =, another reads at the same edge ──
always_ff @(posedge clk) shared = a;   // block 1
always_ff @(posedge clk) result = shared; // block 2: sees new or old shared?
// If the simulator runs block 1 first: result = a (acts like a wire from a)
// If the simulator runs block 2 first: result = shared_old (different answer!)
// Synthesis: two flip-flops, so result always gets shared_old (correct HW)
// ✅ Fix: use <= in both blocks
 
// ── Race Type 2: Read-after-Write in same always_ff block with = ─
always_ff @(posedge clk) begin
stage1 = d;       // writes stage1 = d (new value)
stage2 = stage1;  // reads NEW stage1 → stage2 = d (wrong — should be old)
end
// This is NOT a race between processes — it's sequential in one block.
// It's deterministic but wrong: both get d in same cycle.
// ✅ Fix: use <= for both
 
// ── Race Type 3: = and <= mixed in same block (temporary leaks out) ─
always_ff @(posedge clk) begin
tmp    =  a + b;    // blocking: tmp = a+b immediately (Active)
result <= tmp;       // NBA RHS reads the NEW tmp: deterministic inside this block
end
// Statements in one begin...end execute in order, so result always gets a+b.
// The hazard is tmp itself: it is a module-level variable updated in the Active
// region, so any other process that reads tmp at this clock edge races with it.
// ✅ Fix: use automatic local variable for intermediate, then <=
always_ff @(posedge clk) begin
automatic logic [8:0] tmp = a + b;  // local to this block: nothing else can read it
result <= tmp[7:0];                   // ✅ result gets tmp from this cycle
end
 
// ── Verification: detect blocking races with assertions ───────────
// Concurrent assertion for Race Type 1: result must equal the value that
// shared held one clock earlier (sampled values, so the check cannot race)
assert property (@(posedge clk) result === $past(shared)) else
$error("result=%h does not match previous-cycle shared: blocking race?", $sampled(result));

🔬 Debugging Academy — 8 Blocking/Non-Blocking Bugs from the Field

1. Blocking in always_ff: 2-Stage Pipeline Collapses to 1 Stage (Pipeline Collapse)

Bug 1 — Blocking = Collapses Pipeline Register Stages

// ❌ BUG: intended 2-stage pipeline with 2-cycle latency
always_ff @(posedge clk) begin
stage1 = data_in;   // stage1 immediately = data_in (new)
stage2 = stage1;    // stage2 immediately = data_in (new stage1!)
end
// Result: stage2 = data_in with 1-cycle latency, NOT 2-cycle
// Simulation: both stages have identical value every cycle
// Synthesis: follows the same blocking order, so both registers load data_in
// Impact: filter/DSP algorithm sees wrong delayed data → incorrect output
 
// ✅ FIX: use non-blocking — both stages capture pre-edge values
always_ff @(posedge clk) begin
stage1 <= data_in;   // RHS: data_in_old (pre-edge)
stage2 <= stage1;    // RHS: stage1_old (pre-edge) → 2-cycle latency ✅
end

Bug 1: Root Cause / Waveform / Fix

Waveform Symptom: stage1 and stage2 show identical waveforms; they change simultaneously on every posedge. The expected 1-cycle offset between them is missing. This is the definitive visual signature of a blocking-in-pipeline bug.

Root Cause: Blocking assignment (=) updates stage1 immediately. The next line then reads the NEW stage1, not the pre-clock-edge value. Both assignments complete within the same Active region, eliminating the intended 1-cycle delay between stages.

Real Impact: In a DSP filter, a collapse like this feeds the FIR taps data from the wrong time step and produces an incorrect frequency response. RTL simulation still produces output, just the wrong output, so the bug passes any test that does not check latency or compare against a reference model.

2. Non-Blocking in always_comb: Intermediate Value Is Stale (Stale Comb Value)

Bug 2 — NBA <= in always_comb Reads Stale Intermediate Value

// ❌ BUG: NBA in always_comb — mid is not updated when 'out' is assigned
always_comb begin
mid <= a ^ b;     // schedules mid update for NBA region
out <= mid | c;   // ❌ uses OLD mid (before this always_comb ran)
// out = (mid_old | c) — wrong!
end
// Simulation: out is computed from the stale mid
// mid is written inside this block, so by the always_comb sensitivity rules
// the block is not sensitive to it: it does not re-run when the NBA updates
// mid, and out stays wrong until a, b or c changes again
// Tool warning: "non-blocking assignment in always_comb": treat as error
 
// ✅ FIX: use blocking = for all intermediate values in always_comb
always_comb begin
mid = a ^ b;     // mid updated immediately
out = mid | c;   // uses NEW mid: correct after a single pass
end

3. Mixed = and <= in Same always_ff: Temporary Races With Other Blocks (Fragile Style)

Bug 3 — Mixed = and <= in always_ff: Deterministic Inside, Racy Outside

// ❌ BUG: mixing = and <= in same always_ff
always_ff @(posedge clk) begin
tmp    =  a + b;     // blocking: tmp updates immediately (Active region)
result <= tmp;        // NBA RHS reads the NEW tmp (statements run in order)
status <= (tmp[8]);   // also reads the NEW tmp
end
// Inside this block the outcome is well defined: result and status use a+b.
// The hazard is outside it: tmp is a module-level variable, so any other
// clocked block that reads tmp at this edge sees old or new tmp depending
// on which block the simulator runs first. Lint flags the mix for that reason.
 
// ✅ FIX: Use local automatic variable for intermediate, then <=
always_ff @(posedge clk) begin
automatic logic [8:0] tmp_local = a + b;  // local: invisible to other blocks
result <= tmp_local[7:0];                   // ✅ deterministic
status <= tmp_local[8];                    // ✅ deterministic
end

4. Blocking Shift Register Acts as Wire: Both FFs Hold Same Value (Race / Wire Bug)

Bug 4 — Two always_ff with Blocking = on Same Signal

// ❌ BUG: shift register split across two always_ff blocks
always_ff @(posedge clk) a = d_in;  // block 1: a = d_in (blocking)
always_ff @(posedge clk) b = a;    // block 2: b = a (but which a?)
 
// Scenario A: Simulator runs block 1 first
// a gets d_in (new). Then block 2: b = a = d_in (new). Result: b = d_in
// This makes b act like a wire to d_in, not a 1-cycle delayed register.
 
// Scenario B: Simulator runs block 2 first
// b = a (old). Then block 1: a = d_in. Result: b = a_old (correct FF behavior)
 
// Synthesis: always scenario B (b = a_old — two FFs in series)
// Simulation: depends on simulator. This is the classic sim/synth mismatch.
 
// ✅ FIX: use non-blocking — deterministic, matches synthesis
always_ff @(posedge clk) a <= d_in;  // RHS: d_in_old captured
always_ff @(posedge clk) b <= a;    // RHS: a_old captured. NBA updates: a=d_in, b=a_old

Bug 4: This Is the Classic "Passes in VCS, Fails in Questa" Bug

Real Impact: This bug passes regression on the primary simulator (whichever happened to run block 2 first) but is immediately caught when the second simulator (which runs block 1 first) is added to the regression. This is the exact signature the two-simulator test is designed to catch. Running two simulators in CI is the definitive detector for this class of bug.

5. Mixed Reset (=) and Data (<=) in Same always_ff: Tool Warning (Inconsistent Block)

Bug 5 — Blocking = for Reset, NBA <= for Data: Inconsistent Block

// ❌ BUG: = for reset path, <= for data path — inconsistent
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) q =  '0;   // ❌ blocking reset
else        q <= d;    // NBA data
end
// Lint: "Mixed blocking and non-blocking assignments in always_ff"
// q is assigned with both = and <=: many tools reject or warn about this
// The blocking reset also updates q in the Active region, so another block
// that reads q at the same edge can see the reset value early
 
// ✅ FIX: use <= consistently for BOTH reset and data paths
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) q <= '0;   // ✅ consistent NBA
else        q <= d;    // ✅ consistent NBA
end
// Rule: every assignment in an always_ff block must use <=. Always.

6. $display Shows Stale FF Value: Wrong Debug Print (Debug Print Bug)

Bug 6 — $display Fires Before NBA, Shows Wrong Flip-Flop Value

// ❌ BUG: $display fires in Active region — before NBA updates FF output
always_ff @(posedge clk) q <= d;
 
initial begin
d = 8'hAA;              // d=0xAA set before the first posedge (assumed at T=5)
@(posedge clk);
$display("q = %h", q);  // ❌ prints the OLD q (xx if q was never loaded): Active region
$strobe( "q = %h", q);  // ✅ prints aa (new value): Postponed region
end
 
// ── Common engineer mistake that leads to wrong conclusions ───────
// Engineer sees $display showing the stale value instead of 0xAA
// Thinks: "FF not working, d not being captured"
// Spends hours looking for a sensitivity list bug
// Reality: FF works fine, $display just fires before NBA
 
// ✅ RULE: Use $strobe for all debug prints involving FF outputs
// Use $display only for combinational signals driven with =

7. NBA in Testbench Initial Block: Signal Not Ready When Checked (Testbench Timing)

Bug 7 — NBA <= in Initial Block Delays Signal Until the NBA Region

// ❌ BUG: <= in initial block, then the value is read before the NBA commits
initial begin
data  <= 8'hAA;   // NBA: scheduled for NBA region
valid <= 1'b1;    // NBA: scheduled for NBA region
$display("driving data=%h valid=%b", data, valid);
// ← Runs in the Active region, but NBAs haven't committed yet!
// Prints the old values (xx and x at T=0)
// Transaction appears to not be driven
@(posedge clk);   // by this later edge the NBAs have committed: DUT samples 0xAA
end
 
// ✅ FIX: use = in initial blocks when the value is read in the same time step
initial begin
data  = 8'hAA;   // ✅ immediate: data=0xAA right now
valid = 1'b1;    // ✅ immediate
$display("driving data=%h valid=%b", data, valid);   // prints aa and 1
@(posedge clk);   // DUT samples correct values
end

8. Swap Fails With Blocking: Classic Temp Variable Requirement (Swap Bug)

Bug 8 — Blocking Swap Destroys One Value — Non-Blocking Doesn't

// ❌ BUG: swap with blocking — classic value-destruction bug
always_ff @(posedge clk) begin
if (swap_en) begin
a = b;   // a = b (new). Old a is LOST forever.
b = a;   // b = a = b (already overwritten!)
// Both a and b now hold b_old. Swap failed.
end
end
 
// ✅ FIX 1: non-blocking — RHS captured before any LHS updates
always_ff @(posedge clk) begin
if (swap_en) begin
a <= b;   // RHS: b_old captured
b <= a;   // RHS: a_old captured
// NBA: a ← b_old, b ← a_old simultaneously. Swap works. ✅
end
end
 
// ✅ FIX 2: if you must use blocking, need temp variable
always_ff @(posedge clk) begin
if (swap_en) begin
automatic logic [7:0] tmp = a;   // save a before overwriting
a = b;
b = tmp;   // restore old a from tmp. But still risky — use <=
end
end
 
// KEY INSIGHT: This is THE canonical example of why <= exists.
// Hardware registers naturally swap — each FF captures its D input
// at the clock edge simultaneously. <= models this perfectly.

💡 Senior Verification Engineer Tip: Enforce with Lint, Not Code Review

Blocking/non-blocking violations are too subtle and too consequential to rely on code review to catch. Configure your lint tool (Synopsys SpyGlass, Cadence JasperGold, Aldec ALINT-PRO) to enforce: (1) STARC-2.1.4.1: no NBA in always_comb, (2) STARC-2.1.4.2: no blocking in always_ff, (3) W_MIXED_BA_NBAS: no mixing in same block. Rule identifiers differ between tools and ruleset versions, so map these three checks to the names your own tool uses. Set all three to ERROR severity, not WARNING. These three rules eliminate an entire class of RTL bugs automatically, without requiring engineers to remember the rule every time they write code.

🎯 Interview Q&A — Blocking vs Non-Blocking

This topic appears in virtually every ASIC/DV interview. The questions range from fundamental definitions to subtle race condition scenarios. Here are the questions and the depth of answer that distinguishes a strong candidate.

Beginner Level

BeginnerWhat is the difference between blocking (=) and non-blocking (<=) assignments?Blocking (=): executes immediately, in statement order. The next statement in the same block sees the updated value. Used in always_comb to model combinational logic where each computed value feeds into the next.Non-blocking (<=): evaluates the RHS immediately, but defers the LHS update to the NBA region at the end of the current simulation time step. All <= RHS expressions in a block are sampled before any LHS is written. Used in always_ff to model flip-flops where all registers capture their inputs simultaneously at the clock edge. The key distinction: with =, the next line sees the new value. With <=, the next line still sees the old value — all LHS updates happen together in the NBA region after the entire block executes.BeginnerWhy does the swap (a <= b; b <= a) work without a temp variable?Because non-blocking captures all RHS values before committing any LHS values. When a <= b; executes, it samples b_old (the pre-clock-edge value). When b <= a; executes, it samples a_old. Then in the NBA region, both updates commit simultaneously: a ← b_old and b ← a_old. The old value of a is never overwritten before it's captured — it's "saved" in the NBA queue. With blocking (=), a = b overwrites a immediately. By the time b = a executes, a already holds b's value — the old a is permanently lost. The swap fails and both hold b_old.BeginnerWhat is the Golden Rule for blocking vs non-blocking?= (blocking) in always_comb. <= (non-blocking) in always_ff. Never use <= inside always_comb — the NBA deferred update means intermediate values are stale when the next statement reads them, producing wrong combinational results. Never use = inside always_ff — it creates race conditions when multiple always_ff blocks interact, and produces simulation behavior that may not match synthesis. 
This rule maps directly to the hardware: combinational logic passes values immediately through gates (blocking semantics). Flip-flops capture all inputs simultaneously at the clock edge (non-blocking semantics).
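
The Golden Rule can be sketched as a small registered accumulator. This is a minimal illustration, not code from this module: the module name accum_reg and the signals en, d_in, sum, acc_d, and acc_q are hypothetical.

```systemverilog
// Hypothetical example: all names are illustrative assumptions.
module accum_reg (
  input  logic       clk,
  input  logic       rst_n,
  input  logic       en,
  input  logic [7:0] d_in,
  output logic [7:0] acc_q
);
  logic [7:0] sum;
  logic [7:0] acc_d;

  // Combinational logic: blocking (=).
  // acc_d reads the value of sum computed on the line above.
  always_comb begin
    sum   = acc_q + d_in;
    acc_d = en ? sum : acc_q;
  end

  // Sequential logic: non-blocking (<=).
  // acc_q is updated in the NBA region, after all RHS values are sampled.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) acc_q <= '0;
    else        acc_q <= acc_d;
  end
endmodule
```

Keeping the next-state calculation in always_comb and the register in always_ff means neither block has to mix the two operators.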

Intermediate Level

Intermediate · Explain the NBA scheduler. What are the two phases of a simulation time step?
The simulator divides each simulation time step into ordered regions. The two that matter here:
Active region: Blocking assignments execute immediately. The RHS of all <= assignments is evaluated and the values are queued for later commit. Continuous assignments propagate. $display executes here.
NBA region: All queued non-blocking LHS updates commit simultaneously. Every always_ff flip-flop output gets its new value at the same moment, regardless of which block evaluated first in the Active region.
This two-phase model makes always_ff blocks that use <= completely order-independent. It doesn't matter which block the simulator processes first, because all reads use pre-clock-edge values (sampled in Active) and all writes happen simultaneously (in NBA). This eliminates racing between always_ff blocks, exactly modeling real hardware.

Intermediate · Two always_ff blocks both use blocking (=) and both trigger at the same posedge clk. Block A writes signal X and Block B reads signal X. What are the possible outcomes?
There are two possible outcomes, and which one happens is non-deterministic: it depends on which block the simulator's event scheduler processes first.
Outcome A (Block A executes first): Block A writes the new value to X. Block B reads the new X. B sees the post-clock-edge value of X. This does NOT match hardware (which always has B reading the pre-clock-edge X).
Outcome B (Block B executes first): Block B reads the old X. Block A then writes the new X. B sees the pre-clock-edge value. This DOES match hardware behavior.
Synthesis always produces Outcome B (both blocks model flip-flops that capture pre-clock values). Simulation gives Outcome A or B non-deterministically, so a regression may appear to pass (if it always gets Outcome B) or fail (if it gets Outcome A).
The fix is non-blocking: Block A's write to X is deferred to the NBA region, so Block B always reads the pre-clock X in the Active region, whichever block runs first.

Intermediate · Why does $display show a stale value after posedge clk when the flip-flop used <=?
$display fires in the Active region, the same region where the <= RHS is sampled and where the testbench's initial block resumes after its @(posedge clk) event control. The <= LHS update doesn't happen until the NBA region, which comes after the Active region. So when $display reads the flip-flop output q, it reads the pre-clock-edge value (the NBA region has not yet committed the new one).
The solution: use $strobe instead of $display for flip-flop outputs. $strobe fires in the Postponed region, after all Active and NBA processing is complete and all signals have settled to their final values for the time step.
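
The $display versus $strobe behaviour described above can be reproduced with a small testbench. This is a sketch under stated assumptions: the module name tb_display_vs_strobe and the signals d and q are hypothetical, and d is changed on the falling edge only to keep the stimulus itself race-free.

```systemverilog
// Hypothetical testbench: names and timing are illustrative assumptions.
module tb_display_vs_strobe;
  logic clk = 0;
  logic d;
  logic q;

  always #5 clk = ~clk;               // rising edges at t = 5, 15, ...

  always_ff @(posedge clk) q <= d;    // flip-flop under observation

  initial begin
    d = 0;
    @(posedge clk);                   // t = 5: q captures 0
    @(negedge clk);                   // t = 10: change d away from the clock edge
    d = 1;
    @(posedge clk);                   // t = 15: q <= 1 is queued for the NBA region
    $display("$display: q=%0b", q);   // Active region: still the old value, 0
    $strobe ("$strobe:  q=%0b", q);   // Postponed region: the settled value, 1
    #1 $finish;
  end
endmodule
```

Both calls sit on adjacent lines at the same simulation time; only the region in which each one samples q differs.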

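The two-block race from the Intermediate question on blocking assignments can be sketched side by side with its non-blocking fix. The module name race_demo and all signal names are hypothetical; the racy half is legal SystemVerilog but is written this way only to show the hazard.

```systemverilog
// Hypothetical example: names are illustrative assumptions.
module race_demo (
  input  logic clk,
  input  logic d,
  output logic y_racy,
  output logic y_safe
);
  logic x_racy;
  logic x_safe;

  // Racy: Block A writes x_racy with =, Block B reads it on the same edge.
  // The simulator may run either block first, so y_racy may get the old
  // or the new x_racy.
  always_ff @(posedge clk) x_racy = d;        // Block A
  always_ff @(posedge clk) y_racy = x_racy;   // Block B

  // Safe: both writes are deferred to the NBA region, so y_safe always
  // captures the pre-clock-edge x_safe regardless of block order.
  always_ff @(posedge clk) x_safe <= d;
  always_ff @(posedge clk) y_safe <= x_safe;
endmodule
```

After synthesis both halves are two flip-flops in series; only the non-blocking half is guaranteed to simulate that way.
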
Advanced / Debugging Level

Advanced · RTL simulates correctly with VCS but produces different output with Questa. What do you investigate first?
This is the classic signature of a blocking-assignment race condition in always_ff. The investigation:
1. Search for = in always_ff blocks. Every blocking assignment in an always_ff block is a potential race. Locate the blocks with grep -n "always_ff" design.sv and inspect each body for =, or use lint with STARC-2.1.4.2 enabled. Piping the grep output into grep "=" does not work: it only filters the always_ff lines themselves, and "=" also matches <= and ==.
2. Check for shared variables. Two always_ff blocks that share a signal via blocking assignment will race: one block's write order relative to the other determines the outcome.
3. Replace all = with <= in always_ff. If the discrepancy disappears, the blocking assignments were the cause.
4. If = and <= are mixed in the same always_ff block, fix it regardless of which simulator is "right": coding guidelines and lint rules reject this style. Use automatic local variables for intermediates, then assign with <=.

Advanced · Can you use blocking assignment inside always_ff if you're sure no other block reads the variable at the same clock edge?
Technically yes. If only one always_ff block writes and reads a signal, using = inside that block won't cause a race with other blocks. However, this is a dangerous and fragile practice for several reasons:
1. Code maintainability: The next engineer to add another block that reads your signal doesn't know about your "safe" assumption and creates a race.
2. Lint violations: Most RTL sign-off lint rules (STARC-2.1.4.2) flag ANY blocking assignment in always_ff as an error, regardless of context. Your code won't pass sign-off.
3. Synthesis mismatch risk: Even in the "safe" case, mixing = and <= in the same block makes the code's meaning ambiguous to both humans and tools.
The correct approach: use <= universally in always_ff. For intermediate calculations, use automatic local variables (which have no NBA semantics).
This is unambiguous, lint-clean, and maintainable.

Synthesis · You have a 2-stage pipeline using blocking = in always_ff, with each stage in its own block. Gate-level simulation shows the correct 2-cycle latency, but RTL simulation shows 1-cycle latency. Explain what's happening.
This is the blocking race showing up as a latency difference, and it reveals something important about synthesis.
In RTL simulation: When the simulator runs the stage1 block first, blocking = updates stage1 immediately, and the stage2 block then reads the already-updated stage1. Both get the current cycle's value at the same clock edge, so 1-cycle latency appears.
In synthesis: The synthesis tool infers a D flip-flop from each block, with stage1's output feeding stage2's D input. Two D flip-flops in series each capture their D input at the clock edge, which gives 2-cycle latency in gate-level simulation.
This is a simulation-synthesis MISMATCH: RTL simulation shows 1-cycle delay, silicon (via gate-level simulation) shows 2-cycle delay. Your RTL verification environment validated the wrong behavior. (If both assignments sat in a single always_ff block in that order, synthesis would follow the blocking order as well, and RTL and gates would agree on the wrong 1-cycle latency.)
The fix: use <= in the RTL. Then both RTL simulation and gate-level simulation show 2-cycle latency, correctly.
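
A corrected 2-stage pipeline, combined with the "local intermediate, then <=" pattern from the previous answer, might look like the sketch below. The module name pipe2 and the signals a, b, sum, stage1, and stage2 are hypothetical, and support for an automatic variable declared inside always_ff is assumed here; check your synthesis and lint tools before relying on it.

```systemverilog
// Hypothetical example: names are illustrative assumptions.
module pipe2 (
  input  logic       clk,
  input  logic [7:0] a,
  input  logic [7:0] b,
  output logic [8:0] stage2
);
  logic [8:0] stage1;

  always_ff @(posedge clk) begin
    // Block-local intermediate: not visible to any other block, so the
    // blocking assignment below cannot race with anything outside.
    automatic logic [8:0] sum;
    sum = a + b;

    // Registers use <= only. stage2 samples the pre-clock-edge stage1,
    // so a + b reaches stage2 two clock edges after it is presented.
    stage1 <= sum;
    stage2 <= stage1;
  end
endmodule
```

Because stage1 and stage2 are written only with <=, the order of the two register assignments no longer matters, and RTL and gate-level simulation should agree on the 2-cycle latency.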