Entity: prim_ram_1p_scr

Description

Copyright lowRISC contributors. Licensed under the Apache License, Version 2.0, see LICENSE for details. SPDX-License-Identifier: Apache-2.0

This is a draft implementation of a low-latency memory scrambling mechanism.

The module is implemented as a primitive, in the same spirit as the similar prim_ram_1p_adv wrapper. Hence, it can be conveniently instantiated by comportable IPs (such as OTBN) or in top_earlgrey for the main system memory.

The currently implemented architecture uses a reduced-round PRINCE cipher primitive in CTR mode in order to (weakly) scramble the data written to the memory macro. Plain CTR mode does not diffuse the data, since the keystream is simply XOR'ed onto it; hence we also perform byte-wise diffusion using (shallow) substitution/permutation network layers in order to provide a limited avalanche effect within a byte.
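
The two-step write/read transform can be illustrated with a small behavioral sketch in Python. It is not the RTL: the keystream comes from SHA-256 (hashlib) as a stand-in for reduced-round PRINCE, and the byte diffusion uses the 4-bit PRESENT S-box plus a hypothetical nibble-interleaving bit permutation. The names keystream, scramble and descramble are illustrative only. What the sketch does show is the ordering (XOR first and diffuse second on writes, the exact inverse on reads) and that diffusion never leaves a byte.

```python
import hashlib

# 4-bit PRESENT S-box and its inverse (stand-in substitution layer).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [SBOX.index(v) for v in range(16)]

# Hypothetical bit permutation: input bit i moves to output bit PERM[i].
# It interleaves the two nibbles so the next S-box layer mixes both halves.
PERM = [0, 2, 4, 6, 1, 3, 5, 7]
PERM_INV = [PERM.index(j) for j in range(8)]


def keystream(key: int, nonce: int, addr: int, width: int) -> int:
    """Stand-in keystream (SHA-256, NOT reduced-round PRINCE); width <= 256."""
    digest = hashlib.sha256(f"{key:x}|{nonce:x}|{addr:x}".encode()).digest()
    return int.from_bytes(digest, "little") & ((1 << width) - 1)


def _permute(byte: int, table: list) -> int:
    out = 0
    for i in range(8):
        if (byte >> i) & 1:
            out |= 1 << table[i]
    return out


def _sub(byte: int, box: list) -> int:
    return box[byte & 0xF] | (box[byte >> 4] << 4)


def _per_byte(word: int, width: int, fn) -> int:
    """Apply fn independently to every byte of a width-bit word."""
    out = 0
    for b in range(width // 8):
        out |= fn((word >> (8 * b)) & 0xFF) << (8 * b)
    return out


def diffuse_byte(byte: int, rounds: int) -> int:
    for _ in range(rounds):
        byte = _permute(_sub(byte, SBOX), PERM)
    return byte


def undiffuse_byte(byte: int, rounds: int) -> int:
    for _ in range(rounds):
        byte = _sub(_permute(byte, PERM_INV), SBOX_INV)
    return byte


def scramble(wdata, key, nonce, addr, width=32, rounds=2):
    # Write path: keystream XOR first, then per-byte diffusion.
    x = wdata ^ keystream(key, nonce, addr, width)
    return _per_byte(x, width, lambda b: diffuse_byte(b, rounds))


def descramble(stored, key, nonce, addr, width=32, rounds=2):
    # Read path: undo the diffusion first, keystream XOR last.
    x = _per_byte(stored, width, lambda b: undiffuse_byte(b, rounds))
    return x ^ keystream(key, nonce, addr, width)
```

Because the diffusion is confined to byte lanes, flipping a stored bit corrupts exactly one byte of the read data, and a masked byte write never needs the other lanes.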

In order to break the linear address space, the address is passed through a bijective scrambling function constructed from a (shallow) substitution/permutation network and a nonce. Because of that nonce, the address mapping is not fully baked into RTL and can also be changed at runtime.
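
The following Python sketch shows why a nonce-keyed bijection keeps every address reachable. It swaps in a simple XOR / odd-multiply / rotate construction in place of the RTL's substitution/permutation network; scramble_addr and its per-round nonce slicing are hypothetical. Any composition of bijective steps on AddrWidth bits is itself a bijection, so no two logical addresses collide. The mapping permutes all 2**AddrWidth values, which is consistent with the requirement that Depth be a power of 2 when NumAddrScrRounds > 0: otherwise an in-range address could be mapped beyond Depth.

```python
def scramble_addr(addr: int, nonce: int, addr_width: int, rounds: int = 2) -> int:
    """Hypothetical bijective address scrambler on addr_width bits."""
    mask = (1 << addr_width) - 1
    a = addr & mask
    for r in range(rounds):
        round_key = (nonce >> (r * addr_width)) & mask  # per-round nonce slice
        a ^= round_key                                   # key mixing (bijective)
        a = (a * 0x6D) & mask                            # odd multiplier is invertible mod 2**n
        s = 3 % addr_width                               # rotate left by s bits
        a = ((a << s) | (a >> (addr_width - s))) & mask  # rotation (bijective)
    return a
```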

See also: prim_cipher_pkg, prim_prince

Generics

Generic name Type Value Description
Depth int 16*1024 Needs to be a power of 2 if NumAddrScrRounds > 0.
Width int 32 Needs to be byte aligned if byte parity is enabled.
DataBitsPerMask int 8 Needs to be set to 8 in case of byte parity.
EnableParity bit 1 Enable byte parity.
NumPrinceRoundsHalf int 2 Number of PRINCE half rounds; can be in the range [1..5]. Scrambling needs to be low-latency, hence the number of cipher rounds has to be kept low. PRINCE has 5 half rounds in its original form, which corresponds to 2*5 + 1 = 11 effective rounds. Setting this to 2 reduces this to approximately 5 (2*2 + 1) effective rounds.
NumDiffRounds int 2 Number of extra diffusion rounds. Set this to 0 to disable diffusion.
DiffWidth int DataBitsPerMask This parameter governs the block-width of additional diffusion layers. For intra-byte diffusion, set this parameter to 8.
NumAddrScrRounds int 2 Number of address scrambling rounds. Setting this to 0 disables address scrambling.
ReplicateKeyStream bit 1'b0 If set to 1, the same 64bit key stream is replicated if the data port is wider than 64bit. If set to 0, the cipher primitive is replicated, and together with a wider nonce input, a unique keystream is generated for the full data width.
AddrWidth int prim_util_pkg::vbits(Depth) Derived parameter: address width.
NumParScr int (ReplicateKeyStream) ? 1 : (Width + 63) / 64 Derived parameter. Depending on the data width, multiple parallel cipher primitives must be instantiated to create a keystream that is wide enough (PRINCE has a block size of 64 bits). If ReplicateKeyStream is set, a single primitive is used.
NumParKeystr int (Width + 63) / 64 Derived parameter: number of 64-bit keystream words needed to cover Width.
DataKeyWidth int 128 Given by the PRINCE cipher primitive. All parallel cipher modules use the same key, but each uses a different IV.
NonceWidth int 64 * NumParScr Each 64-bit scrambling primitive requires a 64-bit IV.
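
The derived parameters follow from simple arithmetic, recomputed by the Python sketch below for a given configuration. It assumes that prim_util_pkg::vbits(x) returns $clog2(x) but at least 1, and that NumParScr = ceil(Width/64) unless ReplicateKeyStream is set, in which case a single primitive is used. derived_params is a hypothetical helper, not part of the RTL. DataNonceWidth (listed under Constants) and the effective PRINCE round count 2*NumPrinceRoundsHalf + 1 are included for completeness.

```python
def vbits(value: int) -> int:
    """Assumed behavior of prim_util_pkg::vbits: $clog2(value), minimum 1."""
    return max(1, (value - 1).bit_length())


def derived_params(depth=16 * 1024, width=32, replicate_key_stream=False,
                   num_prince_rounds_half=2) -> dict:
    addr_width = vbits(depth)
    blocks = (width + 63) // 64               # 64-bit PRINCE blocks per word
    num_par_scr = 1 if replicate_key_stream else blocks
    return {
        "AddrWidth": addr_width,
        "NumParScr": num_par_scr,
        "NumParKeystr": blocks,
        "NonceWidth": 64 * num_par_scr,
        "DataNonceWidth": 64 - addr_width,    # nonce bits left in a 64-bit IV
        "EffectiveRounds": 2 * num_prince_rounds_half + 1,
    }
```

With the defaults (Depth = 16*1024, Width = 32), this gives AddrWidth = 14, one cipher instance, a 64-bit nonce and DataNonceWidth = 50.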

Ports

Port name Direction Type Description
clk_i input
rst_ni input
key_valid_i input Key interface. Memory requests will not be granted if key_valid is set to 0.
key_i input [DataKeyWidth-1:0]
nonce_i input [NonceWidth-1:0]
req_i input Interface to TL-UL SRAM adapter
gnt_o output
write_i input
addr_i input [AddrWidth-1:0]
wdata_i input [Width-1:0]
wmask_i input [Width-1:0] Needs to be byte-aligned for parity
intg_error_i input The incoming transaction contains an integrity error, and the module should alter its behavior accordingly. On integrity errors, the primitive reverses the bit order of the nonce and suppresses any real transaction to the memory.
rdata_o output [Width-1:0]
rvalid_o output Read response (rdata_o) is valid
rerror_o output [1:0] Bit1: Uncorrectable, Bit0: Correctable
raddr_o output [31:0] Read address for error reporting.
cfg_i input ram_1p_cfg_t config
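
A simplified combinational model of how the request-side ports interact, based only on the port descriptions: a request is granted only while key_valid_i is high, the grant splits into read and write strobes, and on an integrity error the nonce is bit-reversed and the macro request is suppressed. The function names are hypothetical, and the model ignores the delayed (pending) write that the RTL also has to account for when driving the macro.

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the bit order of a width-bit value."""
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def request_side(req: bool, write: bool, key_valid: bool, intg_error: bool,
                 nonce: int, nonce_width: int) -> dict:
    gnt = req and key_valid                  # no grant without a valid key
    read_en = gnt and not write
    write_en = gnt and write
    return {
        "gnt_o": gnt,
        "read_en": read_en,
        "write_en": write_en,
        # Integrity error: suppress the real memory access ...
        "macro_req": (read_en or write_en) and not intg_error,
        # ... and use a bit-reversed nonce for the keystream.
        "nonce": bit_reverse(nonce, nonce_width) if intg_error else nonce,
    }


def decode_rerror(rerror: int) -> dict:
    """rerror_o: bit 1 = uncorrectable, bit 0 = correctable."""
    return {"uncorrectable": bool(rerror & 0b10), "correctable": bool(rerror & 0b01)}
```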

Signals

Name Type Description
read_en logic Read / write strobes. Pending write and address registers: writes are delayed by one cycle so that the same keystream generation primitive (prim_prince) can be reused for reads and writes. With this arrangement, however, a pending write transaction has to be held whenever it is immediately followed by a read. The pending write is written to memory as soon as no new read transaction is incoming. A special case arises if the incoming read targets the same address as the pending write; to handle it, the address collision is detected and the data is returned from the write holding register.
write_en_d logic Read / write strobes
write_en_q logic Read / write strobes
write_pending_q logic
addr_collision_d logic
addr_collision_q logic
waddr_q logic [AddrWidth-1:0]
macro_req logic Macro request and write strobe. The macro operation is silenced if an integrity error is seen.
intg_err_macro_req logic
macro_write logic A pending write transaction may be written to the memory only if there is no incoming read.
rw_collision logic New read/write collision
addr_mux logic [AddrWidth-1:0] Address scrambling: the pending write address is selected only if there is no incoming read transaction.
addr_scr logic [AddrWidth-1:0] This creates a bijective address mapping using a substitution / permutation network.
raddr_q logic [AddrWidth-1:0] We latch the non-scrambled address for error reporting.
keystream logic [NumParScr*64-1:0]
data_scr_nonce logic [NumParScr-1:0][DataNonceWidth-1:0]
intg_err_data_scr logic
keystream_repl logic [Width-1:0] Replicate keystream if needed
rdata_scr logic [Width-1:0] Data scrambling is a two-step process. First, the write data is XOR'ed with the keystream obtained by operating a reduced-round PRINCE cipher in CTR mode. Then, the data is diffused within each byte in order to get a limited "avalanche" behavior in case parts of a byte are flipped as a result of a malicious attempt to tamper with the data in memory. The diffusion is performed only within bytes in order to maintain the ability to write individual bytes. Note that the keystream XOR is performed first on the write path so that it can be performed last on the read path. This hides part of the combinational delay of the PRINCE primitive behind the propagation delay of the SRAM macro and the per-byte diffusion step.
rdata logic [Width-1:0]
wdata_scr_d logic [Width-1:0]
wdata_scr_q logic [Width-1:0]
wdata_q logic [Width-1:0]
write_scr_pending_d logic Scrambled data register and forwarding mux: this is the scrambled data holding register for pending writes. It is needed in order to make back-to-back patterns of the form WR -> RD -> WR work.
[Cycle-by-cycle timing diagram of this pattern, starting at cycle 0, not preserved.]
wdata_scr logic [Width-1:0] Selects the correct scrambled word to be written, based on whether the word in the scrambled data holding register is valid. Note that the scrambled data holding register (wdata_scr_q) could in theory be combined with the wdata_q register. This is not done here for timing reasons, since it would require another read data mux to inject the scrambled data into the read descrambling path.
rvalid_q logic Output read valid strobe
intg_error_q logic
wmask_q logic [Width-1:0]
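
The pending-write mechanism described for read_en and write_scr_pending_d can be modeled at cycle level. The Python sketch below works on plaintext words and ignores scrambling, the one-cycle read latency and the separate scrambled holding register; PendingWriteModel and step are hypothetical names. It captures three rules from the description: a write is held for at least one cycle, it is flushed to memory in the first cycle without an incoming read, and a read that hits the pending address is served from the holding register. Merging the forwarded data with memory according to the write mask is an assumption of this sketch.

```python
class PendingWriteModel:
    """Hypothetical cycle-level model of the delayed-write scheme (no scrambling)."""

    def __init__(self):
        self.mem = {}          # address -> word
        self.pending = None    # (addr, data, mask) held in the write registers

    def _commit(self, addr, data, mask):
        old = self.mem.get(addr, 0)
        self.mem[addr] = (old & ~mask) | (data & mask)

    def step(self, req=False, write=False, addr=0, wdata=0, wmask=0xFFFFFFFF):
        """Advance one cycle; returns read data for a read, else None."""
        rdata = None
        if req and not write:
            # The incoming read owns the macro this cycle; the pending write waits.
            word = self.mem.get(addr, 0)
            if self.pending is not None and self.pending[0] == addr:
                # Address collision: forward from the write holding register
                # (assumed: masked merge with the memory contents).
                _, p_data, p_mask = self.pending
                word = (word & ~p_mask) | (p_data & p_mask)
            rdata = word
        elif self.pending is not None:
            # No incoming read: the pending write goes to memory now.
            self._commit(*self.pending)
            self.pending = None
        if req and write:
            # A new write is delayed by one cycle.
            self.pending = (addr, wdata, wmask)
        return rdata
```

For a WR -> RD -> WR sequence, the first write stays pending during the read and is committed when the second write arrives, which in turn is committed on the next idle cycle.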

Constants

Name Type Value Description
DataNonceWidth int 64 - AddrWidth Keystream generation for data scrambling: the IV, consisting of the nonce and the address, is encrypted with the provided key in order to generate the keystream for the data. Note that a register is instantiated halfway through this primitive to balance the delay between the request and response sides.
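
For illustration, the IV formation and keystream replication can be sketched in Python. The sketch assumes that each parallel PRINCE instance k takes a 64-bit IV made of a DataNonceWidth-bit slice of nonce_i in the upper bits concatenated with the AddrWidth-bit address in the lower bits, and that with ReplicateKeyStream set one 64-bit keystream word is repeated and truncated to Width (the role of keystream_repl). The bit ordering and the helper names prince_ivs and replicate_keystream are assumptions, not taken from the RTL.

```python
def prince_ivs(nonce: int, addr: int, addr_width: int, num_par_scr: int) -> list:
    """Hypothetical IV per parallel cipher: {nonce slice, address}, 64 bits each."""
    data_nonce_width = 64 - addr_width            # DataNonceWidth
    slice_mask = (1 << data_nonce_width) - 1
    addr_mask = (1 << addr_width) - 1
    ivs = []
    for k in range(num_par_scr):
        nonce_slice = (nonce >> (k * data_nonce_width)) & slice_mask
        ivs.append((nonce_slice << addr_width) | (addr & addr_mask))
    return ivs


def replicate_keystream(keystream64: int, width: int) -> int:
    """Repeat one 64-bit keystream word and truncate the result to width bits."""
    copies = (width + 63) // 64                   # NumParKeystr
    repl = 0
    for i in range(copies):
        repl |= (keystream64 & (2**64 - 1)) << (64 * i)
    return repl & ((1 << width) - 1)
```

Because the address is part of every IV, each word gets its own keystream under a fixed key and nonce, which is what makes this CTR mode.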

Processes

Type: always_comb

Type: always_ff

Description
Registers

Instantiations

Description
Memory Macro