Used correctly, constants can simplify the coding of complex and modular designs. Constants can be used to create ROMs, to support modular coding, and to define what logic is created or how it is used. For example, constants can be used with generate statements to select which portion of code is synthesized. Consider a design with one portion of code written for an ASIC implementation and another written for a Xilinx implementation. The ASIC version should implement a multiplexer with gates, while the Xilinx version should implement it with three-state buffers. Because some synthesis tools do not currently support configuration statements, a generate statement is the best solution.
Figure 13-12 shows how a constant can define the logic that is created. Although the example is simple, it illustrates the possibilities: changing the single constant ASIC causes an entirely different set of circuitry to be synthesized throughout the design.
Figure 13-12 A Constant Guiding the Generation of Logic
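The following is a minimal illustrative sketch of this technique, not the original figure code. It assumes a hypothetical project package tech_pkg that holds a boolean constant ASIC. The entity mux2 and its ports are also illustrative names. When ASIC is true, a gate-based multiplexer is generated. When it is false, two three-state drivers on an internal bus are generated instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical project package holding the technology switch.
package tech_pkg is
  constant ASIC : boolean := false;  -- true: ASIC gates; false: Xilinx three-states
end package tech_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.tech_pkg.all;

entity mux2 is
  port (
    sel  : in  std_logic;
    a, b : in  std_logic_vector(7 downto 0);
    y    : out std_logic_vector(7 downto 0));
end entity mux2;

architecture rtl of mux2 is
  signal tri_bus : std_logic_vector(7 downto 0);  -- used only by the Xilinx branch
begin
  -- ASIC: an ordinary gate-level multiplexer.
  asic_gen : if ASIC generate
    y <= a when sel = '0' else b;
  end generate asic_gen;

  -- Xilinx: two three-state drivers on one resolved signal.
  xilinx_gen : if not ASIC generate
    tri_bus <= a when sel = '0' else (others => 'Z');
    tri_bus <= b when sel = '1' else (others => 'Z');
    y       <= tri_bus;
  end generate xilinx_gen;
end architecture rtl;
```

Editing the single constant in tech_pkg switches every instance of this block in the design at once.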
Constants also aid modular coding. For example, you could define a constant that specifies the width of the address bus. One change to that constant in the package then propagates to every part of the design that uses it. See Figure 13-13. Using constants to define address and data-bus widths may be better than using generics. Generics are passed from the top down, which rules out bottom-up synthesis. Bottom-up synthesis is generally preferable because it reduces synthesis run times: only the modules that change need to be resynthesized.
Figure 13-13 Address Width Defined by a Constant
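Here is a sketch of how this might look. A hypothetical package bus_pkg defines ADDR_WIDTH and DATA_WIDTH, along with matching subtypes; the width values are placeholders. A small register block then takes its port widths from the package instead of from generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical project package: one edit here resizes every user.
package bus_pkg is
  constant ADDR_WIDTH : natural := 16;
  constant DATA_WIDTH : natural := 32;
  subtype addr_t is std_logic_vector(ADDR_WIDTH-1 downto 0);
  subtype data_t is std_logic_vector(DATA_WIDTH-1 downto 0);
end package bus_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.bus_pkg.all;

entity addr_reg is
  port (
    clk      : in  std_logic;
    addr_in  : in  addr_t;
    addr_out : out addr_t);
end entity addr_reg;

architecture rtl of addr_reg is
begin
  -- The width comes from the package, not from a generic passed down
  -- by the parent, so this block can be synthesized on its own.
  process (clk)
  begin
    if rising_edge(clk) then
      addr_out <= addr_in;
    end if;
  end process;
end architecture rtl;
```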
Rules for Defining Constants within a Package
Define constants within a package when they can improve the modularity of the code by guiding generate statements.
Define constants in a package to set the sizes and widths of buses. Constants used in this manner are generally more powerful than generics because they allow the design to be synthesized in any manner, whereas generics allow only top-down synthesis.
Functions and Procedures
By definition, functions and procedures add modularity and reuse to code. Extensive use of functions and procedures from the global and project packages is encouraged. Rather than extensively using functions and procedures from a designer’s package, the designer is encouraged to add the functions and procedures at a local level (within an architecture), to maintain readability for other designers and future reuse.
When defining functions and procedures, it is beneficial to pass signals as unsized (unconstrained) vectors, which allows the subprogram to be used modularly. In addition to using unsized vectors, use signal attributes such as 'range to define the logic.
In the function example shown in Figure 13-14, the input, named vec, is declared as an unconstrained std_logic_vector. Because no size is specified, the actual size of the signal passed in determines the implementation, and the attribute 'range specifies the size of the intended logic. The function is therefore modular: it is not limited to one specific vector size. A vector of any size can be passed in, and the correspondingly sized logic will be inferred.
Figure 13-14 Modular Function Use
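The sketch below illustrates the same idea with a hypothetical reduction-XOR (parity) function. The function name parity and the package util_pkg are illustrative, not taken from the figure. The input vec is unconstrained, and the loop iterates over vec'range, so the function adapts to whatever vector it is given.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package util_pkg is
  -- Unconstrained input: the caller's vector size sets the logic size.
  function parity (vec : std_logic_vector) return std_logic;
end package util_pkg;

package body util_pkg is
  function parity (vec : std_logic_vector) return std_logic is
    variable result : std_logic := '0';
  begin
    -- 'range walks whatever index range the actual signal has,
    -- whether it is (7 downto 0), (31 downto 0), or (0 to 4).
    for i in vec'range loop
      result := result xor vec(i);
    end loop;
    return result;
  end function parity;
end package body util_pkg;
```

For example, parity(byte_bus) and parity(word_bus) would infer an 8-input and a 32-input XOR tree, respectively, from the same function. Here byte_bus and word_bus are hypothetical 8-bit and 32-bit signals.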
Rules for Functions and Procedures
Extensive use of functions and procedures is encouraged. Predominantly, the functions and procedures should be defined within either the global or the project packages.
Create modular functions and procedures by not specifying the width of inputs and outputs. Then use range attributes to extract the needed information about the size of an object.
Types, Subtypes, and Aliases
Types and subtypes are encouraged for readability. Types defined at the global and project level are generally required, and they help to create reusable code.
Aliases can be used to clarify the intent, or meaning, of a signal. In most cases, the intent of a signal can be clearly identified by its name, so aliases should not be used extensively. While aliases can help to clarify the purpose of a signal, they also add a level of indirection, which may reduce the readability of the code.
Although aliases are not used only in conjunction with types and subtypes, they are illustrated here. In Figure 13-15 two types are defined: a record and an array. In this example, aliases clarify the use of the signal rx_packet.data (rx_data) and the intent of the signal data_array(0) (data_when_addr0). The alias data_when_addr0 is used in place of data_array(0) because it gives more meaning to this "slice" of data than data_array(0) does. Whenever data_when_addr0 appears in the code, the intent is obvious. The alias rx_data simply provides a shortened name for the signal rx_packet.data while maintaining its use and intent.
Figure 13-15 Use of Types and Aliases
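The following sketch follows this description using illustrative names. The type names packet_t and data_array_t are assumptions, and so is the surrounding entity alias_demo. The two alias declarations correspond to the aliases rx_data and data_when_addr0 described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity alias_demo is
  port (
    clk    : in  std_logic;
    rx_vld : in  std_logic;
    rx_in  : in  std_logic_vector(7 downto 0);
    dout   : out std_logic_vector(7 downto 0));
end entity alias_demo;

architecture rtl of alias_demo is
  -- A record type and an array type (illustrative names).
  type packet_t is record
    valid : std_logic;
    data  : std_logic_vector(7 downto 0);
  end record;
  type data_array_t is array (0 to 3) of std_logic_vector(7 downto 0);

  signal rx_packet  : packet_t;
  signal data_array : data_array_t;

  -- Shorter name for a record field; same signal, same meaning.
  alias rx_data : std_logic_vector(7 downto 0) is rx_packet.data;
  -- Names the intent of element 0 rather than its index.
  alias data_when_addr0 : std_logic_vector(7 downto 0) is data_array(0);
begin
  rx_packet <= (valid => rx_vld, data => rx_in);

  process (clk)
  begin
    if rising_edge(clk) then
      if rx_packet.valid = '1' then
        data_when_addr0 <= rx_data;  -- same as data_array(0) <= rx_packet.data
      end if;
    end if;
  end process;

  dout <= data_when_addr0;
end architecture rtl;
```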
Rules for Types, Subtypes, and Alias use
Types and subtypes are encouraged on a global or project basis to facilitate reusable code.
Alias use is encouraged when it clearly promotes readability without adding complex redirection.
Technology-Specific Code (Xilinx) Section 4
It is desirable to maintain portable, reusable code. However, this is not always possible. For each technology vendor, there are cases where blocks must be instantiated. Furthermore, code intended to be generic will not always provide the best solution for a specific technology. The tradeoffs between instantiation and technology-specific code are discussed below.
Although instantiation of Xilinx primitives is largely unneeded and unwanted, there are specific cases where it must be done, and other occasions when it should be done. The components that must be instantiated for a Xilinx implementation vary somewhat by synthesis tool. Those covered here apply to Synplify, Synopsys, Exemplar, and XST. This section describes situations where deviation from reusable code is required.
Specific top-level (FPGA) components require instantiation, including the boundary scan component, digital delay-locked loop components (DLL) or digital clock manager (DCM), startup block, and I/O pullups and pulldowns.
I/O standards other than LVTTL can be specified in the synthesis tool. However, it is more advantageous to specify the I/O standard in the Xilinx Constraints Editor. The editor writes a constraint into the Xilinx UCF (User Constraints File), which is fed into the Xilinx implementation tools.
To instantiate Xilinx primitives, you need a correct component declaration. This information can be obtained directly from the Xilinx Libraries Guide, found in the online documentation.
Rules for Required Instantiations for Xilinx
Boundary Scan (BSCAN)
Digital Clock Manager (DCM) or Delay-Locked Loop (DLL). Instantiating the DCM/DLL eliminates clock distribution delay and provides access to the other DCM features: phase shifting, 50-50 duty-cycle correction, clock multiplication, and clock division.
IBUFG and BUFG. IBUFG is a dedicated clock buffer that drives the input of the DCM/DLL. BUFG is an internal global clock buffer that drives the internal FPGA clock and provides the feedback clock to the DCM/DLL.
DDR registers. DDR registers are dedicated Double-Data Rate (DDR) I/O registers located in the input or output block of the FPGA.
Startup. The startup block provides access to a Global Set or Reset line (GSR) and a Global Three-State line (GTS). The startup block is not inferred because routing a global set or reset line on the dedicated GSR resources is slower than using the abundant general routing resources.
I/O pullups and pulldowns (pullup, pulldown).
Simulation of Instantiated Xilinx Primitives
Correct behavioral simulation requires certain simulation files, found in the Xilinx directory structure at $Xilinx/vhdl/src/unisims. Unisims are similar to simprims, except that unisims do not have component timing information enabled. Simprims do have timing information enabled, but they require an SDF file (from Xilinx place and route) to supply that timing for post-place-and-route timing simulation.
Within the unisim directory, several VHDL files need to be compiled to a unisim library. They can then be accessed by specifying the library unisim and using the use statement. For example:
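The library and use clauses below make the unisim component declarations visible. To illustrate the IBUFG and BUFG primitives listed earlier, the sketch also instantiates them back to back on a clock input. The entity clk_in and its port names are hypothetical. In a design that uses a DCM/DLL, the DCM would normally sit between the two buffers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Makes the Xilinx primitive component declarations visible.
library unisim;
use unisim.vcomponents.all;

entity clk_in is
  port (
    clk_pad : in  std_logic;   -- clock from the package pin
    clk     : out std_logic);  -- global clock for internal logic
end entity clk_in;

architecture structural of clk_in is
  signal clk_ibufg : std_logic;
begin
  -- Dedicated clock input buffer.
  u_ibufg : IBUFG
    port map (I => clk_pad, O => clk_ibufg);

  -- Global clock buffer driving the internal clock network.
  u_bufg : BUFG
    port map (I => clk_ibufg, O => clk);
end architecture structural;
```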
The VHDL files must be compiled in a specific order because there are dependencies between the files. The compilation order is:
For post-place-and-route timing simulation, the simprim files need to be compiled into a simprim library. The VHDL files for simprims are in: $Xilinx/vhdl/src/simprims. The correct package compilation order is:
Simulation files rules
Unisims are used for behavioral and post-synthesis simulation.
Simprims are used for post place-and-route timing simulation.
Non-Generic Xilinx-Specific Code
This section describes situations where Xilinx-specific coding may be required to get a better implementation than can be inferred from either generic code or ASIC-specific coding.
Generic coding of multiplexers is likely to result in an and-or gate implementation. For Xilinx parts, however, a gate implementation of multiplexers is generally not advantageous. Xilinx parts have a very fast implementation for multiplexers of 64:1 or less. For multiplexers larger than 64:1, the tradeoffs need to be considered. Multiplexers implemented with internal three-state buffers have a nearly constant speed regardless of multiplexer size.
Three-state multiplexers are implemented by assigning a value of 'Z' to a signal. Synthesis further requires concurrent assignment statements. An example is shown in Figure 13-16. In this example, a default assignment of 'Z' is made to the signal data_tri. The case statement infers the required multiplexing, and the concurrent assignment statements to the signal data infer internal three-state buffers. Because of those concurrent assignment statements, the only way synthesis can resolve the signal values is by using three-state buffers. Without them, synthesis would implement this multiplexer in gates, despite the default assignment to 'Z'.
Figure 13-16 Three-state Implementation of 4:1 Multiplexer
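Here is a minimal sketch that follows the structure described above: a default 'Z' assignment to data_tri, a case statement, and concurrent assignments to data. The entity name and port widths are assumptions. Whether internal three-state buffers are actually inferred depends on the synthesis tool and on whether the target family provides them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tri_mux4 is
  port (
    sel        : in  std_logic_vector(1 downto 0);
    a, b, c, d : in  std_logic_vector(7 downto 0);
    data       : out std_logic_vector(7 downto 0));
end entity tri_mux4;

architecture rtl of tri_mux4 is
  type tri_array_t is array (0 to 3) of std_logic_vector(7 downto 0);
  signal data_tri : tri_array_t;
begin
  process (sel, a, b, c, d)
  begin
    data_tri <= (others => (others => 'Z'));  -- default: every source off
    case sel is
      when "00"   => data_tri(0) <= a;
      when "01"   => data_tri(1) <= b;
      when "10"   => data_tri(2) <= c;
      when others => data_tri(3) <= d;
    end case;
  end process;

  -- Four concurrent drivers on one signal: only three-states can
  -- resolve them, which is what steers synthesis toward three-state buffers.
  data <= data_tri(0);
  data <= data_tri(1);
  data <= data_tri(2);
  data <= data_tri(3);
end architecture rtl;
```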
Rules for Synthesis Three-State Implementation
Use a default assignment of "Z" to the three-state signal.
Make concurrent assignments to the actual three-stated signal.
While memory can be inferred for Xilinx, it most likely cannot be inferred for the ASIC by using the same code, so two separate implementations will very likely be required. This section describes the methodology used to infer Xilinx-specific memory resources. It is generally advantageous to instantiate memory resources, because an instantiated memory is easier to swap for another technology's implementation. Although not always required, Xilinx's CORE Generator system can generate RAM for instantiation. CORE Generator memory must be used for dual-ported block RAMs, and it can also be used to create other types of memory resources. The CORE Generator system does provide simulation files, but the generated block is a black box in synthesis. Therefore, synthesis cannot report timing through that block.
RAM and ROM
The Xilinx LUT-RAM is implemented in the look-up tables (LUTs). Each slice has 32 bits of memory. A slice supports three basic single-port memory configurations: two 16x1 memories, one 16x2 memory, or one 32x1 memory. Xilinx slices and CLBs can be cascaded for larger configurations.
LUT-RAM memory is characterized by a synchronous write and an asynchronous read. It cannot be reset; however, it can be loaded with initial values through a Xilinx user constraints file (UCF). Xilinx LUT-RAM resources are inferred from the behavior described in the code shown in Figure 13-17. Dual-port LUT-RAM can also be inferred by adding a second read address. It has similar functionality, with one synchronous write port and two asynchronous read ports.
Figure 13-17 Xilinx LUT-RAM Inference
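The sketch below shows inferable LUT-RAM behavior: a 16x8 memory with a synchronous write and an asynchronous read. A second asynchronous read address makes it dual-port. The entity and port names (spo, dpo, dpra) are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_ram16 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);  -- write / first read address
    dpra : in  std_logic_vector(3 downto 0);  -- second read address (dual-port)
    din  : in  std_logic_vector(7 downto 0);
    spo  : out std_logic_vector(7 downto 0);
    dpo  : out std_logic_vector(7 downto 0));
end entity lut_ram16;

architecture rtl of lut_ram16 is
  type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal ram : ram_t;
begin
  -- Synchronous write; there is deliberately no reset.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous (combinational) reads.
  spo <= ram(to_integer(unsigned(addr)));
  dpo <= ram(to_integer(unsigned(dpra)));
end architecture rtl;
```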
ROM inference is driven by constants. Example code for inferring LUT-ROM is shown in Figure 13-18.
Figure 13-18 LUT-ROM Inference
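As a sketch, a 16x8 LUT-ROM can be written as a constant array indexed by the address. The entity name and the table contents are placeholders.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_rom16 is
  port (
    addr : in  std_logic_vector(3 downto 0);
    dout : out std_logic_vector(7 downto 0));
end entity lut_rom16;

architecture rtl of lut_rom16 is
  type rom_t is array (0 to 15) of std_logic_vector(7 downto 0);
  -- The ROM contents are an array of constants (placeholder values).
  constant ROM : rom_t := (
    x"00", x"11", x"22", x"33", x"44", x"55", x"66", x"77",
    x"88", x"99", x"AA", x"BB", x"CC", x"DD", x"EE", x"FF");
begin
  dout <= ROM(to_integer(unsigned(addr)));
end architecture rtl;
```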
Single-port block RAM inference is driven by a registered read address and a synchronous write, as in the example shown in Figure 13-19. Block RAM with these characteristics has long been easy to infer. However, synthesis tools can infer only simple block RAMs. For example, you cannot infer a dual-port RAM with a configurable aspect ratio for the data ports. For these reasons, most dual-port block RAMs should be block-RAM primitive instantiations or created with the CORE Generator system.
Figure 13-19 Virtex Block RAM Inference
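This sketch shows single-port block RAM inference with a synchronous write and a registered read address. The 512x16 size and all names are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_sp is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(8 downto 0);   -- 512 words
    din  : in  std_logic_vector(15 downto 0);
    dout : out std_logic_vector(15 downto 0));
end entity bram_sp;

architecture rtl of bram_sp is
  type ram_t is array (0 to 511) of std_logic_vector(15 downto 0);
  signal ram      : ram_t;
  signal addr_reg : std_logic_vector(8 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;  -- synchronous write
      end if;
      addr_reg <= addr;                          -- registered read address
    end if;
  end process;

  -- Read through the registered address.
  dout <= ram(to_integer(unsigned(addr_reg)));
end architecture rtl;
```

Because the read goes through addr_reg, writing an address and reading the same address on that clock edge returns the newly written data.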
Rules for Memory Inference
For single- or dual-port RAM implemented in LUTs, describe the behavior of a synchronous write and an asynchronous read operation.
For ROM inference in LUTs, create an array of constants.
Single-port block RAM is inferred by having a synchronous write and a registered read address (as shown in the example above, Figure 13-19).
For other configurations of the Xilinx block RAM, use the CORE Generator system.
CORE Generator System
The CORE Generator system may be used for creating many different types of ready-made functions. One limiting factor of the CORE Generator system is that synthesis tools cannot extract any timing information; it is seen as a black box.
The CORE Generator system provides three files for a module:
Implementation file, .ngc
Instantiation template, .vho
Simulation wrapper, .vhd
For behavioral and post-synthesis simulation, the simulation wrapper file must be used. To simulate a CORE Generator module, the necessary simulation packages must be compiled. More information on this flow and on generating the necessary files can be found in the CORE Generator tool under Help > Online Documentation.
The CORE Generator system provides simulation models in the $Xilinx/vhdl/src/XilinxCoreLib directory. These models must be analyzed in a strict order, which is given in the analyze_order file in that directory. In addition, Xilinx provides a Perl script for fast, easy compilation with different simulators. To compile the XilinxCoreLib models with ModelSim or VSS, use the following syntax at a command prompt:
xilinxperl.exe $Xilinx/vhdl/bin/nt/compile_mti_vhdl.pl coregen
xilinxperl.exe $Xilinx/vhdl/bin/nt/compile_vss_vhdl.pl coregen
Compare logic is frequently implemented poorly in FPGAs. Compare logic is inferred from the VHDL operators <, <=, >, and >=. For a Xilinx implementation, this logic is best described as and-or logic. When possible, look for patterns in the data or address signals that allow a comparison to be implemented with gates rather than compare logic. If a critical path includes comparison logic, consider an implementation that uses and-or logic.
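Here is a small illustration, assuming a hypothetical 4-bit count that must be flagged when it is 12 or more. Because 12 through 15 are exactly the codes 11xx, the magnitude comparison reduces to a 2-input AND. Both forms are shown side by side, and all names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cmp_demo is
  port (
    count  : in  unsigned(3 downto 0);
    ge12_a : out std_logic;    -- magnitude-compare version
    ge12_b : out std_logic);   -- and-or version
end entity cmp_demo;

architecture rtl of cmp_demo is
begin
  -- Inferred compare logic.
  ge12_a <= '1' when count >= 12 else '0';

  -- Same function from the bit pattern: 12..15 are exactly 11xx.
  ge12_b <= count(3) and count(2);
end architecture rtl;
```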
Rule for Comparator Implementation
If a critical path has comparator logic in it, then try to implement the comparison by using and-or gates.
Xilinx Clock Enables
Clock enables are easily inferred, either explicitly or implicitly. They are very useful for maintaining a synchronous design and are highly preferable to gated clocks. However, not all technologies support clock enables directly. In architectures that lack a dedicated clock-enable input on the register, the enable is implemented with a feedback path from the register output back to its input. This implementation style is undesirable. Not only does it add a feedback path to the register, it also uses more logic, because the LUT driving the register needs two extra inputs (the enable and the fed-back output).
The Xilinx architecture supports clock enables as a direct input to a register. This is highly advantageous for a Xilinx implementation. However, the designer must be certain that the logic required to create the clock enable does not infer large amounts of logic, making it a critical path.
In the example shown in Figure 13-20, one clock enable is inferred explicitly and others are inferred implicitly. In the first section, a clock enable is inferred by explicitly testing for a terminal count. In the second section of code, clock enables are implied for the signals cs and state. The clock enable for cs is inferred by not making an assignment to cs in the state init. The clock enable for state is inferred by not defining all possible branches of the if-then-else statement, highlighted in red. When the if-then-else condition is false, state must hold its current value. Clock enables are inferred for these conditions when they occur in a clocked process; in a combinatorial process, they would infer latches.
Figure 13-20 Clock Enable Inference
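The following illustrative sketch shows both inference styles, using hypothetical names. The first process enables the dout register explicitly with a terminal count. In the second, cs is left unassigned in state init and the if statements have no else branch, so both cs and state receive implicit clock enables.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ce_demo is
  port (
    clk, rst : in  std_logic;
    start    : in  std_logic;
    din      : in  std_logic_vector(7 downto 0);
    dout     : out std_logic_vector(7 downto 0);
    cs       : out std_logic);
end entity ce_demo;

architecture rtl of ce_demo is
  type state_t is (init, run);
  signal state : state_t;
  signal cnt   : unsigned(3 downto 0) := (others => '0');
  signal tc    : std_logic;
begin
  tc <= '1' when cnt = 15 else '0';   -- terminal count

  -- Explicit clock enable: dout updates only when tc = '1'.
  explicit_ce : process (clk)
  begin
    if rising_edge(clk) then
      cnt <= cnt + 1;
      if tc = '1' then
        dout <= din;
      end if;
    end if;
  end process explicit_ce;

  -- Implicit clock enables for cs and state.
  implicit_ce : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= init;
        cs    <= '0';
      else
        case state is
          when init =>
            -- cs is not assigned here, so it holds its value (enable).
            if start = '1' then
              state <= run;
            end if;                  -- no else: state holds (enable)
          when run =>
            cs <= '1';
            if tc = '1' then
              cs    <= '0';
              state <= init;
            end if;                  -- no else: state holds (enable)
        end case;
      end if;
    end if;
  end process implicit_ce;
end architecture rtl;
```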
Rules for Clock Enable Inference
Clock enables can only be inferred in a clocked process.
Clock enables can be inferred explicitly by testing an enable signal. If the enable is true, the signal is updated. If enable is false, that signal will hold its current value.
Clock enables can be implicitly inferred in two ways:
Not assigning to a signal in every branch of an if-then-else statement or case statement. Remember that latches will be inferred for this condition in a combinatorial process (see section 5, Inadvertent latch Inference).
Not defining all possible states or branches of an if-then-else or case statement.
Pipelining with SRL
In Xilinx FPGAs, registers are abundant: there are two per slice, which is sufficient for most registered signals. However, multiple pipeline delays are sometimes required at the end of a path. In that case, it is best to use the Xilinx SRL (Shift Register LUT), which uses the LUT as a shiftable RAM to create the effect of a shift register. Figure 13-21 shows an example of how to infer the SRL. It infers a shift register 16 stages deep and 4 bits wide. Although this code infers registers for an ASIC, it infers the SRL when targeting a Xilinx part. The behavior required to infer the SRL is highlighted in blue. The size could be made parameterizable by using constants to define the signal widths (see section 3). It could also be made into a procedure with parameterized widths and sizes.
Figure 13-21 Inference of Xilinx Shift Register LUT (SRL)
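Below is a sketch of SRL-friendly code: a 16-stage, 4-bit-wide delay line with no reset, read from a fixed tap. The entity name and the DEPTH constant are assumptions. As noted above, the same code infers ordinary registers for an ASIC.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity srl_delay is
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(3 downto 0);
    dout : out std_logic_vector(3 downto 0));
end entity srl_delay;

architecture rtl of srl_delay is
  constant DEPTH : positive := 16;   -- number of pipeline stages
  type pipe_t is array (0 to DEPTH-1) of std_logic_vector(3 downto 0);
  signal pipe : pipe_t;
begin
  -- Plain shift with no reset, so synthesis can map it to SRLs.
  process (clk)
  begin
    if rising_edge(clk) then
      pipe <= din & pipe(0 to DEPTH-2);   -- shift toward higher indices
    end if;
  end process;

  dout <= pipe(DEPTH-1);   -- fixed tap: DEPTH-cycle delay
end architecture rtl;
```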
Rules for SRL Inference
Do not apply a reset directly to the registers.
If a reset is required, the reset data must be supplied to the SRL until the pipeline is filled with reset data.
You may read from a constant location or from a dynamic address location. In Xilinx Virtex-II parts, you may read from two different locations: a fixed location and a dynamically addressable location.
Technology-Specific Logic Generation – Generate Statements
This section has outlined ways in which Xilinx-specific coding differs from other solutions. Because many styles may exist for the same function (for example, a multiplexer), use VHDL generate statements to select the optimal implementation for each technology. This is the best solution for two reasons. First, although configuration statements are commonly used to guide the synthesis of multiple implementation styles, some synthesis tools currently do not fully support them. Second, with generate statements, a change to a single constant changes the type of logic generated (ASIC or FPGA).
An example of using generate statements was covered in section 3, Figure 13-12.
Coding for Synthesis Section 5
The main synthesis issues involve coding for minimum logic level implementation (i.e., coding for speed, max frequency); inadvertent logic inference; and fast, reliable, and reusable code.
The number one reason that a design does not work in a Xilinx FPGA is that it uses asynchronous techniques. The primary concern here is asynchronous techniques used to insert delays to align data, not crossing clock domains. Crossing clock domains is often unavoidable, and there are good techniques, such as FIFOs, for accomplishing that task. There are no good techniques for implementing an asynchronous design. First, and most important, the actual delay can vary with junction temperature. Second, Xilinx provides only maximum delays for timing simulations. A design that works with the maximum delays will not necessarily work with the actual delays. Third, Xilinx may mark surplus -6 (faster) parts with a -5 or -4 (slower) speed grade, so a part can be faster than its marking. A synchronous design is unaffected by any of these variations.
In a synchronous design, only one clock and one edge of the clock should be used. There are exceptions to this rule. For example, by utilizing the 50/50 duty-cycle correction of the DCM/DLL, in a Xilinx FPGA you may safely use both edges of the clock because the duty-cycle will not drift.
Do not generate internal clocks. Above all, do not generate gated clocks, because these clocks glitch and propagate erroneous data. The other main problems with internally generated clocks are related to clock skew. Internal clocks that are not placed on a global clock buffer incur clock skew, which makes them unreliable. Replace internally generated clocks with a clock enable signal, or generate divided, multiplied, or phase-shifted clocks with the DCM/DLL.
Rules for Clock Signals
Use one clock signal and one edge.
Do not generate internal clock signals because of glitching and clock-skew related problems.
Local Synchronous Sets and Resets
Local synchronous sets and resets eliminate the glitching associated with local asynchronous sets and resets. Such a problem can arise with a binary counter that does not use its full binary count. For example, a four-bit binary counter has 16 possible binary counts. If the design calls for only 14 counts, the counter must be reset before it reaches its limit. An example of using a local asynchronous reset is highlighted in red in Figure 13-22, and a well-behaved circuit is highlighted in blue in Figure 13-23. In the counter that uses a local asynchronous reset, glitches in the decode logic during multi-bit binary transitions can momentarily generate the local asynchronous reset. When this happens, the circuit propagates erroneous data.
Figure 13-22 Local Asynchronous Reset and TC & Well-Behaved Synchronous Reset & CE
Figure 13-23 Local Asynchronous Reset and TC & Well-Behaved Synchronous Reset & CE
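This is a sketch of the well-behaved 14-count version, using hypothetical names. The counter wraps from 13 to 0 through a local synchronous reset and advances only when ce is high. The problematic asynchronous style is described only in a comment.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mod14_counter is
  port (
    clk, ce : in  std_logic;
    count   : out unsigned(3 downto 0);
    tc      : out std_logic);
end entity mod14_counter;

architecture rtl of mod14_counter is
  signal cnt : unsigned(3 downto 0) := (others => '0');
begin
  -- Avoid: feeding a decode such as (cnt = 14) into an asynchronous
  -- clear. Decode glitches during multi-bit transitions can clear
  -- the counter at the wrong time.

  -- Well-behaved: the wrap to zero is a synchronous reset that is
  -- sampled only at the clock edge, so decode glitches are harmless.
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        if cnt = 13 then
          cnt <= (others => '0');   -- local synchronous reset
        else
          cnt <= cnt + 1;
        end if;
      end if;
    end if;
  end process;

  count <= cnt;
  tc    <= '1' when cnt = 13 else '0';   -- counts 0..13: 14 states
end architecture rtl;
```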
Rule for Local Set or Reset Signals
A local reset or set signal should use a synchronous implementation.
Pipelining is the act of inserting registers into one path to align that data with the data in another path, such that both paths have an equal amount of latency. Pipelining may also decrease the amount of combinatorial delay between registers, thus increasing the maximum clock frequency. Pipelines are often inserted at the end of a path by using a shift register implementation. Shift registers in Xilinx’s Virtex parts are best implemented in the LUT as an SRL, as described in section 4. Signal naming for pipelined signals is covered in section 1.
Registering Leaf-Level Outputs and Top-Level Inputs
A very robust technique, used in synchronous design, is registering outputs of leaf-levels (sub-blocks). This has several advantages:
No optimization is needed across hierarchical boundaries.
Makes it possible to preserve the hierarchy.
Allows only the levels that have changed to be recompiled.
Enables hierarchical floorplanning.
Increases the capability of a guided implementation.
Forces the designer to keep like-logic together.
Similarly, registering the top-level inputs decreases the input to clock (ti2c) delays; therefore, it increases the chip-to-chip frequency.
Rules for the Hierarchical Registering of Signals
Register outputs of leaf-level blocks.
Register the inputs to the chip’s top-level.
The use of clock enables increases the routability of a Xilinx implementation and maintains synchronous design. The use of clock enables is the correct alternative to gated clocks.
Clock enables increase the routability of the design because the registers with clock enables run at a reduced effective clock frequency. For example, if the clock enable runs at one-half the clock rate, the clock-enabled datapaths can be placed and routed after the full-clock-frequency paths. The clock enable should have a timing constraint placed on it so that the Xilinx implementation tools recognize the difference between the normal clock frequency and the clock-enabled frequency. This places a lower priority on routing the clock-enabled paths.
Gated clocks will introduce glitching in a design, causing incorrect data to be propagated in the data stream. Therefore, gated clocks should be avoided.
Using signals generated by sequential logic as clocks is a common error. For example, you use a counter to count through a specific number of clock cycles, producing a registered terminal count. The terminal count is then used as a clock to register data. This internal clock is routed on the general interconnect. The skew on internally generated clocks can be so detrimental that it causes errors. This may also cause race conditions if the data is resynchronized with the system clock. This error is illustrated in Figure 13-. The text highlighted in red is the implementation using the terminal count as a clock.
Instead, generate the terminal count one count previous, and use the terminal count as a clock enable for the data register. The text highlighted in blue is the well-behaved implementation using the terminal count as a clock enable. An explanation of the reset signals is covered in the next section - 5.
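Here is a sketch of the recommended fix, assuming a hypothetical free-running 4-bit counter. The terminal count is decoded at 14 rather than 15, so after its register delay, tc_reg is high exactly while the counter holds 15. tc_reg then enables a data register that is still clocked by the system clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tc_ce is
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0));
end entity tc_ce;

architecture rtl of tc_ce is
  signal cnt    : unsigned(3 downto 0) := (others => '0');
  signal tc_reg : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      cnt <= cnt + 1;
      -- Decode one count early (14 instead of 15): the register delay
      -- means tc_reg is high exactly while cnt = 15.
      if cnt = 14 then
        tc_reg <= '1';
      else
        tc_reg <= '0';
      end if;
      -- tc_reg is a clock enable, not a clock: dout is still clocked
      -- by clk on the global clock network.
      if tc_reg = '1' then
        dout <= din;
      end if;
    end if;
  end process;
end architecture rtl;
```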
It may be useful to generate clock enables by using a state machine. The encoding of such a state machine can be chosen by the synthesis tool, so a one-hot, gray, or Johnson encoding style could be used. It is also possible to produce precisely placed clock enables by using a linear feedback shift register (LFSR), also known as a pseudo-random bitstream generator (PRBS generator). Xilinx provides application notes on the use of LFSRs.
Clock enables for Xilinx implementations are further discussed in section 4.
Rules for Clock Enable
Use clock enables in place of gated clocks.
Use clock enables in place of internally generated clocks.
Finite State Machines
Coding for finite state machines (FSMs) involves analyzing several tradeoffs.
Enumerated types in VHDL allow the FSM to be encoded by the synthesis tool. However, the encoding style is then defined not in the code but in the synthesis tool. Good documentation should therefore be provided, stating specifically which encoding style was used. By default, most synthesis tools choose an encoding based on the number of states. For example, they may use binary encoding for state machines with fewer than five states, one-hot for 5 to 24 states, and gray for more than 24 states (or similar thresholds). One-hot encoding is the suggested implementation for Xilinx FPGAs because Xilinx FPGAs have abundant registers. Other encoding styles may also be used, specifically gray encoding. With gray encoding, only one bit changes on any given state transition (in most cases). Therefore, fewer registers are used than for a one-hot implementation, and glitching is minimized. The tradeoffs between encoding styles can easily be analyzed by changing a synthesis FSM attribute and running the design through synthesis to get a timing estimate. The timing reported by synthesis will most likely not match the actual implemented timing. However, the timings of the different encoding styles will be correct relative to one another, giving the designer a good estimate of which encoding style to use.
Another possibility is to specifically encode the state machine. This is easily done via the use of constants. The code will clearly document the encoding style used. In general, one-hot is the suggested method of encoding for FPGAs -- specifically for Xilinx. A one-hot encoding style uses more registers, but the decoding for each state (and the outputs) is minimized, increasing performance. Other possibilities include gray, Johnson (ring-counter), user-encoded, and binary. Again, the tradeoffs can easily be analyzed by changing the encoding style and synthesizing the code.
Regardless of the encoding style used, the designer should analyze illegal states. Specifically, are all the possible states used? Often, state machines do not use all the possible states, so the designer should consider what occurs when an illegal state is encountered. A one-hot implementation certainly does not cover all possible states; it has many illegal states. Thus, if the synthesis tool must decode these states, the state machine may become much slower. The code can also specify exactly what happens when an illegal state is encountered by using a “when others” choice in the VHDL case statement. Under “when others”, the state and all outputs should be assigned specific values. Generally, the best solution is to return to the reset state. Alternatively, the designer could ignore illegal states by assigning “don’t care” values ('X') and allowing the synthesis tool to optimize the logic for them. This results in a fast state machine, but illegal states will not be covered.
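This sketch shows a user-encoded one-hot state machine with hypothetical state and signal names. The encoding is spelled out with constants. The “when others” choice sends any of the five illegal 3-bit codes back to the reset state. Replacing that branch's assignments with 'X' values would instead let synthesis optimize the illegal states away, as described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity onehot_fsm is
  port (
    clk, rst : in  std_logic;
    go, done : in  std_logic;
    busy     : out std_logic);
end entity onehot_fsm;

architecture rtl of onehot_fsm is
  -- User encoding via constants: the code documents the encoding.
  constant IDLE : std_logic_vector(2 downto 0) := "001";
  constant WORK : std_logic_vector(2 downto 0) := "010";
  constant HOLD : std_logic_vector(2 downto 0) := "100";
  signal state  : std_logic_vector(2 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
        busy  <= '0';
      else
        case state is
          when IDLE =>
            busy <= '0';
            if go = '1' then
              state <= WORK;
            end if;
          when WORK =>
            busy <= '1';
            if done = '1' then
              state <= HOLD;
            end if;
          when HOLD =>
            busy  <= '0';
            state <= IDLE;
          -- Illegal codes 000, 011, 101, 110, 111: return to reset state.
          when others =>
            busy  <= '0';
            state <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```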
Rules for Encoding FSMs
For enumerated types, encode the state machine with synthesis-specific attributes. Decide whether the logic should check for illegal states.
For user-encoded state machines, the designer should analyze whether the logic should check for illegal states or not, and the designer should accordingly write the “when others” statement. If the designer is concerned with illegal states, the state machine should revert to the reset state. If the designer is not concerned with illegal states, the outputs and state should be assigned "X" in the “when others” statement.
Xilinx suggests using one-hot encoding for most state machines. If the state machine is large, the designer should consider using a gray or Johnson encoding style and accordingly analyze the tradeoffs.
FSM VHDL Processes
Most synthesis tools suggest coding state machines with three process statements: one for the next state decoding, one for the output decoding, and one for registering of outputs and state bits. This is not as concise as using one process statement to implement the entire state machine; however, it allows the synthesis tools the ability to better optimize the logic for both the outputs and the next-state decoding. Another style is to use two processes to implement the state machine: one for next state and output decoding and the other process for registering of outputs and state bits.
The decision to use one, two, or three process statements is left entirely to the discretion of the designer. Moore state machines (outputs depend only on the current state) generally have limited output decoding and can therefore be safely coded with either one or two process statements. For Mealy state machines (outputs depend on the inputs and the current state), output decoding is generally more complex, so the designer should use three processes. Mealy state machines are also the preferred style for FSMs because it is advantageous to register the outputs of a sub-block (as described above in section 5), and Mealy state machines have the least latency with registered outputs. Mealy state machines can be used with a look-ahead scheme: based on the current state and the inputs, the outputs are decoded for the next state. For simple state machines where the outputs do not depend on the inputs, a Moore implementation is equivalent to a look-ahead scheme. That is, the outputs can be decoded for the next state and registered so that they reflect the next state rather than the current state. The purpose of this scheme is to introduce the least amount of latency when registering the outputs.
Rules for FSM Style
Generally, use three process statements for a state machine: one process for next-state decoding, one for output decoding, and one for the registering of state bits and outputs.
Use a Mealy look-ahead state machine with registered outputs whenever possible, or use a Moore state machine with next-state output decoding and registered outputs to incur the minimum amount of latency.
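Below is a sketch of this style, assuming a hypothetical two-state handshake controller. The first process decodes the next state, and the second decodes the outputs for that next state (look-ahead). The third registers both, so the registered output ack changes on the same clock edge as the state.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity la_fsm is
  port (
    clk, rst : in  std_logic;
    req      : in  std_logic;
    last     : in  std_logic;
    ack      : out std_logic);
end entity la_fsm;

architecture rtl of la_fsm is
  type state_t is (idle, active);
  signal state, next_state : state_t;
  signal ack_next          : std_logic;
begin
  -- Process 1: next-state decoding (combinational).
  ns_proc : process (state, req, last)
  begin
    next_state <= state;               -- default: hold, avoids latches
    case state is
      when idle =>
        if req = '1' then
          next_state <= active;
        end if;
      when active =>
        if last = '1' then
          next_state <= idle;
        end if;
    end case;
  end process ns_proc;

  -- Process 2: look-ahead output decoding from current state and inputs,
  -- producing the value the output should have in the next state.
  out_proc : process (state, req, last)
  begin
    ack_next <= '0';
    case state is
      when idle =>
        if req = '1' then
          ack_next <= '1';             -- entering active
        end if;
      when active =>
        if last = '0' then
          ack_next <= '1';             -- staying in active
        end if;
    end case;
  end process out_proc;

  -- Process 3: register the state bits and the outputs.
  reg_proc : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= idle;
        ack   <= '0';
      else
        state <= next_state;
        ack   <= ack_next;
      end if;
    end if;
  end process reg_proc;
end architecture rtl;
```

In this sketch, ack is '1' exactly when state is active. There is no extra cycle of delay, even though ack comes straight from a register.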