Phase Ordering of the Java Hostpot C2’s compiler

The Java C2 compiler turns Java bytecode into highly optimized native instructions. The process is decomposed in several phases. Knowing the order in which these phases are applied is important for developers interested in modifying Hostpot.  In this post I will talk about the phase ordering of the C2 compiler, making emphasis on the optimizations applied during the compilation process and providing links to the C++ implementation in the Hostpot’s repository.

Arnold Schwaighofer’s Master’s Thesis gives a very good overview of the phase order of the C2 compiler (The following list is based on his text):

  1. Parsing: The bytecode is parsed to construct the C2′ internal representation: the Ideal Graph. During parsing local optimizations are performed.
  2. Ideal Graph Optimizations: Machine independent optimization operate on the Ideal Graph. During this phase several phases of global value numbering, constant propagation, dead code elimination and loop optimizations occur.
  3. Instruction Selection: Generates the MachNode graph using a bottom-up rewrite system. The ideal graph is split into subtrees. An optimal assignment of machine specific nodes to the subtree is built by  recursively applying a BURS algorithm. The possible instructions and their associated costs are recorded in an architecture description file, which is used by the algorithm.
  4. Global Code Motion: A global code motion algorithm constructs a control ow graph. Basic blocks are created and filled with instructions.
  5. Scheduling: Orders the instructions within a basic block. The compiler selects between available instructions based on a score. The score is computed using several heuristics (e.g. instructions that store to the stack or memory are given a higher score in order to free registers used for their input operands early).
  6. Register Allocation: Converts the graph into a non-SSA form and assigns physical registers. A graph coloring allocator is used. At the end of register allocation the compiler generates the debugging information for safepoints.
  7. Machine Code Generation: Machine code is emitted by iterating over the MachNode graph

These seven super-phases are composed of several sub-phases that further decompose and calls other phases as they go. The process is not linear and many sub-phases are applied iteratively.

1. Parsing

During the parsing super-phase, the Java bytecode is transformed into the C2’s internal representation, the Ideal Graph. The parser also performs the following local optimizations while building the Ideal Graph:

As explained in section 5 of this paper, these optimizations are applied locally to the portion of the graph being built, all combined and at the same time.  In future phases they may be applied again as the Ideal Graph changes.

Code locations involved in parsing:

/hotspot/src/share/vm/opto/parse1.cpp
/hotspot/src/share/vm/opto/parse2.cpp
/hotspot/src/share/vm/opto/parse3.cpp

The whole parsing takes places in the Parser’s constructor.  Especially interesting is the do_one_bytecode method that transforms one given bytecode into a set of Ideal Graph nodes.

2. Ideal Graph Optimizations

After the Ideal Graph is built, it is successively transformed by the compiler to achieve machine independent optimizations. The output of this process is a highly optimized graph that can be used to produce much better native code.  Figure 1 shows this super-phase further decomposed in sub-phases. The figure shows the phases executed by the compiler as boxes. The arrows indicates the phase order (an arrow goes from a previous phase to the next):

Phase order in the C2 Compiler

Figure 1. Phase order in the C2 compiler

As the picture shows, an initial Global Value Numbering phase (GVN) is performed. GVN is performed several times afterwards, almost after every phase. As the super-phase goes,  other optimization phases are performed (in this order: Inlining, Escape Analysis, Conditional Constant Propagation and several loop optimizations).

All Loop Optimization Phases are comprised in a super-phase called Ideal Loop, which is run several times during the whole optimization process. Hotspot features the following loop optimizations: Loop Unrolling, Loop Predication, Range Check Elimination and Iteration Split. The order and the number of times in which these optimizations are run within the Ideal Loop super-phase depends on the graph being optimized and the compiler’s configuration.

The compiler also performs other two phases not directly related to optimizations: Compact Node IDs (CNI) and Macro Node Elimination (MCE). CNI reassign nodes IDs to keep IDs compact. Compact IDs enables a smaller memory footprint in one-one bit map mappings, allowing O(1) in frequent operations like checking and unchecking a node as visited. MCE decompose some nodes of the Ideal Graph into simpler ones.

The boxes in Figure 1 do not represent atomic operations. Each phase in the figure can internally call other phases. For example, the Inlining phase calls repeatedly the GVN and Ideal Loop phases as the inlining of methods advances.

Code locations involved in the Ideal Graph Optimizations:

Related documentation

Its also worth reading the Master Thesis of Thomas Würthinger. Section 3.4 provides further information on the phase order of the C2.

One Comment:

  1. Nice article. Could you write something similar for the graal compiler?
    https://github.com/graalvm

    Would be awesome!

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *