Class Emitter<N>

java.lang.Object
ghidra.pcode.emu.jit.gen.util.Emitter<N>
Type Parameters:
N - the contents of the stack after having emitted all the previous bytecodes

public class Emitter<N> extends Object
The central object for emitting type checked JVM bytecode.

This is either genius or a sign of some deep pathology. On one hand, it allows the type-safe generation of bytecode in Java classfiles. On the other, it requires an often onerous type signature on any method of appreciable sophistication that uses it.

The justification for this utility library stems from our difficulties with error reporting in the ASM library. We certainly appreciate the effort that has gone into that library, and must recognize its success: it has been used by the OpenJDK itself and eventually prompted them to devise an official classfile API. Nevertheless, its analyses (e.g., max-stack computation) fail with inscrutable messages. Admittedly, this only happens when we have generated invalid bytecode. For example, popping too many items off the stack usually results in an ArrayIndexOutOfBoundsException instead of, "Hey, you can't pop that here: [offset]". Similarly, if you push a long and then pop an int, you typically get a NullPointerException.

Unfortunately, these errors do not occur with the offending visitXInstruction() call on the stack, but instead during MethodVisitor.visitMaxs(int, int), so we could not easily identify and debug the cause. We did find ways to place breakpoints and at least derive the bytecode offset, and then used additional dumps and instrumentation to map that offset back to the source that generated the offending instruction. This has been an extremely onerous process. Additionally, when refactoring bytecode generation, we are left with little if any assistance from the compiler or IDE. These utilities seek to improve the situation.

Our goal is to devise a way to leverage Java's generics and its type checker to enforce stack consistency of generated JVM bytecode. We want the Java compiler to reject code that tries, for example, to emit an iload followed by an lstore, because there is clearly an int on the stack where a long is required. We accomplish this by encoding the stack contents (or at least the local knowledge of the stack contents) in this emitter's type variable <N>. We encode the types of stack entries using a Lisp-style list. The bottom of the stack is encoded as Emitter.Bot. A list is encoded with Emitter.Ent, where the first type parameter is the tail of the list (for things further down the stack), and the second type parameter encodes the JVM machine type, e.g., Types.TInt, of the element at that position. The head of this list, i.e., the second type parameter of the outermost Ent in <N>, is the top of the stack.
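To make the encoding concrete, here is a minimal, self-contained sketch. Next, Bot, Ent, TInt, TLong, Em, Ops, and EncodingDemo are simplified, hypothetical stand-ins written for illustration only; they are not Ghidra's declarations, and a plain List<String> takes the place of an ASM MethodVisitor. The point is the type of the result: an int with a long pushed on top of it is written Em<Ent<Ent<Bot, TInt>, TLong>>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the encoding described above (not Ghidra's actual types).
interface Next {}                                     // any stack shape
final class Bot implements Next {}                    // the empty stack
final class Ent<N extends Next, T> implements Next {} // T on top of the rest, N
final class TInt {}                                   // JVM int
final class TLong {}                                  // JVM long

// Toy emitter: N records the stack contents; ops records what was "emitted".
final class Em<N extends Next> {
    final List<String> ops;

    Em(List<String> ops) {
        this.ops = ops;
    }

    static Em<Bot> start() {
        return new Em<>(new ArrayList<>());
    }
}

final class Ops {
    // Pure pushes: whatever stack N we had, the result has one more entry on top.
    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int v) {
        em.ops.add("ldc " + v);
        return new Em<>(em.ops);
    }

    static <N extends Next> Em<Ent<N, TLong>> ldc__l(Em<N> em, long v) {
        em.ops.add("ldc " + v + "L");
        return new Em<>(em.ops);
    }
}

final class EncodingDemo {
    static Em<Ent<Ent<Bot, TInt>, TLong>> build() {
        // Read the type inside out: Bot at the bottom, then an int, then a long on top.
        Em<Ent<Bot, TInt>> one = Ops.ldc__i(Em.start(), 1);
        Em<Ent<Ent<Bot, TInt>, TLong>> two = Ops.ldc__l(one, 2L);
        return two;
    }
}
```

Declaring `two` with any other stack type, e.g. Em<Ent<Ent<Bot, TLong>, TInt>>, would be a compile-time error.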

The resulting syntax for emitting code is a bit strange, but still quite effective in practice. A problem we encounter in Java (and, to our knowledge, most OOP languages) is that an instance method can always be invoked on a variable, no matter the variable's type arguments. Sure, we could throw an exception at runtime, but we want the compiler to reject it, which implies static checking. Thus, while instance methods can be used for pure pushes, we cannot use them to validate stack contents, e.g., for pops. Suppose we'd like to specify the lcmp bytecode op. It requires two longs at the top of the stack, but there's no way we can restrict <N> on the implied this parameter. Nor is there an obvious way to unpack the contents of <N> so that we can remove the two Types.TLong entries and push a Types.TInt. Instead, we must turn to static methods.
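A minimal sketch of why static methods solve this, again using hypothetical, simplified types rather than Ghidra's own: the static method's parameter type, not the receiver, states what the stack must hold. Here lcmp accepts only an emitter whose two topmost entries are longs, and its return type replaces them with an int. The names Em, Ops, and LcmpDemo are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal types (not Ghidra's declarations), repeated so this sketch stands alone.
interface Next {}
final class Bot implements Next {}
final class Ent<N extends Next, T> implements Next {}
final class TInt {}
final class TLong {}

final class Em<N extends Next> {
    final List<String> ops;

    Em(List<String> ops) {
        this.ops = ops;
    }

    static Em<Bot> start() {
        return new Em<>(new ArrayList<>());
    }
}

final class Ops {
    static <N extends Next> Em<Ent<N, TLong>> ldc__l(Em<N> em, long v) {
        em.ops.add("ldc " + v + "L");
        return new Em<>(em.ops);
    }

    // The parameter type demands two longs on top of an arbitrary rest N;
    // the return type replaces both of them with a single int.
    static <N extends Next> Em<Ent<N, TInt>> lcmp(Em<Ent<Ent<N, TLong>, TLong>> em) {
        em.ops.add("lcmp");
        return new Em<>(em.ops);
    }
}

final class LcmpDemo {
    static Em<Ent<Bot, TInt>> build() {
        Em<Ent<Ent<Bot, TLong>, TLong>> twoLongs = Ops.ldc__l(Ops.ldc__l(Em.start(), 1L), 2L);
        // Ops.lcmp(Ops.ldc__l(Em.start(), 1L)) is rejected by the compiler:
        // Ent<Bot, TLong> cannot match Ent<Ent<N, TLong>, TLong>.
        return Ops.lcmp(twoLongs);
    }
}
```

Because N is a free type variable of lcmp, the same method works no matter what lies beneath the two longs.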

This presents a different problem. We'd like to provide a syntax where the ops appear in the order they are emitted. Usually, we'd chain instance methods, like so:

 em
         .ldc(1)
         .pop();
 

However, we've already ruled out instance methods. Were we to use static methods, we'd get something like:

 Op.pop(Op.ldc(em, 1));
 

However, that fails to display the ops in order. We could instead use:

 var em1 = Op.ldc(em, 1);
 var em2 = Op.pop(em1);
 
But that requires more syntactic cruft, not to mention the manual bookkeeping to ensure each step uses the emitter returned by the previous one. To work around this, we define instance methods, e.g., emit(Function), that accept references to static methods we provide, each representing a JVM bytecode instruction. This allows those static methods to impose a required structure on the stack. The static method can then return an emitter with a type encoding the new stack contents. (See the Op class for examples.) Thus, we have a syntax like:

 em
         .emit(Op::ldc__i, 1)
         .emit(Op::pop);
 

While not ideal, it is succinct, allows method chaining, and displays the ops in order of emission. (Note that we use this pattern even for pure pushes, where restricting <N> is not necessary, just for syntactic consistency.) One rub is operators that have different forms, which must be selected explicitly, e.g., Op.ldc__i(Emitter, int); as a matter of opinion, though, having to specify the intended form here is a benefit. The meat of this class is just the specification of the many arities of emit. It also includes some utilities for declaring local variables, and the entry points for generating and defining methods.
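The emit pattern itself fits in a few lines. In this hypothetical, self-contained toy (not Ghidra's implementation), each emit overload simply applies the given function to this emitter, so all stack checking is delegated to the signature of the static method being referenced. The Cat1 marker, which keeps pop from accepting a two-slot long, is likewise an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical minimal types, not Ghidra's declarations.
interface Next {}
final class Bot implements Next {}
final class Ent<N extends Next, T> implements Next {}
interface Cat1 {}                    // hypothetical marker: single-slot JVM types
final class TInt implements Cat1 {}

final class Em<N extends Next> {
    final List<String> ops;

    Em(List<String> ops) {
        this.ops = ops;
    }

    static Em<Bot> start() {
        return new Em<>(new ArrayList<>());
    }

    // 0-argument operator: hand this emitter to a static method such as Ops::pop.
    <R> R emit(Function<? super Em<N>, R> func) {
        return func.apply(this);
    }

    // 1-argument operator: also pass along an operand, e.g. the constant for ldc.
    <R, A1> R emit(BiFunction<? super Em<N>, A1, R> func, A1 arg1) {
        return func.apply(this, arg1);
    }
}

final class Ops {
    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int v) {
        em.ops.add("ldc " + v);
        return new Em<>(em.ops);
    }

    // pop removes one single-slot entry, whatever its type.
    static <N extends Next, T extends Cat1> Em<N> pop(Em<Ent<N, T>> em) {
        em.ops.add("pop");
        return new Em<>(em.ops);
    }
}

final class ChainDemo {
    static Em<Bot> build() {
        // Reads in emission order; Em.start().emit(Ops::pop) would not compile.
        return Em.start()
                .emit(Ops::ldc__i, 1)
                .emit(Ops::pop);
    }
}
```

The instance method places no constraint on N itself; the compiler checks the referenced static method against the function type derived from Em<N>, which is what lets the chain reject a pop from an empty stack.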

To give an overall taste of using this utility library, here is an example for dynamically generating a class that implements an interface. Note that the interface is not dynamically generated. This is a common pattern as it allows the generated method to be invoked without reflection.

 interface MyIf {
        int myMethod(int a, String b);
 }
 
 <THIS extends MyIf> void doGenerate(ClassVisitor cv) {
        var mdescMyMethod = MthDesc.derive(MyIf::myMethod)
                        .check(MthDesc::returns, Types.T_INT)
                        .check(MthDesc::param, Types.T_INT)
                        .check(MthDesc::param, Types.refOf(String.class))
                        .check(MthDesc::build);
        TRef<THIS> typeThis = Types.refExtends(MyIf.class, "Lmy/pkg/ImplMyIf;");
        var paramsMyMethod = new Object() {
                Local<TRef<THIS>> this_;
                Local<TInt> a;
                Local<TRef<String>> b;
        };
        var retMyMethod = Emitter.start(typeThis, cv, ACC_PUBLIC, "myMethod", mdescMyMethod)
                        .param(Def::param, Types.refOf(String.class), l -> paramsMyMethod.b = l)
                        .param(Def::param, Types.T_INT, l -> paramsMyMethod.a = l)
                        .param(Def::done, typeThis, l -> paramsMyMethod.this_ = l);
        retMyMethod.em()
                        .emit(Op::iload, paramsMyMethod.a)
                        .emit(Op::ldc__i, 10)
                        .emit(Op::imul)
                        .emit(Op::ireturn, retMyMethod.ret())
                        .emit(Misc::finish);
 }
 

Yes, there is a bit of repetition; however, this accomplishes all our goals and a little more. Note that the generated bytecode is essentially type checked all the way through to the method definition in the MyIf interface. Here is the key: were we to change the MyIf interface, the compiler (and our IDE) would point out the inconsistency. The first such errors would be on mdescMyMethod, so we would adjust it to match the new definition. The compiler would then point out issues at retMyMethod, assuming the parameters to myMethod changed and not just the return type. We would adjust it, along with the contents of paramsMyMethod, to accept the new parameter handles. If the return type of myMethod changed, then the inferred type of retMyMethod would change accordingly.

Now for the generated bytecode. Op.iload(Emitter, Local) requires the given variable handle to have type Types.TInt, so if the parameter "a" changed type, the compiler would point out that the opcode must also change. Similarly, Op.imul(Emitter) requires two ints and pushes an int result, so any resulting inconsistency will be caught. Finally, when calling Op.ireturn(Emitter, RetReq), two things are checked: 1) there is indeed an int on the stack, and 2) the return type of the method, witnessed by retMyMethod.ret(), is also an int. There are occasional wrinkles, but for the most part, once we resolve all the compilation errors, we are assured of type consistency in the generated code, both internally and in its interface to other compiled code.
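The double check performed by ireturn can be sketched as follows. RetReq and MethodDef here are hypothetical, simplified stand-ins (only the name RetReq comes from the text above), and Dead is an invented marker for the unreachable state after a return; none of this is Ghidra's actual declaration. The idea is that the return-type witness carries the method's declared return type as a type argument, so the compiler can compare it with the top of the stack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal types, not Ghidra's declarations.
interface Next {}
final class Bot implements Next {}
final class Dead implements Next {}                   // hypothetical: nothing is reachable after a return
final class Ent<N extends Next, T> implements Next {}
final class TInt {}
final class TLong {}

final class Em<N extends Next> {
    final List<String> ops;

    Em(List<String> ops) {
        this.ops = ops;
    }

    static Em<Bot> start() {
        return new Em<>(new ArrayList<>());
    }
}

// Hypothetical witness that the method being defined returns MR.
final class RetReq<MR> {}

// Hypothetical method definition; its declared return type fixes the witness type.
final class MethodDef<MR> {
    private final RetReq<MR> ret = new RetReq<>();

    RetReq<MR> ret() {
        return ret;
    }
}

final class Ops {
    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int v) {
        em.ops.add("ldc " + v);
        return new Em<>(em.ops);
    }

    // Two ints in, one int out.
    static <N extends Next> Em<Ent<N, TInt>> imul(Em<Ent<Ent<N, TInt>, TInt>> em) {
        em.ops.add("imul");
        return new Em<>(em.ops);
    }

    // Checks 1) an int is on top of the stack and 2) the method's return type is int.
    static <N extends Next> Em<Dead> ireturn(Em<Ent<N, TInt>> em, RetReq<TInt> ret) {
        em.ops.add("ireturn");
        return new Em<>(em.ops);
    }
}

final class ReturnDemo {
    static Em<Dead> build(MethodDef<TInt> def) {
        Em<Ent<Bot, TInt>> product = Ops.imul(Ops.ldc__i(Ops.ldc__i(Em.start(), 6), 7));
        // Given a MethodDef<TLong> instead, def.ret() would be a RetReq<TLong>
        // and this call would be rejected at compile time.
        return Ops.ireturn(product, def.ret());
    }
}
```

Changing either side, the value on the stack or the declared return type, breaks compilation at the ireturn call, which mirrors the refactoring story above.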

  • Constructor Details

    • Emitter

      public Emitter(org.objectweb.asm.MethodVisitor mv)
      Create a new emitter by wrapping the given method visitor.

      Direct use of this constructor is not recommended, but is useful during transition from unchecked to checked bytecode generation.

      Parameters:
      mv - the ASM method visitor
  • Method Details

    • rootScope

      public Scope rootScope()
      Get the root scope for declaring local variables
      Returns:
      the root scope
    • emit

      public <R> R emit(Function<? super Emitter<N>,R> func)
      Emit a 0-argument operator

      This can also be used to invoke generator subroutines whose only argument is the emitter.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference, e.g., Op.pop(Emitter).
      Returns:
      the value returned by func
    • emit

      public <R, A1> R emit(BiFunction<? super Emitter<N>,A1,R> func, A1 arg1)
      Emit a 1-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference, e.g., Op.ldc__i(Emitter, int).
      arg1 - the argument (other than the emitter) to pass to func
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2> R emit(Emitter.A3Function<? super Emitter<N>,A1,A2,R> func, A1 arg1, A2 arg2)
      Emit a 2-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2, A3> R emit(Emitter.A4Function<Emitter<N>,A1,A2,A3,R> func, A1 arg1, A2 arg2, A3 arg3)
      Emit a 3-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      arg3 - the next argument
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2, A3, A4> R emit(Emitter.A5Function<? super Emitter<N>,A1,A2,A3,A4,R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4)
      Emit a 4-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      arg3 - the next argument
      arg4 - the next argument
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2, A3, A4, A5> R emit(Emitter.A6Function<? super Emitter<N>,A1,A2,A3,A4,A5,R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5)
      Emit a 5-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      arg3 - the next argument
      arg4 - the next argument
      arg5 - the next argument
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2, A3, A4, A5, A6> R emit(Emitter.A7Function<? super Emitter<N>,A1,A2,A3,A4,A5,A6,R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6)
      Emit a 6-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      arg3 - the next argument
      arg4 - the next argument
      arg5 - the next argument
      arg6 - the next argument
      Returns:
      the value returned by func
    • emit

      public <R, A1, A2, A3, A4, A5, A6, A7> R emit(Emitter.A8Function<? super Emitter<N>,A1,A2,A3,A4,A5,A6,A7,R> func, A1 arg1, A2 arg2, A3 arg3, A4 arg4, A5 arg5, A6 arg6, A7 arg7)
      Emit a 7-argument operator

      This can also be used to invoke generator subroutines.

      Type Parameters:
      R - the return type
      Parameters:
      func - the method reference
      arg1 - an argument (other than the emitter) to pass to func
      arg2 - the next argument
      arg3 - the next argument
      arg4 - the next argument
      arg5 - the next argument
      arg6 - the next argument
      arg7 - the next argument
      Returns:
      the value returned by func
    • assume

      public static <N extends Emitter.Next> Emitter<N> assume(org.objectweb.asm.MethodVisitor mv, N assumedStack)
      (Not recommended) Wrap the given method visitor with assumed stack contents

      start(ClassVisitor, int, String, MthDesc) or start(TRef, ClassVisitor, int, String, MthDesc) is recommended instead.

      Type Parameters:
      N - the stack contents
      Parameters:
      mv - the ASM method visitor
      assumedStack - the assumed stack contents
      Returns:
      the emitter
    • start

      public static Emitter<Emitter.Bot> start(org.objectweb.asm.MethodVisitor mv)
      Wrap the given method visitor assuming an empty stack

      start(ClassVisitor, int, String, MthDesc) or start(TRef, ClassVisitor, int, String, MthDesc) is recommended instead.

      Parameters:
      mv - the ASM method visitor
      Returns:
      the emitter
    • start

      public static <MR extends Types.BType, N extends Emitter.Next> Methods.Def<MR,N> start(org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR,N> desc)
      Define a static method
      Type Parameters:
      MR - the type returned by the method
      N - the parameter types of the method
      Parameters:
      cv - the ASM class visitor
      access - the access flags (static is added automatically)
      name - the name of the method
      desc - the method descriptor
      Returns:
      an object to aid further definition of the method
    • start

      public static <MR extends Types.BType, OT, N extends Emitter.Next> Methods.ObjDef<MR,OT,N> start(Types.TRef<OT> owner, org.objectweb.asm.ClassVisitor cv, int access, String name, Methods.MthDesc<MR,N> desc)
      Define an instance method
      Type Parameters:
      MR - the type returned by the method
      OT - the type owning the method
      N - the parameter types of the method
      Parameters:
      owner - the owner type (as a reference type)
      cv - the ASM class visitor
      access - the access flags (static is forcibly removed)
      name - the name of the method
      desc - the method descriptor
      Returns:
      an object to aid further definition of the method
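As the emit entries note, the same overloads can invoke generator subroutines, not just single opcodes. Below is a hypothetical, self-contained sketch of that idea (toy types again, not Ghidra's): Gen.square and Gen.addConst are ordinary static methods whose signatures describe their net stack effect, so a caller can chain them exactly like an opcode, with or without an extra argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical minimal types, not Ghidra's declarations.
interface Next {}
final class Bot implements Next {}
final class Ent<N extends Next, T> implements Next {}
final class TInt {}

final class Em<N extends Next> {
    final List<String> ops;

    Em(List<String> ops) {
        this.ops = ops;
    }

    static Em<Bot> start() {
        return new Em<>(new ArrayList<>());
    }

    <R> R emit(Function<? super Em<N>, R> func) {
        return func.apply(this);
    }

    <R, A1> R emit(BiFunction<? super Em<N>, A1, R> func, A1 arg1) {
        return func.apply(this, arg1);
    }
}

final class Ops {
    static <N extends Next> Em<Ent<N, TInt>> ldc__i(Em<N> em, int v) {
        em.ops.add("ldc " + v);
        return new Em<>(em.ops);
    }

    // Int-only dup, enough for this sketch.
    static <N extends Next> Em<Ent<Ent<N, TInt>, TInt>> dup(Em<Ent<N, TInt>> em) {
        em.ops.add("dup");
        return new Em<>(em.ops);
    }

    static <N extends Next> Em<Ent<N, TInt>> imul(Em<Ent<Ent<N, TInt>, TInt>> em) {
        em.ops.add("imul");
        return new Em<>(em.ops);
    }

    static <N extends Next> Em<Ent<N, TInt>> iadd(Em<Ent<Ent<N, TInt>, TInt>> em) {
        em.ops.add("iadd");
        return new Em<>(em.ops);
    }
}

// Hypothetical generator subroutines: net effect int -> int, whatever N lies below.
final class Gen {
    static <N extends Next> Em<Ent<N, TInt>> square(Em<Ent<N, TInt>> em) {
        return em
                .emit(Ops::dup)
                .emit(Ops::imul);
    }

    static <N extends Next> Em<Ent<N, TInt>> addConst(Em<Ent<N, TInt>> em, int c) {
        return em
                .emit(Ops::ldc__i, c)
                .emit(Ops::iadd);
    }
}
```

A caller could then write Em.start().emit(Ops::ldc__i, 7).emit(Gen::square).emit(Gen::addConst, 1), and the result type Em<Ent<Bot, TInt>> would be checked just as for single opcodes.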