Dynamic Dispatch Deep Dive

This blog post summarizes a somewhat random excursion into dynamic dispatch implementations in Go, plus a bit of Java. It assumes the audience knows a bit about dynamic dispatch and its compiler implementations (vtable and all that).

Dynamic Dispatch in Golang

Here's what happened -- I stumbled onto an article about Golang (the content is irrelevant); somehow it made me recall that, when I was using Golang for a distributed system course project, there was something interesting about Golang's interface design. I can't remember the details, so I Googled a bit, and ran into this StackOverflow answer.

Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time.

Cool, so how does Go compute these method tables at runtime?

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. Assigning b to an interface value of type Stringer sets both words of the interface value.

This sounds just like fat pointer for Rust Trait Object. It trades storage and conversion cost for faster method lookup at call sites. I think Haskell typeclass might fit into this camp as well.

Go's dynamic type conversions mean that it isn't reasonable for the compiler or linker to precompute all possible itables: there are too many (interface type, concrete type) pairs, and most won't be needed. Instead, the compiler generates a type description structure for each concrete type like Binary or int or func(map[int]string). Among other metadata, the type description structure contains a list of the methods implemented by that type. Similarly, the compiler generates a (different) type description structure for each interface type like Stringer; it too contains a method list. The interface runtime computes the itable by looking for each method listed in the interface type's method table in the concrete type's method table. The runtime caches the itable after generating it, so that this correspondence need only be computed once.

Hmm...? I understand that there are dynamic cases where we can't precompute the itable (e.g., casting one interface to another), but always computing these tables at runtime sounds expensive? Yeah premature optimization is a devil, but this really got me interested.

Then I looked into the second link from the StackOverflow post, and not surprisingly, Mr. Anonymous had the same question, from 14 years ago.

The relatively expensive operation is assignment to an interface value - but even it is pretty cheap. The first time a concrete type hits an interface type, it builds a hash table entry that points to a vtable. Second and subsequent assignments of the same type will do a much cheaper hash lookup to find the vtable.

Why hash-table? It seems unnecessarily more expensive than an alternative implementation.

And yes, exactly:

The compiler knows all the information necessary to build the interface-object involved in the conversion. Because of this, the fact that Go currently performs some runtime computation and caching to create the interface-object seems a little odd.

There's more discussion, and eventually it went off this topic, but before that, I got what I need, merci Ian.

gccgo does precompute the tables that it knows that it needs. However, there is also a dynamic component: an arbitrary type may be converted to an arbitrary interface, and in that case the method table must be built at runtime. gccgo does this as well.

These are good sanity checks, but now I want to see this in the compiler source code, and luckily there was a clue from the first article linked in the StackOverflow post. It talks a bit about the algorithmic complexity of building up the itable, and left a no-longer-valid code pointer:

we can build the mapping in O(ni + nt) time instead.

If you click on that link, it takes you to the landing page of Golang's github repo. But if you hover your cursor over that link, you see the hidden clue -- code.google.com/p/go/source/browse/src/pkg/runtime/iface.c#98, and thankfully things didn't move around that much, and I found this iface.go in the current repo. I think it became a .go file because the compiler bootstraps itself now.

Now things will get a bit more technical.

Diving into the Golang codebase

So I cloned the repo at commit 7f76c00f, and started exploring from iface.go. and found this getitab method:

src/runtime/iface.go

Lines 35 to 94 in 7f76c0

func getitab(inter *interfacetype, typ *_type, canfail bool) *itab {
	if len(inter.Methods) == 0 {
		throw("internal error - misuse of itab")
	}

	// easy case
	if typ.TFlag&abi.TFlagUncommon == 0 {
		if canfail {
			return nil
		}
		name := toRType(&inter.Type).nameOff(inter.Methods[0].Name)
		panic(&TypeAssertionError{nil, typ, &inter.Type, name.Name()})
	}

	var m *itab

	// First, look in the existing table to see if we can find the itab we need.
	// This is by far the most common case, so do it without locks.
	// Use atomic to ensure we see any previous writes done by the thread
	// that updates the itabTable field (with atomic.Storep in itabAdd).
	t := (*itabTableType)(atomic.Loadp(unsafe.Pointer(&itabTable)))
	if m = t.find(inter, typ); m != nil {
		goto finish
	}

	// Not found.  Grab the lock and try again.
	lock(&itabLock)
	if m = itabTable.find(inter, typ); m != nil {
		unlock(&itabLock)
		goto finish
	}

	// Entry doesn't exist yet. Make a new entry & add it.
	m = (*itab)(persistentalloc(unsafe.Sizeof(itab{})+uintptr(len(inter.Methods)-1)*goarch.PtrSize, 0, &memstats.other_sys))
	m.Inter = inter
	m.Type = typ
	// The hash is used in type switches. However, compiler statically generates itab's
	// for all interface/type pairs used in switches (which are added to itabTable
	// in itabsinit). The dynamically-generated itab's never participate in type switches,
	// and thus the hash is irrelevant.
	// Note: m.Hash is _not_ the hash used for the runtime itabTable hash table.
	m.Hash = 0
	itabInit(m, true)
	itabAdd(m)
	unlock(&itabLock)
finish:
	if m.Fun[0] != 0 {
		return m
	}
	if canfail {
		return nil
	}
	// this can only happen if the conversion
	// was already done once using the , ok form
	// and we have a cached negative result.
	// The cached result doesn't record which
	// interface function was missing, so initialize
	// the itab again to get the missing function name.
	panic(&TypeAssertionError{concrete: typ, asserted: &inter.Type, missingMethod: itabInit(m, false)})
}

Let's start with the function signature:

  • getitab says "here's an object, get me the itable for this interface"
  • inter *interfacetype is the interface type
  • typ *_type is the object type, which makes sense since fetching an itable shouldn't require the object's instance data.

Then notice the super performant and not-confusing double rounds of lookup for itable, it basically revolves around a find method, and a global hashmap from (interface_type, concrete_type) to itable:

src/runtime/iface.go

Lines 15 to 28 in 7f76c0

const itabInitSize = 512

var (
	itabLock      mutex                               // lock for accessing itab table
	itabTable     = &itabTableInit                    // pointer to current table
	itabTableInit = itabTableType{size: itabInitSize} // starter table
)

// Note: change the formula in the mallocgc call in itabAdd if you change these fields.
type itabTableType struct {
	size    uintptr             // length of entries array. Always a power of 2.
	count   uintptr             // current number of filled entries.
	entries [itabInitSize]*itab // really [size] large
}

src/runtime/iface.go

Lines 96 to 119 in 7f76c0

// find finds the given interface/type pair in t.
// Returns nil if the given interface/type pair isn't present.
func (t *itabTableType) find(inter *interfacetype, typ *_type) *itab {
	// Implemented using quadratic probing.
	// Probe sequence is h(i) = h0 + i*(i+1)/2 mod 2^k.
	// We're guaranteed to hit all table entries using this probe sequence.
	mask := t.size - 1
	h := itabHashFunc(inter, typ) & mask
	for i := uintptr(1); ; i++ {
		p := (**itab)(add(unsafe.Pointer(&t.entries), h*goarch.PtrSize))
		// Use atomic read here so if we see m != nil, we also see
		// the initializations of the fields of m.
		// m := *p
		m := (*itab)(atomic.Loadp(unsafe.Pointer(p)))
		if m == nil {
			return nil
		}
		if m.Inter == inter && m.Type == typ {
			return m
		}
		h += i
		h &= mask
	}
}

Then it's the itable allocation and initiation, the worth-looking-at bits is this itabInit function. It also implements the aforementioned "build the mapping in O(ni + nt) time". Lastly it explains why we have this if m.Fun[0] != 0 { check at the call site. These intuitive error handling mechanisms certainly make the code a pleasure to read:

src/runtime/iface.go

Lines 189 to 248 in 7f76c0

// itabInit fills in the m.Fun array with all the code pointers for
// the m.Inter/m.Type pair. If the type does not implement the interface,
// it sets m.Fun[0] to 0 and returns the name of an interface function that is missing.
// If !firstTime, itabInit will not write anything to m.Fun (see issue 65962).
// It is ok to call this multiple times on the same m, even concurrently
// (although it will only be called once with firstTime==true).
func itabInit(m *itab, firstTime bool) string {
	inter := m.Inter
	typ := m.Type
	x := typ.Uncommon()

	// both inter and typ have method sorted by name,
	// and interface names are unique,
	// so can iterate over both in lock step;
	// the loop is O(ni+nt) not O(ni*nt).
	ni := len(inter.Methods)
	nt := int(x.Mcount)
	xmhdr := (*[1 << 16]abi.Method)(add(unsafe.Pointer(x), uintptr(x.Moff)))[:nt:nt]
	j := 0
	methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.Fun[0]))[:ni:ni]
	var fun0 unsafe.Pointer
imethods:
	for k := 0; k < ni; k++ {
		i := &inter.Methods[k]
		itype := toRType(&inter.Type).typeOff(i.Typ)
		name := toRType(&inter.Type).nameOff(i.Name)
		iname := name.Name()
		ipkg := pkgPath(name)
		if ipkg == "" {
			ipkg = inter.PkgPath.Name()
		}
		for ; j < nt; j++ {
			t := &xmhdr[j]
			rtyp := toRType(typ)
			tname := rtyp.nameOff(t.Name)
			if rtyp.typeOff(t.Mtyp) == itype && tname.Name() == iname {
				pkgPath := pkgPath(tname)
				if pkgPath == "" {
					pkgPath = rtyp.nameOff(x.PkgPath).Name()
				}
				if tname.IsExported() || pkgPath == ipkg {
					ifn := rtyp.textOff(t.Ifn)
					if k == 0 {
						fun0 = ifn // we'll set m.Fun[0] at the end
					} else if firstTime {
						methods[k] = ifn
					}
					continue imethods
				}
			}
		}
		// didn't find method
		// Leaves m.Fun[0] set to 0.
		return iname
	}
	if firstTime {
		m.Fun[0] = uintptr(fun0)
	}
	return ""
}

Finally it adds the new itable to the global cache and releases the lock. Overall there's nothing surprising, time to look at where getitab gets used. Luckily, there are only 4 places where getitab is called directly, and they basically serve 2 language constructs in Go -- type assertion and type switch.

Type Assertion

In Go, type assertions basically does casting.

These 3 functions in iface.go uses getitab:

  • typeAssert
  • assertE2I
  • assertE2I2

If you do git grep "typeAssert(", you will only find 1 declaration and 1 definition. So where is this function invoked? Well if you do git grep "\<typeAssert\>", that gives you a few places, and the ones that matter are

cmd/compile/internal/ssagen/ssa.go:144: ir.Syms.TypeAssert = typecheck.LookupRuntimeFunc("typeAssert")
cmd/compile/internal/typecheck/builtin.go:101:  {"typeAssert", funcTag, 67},

If you go to builtin.go, you'll see that we have a runtimeDecls for a bunch of go functions including typeAssert, the name suggests that it enables accessing these functions at runtime rather than compile time. runtimeDecls is used in this initRuntime which is pretty self explanatory:

src/cmd/compile/internal/typecheck/syms.go

Lines 68 to 71 in 7f76c0

// InitRuntime loads the definitions for the low-level runtime functions,
// so that the compiler can generate calls to them,
// but does not make them visible to user code.
func InitRuntime() {

If you go to ssa.go and search for Syms.TypeAssert, you should arrive in the midst of the dottype1 function, where the compiler generates one of 3 calls (each to one of the aforementioned functions from iface.go):

src/cmd/compile/internal/ssagen/ssa.go

Lines 6760 to 6771 in 7f76c0

		// Call into runtime to get itab for result.
		if descriptor != nil {
			itab = s.rtcall(ir.Syms.TypeAssert, true, []*types.Type{byteptr}, d, typ)[0]
		} else {
			var fn *obj.LSym
			if commaok {
				fn = ir.Syms.AssertE2I2
			} else {
				fn = ir.Syms.AssertE2I
			}
			itab = s.rtcall(fn, true, []*types.Type{byteptr}, target, typ)[0]
		}

Let's dive into these 3 code paths. First, what's this commaok? Recall that assertE2I panics on failure, whereas assertE2I2 returns nil. So if commaok == false, the type assert ends up panicking at runtime? Aha the commaok must mean the str, ok := obj.(string) pattern in Go.

If you trace further to see where the value of commaok was first set, you'll see that it comes from either dottype or dynamicDottype -- the name should make sense now, it's referring to x.(type) in Go, and the "dynamic" part is for generics x.(T). If you check the gitblame for dynamicDottype, you'll see the commit in which Keith Randall implemented generic x.(T) operation. If you are like me and think "hmm this suggests Go's generics is not like C++ or Rust generics, where compiler emits new and specialized code upon generics instantiation", you are like 50% right. I'll write another blog on generics implementations.

Let's look at the places dottype and dynamicDottype are used:

src/cmd/compile/internal/ssagen/ssa.go

Lines 1499 to 1506 in 7f76c0

	case ir.OAS2DOTTYPE:
		n := n.(*ir.AssignListStmt)
		var res, resok *ssa.Value
		if n.Rhs[0].Op() == ir.ODOTTYPE2 {
			res, resok = s.dottype(n.Rhs[0].(*ir.TypeAssertExpr), true)
		} else {
			res, resok = s.dynamicDottype(n.Rhs[0].(*ir.DynamicTypeAssertExpr), true)
		}

src/cmd/compile/internal/ssagen/ssa.go

Lines 2956 to 2959 in 7f76c0

	case ir.ODOTTYPE:
		n := n.(*ir.TypeAssertExpr)
		res, _ := s.dottype(n, false)
		return res

src/cmd/compile/internal/ssagen/ssa.go

Lines 5663 to 5669 in 7f76c0

	case ir.ODOTTYPE, ir.ODYNAMICDOTTYPE:
		var v *ssa.Value
		if n.Op() == ir.ODOTTYPE {
			v, _ = s.dottype(n.(*ir.TypeAssertExpr), false)
		} else {
			v, _ = s.dynamicDottype(n.(*ir.DynamicTypeAssertExpr), false)
		}

So what are these OAS2DOTTYPE ir constructs? Thankfully they are somewhat documented:

src/cmd/compile/internal/ir/node.go

Lines 145 to 145 in 7f76c0

	OAS2DOTTYPE // Lhs = Rhs (x, ok = I.(int))

src/cmd/compile/internal/ir/node.go

Lines 185 to 185 in 7f76c0

	ODOTTYPE       // X.Ntype or X.Type (.Ntype during parsing, .Type once resolved); after walk, Itab contains address of interface type descriptor and Itab.X contains address of concrete type descriptor

src/cmd/compile/internal/ir/node.go

Lines 299 to 299 in 7f76c0

	ODYNAMICDOTTYPE  // x = i.(T) where T is a type parameter (or derived from a type parameter)

It should be pretty obvious why only OAS2DOTTYPE corresponds to commaok == true. The else branch for OAS2DOTTYPE might look sketchy, but a bit more digging will lead you to this:

src/cmd/compile/internal/typecheck/stmt.go

Lines 104 to 111 in 7f76c0

		case ir.ODOTTYPE:
			r := r.(*ir.TypeAssertExpr)
			stmt.SetOp(ir.OAS2DOTTYPE)
			r.SetOp(ir.ODOTTYPE2)
		case ir.ODYNAMICDOTTYPE:
			r := r.(*ir.DynamicTypeAssertExpr)
			stmt.SetOp(ir.OAS2DOTTYPE)
			r.SetOp(ir.ODYNAMICDOTTYPE2)

Great, commaok makes sense now, what about descriptor? It also had some documentations:

src/cmd/compile/internal/ssagen/ssa.go

Lines 6555 to 6557 in 7f76c0

// descriptor is a compiler-allocated internal/abi.TypeAssert whose address is passed to runtime.typeAssert when
// the target type is a compile-time-known non-empty interface. It may be nil.
func (s *state) dottype1(pos src.XPos, src, dst *types.Type, iface, source, target, targetItab *ssa.Value, commaok bool, descriptor *obj.LSym) (res, resok *ssa.Value) {

Okay, that sounds like some generics related thing again, i.e., if we are type asserting to a generic type, descriptor == nil? Well only the dynamicDottype code path has descriptor explicitly set to nil, so I think that's a fair conjecture. But why do we call typeAssert at runtime in this path? Well, Keith added some low level caching optimization, and I'm not looking into that because there's enough derailing.

Type Switch

In Go type switch is syntactic + performance sugar for a series of type assertions.

In the compiler source, this is the only other place getitab gets called, nothing fancy:

src/runtime/iface.go

Lines 520 to 539 in 7f76c0

// interfaceSwitch compares t against the list of cases in s.
// If t matches case i, interfaceSwitch returns the case index i and
// an itab for the pair <t, s.Cases[i]>.
// If there is no match, return N,nil, where N is the number
// of cases.
func interfaceSwitch(s *abi.InterfaceSwitch, t *_type) (int, *itab) {
	cases := unsafe.Slice(&s.Cases[0], s.NCases)

	// Results if we don't find a match.
	case_ := len(cases)
	var tab *itab

	// Look through each case in order.
	for i, c := range cases {
		tab = getitab(c, t, true)
		if tab != nil {
			case_ = i
			break
		}
	}

Again, in ssa.go we find this interfaceSwitch function registered as ir.Syms.InterfaceSwitch, and used in a single place; it's pretty intuitive.

src/cmd/compile/internal/ssagen/ssa.go

Lines 2094 to 2096 in 7f76c0

		r := s.rtcall(ir.Syms.InterfaceSwitch, true, []*types.Type{typs.Int, typs.BytePtr}, d, t)
		s.assign(n.Case, r[0], false, 0)
		s.assign(n.Itab, r[1], false, 0)

A bit more looking on the IR types you should arrive at the intermediate lowering for type switch. I won't go into all the details, but you'll find interesting things like separating concrete v.s. interface cases, and special handling for generics. The initial version of generic type switch is added in this commit.

src/cmd/compile/internal/walk/switch.go

Lines 552 to 552 in 7f76c0

			isw := ir.NewInterfaceSwitchStmt(base.Pos, caseVar, s.itabName, typeArg, dotHash, lsym)

Compile time Optimziation

If you are not lost yet, you might remember why we started this excursion -- how Go constructs the itables, and how it optimizes for certain static cases. We've seen the former, now let's look for the optimizations (i.e., when target type is not an interface).

Intuitively, if the target type is not an interface type (e.g., we are casting from an interface to its underlying concrete type via interface.(Type)), we can pre-compute the itable for (type, interface) at compile time, then load the itable from the interface and do a simple pointer equality check, since each (type, interface) pair has a unique itable.

Indeed, this is exactly what happens in dottype1, first note that in the Type Assert section, we've been looking at a specific branch of dottype1:

src/cmd/compile/internal/ssagen/ssa.go

Lines 6560 to 6560 in 7f76c0

	if dst.IsInterface() {

Outside that branch, which corresponds to target type being a non-interface, we see exactly what we conjectured above, also note the documentation for targetItab:

src/cmd/compile/internal/ssagen/ssa.go

Lines 6791 to 6813 in 7f76c0

	direct := types.IsDirectIface(dst)
	itab := s.newValue1(ssa.OpITab, byteptr, iface) // type word of interface
	if base.Debug.TypeAssert > 0 {
		base.WarnfAt(pos, "type assertion inlined")
	}
	var wantedFirstWord *ssa.Value
	if src.IsEmptyInterface() {
		// Looking for pointer to target type.
		wantedFirstWord = target
	} else {
		// Looking for pointer to itab for target type and source interface.
		wantedFirstWord = targetItab
	}

	var tmp ir.Node     // temporary for use with large types
	var addr *ssa.Value // address of tmp
	if commaok && !ssa.CanSSA(dst) {
		// unSSAable type, use temporary.
		// TODO: get rid of some of these temporaries.
		tmp, addr = s.temp(pos, dst)
	}

	cond := s.newValue2(ssa.OpEqPtr, types.Types[types.TBOOL], itab, wantedFirstWord)

src/cmd/compile/internal/ssagen/ssa.go

Lines 6553 to 6553 in 7f76c0

// If src is a nonempty interface and dst is not an interface, targetItab is an itab representing (dst, src). Otherwise it is nil.

Now the question is, where was targetItab first assigned? If you look at the 2 call sites of dottype1, you'll see that it's from n.ITab:

src/cmd/compile/internal/ir/expr.go

Lines 670 to 676 in 7f76c0

type TypeAssertExpr struct {
	miniExpr
	X Node

	// Runtime type information provided by walkDotType for
	// assertions from non-empty interface to concrete type.
	ITab Node `mknode:"-"` // *runtime.itab for Type implementing X's type

If you run git grep "\.ITab =", you'll get a few matches, but I think the most relevant (and decipherable) ones are these, which processes the regular and generic dot type expressions (e.g., x.(T)).

src/cmd/compile/internal/walk/expr.go

Lines 705 to 710 in 7f76c0

func walkDotType(n *ir.TypeAssertExpr, init *ir.Nodes) ir.Node {
	n.X = walkExpr(n.X, init)
	// Set up interface type addresses for back end.
	if !n.Type().IsInterface() && !n.X.Type().IsEmptyInterface() {
		n.ITab = reflectdata.ITabAddrAt(base.Pos, n.Type(), n.X.Type())
	}

src/cmd/compile/internal/walk/expr.go

Lines 735 to 739 in 7f76c0

// walkDynamicDotType walks an ODYNAMICDOTTYPE or ODYNAMICDOTTYPE2 node.
func walkDynamicDotType(n *ir.DynamicTypeAssertExpr, init *ir.Nodes) ir.Node {
	n.X = walkExpr(n.X, init)
	n.RType = walkExpr(n.RType, init)
	n.ITab = walkExpr(n.ITab, init)

Specifically, since we only care about the case where target type is a compile-time non-interface, I think we can safely ignore walkDynamicDotType and instead focus on the reflectdata.ITabAddrAt function called in walkDotType. Its definition is pretty simple:

src/cmd/compile/internal/reflectdata/reflect.go

Lines 840 to 852 in 7f76c0

// ITabAddrAt returns an expression that evaluates to the
// *runtime.itab value for concrete type typ implementing interface
// iface.
func ITabAddrAt(pos src.XPos, typ, iface *types.Type) *ir.AddrExpr {
	s, existed := ir.Pkgs.Itab.LookupOK(typ.LinkString() + "," + iface.LinkString())
	lsym := s.Linksym()

	if !existed {
		writeITab(lsym, typ, iface, false)
	}

	return typecheck.LinksymAddr(pos, lsym, types.Types[types.TUINT8])
}

If you trace the lsym := s.Linksym() call, you'll end up seeing these things, I don't understand them, but to me they are saying "emitting some itable symbol to the object file, which serve as static data that'll be read by the optimized type assert expressions"

src/cmd/internal/obj/sym.go

Lines 89 to 115 in 7f76c0

// LookupABIInit looks up a symbol with the given ABI.
// If it does not exist, it creates it and
// passes it to init for one-time initialization.
func (ctxt *Link) LookupABIInit(name string, abi ABI, init func(s *LSym)) *LSym {
	var hash map[string]*LSym
	switch abi {
	case ABI0:
		hash = ctxt.hash
	case ABIInternal:
		hash = ctxt.funchash
	default:
		panic("unknown ABI")
	}

	ctxt.hashmu.Lock()
	s := hash[name]
	if s == nil {
		s = &LSym{Name: name}
		s.SetABI(abi)
		hash[name] = s
		if init != nil {
			init(s)
		}
	}
	ctxt.hashmu.Unlock()
	return s
}

src/cmd/internal/obj/link.go

Lines 450 to 467 in 7f76c0

// An LSym is the sort of symbol that is written to an object file.
// It represents Go symbols in a flat pkg+"."+name namespace.
type LSym struct {
	Name string
	Type objabi.SymKind
	Attribute

	Size   int64
	Gotype *LSym
	P      []byte
	R      []Reloc

	Extra *interface{} // *FuncInfo, *VarInfo, *FileInfo, or *TypeInfo, if present

	Pkg    string
	PkgIdx int32
	SymIdx int32
}

src/cmd/internal/obj/link.go

Lines 1009 to 1029 in 7f76c0

// Link holds the context for writing object code from a compiler
// to be linker input or for reading that input into the linker.
type Link struct {
	Headtype           objabi.HeadType
	Arch               *LinkArch
	Debugasm           int
	Debugvlog          bool
	Debugpcln          string
	Flag_shared        bool
	Flag_dynlink       bool
	Flag_linkshared    bool
	Flag_optimize      bool
	Flag_locationlists bool
	Flag_noRefName     bool   // do not include referenced symbol names in object file
	Retpoline          bool   // emit use of retpoline stubs for indirect jmp/call
	Flag_maymorestack  string // If not "", call this function before stack checks
	Bso                *bufio.Writer
	Pathname           string
	Pkgpath            string           // the current package's import path
	hashmu             sync.Mutex       // protects hash, funchash
	hash               map[string]*LSym // name -> sym mapping

And if you trace the writeITab call, you'll see more details about the actual itable emission, and a helper that generates symbols for the methods.

src/cmd/compile/internal/reflectdata/reflect.go

Lines 1291 to 1294 in 7f76c0

// writeITab writes the itab for concrete type typ implementing interface iface. If
// allowNonImplement is true, allow the case where typ does not implement iface, and just
// create a dummy itab with zeroed-out method entries.
func writeITab(lsym *obj.LSym, typ, iface *types.Type, allowNonImplement bool) {

src/cmd/compile/internal/reflectdata/reflect.go

Lines 352 to 358 in 7f76c0

		sig := &typeSig{
			name:  f.Sym,
			isym:  methodWrapper(t, f, true),
			tsym:  methodWrapper(t, f, false),
			type_: typecheck.NewMethodType(f.Type, t),
			mtype: typecheck.NewMethodType(f.Type, nil),
		}

I'll stop here. If you are interested in the historical development of these features and optimizations, I recommend playing with the git blames, I ended up finding a lot of interesting conversations and old PRs, back when things were a bit simpler.

OpenJDK codebase quick tour

After poking around in the Go compiler source code, I thought about Java and JVM, because I vaguely recalled that JVM doesn't have this fat pointer thing like Go, Rust, etc. So how is dynamic dispatch implemented at the JVM level?

I cloned the openjdk source at commit 6d5699617ff, and did some cursory search. I'm mainly interested in what happens when we have an interface method call, because in such case compiler has no clue what the underlying object type is, so we either have a fat pointer like Go, or do some slow method lookup from the underlying type's interface tables.

I happen to remember the relevant bit of JVM bytecode -- invokeinterface, so I just searched for it in the codebase. There are a lot of matches, but with a bit of searching I found the interpreter implementation:

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp

Lines 2302 to 2303 in 6d5699

      CASE(_invokeinterface): {
        u2 index = Bytes::get_native_u2(pc+1);

I kinda lost the energy to look further, but we can still see that invokeinterface involves some itable check when the method is fully virtual:

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp

Lines 2394 to 2421 in 6d5699

        itableOffsetEntry* ki = (itableOffsetEntry*) int2->start_of_itable();
        int i;
        for ( i = 0 ; i < int2->itable_length() ; i++, ki++ ) {
          if (ki->interface_klass() == iclass) break;
        }
        // If the interface isn't found, this class doesn't implement this
        // interface. The link resolver checks this but only for the first
        // time this interface is called.
        if (i == int2->itable_length()) {
          CALL_VM(InterpreterRuntime::throw_IncompatibleClassChangeErrorVerbose(THREAD, rcvr->klass(), iclass),
                  handle_exception);
        }
        int mindex = interface_method->itable_index();

        itableMethodEntry* im = ki->first_method_entry(rcvr->klass());
        callee = im[mindex].method();
        if (callee == nullptr) {
          CALL_VM(InterpreterRuntime::throw_AbstractMethodErrorVerbose(THREAD, rcvr->klass(), interface_method),
                  handle_exception);
        }

        istate->set_callee(callee);
        istate->set_callee_entry_point(callee->from_interpreted_entry());
        if (JVMTI_ENABLED && THREAD->is_interp_only_mode()) {
          istate->set_callee_entry_point(callee->interpreter_entry());
        }
        istate->set_bcp_advance(5);
        UPDATE_PC_AND_RETURN(0); // I'll be back...

On the other hand, for the other invoke... bytecode, even the virtual case is a lot simpler than the above

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp

Lines 2450 to 2478 in 6d5699

            } else {
              // get receiver
              int parms = entry->number_of_parameters();
              // this works but needs a resourcemark and seems to create a vtable on every call:
              // Method* callee = rcvr->klass()->vtable()->method_at(cache->f2_as_index());
              //
              // this fails with an assert
              // InstanceKlass* rcvrKlass = InstanceKlass::cast(STACK_OBJECT(-parms)->klass());
              // but this works
              oop rcvr = STACK_OBJECT(-parms);
              VERIFY_OOP(rcvr);
              Klass* rcvrKlass = rcvr->klass();
              /*
                Executing this code in java.lang.String:
                    public String(char value[]) {
                          this.count = value.length;
                          this.value = (char[])value.clone();
                     }

                 a find on rcvr->klass() reports:
                 {type array char}{type array class}
                  - klass: {other class}

                  but using InstanceKlass::cast(STACK_OBJECT(-parms)->klass()) causes in assertion failure
                  because rcvr->klass()->is_instance_klass() == 0
                  However it seems to have a vtable in the right location. Huh?
                  Because vtables have the same offset for ArrayKlass and InstanceKlass.
              */
              callee = (Method*) rcvrKlass->method_at_vtable(entry->table_index());

This feels a bit slow. Maybe there are plenty of optimizations like jit or de-virtualization which will help performance. Although not using fat pointer means that upcasting to an interface should be a no-op in Java, but shouldn't method call happens a lot more frequently at runtime, than these upcasts?

Some thoughts on Go v.s. other languages

  • Rust trait object similar to Golang interface in representation, but Go is more ad-hoc (the jargon is structural typing), it doesn't require a struct to specify "I implement this interface".
  • Note that structural typing is the root cause of having to construct the itable at runtime in Go, because you simply can't know at compile time all the possible cases you'd need.
  • On the contrary, in C++ and Java, all objects have this fixed set of "implemented interfaces", so all the "itables" could be pre-computed, and runtime casting across interfaces just needs to lookup in this itable set set, rather than traversing over all the methods and trying to construct an itable.

Extra Resources

Here are some additional articles I enjoyed reading, while investigating this dynamic dispatch topic.