

This is an old issue: I submitted the first RFE about this back in 2015. It shows up every time I benchmark interpreter-only code. Most recently, it showed up in my work to get infra work reasonably fast when cold, which includes lots of interpreter paths. The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches by accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc.

I think we can make it universally better for all template interpreters by introducing safe limit / growth watermarks for thread stacks, so that we bang only when needed. This also drops the need for special-casing native_call, because we might as well bang the entire shadow zone in the native case too. This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why the native_call argument persists, even though it is no longer used in the x86 case.

There is also a new test group that I found useful when debugging on Windows; that group is going to go away before integration. I tried to capture the current mechanics of stack banging in stackOverflow.hpp, hoping the change becomes more obvious, and so that arch-specific template interpreter code can just reference it without copy-pasting it around. I think the patch is fairly complete, and so I would like to solicit more feedback and testing here.

Point runs on SPECjvm2008 with -Xint show huge improvements on half of the tests, without any regressions.

$ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
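
As a rough illustration of the watermark idea described in this PR, here is a small standalone C++ model. It is not the patch and not HotSpot code: `ModelThread`, `enter_method_with_watermark`, `replay`, the 20-page shadow zone, and the addresses are all made up for the sketch. The rule that the watermark stops advancing at the safe limit is my assumption about how a "safe limit" would be used. The model only counts pages touched, to show how rebanging on every entry compares with banging only when the stack grows past what was already touched.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one thread's stack; none of these names come from HotSpot.
// The stack grows toward lower addresses, as on x86.
constexpr std::uint64_t kShadowPages = 20;  // assumed shadow zone size, in pages

struct ModelThread {
    std::uintptr_t growth_watermark;  // lowest SP whose shadow zone was already banged
    std::uintptr_t safe_limit;        // at or below this SP the watermark is frozen (assumption)
    std::uint64_t pages_touched = 0;  // stands in for the cost of the bang loop

    ModelThread(std::uintptr_t stack_base, std::uintptr_t limit)
        : growth_watermark(stack_base), safe_limit(limit) {
        assert(limit < stack_base);
    }
};

// Baseline behaviour described in the text: bang the whole shadow zone on every entry.
void enter_method_always_bang(ModelThread& t) {
    t.pages_touched += kShadowPages;
}

// Watermark variant: skip the bang when an earlier entry at the same or a
// deeper SP has already touched every page this frame could need.
bool enter_method_with_watermark(ModelThread& t, std::uintptr_t sp) {
    if (sp >= t.growth_watermark) {
        return false;  // fast path: nothing to touch
    }
    t.pages_touched += kShadowPages;
    // Advance the watermark only while SP is above the safe limit, so entries
    // near the end of the stack keep banging and can still reach guard pages.
    if (sp > t.safe_limit) {
        t.growth_watermark = sp;
    }
    return true;
}

// Replays a sequence of method-entry stack pointers; returns total pages touched.
std::uint64_t replay(const std::vector<std::uintptr_t>& entry_sps, bool use_watermark) {
    ModelThread t(/*stack_base=*/0x900000, /*limit=*/0x120000);
    for (std::uintptr_t sp : entry_sps) {
        if (use_watermark) {
            enter_method_with_watermark(t, sp);
        } else {
            enter_method_always_bang(t);
        }
    }
    return t.pages_touched;
}
```

Replaying a call pattern that bounces between two shallow depths shows the intent: the baseline pays for the full shadow zone on each of the entries, while the watermark variant pays only on the entries that reach a new depth. Entries below the assumed safe limit pay every time in both variants.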
