This is a quick list of the structural performance problems I see in the commonly used implementations I'm familiar with. It is true that in some toy benchmarks, particularly those that eliminate startup time, Java can perform on par with similar applications written in C and C++, but for most real applications I still believe it is anywhere from 20% to 10X slower. See these performance benchmarks. Performance tuning is one of the most time-consuming phases of the development process, so I view this loss of performance in the language as a loss of efficiency in the programming process, hence my desire to see these problems get fixed.
For most applications, the benefits of Java, such as the safe memory model, "lazy linking" and improved standard APIs, far outweigh the performance loss, which makes Java a very successful language for most applications. But Java struggles to compete on the desktop, on small devices where more performance control is necessary, in small script-level programs where startup time is dominant, and in applications with large memory requirements or where performance is critical. If you are competing on a price/performance curve with another company using a more efficient language, you may well lose in the long run with Java.
I also think that each of the major performance problems in the Java environment can be fixed, though probably not without breaking some of the rules in the language spec. It is my opinion that the performance problems inherent in the Java platform are not structurally tied to the most important efficiency gains you get from using it. Instead, these performance problems are due to abstractions in the Java language which are not compatible with the implementation of these basic concepts in Windows and Unix.
This problem is partly addressed in JDK 1.5, where you can generate a single shared archive for a given system which can then be shared between processes on that system. This is a good step but is still pretty limited. I think ultimately this points to the fact that Java was not really designed to run efficiently on Windows and Unix, and the difficulty of adding this functionality after the fact without changing the language spec may be intractable.
As far as I know, the only long-term approach Sun is taking at this point to fix this problem is to build yet another version of the virtual machine which can execute more than one application in the same physical address space. Though folks at Sun believe this is a "non-compromises" approach, I don't see how that can really be true. In a sense, this amounts to writing an OS on top of an OS, which is complex, is bound to add overhead, and is unlikely to work with all native code without modification. Given the sharing of the virtual address space between processes, I think that for some applications this will be intractable. Given all of that, how will this MVM concept become a seamless concept for users to understand?
Because the code is self-modifying, there is no single static representation of the code shared by all processes. It changes based on the code access pattern. In fact, hidden invalid references could lurk in any program, waiting for the instruction that contains them to be invoked in this model (though usually the compiler ensures these will not exist). If the JIT went and tried to compile an entire class, it could uncover linking errors that were "dormant" in the class file model. I don't know if it is spec'd or not that a "dormant" unresolvable reference must not be flagged, but I do know that it is used.
Also, self-modifying code is used to invoke a method from a reference to an interface. In this case, the slot containing the method may be different from one class to the next. The instruction to invoke this method stores the "guess" at the slot index. If it matches, it is efficient, but if it does not, it does a linear search through the method table. Maybe this is tuned better in the JIT? It could lead to somewhat unstable performance if you happened to switch the guess back and forth from one invocation to the next.
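To make the "guess" behavior concrete, here is a minimal sketch of such a call site. It is only a model, not how any particular VM implements it: the class name GuessingCallSite, the use of a list of method names as a stand-in for a method table, and the hit/search counters are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an interface call site that caches a guessed slot index.
final class GuessingCallSite {
    private final String methodName;
    private int guessedSlot = 0; // the "guess" stored with the invoke instruction
    int hits = 0;                // calls where the guess was right
    int searches = 0;            // calls that fell back to a linear search

    GuessingCallSite(String methodName) {
        this.methodName = methodName;
    }

    // methodTable stands in for the receiver class's method table.
    int resolve(List<String> methodTable) {
        if (guessedSlot < methodTable.size()
                && methodTable.get(guessedSlot).equals(methodName)) {
            hits++;
            return guessedSlot;
        }
        searches++;
        for (int i = 0; i < methodTable.size(); i++) {
            if (methodTable.get(i).equals(methodName)) {
                guessedSlot = i; // rewrite the guess for the next call
                return i;
            }
        }
        throw new IllegalStateException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        // Two classes that place "run" in different slots.
        List<String> classA = Arrays.asList("toString", "run", "close");
        List<String> classB = Arrays.asList("run", "toString");
        GuessingCallSite site = new GuessingCallSite("run");
        for (int i = 0; i < 4; i++) {
            site.resolve(classA);
            site.resolve(classB);
        }
        System.out.println("hits=" + site.hits + " searches=" + site.searches);
    }
}
```

In this model, alternating the receiver between the two classes makes every call miss and search, which is the unstable case described above. Repeated calls on the same class hit the guess after the first search.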
Instead of using dynamic code modification, you could then just insert the extra test into the code which hasn't been prelinked and avoid use of the self-modifying code altogether. Because the JLS specifies the linking behavior explicitly, I don't think you can change this in a compatible way.
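As a rough illustration of the "extra test" idea, the sketch below keeps the code fixed and checks a per-site field on every call instead of patching the instruction. The names CheckedCallSite and resolver are hypothetical, the Supplier merely stands in for whatever symbol lookup a real linker would do, and the sketch is not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical call site that tests "am I linked yet?" on every use
// instead of rewriting its own instruction after the first resolution.
final class CheckedCallSite<T> {
    private final Supplier<T> resolver; // stand-in for the linker's lookup
    private T target;                   // null until the first successful resolution
    int resolutions = 0;                // how many times the lookup actually ran

    CheckedCallSite(Supplier<T> resolver) {
        this.resolver = resolver;
    }

    T get() {
        if (target == null) {           // the extra test paid on every call
            target = resolver.get();
            resolutions++;
        }
        return target;
    }

    public static void main(String[] args) {
        CheckedCallSite<String> site = new CheckedCallSite<>(() -> "linked");
        site.get();
        site.get();
        System.out.println("resolutions=" + site.resolutions);
    }
}
```

The cost is one comparison per call on code that has not been prelinked. The benefit is that the code itself never changes, so it could in principle be shared between processes.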
As for linking errors, all errors should be reported as quickly as possible by default. Add additional rules to cover the known cases where a link is allowed to fail, and have that invalid code print an explicit error if it gets invoked. The silent linking thing is useful for development but just plain scary for deployment.
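Something close to early reporting can be approximated today at application startup. The sketch below assumes you already have a list of the class names your code depends on (how that list is produced is outside this sketch), and the class and method names are made up for illustration. It only checks that each named class can be loaded; it does not verify every symbolic reference inside those classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical startup check that surfaces missing classes up front
// instead of waiting for the first instruction that touches them.
final class EagerLinkCheck {
    static List<String> findUnresolvable(List<String> classNames, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "no.such.Clazz" is a deliberately nonexistent name for the demo.
        List<String> names = Arrays.asList("java.lang.String", "no.such.Clazz");
        List<String> missing = findUnresolvable(names, EagerLinkCheck.class.getClassLoader());
        System.out.println("unresolvable: " + missing);
    }
}
```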
I suppose currently you could use a real Java-to-native-code compiler, but I do not know if there is a fully general system out there which supports a partial and flexible combination of compiled and dynamically linked code. This approach also costs you in portability of the distributed code, so I think it breaks down there too.