Notes on RPCEmu dynarec : The makefile does not include the relevant files as of yet, as this is totally experimental. In order to compile, you must remove arm.c, and include ArmDynarec.c and codegen_x86.c/codegen_amd64.c (depending on CPU type). This only works on x86-32 and x86-64 at the moment. Other ports are possible, but haven't been done yet (and won't be done by me). This _has_ now been tested on linux (both x86-32 and x86-64) and works! This hasn't been tested on that many programs. The ones that have been tested (and work) include : RISC OS 3.7 RISC OS 4.02 (lazy task swapping can now be enabled!) RISC OS 4.39 Quake FQuake Freedoom OpenTTD FishTank2 !PCengine MAME Freestyle Zero MetaMorph Jan3D Icon AWViewer !SICK Delirium You must be in SA110 emulation for this to work. ARM610, ARM710 and ARM7500 appear to work, but there are many problems. Compatibility is mostly the same as a real StrongARM RiscPC. RPCEmu now supports self modifying code, so some stuff that didn't work before (eg Ovation, SparkFS) works now. There are still problems with some older programs though. In order to disable the dynarec disable the cache (ie *cache 0). This returns compatibility (mostly) to normal, and to a speed only a little slower than previous versions of RPCEmu. The instruction cache currently uses 1.8 megs. This is much improved over previous versions, but could probably be improved further by shrinking the block size (shrinking the number of blocks any further has a big impact on speed) The x86-32 emulator is slightly more buggy when compiled on MSVC than on GCC. I am not sure of the reason for this. The x86-32 emulator is now an (almost) full dynarec. Native code is generated most instructions. The x86-64 version is still a threaded interpreter and as such is slower. The code blocks are flushed when the StrongARM Icache is flushed. RPCEmu also handles self modifying code - when a memory page is written to, all code blocks in that page are invalidated, and when a code block is built in a page, that page is invalidated. The performance increase on my main machine (Athlon X2 4200+) ranges from about 400% (!SICK, RISC OS in general) to 400% (!Fishtank2) to 300% (ArcQuake). Benchmarks noted : Dhrystone (!SICK) - 328k on Athlon X2 4200+ 292k on Core 2 Duo 1.66ghz 231k on Athlon XP 2400+ 34k on P2-350 (that's about ARM6 speed) On Athlon X2 4200+ - Riscosmark CPU test - 184% of StrongARM speed Firebench - 490 fps FQuake - 15 fps average (16fps on real SA) ArcQuake timedemo demo2 - 13.1fps (10.something on real SA?) The highest instruction rate I've seen so far is on Firebench, which runs at around 221 MIPS. Ideas for optimisations/design flaws: Blocks always end when the PC could be modified (depending on condition codes). In theory, if a jump is not taken, then it should be possible to continue the block and put a check in the generated code for when the jump _is_ taken. There is one exception in the current recompiler - when a branch occurs but is _not_ taken when the block is being compiled, the recompiler carries on, and puts in code to jump out if the branch is taken. The recompiler adds code at the end of a block, that checks to see if the next block has already been compiled. If it has, it jumps straight to it, instead of returning to execx86() (depending on a few conditions). This helps loops a bit, but they could be made more efficient by extending B/BL to jump backwards within the same block. Note : I have tried this now, and the performance boost is almost unnoticeable - so it is not in the SVN source. None of this is added in the x86-64 version however, as it appears to kill performance. I can't figure out _why_ it kills performance, but if it could be made to work properly it would be a nice speed increase. The x86-64 recompiler in general isn't optimal - it's mostly a direct conversion of the x86-32 code. The one improvement is that the current opcode is passed to the relevant handler through a register instead of via the stack. However, too many immediate addresses are used, which could potentially prove fatal (as they are only 32-bit). Also, some of the additional registers could be hardcoded to frequently used variables. Abort checking can be done either at the end of every LDR/STR/LDM/STM instruction, or at the end of each block (depending on the status of ABORTCHECKING - see rpcemu.h). RISC OS 3.7 works okay with ABORTCHECKING commented out, but RISC OS 4.02 and 4.39 need it set. The flag setting code in the generated code is far from optimal. Currently it uses LAHF to put the flags in AH, and tests seperately for overflow. setzn uses the fact that the Z and N flags are in the same place on both x86 and ARM, and therefore just ORs them straight in. ADD/SUB/ADC/CMP use a lookup table for C/Z/N. The memory access code could be made better, especially in LDM/STM. There have been some improvements in this area recently.