3DNow!+Pro emulation #3217

Open
2 tasks done
Torinde opened this issue Jan 11, 2022 · 16 comments

Comments

@Torinde
Contributor

Torinde commented Jan 11, 2022

Code of Conduct & Contributing Guidelines

  • I agree to follow the code of conduct and the contributing guidelines.

Have you checked that no other similar feature request(s) already exists?

  • I have searched and didn't find any similar issues.

Is your feature request related to a problem? Please describe.

Running many games and software in 3DNow! mode

What you want

Adding emulation of 3DNow!+ instructions to the Pentium III type (#3179), resulting in an Athlon XP type with:

  • K6 (x86 registers) : SYSCALL, SYSRET
  • 3DNow! (x87/MMX registers, K6-2) : FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH, PREFETCHW
  • 3DNow! undocumented K6-2: PSWAPW (swaps 16-bit words within a 64-bit MMX register; available on K6-2 and K6-3 only). On at least the AMD K6-2, all of the unassigned 3DNow! opcodes execute as equivalents of POR (the MMX bitwise-OR instruction).[133]
  • 3DNow!+ (x87/MMX registers, Athlon/K6-2+) : PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD
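For whoever picks this up, here is a minimal sketch in C of how the `0F 0F /r ib` encoding could be dispatched on its trailing suffix byte, covering a few of the 3DNow! arithmetic operations and the 3DNow!+ horizontal and swap operations listed above. `mmx_reg_t`, the `SFX_*` constants and `exec_3dnow` are hypothetical names, not DOSBox-X code. The suffix values follow AMD's 3DNow! documentation as we understand it. The sketch assumes IEEE-754 host floats and a little-endian host, and it does not model the hardware's handling of denormals, infinities or NaNs.

```c
#include <stdint.h>

/* Hypothetical register type: 64 raw bits or two single-precision lanes.
   Lane 0 is the low dword on a little-endian host (assumption). */
typedef union {
    uint64_t q;
    float    f[2];
} mmx_reg_t;

/* 3DNow! opcodes are 0F 0F /r ib; the trailing byte selects the operation. */
enum {
    SFX_PFNACC  = 0x8A, /* 3DNow!+ */
    SFX_PFPNACC = 0x8E, /* 3DNow!+ */
    SFX_PFSUB   = 0x9A,
    SFX_PFADD   = 0x9E,
    SFX_PFACC   = 0xAE,
    SFX_PFMUL   = 0xB4,
    SFX_PSWAPD  = 0xBB  /* 3DNow!+ */
};

/* Executes dst = op(dst, src). Only a subset of the assigned suffixes is
   handled; a complete decoder would list every assigned suffix before
   reaching the fallback below. */
static void exec_3dnow(uint8_t suffix, mmx_reg_t *dst, const mmx_reg_t *src)
{
    const mmx_reg_t d = *dst, s = *src;
    mmx_reg_t r;

    switch (suffix) {
    case SFX_PFADD:   r.f[0] = d.f[0] + s.f[0]; r.f[1] = d.f[1] + s.f[1]; break;
    case SFX_PFSUB:   r.f[0] = d.f[0] - s.f[0]; r.f[1] = d.f[1] - s.f[1]; break;
    case SFX_PFMUL:   r.f[0] = d.f[0] * s.f[0]; r.f[1] = d.f[1] * s.f[1]; break;
    /* Horizontal forms: low result from the dst pair, high from the src pair. */
    case SFX_PFACC:   r.f[0] = d.f[0] + d.f[1]; r.f[1] = s.f[0] + s.f[1]; break;
    case SFX_PFNACC:  r.f[0] = d.f[0] - d.f[1]; r.f[1] = s.f[0] - s.f[1]; break;
    case SFX_PFPNACC: r.f[0] = d.f[0] - d.f[1]; r.f[1] = s.f[0] + s.f[1]; break;
    /* PSWAPD swaps the two dwords of src; done on raw bits so NaN payloads survive. */
    case SFX_PSWAPD:  r.q = (s.q >> 32) | (s.q << 32); break;
    default:
        /* K6-2 behaviour described in this issue: unassigned suffixes act like POR. */
        r.q = d.q | s.q;
        break;
    }
    *dst = r;
}
```

The POR fallback mirrors the K6-2 behaviour quoted above; an emulated Athlon-class type might instead raise #UD for unassigned suffixes.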

Adding emulation of the last 3DNow! instructions to the Experimental type (in addition to Athlon XP, FISTTP and Pentium II SYSENTER/SYSEXIT):

  • 3DNow! Pro (x87/MMX registers, Geode GX/LX) : PFRSQRTV, PFRCPV
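As far as we can tell from the Geode GX/LX documentation, PFRCPV and PFRSQRTV are "vector" variants: PFRCP/PFRSQRT read only the low lane of the source and write the result to both lanes, while the V forms compute each lane independently. The sketch below shows that difference for the reciprocal pair only. `pfrcp` and `pfrcpv` are hypothetical helper names, the Geode semantics are our assumption, and the result is computed at full host precision rather than as the hardware's reduced-precision estimate. PFRSQRT/PFRSQRTV would follow the same scalar-versus-vector pattern.

```c
#include <stdint.h>

typedef union { uint64_t q; float f[2]; } mmx_reg_t;

/* PFRCP: reciprocal of the LOW source lane, duplicated into both lanes.
   Full host precision here; real hardware returns a lower-precision estimate. */
static void pfrcp(mmx_reg_t *dst, const mmx_reg_t *src)
{
    float r = 1.0f / src->f[0];  /* read before writing, so dst == src is safe */
    dst->f[0] = r;
    dst->f[1] = r;
}

/* PFRCPV (Geode GX/LX, assumed semantics): each lane independently. */
static void pfrcpv(mmx_reg_t *dst, const mmx_reg_t *src)
{
    dst->f[0] = 1.0f / src->f[0];
    dst->f[1] = 1.0f / src->f[1];
}
```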

Adding Athlon XP 1GHz 1200+ (or the less catchily named Mobile Athlon 4 850MHz) cycles to the "Emulate CPU speed" list.

Additionally, for a gradual increase of the cycle requirements, add:

  • K6 type (Pentium MMX + K6 instructions). K6 166/200MHz cycles are already in the "Emulate CPU speed" list.
  • K6-2 type (K6 + 3DNow! instructions). K6-2 300MHz cycles are already in the "Emulate CPU speed" list.
  • K6-2+ type (K6-2 + 3DNow!+ instructions). Adding K6-2+ 450MHz cycles.
  • Athlon type (K6-2+ + only the MMX2 part of SSE/3DNow!+ instructions). Adding Athlon 500MHz cycles.
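The gradual ladder of types above can be expressed as a table of feature sets where each entry is a superset of the previous one. This is a sketch with hypothetical names (`F_*`, `cpu_types`, `find_type`); they are not DOSBox-X identifiers or config values. Pentium III-specific features such as SYSENTER are left out for brevity. On real AMD CPUs, 3DNow! and extended 3DNow! are reported in CPUID 8000_0001h EDX bits 31 and 30 respectively.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature flags for the proposed CPU types. */
enum {
    F_MMX      = 1u << 0,
    F_SYSCALL  = 1u << 1,  /* K6 */
    F_3DNOW    = 1u << 2,  /* K6-2 */
    F_3DNOWEXT = 1u << 3,  /* 3DNow!+ (K6-2+, Athlon) */
    F_MMXEXT   = 1u << 4,  /* "MMX2" integer subset of SSE (Athlon) */
    F_SSE      = 1u << 5   /* full SSE (Athlon XP) */
};

typedef struct {
    const char *name;      /* hypothetical type name */
    uint32_t    features;
} cpu_type_t;

/* Ordered as in the proposal: each type adds to the previous one. */
static const cpu_type_t cpu_types[] = {
    { "pentium_mmx", F_MMX },
    { "k6",          F_MMX | F_SYSCALL },
    { "k6_2",        F_MMX | F_SYSCALL | F_3DNOW },
    { "k6_2plus",    F_MMX | F_SYSCALL | F_3DNOW | F_3DNOWEXT },
    { "athlon",      F_MMX | F_SYSCALL | F_3DNOW | F_3DNOWEXT | F_MMXEXT },
    { "athlon_xp",   F_MMX | F_SYSCALL | F_3DNOW | F_3DNOWEXT | F_MMXEXT | F_SSE },
};

/* True if feature set a contains every feature of b. */
static int is_superset(uint32_t a, uint32_t b) { return (a & b) == b; }

/* Looks up a type by name; returns NULL if it is unknown. */
static const cpu_type_t *find_type(const char *name)
{
    for (size_t i = 0; i < sizeof cpu_types / sizeof cpu_types[0]; i++)
        if (strcmp(cpu_types[i].name, name) == 0)
            return &cpu_types[i];
    return NULL;
}
```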

Hackipedia.org 3DNow!
https://www.ardent-tool.com/CPU/Docs_AMD.html

Attached is a table with CPU types, instruction sets, launch dates, and cycles from the DOSBox-X "Emulate CPU speed" menu list and the DOSBox wiki. The DOSBox wiki goes up to Pentium III 500MHz, so it would be nice for DOSBox-X to give users some guidance on what host CPU is needed to emulate an Athlon XP (for example, a benchmark procedure). The hypervisor-type core (#1089) and other performance optimizations (#1184) will help.

CPU instructions Win9x era.xlsx

Describe alternatives you've considered

PCem, Bochs, QEMU

Additional context

Of course, this raises the question of the limits of the emulation aspirations (#3196). DOSBox-X websites mention as targets Win9x/Me, pre-2000, pre-WinXP, ISA slot support and maybe others.
Windows Millennium was launched 2000-09-14 and had official support until 2006-07-11. Windows XP was launched 2001-10-25. Windows Vista/Server 2008 are the latest to support ISA slots.1

3DNow!, Cyrix EMMI 1,2,3,4 (any software that uses EMMI?),5 and NEC Vxx 1,2,3 (those CPUs were used in NEC PC-98) are clearly included in the preceding era.

Athlon XP is a very good next target, as it upgrades the current SSE implementation with 3DNow!, which is relevant for a lot of games of the DOS/Win9x era.

Pentium 4 (SSE2) was also launched a year before WinXP. SSE2 is required by later versions of various internet browsers 1,2.

Pentium 4 Prescott adds SSE3: relatively few instructions, one of which (FISTTP, #2526) is already supported in the Experimental type. It also helps with internet browsers 1
Athlon 64 adds SSE2/x86-64 - there is work on 64-bit DOS extenders 1,2,3 #3269 (any possibilities to use x86-64 under Win9x?)
Latest models of both (Pentium 4 662, Athlon 64 AM2, etc.) support simultaneously SSE3 and x86-64.

Finally, Core 2 adds SSSE3 (also available in Atom without x86-64: N270/N280/some Z and E models) - launched in the final month of WinMe official support, relatively few instructions and notably the last MMX instructions. Industrial PCs supporting Core 2 and ISA slots (B65, Q35, 945GC). There is also Vortex86EX2 (32-bit CPU with SSSE3 and ISA slots, launched in 2018 and available at least until 2028)
An interesting eventual update, purportedly for alleviating register pressure, would be an Experimental type with SSSE3 + 3DNow! (a non-existent combination), although with lesser SSE implementations the combination can also be seen in Athlon XP/64.
SSSE3 (last MMX, last instructions added in 32-bit only CPUs) requiring software: 32-bit OBS, OEL7.1, Skype for Linux, EDIUS (Win7 x64), other?

The overall aim for emulated instructions (beyond the scope of this enhancement suggestion) can be:

  • up to the latest utilized by any DOS/Win9x software available at the time of DOS/Win9x - SSE2 Kontakt 1.1,2, WinXP - SSE3 Autodesk, SSE2 and beyond emulation #4185, anything more?
  • up to the latest utilized by any DOS/Win9x/XP software, including modern ones: Win9x/XP - SSSE3 Mesa9x
  • up to the latest provided by CPU that had motherboard with DOS/Win9x drivers - SSSE3 P4M890, anything more?
  • up to the latest provided by CPU, which is 32-bit only - SSSE3 from Atom and Vortex86EX2/RDC R30460 (#4967)
  • up to the latest provided by CPU that had motherboard with WinXP drivers - AVX (Socket AM3+), anything more?
  • up to the latest provided by CPU that had mainstream desktop motherboard with ISA slots - SSE3 present in Prescott launched 2004-02-01 (P4 2.4A and Celeron D 310 2.13GHz) without any other extras like NX, 64-bit or Hyper-Threading, compatible with basic/initial P4 chipsets and boards with ISA slots. (equivalent modern remake?)
  • up to the latest provided by CPU that had motherboard with ISA slots - SSE4.1 Core 2 Quad Q9400, AVX Q77/Ivy Bridge, slightly more recent CPU/chipsets with LPC bus can also use LPC/ISA bridge (for example via TPM header) to provide ISA slots (thus at least up to AVX2/Z97/Ryzen AM4, anything more or any actual example motherboards?) Notably AVX can work under DOS and AVX512 in 32-bit code.
  • up to the latest provided by CPU that can be equipped with ISA slots (#4662) - even the most recent ones via PCIe/PCI/ISA bridge chips (for example CH367) and external USB/ISA adapters 1 2 (they even sell dosbox-x.conf and win9x.img files) - do USB or PCIe provide full DOS support for DMA?
  • up to the latest provided by CPU+motherboard that can natively run DOS/Win9x (even if most of the motherboard features remain unused due to incompatibilities) 1,2 go as high as AMD970/Piledriver, Atom and Z87/Haswell, thus adds up to SSE4a, MOVBE, AVX2. Probably the cutoff for that will be the last motherboard/laptop/device supporting MBR/UEFI Compatibility Support Module.
  • up to the latest defined by the CPU vendors for Legacy mode or Compatibility mode (for use with custom motherboard/firmware/OS). Probably the cutoff for that will be x86S (although it still retains the Compatibility mode many of the latest instructions are restricted to 64-bit mode).
  • up to the latest defined by the CPU vendors in 64-bit mode - redefined for use in 32-bit modes in virtual environments such as DOSBox-X

Obviously from the second bullet onwards the use-case is for new DOS/Win9x development and completeness.

https://en.wikipedia.org/wiki/X86_instruction_listings - also describes various undocumented and single-model-specific instructions (e.g. in some 80387 variants).

Torinde changed the title from "3DNow!+Pro support" to "3DNow!+Pro emulation" on Jan 13, 2022
@Torinde
Contributor Author

Torinde commented Jan 13, 2022

I searched the PCem repository and there are a lot of mentions of the individual instructions and of "3DNow". K6-2+ is listed, so at least 3DNow!+ should be there.

I wanted to point more specifically to which code needs to be adopted. Will that help?

3DNOW

SYSCALL and SYSENTER
SYSRET - nothing, although there is SYSEXIT
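Since SYSRET is missing here, a heavily simplified sketch of legacy-mode SYSCALL/SYSRET, based on our reading of the AMD64 Architecture Programmer's Manual pseudocode, may help. `cpu_t`, `do_syscall` and `do_sysret` are hypothetical names. A real implementation would also check EFER.SCE, load the hidden descriptor caches with the flat descriptors the instructions imply, and raise #UD/#GP where required; the original K6 may also differ in details, so this should be checked against the K6 documentation.

```c
#include <stdint.h>

/* Minimal hypothetical CPU state for the sketch. */
typedef struct {
    uint64_t star;   /* STAR MSR (C000_0081h): [63:48] SYSRET CS, [47:32] SYSCALL CS, [31:0] target EIP */
    uint32_t eip, ecx, eflags;
    uint16_t cs, ss;
    int      cpl;
} cpu_t;

#define EFLAGS_IF (1u << 9)
#define EFLAGS_RF (1u << 16)
#define EFLAGS_VM (1u << 17)

/* Legacy-mode SYSCALL; next_eip is the address after the instruction. */
static void do_syscall(cpu_t *c, uint32_t next_eip)
{
    uint16_t sel = (uint16_t)((c->star >> 32) & 0xFFFF);
    c->ecx     = next_eip;                        /* return address */
    c->eflags &= ~(EFLAGS_IF | EFLAGS_RF | EFLAGS_VM);
    c->cs      = (uint16_t)(sel & 0xFFFC);        /* RPL forced to 0 */
    c->ss      = (uint16_t)(sel + 8);             /* SS descriptor follows CS */
    c->eip     = (uint32_t)c->star;
    c->cpl     = 0;
}

/* Legacy-mode SYSRET: return to CPL 3 at ECX with interrupts enabled. */
static void do_sysret(cpu_t *c)
{
    uint16_t sel = (uint16_t)((c->star >> 48) & 0xFFFF);
    c->cs      = (uint16_t)(sel | 3);
    c->ss      = (uint16_t)((sel + 8) | 3);
    c->eip     = c->ecx;
    c->eflags |= EFLAGS_IF;
    c->cpl     = 3;
}
```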

FEMMS
PAVGUSB
PF2ID
PFACC
PFADD
PFCMPEQ
PFCMPGE
PFCMPGT
PFMAX
PFMIN
PFMUL
PFRCP
PFRCPIT1
PFRCPIT2
PFRSQIT1
PFRSQRT
PFSUB
PFSUBR
PI2FD
PMULHRW
PREFETCH
PREFETCHW
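Two of the integer instructions in this list are easy to get subtly wrong, so here is a sketch of their per-element arithmetic as described in AMD's 3DNow! documentation: PAVGUSB is a rounded unsigned byte average, and the 3DNow! PMULHRW adds 8000h to each 32-bit signed product before taking the high 16 bits. The function names are hypothetical. As noted below, the Cyrix EMMI PMULHRW uses a different opcode, and this sketch covers only the 3DNow! form.

```c
#include <stdint.h>

/* PAVGUSB: per unsigned byte, (a + b + 1) >> 1. */
static uint64_t pavgusb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)(a >> (8 * i)) & 0xFFu;
        unsigned y = (unsigned)(b >> (8 * i)) & 0xFFu;
        r |= (uint64_t)((x + y + 1) >> 1) << (8 * i);
    }
    return r;
}

/* PMULHRW (3DNow! form): per signed word, high 16 bits of (a * b + 0x8000). */
static uint64_t pmulhrw_3dnow(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int32_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int32_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int32_t p = x * y + 0x8000;               /* cannot overflow int32 */
        uint16_t hi = (uint16_t)((uint32_t)p >> 16); /* bits 31:16 of the sum */
        r |= (uint64_t)hi << (16 * i);
    }
    return r;
}
```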

K6-2+ (K6-2P) is listed in cpu.h and readme, but I don't find the 3DNow!+ instructions:
PF2IW - nothing
PFNACC - nothing
PFPNACC - nothing
PI2FW - nothing
PSWAPD - nothing
Geode GX/LX aren't listed
PFRSQRTV - nothing
PFRCPV - nothing
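For the missing 3DNow!+ conversions, a per-lane sketch may help. Our understanding of AMD's extended 3DNow! documentation is that PF2IW truncates toward zero, saturates to the signed 16-bit range and sign-extends the result into the 32-bit lane, while PI2FW converts only bits 15:0 of each dword as a signed word; PF2ID is shown alongside with 32-bit saturation. These semantics are our reading, the helper names are hypothetical, and the NaN result is an arbitrary placeholder rather than the hardware's behaviour.

```c
#include <stdint.h>

/* PF2ID lane: truncate toward zero, saturate to int32. */
static int32_t pf2id_lane(float f)
{
    if (f != f) return 0;                 /* NaN: placeholder, not modelled */
    if (f >= 2147483648.0f)  return INT32_MAX;
    if (f <= -2147483648.0f) return INT32_MIN;
    return (int32_t)f;                    /* C conversion truncates toward zero */
}

/* PF2IW lane (3DNow!+): saturate to int16; the int32 return value is the
   sign-extended 16-bit result stored in the 32-bit lane. */
static int32_t pf2iw_lane(float f)
{
    if (f != f) return 0;                 /* NaN: placeholder, not modelled */
    if (f >= 32767.0f)  return 32767;
    if (f <= -32768.0f) return -32768;
    return (int32_t)f;
}

/* PI2FW lane (3DNow!+): bits 15:0 of the dword as a signed word, to float. */
static float pi2fw_lane(uint32_t dword)
{
    int32_t w = (int32_t)(dword & 0xFFFFu);
    if (w >= 0x8000) w -= 0x10000;        /* sign-extend bits 15:0 */
    return (float)w;
}
```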

Cyrix 6x86MX is listed in cpu.h and readme, but the EMMI and Cyrix-specific x87 instructions don't appear:
PAVEB - nothing
PADDSIW - nothing
PMAGW - nothing
PDISTIB - nothing
PSUBSIW - nothing
PMVZB - nothing
PMULHRW - appears, but probably that's code for the 3DNow instruction with the same mnemonic. Reading the 3DNow manual and Cyrix application note 108 seems like the calculation is quite similar, but opcodes are different?
PMVNZB - nothing
PMVLZB - nothing
PMVGEZB - nothing
PMULHRIW - nothing
PMACHRIW - nothing

FTSTP - nothing
FRINT2 - nothing
FRICHOP - nothing
FRINEAR - nothing

@Torinde
Contributor Author

Torinde commented Jan 14, 2022

From QEMU patches

  • 3dnow.diff "adds emulation of the AMD 3DNow! and Extended 3DNow! instruction set, and also adds the CPUID bit for Extended MMX" (all 3DNow!+ instructions and SYSCALL are included, except PFRCPIT1, PFRCPIT2 and PFRSQIT1, for which it says "no need to actually increase precision")
  • sse3.diff "adds emulation of SSE3 instructions"
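The QEMU shortcut for PFRCPIT1/PFRCPIT2 makes sense once the math is spelled out. PFRCP returns a low-precision reciprocal estimate (roughly 14 bits, according to AMD's 3DNow! documentation), and the PFRCPIT1/PFRCPIT2 pair performs one Newton-Raphson step on it. If an emulator's PFRCP already returns a full-precision 1/b, that step has essentially nothing left to correct. The sketch below shows the step itself in plain C; it does not reproduce how the instructions split the step across registers, and the function names are hypothetical.

```c
/* One Newton-Raphson step for the reciprocal of b, starting from estimate x:
   x' = x * (2 - b * x). If x = (1 - e) / b, then x' = (1 - e*e) / b, so the
   relative error is squared (roughly doubling the number of correct bits). */
static double nr_reciprocal_step(double b, double x)
{
    return x * (2.0 - b * x);
}

/* Applies the step repeatedly; with an exact starting estimate it is a no-op. */
static double refine_reciprocal(double b, double x, int steps)
{
    for (int i = 0; i < steps; i++)
        x = nr_reciprocal_step(b, x);
    return x;
}
```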

@Torinde
Contributor Author

Torinde commented Jan 16, 2022

From Bochs: 3dnow.cc
Notes:
- These instructions are not implemented yet:
PFPNACC_PqQq, PF2IW_PqQq, PFNACC_PqQq, PFCMPGE_PqQq, PFMIN_PqQq,
PFRCP_PqQq, PFRSQRT_PqQq, PFSUB_PqQq, PFADD_PqQq, PFCMPGT_PqQq,
PFMAX_PqQq, PFRCPIT1_PqQq, PFRSQIT1_PqQq, PFSUBR_PqQq, PFACC_PqQq,
PFCMPEQ_PqQq, PFMUL_PqQq, PFRCPIT2_PqQq
- CPUID does not report 3DNow! instruction set.

SSE2, SSE3, SSSE3 (and more modern ones) are also supported

@joncampbell123
Owner

It's not happening right away, but I updated the MMX support code so that the 64-bit register can hold two 32-bit floats as preparation for future 3DNow! emulation.
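A minimal sketch of such a register layout, where one 64-bit MMX register can be viewed as bytes, words, dwords or two single-precision floats. `mmx_reg_t`, `mmx_get_f` and `mmx_set_f` are hypothetical names, not the actual DOSBox-X types. The accessors go through the raw 64-bit value with `memcpy`, so lane 0 is always the low dword regardless of host byte order.

```c
#include <stdint.h>
#include <string.h>

/* One MMX register viewed several ways. The array views assume a
   little-endian host, where index 0 is the least significant part. */
typedef union {
    uint64_t q;
    uint32_t d[2];
    uint16_t w[4];
    uint8_t  b[8];
    float    f[2];  /* 3DNow! single-precision lanes */
} mmx_reg_t;

_Static_assert(sizeof(mmx_reg_t) == 8, "MMX register must be 64 bits");
_Static_assert(sizeof(float) == 4, "3DNow! lanes are 32-bit floats");

/* Endian-independent float lane read (lane 0 = bits 31:0, lane 1 = bits 63:32). */
static float mmx_get_f(const mmx_reg_t *r, int lane)
{
    uint32_t bits = (uint32_t)(r->q >> (32 * lane));
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Endian-independent float lane write; the other lane is preserved. */
static void mmx_set_f(mmx_reg_t *r, int lane, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    r->q &= ~((uint64_t)0xFFFFFFFFu << (32 * lane));
    r->q |= (uint64_t)bits << (32 * lane);
}
```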

Ref: https://softpixel.com/~cwright/programming/simd/3dn.php

@Torinde
Contributor Author

Torinde commented Jan 22, 2022

Great to hear!
From 86box: SYSCALL, SYSRET

@fuel-pcbox
Contributor

From what I can tell, it's simply impossible to use AVX unless you're in 64-bit mode, as it reuses a single-byte opcode as a prefix for VEX-encoded instructions.

@Torinde
Contributor Author

Torinde commented Jun 6, 2022

From what I can tell, it's simply impossible to use AVX unless you're in 64-bit mode, as it reuses a single-byte opcode as a prefix for VEX-encoded instructions.

No experience myself, but what I found is in my first comment above:

@fuel-pcbox
Contributor

That first thread is literally only showing that it's possible to save and restore the state of AVX registers under DOS. That doesn't mean it's possible to use AVX under DOS. lol

@Torinde
Contributor Author

Torinde commented Jun 8, 2022

Intel AVX instructions will be available on both 32bit and 64bit flavors of Processor and OS:

AVX will require OS to have 256-bit YMM state support. Once the OS adds the necessary system level support, it can decide to create 64-bit distribution and 32-bit distributions. A 32-bit OS distribution with the required state support will allow software to use AVX.

So, combining this statement with the above thread on state save/restore - it seems you can use AVX in DOS without 64-bit mode?
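The "OS support" in the quoted statement comes down to the OS enabling XSAVE (CR4.OSXSAVE) and the SSE and AVX state components in XCR0; nothing in the check an application performs depends on 64-bit mode. A sketch of that check as a pure function, so the logic can be tested without real hardware: the caller is assumed to supply CPUID leaf 1 ECX and, when OSXSAVE is set, the XCR0 value read with XGETBV (ECX=0). The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has set CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE_STATE     (1u << 1)   /* XMM state enabled by the OS */
#define XCR0_AVX_STATE     (1u << 2)   /* upper YMM state enabled by the OS */

/* Returns true if AVX may be used: the CPU supports it and the OS (or a
   DOS extender/driver acting as one) has enabled both XMM and YMM state. */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))     return false;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE)) return false;  /* XGETBV not usable */
    return (xcr0 & need) == need;
}
```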

@fuel-pcbox
Contributor

Weird, because AVX specifically requires the VEX encoding, and that's impossible to use in 32-bit mode.

@Torinde
Contributor Author

Torinde commented Jun 22, 2022

Per Wikipedia you can use VEX in 32-bit mode with the following restrictions:

In 32-bit mode VEX encoded instructions can only access the first 8 YMM/XMM registers; the encodings for the other registers would be interpreted as the legacy LDS and LES instructions that are not supported in 64-bit mode.
The VEX prefix's initial-byte values, 0xC4 and 0xC5, are the same as the opcodes of the LDS and LES instructions. Not supported in 64-bit mode, the ambiguity is resolved in 32-bit mode by exploiting the fact that a legal LDS or LES's ModRM byte can not specify a register operand; i.e., be of the form 11xxxxxx. Various bit-fields in the VEX prefix's second byte are inverted to ensure that the byte is always of this form. Similarly, the REX prefix's one-byte form has the four high-order bits set to four, which replaces sixteen opcodes numbered 0x40–0x4F. Previously, those opcodes were individual INC and DEC instructions for the eight standard processor registers; x86-64 code must use ModR/M INC and DEC instructions. [2]
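The disambiguation rule in the quote is simple to express in a decoder. A sketch, with hypothetical function names: in 32-bit mode, C4/C5 starts a VEX prefix only when the next byte has its top two bits set (ModRM.mod == 11, which is invalid for LDS/LES). Because VEX stores R and vvvv inverted, valid 32-bit encodings naturally produce that pattern. The example bytes are `C5 F8 77` (VZEROUPPER), `C5 F0 58 C2` (VADDPS xmm0, xmm1, xmm2) and `C5 06` (LDS with a memory operand).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In 32-bit mode, C4/C5 begin a VEX prefix only if the following byte would
   be a register-form ModRM (bits 7:6 == 11), which LDS/LES cannot use. */
static bool is_vex_prefix_32bit(const uint8_t *code, size_t len)
{
    if (len < 2) return false;
    if (code[0] != 0xC4 && code[0] != 0xC5) return false;
    return (code[1] & 0xC0) == 0xC0;
}

/* Two-byte VEX (C5): the extra source register vvvv is stored inverted in
   bits 6:3 of the second byte. */
static unsigned vex2_vvvv(uint8_t byte1)
{
    return (~(unsigned)byte1 >> 3) & 0xFu;
}
```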

Can you test that? What about AVX-512/EVEX? Here is shown that AVX-512 can be used in 32-bit code.

@Torinde
Contributor Author

Torinde commented Jul 22, 2022

Some 32-bit only CPUs with relatively low performance (thus realistic to emulate), but supporting advanced instruction sets:

  • Intel A100 600MHz : SSE2, maybe NX-bit
  • Pentium M ULV 733J 1.1GHz : SSE2, NX-bit
  • Core Solo U1300 1.07GHz : SSE3, NX-bit, VT-x, Constant TSC
  • VIA C7 1GHz : SSE3, NX-bit
  • Atom N270 1.6GHz : SSSE3, NX-bit, MOVBE, Constant TSC

From AMD the slowest I found:

  • Sempron 2600+ 1.6GHz Palermo S754 SDA2600AIO2BA : 3DNow!+, SSE2, NX-bit
  • Sempron 2600+ 1.6GHz Palermo S754 SDA2600AIO2BO : 3DNow!+, SSE3, NX-bit

Along with PAE/PSE, these would also bring closer the possibility of running Win8/10 and Win7 with the May 2018 update KB4103718 as guests (i.e. all 32-bit Windows versions)

Also, if that helps: most of the above are derivatives of the already supported P6 (Pentium M, A100, Core Solo), just with:

  • a higher top frequency, although the models listed above are clocked well below the top Pentium III.
  • more cache and instructions

PCBox/PCBox/issues/41

@Torinde
Copy link
Contributor Author

Torinde commented Nov 27, 2022

For the NEC Vxx emulation: it seems 86box 3.11 adds support for its 8080 mode. I'm not sure whether that's already part of the DOSBox-X PC-98 mode.

@Torinde
Contributor Author

Torinde commented Feb 8, 2023

@finalpatch, as I see you're familiar with FPU emulation, would you be willing to make a PR for 3DNow!+?
If that helps: 86box recently added the missing 3DNow!+ instructions (and now supports all except the two "Pro" instructions from Geode), and there is also a QEMU 3dnow.diff file link above.

@Torinde
Copy link
Contributor Author

Torinde commented Aug 14, 2023

@qeeg, JHRobotics/simd95 is a "Simple hack for enabling SSE/AVX instructions on DOS and Windows 95/98"

@Torinde Torinde mentioned this issue Dec 17, 2023
@Torinde
Contributor Author

Torinde commented Mar 10, 2024

FEX-Emu (x86 emulator that can be used by Wine) supports all 3DNow!, including Extended and the Geode specific instructions.
