Xbox 360/Hardware information/Xenos (GPU)

From Data Crystal
Latest revision as of 20:41, 31 July 2024

A repository of information about Xenos that is relevant to ROM hacking.

Basic Specifications

  • ATI Xenos @ 500 MHz
    • World's first unified shading GPU
    • Precursor to the TeraScale uarch
    • 3 "SIMD cores" (predecessor to CUs)
    • 8 ROPs
    • 16 TMUs
    • 240 shaders
    • Direct3D 9.0c feature set, with some features from later versions (mainly Direct3D 10)
    • MEMEXPORT allows for GPGPU compute
    • Connected to 10MB of eDRAM framebuffer
      • The eDRAM has extremely high bandwidth and processing-in-memory (PIM) logic, which gives free MSAA at 720p and low-cost MSAA at 1080p

Note: if MEMEXPORT data is read back on the CPU, you must either enable both d3d12_readback_memexport and d3d12_readback_resolve in Xenia's options or run on real hardware.

ucode Programming

Xenos, like other Xbox console GPUs, can ingest and execute ucode directly, without the driver having to compile it first. This lets programmers squeeze out more performance by relying on the fixed hardware spec.

To do: Fix this. Xenos ucodes use the 10 MSBs of the word to select the instruction, so hex values are not ideal for grouping them.
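
To illustrate the note above: a 10-bit selector covers the whole first byte plus the top two bits of the second byte, so two words with the same selector can still differ in their second hex byte. The sketch below is only an illustration. It assumes a 32-bit, big-endian word, which this page does not confirm, and opcode_field and opcode_from_bytes are hypothetical helper names.

```python
import struct

WORD_BITS = 32    # assumption: 32-bit word; not confirmed by this page
OPCODE_BITS = 10  # from the note above: the 10 MSBs select the instruction


def opcode_field(word: int) -> int:
    """Return the top OPCODE_BITS bits of a WORD_BITS-wide word."""
    return (word >> (WORD_BITS - OPCODE_BITS)) & ((1 << OPCODE_BITS) - 1)


def opcode_from_bytes(raw: bytes) -> int:
    """Read one big-endian word (assumed byte order) and extract its selector."""
    (word,) = struct.unpack(">I", raw[:4])
    return opcode_field(word)


# These two words differ in their second byte (0x40 vs 0x7F) but share the
# same 10-bit selector, so a byte-wise hex pattern would not group them.
a = opcode_from_bytes(bytes([0x10, 0x40, 0x00, 0x00]))
b = opcode_from_bytes(bytes([0x10, 0x7F, 0xFF, 0xFF]))
assert a == b
```

Grouping instructions by the extracted selector value, rather than by hex byte patterns, would avoid the problem the note describes.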
Xenos ucode

Instruction | Hex       | Description
exec        | (Unknown) | Start executing code
cnop?       | (Unknown) | NOP, end of program
add         | (Unknown) | Adds op2 and op3 and stores the result in op1
cndeq       | (Unknown) | If op2 = 0.0f, op1 = op3, else op1 = op4
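
The add and cndeq descriptions above can be modeled as plain functions. This is a minimal sketch of the described semantics only, not of the hardware encoding. It works on single Python floats and returns the value that would be written to op1; the function names simply mirror the mnemonics.

```python
def add(op2: float, op3: float) -> float:
    """add: the value stored in op1 is op2 + op3."""
    return op2 + op3


def cndeq(op2: float, op3: float, op4: float) -> float:
    """cndeq: the value stored in op1 is op3 if op2 equals 0.0, else op4."""
    # Python treats -0.0 == 0.0 as True; whether the hardware does the same
    # is an assumption of this sketch.
    return op3 if op2 == 0.0 else op4


print(add(1.5, 2.0))
print(cndeq(0.0, 10.0, 20.0))
print(cndeq(0.5, 10.0, 20.0))
```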


Texture Formats

Xenos Supported Texture Formats

  • Uncompressed
    • Description: Bitmap
    • Xenos specific: No
    • Other: Not recommended due to high VRAM use
  • DXT1/BC1
    • Description: 4x4 pixels packed into 64 bits
      • c0: 16-bit color (RRRRRGGGGGGBBBBB)
      • c1: 16-bit color (RRRRRGGGGGGBBBBB)
      • pixelIndices[16]: 2-bit indices, c0 = 00, c1 = 01, c2 = 10, c3 = 11
      • c0 > c1: c2 and c3 linearly interpolate between c0 and c1
      • c0 <= c1: c2 is the midpoint between c0 and c1, and c3 represents full transparency instead of an interpolated color
    • Xenos specific: No
  • DXT2/BC2
    • Description: 4x4 pixels packed into 128 bits
      • pixelAlphaTable[16]: 4-bit alpha data
      • c0: 16-bit color (RRRRRGGGGGGBBBBB)
      • c1: 16-bit color (RRRRRGGGGGGBBBBB)
      • pixelIndices[16]: 2-bit indices, c0 = 00, c1 = 01, c2 = 10, c3 = 11
      • c2 and c3 linearly interpolate between c0 and c1. Data is considered pre-multiplied by alpha
    • Xenos specific: No
  • DXT3/BC2
    • Description: 4x4 pixels packed into 128 bits
      • pixelAlphaTable[16]: 4-bit alpha data
      • c0: 16-bit color (RRRRRGGGGGGBBBBB)
      • c1: 16-bit color (RRRRRGGGGGGBBBBB)
      • pixelIndices[16]: 2-bit indices, c0 = 00, c1 = 01, c2 = 10, c3 = 11
      • c2 and c3 linearly interpolate between c0 and c1. Data is considered NOT pre-multiplied by alpha
    • Xenos specific: No
  • DXT3A
    • Description: 4x4 pixels packed into 64 bits
      • pixelScalars[16]: 4-bit scalar value
    • Xenos specific: Yes
  • DXT3A as 1111
    • Description: Same as DXT3A, but each bit of the 4-bit scalar is a mask for the respective channel
    • Xenos specific: Yes
  • DXT4/BC3
    • Description: 4x4 pixels packed into 128 bits
      • a0: 8-bit alpha value
      • a1: 8-bit alpha value
      • pixelAlphaIndices[16]: 3-bit index
      • c0: 16-bit color (RRRRRGGGGGGBBBBB)
      • c1: 16-bit color (RRRRRGGGGGGBBBBB)
      • pixelIndices[16]: 2-bit indices
      • If a0 > a1, a2-a7 are all linearly interpolated; otherwise only a2-a5 are linearly interpolated, while a6 and a7 are 0 and 255 respectively. Data is considered pre-multiplied by alpha
    • Xenos specific: No
  • DXT5/BC3
    • Description: 4x4 pixels packed into 128 bits
      • a0: 8-bit alpha value
      • a1: 8-bit alpha value
      • pixelAlphaIndices[16]: 3-bit index
      • c0: 16-bit color (RRRRRGGGGGGBBBBB)
      • c1: 16-bit color (RRRRRGGGGGGBBBBB)
      • pixelIndices[16]: 2-bit indices
      • If a0 > a1, a2-a7 are all linearly interpolated; otherwise only a2-a5 are linearly interpolated, while a6 and a7 are 0 and 255 respectively. Data is considered NOT pre-multiplied by alpha
    • Xenos specific: No
  • DXT5A
    • Description: 4x4 pixels packed into 64 bits
      • s0: 8-bit scalar
      • s1: 8-bit scalar
      • pixelScalarIndices[16]: 3-bit indices
      • Scalars are linearly interpolated
    • Xenos specific: Yes
  • DXN
    • Description: 4x4 pixels packed into 128 bits
      • Same as DXT5A but with two channels of scalar
    • Xenos specific: Yes
  • CTX1
    • Description: 4x4 pixels packed into 64 bits
      • s0-s3: four 8-bit scalars (two endpoints for each of the two channels)
      • pixelScalarIndices[16]: 2-bit indices, shared between both channels
    • Xenos specific: Yes
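
The DXT1/BC1 rules listed above can be sketched as a small block decoder. This is a minimal sketch, not a reference decoder. It assumes the standard little-endian BC1 byte layout (c0, c1, then 32 bits of indices with pixel 0 in the lowest bits) and plain integer division for interpolation, while real decoders may round differently. Data dumped from the console may need a byte swap first, which is not handled here. The names rgb565_to_rgb888 and decode_dxt1_block are hypothetical.

```python
import struct


def rgb565_to_rgb888(c: int) -> tuple:
    """Expand a 16-bit RRRRRGGGGGGBBBBB color to 8 bits per channel."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_block(block: bytes) -> list:
    """Decode one 8-byte DXT1 block into 16 (r, g, b, a) tuples, row by row."""
    # Assumed layout: little-endian c0, c1, then 16 two-bit indices.
    c0, c1, indices = struct.unpack("<HHI", block)
    p0 = rgb565_to_rgb888(c0)
    p1 = rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: c2 and c3 sit at 1/3 and 2/3 between c0 and c1.
        p2 = tuple((2 * x + y) // 3 for x, y in zip(p0, p1))
        p3 = tuple((x + 2 * y) // 3 for x, y in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-color mode: c2 is the midpoint, c3 is fully transparent.
        p2 = tuple((x + y) // 2 for x, y in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]


# A block whose 16 indices are all 0 decodes to 16 copies of c0 (pure red here).
pixels = decode_dxt1_block(struct.pack("<HHI", 0xF800, 0x001F, 0))
assert pixels == [(255, 0, 0, 255)] * 16
```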

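The alpha rule given for DXT4/DXT5 (six interpolated values when a0 > a1, otherwise four interpolated values plus 0 and 255) can be sketched as follows. This sketch assumes the standard BC3 alpha layout: a0, a1, then 48 bits of 3-bit indices, little-endian, with pixel 0 in the lowest bits. It also assumes that DXT5A scalars follow the same rule, which this page does not state explicitly. Integer division stands in for whatever rounding the hardware uses, and alpha_palette and decode_alpha_block are hypothetical names.

```python
def alpha_palette(a0: int, a1: int) -> list:
    """Build the 8-entry table [a0, a1, a2, ..., a7] selected by 3-bit indices."""
    if a0 > a1:
        # a2-a7: six values interpolated between a0 and a1.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # a2-a5: four interpolated values; a6 and a7 are fixed at 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]


def decode_alpha_block(block: bytes) -> list:
    """Decode an 8-byte alpha block into 16 values in the range 0-255."""
    a0, a1 = block[0], block[1]
    bits = int.from_bytes(block[2:8], "little")  # 16 indices x 3 bits = 48 bits
    table = alpha_palette(a0, a1)
    return [table[(bits >> (3 * i)) & 0x7] for i in range(16)]


# With a0 <= a1, index 6 selects 0 and index 7 selects 255.
assert alpha_palette(0, 255)[6:] == [0, 255]
```

For a full DXT4/DXT5 block, the first 8 bytes would go through decode_alpha_block and the remaining 8 bytes hold the color data.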
Useful Links

To do: Organize these better

https://www.techpowerup.com/gpu-specs/xbox-360-gpu-90nm.c1919

https://learn.microsoft.com/en-us/windows/win32/direct3d9/dx9-graphics-reference

https://learn.microsoft.com/en-us/windows/win32/direct3d9/dx9-graphics-programming-guide

https://learn.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-reference

https://learn.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-programming-guide

https://xenia.jp/updates/2021/04/27/leaving-no-pixel-behind-new-render-target-cache-3x3-resolution-scaling.html

https://en.wikipedia.org/wiki/Unified_shader_model

https://fileadmin.cs.lth.se/cs/Personal/Michael_Doggett/talks/unc-xenos-doggett.pdf

http://www.students.science.uu.nl/~3220516/advancedgraphics/papers/inferred_lighting.pdf

https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/R600_Instruction_Set_Architecture.pdf

https://en.wikipedia.org/wiki/S3_Texture_Compression

http://web.archive.org/web/20100423054747/http://msdn.microsoft.com:80/en-us/library/bb313877.aspx