Shaders

Tuesday, 7th August 2007

Following sirob's prompting, I dropped the BasicEffect for rendering and rolled my own effect. After seeing the things that could be done with them (pixel and vertex shaders) I'd assumed they'd be hard to put together, and that I'd need to change my code significantly.

In reality all I've had to do is copy and paste the sample from the SDK documentation, load it into the engine (via the content pipeline), create a custom vertex declaration to handle two sets of texture coordinates (diffuse and lightmap) and strip out all of the duplicate code I had for creating and rendering from two vertex arrays.

[StructLayout(LayoutKind.Sequential)]
public struct VertexPositionTextureDiffuseLightMap {

	public Xna.Vector3 Position;
	public Xna.Vector2 DiffuseTextureCoordinate;
	public Xna.Vector2 LightMapTextureCoordinate;

	public VertexPositionTextureDiffuseLightMap(Xna.Vector3 position, Xna.Vector2 diffuse, Xna.Vector2 lightMap) {
		this.Position = position;
		this.DiffuseTextureCoordinate = diffuse;
		this.LightMapTextureCoordinate = lightMap;
	}

	public readonly static VertexElement[] VertexElements = new VertexElement[]{
		new VertexElement(0, 0, VertexElementFormat.Vector3, VertexElementMethod.Default, VertexElementUsage.Position, 0),
		new VertexElement(0, 12, VertexElementFormat.Vector2, VertexElementMethod.Default, VertexElementUsage.TextureCoordinate, 0),
		new VertexElement(0, 20, VertexElementFormat.Vector2, VertexElementMethod.Default, VertexElementUsage.TextureCoordinate, 1)
	};

}

uniform extern float4x4 WorldViewProj : WORLDVIEWPROJECTION;

uniform extern texture DiffuseTexture;
uniform extern texture LightMapTexture;

uniform extern float Time;

struct VS_OUTPUT {
    float4 Position : POSITION;
    float2 DiffuseTextureCoordinate : TEXCOORD0;
    float2 LightMapTextureCoordinate : TEXCOORD1;
};

sampler DiffuseTextureSampler = sampler_state {
    Texture = <DiffuseTexture>;
    mipfilter = LINEAR;
};

sampler LightMapTextureSampler = sampler_state {
	Texture = <LightMapTexture>;
	mipfilter = LINEAR;
};

VS_OUTPUT Transform(float4 Position : POSITION, float2 DiffuseTextureCoordinate : TEXCOORD0, float2 LightMapTextureCoordinate : TEXCOORD1) {
    
    VS_OUTPUT Out = (VS_OUTPUT)0;

    Out.Position = mul(Position, WorldViewProj);
    Out.DiffuseTextureCoordinate = DiffuseTextureCoordinate;
    Out.LightMapTextureCoordinate = LightMapTextureCoordinate;

    return Out;
}

float4 ApplyTexture(VS_OUTPUT vsout) : COLOR {
	float4 DiffuseColour = tex2D(DiffuseTextureSampler, vsout.DiffuseTextureCoordinate).rgba;
	float4 LightMapColour = tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).rgba;
    return DiffuseColour * LightMapColour;
}

technique TransformAndTexture {
    pass P0 {
        vertexShader = compile vs_2_0 Transform();
        pixelShader  = compile ps_2_0 ApplyTexture();
    }
}

Of course, now I have that up and running I might as well have a play with it...

By adding up and dividing the individual RGB components of the lightmap texture by three you can simulate the monochromatic lightmaps used by Quake 2's software renderer. Sadly I know not of a technique to go the other way and provide colourful lightmaps for Quake 1. Not very interesting, though.

I've always wanted to do something with pixel shaders as you get to play with tricks that are a given in software rendering with the speed of dedicated hardware acceleration. I get the feeling that the effect (or a variation of it, at least) will be handy for watery textures.

float4 ApplyTexture(VS_OUTPUT vsout) : COLOR {
	
	float2 RippledTexture = vsout.DiffuseTextureCoordinate;
	
	RippledTexture.x += sin(vsout.DiffuseTextureCoordinate.y * 16 + Time) / 16;
	RippledTexture.y += sin(vsout.DiffuseTextureCoordinate.x * 16 + Time) / 16;
	
	float4 DiffuseColour = tex2D(DiffuseTextureSampler, RippledTexture).rgba;
	float4 LightMapColour = tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).rgba;
    
	return DiffuseColour * LightMapColour;
	
}

My code is no doubt suboptimal (and downright stupid).

Naturally, I needed to try and duplicate Scet's software rendering simulation trick.

The colour map (gfx/colormap.lmp) is a 256×64 array of bytes. Each byte is an index to a colour palette entry, on the X axis is the colour and on the Y axis is the brightness: ie, RGBColour = Palette[ColourMap[DiffuseColour, Brightness]]. I cram the original diffuse colour palette index into the (unused) alpha channel of the ARGB texture, and leave the lightmaps untouched.

float2 LookUp = 0;
LookUp.x = tex2D(DiffuseTextureSampler, vsout.DiffuseTextureCoordinate).a;
LookUp.y = (1 - tex2D(LightMapTextureSampler, vsout.LightMapTextureCoordinate).r) / 4;
return tex2D(ColourMapTextureSampler, LookUp);

As I'm not loading the mip-maps (and am letting Direct3D handle generation of mip-maps for me) I have to disable mip-mapping for the above to work, as otherwise you'd end up with non-integral palette indices. The results are therefore a bit noisier in the distance than in vanilla Quake, but I like the 8-bit palette look. At least the fullbright colours work.

Comments

Quake XNA XNA Quake

Less Colourful Quake 2

Monday, 6th August 2007

I've transferred the BSP rendering code to use the new level loading code, so I can now display correctly-coloured Quake 2 levels. The Quake stuff is in its own assembly, and is shared by the WinForms resource browser project and the XNA renderer.

I'm also now applying lightmaps via multiplication rather than addition, so they look significantly better.

A shader solution would be optimal. I'm currently just drawing the geometry twice, the second time with some alpha blending enabled.

Comments

Quake XNA XNA Quake

Keyboard Handler Fix

Friday, 3rd August 2007

ArchG indicated a bug in the TextInputHandler class I posted a while back - no reference to the delegate instance used for the unmanaged callback is held, so as soon as the garbage collector kicks in things go rather horribly wrong.

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * XnaTextInput.TextInputHandler - benryves@benryves.com                                     *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * This is quick and very, VERY dirty.                                                       *
 * It uses Win32 message hooks to grab messages (as we don't get a nicely wrapped WndProc).  *
 * I couldn't get WH_KEYBOARD to work (accessing the data via its pointer resulted in access *
 * violation exceptions), nor could I get WH_CALLWNDPROC to work.                            *
 * Maybe someone who actually knows what they're  doing can work something out that's not so *
 * kludgy.                                                                                   *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * This quite obviously relies on a Win32 nastiness, so this is for Windows XNA games only!  *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

#region Using Statements
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms; // This class exposes WinForms-style key events.
#endregion

namespace XnaTextInput {

	/// <summary>
	/// A class to provide text input capabilities to an XNA application via Win32 hooks.
	/// </summary>
	class TextInputHandler : IDisposable {

		#region Win32

		/// <summary>
		/// Types of hook that can be installed using the SetWindwsHookEx function.
		/// </summary>
		public enum HookId {
			WH_CALLWNDPROC = 4,
			WH_CALLWNDPROCRET = 12,
			WH_CBT = 5,
			WH_DEBUG = 9,
			WH_FOREGROUNDIDLE = 11,
			WH_GETMESSAGE = 3,
			WH_HARDWARE = 8,
			WH_JOURNALPLAYBACK = 1,
			WH_JOURNALRECORD = 0,
			WH_KEYBOARD = 2,
			WH_KEYBOARD_LL = 13,
			WH_MAX = 11,
			WH_MAXHOOK = WH_MAX,
			WH_MIN = -1,
			WH_MINHOOK = WH_MIN,
			WH_MOUSE_LL = 14,
			WH_MSGFILTER = -1,
			WH_SHELL = 10,
			WH_SYSMSGFILTER = 6,
		};

		/// <summary>
		/// Window message types.
		/// </summary>
		/// <remarks>Heavily abridged, naturally.</remarks>
		public enum WindowMessage {
			WM_KEYDOWN = 0x100,
			WM_KEYUP = 0x101,
			WM_CHAR = 0x102,
		};

		/// <summary>
		/// A delegate used to create a hook callback.
		/// </summary>
		public delegate int GetMsgProc(int nCode, int wParam, ref Message msg);

		/// <summary>
		/// Install an application-defined hook procedure into a hook chain.
		/// </summary>
		/// <param name="idHook">Specifies the type of hook procedure to be installed.</param>
		/// <param name="lpfn">Pointer to the hook procedure.</param>
		/// <param name="hmod">Handle to the DLL containing the hook procedure pointed to by the lpfn parameter.</param>
		/// <param name="dwThreadId">Specifies the identifier of the thread with which the hook procedure is to be associated.</param>
		/// <returns>If the function succeeds, the return value is the handle to the hook procedure. Otherwise returns 0.</returns>
		[DllImport("user32.dll", EntryPoint = "SetWindowsHookExA")]
		public static extern IntPtr SetWindowsHookEx(HookId idHook, GetMsgProc lpfn, IntPtr hmod, int dwThreadId);

		/// <summary>
		/// Removes a hook procedure installed in a hook chain by the SetWindowsHookEx function. 
		/// </summary>
		/// <param name="hHook">Handle to the hook to be removed. This parameter is a hook handle obtained by a previous call to SetWindowsHookEx.</param>
		/// <returns>If the function fails, the return value is zero. To get extended error information, call GetLastError.</returns>
		[DllImport("user32.dll")]
		public static extern int UnhookWindowsHookEx(IntPtr hHook);

		/// <summary>
		/// Passes the hook information to the next hook procedure in the current hook chain.
		/// </summary>
		/// <param name="hHook">Ignored.</param>
		/// <param name="ncode">Specifies the hook code passed to the current hook procedure.</param>
		/// <param name="wParam">Specifies the wParam value passed to the current hook procedure.</param>
		/// <param name="lParam">Specifies the lParam value passed to the current hook procedure.</param>
		/// <returns>This value is returned by the next hook procedure in the chain.</returns>
		[DllImport("user32.dll")]
		public static extern int CallNextHookEx(int hHook, int ncode, int wParam, ref Message lParam);

		/// <summary>
		/// Translates virtual-key messages into character messages.
		/// </summary>
		/// <param name="lpMsg">Pointer to an Message structure that contains message information retrieved from the calling thread's message queue.</param>
		/// <returns>If the message is translated (that is, a character message is posted to the thread's message queue), the return value is true.</returns>
		[DllImport("user32.dll")]
		public static extern bool TranslateMessage(ref Message lpMsg);


		/// <summary>
		/// Retrieves the thread identifier of the calling thread.
		/// </summary>
		/// <returns>The thread identifier of the calling thread.</returns>
		[DllImport("kernel32.dll")]
		public static extern int GetCurrentThreadId();

		#endregion

		#region Hook management and class construction.

		/// <summary>Handle for the created hook.</summary>
		private readonly IntPtr HookHandle;

		private readonly GetMsgProc ProcessMessagesCallback;

		/// <summary>Create an instance of the TextInputHandler.</summary>
		/// <param name="whnd">Handle of the window you wish to receive messages (and thus keyboard input) from.</param>
		public TextInputHandler(IntPtr whnd) {
			// Create the delegate callback:
			this.ProcessMessagesCallback = new GetMsgProc(ProcessMessages);
			// Create the keyboard hook:
			this.HookHandle = SetWindowsHookEx(HookId.WH_GETMESSAGE, this.ProcessMessagesCallback, IntPtr.Zero, GetCurrentThreadId());
		}

		public void Dispose() {
			// Remove the hook.
			if (this.HookHandle != IntPtr.Zero) UnhookWindowsHookEx(this.HookHandle);
		}

		#endregion

		#region Message processing

		private int ProcessMessages(int nCode, int wParam, ref Message msg) {
			// Check if we must process this message (and whether it has been retrieved via GetMessage):
			if (nCode == 0 && wParam == 1) {

					// We need character input, so use TranslateMessage to generate WM_CHAR messages.
					TranslateMessage(ref msg);

					// If it's one of the keyboard-related messages, raise an event for it:
					switch ((WindowMessage)msg.Msg) {
						case WindowMessage.WM_CHAR:
							this.OnKeyPress(new KeyPressEventArgs((char)msg.WParam));
							break;
						case WindowMessage.WM_KEYDOWN:
							this.OnKeyDown(new KeyEventArgs((Keys)msg.WParam));
							break;
						case WindowMessage.WM_KEYUP:
							this.OnKeyUp(new KeyEventArgs((Keys)msg.WParam));
							break;
					}

			}

			// Call next hook in chain:
			return CallNextHookEx(0, nCode, wParam, ref msg);
		}

		#endregion

		#region Events

		public event KeyEventHandler KeyUp;
		protected virtual void OnKeyUp(KeyEventArgs e) {
			if (this.KeyUp != null) this.KeyUp(this, e);
		}

		public event KeyEventHandler KeyDown;
		protected virtual void OnKeyDown(KeyEventArgs e) {
			if (this.KeyDown != null) this.KeyDown(this, e);
		}

		public event KeyPressEventHandler KeyPress;
		protected virtual void OnKeyPress(KeyPressEventArgs e) {
			if (this.KeyPress != null) this.KeyPress(this, e);
		}

		#endregion
	}
}

I wrote a crude ZSoft PCX loader (only handles 8-bit per plane, single-plane images, which is sufficient for Quake 2).

Using this loader I found colormap.pcx, which appears to perform the job of palette and colour map for Quake II.

.wal files now open with the correct palette. I've also copied over most of the BSP loading code, but it needs a good going-over to make it slightly more sane (especially where the hacks for Quake II support have been added).

Comments

C# Quake XNA XNA Quake XnaTextInput

Loader Change

Thursday, 2nd August 2007

I've started rewriting the underlying resource loading code to better handle multiple versions of the game.

To help with this I'm writing a WinForms-based resource browser.

(That's the only real Quake-related change visible in the above screenshot. I've written a cinematic (.cin, used in Quake 2) loader).

To aid loading resources I've added a number of new generic types. For example, the Picture class always represents a 32-bit per pixel ARGB 2D picture. The decoders for various formats will always have access to the resource manager, so they can request palette information if they need it. To further aid issues, there are some handy interfaces that a specific format class can implement - for example, a class (such as WallTexture for handling .wal files) implementing IPictureLoader will always have a GetPicture() method.

The loader classes are also given attributes specifying which file extensions are handled. (This project uses quite a bit of reflection now). The only issue I can see with this are files that use the same extension but have different types, such as the range of .lmp files.

In addition, certain single files within the packages have multiple sub-files (for example, the .wad files in Quake). I'm not sure how I'll handle this, but I'm currently thinking of having the .wad loader implement IPackage so you could access files via gfx/somewad.wad/somefileinthewad, but some files don't have names or extensions.

Comments

Quake XNA XNA Quake

Quake 2 and Emulation

Wednesday, 1st August 2007

The current design of the Quake project is that there are a bunch of classes in the Data namespace that are used to decode Quake's structures in a fairly brain-dead manner. To do anything useful with it you need to build up your own structures suitable for the way you intend on rendering the level.

The problem comes in when you try to load resources from different versions of Quake. Quake 1 and Quake 2 have quite a few differences. One major one is that every BSP level in Quake contains its own mip textures. You can call a method in the BSP class which returns sane texture coordinates as it can inspect the texture dimensions inside itself. Quake 2 stores all of the textures externally in .wal resources - the BSP class can no longer calculate texture coordinates as it can't work out how large the textures are as it can't see outside itself.

I guess the only sane way to work this out is to hide the native types from the end user and wrap everything up, but I've never liked this much as you might neglect to wrap up something that someone else would find very important, or you do something that is unsuitable for the way they really wanted to work.

Anyhow. I've hacked around the BSP loader to within an inch of its life and it seems to be (sort of) loading Quake 2 levels for brute-force rendering. Quake 2 boasts truecolour lightmaps, improving the image quality quite significantly!

The truecolour lightmaps show off the Strogg disco lighting to its best effect. One of the problems with the Quake II BSP file format is that the indexing of lumps inside the file has changed. Not good.

That's a bit better. Quake II's lightmaps tend to stick to the red/brown/yellow end of the spectrum, but that is a truecolour set of lightmaps in action!

The lightmaps tend to look a bit grubby where they don't line up between faces. Some trick to join all lightmaps for a plane together into a single texture should do the trick, and reduce the overhead of having to load thousands of tiny textures (which I'm guessing have to be scaled up to a power-of-two). I'll have to look into it.

On to .wal (wall texture) loading - and I can't find a palette anywhere inside the Quake II pack files. I did find a .act (Photoshop palette) that claimed to be for Quake II, but it doesn't quite seem to match. It's probably made up of the right colours, but not in the right order.

Fortunately I have some PAK files with replacement JPEG textures inside them and can load those instead for the moment.

The brightness looks strange due to the bad way I apply the lightmaps - some kludgy forced two-pass affair with alpha blending modes set to something that sort of adds the two textures together in a not-very-convincing manner.

Can anyone recommend a good introduction to shaders for XNA? I'm not really trying to do anything that exciting.

This is a really bad and vague overview of the emulation technique I use in Cogwheel, so I apologise in advance. Emulation itself is very simple when done in the following manner - all you really need is a half-decent knowledge of how the computer you're emulating works at the assembly level. The following is rather Z80-specific.

At the heart of the system is its CPU. This device reads instructions from memory and depending on the value it reads it performs a variety of different actions. It has a small amount of memory inside itself which it uses to store its registers, variables used during execution. For example, the PC register is used as a pointer to the next instruction to fetch and execute from memory, and the SP register points at the top of the stack.

It can interact with the rest of the system in three main ways:

Read/Write Memory
Input/Output Hardware
Interrupt Request

I assume you're familiar with memory. The hardware I refer to are peripheral devices such as video display processors, keypads, sound generators and so on. Data is written to and read from these devices on request. What the hardware device does with that data is up to it. I'll ignore interrupt requests for the moment.

The CPU at an electronic level communicates with memory and hardware using two buses and a handful of control pins. The two buses are the address bus and data bus. The address bus is read-only (when viewed from outside the CPU) and is used to specify a memory address or a hardware port number. It is 16 bits wide, meaning that 64KB memory can be addressed. Due to the design, only the lower 8-bits are normally used for hardware addressing, giving you up to 256 different hardware devices.

The data bus is 8-bits wide (making the Z80 an "8-bit" CPU). It can be read from or written to, depending on the current instruction.

The exact function of these buses - whether you're addressing memory or a hardware device, or whether you're reading or writing - is relayed to the external hardware via some control pins on the CPU itself. The emulator author doesn't really need to emulate these. Rather, we can do something like this:

class CpuEmulator {

	public virtual void WriteMemory(ushort address, byte value) {
		// Write to memory.
	}
	
	public virtual byte ReadMemory(ushort address) {
		// Read from memory.
		return 0x00;
	}

	public virtual void WriteHardware(ushort address, byte value) {
		// Write to hardware.
	}
	
	public virtual byte ReadHardware(ushort address) {
		// Read from hardware.
		return 0x00;
	}

}

A computer with a fixed 64KB RAM, keyboard on hardware port 0 and console (for text output) on port 1 might look like this:

class SomeBadComputer : CpuEmulator {

	private byte[] AllMemory = new byte[64 * 1024];

	public override void WriteMemory(ushort address, byte value) {
		AllMemory[address] = value;
	}
	
	public override byte ReadMemory(ushort address) {
		return AllMemory[address];
	}

	public override void WriteHardware(ushort address, byte value) {
		switch (address & 0xFF) {
			case 1:
				Console.Write((char)value);
				break;
		}
	}
	
	public override byte ReadHardware(ushort address) {
		switch (address & 0xFF) {
			case 0:
				return (byte)Console.ReadKey();
			default:
				return 0x00;
		}
	}

}

This is all very well, but how does the CPU actually do anything worthwhile?

It needs to read instructions from memory, decode them, and act on them. Suppose our CPU had two registers - 16-bit PC (program counter) and 8-bit A (accumulator) and this instruction set:

00nn   : Load 'nn' into accumulator.
01nn   : Output accumulator to port N.
02nn   : Input to accumulator from port N.
03nnnn : Read from memory address nnnn to accumulator.
04nnnn : Write accumulator to memory address nnnn.
05nnnn : Jump to address nnnn.

Extending the above CpuEmulator class, we could get something like this:

partial class CpuEmulator {

	public ushort RegPC = 0;
	public byte RegA = 0;
	
	private int CyclesPending = 0;
	
	public void FetchExecute() {
		switch (ReadMemory(RegPC++)) {
			case 0x00:
				RegA = ReadMemory(RegPC++);
				CyclesPending += 8;
				break;
			case 0x01:
				WriteHardware(ReadMemory(RegPC++), RegA);
				CyclesPending += 8;
				break;
			case 0x02:
				RegA = ReadHardware(ReadMemory(RegPC++));
				CyclesPending += 16;
				break;
			case 0x03:
				RegA = ReadMemory((ushort)(ReadMemory(RegPC++) + ReadMemory(RegPC++) * 256));
				CyclesPending += 16;
				break;
			case 0x04:
				WriteMemory((ushort)(ReadMemory(RegPC++) + ReadMemory(RegPC++) * 256), RegA);
				CyclesPending += 24;
				break;
			case 0x05:
				RegPC = (ushort)(ReadMemory(RegPC++) + ReadMemory(RegPC++) * 256);
				CyclesPending += 24;
				break;
			default:
				// NOP
				CyclesPending += 4;
				break;
		}
	}

}

The CyclesPending variable is used for timing. Instructions take a variable length of time to run (depending on complexity, length of opcode, whether it needs to access memory and so on). This time is typically measured in the number of clock cycles taken for the CPU to execute the instruction.

Using the above CyclesPending += x style one can write a function that will execute a particular number of cycles:

partial class CpuEmulator {

	public void Tick(int cycles) {
		CyclesPending -= cycles;
		while (CyclesPending < 0) FetchExecute();
	}

}

For some truly terrifying code, an oldish version of Cogwheel's instruction decoding switch block. That code has been automatically generated from a text file, I didn't hand-type it all.

Um... that's pretty much all there is. The rest is reading datasheets! Your CPU would need to execute most (if not all) instructions correctly, updating its internal state (and registers) as the hardware would. The non-CPU hardware (video processor, sound processor, controllers and so on) would also need to conform to data reads and writes correctly.

As far as timing goes, various bits of hardware need to run at their own pace. One scanline (of the video processor) is a good value for the Master System. Cogwheel provides this method to run the emulator for a single frame:

public void RunFrame() {
	this.VDP.RunFramePending = false;
	while (!this.VDP.RunFramePending) {
		this.VDP.RasteriseLine();
		this.FetchExecute(228);
	}
}

In the Master System's case, one scanline is displayed every 228 clock cycles. Some programs update the VDP on every scanline (eg changing the background horizontal scroll offset to skew the image in a driving game).

The above is embarrassingly vague, so if anyone is interested enough to want clarification on anything I'd be happy to give it.

Comments

C# Emulation Quake XNA XNA Quake

Page 33 of 54 ← 1 … 29 30 31 32 33 34 35 36 37 … 54 →