DirectDraw Full CPU Usage

9 comments, last by ivan0 1 year, 10 months ago

I've downloaded the DirectX 2.0, 7.0, and 9.0 SDKs looking for a windowed rotating-triangle example. The 7.0 SDK does include one, but it pegs the CPU at 100%, which is unacceptable. The 9.0 SDK includes both managed and unmanaged versions, and even its more complex examples don't raise CPU usage noticeably.

In the end, I'm just looking for a non-fullscreen DirectX 7.0 rotating triangle in C that runs on Windows 95. From there I think I can handle more modern APIs such as 9.0 (which looks very easy to use), WebGL, etc.

In fact, I tried every code sample in versions 2.0 through 7.0, and all of them use the CPU at 100%. Most if not all are VC++ 4.0 projects (the 2.0 samples are just Nmake makefiles).

I really appreciate your help and understanding.


I'm very curious: why the dependence on Windows 95? Why the super-old versions of DirectX from the 90s?

ivan0 said:
this must run on Windows 95, from then on I think I can handle more modern versions such as 9.0 which looks very easy to use

So you're targeting something that Win98 made outdated 25 years ago? And DX9 was tied to Windows XP, extended a few times, and has been dead for about a decade, yet you still want to use it? These are certainly things you can do, but they aren't recommended.

DX12 is already 7 years old, and has seven big sub-versions. DX13 is expected later this year.

You're unlikely to get any support, and even driver support is spotty in emulators, so it's a poor environment to learn in. It may look easy to use, but the knowledge is severely out of date, like learning how to repair a Model T: boutique knowledge from the past, not especially useful today.

ivan0 said:
the problem is that it takes full CPU usage which is unacceptable … full CPU usage … all of them using the CPU at 100%

You never actually asked a question, but I'm assuming you're asking about how to reduce CPU usage.

How are you measuring it? If you're looking at Task Manager, your numbers are extremely rough estimates and often overstated. Process Explorer gives more accurate numbers, but they're still rough estimates, and neither is any use for identifying the actual problem.

Go get PIX for Windows (a free tool from Microsoft used by game developers everywhere) or another actual profiler and measure your code. Sampling profilers can reveal a lot, and instrumented profilers can reveal even more: you can see exactly which functions are taking the time, where the bottlenecks are, and which hotspots your code is hitting.

Anything less than an actual profiler is just a guess, and guesses are quite often wrong. As a first pass you'll likely identify unexpectedly huge nested loops, wasteful processing patterns, redundant expensive system calls, and more. Given your decades-old technology, you'll also find a lot of time spent in emulation layers as they coerce today's hardware into long-discarded patterns it no longer wants to follow. But without actually measuring with an accurate profiler, you can't actually know.

It's an old technology, I know. I always wanted to do Windows 95 graphics development, and DirectX 7.0 was the only thing I had available back in 2002. Yes, by then it was a 7-year-old OS, DirectX 9.0 was coming out, and the Pentium 4 was the hot thing to have, but all I had was a 100 MHz Pentium (Toshiba Satellite 200 CDS). Now retro computing is back in this age of internet and online collaboration. I really appreciate your help and your opinions, but I'm not going to give up on this project. Is it really impossible to use an old API?

As for the windowed-mode problem: according to my research, DirectDraw was made to work in fullscreen only, which is partly why CPU usage is so high. So I'll have to use performance profilers as @frob mentioned, plus a little remote WinDbg over a COM port, just for old times' sake.

One last thing: I wrote this simple render function that does nothing but present a blue screen, and it still swallows the whole CPU.

void render() {
	D3DCOLOR backgroundColor = D3DRGBA(.0f, .2f, .4f, 1.0f);
	_d3dDevice->Clear(0, NULL, D3DCLEAR_TARGET, backgroundColor, 0, 0);
	_frontBuffer->Blt(&_renderRect, _backBuffer, NULL, DDBLT_WAIT, NULL);
}

ivan0 said:
swallows the whole CPU

How did you measure this?

What you can see in a tool like Task Manager is extremely rough and often overstated. You need a proper profiler like PIX to see how busy your program actually is.

The three lines in your example don't provide nearly enough information. You're blitting directly to the front buffer but not saying what else you do, and you don't show any kind of delay or pause, such as a device Present() call, which is the more typical pacing mechanism in games.

Thanks for your reply, @frob. Here is the complete C++ source:

#define STRICT
#define D3D_OVERLOADS
#include <windows.h>
#include <stdlib.h> // exit()
#include <d3d.h>
#pragma comment(lib, "ddraw.lib")
#pragma comment(lib, "dxguid.lib")

#define WIN_WIDTH 200
#define WIN_HEIGHT 200

struct RENDERDIMENSIONS { DWORD dwWidth; DWORD dwHeight; };

LPDIRECTDRAW7 _dd;
LPDIRECTDRAWSURFACE7 _frontBuffer;
LPDIRECTDRAWSURFACE7 _backBuffer;
LPDIRECT3D7 _d3d;
LPDIRECT3DDEVICE7 _d3dDevice;
LPDIRECT3DVERTEXBUFFER7 v_buffer;
RECT _renderRect;
RENDERDIMENSIONS _renderDimensions;
D3DVERTEX _triangleVertices[3];


void isDXError(HRESULT hRet, char *msg) {
	if (FAILED(hRet)) {
		MessageBoxA(0, msg, "DirectX Error", MB_ICONERROR);
		exit(0);
	}
}

void createFrontBuffer() {
	HRESULT hRet;
	DDSURFACEDESC2 ddsd;

	memset(&ddsd, 0, sizeof(ddsd));
	ddsd.dwSize = sizeof(ddsd);
	ddsd.dwFlags = DDSD_CAPS;
	ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE;
	hRet = _dd->CreateSurface(&ddsd, &_frontBuffer, NULL);
	isDXError(hRet, "CreateSurface DDSCAPS_PRIMARYSURFACE");
}

void GetRenderDimensions(HWND hwnd) {
	GetClientRect(hwnd, &_renderRect);
	ClientToScreen(hwnd, (POINT*)&_renderRect.left);
	ClientToScreen(hwnd, (POINT*)&_renderRect.right);
	_renderDimensions.dwWidth = _renderRect.right - _renderRect.left;
	_renderDimensions.dwHeight = _renderRect.bottom - _renderRect.top;
}

void createBackBuffer(HWND hwnd) {
	HRESULT hRet;
	DDSURFACEDESC2 ddsd;

	memset(&ddsd, 0, sizeof(ddsd));
	ddsd.dwSize = sizeof(ddsd);
	ddsd.dwFlags = DDSD_WIDTH | DDSD_HEIGHT | DDSD_CAPS;
	ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN | DDSCAPS_3DDEVICE | DDSCAPS_VIDEOMEMORY;
	ddsd.dwWidth = _renderDimensions.dwWidth;
	ddsd.dwHeight = _renderDimensions.dwHeight;
	hRet = _dd->CreateSurface(&ddsd, &_backBuffer, NULL);
	isDXError(hRet, "DDSCAPS_OFFSCREENPLAIN");
}

void createD3D() {
	HRESULT hRet;
	hRet = _dd->QueryInterface(IID_IDirect3D7, (LPVOID *)&_d3d);
	isDXError(hRet, "QueryInterface _d3d");
}

void createD3DDevice() {
	HRESULT hRet;

	hRet = _d3d->CreateDevice(IID_IDirect3DHALDevice, _backBuffer, &_d3dDevice);
	if (FAILED(hRet)) {
		hRet = _d3d->CreateDevice(IID_IDirect3DRGBDevice, _backBuffer, &_d3dDevice);
		isDXError(hRet, "CreateDevice");
	}
}

void createBuffers(HWND hwnd) {
	createFrontBuffer();
	createBackBuffer(hwnd);
}

void createDirectDraw(HWND hwnd) {
	HRESULT hRet;

	hRet = DirectDrawCreateEx(NULL, (VOID**)&_dd, IID_IDirectDraw7, NULL);
	isDXError(hRet, "DirectDrawCreateEx");

	hRet = _dd->SetCooperativeLevel(hwnd, DDSCL_NORMAL);
	isDXError(hRet, "SetCooperativeLevel");
}

void createClipper(HWND hwnd) {
	HRESULT hRet;
	LPDIRECTDRAWCLIPPER clipper;
	hRet = _dd->CreateClipper(0, &clipper, NULL);
	isDXError(hRet, "CreateClipper");
	clipper->SetHWnd(0, hwnd);
	_frontBuffer->SetClipper(clipper);
	clipper->Release(); // SetClipper takes its own reference
}

void initD3D(HWND hwnd) {
	GetRenderDimensions(hwnd);
	createDirectDraw(hwnd);
	createBuffers(hwnd);
	createClipper(hwnd);
	createD3D();
	createD3DDevice();
}

void render() {
	D3DCOLOR backgroundColor = D3DRGBA(.0f, .2f, .4f, 1.0f);
	_d3dDevice->Clear(0, NULL, D3DCLEAR_TARGET, backgroundColor, 0, 0);
	_frontBuffer->Blt(&_renderRect, _backBuffer, NULL, DDBLT_WAIT, NULL);
}

LRESULT CALLBACK WndProc(
	HWND hwnd,
	UINT msg,
	WPARAM wParam,
	LPARAM lParam) {

	switch (msg) {
	case WM_DESTROY:
		PostQuitMessage(0);
		break;
	}
	return DefWindowProcW(hwnd, msg, wParam, lParam);
}

RECT getWindowCenter() {
	RECT rect;
	GetClientRect(GetDesktopWindow(), &rect);
	rect.left = (rect.right / 2) - (WIN_WIDTH / 2);
	rect.top = (rect.bottom / 2) - (WIN_HEIGHT / 2);
	return rect;
}

int WINAPI WinMain(
	HINSTANCE hInstance,
	HINSTANCE hPrevInstance,
	LPSTR lpCmdLine,
	int nCmdShow) {

	MSG  msg;
	HWND hwnd;
	WNDCLASSW wc;

	RECT rect = getWindowCenter();

	memset(&wc, 0, sizeof(WNDCLASSW));
	memset(&msg, 0, sizeof(MSG));
	wc.lpszClassName = L"myWin";
	wc.hInstance = hInstance;
	wc.hbrBackground = GetSysColorBrush(COLOR_3DFACE);
	wc.hCursor = LoadCursor(NULL, IDC_ARROW);
	wc.lpfnWndProc = WndProc;

	RegisterClassW(&wc);
	hwnd = CreateWindowW(
		wc.lpszClassName,
		L"DirectX 7.0 Triangle",
		WS_OVERLAPPEDWINDOW,
		rect.left,
		rect.top,
		WIN_WIDTH,
		WIN_HEIGHT,
		NULL,
		NULL,
		hInstance,
		NULL);

	ShowWindow(hwnd, nCmdShow);

	initD3D(hwnd);

	while (msg.message != WM_QUIT)
	{
		if (PeekMessage(&msg, NULL, 0U, 0U, PM_REMOVE))
		{
			TranslateMessage(&msg);
			DispatchMessage(&msg);
		}
		else
			render();
	}

	return msg.wParam;
}

I'm a C# developer, so excuse the naming convention if it looks ugly.

There's nothing fancy or strange here. I'm measuring CPU consumption with Task Manager. Why should I need anything else when the code is so simple? And why does the equivalent DirectX 9.0 example run without any noticeable increase in CPU usage?

By downloading Microsoft's DX SDKs you can check the triangle samples yourself and compare the 7.0 version with 9.0; the paths are:

Visual C++ 4.0 - ..\dx7sdk-7001\samples\multimedia\d3dim\src\tutorials\triangle

Visual Studio 2005 (Express works as well) - ..\dxsdk_aug2007\Samples\C++\Direct3D\Tutorials\Tut03_Matrices

My guess (and it is only a guess, because I can't measure it) is that this code is just continuously writing to video memory as fast as it can. It doesn't do any real work or processing; you even create a back buffer but never use it. When you add real processing, start drawing to the back buffer, and then fix the inevitable tearing by following the back-buffer writes with the device's Present(), you'll find the problem clears up. I don't see any significant blocking calls or the kinds of system calls that come with real work. Start doing real work and you'll start seeing realistic performance.

Also, I don't have those relatively ancient compilers and libraries available, so I couldn't build and test it even if I wanted to. VC++ 4.0 was a LONG time ago; back then I was using Borland products (still on my shelf), and the oldest Visual Studio I have lying around is VS 6.0 from 1998. But again, the only way to know for certain where the time is being spent is to profile with a real profiler. Task Manager is a very rough approximation and generally over-estimates, since it only looks at whether time slices were allocated, not the actual CPU timings a profiler can measure.

mueller.andrew@gmail.com said:

I'm very curious, why the dependance on windows 95? Why the super old versions of DirectX from the 90s?

Retrocomputing is back thanks to FPGAs. Furthermore, internet access is cheap these days; it wasn't back in 2000, when the Pentium 4 was already available but I couldn't afford anything beyond my 100 MHz Pentium with 8 MB of RAM.

frob said:

My guess (and it is only a guess, because I can't measure it) is that this code is just continuously writing to video memory as fast as it can. It doesn't do any real work or processing; you even create a back buffer but never use it. When you add real processing, start drawing to the back buffer, and then fix the inevitable tearing by following the back-buffer writes with the device's Present(), you'll find the problem clears up. I don't see any significant blocking calls or the kinds of system calls that come with real work. Start doing real work and you'll start seeing realistic performance.

I suspect that swallowing the whole CPU is simply DirectDraw's expected behavior, no matter what. Fury3 is the game that promoted the then-newest DirectX API, and even it uses 100% CPU on a modern machine.

I'm almost giving up… except that I want to give WinDbg a try, do some remote debugging, and see what I can do in fullscreen mode.

frob said:

Also, I don't have those relatively ancient compilers and libraries available, so I couldn't build and test it even if I wanted to. VC++ 4.0 was a LONG time ago; back then I was using Borland products (still on my shelf), and the oldest Visual Studio I have lying around is VS 6.0 from 1998. But again, the only way to know for certain where the time is being spent is to profile with a real profiler. Task Manager is a very rough approximation and generally over-estimates, since it only looks at whether time slices were allocated, not the actual CPU timings a profiler can measure.

I recently downloaded everything from winworldpc.com: Windows 95, VC++ 4.0, the SDK, the DDK, etc. I never had any of it in physical format, and I'm using Virtual PC 2007 to virtualize Win95.

At a glance I see two issues:

1)
You're not handling WM_PAINT. Add your render call there as well, follow it with ValidateRect (validate the full window), and return zero. Do NOT pass WM_PAINT on to DefWindowProc.
Without validating the window, you keep receiving WM_PAINT constantly.

2)
Process all pending messages in your message pump, not just one. Use PM_NOREMOVE with PeekMessage so it returns right away when no message is in the queue.
while ( mainLoopRunning )
{
    while ( PeekMessage( &msg, NULL, 0, 0, PM_NOREMOVE ) )
    {
        BOOL bRet = GetMessage( &msg, NULL, 0, 0 );
        if ( bRet == 0 )
        {
            // WM_QUIT: break out of loop
            goto main_loop_done;
        }
        else if ( bRet == -1 )
        {
            // Error occurred, break out of loop
            goto main_loop_done;
        }
        else
        {
            ::TranslateMessage( &msg );
            ::DispatchMessage( &msg );
        }
    }

    // Your render call goes here
}

To the admins: God, this forum message editor is getting worse and worse. STOP MESSING WITH MY LINE BREAKS AND F*CKING UP THE INDENTATION OF THE CODE SNIPPETS. Oh, and auto-italicate from underscores is also stupid. As you see I tried to add constants with underscores in their names. Fixing up formatting errors of the message editor took more time than to write the actual reply.


@undefined there isn't much improvement, but good point about returning 0 on WM_PAINT; I'm reading the directx_sdk.chm included in the DirectX 9.0 SDK and they recommend that too. I'm trying to take the best of each approach, but I'm still stuck, and I think it may be unsolvable. I saw a StackOverflow answer about OpenGL 1.1 on Windows 95 with the same issue, 100% CPU load, so I think it's the expected behavior. An ugly time to be a programmer for those APIs.

I've also read that Win95 devs used SoftICE rather than WinDbg to debug, so that's one more tool to learn if I want to get into that old-school graphics work.

This topic is closed to new replies.
