My project takes too long to program


NikiTo said:

@aganm This is not really relevant. Saying that it is too long is enough.

I can't really imagine what sort of practical problem would take that long to solve. Theoretical research may take an unbounded amount of time to make progress, but software is typically pretty amenable to both estimation, and taking shortcuts to cut down the amount of work.

It seems to me that you need to define an MVP (minimum viable product), and cut all features that don't contribute to that out for now.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


@swiftcoder With sufficient shaving of micro-seconds from the execution time, anything can take ages. Complexity of the solution increases exponentially as processing time decreases.

I have a simple rule for this. The time I invest in making it faster should at least be paid back in its execution. So if I spend an hour improving, I want an hour time gain from the computer. If that isn't feasible, I don't spend the time on it, since I don't get the result faster.
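As a rough illustration of that break-even arithmetic (all numbers below are made up for the example), a C++ sketch:

#include <iostream>

int main() {
    const double hoursSpentOptimizing = 1.0;   // time invested in the speed-up
    const double secondsSavedPerRun   = 12.0;  // measured gain per execution
    const double expectedRuns         = 500.0; // how often the code will still run

    // Total execution time saved, in hours, versus time spent optimizing.
    const double hoursSaved = secondsSavedPerRun * expectedRuns / 3600.0;
    std::cout << (hoursSaved >= hoursSpentOptimizing ? "worth it\n" : "not worth it\n");
}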

There are lots of other interesting things to do instead.

@swiftcoder A project of this kind, I think - a chemical emulator. Completely unexpected behavior of materials pops up everywhere. It is hard to organize inside a program and needs constant speed optimizations.

I have found very few MVPs in my project. One is that it is enough for it to run on a single GPU/CPU model. I am not tempted to rewrite my SM5 shaders to use SM6 either. The client will have to port it to other architectures.

@alberth I don't think @swiftcoder was referring to the cost/benefit of the performance alone. More that any single problem that can be solved using computers should also be implementable in a shorter timeframe. And then there is the point of removing excess optimization adventures and feature creep to cut down to the actual MVP.

That makes perfect sense to me.

Alberth said:
Complexity of the solution increases exponentially as processing time decreases.

I'm sure this is not always true - although if the only road you can see to optimizing number-crunching is adding more logic, then yes. And I'm on board with what you mean. I'm convinced we ought to go for a sweet spot: you optimize and add some complexity, arriving at a nice, performant solution.

After that, you're "just" adding further micro-optimizations which don't help keep the project readable, even though they might make the solution a bit more performant. I know this is exactly what you mean by the quote above.

Like you said, it can come down to simply being paid back for the time it takes, but I'm sure there's more to the OP's problem than that.

I'm saying that it's not worth going past the sweet spot. Not necessarily because you don't gain much more performance, but because you lose the overview and readability of your implementation more easily when it's more complex.

@SuperVGA I work with big data. I can export, as resources to be tested by C++, the result of only a very small portion of the work the app does, and it is already 300MB on disk. It takes time to add the debugging code inside the HLSL, because I cannot just output everything. I need to decide at which point in the code and in the flow which variables I can export. Then I change the focus over the big data - not programmatically, but by changing a define. This way I test various small chunks of the big data.

In the beginning it was impossible for me to work without HLSL. Without some of the optimizations I made, these shaders would not run even on a Titan. Mostly optimizations aimed at lowering the amount of VRAM used.

I think if somebody is developing a game with GI, it helps to have a fast sandbox.

99% of my time goes to bug hunting anyway. But I don't blame myself; GPUs are complex to debug.

As soon as it is acceptably slow to use the CPU, I will go for C#. My coding style inside C++ is completely different from HLSL. In C++ I optimize nothing at all, never. I don't like C++ anyway (personal opinion, a matter of taste and freedom of speech). I can't wait to switch to CPU.

Still a big project. Added optimizations and bug hunting… a never-ending story.

I use a “purge the error” or “better hang than wrong” style of programming too. It doesn't help with productivity. (Edit: but not using it could be slower in the end, if the code were full of hidden bugs.)
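As an aside, a minimal C++ sketch of what such a fail-fast style can look like on the CPU side; the PURGE_IF macro and values are made up for illustration:

#include <cstdio>
#include <cstdlib>

// Kill the run loudly instead of continuing with bad data.
#define PURGE_IF(cond, msg) \
    do { if (cond) { std::fprintf(stderr, "FATAL: %s (%s:%d)\n", msg, __FILE__, __LINE__); std::abort(); } } while (0)

int main() {
    int pixelValue = 300;   // pretend this came back from a shader; deliberately out of range
    PURGE_IF(pixelValue < 0 || pixelValue > 255, "pixel value out of range");
    std::printf("pixel ok: %d\n", pixelValue);
}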

NikiTo said:

@SuperVGA I work with big data. I can export, as resources to be tested by C++, the result of only a very small portion of the work the app does, and it is already 300MB on disk. It takes time to add the debugging code inside the HLSL, because I cannot just output everything. I need to decide at which point in the code and in the flow which variables I can export. Then I change the focus over the big data - not programmatically, but by changing a define. This way I test various small chunks of the big data.

That's good - you should test various small chunks. IMO that is the right approach; sample it randomly and often if you can.
I had some unwieldy databases where I tested and developed extensions against a small part of the dataset; otherwise I wouldn't have gotten to go home until my first run was over.
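For illustration, a small C++17 sketch of that idea - pull a random sample of records and run the expensive checks only on those (the dataset here is just a stand-in):

#include <algorithm>
#include <iterator>
#include <random>
#include <vector>

int main() {
    std::vector<int> dataset(1'000'000, 0);     // stand-in for the real records
    std::vector<int> sample;
    std::mt19937 rng(std::random_device{}());

    // Pick 1000 random records and validate only those during development.
    std::sample(dataset.begin(), dataset.end(), std::back_inserter(sample), 1000, rng);
    // ... run the expensive validation on `sample` instead of the whole dataset ...
}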

NikiTo said:

99% of my time goes to bug hunting anyway. But I don't blame myself; GPUs are complex to debug.

Yes, I agree. I'm not about to tell you “do this, do that”, and I don't know more than what you've been willing to state here.
But “obviously” the trick can be to make the compute stuff run as stupidly as possible, making for fewer potential issues on the GPU.
It might be easier said than done, but I'll personally go to great lengths not to have to sit debugging HLSL.

NikiTo said:

As soon as it is acceptably slow to use the CPU, I will go for C#. My coding style inside C++ is completely different from HLSL. In C++ I optimize nothing at all, never. I don't like C++ anyway (personal opinion, a matter of taste and freedom of speech). I can't wait to switch to CPU.

Still a big project. Added optimizations and bug hunting… a never-ending story.

Alright, I have it the other way around with regard to language, but it's a preferential thing - it also depends on what I'm making.
If you don't run a lot of stuff on the CPU anyway, I guess it hardly even matters when it comes to performance.

I still think it can, at least in general, be important to find that sweet spot where stuff is maintainable and easy to understand/debug, while being as fast as it can be. I'm by no means saying that you've over-engineered your solution, or that you need to optimize further. It was merely a remark on @alberth's statement, which there is a lot of truth to.

NikiTo said:

I use a “purge the error” or “better hang than wrong” style of programming too. It doesn't help with productivity.

If that style had been used from the birth of the project, I'm quite sure it would indeed have helped with productivity.
Think of all the issues waiting to be revealed if you hadn't preferred that style. You'd discover stuff way after implementing, trying out and integrating new parts (and you'd then spend even more time bug-hunting; somewhere above those 99%).

NikiTo said:
I work with big data. I can export, as resources to be tested by C++, the result of only a very small portion of the work the app does, and it is already 300MB on disk.

  1. If you test on the C++ side, does that mean you have the same implementation in C++ as in CS? If so, why don't you finish your project with C++ alone, and port it back to CS after it works? Much less than half the work, more than compensating for the longer wait on processing.
  2. 300MB because of debug data, or 300MB per image? Can you use smaller images to reduce processing times while developing algorithms?

Obvious questions ofc.

Another question: remembering that detecting lines is part of your work, would it help to have an algorithm which can detect the direction of the line per pixel in linear time? If the line is pronounced (edges), the resulting vector magnitude is high; if not (noise), the magnitude is small.
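(JoeJ doesn't spell the algorithm out; one common way to get that kind of per-pixel direction and magnitude in a single linear pass is a Sobel gradient, so the C++ sketch below is only an interpretation, not necessarily what he has in mind:)

#include <cmath>
#include <vector>

struct DirMag { float angle; float magnitude; };

// One linear pass: the Sobel gradient at each pixel gives a direction
// (perpendicular to the local edge) and a magnitude that is large on
// pronounced edges and small on flat or noisy regions.
std::vector<DirMag> gradientField(const std::vector<float>& img, int w, int h) {
    std::vector<DirMag> out(img.size(), DirMag{0.0f, 0.0f});
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            auto p = [&](int dx, int dy) { return img[(y + dy) * w + (x + dx)]; };
            float gx = -p(-1,-1) - 2*p(-1,0) - p(-1,1) + p(1,-1) + 2*p(1,0) + p(1,1);
            float gy = -p(-1,-1) - 2*p(0,-1) - p(1,-1) + p(-1,1) + 2*p(0,1) + p(1,1);
            out[y * w + x] = DirMag{ std::atan2(gy, gx), std::sqrt(gx*gx + gy*gy) };
        }
    }
    return out;
}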

SuperVGA said:
But “obviously” the trick can be to make the compute stuff run as stupidly as possible, making for fewer potential issues on the GPU.

You are right. I did not do it that way, partially because of the need for speed, but honestly, partly because of my mental problem with premature optimization. I guess it would have worked without the optimizations I did not out of need but out of… mental reasons. My TDR time limit is at 30 seconds. For example, without the optimizations, one of the shaders would have been 50 times slower.

Usually I first program the shader in the dumb way, and then the monitor turns black and AMD says: timeout error. I am like: WTF, not even 30 seconds are enough? And I start optimizing. (Add to it some mental…) Similar for making it fit inside the VRAM.

I do it in a very dumb way inside the debug code for example.

I have many folders with big data, but I take only one of them and work on it.

I make the GPU parse a small chunk of the data in that folder so it fits inside the VRAM. I divide the work into N parts and call dispatch() from inside a loop on the C++ side. Every iteration parses a small amount of data; otherwise it does not fit inside the VRAM. On that small amount, I can run some cheap tests that cover all the dimensions of the dispatch. Some tests are too heavy and run only on one dimension/tick. Like this -

#ifdef DEBUG
    // ...not-optimized-at-all debugging code goes here...

    // a lighter test here (sends a little debug data to the resource)

    if (threadID.y > 8 && threadID.y < 16) {
        // a heavier test here (sends more debug data to the resource)
    }

    if (threadID.z == 12) {
        // sends a lot of debug data to the resource
    }
#endif
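On the C++ side, the chunked dispatch loop described above might look roughly like the sketch below; dispatchChunk is a hypothetical helper standing in for the actual upload, Dispatch() call and read-back:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: upload one chunk, record the compute work,
// call Dispatch() for it, wait, and read back any debug data.
void dispatchChunk(const float* data, std::size_t count) {
    // ... real upload / Dispatch() / readback goes here ...
}

void processFolder(const std::vector<float>& folderData, std::size_t chunkElems) {
    // Split the folder into N chunks so each dispatch's working set fits in VRAM.
    for (std::size_t offset = 0; offset < folderData.size(); offset += chunkElems) {
        const std::size_t count = std::min(chunkElems, folderData.size() - offset);
        dispatchChunk(folderData.data() + offset, count);   // one dispatch per chunk
    }
}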

It is all very tight. Once a month, I run it with #define DEBUG commented out, and it takes a lot of time to compute. But that is not a problem for me. This will be the final product, and most probably the client will use it on his supercomputer, not me. I only need it fast enough that my development process does not stall.

SuperVGA said:
Alright, I have it the other way around with regard to language, but it's a preferential thing - it also depends on what I'm making. If you don't run a lot of stuff on the CPU anyway, I guess it hardly even matters when it comes to performance.

The CPU waits for the GPU to be ready with the first passes, staying idle. Much later, much much later, the CPU will work on the data produced by the GPU from folder.00001 while the GPU is working on folder.00002.
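A rough sketch of that overlap in C++, with gpuProcess and cpuPostprocess as hypothetical stand-ins for the real work (a dedicated worker thread or an async compute queue would serve the same purpose):

#include <future>
#include <string>
#include <vector>

std::vector<float> gpuProcess(const std::string& folder) {
    // ... run the GPU passes for this folder and return their output ...
    return {};
}

void cpuPostprocess(const std::vector<float>& gpuResults) {
    // ... CPU-side work on the data the GPU already produced ...
}

void runPipeline(const std::vector<std::string>& folders) {
    if (folders.empty()) return;
    // Kick off folder 0 on the GPU, then overlap: while the GPU handles
    // folder i, the CPU post-processes the results of folder i-1.
    auto pending = std::async(std::launch::async, gpuProcess, folders[0]);
    for (std::size_t i = 1; i <= folders.size(); ++i) {
        std::vector<float> ready = pending.get();
        if (i < folders.size())
            pending = std::async(std::launch::async, gpuProcess, folders[i]);
        cpuPostprocess(ready);
    }
}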

SuperVGA said:
If that style had been used from the birth of the project, I'm quite sure it would indeed have helped with productivity.

Not quite from the start. For my first tests I was taking the image of the debug resource, putting it into Krita, then making changes, running the shaders again, putting it into Krita again, and hiding/unhiding layers until I manually spotted an error. Then I made HTML/JS compare the two resources. Then I started putting in more and more effort. I still use HTML/JS for visualization of the data, when I want to see some graph and not only a message saying “all tests passed OK!” With big data, sometimes I could say: “it is not pink enough, something is wrong”, haha.

Inside the HLSL, in some situations, I don't initialize variables, because I want the shader to crash if one of my “if {} else if” chains misses the closing “else”.

And my code inside C++ is intentionally different from the HLSL version. If I used “while” inside the shader, I use “for” inside C++ (it does matter in my case). The C++ version is explicitly different -

C++:
if (!(n <= m)) {
    var = bim(n);
    var += bam(var);
} else {
    var = bim(m);
    var += bam(var);
}

HLSL:
if (n > m) {
    var = bim(n);
} else {
    var = bim(m);
}

var += bam(var);

That is just an example to give the idea; it is more complicated than that. It helps to have two different ways of doing it. To rephrase it: it helps me to rethink what exactly I am doing. The computer will not do it wrong; it helps me to find my human-made bugs. For the computer it is the same either way. So I am not copy-pasting code from the shader to C++.

Then in the tests I am explicit: Is a pixel at the correct place? Does it have the correct value? If a pixel is not there, is it expected to be missing? And so on, and on.
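A toy version of such explicit checks, comparing a read-back GPU result against the independently written C++ reference; the Pixel struct and the tolerance are purely illustrative:

#include <cmath>
#include <cstdio>
#include <vector>

struct Pixel { int x, y; float value; bool present; };

// Explicit checks: is each pixel at the correct place, does it have the
// correct value, and if it is missing, is it expected to be missing?
bool compareOutputs(const std::vector<Pixel>& gpu,
                    const std::vector<Pixel>& reference,
                    float tolerance = 1e-4f) {
    if (gpu.size() != reference.size()) {
        std::printf("size mismatch: %zu vs %zu\n", gpu.size(), reference.size());
        return false;
    }
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        const Pixel& g = gpu[i];
        const Pixel& r = reference[i];
        if (g.present != r.present) { std::printf("pixel %zu: presence mismatch\n", i); return false; }
        if (!g.present) continue;
        if (g.x != r.x || g.y != r.y) { std::printf("pixel %zu: wrong position\n", i); return false; }
        if (std::fabs(g.value - r.value) > tolerance) { std::printf("pixel %zu: wrong value\n", i); return false; }
    }
    return true;   // "all tests passed OK!"
}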


One thing is for sure: compared to before the tests, I have one mental problem less - the anxiety about whether it works correctly. With these tests I sleep much, much better.

And the client will take it, if they take it, the way it is. They have enough money to hire a lot of top-tier programmers to port it to any language/hardware they want. Once I sell it, I am free as a bird.

@JoeJ What you suggest is very logical, but because of the exigencies of the algorithm I can not do it the way you suggest.

I could have lied to you that I need lines, or I could have told you the truth. Or right now, I could be intentionally making you doubt in order to lead you away from the truth. Secrecy gives me superpowers!

This topic is closed to new replies.
