GameGuru MAX Performance Adventures
Lee Bamber
Fairbourne, Wales
As I work on my next game maker project, GameGuru MAX, I want to chronicle my little foray into performance tuning on the GPU and CPU side. It's been a while since I used GPA and a LONG while since I used VTUNE, and so it will be interesting to share my experiences using both to boost my FPS.
Project status: Under Development
Intel Technologies
Intel GPA,
Intel VTune
Overview / Usage
Creating a game maker is nothing new to me these days, but I would like to target performance as a key feature of the development, ensuring the code follows best practices to make full use of the available GPU and CPU cores.
Methodology / Approach
I will be learning and then using the GPA and VTUNE profiling tools, first to understand the current state of the software, and then to chase down the low-hanging-fruit performance gains. As much as it is fun to accidentally stumble on a great optimization, it is much more rewarding to find the worst offenders and hack them out, and then receive an instant reward of more speed and smoothness.
Technologies Used
DAY ONE
Stage One - Learning By Watching
GPU Side (GPA)
Capture Location | Intel® Graphics Performance Analyzers Quick Tip
Capture Triggers | Intel® Graphics Performance Analyzers Quick Tips
Deferred Stream Capture | Intel® Graphics Performance Analyzers Quick Tip
Diff Mode | Intel® Graphics Performance Analyzers Quick Tip
Key-bindings | Intel® Graphics Performance Analyzers Quick Tip
Keyframes | Intel® Graphics Performance Analyzers Quick Tip
Multi-Frame Analysis | Intel® Graphics Performance Analyzers Quick Tip
Pause Mode | Intel® Graphics Performance Analyzers Quick Tip
Screenshots | Intel® Graphics Performance Analyzers Quick Tip
Shader Profile | Intel® Graphics Performance Analyzers Quick Tip
UI Modifications | Intel® Graphics Performance Analyzers Quick Tip
GPU + CPU Side (GPA)
Graphics Trace Analyzer Deep Dive | Part 1 | Configure and Capture a Trace
Graphics Trace Analyzer Deep Dive | Part 2 | Open and Explore
Graphics Trace Analyzer Deep Dive | Part 3 | Configure Custom Layout
Graphics Trace Analyzer Deep Dive | Part 4 | Data Selection and Summary Insights
CPU Side (VTUNE)
https://devmesh.intel.com/projects/boost-cpu-performance-with-intel-vtune-profiler
Stage Two - Install the Tools
Getting Intel GPA is super easy: you just need to search for Intel GPA and download the latest Windows installer it presents. The tool installs normally, and desktop icons provide the various tools mentioned in the tutorials above.
There are a number of ways to get a free version of VTune, but I found mine by searching for VTune Profiler 2020 and finding this installer: "VTune_Profiler_2020_update2_setup.exe". Again, the installation was easy, though it will be interesting to see where 1.6GB of installation goes. It was nice to see it automatically detected by Visual Studio 2017 for integration, and it will be interesting to see what that provides.
The installation dropped me off here, at a getting started article, so we follow that next:
https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-vtune/top.html
Stage Three - The Hard Part
So I am on my way, selecting my app and running an analysis on it, going 'slowly' through the built-in help mode so I know what the buttons are called. After a few hours, I hit my first snag. When I drilled down into the hottest function, the 'Source' button was greyed out, so I could only see the assembly, which is not very useful. I am guessing the release build needs full symbol information associated with it, but this can be a pain in itself. It is here I stopped for the day, to regroup once I have got my VS project to produce symbols. Then I can resume my VTune adventure.
DAY TWO
Understanding that symbol information in my Release executable is rather important, my first task was to make sure I had plenty of symbols in my binary. This basically involved switching ON anything I could find: Generate Debug Info, Program Database, Generate Map File, Enable Browse Information. One of those is bound to produce the magic symbols VTUNE needs!
I also referenced an Intel link for extra guidance: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/windows-targets/debug-information-for-windows-application-binaries.html
Having confirmed I had the necessary PDB and MAP files associated with my binary, I was ready to once again venture into the world of VTUNE for Dummies. I booted up the project, went to Intel VTune Profiler > Configure Analysis with VTune Profiler, clicked the Hotspots Start button and ran through the launch of GameGuru MAX.
Now on my PC, the launch takes 11 seconds in Release Mode, and on some users' machines it shows a (Not Responding) message for a few seconds while it does important things. I wanted to get that down to less than 5 seconds and, hopefully, avoid the Not Responding message altogether. With luck, an analysis of just the launch period would reveal much.
The great news is that I could find actual source code when I clicked a few tabs and lines. The bad news is that my brain is starting to melt. To a VTUNE expert, the screen is clean and uncluttered. To the uninitiated, it's basically a page of text, and almost all the references are meaningless. The Summary is values and charts, and I have not a clue what it all means except what I can guess at face value. Bottom-up has potential in that it shows bars indicating high and low activity, but it starts off showing low-level libraries like KernelBase and NTDLL. Sorry, but I am not interested in optimizing my operating system. I only want to see what my GameGuruMAX.exe process is doing, and I can drill down to lower levels later on. I can sort by process, but then I lose sorting by importance, so that's out. I fiddled with the filters, but they too seem to keep references to all the low-level distraction stuff. What I had hoped to find was a sort of nest of all the function calls starting from WinMain, so I could effectively drop down from higher levels of the hierarchy to lower ones to find the problem sections of the code.
The closest I found was the Top-down Tree, which did as I described and allowed me to nest down through all my functions. It placed me at a function called "DirectX::Decompress", which it says took 68.7% of my CPU Time : Total. I am going to take that as a clue and investigate around this function call with the line step method to see if this indicator is accurate. If so, top marks to VTUNE :)
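As a rough cross-check of a hotspot like this outside the profiler, the suspect call can be wrapped in a scoped timer. This is only a minimal sketch: ScopedTimer, SuspectWork and TimeSuspectWork are hypothetical names invented for illustration, and SuspectWork is a stand-in loop rather than the real decompress call.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: adds the lifetime of the object, in milliseconds,
// to the double it was given. Not part of VTune or GameGuru MAX.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalMs)
        : totalMs_(totalMs), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        totalMs_ += std::chrono::duration<double, std::milli>(end - start_).count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& totalMs_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for the suspect call (assumption: the real code would call the
// function the profiler flagged instead of this loop).
inline long long SuspectWork(int iterations) {
    long long sum = 0;
    for (int i = 0; i < iterations; ++i) {
        sum += (static_cast<long long>(i) * i) % 7;
    }
    return sum;
}

// Times one call to the suspect work and returns the elapsed milliseconds.
inline double TimeSuspectWork(int iterations) {
    double ms = 0.0;
    long long result = 0;
    {
        ScopedTimer timer(ms);  // timing stops when this scope ends
        result = SuspectWork(iterations);
    }
    std::printf("SuspectWork(%d) = %lld took %.3f ms\n", iterations, result, ms);
    return ms;
}
```

Comparing the time spent inside the wrapped call against the total launch time gives a second opinion on the percentage the profiler reports. It is much cruder than VTUNE's sampling, but needs no tooling.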
DAY THREE
More resources to enjoy, including some deep dives (yay) to get to the heart of things:
I am prepping soon for another foray into what my CPUs are doing, especially when the software first launches and gets the UI ready for the user. One thing I am keen to try is putting pretty much anything I can onto a background thread and putting the user right into the UI within 5 seconds on a min-spec machine. Even if icons and other panels have to 'pop' in a few seconds later, having instant control to navigate the UI, load in-game projects, and generally get started will give the software a slick and solid feel. Having spent most of my coding life as a sequential thinker, it is going to be interesting running pretty much everything in parallel during the launch step :)
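To make the idea concrete, here is a minimal sketch of pushing one piece of launch work onto a background thread with std::async and polling for it from the UI loop. Everything named here (LoadIconNames, LaunchState, BeginLaunch, PollLaunch) is hypothetical and stands in for whatever the real launch code does. It is not how GameGuru MAX is actually structured.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for slow launch work, such as loading icon textures.
inline std::vector<std::string> LoadIconNames() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulate disk work
    return {"entity", "terrain", "sky"};
}

struct LaunchState {
    std::future<std::vector<std::string>> pendingIcons;  // background result
    std::vector<std::string> icons;                      // empty until the load finishes
    bool iconsReady = false;
};

// Kick off the slow work and return immediately so the UI can appear.
inline void BeginLaunch(LaunchState& state) {
    state.pendingIcons = std::async(std::launch::async, LoadIconNames);
}

// Call once per UI frame. Never blocks; returns true once the icons
// have been handed over to the UI thread.
inline bool PollLaunch(LaunchState& state) {
    if (!state.iconsReady && state.pendingIcons.valid() &&
        state.pendingIcons.wait_for(std::chrono::seconds(0)) ==
            std::future_status::ready) {
        state.icons = state.pendingIcons.get();  // icons 'pop' in from this frame on
        state.iconsReady = true;
    }
    return state.iconsReady;
}
```

The UI thread stays responsive because PollLaunch only checks whether the result is ready. One caveat of this design is that a future returned by std::async waits for its task when it is destroyed, so LaunchState should outlive the background load or the shutdown path will stall until it completes.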
DAY FOUR
What a difference a day makes! I grabbed the latest version of VTUNE (11th January 2022) and within twenty minutes I had it installed, profiling GameGuru MAX and highlighting a naughty loop causing a performance hit during the running of a Test Level. Neat! The updated UI really cleans up the view into the project source code, which it picked up right away, marking the times for each function call on the right, exactly where you want to see them.
I also discovered that VTUNE will be able to let me stop the capture and then backtrack X seconds so I only profile the data relevant to any spike I care to analyze, similar to saving the last 30 seconds from your dashcam after someone has crashed into you! I have also discovered the API that allows capture events to be coded directly into the software so I don't even need to do captures manually, meaning I can snapshot identical sections and do side-by-side comparisons when things go strange.
My hope is to assemble a small toolkit of go-to techniques to quickly dive into performance data, find the criminal code, then get right back into Visual Studio to repair/rewrite/reject as required. Who knows, maybe my VTUNE HAMMER will be a regular tool for me now as we race towards the completion of the GameGuru MAX Early Access version :)
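As a sketch of what coding capture events into the software could look like, the ITT API's collection control calls, __itt_resume() and __itt_pause() from ittnotify.h, can be wrapped in a small scoped object so data is only collected around the section of interest (typically with the analysis started in a paused state). The build flag GGM_USE_ITT and the names ScopedCapture, CaptureStats and RunTestLevelFrame are hypothetical, made up for this example. Without the flag the calls compile to no-ops so the sketch builds on a machine without VTune.

```cpp
#ifdef GGM_USE_ITT        // hypothetical flag: define only when ittnotify is available
#include <ittnotify.h>    // ships with VTune; link against the ittnotify static library
inline void ProfilerResume() { __itt_resume(); }
inline void ProfilerPause()  { __itt_pause(); }
#else
// Fallback stubs so the sketch still builds without VTune installed.
inline void ProfilerResume() {}
inline void ProfilerPause()  {}
#endif

// Hypothetical bookkeeping so the capture regions can be checked in tests.
struct CaptureStats {
    int regions = 0;      // how many capture regions have been opened
    bool active = false;  // true while a region is open
};

// Resumes collection on construction and pauses it again on destruction.
class ScopedCapture {
public:
    explicit ScopedCapture(CaptureStats& stats) : stats_(stats) {
        stats_.active = true;
        ++stats_.regions;
        ProfilerResume();
    }
    ~ScopedCapture() {
        ProfilerPause();
        stats_.active = false;
    }
    ScopedCapture(const ScopedCapture&) = delete;
    ScopedCapture& operator=(const ScopedCapture&) = delete;

private:
    CaptureStats& stats_;
};

// Stand-in for one frame of a test level (assumption: the real code would
// run game logic and rendering here).
inline int RunTestLevelFrame(CaptureStats& stats, int frame) {
    ScopedCapture capture(stats);  // only this scope is profiled
    int checksum = 0;
    for (int i = 0; i < 1000; ++i) {
        checksum += (i ^ frame) & 0xFF;
    }
    return checksum;
}
```

Because the same scope is captured on every run, two results taken before and after a code change cover identical sections, which is what makes side-by-side comparisons meaningful.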
To be continued.