
How to genuinely boost your FPS - Making use of multi cores



Hey you forum lovelies,

Thought I'd share this little 5 minute video with you - worked wonders for my FPS and general feel of the game. I now have less input lag when clicking the usual offenders (overlays, compactors, doors, etc) and I genuinely hope it helps some of you struggling folks too.

Makes use of a free piece of software called Process Lasso and seems to be borderline idiot proof - although I did try my best to break it ;) 

Full disclaimer : Results may vary, some of you are way bigger computer nerds than I am, and I no doubt misspoke a fair few times (I was rushing to get this video out before stream). This process greatly improved my gaming performance, as would cocaine and/or energy drinks. This will not instantly fix your 1500 critter, cycle 5000, 2 dupe "speed run" - that can only be fixed with a slap in the grill - however, it will hopefully help anyone who isn't a total pleblord.

Mwah as ever! 

-Life xox

*Edit* For those of you who didn't check the video description : *Note* Some people have stated that disabling SMT works better for them, this video was recorded with SMT enabled. For more info, google "Simultaneous Multithreading".

You lazy slackers :p 

Link to comment
Share on other sites

6 hours ago, MorsDux said:

If there was a boost I didn't notice it :/

What CPU out of interest? Also, did you read the youtube description? I put a little note in there that you may want to tinker with :

*Note* Some people have stated that disabling SMT works better for them, this video was recorded with SMT enabled. For more info, google "Simultaneous Multithreading".

16 minutes ago, Lilalaunekuh said:

Just out of curiosity: Did it help with your fps ?

[For me I wouldn't say it noticeably improved my FPS, but I only messed with the first tip for all CPUs.]

As mentioned, the place I personally felt the biggest impact was in general gameplay. I.e. panning around the map felt way smoother, interacting with the build menu, buildings, overlays etc, all those things are now instantaneous even in a mid-late game base. I daresay at cycle 1500 with a billion things all running at once that I'll still be tearing my hair out, but maybe a little less ;) 

Link to comment
Share on other sites

Can you try just disallowing ONI from running on Core 0 and see if that makes any performance difference? On some systems CPU 0 is overwhelmingly the core that handles interrupts and DPCs (deferred procedure calls), including graphics driver calls, and context switching can be quite expensive.
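For anyone wondering what "disallowing Core 0" looks like as an affinity setting, here is a minimal sketch. It assumes the usual convention that bit n of an affinity mask stands for logical CPU n. The helper names are made up for illustration, and the 16 logical CPUs are just an example (an 8-core chip with SMT on).

```python
def affinity_mask(cpus):
    """Build a bitmask where bit n is set for each logical CPU n in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def all_but_cpu0(logical_cpu_count):
    """Return every logical CPU number except CPU 0."""
    return list(range(1, logical_cpu_count))


if __name__ == "__main__":
    cpus = all_but_cpu0(16)          # assumed: 16 logical CPUs
    print(cpus)                      # the CPUs the game would be allowed on
    print(hex(affinity_mask(cpus)))  # 0xfffe
```

The list is what you would tick in an affinity dialog (Task Manager's "Set affinity" or Process Lasso's equivalent); the hex mask is the same selection written as a single number.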

Link to comment
Share on other sites

In other words (no offense btw), if you're running many things in the background (for whatever reason), this can help, because the clunky Windows scheduler won't keep assigning the game, again and again, to cores already in use by another task.

The other way, which works perfectly fine, is: close the damn things you don't need right now. Processes that aren't running can't use CPU time. Closing the browser helps a lot. Windows does next to nothing by itself in the background of a heavy application like a game.

If you need all of these programs, for example as a streamer, this software can help. But it won't really help if you only have 2 or 4 cores. There's nothing to gain there.

Link to comment
Share on other sites

This has a bunch of interesting use cases. If you overclock and end up not running the same speed on each core for whatever reason, you can make ONI only use the fast cores.

It's also interesting if you use a Zen 2 CPU. They consist of 2-4 CCX, each with 3 or 4 cores. Each CCX has dedicated cache and there is apparently a performance hit when a process is moved from one CCX to another. If you set ONI to only run on a single CCX and let nothing else run on that CCX, then not only will you avoid the performance hit from switching CCX, you will also ensure that ONI gains control of the entire 16 MB cache.

In general you would likely get the best results from knowing which cores are real and which are hyperthreaded ones. It also helps to know which cores share which cache, if you have enough cores that they don't all share the same one.
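To make the "which cores are which" idea concrete, here is a rough sketch that lists the logical CPUs per CCX. It assumes logical CPUs are numbered core by core with the two SMT threads of a core next to each other (0 and 1 are core 0, 2 and 3 are core 1, and so on). That numbering is an assumption, not something guaranteed for every system, so check your own CPU before trusting it. The function name and the 2 x 4 layout are only examples.

```python
def ccx_layout(ccx_count, cores_per_ccx, threads_per_core=2):
    """Map each CCX index to its logical CPUs.

    Assumes logical CPUs are numbered core by core, with SMT siblings adjacent.
    "primary" is the first thread of each core, "siblings" are the rest.
    """
    layout = {}
    cpu = 0
    for ccx in range(ccx_count):
        primary, siblings = [], []
        for _core in range(cores_per_ccx):
            primary.append(cpu)
            siblings.extend(range(cpu + 1, cpu + threads_per_core))
            cpu += threads_per_core
        layout[ccx] = {"primary": primary, "siblings": siblings}
    return layout


if __name__ == "__main__":
    # Example only: 2 CCXs with 4 cores each and SMT enabled.
    for ccx, cpus in ccx_layout(2, 4).items():
        print(ccx, cpus)
```

Strictly speaking both threads of a core are equals, so "primary" here just means the first of each pair. With SMT disabled, pass `threads_per_core=1` and the sibling lists come back empty.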

One word of warning though: it's not always a good idea to dedicate a CPU core to a task. I have an Intel 4790k and I noticed that if I have a single core task, which takes 100% of the CPU time for a while, then it will not stay on the same core. It will consistently switch in this order: 0->1->2->3->0. This means each core will heat up 1/4 of the time and cool 3/4 of the time. This allows peak speeds, which are faster than the core can cool because it's for such a short time. However if you follow the process, it will only experience peak speeds because it keeps getting a new cold core to heat up. This means if you dedicate a core to the 100% load process, then there is a risk that the CPU will thermal throttle because you took away the ability to spread out the heat. This means in my case it's best to NOT use dedicated cores.

Link to comment
Share on other sites

21 hours ago, Lifegrow said:

Full disclaimer : Results may vary, some of you are way bigger computer nerds than I am, and I no doubt misspoke a fair few times (I was rushing to get this video out before stream). This process greatly improved my gaming performance, as would cocaine and/or energy drinks. This will not instantly fix your 1500 critter, cycle 5000, 2 dupe "speed run" - that can only be fixed with a slap in the grill - however, it will hopefully help anyone who isn't a total pleblord.

Are you sure that this disclaimer is enough, when someone finds out, that you caused the rainforest fires, with your "tuning" tips?

 

Link to comment
Share on other sites

2 hours ago, Nightinggale said:

It's also interesting if you use a Zen 2 CPU. They consist of 2-4 CCX, each with 3 or 4 cores. Each CCX has dedicated cache and there is apparently a performance hit when a process is moved from one CCX to another. If you set ONI to only run on a single CCX and let nothing else run on that CCX, then not only will you avoid the performance hit from switching CCX, you will also ensure that ONI gains control of the entire 16 MB cache.

It seems that Win10 uses the fastest two cores for ONI. On my new 3600X, I have 100% load on core 4 and 12 (I think) when running ONI. Reported CPU clock is 4.25GHz. Not sure how hyperthreading goes into core numbering, but the performance monitor lists 16 cores for this 8 core CPU.

FPS went from still completely playable 13FPS on my old FX8350 to also completely playable 20FPS on a complex end-game map on the 3600X. CPU load on the FX8350 on Win7 was more like 4 cores at 50%, so completely different scheduling.

Of course, that win10 installation is from yesterday, and I may still be missing drivers and stuff. Currently installing the mainboard in the case, so I cannot check. 

Link to comment
Share on other sites

3 hours ago, SharraShimada said:

In other words (no offense btw), if you're running many things in the background (for whatever reason), this can help, because the clunky Windows scheduler won't keep assigning the game, again and again, to cores already in use by another task.

The other way, which works perfectly fine, is: close the damn things you don't need right now. Processes that aren't running can't use CPU time. Closing the browser helps a lot. Windows does next to nothing by itself in the background of a heavy application like a game.

If you need all of these programs, for example as a streamer, this software can help. But it won't really help if you only have 2 or 4 cores. There's nothing to gain there.

Before I go on - I'm clearly not an expert (nor am I an enthusiast) but from the reading I did I think the cache in use is one of the more important factors here. You can't close *every* process - you just can't, and those Windows essential programs, or peripheral software, etc - they can bugger up core usage and cache allocation. This guy seems to explain it rather clearly where I cannot :)

https://community.amd.com/thread/236646

Also, are you going to tell me that you would actually close each and every other program you have running before booting up a game? Nerd :p 

2 hours ago, Nightinggale said:

It's also interesting if you use a Zen 2 CPU. They consist of 2-4 CCX, each with 3 or 4 cores. Each CCX has dedicated cache and there is apparently a performance hit when a process is moved from one CCX to another. If you set ONI to only run on a single CCX and let nothing else run on that CCX, then not only will you avoid the performance hit from switching CCX, you will also ensure that ONI gains control of the entire 16 MB cache.

In general you would likely get the best results from knowing which cores are real and which are hyperthreaded ones. It also helps to know which cores share which cache, if you have enough cores that they don't all share the same one.

One word of warning though: it's not always a good idea to dedicate a CPU core to a task. I have an Intel 4790k and I noticed that if I have a single core task, which takes 100% of the CPU time for a while, then it will not stay on the same core. It will consistently switch in this order: 0->1->2->3->0. This means each core will heat up 1/4 of the time and cool 3/4 of the time. This allows peak speeds, which are faster than the core can cool because it's for such a short time. However if you follow the process, it will only experience peak speeds because it keeps getting a new cold core to heat up. This means if you dedicate a core to the 100% load process, then there is a risk that the CPU will thermal throttle because you took away the ability to spread out the heat. This means in my case it's best to NOT use dedicated cores.

This is what I was alluding to above. Knowing which cores are physical versus doubled can help hugely. For the video I simply showed how to split the workload with SMT still enabled - however for best results I'd advise (firstly, do some detailed reading) disabling SMT, googling which are the physical cores for your processor - and using half of those for your games. I.e. I have a 2700X so I'd use cores 0-3 for games, then leave the remainder for other processes. Windows shuffling processes to different cores acts to flush out the available cache, and for a game as clunky as ONI in terms of resource usage - I think that may be the key. This was pretty insightful too : https://pcper.com/2017/03/amd-ryzen-and-the-windows-10-scheduler-no-silver-bullet/

11 minutes ago, Gurgel said:

It seems that Win10 uses the fastest two cores for ONI. On my new 3600X, I have 100% load on core 4 and 12 (I think) when running ONI. Reported CPU clock is 4.25GHz. Not sure how hyperthreading goes into core numbering, but the performance monitor lists 16 cores for this 8 core CPU.

Of course, that win10 installation is from yesterday, and I may still be missing drivers and stuff. Currently installing the mainboard in the case, so I cannot check. 

Check the link above re: hyperthreading. Seems it varies from game to game - I found the bigger benefit from disabling it for ONI but as ever, your usage may vary.

1 hour ago, Oozinator said:

Are you sure that this disclaimer is enough, when someone finds out, that you caused the rainforest fires, with your "tuning" tips?

 

My streaming rig is currently an old FX 8350 - if anyone wants some fried eggs, send them my way :p 

Link to comment
Share on other sites

Hah, I'm currently running an FX-9590.  It's got its own steam turbine, er, water cooler :p  I mean, I don't call it the volcano for nothing :p  

I've been looking at upgrading for the last few years but, well, getting laid off puts a crimp in your budget... having to go back to college because your industry locally started requiring a degree for this sort of IT work? Bigger crimp.  Still, published my first book in April and I'm making decent money so I'm waiting till the 3rd gen threadrippers roll out to decide between them or a 3950x.

Link to comment
Share on other sites

This definitely seems like it will be most useful for Ryzen users.  I've often thought about assigning ONI to a single CCX but it's already a pain to remember to do the things I have to do when loading it up (such as setting the default 'destruct' to 'buildings' every damn time or chaos ensues).  But if Process Lasso does it automatically, that's quite a win.

Will definitely test this out when I get home, as I'm a R7 user as well.

Oh, and you probably don't want to split to even/odd.  In fact, I think that's the worst way to do it from a cache standpoint.  You want to minimize the use of the Infinity Fabric, and assign them to a single CCX.  See this reddit thread, and especially this graphic for cache latency for details.

TL;DR - If you're on Ryzen 7 use Process Lasso to assign "OxygenNotIncluded.exe" to cores 8 through 15 and then assign * (a wildcard for all other processes) to 0 through 7.

7 hours ago, Nightinggale said:

This means each core will heat up 1/4 of the time and cool 3/4 of the time. This allows peak speeds, which are faster than the core can cool because it's for such a short time.

This is true, but, moving a thread from one core to another is far from 'free', as you invalidate caches.  Remember, modern CPUs run much more quickly than RAM, which is why we have three layers of caching between them and RAM.  Most performance is not throttled by CPUs not being fast enough or cooled enough, but by not being able to access enough memory quickly enough.  This of course depends on what you're doing, but trying to keep them fed with the data they need is often the toughest challenge.  Avoiding L3 caches helps, and avoiding Infinity Fabric on Ryzen really helps.

On the other hand, moving to a different location on the die (i.e. a different core) may help more on some Intel CPUs as they cheapened them and used paste (instead of the preferred solder for higher thermal conductivity) from Ivy Bridge until the recent Coffee Lake-R chips.

Link to comment
Share on other sites

Windows 10 doesn't know how to manage many cores efficiently. That was something proven to be affecting the highest core count Threadripper CPU, yet with Linux it doesn't have that issue.

That issue was supposed to be fixed in the latest update of Windows, as announced by Microsoft at AMD's Ryzen 2 presentation.

So, it would be worth specifying if your version of Windows 10 is up to date. It might have a relation with the effectiveness of the tweak.

Link to comment
Share on other sites

6 hours ago, cblack said:

This is true, but, moving a thread from one core to another is far from 'free', as you invalidate caches.

For the record, the core shifting I wrote about was between cores that share a level 3 cache. Ditching all cache contents could surely be expensive. Also I wonder if there is more to the core order than first meets the eye. Remember this is the 4th gen flagship Intel CPU. If they came up with something fancy back then, this is the CPU it would be in. What if the CPU can move a process from core X to X+1? Not only could it be faster, if they are really fancy, they would copy the level 1 and 2 caches too. I don't know if they did that, but the consistent and absolutely non-random way the 100% load process moves between cores indicates that there is something non-random going on.

6 hours ago, cblack said:

Most performance is not throttled by CPUs not being fast enough or cooled enough, but by not being able to access enough memory quickly enough.

This is a very good point and it made me think. Is the core switching triggered by temperature difference between cores or something like that? What happens if there is a single threaded application which has 100% CPU load, but is bottlenecked by memory I/O? Now I'm wondering about making a program which has an array of a billion ints. Each will be set to a random number between 0 and a billion. Now make a loop where it will read an index, and whatever number it reads will be the index for the next iteration. Completely useless, but it will be 100% CPU load, and with a billion ints it's designed to have cache misses all the time. Will this end up on a core which won't heat up and then not switch core? I might actually end up running this experiment to see what happens.
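A scaled-down sketch of that experiment, in case anyone wants to try it. This is an illustration, not a proper benchmark: the array is a million entries rather than a billion, all function names are made up, and in Python the interpreter overhead will swamp most of the memory latency, so a serious run would want C or similar.

```python
import random
import time


def build_chain(size, seed=0):
    """Fill a list with random indices in [0, size), as described in the post."""
    rng = random.Random(seed)
    return [rng.randrange(size) for _ in range(size)]


def build_single_cycle(size, seed=0):
    """Variant: a random permutation forming one big cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    chain = list(range(size))
    for i in range(size - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain: each value read becomes the next index."""
    index = start
    for _ in range(steps):
        index = chain[index]
    return index


if __name__ == "__main__":
    chain = build_single_cycle(1_000_000)  # scaled down from a billion
    started = time.perf_counter()
    last = chase(chain, 5_000_000)
    print(last, time.perf_counter() - started)
```

One caveat with the purely random fill: a walk like this tends to fall into a repeating loop after roughly the square root of the array size in steps, so it could end up bouncing around a few tens of thousands of entries that fit in cache. The single-cycle variant is there to avoid that, since it visits every entry before repeating.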

7 hours ago, cblack said:

On the other hand, moving to a different location on the die (i.e. a different core) may help more on some Intel CPUs as they cheapened them and used paste (instead of the preferred solder for higher thermal conductivity) from Ivy Bridge until the recent Coffee Lake-R chips.

They went back to soldering and surprisingly that didn't really reduce the thermal resistance. If you use X watt and a specific heatsink, the CPU will be around the same temperature regardless of paste or soldering. In fact you risk the soldered one being hotter (not much). Yes it's a huge surprise to everybody (me included), but apparently they used the correct paste.

I can point to a bunch of mistakes Intel have made, but this one isn't on the list.

Link to comment
Share on other sites

2 hours ago, Nightinggale said:

Is the core switching triggered by temperature difference between cores or something like that?

It's dependent on the kernel, or more specifically, the scheduler.  There are plenty of open source schedulers you can look at if you're curious, but I have no clue how Windows does it.  I'm moving away from it faster than a dupe runs to the bathroom when they have to pee.  Thankfully, ONI has Linux support :).

Modern scheduling has become significantly more complex over the years, from hyper threading, multi-core, NUMA, multiple CCXs/multi-CPU boards, etc.  It's not a topic I would casually investigate, but I'm sure it's fascinating.

2 hours ago, Nightinggale said:

They went back to soldering and surprisingly that didn't really reduce the thermal resistance.

IIRC the only people who actually noticed a difference were those overclocking, and even then only major overclocks with very small margins.  Between Intel's locked multipliers on all but their highest end products, and my love of stability over maximum speed, I don't have a lot of interest.  I've never heard of soldered actually performing worse than a pasted heat spreader, but I'm sure it depends on more than just one factor.

Anyway, to get back on topic...

I've run benchmarks now, and the results are quite surprising!  With no Process Lasso, i.e. all applications (including ONI) can be scheduled anywhere, I average 21 FPS while staring at the Printing Pod, fully zoomed out.  Using PL to assign all apps other than ONI to the first CCX (i.e. cores 0-7), and sticking ONI as the only thing on the second CCX (cores 8-15), I get an average of 27 FPS.  I know that 6 FPS doesn't sound like much, but when you're talking about a 29% improvement, it's a big deal.  The game feels much smoother overall, and it's no longer the chore it once was to reduce dupe ponder time.
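For anyone who wants to put their own before/after numbers through the same arithmetic, a tiny sketch (function names are just for illustration). It also converts FPS to frame time, which is arguably closer to what "feels smoother" means.

```python
def percent_gain(before_fps, after_fps):
    """Relative improvement of after over before, in percent."""
    return (after_fps - before_fps) / before_fps * 100.0


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(round(percent_gain(21, 27), 1))  # 28.6
    print(round(frame_time_ms(21), 1))     # 47.6
    print(round(frame_time_ms(27), 1))     # 37.0
```

So the 21 to 27 FPS jump is about 28.6% (the 29% quoted above, rounded), or roughly 10.6 ms shaved off every frame.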

So yeah, definitely check this out if you're on a Ryzen.  I suspect Intel users won't see quite that improvement, but I'd love to see numbers.

Link to comment
Share on other sites

10 hours ago, cblack said:

I know that 6 FPS doesn't sound like much, but when you're talking about a 29% improvement, it's a big deal. 

It is a big deal that you can get a performance boost without any changes to the game or buying new hardware. It doesn't really matter how you measure it or what numbers you get in that measurement. The fact that something viewed as too slow becomes noticeably faster is significant.

Link to comment
Share on other sites

13 hours ago, cblack said:

I get an average of 27 FPS.  I know that 6 FPS doesn't sound like much, but when you're talking about a 29% improvement, it's a big deal.  The game feels much smoother overall, and it's no longer the chore it once was to reduce dupe ponder time.

I knew I wasn't crazy :D 

Glad you got some benefit out of this, bud.

Link to comment
Share on other sites

Is anyone (or are many people) seeing a single core ever get fully loaded in ONI?  I can load up a late-game colony that's starting to slow down quite a bit FPS-wise and no core gets pushed past 3300-3400 MHz (the CPU will go to 4.2 GHz single core boost and can maintain around 4 GHz all core).  I know it's not the GPU, as nothing changes when I set the resolution to minimum.  RAM does come into play, but even then it doesn't seem like that's all of it (can't be sure there of course).  I gained a few FPS tweaking my RAM from 3200 16,18,18 to 3600 16,19,16, but still a pretty insignificant amount.  This is on a Ryzen 3600 btw.

Oddly I wouldn't mind if things still ran a bit rough if they at least put some part under full load so I knew where to poke to squeeze some more performance out :P.  Although I'll have to give process lasso a shot later (dashing out atm) and see if that changes it somehow, that and disabling SMT as suggested.

Link to comment
Share on other sites

22 minutes ago, SharraShimada said:

@Chaoticlusts first thing is to check, what boost option is in place. Intel-Boost applies only for a few seconds in default mode, and thats it. You can change this, if your cooling can handle the higher TDP in BIOS (if possible on your board).

 

Boost is working normally in other applications. I know absolute max frequency boosts don't hold long, particularly with Zen 2, as AMD was particularly generous with their definition of boost. Still, while 4.2 may not be constant, 4+ is, as running fully multithreaded workloads it'll hit 3.95 on every core and stick there solidly as long as needed. Zen 2's a bit funny as far as boost settings go (I assume you mean things like PBO). A lot of the time it doesn't actually increase performance, due to how aggressively AMD binned the chiplets and how well the default boosting handles things. For my chip I've benchmarked with it on and off and gain maybe 50 MHz while putting the CPU under a lot more load (it ups the voltage quite a bit). Temps max out somewhere in the 60°C range running full load on all cores, more like 40-50°C in ONI.

Just gave Process Lasso a try and it's still not fully loading the cores (one is at 3 GHz, the others are at 1.5-2 GHz) but I did manage to eke out about a 5%-10% increase in FPS, which certainly isn't nothing :).  If anyone is curious, the setting that seemed to get me the best outcome was chucking every process on my system onto one core (2 threads) and then letting ONI play with the remaining 5c10t. I tried restricting ONI to 1 thread per core but it seemed to do a teensy bit better with both threads.
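Roughly what that split looks like as lists of logical CPUs, as a sketch. It assumes a 6-core/12-thread chip where the two threads of each core are numbered next to each other (0 and 1 on core 0, and so on), which may not match every system. The function and parameter names are made up for illustration.

```python
def split_cpus(core_count, reserved_cores=1, threads_per_core=2,
               one_thread_per_core=False):
    """Return (background_cpus, game_cpus) as lists of logical CPU numbers.

    Assumes SMT siblings are numbered adjacently (core 0 = CPUs 0 and 1).
    The first reserved_cores cores go to everything that isn't the game.
    """
    background, game = [], []
    for core in range(core_count):
        first = core * threads_per_core
        threads = list(range(first, first + threads_per_core))
        if core < reserved_cores:
            background.extend(threads)
        elif one_thread_per_core:
            game.append(first)  # only the first thread of each game core
        else:
            game.extend(threads)
    return background, game


if __name__ == "__main__":
    print(split_cpus(6))                            # 1 core for the rest, 5c10t for the game
    print(split_cpus(6, one_thread_per_core=True))  # the 1-thread-per-core variant
```

The first call gives the background processes CPUs 0-1 and the game CPUs 2-11; the second keeps the same background core but hands the game only CPUs 2, 4, 6, 8 and 10.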

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

Please be aware that the content of this thread may be outdated and no longer applicable.
