SledDriver

P3D multicore usage anomaly


21 minutes ago, killthespam said:

That's why I mentioned that LM (p3D) needs to find a way of optimizing the core usage properly, not as it is now.

You make it sound like it's actually possible. In fact, there are very few cases in algorithm design where performance scales indefinitely with core count, and they're usually in graphics or stream processing, where there are no dependencies between the different data elements.

Imagine it's Thanksgiving, and you're cooking an elaborate five-course meal. It will take you a long time. Add a person to help you, and things go much faster because you can divide the work. Add a third person, then a fourth, and you'll discover that your marginal productivity is decreasing because you're getting in each other's way, or you're waiting for a shared resource (cutting board, oven, large pot, microwave, etc.). That's why performance doesn't scale linearly with core count.

It's Amdahl's Law - which was identified a half century ago. https://en.wikipedia.org/wiki/Amdahl's_law
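To put rough numbers on the kitchen analogy, here is a minimal Python sketch of the Amdahl's Law formula. The 60% parallel fraction is purely an assumption for illustration, not a measurement of P3D or any other sim.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when only part of the work can be split up."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the rest is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed for illustration: 60% of the work can run in parallel.
    for cores in (1, 2, 4, 8, 12):
        print(f"{cores:2d} cores -> {amdahl_speedup(0.60, cores):.2f}x")
```

With that assumed fraction, each extra core buys less than the one before it, and no number of cores can push the speedup past 1 / 0.4 = 2.5x.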

Cheers!


Luke Kolin

I make simFDR, the most advanced flight data recorder for FSX, Prepar3D and X-Plane.

2 hours ago, SledDriver said:

Hmm. With 12 cores available, having one at 100% and the rest merely jogging along has to be bad programming/OS assignment/whatever.

With P3D 4.4, you do not have to (and should not) run with one core at 100%. In fact, it is not advisable. The application spreads load much better than in the past; it just needs to be configured to do so.

As Bob mentions above -- 

"...With hardware VSync and a 30 Hz (4K) monitor, my core 0 in P3Dv4 typically sits around 70% on the ground with complex acft/scenery and AI, and often in the mid 30% range inflight, certainly not 100%...."

If you have modern, top-end hardware, and you aren't seeing the Core0 cpu usage that Bob (and I) see, then I'd suggest revisiting P3D configuration threads.


Rhett

7800X3D ♣ 32 GB G.Skill TridentZ  Gigabyte 4090  Crucial P5 Plus 2TB 

1 hour ago, Mace said:

If you have modern, top-end hardware, and you aren't seeing the Core0 cpu usage that Bob (and I) see, then I'd suggest revisiting P3D configuration threads.

Got any pointers to useful config threads? More than happy to read up. I have years of experience tweaking FSX, and about 1 week on P3D, so am all ears for learning right now.


Gents,

What I don't understand is the following. With no HT and no affinity mask, the first core is core 0, and most of the time it sits at a very high percentage, sometimes 100% for long periods, depending on aircraft complexity, scenery, and other sim settings. The other cores sit lower, around 30% to 50% across the board. When a core is maxed out and demand goes higher, we get high temps and stutters.

Now, if I again run with no HT but use an affinity mask to exclude core 0 and make only the other cores available to P3D, the first core for the sim becomes, let's say, core 1 (it could also be 2, 3, 4, etc.). That core will now sit at a very high percentage, at or close to 100%, with the others lower, around 30% to 50% across the board. At this point I tend to believe that no matter what you do, the way P3D is programmed there will always be one problem core: whichever core ends up first will run at 100%, or much higher than the others, and when demand rises further it will produce stutters or high temps. By the way, that is what I saw in Task Manager every time I tried a different affinity mask.
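For anyone unsure how the affinity mask number maps to cores, here is a small Python sketch of the arithmetic. The helper name is made up for illustration. As far as I know the resulting value is what goes into the AffinityMask entry under [JobScheduler] in Prepar3D.cfg, but treat that placement as an assumption and check it against your own config.

```python
def affinity_mask(cores):
    """Build a bitmask with bit N set for each allowed logical core N."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices start at 0")
        mask |= 1 << core
    return mask


if __name__ == "__main__":
    # Hypothetical 6-core CPU with HT off: allow cores 1..5, exclude core 0.
    mask = affinity_mask(range(1, 6))
    print(mask, bin(mask))  # 62 0b111110
```

Reading the binary form right to left gives core 0, core 1, and so on, so a mask of 1 means core 0 only.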

I did not see this issue with XP11. Not being a programmer, of course, I don't have the knowledge to explain this. All I can see are the facts: one core loaded at 100% for long periods, high CPU temps, and stutters in P3D, versus better use of the computer's resources in the other sim.

At this point, having compared facts and numbers, I still think something is not properly optimized and there is much room for improvement.

Edited by killthespam

I9- 13900K- CPU @ 5.0GHz, 64 GB RAM @ 6200MHz, NVIDIA RTX 4090


I don't know the technical background behind the load on cores, but I advise that you just leave things the way they are and don't turn off and on core0. It is true that when you load a scenario and you are at an airport not yet flying, you will have core0 used to 100% and the other cores relatively low. However, when you actually fly, you will see that the other cores (especially 3 and 4 if you have a quad-core CPU) will be hammered to almost 100% when scenery is being loaded en route - especially if you have complex scenery.

If you equalize the CPU load by turning core 0 off and on in the CPU affinity setting, you may initially get the impression that an equalized load is better. However, when you overfly complex scenery, you won't have as many CPU resources available as you would without doing so. I tested it myself a while ago and found delayed autogen loading and blurry textures on approach when using that procedure. You can test it as well if you like.

4 hours ago, Luke said:

You make it sound like it's actually possible.

It's Amdahl's Law - which was identified a half century ago. https://en.wikipedia.org/wiki/Amdahl's_law

Cheers!

In your analogy there is no question there is an optimal configuration of workers, tasks, and dependencies.   Are you suggesting LM has already fully optimized this configuration in V4.4 and if so, why is that your conclusion?

Majestic, our beloved Q400 developer, uses some methodology to offload I think the flight modeling off of P3D/FSX's main thread.  That particular quite complex aircraft is as easy on processing demand as the default planes, and this makes me think there might be headroom to creatively access w/ more cores.  From your citation it appears to come down to this:  

"the theoretical speedup is limited to at most 20 times. For this reason, parallel computing with many processors is useful only for highly parallelizable programs."
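For reference, the 20x ceiling in that quote comes from the article's example of a program that is 95% parallelizable. A short worked version, with p as the parallel fraction and n as the number of processors:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = \frac{1}{1 - 0.95} = 20
```

What p actually is for a flight sim is unknown here; the 0.95 is only the article's example value.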

Ok then the question becomes what more can be 'highly parallelized'.  I get the sense terrain loading must be, and that is already distributing over multiple cores.  Can all of the AI aircraft be offloaded off the main thread?  How about ATC?  How about the flight model, as w/ the incredibly performing Majestic Q400?

 

 

Edited by Noel

Noel

System:  7800x3D, Thermal Grizzly Kryonaut, Noctua NH-U12A, MSI Pro 650-P WiFi, G.SKILL Ripjaws S5 Series 32GB (2 x 16GB) 288-Pin PC RAM DDR5 6000, WD NVMe 2Tb x 1, Sabrent NVMe 2Tb x 1, RTX 4090 FE, Corsair RM1000W PSU, Win11 Home, LG Ultra Curved Gsync Ultimate 3440x1440, Phanteks Enthoo Pro Case, TCA Boeing Edition Yoke & TQ, Cessna Trim Wheel, RTSS Framerate Limiter w/ Edge Sync for near zero Frame Time Variance achieving ultra-fluid animation at lower frame rates.

Aircraft used in A Pilot's Life V2:  PMDG 738, Aerosoft CRJ700, FBW A320nx, WT 787X

 


These are the CPU loads I see with HT on and with HT off, with no affinity mask (AM) in either case.

Any suggestions or comments are very welcome.

https://www.dropbox.com/s/5184wa0i6tl714d/cpu2.JPG?dl=0

https://www.dropbox.com/s/snt4v4pit60sh37/CPU.JPG?dl=0

As I mentioned, on XP11 all cores were almost even at about 60% with the way I had it set up. Note that I'm not using XP11 anymore.

Thank you!

Edited by killthespam

48 minutes ago, Noel said:

In your analogy there is no question there is an optimal configuration of workers, tasks, and dependencies.   Are you suggesting LM has already fully optimized this configuration in V4.4 and if so, why is that your conclusion?

It's not, but my suggestion is that the optimal configuration may take more effort and risk than it's worth, and we may be closer to it than we think. We need to realize that we're never going to get 8 cores pegged at 100% - there are probably enhancements to be gained but they are not going to be trivial in effort. We also have to remember what L-M's "other" customers consider acceptable may be very different.

 

51 minutes ago, Noel said:

Majestic, our beloved Q400 developer, uses some methodology to offload I think the flight modeling off of P3D/FSX's main thread.  That particular quite complex aircraft is as easy on processing demand as the default planes, and this makes me think there might be headroom to creatively access w/ more cores.  From your citation it appears to come down to this:  

I would ask you the question - "are you suggesting that the flight model consumes a significant amount of CPU/GPU time?" I seriously question this, mostly because calculations on small data sets are something CPUs do very, very well. Additionally, when I am in a PMDG aircraft I switch from the VC to spot view and my frame rate will easily go up 50%, even though the flight modeling remains active.

Cheers!


Luke Kolin


I don't know, Luke, but I do know the Q400 is very sophisticated and yet easy on frames, and yes, that may be a red herring.

Terrain texture loading seems to be a process that can occur in parallel to a higher degree and hence gets distributed over my 8 available logical processors. What others are there? I know when I have lots of commercial airport traffic performance gets impacted. Why not have ALL ground and air traffic off of the main thread for example?


Noel

9 hours ago, Noel said:

Terrain texture loading seems to be a process that can occur in parallel to a higher degree and hence gets distributed over my 8 available logical processors. What others are there? I know when I have lots of commercial airport traffic performance gets impacted. Why not have ALL ground and air traffic off of the main thread for example?

Texture loading is I/O, which is trivial to multi-thread since those threads aren't doing much except waiting for said I/O to finish. That's why there was such a huge win with FSX SP1. (If you want to feel nostalgic, set your AFFINITYMASK to 1 and get that good old-fashioned single-core experience.)
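As a rough illustration of why I/O-bound work is the easy case, here is a minimal Python sketch using a thread pool. The `load_texture` function, the file names, and the 50 ms latency are all stand-ins I made up; nothing here reflects how FSX or P3D actually loads textures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_texture(name: str) -> str:
    """Stand-in for a disk read: the thread just waits, using almost no CPU."""
    time.sleep(0.05)  # assumed 50 ms of I/O latency, for illustration only
    return f"{name} loaded"


def load_all(names, workers=4):
    """Load textures on a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_texture, names))


if __name__ == "__main__":
    tiles = [f"tile_{i}.bmp" for i in range(8)]
    start = time.perf_counter()
    results = load_all(tiles)
    elapsed = time.perf_counter() - start
    print(results[0], f"({elapsed:.2f}s for {len(tiles)} textures)")
```

Because each task shares nothing with the others and spends its time waiting, the waits overlap and the batch finishes in a fraction of the time eight sequential loads would take.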

Your example is more difficult, because the traffic isn't independent of you, or each other. There's a single global world state, and the traffic is part of it. My (uninformed) guess is that the rendering isn't something that they can easily break out into different threads, or that the rendering engine itself is something that makes so many assumptions about being single-threaded that it would take a complete (risky, expensive, but probably needed) rewrite.

Cheers!


Luke Kolin


"Texture loading is I/O, which is trivial to multi-thread since those threads aren't doing much except waiting for said I/O to finish. That's why there was such a huge win with FSX SP1."

Indeed--a huge win by taking a task that was accomplished for years in the main thread when someone realized it could be ported off the main thread.  I read a while back that XP sends its AI into other cores.  And besides, I disagree w/ your premise that AI traffic must be a dependent part of the whole integrated world--it may take reimagining how that works, but I don't see it as necessary.  Same same for ground traffic on roadways.

My point is that it will take some thinking outside the box to access all of that horsepower in waiting, that elephant in your PC case. Luke, you seem to have already come to a final conclusion that it's too expensive and the yield won't offset the development cost, but that way of thinking never spawns breakthroughs. But yes, it appears it will take some serious outside of the box thinking to do more with parallel processing in a flight sim--but that is where the most potential processing power remains it appears.

Edited by Noel

Noel

15 hours ago, SledDriver said:

Got any pointers to useful config threads? More than happy to read up. I have years of experience tweaking FSX, and about 1 week on P3D, so am all ears for learning right now.

There are quite a few discussions of the 30Hz VSync technique here.  P3D does not need anywhere near the kind of tweaking that FSX did...I leave HT off and leave the affinity mask alone.  My best results have been to limit the CPU and GPU workloads so that they are not running all-out all of the time.  Maintaining some processor headroom allows the absorption of workload spikes without making the sim miss frames. 

For the CPU, I let the hardware VSync throttle the CPU load back by limiting frame production.  That's done by setting frame rate to unlimited and turning on VSync in the P3D settings, and no messing with nVidia inspector or other external utilities.  Done that way, with a 30 Hz hardware refresh rate set on the display, the CPU is constrained to produce frames at the hardware refresh rate, and no more.  When the CPU is trying to get to 60 or unlimited fps, then it will run the main thread (usually on core 0) at 100% all of the time.  It's pedaling as hard as it can all the time, and when workload spikes occur, such as loading AI, autogen, crossing tile boundaries, airport scenery loading etc, the best rate the CPU can hold will fluctuate significantly, and that fluctuation produces stutters.  In the interest of smoothness, a steady flow of frames at 30 fps is far preferable to 45 fps with excursions all over the map--and it leaves room for the CPU to take up additional load fluctuations with much less chance of dropping below the 30fps threshold.  Use of GSync to deal with more frame rate fluctuations might also be a good way to go if you have the hardware, as the monitor can roll with the flow of varying frame rates (up to a point) without producing stuttering or tearing.
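The headroom argument is easy to check with frame-time arithmetic. Below is a small Python sketch; the 23 ms main-thread cost is an assumed figure chosen so that it works out to roughly 70% of a 30 fps frame, and the helper names are made up for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given rate."""
    return 1000.0 / fps


def headroom_ms(fps: float, main_thread_ms: float) -> float:
    """Time left in each frame before the main thread misses the deadline."""
    return frame_budget_ms(fps) - main_thread_ms


if __name__ == "__main__":
    work_ms = 23.0  # assumed main-thread cost per frame, illustrative only
    for target in (30, 45, 60):
        spare = headroom_ms(target, work_ms)
        status = "holds" if spare >= 0 else "cannot be held"
        print(f"{target} fps: budget {frame_budget_ms(target):.1f} ms, "
              f"spare {spare:+.1f} ms -> {status}")
```

Under that assumption a 30 fps target leaves about 10 ms per frame to absorb a workload spike, while a 45 or 60 fps target leaves none, so any spike shows up as a frame rate drop.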

For the GPU, I keep steady-state GPU load to no more than 80% with the heaviest workload I intend to throw at it. That means managing AA, dynamic lighting, cloud layers/texture sizes, etc. so that I don't hit the 95-100% wall on GPU load, because that'll produce the same kind of frame rate fluctuations as a maxed-out CPU. I typically run my 4K 30Hz display with 4xSSAA except at night with DL on in heavy weather, where I change to 4xMSAA on the fly to keep the GPU load under control. I monitor CPU and GPU load using MSI Afterburner, with the stats displayed on the LCD screen of a Logitech G13 game controller that also serves as a view controller.

Regards


Bob Scott | President and CEO, AVSIM Inc
ATP Gulfstream II-III-IV-V

System1 (P3Dv5/v4): i9-13900KS @ 6.0GHz, water 2x360mm, ASUS Z790 Hero, 32GB GSkill 7800MHz CAS36, ASUS RTX4090
Samsung 55" JS8500 4K TV@30Hz,
3x 2TB WD SN850X 1x 4TB Crucial P3 M.2 NVME SSD, EVGA 1600T2 PSU, 1.2Gbps internet
Fiber link to Yamaha RX-V467 Home Theater Receiver, Polk/Klipsch 6" bookshelf speakers, Polk 12" subwoofer, 12.9" iPad Pro
PFC yoke/throttle quad/pedals with custom Hall sensor retrofit, Thermaltake View 71 case, Stream Deck XL button box

Sys2 (MSFS/XPlane): i9-10900K @ 5.1GHz, 32GB 3600/15, nVidia RTX4090FE, Alienware AW3821DW 38" 21:9 GSync, EVGA 1000P2
Thrustmaster TCA Boeing Yoke, TCA Airbus Sidestick, 2x TCA Airbus Throttle quads, PFC Cirrus Pedals, Coolermaster HAF932 case

Portable Sys3 (P3Dv4/FSX/DCS): i9-9900K @ 5.0 Ghz, Noctua NH-D15, 32GB 3200/16, EVGA RTX3090, Dell S2417DG 24" GSync
Corsair RM850x PSU, TM TCA Officer Pack, Saitek combat pedals, TM Warthog HOTAS, Coolermaster HAF XB case

1 hour ago, Noel said:

Indeed--a huge win by taking a task that was accomplished for years in the main thread when someone realized it could be ported off the main thread.  I read a while back that XP sends its AI into other cores.  And besides, I disagree w/ your premise that AI traffic must be a dependent part of the whole integrated world--it may take reimagining how that works, but I don't see it as necessary.  Same same for ground traffic on roadways.

You use words like "realize" and "reimagine" like the people at LM are stupid and lack any ability to consider significant changes to their product and its code. Given what they are doing, I am extremely confident that is not the case. But if you feel you are so much more imaginative than their engineers, perhaps you should apply. Everyone is looking for skilled software engineers these days.

Adding threading support to FSX SP1 wasn't a question of 'realization'. If you had suggested doing so at the time FS9 was released or even FSX, you would have been quite correctly shut down. Single-thread performance was continuing to ramp up every year, and 99.9% of consumers had just a single-core processor. Multi-threading on a single-core CPU actually slows things down a bit if multiple threads are CPU bound since you have to do expensive context switching.

Don't forget as well that C++ doesn't have anywhere near the same amount of threading constructs as a modern language such as C# or Java. Multi-threading code in those languages is something that an engineering manager or architect won't take lightly, since it adds cost not just in terms of development but also ongoing maintenance. To rewrite something multi-threaded in C++ in an old code base? I'm impressed SP1 was as stable as it was, but that too is likely because they were conservative.

The assertion that X-Plane sends AI to other cores needs qualification. First, it's worth asking what exactly XP is doing (modeling, render, etc.), and it's a different engine than P3D. Have LM engineers had conversations about shifting the work around? Probably extensive conversations, but it again goes back to what is the cost and effort, likelihood of success and risk to the stability of the platform, plus the elephant in the room - what do L-M's "other" customers want from the sim?

I've come to my conclusions because I've been a software engineer and now an engineering manager for over a quarter century. I don't claim specific insight into LM, but their actions look exactly like a team that is slowly, conservatively modifying a legacy codebase. To suggest it merely needs a little "imagination" is not fair to LM at all.

Cheers!

Edited by Luke

Luke Kolin

26 minutes ago, Luke said:

I've come to my conclusions because I've been a software engineer and now an engineering manager for over a quarter century. I don't claim specific insight into LM, but their actions look exactly like a team that is slowly, conservatively modifying a legacy codebase. To suggest it merely needs a little "imagination" is not fair to LM at all.

Cheers!

Bingo. I always get a chuckle when people talk about tossing the "old engine" lock, stock, and barrel. I see what you see. They are modernizing it from the "outside in", meaning that in order to get the entire rendering engine overhauled, they need to start by adding all of the supporting elements to the existing code. Things like HDR, PBR, and DL have been added, but the old lighting and texture routines are also still in place, running side by side. Eventually, at some point down the road, the legacy bits will be removed when LM stops using them.

In the end, they will toss the legacy engine, but it will be done by removing it after the new code is in place. Piece by piece.

Edited by MDFlier

 i9-10850K, ASUS TUF GAMING Z490-PLUS (WI-FI), 32GB G.SKILL DDR4-3603 / PC4-28800, EVGA GeForce RTX 2080 Ti BLACK EDITION 11GB running 3440x1440 

11 hours ago, Luke said:

You use words like "realize" and "reimagine" like the people at LM are stupid and lack any ability to consider significant changes to their product and its code. Given what they are doing, I am extremely confident that is not the case. But if you feel you are so much more imaginative than their engineers, perhaps you should apply. Everyone is looking for skilled software engineers these days.

Adding threading support to FSX SP1 wasn't a question of 'realization'. If you had suggested doing so at the time FS9 was released or even FSX, you would have been quite correctly shut down. Single-thread performance was continuing to ramp up every year, and 99.9% of consumers had just a single-core processor. Multi-threading on a single-core CPU actually slows things down a bit if multiple threads are CPU bound since you have to do expensive context switching.

Don't forget as well that C++ doesn't have anywhere near the same amount of threading constructs as a modern language such as C# or Java. Multi-threading code in those languages is something that an engineering manager or architect won't take lightly, since it adds cost not just in terms of development but also ongoing maintenance. To rewrite something multi-threaded in C++ in an old code base? I'm impressed SP1 was as stable as it was, but that too is likely because they were conservative.

The assertion that X-Plane sends AI to other cores needs qualification. First, it's worth asking what exactly XP is doing (modeling, render, etc.), and it's a different engine than P3D. Have LM engineers had conversations about shifting the work around? Probably extensive conversations, but it again goes back to what is the cost and effort, likelihood of success and risk to the stability of the platform, plus the elephant in the room - what do L-M's "other" customers want from the sim?

I've come to my conclusions because I've been a software engineer and now an engineering manager for over a quarter century. I don't claim specific insight into LM, but their actions look exactly like a team that is slowly, conservatively modifying a legacy codebase. To suggest it merely needs a little "imagination" is not fair to LM at all.

Cheers!

I'm not talking about LM or LR in particular, so I'm not talking about rewriting a code base. I'm talking about where the untapped processing power lives, and if that means moving to another programming language that is better suited to parallel processing, that's where the future lies. My assertion was simply passing on what I read many years ago, I believe on their website, about XP and AI. Further, I never said all LM needs is a little imagination; that is how you reacted to what I wrote. I said "But yes, it appears it will take some serious outside of the box thinking to do more with parallel processing in a flight sim--but that is where the most potential processing power remains it appears." I said nothing about LM except that they are the developer with deep pockets, so yes, it would be great to see an entity w/ those potential resources develop a modern FS that maximizes multithreading in whatever languages facilitate that the most because, once again, that is where the processing power lies.


Noel

This topic is now closed to further replies.