ComSimPilot

P3Dv5 CPU utilizes all cores - an observation


Hi all,

I have recently upgraded from the 7700K to the 10900K and I would like to share this observation. For years we have been complaining that FSX/P3D do not utilize all CPU threads. The image below proves that P3Dv5 is using all cores, a very positive sign of how much P3D has evolved since its first iteration.

[Image: Sjylugh.png]


Simulators: Prepar3D v5 Academic | X-Plane 11.50+ | DCS World Open Beta | MSFS 2020 Premium Deluxe |
PC Hardware: Dell U3417W | Intel i9 10900K | MSI RTX 2080 Ti Gaming X Trio | MSI MPG Z490 Gaming Edge WiFi | G.Skill 32GB 3600MHz CL16 | Samsung 970 EVO Plus + 860 EVO + 850 EVO x 1TB | Western Digital Black Caviar Black x 6TB | Corsair RM1000i | Corsair H115i Platinum | Fractal Design Define S2 Gunmetal |
Flight Controls: Fulcrum One Yoke | Virpil VPC WarBRD Base | Virpil VPC MongoosT-50CM Grip | Thrustmaster Warthog + F/A-18C Grip | Thrustmaster TPR Rudder Pedals | Virtual Fly TQ6+ Throttle Quadrant | Sismo B737 Max Gear Lever | TrackIR 5 | Monstertech Desk Mounts |
My fleet catalog: Link                                                                                                                                                       


It did in P3D v4 also. Just sayin'.
(Ryzen 1700X and now Ryzen 3700X user). 😊
 


AMD Ryzen 5800X3D; MSI RTX 3080 Ti VENTUS 3X; 32GB Corsair 3200 MHz; ASUS VG35VQ 35" (3440 x 1440)
Fulcrum One yoke; Thrustmaster TCA Captain Pack Airbus edition; MFG Crosswind rudder pedals; CPFlight MCP 737; Logitech FIP x3; TrackIR

MSFS; Fenix A320; A2A PA-24; HPG H145; PMDG 737-600; AIG; RealTraffic; PSXTraffic; FSiPanel; REX AccuSeason Adv; FSDT GSX Pro; FS2Crew RAAS Pro; FS-ATC Chatter


FSX is the same. With HT enabled you can see that both Logical Processors (LPs) of core zero are nearly maxed, so the main rendering thread is sharing the bandwidth of core zero with a second thread. With FSX and P3D we need to see only the first LP of a core utilised. We use an Affinity Mask (AM) to do that; there's lots of info around the site regarding this problem.


Steve Waite: Engineer at codelegend.com

SteveW said:

FSX is the same. With HT enabled you can see that both Logical Processors (LP) of core zero are nearly maxed therefore the main rendering thread is sharing bandwidth of core zero with the second thread. With FSX and P3D we need to see only the first LP utilised. We use an Affinity Mask to do that, there's lots of info around the site regarding this problem.

Steve, thanks for the reply. I'm trying to understand your comment: do you mean that the first LP is not fully utilized? As I see it, my processor is more than 90% utilized, which is a good sign, isn't it? Would you suggest a specific AM for the 10900K? Currently I run with no AM set in the .cfg file.



No. I am saying that the first two LPs of HT core 0, the two top-leftmost graphs, are both nearly fully utilised. That means they are each getting only about 50% of the core's possible throughput. So what we do is use an AM with an "01" for each HT core, so that these tasks get up to 100% of the core bandwidth because the core is not shared.

We could turn HT off, which leaves only one LP per core and so avoids sharing the core.

Rather than disable HT we can use the AM to enable only one LP per core.

With HT disabled we get ten cores: 1111111111

With HT enabled we get 20 LPs: 01,01,01,01,01,01,01,01,01,01

So instead of disabling HT we could use an AM of 349525, and we are still using all ten cores.

With the ten-core CPU we get 20 LPs with HT enabled, two per core. It is still only ten cores. Even so, we might only want to use 8 cores and leave two cores (4 LPs) for the system, giving an example AM of 00,00,01,01,01,01,01,01,01,01. Note that each pair, separated by commas, belongs to one core.

So we are using 8 cores (8 LPs) for P3D and leaving two cores (4 LPs) for the system. Remember that the "01" on the far right represents the two top-left graphs. We can copy that list of 01s (with the commas removed) into the binary field of the Windows Calculator set to Programmer mode, and we get the decimal value 21845. So we can use an AM of 21845 for the ten-core CPU and get good performance.
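For anyone who would rather script the Calculator step, here is a minimal Python sketch of the same conversion. It is not part of P3D or any tool mentioned in this thread; the function names are made up for illustration, and it assumes the layout described above: the rightmost comma-separated group is core 0, and within an HT group the rightmost digit is that core's first LP.

```python
# Sketch: convert the comma-delimited AM notation used in this thread into
# the decimal value. Assumes the rightmost group is core 0 and, within a
# group, the rightmost digit is that core's first LP.


def mask_from_pattern(pattern: str) -> int:
    """Turn e.g. "00,00,01,01,01,01,01,01,01,01" into its decimal value.

    The commas (or dots) only separate cores for readability; they are
    removed before the remaining binary digits are converted.
    """
    bits = pattern.replace(",", "").replace(".", "").strip()
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError(f"not a binary affinity pattern: {pattern!r}")
    return int(bits, 2)


def one_lp_per_core(cores_used: int, total_cores: int) -> str:
    """Pattern enabling only the first LP ("01") on the lowest `cores_used`
    cores of an HT-enabled CPU, and nothing ("00") on the remaining cores."""
    if not 0 < cores_used <= total_cores:
        raise ValueError("cores_used must be between 1 and total_cores")
    groups = ["00"] * (total_cores - cores_used) + ["01"] * cores_used
    return ",".join(groups)


if __name__ == "__main__":
    for used in (10, 8):
        pattern = one_lp_per_core(used, 10)
        print(pattern, mask_from_pattern(pattern))
    # 01,01,01,01,01,01,01,01,01,01 349525
    # 00,00,01,01,01,01,01,01,01,01 21845
```

The two printed values match the 349525 (all ten cores, one LP each) and 21845 (eight cores, two left for the system) worked out by hand above.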



But you are quite right to point out that P3D will use every logical processor it finds.

With Hyperthreading (HT) disabled, P3D will find all ten cores, since there is only one LP per core, and will create a task on each LP. However, when HT is enabled there are two LPs per core and we see P3D filling all 20 LPs with 20 tasks; that is still the ten cores we started with, each core now shared between two tasks. With a movie converter we probably want all 20 LPs. But with P3D (and FSX) we have one task that renders the screen while the remaining tasks pull in the data.

We want to be sure that the main rendering task gets 100% of its core. By applying the AM in HT mode we can enable only the first of the two leftmost graphs (the rightmost "01" in the binary AM) without switching off HT.
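To check which LPs a given decimal AM actually enables, the mask can be decoded bit by bit. A minimal Python sketch, assuming (as in the posts above) that LP 2k and LP 2k+1 are the two LPs of core k, so bit 0 is LP0, the top-left graph; the helper names are hypothetical.

```python
# Sketch: list which logical processors a decimal AM enables.
# Assumption: LP 2k and LP 2k+1 are the two LPs of core k, so bit 0 of the
# mask is LP0 (the first LP of core 0).
from typing import List, Tuple


def enabled_lps(mask: int) -> List[int]:
    """Logical-processor numbers whose bit is set in `mask` (bit 0 = LP0)."""
    return [lp for lp in range(mask.bit_length()) if (mask >> lp) & 1]


def core_and_slot(lp: int, lps_per_core: int = 2) -> Tuple[int, int]:
    """Map an LP number to (core, slot); slot 0 is the core's first LP."""
    return divmod(lp, lps_per_core)


if __name__ == "__main__":
    for lp in enabled_lps(21845):
        core, slot = core_and_slot(lp)
        print(f"LP{lp}: core {core}, {'first' if slot == 0 else 'second'} LP")
```

For 21845 it lists LP0, LP2, ... LP14, i.e. the first LP of cores 0 to 7 only, so LP1 (the second graph from the top left) stays idle and, following the reasoning above, the main rendering task on LP0 does not share core 0.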

Each task that is set up also adds overhead on the CPU, so we don't necessarily want too many tasks: beyond a certain point the system is saturated, and no number of extra tasks will pull in data any faster. Again, disabling HT halves the task count; alternatively, with HT left enabled, we restrict the task count with the AM.


Well, this whole discussion has me scratching my head.  The pic posted by the OP shows no activity on LPs 1, 12, and 15, which is odd with no AM unless the OP is using per-core hyperthreading on the 10900K to disable HT on physical cores 0, 6, and 7.

And Steve...I'm having a hard time connecting the poster's original thesis with your posts here...you talk about a "problem", but the OP is saying the opposite--that the software is finally using all the LPs (though it isn't in this case).

One question comes to mind as I try to make sense of this...if you do use per-core HT, I wonder what the affinity mask looks like (e.g. if HT is disabled on cores 0 and 1 and enabled on the other 8, does the affinity mask reflect an 18-LP processor?)

Bob Scott | President and CEO, AVSIM Inc
ATP Gulfstream II-III-IV-V

System1 (P3Dv5/v4): i9-13900KS @ 6.0GHz, water 2x360mm, ASUS Z790 Hero, 32GB GSkill 7800MHz CAS36, ASUS RTX4090
Samsung 55" JS8500 4K TV@30Hz,
3x 2TB WD SN850X 1x 4TB Crucial P3 M.2 NVME SSD, EVGA 1600T2 PSU, 1.2Gbps internet
Fiber link to Yamaha RX-V467 Home Theater Receiver, Polk/Klipsch 6" bookshelf speakers, Polk 12" subwoofer, 12.9" iPad Pro
PFC yoke/throttle quad/pedals with custom Hall sensor retrofit, Thermaltake View 71 case, Stream Deck XL button box

Sys2 (MSFS/XPlane): i9-10900K @ 5.1GHz, 32GB 3600/15, nVidia RTX4090FE, Alienware AW3821DW 38" 21:9 GSync, EVGA 1000P2
Thrustmaster TCA Boeing Yoke, TCA Airbus Sidestick, 2x TCA Airbus Throttle quads, PFC Cirrus Pedals, Coolermaster HAF932 case

Portable Sys3 (P3Dv4/FSX/DCS): i9-9900K @ 5.0 Ghz, Noctua NH-D15, 32GB 3200/16, EVGA RTX3090, Dell S2417DG 24" GSync
Corsair RM850x PSU, TM TCA Officer Pack, Saitek combat pedals, TM Warthog HOTAS, Coolermaster HAF XB case


In the image the two top-leftmost graphs are nearly maxed; that's two LPs sharing one core. Maybe look again.



...I double-checked, and sure enough LP0 (top left) is almost maxed and LP1 (just right of top left) is fully maxed. Both of those LPs are getting around 50% of the core throughput.

Disabling HT would allow only one task per core. Using the AM method of "01" for each core also allows only one task per core.

The overall CPU utilization is also showing around 94% because most of the 20 LPs are maxed.


w6kd said:

if you do use per-core HT, I wonder what the affinity mask looks like (e.g. if HT is disabled on cores 0 and 1, and enabled on the other 8, does the affinity mask reflect an 18 LP processor?

First of all, always use LP0 (or call it core zero if you like, with HT disabled). Generally HT is switched on or off across the whole CPU. However, with HT disabled on cores 0 and 1 and enabled on all the remaining cores, the CPU looks like an 18-LP CPU (still ten cores), and you would use an AM of 01,01,01,01,01,01,01,01,1,1, the two rightmost 1s representing cores 0 and 1. Always use the comma- or dot-delimited notation when using or mixing in HT.
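The mixed case can be worked through the same way. This Python sketch assumes Windows numbers the LPs core by core starting from core 0, so a core with HT disabled contributes one LP and a core with HT enabled contributes two; `first_lp_mask` and `pattern` are hypothetical helper names, not part of any P3D tool.

```python
# Sketch: AM that enables only the first (or only) LP of every core when
# HT is enabled on some cores and disabled on others.
# Assumption: LPs are numbered core by core, starting from core 0.
from typing import List


def first_lp_mask(ht_per_core: List[bool]) -> int:
    """ht_per_core[i] is True when core i has HT enabled (two LPs) and
    False when HT is disabled on it (one LP)."""
    mask, bit = 0, 0
    for ht in ht_per_core:
        mask |= 1 << bit         # enable the first LP of this core
        bit += 2 if ht else 1    # skip over the second LP if HT is on
    return mask


def pattern(ht_per_core: List[bool]) -> str:
    """Comma-delimited form, core 0 rightmost: "01" for an HT core, "1" otherwise."""
    return ",".join("01" if ht else "1" for ht in reversed(ht_per_core))


if __name__ == "__main__":
    cores = [False, False] + [True] * 8   # HT off on cores 0 and 1
    print(pattern(cores), first_lp_mask(cores))
    # prints: 01,01,01,01,01,01,01,01,1,1 87383
```

With HT off on cores 0 and 1 the LP count comes to 18, and the printed pattern matches the 01,01,01,01,01,01,01,01,1,1 notation above; with HT on everywhere the same function gives 349525.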



...in technical discussions it's also more professional, and less disconcerting, to avoid terms such as "scratching of head" and "having a hard time"; they are not necessary and put off those trying to learn something.


Going back to the example of converting a video stream, the program can render a frame on each LP. With HT enabled the program can render two frames at once on one core. Even so, those two frames are each rendered at roughly half speed because the core is shared. Still, because HT mode saves the time spent swapping the context (that is, saving and loading the registers of each thread for its share of the core time, or time slice), the total time is shorter.

Similarly, we can use that gain in performance with some of the tasks in P3D (and FSX) by allowing two tasks per core, one on each LP. We can see that P3D (and FSX) do that with every task. We avoid sharing the first task with another task on the first core (core zero) by disabling HT or by using "01".

However, with many-core CPUs (such as the ten-core) we easily reach the maximum draw on scenery data from the system without using all ten cores. With fewer cores we might gain a small amount of performance pulling in the scenery data by running two tasks per core.

A test can be made by measuring how quickly the system gets to the first render of the simulator screen. We can use a stopwatch and compare as we add cores or LPs to the AM. The time taken reduces as we add cores. At some point the decrease becomes very small: we have reached the maximum pull on the data. However, we still see small gains, as each added LP enables the data to be assembled more quickly. Since these tasks are not time-sensitive like the main rendering task, enabling pairs of HT LPs continues to reduce the time taken to get to the start of the simulation. However, as I mentioned, too many tasks become a burden on the system overall and contribute to poorer performance in the main rendering task.
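If you want to run the stopwatch test systematically, it helps to write the masks down in advance. The Python sketch below is one reading of the procedure, not a prescribed method: it first adds cores one at a time using only their first LP, then pairs up the HT LPs on every core except core 0, so the main rendering task never shares its core. The function name is hypothetical, and the timing itself is still done by hand.

```python
# Sketch: ordered list of candidate AMs for a load-time (stopwatch) test.
# Assumption: LP 2k and LP 2k+1 belong to core k; core 0 stays at "01".
from typing import List, Tuple


def stopwatch_test_masks(total_cores: int) -> List[Tuple[str, int]]:
    """Phase 1 adds cores one at a time with only their first LP ("01").
    Phase 2 then enables the second LP ("11") of each core except core 0."""
    groups = ["00"] * total_cores   # groups[i] describes core i
    steps: List[Tuple[str, int]] = []

    def record() -> None:
        text = ",".join(reversed(groups))   # core 0 is the rightmost group
        steps.append((text, int(text.replace(",", ""), 2)))

    for core in range(total_cores):         # phase 1: one LP per core
        groups[core] = "01"
        record()
    for core in range(1, total_cores):      # phase 2: pair up HT LPs
        groups[core] = "11"
        record()
    return steps


if __name__ == "__main__":
    for text, value in stopwatch_test_masks(10):
        print(f"{value:>8}  {text}")
```

For a ten-core CPU this gives 19 masks, from 1 (core 0 only) through 21845 (eight cores) and 349525 (ten cores, one LP each) up to 1048573 (every LP except LP1).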

SteveW said:

...double checked and sure enough LP0 is almost maxed (top left), and LP1 (Just right of top left) is fully maxed. Both those LPs are getting around 50% of the core throughput.

Disabling HT would allow only one task per core. Using the AM method of "01" for each core allows only one Task per core.

The overall CPU throughput is also showing around 94% because most of the 20 LPs are all maxed.

OK, the 94% overall CPU load supports that.  The graphs show the same heavy line at 0 and 100%, and I find it rare that a core would be firewalled at 100% without even a momentary dip.

SteveW said:

...in technical discussions it's also more professional and less disconcerting to avoid the terms "scratching of head", "having a hard time" and so on as are not necessary and put of those trying to learn something. 

I consider the terms quite appropriate to impart a sense of confusion about what was being said.  Maybe it's a US vs British language thing.  If anyone is truly so weak-kneed that "I'm scratching my head" (in America, a colloquialism for "this has me confused") makes them run away and hide, well, so be it. 


w6kd said:

I consider the terms quite appropriate to impart a sense of confusion about what was being said.  Maybe it's a US vs British language thing.  If anyone is truly so weak-kneed that "I'm scratching my head" (in America, a colloquialism for "this has me confused") makes them run away and hide, well, so be it. 

That is often referred to as supercilious: it only serves to make your post appear to be saying that another is wrong. But they were not.

 


Let's look at how a single core handles two identical threads:

[Image: SwitchingImproves.jpg]

On the right, the core has HT disabled (no Hyperthreading): each thread (one orange and one purple) gets a time slice, and between slices the CPU has to save the thread's state so it can be loaded again later.

Shown in a simplified way, on the left is the same core with HT enabled, with the same two threads arranged so that each thread has an LP to itself. The two threads are still time-sliced, since there is only one core, but they finish sooner with HT enabled because each thread's context is kept on its own LP rather than being saved and reloaded at every switch.
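The diagram can be turned into a toy calculation. The Python sketch below models only what the picture shows: two equal threads time-sliced on one core, with a fixed cost to save and reload a thread's context at every switch. Treating that cost as zero with HT enabled is an assumption taken from the diagram, not a description of how Hyper-Threading hardware actually schedules work, and the numbers are illustrative rather than measured.

```python
# Toy model of the diagram (not real HT behaviour): equal threads share one
# core in round-robin time slices, and each switch between threads costs
# `switch_cost_us` to save one context and load the other.


def finish_time_us(work_us: int, slice_us: int, switch_cost_us: int,
                   threads: int = 2) -> int:
    """Time in microseconds until all `threads` equal threads have finished."""
    slices_per_thread = -(-work_us // slice_us)        # ceiling division
    slices = slices_per_thread * threads
    # Round-robin alternates threads every slice; a lone thread never switches.
    switches = slices - 1 if threads > 1 else 0
    return work_us * threads + switches * switch_cost_us


if __name__ == "__main__":
    work, time_slice = 10_000, 1_000    # 10 ms per thread, 1 ms slices (made up)
    no_ht = finish_time_us(work, time_slice, switch_cost_us=50)
    with_ht = finish_time_us(work, time_slice, switch_cost_us=0)  # assumed: no save/reload
    print(no_ht, with_ht)  # 20950 20000
```

With 10 ms of work per thread, 1 ms slices and a 50 µs switch cost, the non-HT case pays for 19 switches and takes 20.95 ms instead of 20 ms; the gap grows with shorter slices or a higher switch cost.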


  • Tom Allensworth,
    Founder of AVSIM Online


  • Flight Simulation's Premier Resource!

    AVSIM is a free service to the flight simulation community. AVSIM is staffed completely by volunteers and all funds donated to AVSIM go directly back to supporting the community. Your donation here helps to pay our bandwidth costs, emergency funding, and other general costs that crop up from time to time. Thank you for your support!
