MammyJammy

New P3D 5.3+ Affinity Mask Calculator

@kevinfirth. Thanks for explaining things. Crikey, it does get complicated doesn’t it.

I agree that leaving core 0 free is a waste because the OS doesn’t appear to use it. That’s just 1 core, or 2 VPs, sitting there doing bugger all. Odd that the spreadsheet offers it as an option. Maybe it’s meant for CPUs with more cores.

I’ll change my settings tomorrow. Back to the footie now. 😉


Ray (Cheshire, England).
System: P3D v5.3HF2, Intel i9-13900K, MSI 4090 GAMING X TRIO 24G, Crucial T700 4Tb M.2 SSD, Asus ROG Maximus Z790 Hero, 32Gb Corsair Vengeance DDR5 6000Mhz RAM, Win 11 Pro 64-bit, BenQ PD3200U 32” UHD monitor, Fulcrum One yoke.
Cheadle Hulme Weather

Hi Steve, this is great information! Here is what I am using and my sim has never been smoother:

AffinityMask=4095
P3DCoreAffinityMask=1020
MainThreadScheduler=0
RenderThreadScheduler=2
FrameWorkerThreadScheduler=4

In the NCP (NVIDIA Control Panel), in the Prepar3D.exe profile, I have "Vertical sync" set to Adaptive (half refresh rate) and Maximum Frame Rate set to 30 FPS.

In the P3D "Display Settings" I have the fps slider control set to Unlocked and VSync=OFF (in your example above it is set to ON). I'm not sure if I should follow your lead and change the Display Settings to VSync=ON. Just curious as to what you think. I am getting good results performance-wise. Like they say, "if it ain't broke, don't fix it!".

Appreciate your Insights.

Regards,

Tom


i913900KF (5.8GHz) | Case: Fractal PopAir RGB I MSI Z790-VC | MSI Gaming RTX 4070Ti Super 16GB | Kingston Fury Beast 32GB DDR5 | SOLIDIGM P41 Plus 2TB NVMe M.2 SSD | Samsung SSD 870 EVO 2TB | Thermalright Frozen Notte 240 MM Liquid Cooling | Samsung 41" Monitor 1920 x 1080 60Hz | Honeycomb Alpha & Bravo | Logitech G Pro pedals | Tobii EyeTracker | 850W Thermaltake 80+ GOLD |

2 hours ago, Ray Proudfoot said:

Hi Steve, appreciate the reply. I notice on the first three cores (6 VPs) you’re using only one VP on each. Is that to allow sharing with the OS?

Steve has written previously that even though Hyperthreading gives you 2 LPs, they actually share 1 real core. Therefore, as I understand it, if both LPs for, say, core 2 are at 50%, that consumes the entire core's capability. So in reality, if you use LP2 and not LP3 in the above recommendation, it allows 100% of that core to be used by the Main Thread Scheduler only, rather than sharing Core 2 with background P3D rendering tasks or anything else, assuming you've used something like Process Lasso or SimStarter to keep other things off LP2 and LP3 in this example. I guess if LP2 never exceeded 50%, and any P3D rendering or other activity on LP3 (if it were activated) never exceeded 50% either, then yes, maybe that virtual LP was wasted. However, whenever the total of the two LPs, or the single LP, hits 100%, stutters may occur. As I understand it, hyperthreading does not double your computing power. If you look back or search Steve's posts, he explains this in detail with math examples.
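To put numbers on that, here is a minimal Python sketch of the simplified model used in this thread, in which the loads of the two LPs on one core simply add up. That additive model and the function names are assumptions made for illustration, not a description of how the CPU actually schedules work.

```python
def core_busy_percent(lp_loads):
    """Share of one physical core consumed by its LPs.

    Assumption (the thread's simplified model): the loads of sibling
    LPs add together and the core saturates at 100%.
    """
    return min(100.0, sum(lp_loads))


def headroom_percent(lp_loads):
    """Capacity left on the core before it saturates (0 means stutter risk)."""
    return 100.0 - core_busy_percent(lp_loads)


if __name__ == "__main__":
    # Both LPs of one core at 50% each: the core has nothing left.
    print(headroom_percent([50, 50]))
    # Main task alone on the core at 70%, sibling LP masked off.
    print(headroom_percent([70]))
```

Under this model the second case still has spare capacity, which is the point of keeping the sister LP empty.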

I hope I have explained what I think I have learned from Steve accurately and well enough.

Joe


Joe (Southern California)

SystemI9-9900KS @5.1Ghz/ Corsair H115i / Gigabyte A-390 Master / EVGA RTX 2080 Ti FTW3 Hybrid w 11Gb / Trident 32Gb DDR4-3200 C14 / Evo 970 2Tb M.2 / Samsung 40inch TV 40ku6300 4K w/ Native 30 hz capability  / Corsair AX850 PS / VKB Gunfighter Pro / Virpil MongoosT-50 Throttle / MFG Crosswind Pedals /   LINDA, VoiceAttack, ChasePlane, AIG AI, MCE, FFTF, Pilot2ATC, HP Reverb G2

3 hours ago, Ray Proudfoot said:

The references to 16 cores should be 12 of course. 😉

I’m running my monitor at 30Hz. Still undecided if HT On is better than HT Off. Is there any real improvement with one over the other with 6 cores?

The reference to 16 should have said 6, as it was mistyped. You have 6 cores whether HT is on or off. The difference with HT on is that the sim can put two tasks on each core, one per LP.

The advantage of HT enabled is to gather scenery more quickly, as Kevin explained. The Affinity settings are there to ensure we don't share a main task's core between two tasks, one per LP, so that we maintain our fps capacity. With HT off it is impossible to share the core, as there is only one LP per core, and the core is guaranteed to provide 100% of its throughput. With HT enabled and an AM that allows two tasks per core, the main task cannot have full performance.

 

1 hour ago, TomCYYZ said:

Hi Steve, this is great information! Here is what I am using and my sim has never been smoother:

Appreciate your Insights.

I tested that, and because each main task shares its core with a scenery loader, gathering scenery drains performance from the main task and the fps capacity is reduced during the collection. It is better to think in terms of the time between frames changing as scenery loads rather than fps. With VSync=On you might not see problems unless you are in dense areas of scenery, where the main core reaches its limit because it is carrying both the main task and a scenery loader, and fps drops below the VSync rate.

The two examples I provided Ray work quite a bit better than 4095,1020,0,2,4, which ignores the basic requirement to put the main tasks on cores to themselves.
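The sharing can be checked with a short Python sketch. It assumes two things that are not spelled out at this point in the thread: that each scheduler value is an index into the LPs enabled in P3DCoreAffinityMask, counted from the lowest bit, and that LPs 2n and 2n+1 belong to the same HT core. The function names are made up for illustration.

```python
def enabled_lps(mask):
    """Return the logical processors (bit positions) set to 1 in a mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def sibling(lp):
    """The other LP on the same HT core, assuming LPs 2n and 2n+1 pair up."""
    return lp ^ 1


def place_main_tasks(core_mask, schedulers):
    """Map each scheduler value to an LP.

    Assumption: a scheduler value is an index into the LPs enabled in
    P3DCoreAffinityMask, counted from the lowest bit.
    """
    lps = enabled_lps(core_mask)
    return {name: lps[index] for name, index in schedulers.items()}


def shared_main_tasks(core_mask, schedulers):
    """Names of main tasks whose sister LP is also open to the sim."""
    open_lps = set(enabled_lps(core_mask))
    placed = place_main_tasks(core_mask, schedulers)
    return [name for name, lp in placed.items() if sibling(lp) in open_lps]


if __name__ == "__main__":
    # The settings quoted above: 4095,1020,0,2,4
    schedulers = {"Main": 0, "Render": 2, "FrameWorker": 4}
    print(place_main_tasks(1020, schedulers))
    print(shared_main_tasks(1020, schedulers))
```

With those assumptions, all three main tasks land on LPs whose sister LP is also enabled, so each one can end up sharing its core with a scenery loader.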

Kevin and Joe have mastered the problem admirably.

 


Steve Waite: Engineer at codelegend.com


As Joe explained, with HT disabled each core presents one logical processor. With HT enabled each core appears as two. The way P3D (and FSX) works is to count the number of apparent cores and make a task on each one.

With 6 cores the sim can make 6 tasks: Main, Render, FrameWorker, and three scenery gathering tasks, one per core.

With HT enabled the sim can make 12 tasks: Main, Render, FrameWorker, and nine scenery gathering tasks. There will be two per core. If this is allowed for the main tasks then those tasks will suffer when the tasks on their sister LPs get going. We need to be flying along to see this happen, so that new scenery, models and other objects are collected and set up in the background; those take seconds to complete.

Instead we mask the second LP of three cores to contain the three main tasks, so that the sim only puts those tasks onto one LP of each core, leaving the second LP free. So we end up with three cores with one task per core, and three cores each with two scenery gathering tasks, making six. We are using 9 LPs of the 12 but we are still using all six cores.
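The counting can be reproduced with a small Python sketch. It assumes the three single-LP cores are the first three (cores 0 to 2) and that LPs 2n and 2n+1 sit on core n; the helper names are hypothetical.

```python
def mask_for_plan(cores, single_lp_cores):
    """Build a mask for `cores` HT cores.

    Assumption: the first `single_lp_cores` cores expose only their
    first LP (for the main tasks); every other core exposes both LPs.
    """
    mask = 0
    for core in range(cores):
        first_lp = core * 2
        mask |= 1 << first_lp            # first LP is always enabled
        if core >= single_lp_cores:
            mask |= 1 << (first_lp + 1)  # second LP only on loader cores
    return mask


def summarize(mask, cores):
    """Return (LPs in use, cores in use) for a mask."""
    lps_used = bin(mask).count("1")
    cores_used = sum(1 for core in range(cores) if (mask >> (core * 2)) & 0b11)
    return lps_used, cores_used


if __name__ == "__main__":
    mask = mask_for_plan(cores=6, single_lp_cores=3)
    lps_used, cores_used = summarize(mask, cores=6)
    print(mask, format(mask, "012b"))
    print(lps_used, "LPs on", cores_used, "cores;",
          lps_used - 3, "scenery gathering tasks")
```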

The question was asked about other processes using those free LPs. They are not really free: the core is in use, and whether we have HT on or off the thousands of threads on the system use all the cores anyway. But when the sim is running we don't run other things, and if we do have add-ons running we can control where they run by corralling them onto the LPs running the scenery loaders.

Scenery loaders work in parallel, two per core, which accelerates scenery loading and improves stability in the main tasks, and that helps maintain fps. Main tasks are monolithic and require access to maximum throughput, so we don't want other tasks on those cores, as should be self-evident.

 


Steve Waite: Engineer at codelegend.com

2 hours ago, joepoway said:

Steve has written previously how even though with Hyperthreading you have 2 LPs they actually share 1 real core.

Agreed. So far, so good. 👍

2 hours ago, joepoway said:

so in reality if you use LP2 and not LP3 in the above recommendation it allows 100% of that core to be used by the Main Thread Scheduler only and not share Core 2 with background P3D rendering tasks

Is that based on the second scenario Steve described?...

MainThreadScheduler = 2 = core 1 LP2

I understand the basics of using each core / vp. It’s when it comes to the Main, Render and Frameworker entries plus how scenery loading works in conjunction with those it gets tricky. Does P3D use any cores / VPs for loading scenery that aren’t in use by those three entries?

2 hours ago, joepoway said:

assuming you've used something like Process Lasso or SImStarter to keep other things off LP2 and LP3

I do use SimStarterNG to assign my executables to specific VPs, generally 10 and 11. There are four, so two each to 10 and 11. Is it okay to share those VPs with P3D?

2 hours ago, joepoway said:

As I understand it hyperthreading does not double your computing power.

That’s my understanding too. Where I’m not sure is where Hyperthreading ON becomes better than it being OFF or vice versa. Is it dependent on a certain number of cores?


Ray (Cheshire, England).

Let's look at a HyperThreading-enabled core:

We get two Logical Processors (LPs) emulated by that one core, with a little extra circuitry on the core that allows each to read data and instructions simultaneously. However, the core can only be doing the work of one LP at a time, computing those instructions and delivering the result data. A non-hyperthreading core does not continue reading data and instructions for other threads; it is only one LP.

The two HT LPs are time-sliced on the core so that they appear to work at the same time but, as I have said, the core only computes for one at a time. To do that, the core maintains the current situation, which includes registers, code and so on, for both LPs.

The HyperThreading core keeps hold of the situation for two LPs ready for when it switches between them, so there's no time lost. A non-hyperthreading core must store and recall the situation at each time-slice.

Put simply, two threads, one per LP on an HT core, finish sooner than those same two threads on a non-HT core with its single LP.

So to use that extra HyperThreading performance, to consume two threads faster with two LPs, we run parallel tasks on them! But if we want to run a time sensitive task that might require anything up to 100% of the capacity of that core, we avoid allowing the other LP to run a task so that the core is not shared.

As Joe said, if two LPs of an HT core are at 100% they are each actually only getting half the time of that core.

In P3D we have three tasks that are time sensitive. We allow one per core, using three cores if we have them. On those cores we can't use the parallel power of HT. All the remaining LPs of the other cores will have parallel tasks, two per core, where they compute faster on the same bus bandwidth. We can see this when we enable HT: we get much faster file loading and networking on the system because those tasks are highly threaded.

With fewer HT cores (four is a good example) we can allow the two secondary main tasks (Render and FrameWorker) to go on one LP each of the same core. If they have to share, they work better together on a core because their demands are smooth. This is better than sharing with a scenery gathering task because those spike a lot in their demand.

The main task (MainThreadScheduler) is the one that sets up the fps, or rather sets the time to compute each frame. If we reduce the performance of that compute cycle the time between each frame takes longer. If that core reaches 100% there is no more time available in the core to compute the frame and then as objects and scenery increase in complexity and demand, the time between frames increases, reducing the fps. So we basically must protect that task with a core to itself if we want maximum detail in the sim. We make settings so that the core reaches just below 100% in the denser areas where we fly and fps is maintained with overhead in the core.
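The relationship between the main task's compute time and fps can be sketched in a few lines of Python. This assumes the main task is the only bottleneck and that a frame cannot be delivered faster than the frame-rate limit allows; the numbers are illustrative, not measurements.

```python
def frame_time_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def achieved_fps(target_fps, main_task_ms):
    """Frame rate actually delivered.

    Assumption: the main task needs `main_task_ms` of core time per
    frame and nothing else limits the frame rate below the target.
    """
    return min(float(target_fps), 1000.0 / main_task_ms)


if __name__ == "__main__":
    print(round(frame_time_ms(30), 2))   # budget per frame at 30 fps
    print(achieved_fps(30, 25.0))        # main task fits inside the budget
    print(achieved_fps(30, 40.0))        # main task overruns, fps falls
```

Once the per-frame compute time exceeds the budget, every extra millisecond of work (for example from a scenery loader on the sister LP) shows up directly as a longer time between frames.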

Meanwhile, we have as many scenery gathering tasks as we can get; two per core works better than the one per core we would have with HT disabled. When reading simulator files and arranging the objects and scenery there's no contest: HT disabled cannot keep up with HT enabled.

With the main task we should get basically the same performance from that core whether HT is enabled or disabled. We do not if we share the core, and mismanaged affinity produced the HT=OFF phenomenon, because HT off guarantees the main task gets all 100% of its core with no AM required. With HT enabled we must apply an AM=01 to that MainThreadScheduler core so that it appears to P3D to have only one LP.


Steve Waite: Engineer at codelegend.com

48 minutes ago, Ray Proudfoot said:

Is that based on the second scenario Steve described?...

MainThreadScheduler = 2 = core 1 LP2

I understand the basics of using each core / vp. It’s when it comes to the Main, Render and Frameworker entries plus how scenery loading works in conjunction with those it gets tricky. Does P3D use any cores / VPs for loading scenery that aren’t in use by those three entries?

Yes Ray I referred in my explanation to his second scenario, you are correct.

However I would not use that scenario I would use the first recommendation since I have seen no real benefit from leaving core 0 for the operating system.

I should note I use an i9-9900ks with hyperthreading On so I have more cores but the principle is the same.

In the recommended scenario 1 :

11,11,11,01,01,01=AffinityMask = 4053
11,11,11,01,01,01=P3DCoreAffinityMask = 4053
00,00,00,00,00,01=MainThreadScheduler = 0 = core 0 
00,00,00,00,01,00=RenderThreadScheduler = 1 = core 1 
00,00,00,01,00,00=FrameWorkerThreadScheduler = 2 = core 2 
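For anyone who wants to check the decimal values, here is a minimal Python sketch that converts the comma-separated pair notation used above (one pair per core, highest core on the left) to a decimal mask and back. The function names are hypothetical.

```python
def pairs_to_mask(pairs):
    """Convert a string such as '11,11,11,01,01,01' to a decimal mask."""
    bits = pairs.replace(",", "").replace(" ", "")
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected only 0, 1 and commas: %r" % pairs)
    return int(bits, 2)


def mask_to_pairs(mask, cores):
    """Convert a decimal mask back to one two-bit pair per HT core."""
    bits = format(mask, "0{}b".format(cores * 2))
    return ",".join(bits[i:i + 2] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(pairs_to_mask("11,11,11,01,01,01"))
    print(mask_to_pairs(4053, cores=6))
```

Note that the three scheduler rows in the list mark which LP each main task ends up on, while the values actually written to the settings are 0, 1 and 2.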

In simple terms (I hope) from right to left in the above example think of the logical processors as P1, P2, P3, P4 and so on to P12 for 6 cores with HT 

MainThreadScheduler= 0  puts  Main thread on P1 which is on the first of 6 cores

P3D is told with AM=4053 not to use P2 (HT of first core) therefore no scenery rendering etc is put on the same shared core as MainThread

RenderThreadScheduler = 1 puts  Render Thread on P3 which is on the second of the 6 cores

P3D is told with AM=4053 not to use P4 (HT of second core) therefore no scenery rendering etc is put on the same shared core as RenderThread

FrameWorkerThreadScheduler = 2 puts FrameWorkerThread on P5 which is on the third of 6 cores

P3D is told with AM=4053 not to use P6 (HT of third core) therefore no scenery rendering etc is put on the same shared core as FrameWorker

By doing this you are directing P3D where the main threads should be run and run on separate cores (new feature with v5.3) and keeping any other P3D activities or programs launched by P3D from using the "sister HT's"  (in my terms P2,P4, and P6) of those three primary thread cores in use.

Additionally I recommend using a utility to corral other high-use add-ons off the first three cores, including the "sister HT's".

That would mean nothing else on P1, P2, P3, P4, P5, and P6, which is my "plain English" reference. It's really C0, C1, C2, C3, C4 and C5 in things like Process Lasso, if that makes sense.

I hope this helps and doesn't further confuse things. The way cores, LPs, and HT with all the affinity mask stuff are described is very confusing and I hope I haven't added to the confusion 🙂

Lastly, regarding HT on or off, there's a million opinions, but mine is that if you overclock, it's stable and you don't have heat issues, I would use HT. But as stated above, keep things off the "HT portions" of the first three cores assigned to the three main threads of P3D.

Good luck

Joe


Joe (Southern California)


Awesome post Joe!


Steve Waite: Engineer at codelegend.com


Thanks Steve & Joe, Great advice!  I am going to go with Scenario 1.

 


The P3DCoreAffinityMask in v5.3+ is easiest to consider as being the old fashioned FSX/P3D AffinityMask which as we know confines the simulator tasks to those LPs unmasked with a "one" in the binary notation.

If our binary is "000011", decimal 3, for a six core CPU we are allowing only two cores, 0 and 1, the first and second cores reading from right to left.

In FSX we might set up an AM that uses six cores of eight "00111111", decimal 63, cores 0 to 5. The sim would start up on those cores, three scenery loaders and three main tasks. Incidentally that sounds the same as P3D, and it is more or less, but some work has moved around between those threads  to produce a more even spread of work between them.

After FSX has done that, allocated out the tasks, it expands affinity to all cores, "11111111", decimal 255, cores 0 to 7. That is so that if another process is started by FSX, example from exe.xml, that process "sees" all 8 cores.
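The binary and decimal figures in the last few paragraphs can be verified with a short Python sketch. Cores are read right to left, so the rightmost digit is core 0. The function names are made up for illustration.

```python
def cores_in(mask_bits):
    """List the core numbers allowed by a binary mask string (right to left)."""
    return [i for i, bit in enumerate(reversed(mask_bits)) if bit == "1"]


def all_cores_mask(core_count):
    """Mask with every core enabled, as after the expansion described above."""
    return (1 << core_count) - 1


if __name__ == "__main__":
    print(int("000011", 2), cores_in("000011"))
    print(int("00111111", 2), cores_in("00111111"))
    print(all_cores_mask(8), format(all_cores_mask(8), "08b"))
```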

In Ideal Flight this is called EFSA (Expanded Flight Sim Affinity).

With P3D (prior to v5.3+), those processes started from within P3D were confined to whatever AffinityMask we applied, guaranteeing they use cores of the simulator.

Now, with P3D v5.3+ we can expand the affinity from the confinement of "P3DCoreAffinityMask" with "AffinityMask". Processes started from within the sim get access to those cores in the expanded AffinityMask.

Just like FSX, but better still, we can arrange it to not allow processes onto our sensitive main task cores. In "AffinityMask" we must avoid sharing those three important cores, as we do in "P3DCoreAffinityMask", and if we have more cores we can extend "AffinityMask" beyond the cores allocated in "P3DCoreAffinityMask" for add-ons. However, the manual reminds us that it is the responsibility of the developer of the add-on to arrange for it to use the best cores.
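A rough Python sketch of the two checks described here follows. It assumes that "AffinityMask" should contain every LP in "P3DCoreAffinityMask", and that LPs 2n and 2n+1 are sisters on one HT core. The function name and messages are hypothetical.

```python
def check_masks(affinity_mask, core_mask, main_lps):
    """Return a list of problems found in a pair of masks.

    Assumptions: AffinityMask is expected to be a superset of
    P3DCoreAffinityMask, and the sister of LP n is LP n ^ 1.
    """
    problems = []
    if core_mask & ~affinity_mask:
        problems.append("P3DCoreAffinityMask uses LPs outside AffinityMask")
    for lp in main_lps:
        sister = lp ^ 1
        if (affinity_mask >> sister) & 1:
            problems.append(
                "LP{} is open and shares a core with the main task on LP{}".format(
                    sister, lp))
    return problems


if __name__ == "__main__":
    # Masks of 4053 with main tasks on LPs 0, 2 and 4.
    print(check_masks(4053, 4053, [0, 2, 4]))
    # AffinityMask=4095 with main tasks on LPs 2, 4 and 6.
    for problem in check_masks(4095, 1020, [2, 4, 6]):
        print(problem)
```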

If the add-on doesn't do Core Affinity we can start it ourselves with a batch file or with an external affinity steering program like Process Lasso so that it is not allowed onto those sensitive cores. We can allow programs to run on those cores with the scenery tasks and we allow at least two per program.
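As a sketch of the batch-file route, the Python below builds a Windows `start /affinity` line, which takes the mask in hexadecimal. The executable path is a made-up placeholder, and the choice of LPs 10 and 11 is only an example. It covers programs we start ourselves, not ones launched from inside the sim.

```python
def lps_to_mask(lps):
    """Combine a list of LP numbers into one affinity mask."""
    mask = 0
    for lp in lps:
        mask |= 1 << lp
    return mask


def start_command(exe_path, lps):
    """Build a `start /affinity` line for a Windows batch file.

    The empty "" is the window title argument that `start` expects
    before a quoted program path.
    """
    hex_mask = format(lps_to_mask(lps), "X")
    return 'start "" /affinity {} "{}"'.format(hex_mask, exe_path)


if __name__ == "__main__":
    # Hypothetical add-on path, confined to LPs 10 and 11.
    print(start_command(r"C:\Tools\addon.exe", [10, 11]))
```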

 


Steve Waite: Engineer at codelegend.com


@joepoway, that is the best explanation I have read with all due respect to Steve who has also been extremely helpful.

I now fully understand the purpose of each command in the JobScheduler section. Your post really does deserve its own topic so it can be placed in the Tips & Tricks section. Would you create a new topic with that info and I’ll do the rest.


Ray (Cheshire, England).
6 hours ago, SteveW said:

If the add-on doesn't do Core Affinity we can start it ourselves with a batch file or with an external affinity steering program like Process Lasso so that it is not allowed onto those sensitive cores. We can allow programs to run on those cores with the scenery tasks and we allow at least two per program.

The executables launched via entries in the two add-on.cfg files remain outside direct control. One example is couatl64.exe - part of FSDT - so each time I have to manually change its Affinity setting.

Any clues as to how that can be automated? Currently it inherits the same AM as P3D using all cores / VPs.


Ray (Cheshire, England).
8 minutes ago, airservices said:

I don't use AM and my sim runs smooth 👍

I can assure you you do. Look in Prepar3D.cfg at the [JobScheduler] section. Those entries confirm an AM has been set by P3D on launch.
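For anyone who wants to read those entries with a script, here is a minimal Python sketch using the standard `configparser` module. The sample text is an assumption: it reuses the Scenario 1 values quoted earlier in the thread rather than a real Prepar3D.cfg, and a real file may need a different encoding or contain lines this parser does not accept.

```python
import configparser

# Sample text only: the values are the Scenario 1 settings from this thread.
SAMPLE_CFG = """\
[JobScheduler]
AffinityMask=4053
P3DCoreAffinityMask=4053
MainThreadScheduler=0
RenderThreadScheduler=1
FrameWorkerThreadScheduler=2
"""


def read_job_scheduler(text):
    """Return the [JobScheduler] entries of a cfg text as integers."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(text)
    return {key: int(value) for key, value in parser["JobScheduler"].items()}


if __name__ == "__main__":
    for key, value in read_job_scheduler(SAMPLE_CFG).items():
        print(key, "=", value)
```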


Ray (Cheshire, England).
