Pete Dowson

DXGI_ERROR_DEVICE_HUNG error: Is there a solution?

4 hours ago, JoeFackel said:

I saw posts where someone clocked down (!) his 1080ti and got rid of the device_hung errors. At least, he set the max. power consumption to 80%, if I recall correctly.

Yes Joe, that was me, and it was the ONLY thing that helped.  Absolutely nothing else, and I mean nothing else, worked at all.  I tried every single driver, from old to new (at least 18 of them now), brand new OS installs of current W10 and an older 1703 build, and even went back to W8.1, and still got the crashes.  I tried running a bare sim with just Orbx scenery and no addons like Active Sky or shader mods, tried the TdrLevel and TdrDelay edits in the registry, tried the ShowDeviceLostWarning=0 entry in the P3D config file, tried restoring my CPU overclock to default, and tested my RAM.  Like you, Pete, I do not overclock my GPU; it's factory overclocked, so I downclocked the core and memory speeds back to the standard nvidia speeds.  After a final format of my system I reinstalled W10, fully updated it and installed a default P3D 4.3 (the 4.2 client also failed on a test install), and it still gave a Device_hung crash: default settings, default sim, no addons, default system clock speeds etc.  The latest thing to try, which I think LM advised, is to change PANELS_ALWAYS_ON_TOP=1 to 0.  I had hoped this had worked, but after a 3.5 hour test today I returned home to find it had crashed.  The only thing that helps on my system is setting the Power Limit of my 1080ti to 75-80%; this value may give varied results for others.  Joe Young insists I have a faulty card, but so many of us cannot all have faulty cards, and my card passes every benchmark and stress test like Furmark with no problem.

I really had hoped you would be the savior for these crashes, Pete.  I remember when I first got Prepar3D v1.3 or 1.4 back in 2012 and it was plagued with D3D.dll crashes all the time, which you fixed with FSUIPC.  I believe there is something in the sim (or in other games that throw up this error) which does not like something an individual user's graphics card is doing, and gives us this error.

Chris

1 hour ago, cj-ibbotson said:

Yes Joe, that was me, and it was the ONLY thing that helped.  Absolutely nothing else, and I mean nothing else, worked at all.  I tried every single driver, from old to new (at least 18 of them now), brand new OS installs of current W10 and an older 1703 build, and even went back to W8.1, and still got the crashes.  I tried running a bare sim with just Orbx scenery and no addons like Active Sky or shader mods, tried the TdrLevel and TdrDelay edits in the registry, tried the ShowDeviceLostWarning=0 entry in the P3D config file, tried restoring my CPU overclock to default, and tested my RAM.  Like you, Pete, I do not overclock my GPU; it's factory overclocked, so I downclocked the core and memory speeds back to the standard nvidia speeds.  After a final format of my system I reinstalled W10, fully updated it and installed a default P3D 4.3 (the 4.2 client also failed on a test install), and it still gave a Device_hung crash: default settings, default sim, no addons, default system clock speeds etc.  The latest thing to try, which I think LM advised, is to change PANELS_ALWAYS_ON_TOP=1 to 0.  I had hoped this had worked, but after a 3.5 hour test today I returned home to find it had crashed.  The only thing that helps on my system is setting the Power Limit of my 1080ti to 75-80%; this value may give varied results for others.  Joe Young insists I have a faulty card, but so many of us cannot all have faulty cards, and my card passes every benchmark and stress test like Furmark with no problem.

Power supply to the GPU may be an issue.  I've seen, for example, where folks used a VGA power cable that has two connectors daisy-chained on the end of it to feed both aux power connectors on a high-end GPU.  That's a bad way to fly, as it is likely pulling the voltage on that single cable down enough to cause instability under load.  Another possibility would be a multi-rail power supply with an overloaded 12v rail connected to one or both power connectors on the GPU.  Or it could be as simple as a weak or underspecced PSU not keeping the voltage up.  Or a bad cable or bad cable connector.  The fact that reducing the power load limit stops the problem is suspicious.

The second possible culprit for errors under load is heat.  The GPU temp does not tell the story of on-board voltage regulator temps, memory temps, etc.  Could be the GPU core is operating at a temp that's comfortably between the lines, but another section of the GPU is overheating due to a poor cooling solution, limited airflow, or a malfunctioning fan or fans and/or a bad fan curve that doesn't ramp the cooling up enough.

Regards

4 hours ago, Pete Dowson said:

I see. Ok. Thanks. I think that probably accounts for the fact that one of the GPUs was hardly used

Sorry, I thought you were running 2 GPUs not 3?  3 GPUs on the X series MBs will work fine so long as you dedicate them to each display, but don't use SLI performance mode.  But in your case I think you have distortion issues that need to be corrected by external software/hardware for your projectors?  As you found out, per LM, only 2-way SLI "performance mode" is supported.  Even then, driver support of SLI can sometimes break various rendering aspects (usually FX or AG goes missing and/or pops in/out) ... best to find a driver that works and stick with it when it comes to SLI performance mode (even more so for a dedicated sim setup).

I've only seen the DXGI error once (and only once), and that was after installing FSL Spotlights many months ago.  I seem to recall that problem got resolved either with an FSL Spotlights update or a driver/DDU reset.

Per my testing, SLI adds CPU overhead, as much as 17%.  SLI is great IF (important IF) you aren't capping out your CPU.  If you're capping out your CPU then SLI is probably going to make matters worse, not better.  Like I've suggested in other threads, use SLI to get better AA and DL and weather performance, do NOT use it if you want high AI traffic or high road/boat traffic or high AG building density ... those are going to stress the CPU and SLI will just stress the CPU that much more and FPS will actually be worse.

If you look at my 9900K and 7900X testing scenarios, I selected that location and those settings specifically so as NOT to overload the CPU.  If I had overloaded the CPU, you would see no gains from the GPU side at all; it doesn't matter what GPU you use, if the CPU is at 100% any testing beyond that is not really valid.
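As a rough illustration of that point, here is a toy Python model, assuming the frame rate is set by whichever of the CPU or GPU takes longer per frame. The function name `estimated_fps`, the perfect 2x GPU scaling under SLI, and applying the 17% figure directly to CPU frame time are all simplifying assumptions made for this sketch, not measurements from the testing described above.

```python
# Toy model only: frame time is limited by the slower of CPU and GPU.
# estimated_fps and its parameters are hypothetical, for illustration.

def estimated_fps(cpu_ms, gpu_ms, sli=False,
                  sli_cpu_overhead=0.17, sli_gpu_speedup=2.0):
    """Estimate FPS from per-frame CPU and GPU times in milliseconds."""
    if sli:
        # Assumption: SLI costs extra CPU time and divides the GPU time.
        cpu_ms = cpu_ms * (1 + sli_cpu_overhead)
        gpu_ms = gpu_ms / sli_gpu_speedup
    return 1000.0 / max(cpu_ms, gpu_ms)


if __name__ == "__main__":
    # GPU-bound scenario: SLI helps.
    print(estimated_fps(10, 25), estimated_fps(10, 25, sli=True))
    # CPU-bound scenario: SLI makes it worse.
    print(estimated_fps(25, 20), estimated_fps(25, 20, sli=True))
```

In this model a GPU-bound scene gains from SLI, while a CPU-bound scene loses frames because only the CPU cost grows. Real scaling is never this clean, so treat the numbers as a way to see the shape of the trade-off and nothing more.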

As always, it's a balancing act and that's why no two systems will be the same and no single solution "for all".

Cheers, Rob.

 

56 minutes ago, w6kd said:

Power supply to the GPU may be an issue.  I've seen, for example, where folks used a VGA power cable that has two connectors daisy-chained on the end of it to feed both aux power connectors on a high-end GPU.  That's a bad way to fly,

Bob, agree with you ... in fact in my recent 9900K build I foolishly assumed that modular PSUs could use modular cables across different brands of PSU and/or the fancy braided cables available online.  I found out that I absolutely could NOT mix and match PSU modular cables from different vendors and/or use custom braided cables unless they were "extension" only cables going 1 to 1 on every wire.  That one had me frustrated for a while, thinking I had bad pumps or a faulty MB, only to discover it was the PSU "modular" cabling ... I thought there were standards for these "modular" cables/PSUs, apparently I thought wrong ... "modular" doesn't mean interchangeable between vendors.

Lesson for me was:

1.  Always use the PSU vendor's cables that come with the PSU (no exceptions)
2.  Never buy online fancy colored braided cables unless they are pin to pin EXTENSION cables

Cheers, Rob.

51 minutes ago, w6kd said:

Power supply to the GPU may be an issue.  I've seen, for example, where folks used a VGA power cable that has two connectors daisy-chained on the end of it to feed both aux power connectors on a high-end GPU.  That's a bad way to fly, as it is likely pulling the voltage on that single cable down enough to cause instability under load.  Another possibility would be a multi-rail power supply with an overloaded 12v rail connected to one or both power connectors on the GPU.  Or it could be as simple as a weak or underspecced PSU not keeping the voltage up.  Or a bad cable or bad cable connector.  The fact that reducing the power load limit stops the problem is suspicious.

The second possible culprit for errors under load is heat.  The GPU temp does not tell the story of on-board voltage regulator temps, memory temps, etc.  Could be the GPU core is operating at a temp that's comfortably between the lines, but another section of the GPU is overheating due to a poor cooling solution, limited airflow, or a malfunctioning fan or fans and/or a bad fan curve that doesn't ramp the cooling up enough.

Regards

I've a good quality 1000w OCZ Gold with the correct cables to the card, no splitters or converters.  The card needs a 6 and an 8 pin supply.  I did change the cables to the GPU a few months ago, as I once got an error on boot saying there was no power to the graphics card; it only ever happened once.  I seriously don't want to risk trying another PSU at a huge cost, possibly for nothing, and I do not know anyone with a PC, as we are a dying breed.  MSI Afterburner and HWMonitor tell me the GPU is currently at 66 degrees.  In past tests I have set the GPU fans manually to 100%; I can see and hear them ramp up to full speed, but it did not prevent crashes.  I've also been working hard this week on my CPU temps, changing the configuration of the AIO water cooler to different positions and fan arrangements.  It's currently running at 4.5 and maxing at 61 degrees with the sim running.  Current min temp is less than 20, with one core showing it went as low as 14 degrees 🙂

3 minutes ago, Rob Ainscough said:

Never buy online fancy colored braided cables unless they are pin to pin EXTENSION cables

Not even nice pink ones? 🤔

I'm having this same problem. I created the other thread topic referenced in here, so I'll just continue in here with you guys. I ran the DDU in safe mode, updated to the latest driver, and still the same crash. GPU (GTX 1080) usage was only 72%, GPU Memory 81%, and voltage 72%. Nothing anymore than it's been for all of my other time in the sim. I already have TdrDelay=8 too. I'm really stumped on this as I never had this before 4.4.

18 minutes ago, pvupilot said:

I'm having this same problem. I created the other thread topic referenced in here, so I'll just continue in here with you guys. I ran the DDU in safe mode, updated to the latest driver, and still the same crash. GPU (GTX 1080) usage was only 72%, GPU Memory 81%, and voltage 72%. Nothing anymore than it's been for all of my other time in the sim. I already have TdrDelay=8 too. I'm really stumped on this as I never had this before 4.4.

There's another TDR setting you can try in addition to TdrDelay, which is TdrDdiDelay (a DWORD value created the same way and in the same registry location as TdrDelay).  The default is 5, I have been using 20 for years, along with 10 for TdrDelay.  The TdrDdiDelay sets the amount of time a thread can leave the driver.  Note that if you set TdrLevel to 0, none of these delay settings are used--they're meaningless--as the detection of driver errors by the TDR code is disabled.
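For anyone who wants to try these values, here is a minimal Python sketch that only builds the text of a .reg file and does not touch the registry itself. The key path is the standard GraphicsDrivers location used for TdrDelay; the helper name `build_tdr_reg` is hypothetical, and the defaults of 10 and 20 are simply the values quoted above, not a recommendation from LM or nVidia.

```python
# Sketch only: generates .reg file text for the TDR values discussed above.
# build_tdr_reg is a hypothetical helper name, not part of any tool.

TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_tdr_reg(tdr_delay=10, tdr_ddi_delay=20):
    """Return .reg file text that sets TdrDelay and TdrDdiDelay (seconds)."""
    for name, value in (("TdrDelay", tdr_delay), ("TdrDdiDelay", tdr_ddi_delay)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg files write DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{tdr_delay:08x}',
        f'"TdrDdiDelay"=dword:{tdr_ddi_delay:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_tdr_reg())
```

Saving the output as a .reg file and importing it should create both DWORD values; back up the registry first, and expect to reboot before the change takes effect. Leaving TdrLevel out is deliberate, since setting it to 0 disables the detection these delays apply to.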

Also, for those using factory-overclocked GPUs, a factory overclock IS still an overclock.  I don't know what sort of binning/testing process is used by the various OEMs, but several years ago I had a couple of very high-end water-cooled 980Ti boards in an SLI configuration that would not run stable at their aggressive advertised factory overclock, but they worked like champs with a relatively modest downclock from the factory settings (still overclocked significantly above the nVidia stock settings).  So it would not be at all surprising to me that some folks might run into stability issues at the factory settings on some of the factory-overclocked boards.  And overclocked w/r/t GPUs usually means both the core clocks and the memory clock, which are set independently.  So if you're having trouble, I'd start with both core and memory clocks at nVidia stock settings and work progressively up from there.

Regards

23 minutes ago, pvupilot said:

I'm having this same problem. I created the other thread topic referenced in here, so I'll just continue in here with you guys. I ran the DDU in safe mode, updated to the latest driver, and still the same crash. GPU (GTX 1080) usage was only 72%, GPU Memory 81%, and voltage 72%. Nothing anymore than it's been for all of my other time in the sim. I already have TdrDelay=8 too. I'm really stumped on this as I never had this before 4.4.

Have you tried the only fix that worked for me: using MSI Afterburner or EVGA Precision to reduce the Power Limit setting to 80%?
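For those without Afterburner or Precision, nVidia's own `nvidia-smi` command-line tool has a `-pl` (power limit) option, but it takes watts rather than a percentage. The Python sketch below only converts a percentage into that command line; the helper name `power_limit_command` is hypothetical, and the 250 W default in the example is an assumption (the reference 1080ti figure), so check your own card's default with `nvidia-smi -q -d POWER` first. This is a different route to the same setting, not the method described above, and whether a given card and driver accept the change is not guaranteed.

```python
# Sketch only: turns an Afterburner-style percentage into an nvidia-smi
# argument list. power_limit_command is a hypothetical helper name.

def power_limit_command(default_watts, percent, gpu_index=0):
    """Build the nvidia-smi arguments for a power limit given in percent."""
    if default_watts <= 0 or percent <= 0:
        raise ValueError("default_watts and percent must be positive")
    watts = round(default_watts * percent / 100)
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


if __name__ == "__main__":
    # Assumed 250 W default board power; substitute your card's own value.
    print(" ".join(power_limit_command(250, 80)))
```

The printed command needs to be run from an administrator prompt, and a limit set this way may not survive a reboot.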

 

6 minutes ago, w6kd said:

There's another TDR setting you can try in addition to TdrDelay, which is TdrDdiDelay (a DWORD value created the same way and in the same registry location as TdrDelay).  The default is 5, I have been using 20 for years, along with 10 for TdrDelay.  The TdrDdiDelay sets the amount of time a thread can leave the driver.  Note that if you set TdrLevel to 0, none of these delay settings are used--they're meaningless--as the detection of driver errors by the TDR code is disabled.

Also, for those using factory-overclocked GPUs, a factory overclock IS still an overclock.  I don't know what sort of binning/testing process is used by the various OEMs, but several years ago I had a couple of very high-end water-cooled 980Ti boards in an SLI configuration that would not run stable at their aggressive advertised factory overclock, but they worked like champs with a relatively modest downclock from the factory settings (still overclocked significantly above the nVidia stock settings).  So it would not be at all surprising to me that some folks might run into stability issues at the factory settings on some of the factory-overclocked boards.  And overclocked w/r/t GPUs usually means both the core clocks and the memory clock, which are set independently.  So if you're having trouble, I'd start with both core and memory clocks at nVidia stock settings and work up from there.

Regards

No TDR edits of any sort worked for me. I tried both the level and delay variants in both 32 and 64 bit entries. I tried 0, also 8, and even saw a post suggesting 60. Most did nothing or completely locked up the PC, forcing me to reboot.

My 1080ti is factory overclocked and one of the earliest things I did was to downclock both the core and memory to bog-standard 1080ti speeds. It didn't help at all.

Chris

Edited by cj-ibbotson


Hi all:

The dreaded DXGI error has struck me. Just updated to a 1080ti. Tried the registry key with no luck. However, FWIW, on my last flight I was running GPU-Z in logging mode. TBH, I don't know if these numbers will help diagnose the problem, but thought I would contribute something. Have posted it in my dropbox account below.

Click me to see the GPU-Z error log at time of error

4 hours ago, Rob Ainscough said:

Sorry, I thought you were running 2 GPUs not 3?

No, always 3 on this rig since July 2017. It was clear in my earlier exchanges with you, I think, but easily forgotten I expect.

4 hours ago, Rob Ainscough said:

3 GPUs on the X series MBs will work fine so long as you dedicate them to each display

That was the intention, but we found that the performance was noticeably poorer than with everything on one GPU. So then we tried SLI and that was just the same, but smoother.

4 hours ago, Rob Ainscough said:

But in your case I think you have distortion issues that need to be corrected by external software/hardware for your projectors?

It's a version of Immersive Pro. It deals with the otherwise curved image (shame we can't switch off the "flat screen" distortion added by projectors! ;-)), and merges deliberately overlapped images to ensure smooth transitions.

But performance is identical without that process running.

4 hours ago, Rob Ainscough said:

Per my testing, SLI adds CPU overhead, as much as 17%.  SLI is great IF (important IF) you aren't capping out your CPU.  If you're capping out your CPU then SLI is probably going to make matters worse, not better.

Well, as I said, the performance as measured by FPS was identical with or without SLI. It was smoothness which was better with SLI. And, yes, with this multi-scenery screen setup core 0 has always hit 100% with my test scenario.

4 hours ago, Rob Ainscough said:

those are going to stress the CPU and SLI will just stress the CPU that much more and FPS will actually be worse.

Not here, sorry.

Since disabling SLI my FPS have been the same, (not checked smoothness because not had time to fly), but the single operating GPU is now at 90-95% usage in my test scenario. I've not checked the temperature. That's instead of up to 45% on each of 2 out of 3 with the 3-way SLI.

Will try to configure 2-way SLI tomorrow. 90-95% is worrying. 

Pete

 

16 minutes ago, Pete Dowson said:

and, yes, with this multi-scenery screen setup core 0 has always hit 100% with my test scenario

I think you missed my point: if your CPU is at 100%, then 1, 2 or 3 GPUs will not make any difference.

16 minutes ago, Pete Dowson said:

It was clear in my earlier exchanges with you, I think, but easily forgotten I expect.

Very likely, your information unfortunately stayed in volatile memory and got lost when I turned in for the night. 🙂

1 hour ago, pgde said:

The dreaded DXGI error has struck me. Just updated to a 1080ti. Tried the registry key with no luck. However, FWIW, on my last flight I was running GPU-Z in logging mode.

I took a look at your data and it looks to me like your card went south (memory corruption) a few milliseconds before the DXGI error point:

[Attached image: DXGIErrors1.jpg]

Sudden flush of memory usage and drastic frequency drop ... 
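For anyone who cannot watch the log live, a short Python sketch along these lines could flag that kind of sudden drop after the fact. It assumes the log is comma-separated with padded column names; the column names and the sample rows below are made up for illustration, so substitute whatever headers your own GPU-Z log contains. The helper names `load_log` and `find_sudden_drops` are hypothetical.

```python
import csv
import io

# Sketch only: scans a sensor log for sudden falls in one column.
# Column names and SAMPLE values are invented for illustration.


def load_log(text):
    """Parse CSV text into dicts, stripping padding from names and values."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    rows = []
    for rec in reader:
        cells = [c.strip() for c in rec]
        if len(cells) < len(header) or not any(cells):
            continue  # skip blank or truncated lines
        rows.append(dict(zip(header, cells)))
    return rows


def find_sudden_drops(rows, column, drop_fraction=0.5):
    """Return (row_index, previous, current) where column falls sharply."""
    hits = []
    prev = None
    for i, row in enumerate(rows):
        try:
            cur = float(row[column])
        except (KeyError, ValueError):
            continue  # unparseable sample; keep the previous value
        if prev is not None and prev > 0 and cur < prev * (1 - drop_fraction):
            hits.append((i, prev, cur))
        prev = cur
    return hits


SAMPLE = """Date , GPU Clock [MHz] , Memory Used [MB]
2000-01-01 20:15:01 , 1911.0 , 7420
2000-01-01 20:15:02 , 1911.0 , 7431
2000-01-01 20:15:03 , 139.0 , 612
"""

if __name__ == "__main__":
    rows = load_log(SAMPLE)
    for i, before, after in find_sudden_drops(rows, "GPU Clock [MHz]"):
        print(rows[i]["Date"], before, "->", after)
```

Reading the real file with `open(path).read()` and passing it to `load_log` gives the timestamp of each drop, which narrows down when an unattended crash happened. The 50% threshold is arbitrary and may need tuning, since normal idle downclocking will also show up as a drop.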

Cheers, Rob.

7 hours ago, Rob Ainscough said:

I think you missed my point: if your CPU is at 100%, then 1, 2 or 3 GPUs will not make any difference.

1. It is not at 100% all the time. That is just in my test scenarios, heavy airport plus traffic.

2. SLI did make it run smoother.

3. Without SLI enabled I am now seeing 90-95% usage (according to Task Manager) on the one GPU in the test scenario, rather than 2 at 45% and one around 5% max. I find that a concern, and haven't even dared look at the temperature!

7 hours ago, Rob Ainscough said:

I took a look at your data and it looks to me like your card went south (memory corruption) a few milliseconds before the DXGI error point:

That data seems to date back to April. Is that when I got my first DXGI error? I don't recognise the data. Did I send it to you? What's it from?

More to the point, can you identify which card it is? 1st slot, 2nd or 3rd? The 2 which run at 45% might be 1st and 3rd if the order of GPUs in the Task Manager display is anything to go by. Didn't I read somewhere that those are the main slots in any case (and used for 2-way SLI) with the second slot only being for 3-way?

Or do I have to do trial and error?

Thanks,
Pete

 

Edited by Pete Dowson

3 hours ago, Pete Dowson said:

That data seems to date back to April. Is that when I got my first DXGI error? I don't recognise the data. Did I send it to you? What's it from?

Hi Pete,

I think we are seeing the American date format. I’m assuming these figures were recorded on 4th December, i.e. yesterday. Does that make more sense?
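The ambiguity is easy to reproduce. In this small Python sketch the date string is a hypothetical example (the actual stamp from the log is not shown in the thread), and `read_both_ways` is an illustrative helper name; it simply parses the same text month-first and day-first.

```python
from datetime import datetime

# Sketch only: the same numeric date read in US and UK order.
# read_both_ways is a hypothetical helper name.


def read_both_ways(stamp):
    """Return (month-first date, day-first date) for a dd/mm-style string."""
    us = datetime.strptime(stamp, "%m/%d/%Y").date()
    uk = datetime.strptime(stamp, "%d/%m/%Y").date()
    return us, uk


if __name__ == "__main__":
    us, uk = read_both_ways("12/04/2018")  # hypothetical stamp
    print("US reading:", us.isoformat())
    print("UK reading:", uk.isoformat())
```

Read month-first the example lands in December, and read day-first it lands in April, which is the mix-up described here.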

Regards,

Mike

3 hours ago, Pete Dowson said:

That data seems to date back to April. Is that when I got my first DXGI error? I don't recognise the data. Did I send it to you? What's it from?

Rob was responding to someone else

15 minutes ago, pracines said:

Rob was responding to someone else

Ah, so he was. Whoops! The first two parts were for me!

Pete

 


So Rob, does the memory corruption mean that the card is defective? Still under warranty so I can RMA it. I never had this problem with my 1070....

Thanks for your analysis.....

P. (the other Peter 😁)

33 minutes ago, pgde said:

So Rob, does the memory corruption mean that the card is defective? Still under warranty so I can RMA it. I never had this problem with my 1070....

Thanks for your analysis.....

P. (the other Peter 😁)

Sadly my own card is just outside the warranty (if it was just 12 months).  I've had crashes since June or July but was convinced my card wasn't faulty, as so many experience these crashes.  I also did not want to be without a GPU for a month waiting for tests etc.

I might look at running a log like you did and post the results.  The only thing is I can't sit in front of the PC for hours whilst it's testing, so I'm unsure which point in the log is relevant, as I wouldn't know the exact time of the crash unless it happens whilst I'm present.  Any tips?

Chris

Edited by cj-ibbotson


Common issue this, mentioned on here countless times, and mentioned and deflected many times on the LM forum as well.

Try the 397.64 drivers, that should eliminate the issue.  It has for me.  I've said a million times you'll never get a huge multinational company like LM admitting there's an issue.

On a separate note, I was watching Matt Davies on YouTube the other night at work, flying into Singapore in some bad weather, and the sim threw him into the deck.  20 years on we still have wind shift issues and silliness with P3D/FSX and Active Sky which have never been fixed, not in 20 years.

But as much as I found the video very funny (listening to the other dude, no idea who he is, talking about airline operations and diversion airfields, he clearly was absolutely clueless about what happens with commercial and bad weather alternates), I was glad it's not just me that gets thrown to the ground in a little bit of turb.

Same with this DXGI error in a way: you'll never ever ever ever ever ever ever get them to admit their software might be at fault, same as at my airline you'll never get the bosses to admit the IFE is gash on a certain fleet.  Too much liability and brand damage if they admit it.

 

 

Edited by tooting

29 minutes ago, tooting said:

...I've said a million times you'll never get a huge multinational company like LM admitting there's an issue...

Same with this DXGI error in a way: you'll never ever ever ever ever ever ever get them to admit their software might be at fault, same as at my airline you'll never get the bosses to admit the IFE is gash on a certain fleet.  Too much liability and brand damage if they admit it.

Respectfully, this is gibberish to me.  LM never developed Microsoft ESP or FSX, on which Prepar3D is based.  They brought on a team of experts who had been developing Microsoft ESP, a program based on the FSX engine.  We should be happy LM decided to bring the team aboard to further develop this wonderful product for the flight simulation community!!  I just cannot see how this error caused this particular crash.


Well, since the DXGI errors occurred with my 3-way SLI (which is apparently not supported by P3D in any case), but not with SLI disabled (but with the horrible result that I was getting 95% GPU usage -- not much headroom!), I tried 2-way SLI.

Since the 3-way usage for the 3 cards was maxing at 45% + 5% + 45%, I disabled the middle one in Device Manager (to save fiddling about with the connectors for now).

After many hours of testing I can attest to the fact that this eliminated the DXGI crash, even at night, which was where I could reproduce it within minutes 100% of the time.

The loading was maxing at a healthier 65% + 65%, but more usually around 50% each. Nice sharing of the workload.

My next step (tomorrow) will be to re-enable that GPU but disable the third one. This might not work (not sure how fussy SLI connections are), but if it does and gives the same good result, then I'll know the video card is okay. If not then it is definitely suspect.

Either way, one of them is coming out.

Pete

 


I don't get this error with any other software I use, only P3D.  And if I use ANY driver later than 397.64 I get the issue.  But with ANY driver before 397.64 I don't get it.

Please explain how this can't be an issue with P3D then.

4 hours ago, pgde said:

So Rob, does the memory corruption mean that the card is defective?

There is obviously an issue there, I can't see software doing that ... BUT, you might want to hit up an nVidia engineer, present them with your data and see what they suggest.

34 minutes ago, tooting said:

Same with this DXGI error in a way: you'll never ever ever ever ever ever ever get them to admit their software might be at fault

Who's the mysterious "them"?  Jumping to conclusions and pointing fingers doesn't help anyone, and that approach never solves problems.

A quick google search with "dxgi error device hung" and you'll see the problem is common across all kinds of games/sims/etc., the solution in most cases is to reduce GPU/Memory frequency or return the GPU under warranty (RMA) or replace the GPU with another one.  

Cheers, Rob.

2 minutes ago, tooting said:

I don't get this error with any other software I use, only P3D.  And if I use ANY driver later than 397.64 I get the issue.  But with ANY driver before 397.64 I don't get it.

Please explain how this can't be an issue with P3D then.

From that 'analysis' it must surely be down to nVidia, messing up their drivers, not P3D? I don't really see how P3D could cause a GPU to report timeouts, hangs, or anything driver/hardware related. It's at the wrong level in the Windows architecture. The most it could do is put a severe enough load on it to show up failures elsewhere.

Continually looking for improved performance, I have updated drivers regularly (though a bit behind at present -- I have 416.22 downloaded but not yet installed).  But I only install the proper WHQL ones.

Pete

 

 


Like I've just said, Rob: if I use ANY driver later than 397.64 I get the issue, but with ANY driver before 397.64 I don't get it.

I agree it's absolutely a problem with 1080ti's and overclocks, as you don't seem to get it on 980tis.

But there's a conflict between P3D and the drivers after 397.64; that's the issue.

And I know, I've seen other posts about other games with DXGI issues.  It took me weeks to suss out which driver was the factor.
