Branimir

DXGI_ERROR_DEVICE_HUNG error: Is there a solution?


On 12/4/2018 at 9:10 PM, w6kd said:

There's another TDR setting you can try in addition to TdrDelay, which is TdrDdiDelay (a DWORD value created the same way and in the same registry location as TdrDelay). The default is 5; I have been using 20 for years, along with 10 for TdrDelay. TdrDdiDelay sets the amount of time the OS allows a thread to leave the driver. Note that if you set TdrLevel to 0, none of these delay settings are used--they're meaningless--as the detection of driver errors by the TDR code is disabled.

I tried both of these today using 32-bit DWORD values. I returned my GPU's power level to 100% (it always crashes at this), and those registry settings did nothing to help; the sim crashed with a DEVICE_HUNG error within 5 minutes. I will delete them and try the QWORD version.

Chris


4 hours ago, joepoway said:

Hi Chris

I could be wrong, but I thought you swapped cards a few weeks ago with your son and demonstrated the issue was your card; perhaps I have you confused with someone else.

Regardless, I assume you tried the 397.64 drivers stated by someone else above as "the solution" and it didn't solve the problem. BTW, it didn't solve any of my friends' problems with this issue.

As I stated previously in this forum, only a new card fixed the issue for a couple of my friends. IF there are emerging card issues, that would explain why this issue DOES show up across several games and sims, not just P3D, as I have also seen in my research.

I look forward to Pete's conclusion when he reactivates the card he disabled to see if indeed his issue was caused by his card as well.

Joe

Hi Joe, yes, I tried my son's 1080 (non-Ti) and it appeared to work for 6 hours; he had to return to university, so I could only test one afternoon. Given some have problems and many don't, I thought perhaps his card was one of the safe ones. I actually came close to swapping him my £720 card for his £400 card. I tried basically every driver back to 382 or something; I've deleted the downloaded drivers now, as none helped. I don't know what to do: you can barely get 1080ti cards now, and the UK's biggest computer retailer, where I bought my own, sells a few for £900, £200 more than I paid for mine, and they are lower spec. I doubt they would exchange it even if I managed to convince them my card is 'faulty'. I probably don't even have a warranty anymore with them or nVidia, and I have no idea what my rights are; given Christmas is coming up, I'd be without a PC for at least a month, possibly longer, for shipping both ways and testing.

4 hours ago, tooting said:

try the 397.64 drivers. I bet you a discounted Virgin mates' rates ticket in Upper... you won't get the issue, I'm that confident.

All drivers from the current one back to something like 382.xx failed for me. I tried a good 18 of them, going back as far as those compatible with my 1080ti. These crashes have been reported on the LM forums since v3.2 or 3.4, dating back 2 years, well before the 1080ti came out. Users experience these crashes even on AMD cards.

 

4 hours ago, Rob Ainscough said:

Do some basic internet searches, you'll see a very common solution ... but don't let that stop you from jumping to conclusions.  The more the app/game/sim makes use of VRAM the higher the probability it will expose hardware issues ... P3D can use A LOT of VRAM and can stress a GPU (SSAA will do that), TEXTURE_SIZE_EXP=9 or 10 will use anywhere from 4GB to 10+GB VRAM.

Cheers, Rob.

Rob, during many of my tests over the last few months I got DXGI crashes on a fresh Windows 10 install and a default sim with default settings, so there was basically no load on my card. Could there not be something in the coding (DirectX or something) of P3D and some other titles which cannot handle some minor glitch in many users' cards? My card passes every other game at 4K, and benchmarks and stress-testing utilities, without a hiccup. I've had crashes for the past 6 months and sadly my card is now past its 12-month warranty. I never sent it back, as I assume the shop would run the same type of stress tests and just return the card to me.

Chris

Edited by cj-ibbotson

46 minutes ago, cj-ibbotson said:

Hi Joe, yes, I tried my son's 1080 (non-Ti) and it appeared to work for 6 hours; he had to return to university, so I could only test one afternoon. Given some have problems and many don't, I thought perhaps his card was one of the safe ones. I actually came close to swapping him my £720 card for his £400 card. I tried basically every driver back to 382 or something; I've deleted the downloaded drivers now, as none helped. I don't know what to do: you can barely get 1080ti cards now, and the UK's biggest computer retailer, where I bought my own, sells a few for £900, £200 more than I paid for mine, and they are lower spec. I doubt they would exchange it even if I managed to convince them my card is 'faulty'. I probably don't even have a warranty anymore with them or nVidia, and I have no idea what my rights are; given Christmas is coming up, I'd be without a PC for at least a month, possibly longer, for shipping both ways and testing.

All drivers from the current one back to something like 382.xx failed for me. I tried a good 18 of them, going back as far as those compatible with my 1080ti. These crashes have been reported on the LM forums since v3.2 or 3.4, dating back 2 years, well before the 1080ti came out. Users experience these crashes even on AMD cards.

 

Rob, during many of my tests over the last few months I got DXGI crashes on a fresh Windows 10 install and a default sim with default settings, so there was basically no load on my card. Could there not be something in the coding (DirectX or something) of P3D and some other titles which cannot handle some minor glitch in many users' cards? My card passes every other game at 4K, and benchmarks and stress-testing utilities, without a hiccup. I've had crashes for the past 6 months and sadly my card is now past its 12-month warranty. I never sent it back, as I assume the shop would run the same type of stress tests and just return the card to me.

Chris

I feel for you, Chris. A friend of mine has a Gigabyte Aorus 1080ti with a 3 or 4 year warranty. He was having the same issues as you; we swapped cards and all was well, so he contacted the vendor and received a new card in a week or so, and all is good for now. This just happened last month and his card was about 6 months old.

Joe

6 minutes ago, joepoway said:

I feel for you, Chris. A friend of mine has a Gigabyte Aorus 1080ti with a 3 or 4 year warranty. He was having the same issues as you; we swapped cards and all was well, so he contacted the vendor and received a new card in a week or so, and all is good for now. This just happened last month and his card was about 6 months old.

Joe

I'll send Overclockers.co.uk an email. They do have the Aorus in stock, but it's a slower version of my card, I think. I paid £720 for mine a year ago, but they're all £899 on their website. What did your friend say, given that many cards perform fine except for the DXGI crashes in P3D?

Chris

Edited by cj-ibbotson

1 hour ago, cj-ibbotson said:

Could there not be something in the coding (DirectX or something) of P3D and some other titles which cannot handle some minor glitch in many users' cards?

Sure, apps/software can trigger CTDs, freezes, etc. ... but the DEVICE HUNG exception is being "trapped", or more accurately "handled" ... you can see it's a P3D dialog/window displaying the DEVICE HUNG message; that IS a P3D window, which means exception processing in P3D caught the exception and P3D code is still active and running.  That exception would be relayed through the DX11 API, which would be working with the nVidia driver ... so this means the application code (P3D) is working as it should and trapping the device error.

There are really only two possibilities:

1.  Corrupted or faulty nVidia driver

2.  Component failure in hardware that could be intermittent ... given the data log provided earlier by pgde here: 

It looks to me like a memory timing failure (VRAM is just faster RAM and subject to all the same timing issues as regular RAM) ... but I'm not an nVidia engineer so they would be the ones to best comment.  Also, another factor in your case is that you downgraded to a lesser GPU with lower power requirements as a "test" to see if the problem would trigger and the problem did NOT trigger.

If you're comfortable with pulling apart your GPU, you can remove the cooler, but do it VERY slowly, and check the heat transfer pads are correctly placed on the appropriate components of the GPU (usually memory chips and the VRMs).  I've taken apart many GPUs over the years as I always replace the air coolers with water blocks and I've seen some pretty poor quality control on placement of these heat transfer pads, some not even on the VRM or memory chip and skewed off to the side (this was also the case with one of my 2080Ti I recently changed to a waterblock setup).  I've also seen very sloppy application of GPU thermal paste (3X the quantity needed and overflowing all over the place).  This is a "free" option if you're comfortable taking these GPUs apart ... just some tiny screws ...  you can use an EK waterblock installation manual if you need a guide to which screws to remove here: https://www.ekwb.com/shop/EK-IM/EK-IM-3830046994912.pdf

Cheers, Rob.

EDIT: Some additional diagnostics you can do:

1.  Use DDU in Safe Mode to uninstall the driver as if for a NEW GPU install, remove the GPU and move it to another free PCIe X16 slot.
2.  For any file corruptions you can run CMD (Run As Admin) "SFC /scannow" and see if it reports anything; if it does, check the CBS log file for reported errors
3.  Run DISM /Online /Cleanup-Image /CheckHealth, then DISM /Online /Cleanup-Image /ScanHealth, then DISM /Online /Cleanup-Image /RestoreHealth

I doubt these three items will solve your issue, but they are good to run regardless, and they will find OS issues.
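
If it's easier to run those SFC/DISM checks back to back, here's a rough sketch that just shells out to the same commands (Windows only; run it from an elevated prompt):

    # Rough sketch: runs the SFC and DISM checks listed above in sequence and
    # reports each exit code. Must be launched from an elevated (admin) prompt.
    import subprocess

    commands = [
        "sfc /scannow",
        "DISM /Online /Cleanup-Image /CheckHealth",
        "DISM /Online /Cleanup-Image /ScanHealth",
        "DISM /Online /Cleanup-Image /RestoreHealth",
    ]

    for cmd in commands:
        print(f"\n=== {cmd} ===")
        result = subprocess.run(cmd, shell=True)
        print(f"exit code: {result.returncode}")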

13 minutes ago, Rob Ainscough said:

Sure, apps/software can trigger CTDs, freezes, etc. ... but the DEVICE HUNG exception is being "trapped", or more accurately "handled" ... you can see it's a P3D dialog/window displaying the DEVICE HUNG message; that IS a P3D window, which means exception processing in P3D caught the exception and P3D code is still active and running.  That exception would be relayed through the DX11 API, which would be working with the nVidia driver ... so this means the application code (P3D) is working as it should and trapping the device error.

There are really only two possibilities:

1.  Corrupted or faulty nVidia driver

2.  Component failure in hardware that could be intermittent ... given the data log provided earlier by pgde here: 

It looks to me like a memory timing failure (VRAM is just faster RAM and subject to all the same timing issues as regular RAM) ... but I'm not an nVidia engineer so they would be the ones to best comment.  Also, another factor in your case is that you downgraded to a lesser GPU with lower power requirements as a "test" to see if the problem would trigger and the problem did NOT trigger.

If you're comfortable with pulling apart your GPU, you can remove the cooler, but do it VERY slowly, and check the heat transfer pads are correctly placed on the appropriate components of the GPU (usually memory chips and the VRMs).  I've taken apart many GPUs over the years as I always replace the air coolers with water blocks and I've seen some pretty poor quality control on placement of these heat transfer pads, some not even on the VRM or memory chip and skewed off to the side.  I've also seen very sloppy application of GPU thermal paste (3X the quantity needed and overflowing all over the place).  This is a "free" option if you're comfortable taking these GPUs apart ... just some tiny screws ...  you can use an EK waterblock installation manual if you need a guide to which screws to remove here: https://www.ekwb.com/shop/EK-IM/EK-IM-3830046994912.pdf

Cheers, Rob.

 

Although I think you are right, your reasoning is still not watertight (pun intended). What if P3D is over-driving the GPU beyond specification, and somewhere in the driver specs it says that it is not the driver's or GPU's responsibility to handle it? If that were true, P3D exceeds the GPU spec, the GPU crashes, and P3D handles the crash.

It would then still be P3D's fault, wouldn't it?

It would be sad engineering practice, I admit, but is such a scenario possible, perhaps via a bug in the GPU driver that allows P3D to exceed spec?

1 hour ago, Rob Ainscough said:

Sure, apps/software can trigger CTDs, freezes, etc. ... but the DEVICE HUNG exception is being "trapped", or more accurately "handled" ... you can see it's a P3D dialog/window displaying the DEVICE HUNG message; that IS a P3D window, which means exception processing in P3D caught the exception and P3D code is still active and running.  That exception would be relayed through the DX11 API, which would be working with the nVidia driver ... so this means the application code (P3D) is working as it should and trapping the device error.

There are really only two possibilities:

1.  Corrupted or faulty nVidia driver

2.  Component failure in hardware that could be intermittent ... given the data log provided earlier by pgde here: 

It looks to me like a memory timing failure (VRAM is just faster RAM and subject to all the same timing issues as regular RAM) ... but I'm not an nVidia engineer so they would be the ones to best comment.  Also, another factor in your case is that you downgraded to a lesser GPU with lower power requirements as a "test" to see if the problem would trigger and the problem did NOT trigger.

If you're comfortable with pulling apart your GPU, you can remove the cooler, but do it VERY slowly, and check the heat transfer pads are correctly placed on the appropriate components of the GPU (usually memory chips and the VRMs).  I've taken apart many GPUs over the years as I always replace the air coolers with water blocks and I've seen some pretty poor quality control on placement of these heat transfer pads, some not even on the VRM or memory chip and skewed off to the side (this was also the case with one of my 2080Ti I recently changed to a waterblock setup).  I've also seen very sloppy application of GPU thermal paste (3X the quantity needed and overflowing all over the place).  This is a "free" option if you're comfortable taking these GPUs apart ... just some tiny screws ...  you can use an EK waterblock installation manual if you need a guide to which screws to remove here: https://www.ekwb.com/shop/EK-IM/EK-IM-3830046994912.pdf

Cheers, Rob.

EDIT: Some additional diagnostics you can do:

1.  Use DDU in Safe Mode to uninstall the driver as if for a NEW GPU install, remove the GPU and move it to another free PCIe X16 slot.
2.  For any file corruptions you can run CMD (Run As Admin) "SFC /scannow" and see if it reports anything; if it does, check the CBS log file for reported errors
3.  Run DISM /Online /Cleanup-Image /CheckHealth, then DISM /Online /Cleanup-Image /ScanHealth, then DISM /Online /Cleanup-Image /RestoreHealth

I doubt these three items will solve your issue, but they are good to run regardless, and they will find OS issues.

Thanks for this info Rob.  I am not happy taking it apart.  The retailer I purchased it from (Overclockers) literally just over one year ago no longer sells them, but their other 10-series GPUs and the 20xx range from Inno3D come with 2 or 3 year warranties.  Inno3D isn't listed on the direct-to-manufacturer list, which is worrying as the list is very long.  I would have more confidence in Inno diagnosing a fault than the shop, whose terms state that if they cannot find one they will return the card and charge for carriage.  I do not see any UK contact details for Inno, which may be the reason they can't deal with warranties directly.

I have always used DDU when installing the many, many drivers I have tested.  I am currently using the latest.  No single driver has shown any beneficial results.  I also recently did "SFC /scannow" as I got a couple of KernelBase.dll crashes and Microsoft advised one poster to try the scan amongst other things.  It did find a few errors and repair them; I ran the scan a 2nd time and there were no errors.  I haven't tried those other tests, given that I've had these crashes on a brand new installation with no other software installed.  I've also been running a dual-boot test with 2 versions of Windows 10, one on an SSD and another on a Velociraptor drive, and both gave DXGI crashes, so it's not a drive issue.  I got the card at the end of Nov 2017 but only started seeing DXGI crashes around June or July, pretty much not long after v4.3 came out.  Before that I didn't use the sim for long periods, so I'm unsure if the error always existed, as it can take anywhere from moments to many hours to crash.

42 minutes ago, glider1 said:

Although I think you are right, your reasoning is still not watertight (pun intended). What if P3D is over-driving the GPU beyond specification, and somewhere in the driver specs it says that it is not the driver's or GPU's responsibility to handle it? If that were true, P3D exceeds the GPU spec, the GPU crashes, and P3D handles the crash.

It would then still be P3D's fault, wouldn't it?

It would be sad engineering practice, I admit, but is such a scenario possible, perhaps via a bug in the GPU driver that allows P3D to exceed spec?

This is something I keep thinking, mainly given the card shows no other signs of faults and performs well on everything else, with no apparent heat issues either, and I've gotten the error on an entirely default and fresh W10 installation with a default sim, no add-ons and no overclocks of any sort.  Perhaps most cards don't have an issue, but some cards (all cards are a little different, I guess, like CPUs) get tripped up by something the sim is trying to do: the card or driver stops, then P3D shows the error.

I'm going to contact the vendor I bought it from and see what options I have.

EDIT - just reading the Overclockers forums regarding Inno3D: after goods are RMA'd to Overclockers they assess them and contact Inno3D; if Inno3D accepts that it is to be RMA'd to them for repair, it then has to go to HONG KONG 😞 Takes 4-6 weeks

Chris

Edited by cj-ibbotson

6 minutes ago, glider1 said:

What if P3D is over-driving the GPU beyond specification

Like a tight shader loop not hitting its precision point for exit?  Would that trigger a device hung?  I'm not sure it would.  Or a race condition ... if it did, then it would most certainly trigger excessive heat, that would show up in the data logs, and throttling would happen ... per the logs provided earlier, GPU temps were not a problem.  Memory timing errors would surface regardless of heat (within reason) ... that's almost a random draw of allocation hitting the bad memory chip or address range where the weak transistors are located.

Anything is possible, but what we're trying to establish is the most "likely" source.  The fact that someone claims it only happens with P3D doesn't mean much; in fact, if you Google the DEVICE HUNG problem, just about every person who reports it says exactly the same thing: "only happens with Battlefield IV", "only happens with CoD", "only happens with Tomb Raider", "only happens with ... fill in your software of choice".

Cheers, Rob.

  

 

3 hours ago, cj-ibbotson said:

I tried both of these today using 32-bit DWORD values. I returned my GPU's power level to 100% (it always crashes at this), and those registry settings did nothing to help; the sim crashed with a DEVICE_HUNG error within 5 minutes. I will delete them and try the QWORD version.

Microsoft's docs say it's a DWORD value.

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys
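
For anyone who would rather script it than use regedit, here's a rough sketch (untested, assuming an elevated Python on Windows) that writes the values discussed earlier as 32-bit DWORDs under the GraphicsDrivers key described on that Microsoft page:

    # Rough sketch only -- regedit per the Microsoft page above is the usual route.
    # Creates/updates the TDR values discussed in this thread as 32-bit DWORDs.
    # Requires an elevated (Run as Administrator) Python on Windows; reboot afterwards.
    import winreg

    TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

    values = {
        "TdrDelay": 10,     # seconds before TDR decides the GPU is hung (default 2)
        "TdrDdiDelay": 20,  # seconds the OS allows threads to leave the driver (default 5)
        # "TdrLevel": 0,    # 0 disables TDR detection entirely; the delays above are then ignored
    }

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY, 0, winreg.KEY_SET_VALUE) as key:
        for name, data in values.items():
            winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
            print(f"set {name} = {data}")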

 

4 hours ago, cj-ibbotson said:

I'll send Overclockers.co.uk an email. They do have the Aorus in stock, but it's a slower version of my card, I think. I paid £720 for mine a year ago, but they're all £899 on their website. What did your friend say, given that many cards perform fine except for the DXGI crashes in P3D?

Chris

He basically explained that the GPU-based error kept happening more frequently, and when he swapped out the card for another everything worked well; he could switch the issue on and off like a light switch. They didn't dispute anything; he had to "buy" the replacement, and when they received the returned GPU they refunded his money.

Good Luck

Joe

1 hour ago, joepoway said:

He basically explained that the GPU-based error kept happening more frequently, and when he swapped out the card for another everything worked well; he could switch the issue on and off like a light switch. They didn't dispute anything; he had to "buy" the replacement, and when they received the returned GPU they refunded his money.

Good Luck

Joe

The thing there is that the replacement card might still suffer the same fate; it is just a matter of time if it turns out that something in the software chain stresses the GPU and degrades it over time beyond what it can handle long term, either through a violation of the software specification or, unofficially, through a bad board-level or chip-level hardware design.

I hear people always referring to 1000 and 2000 series cards, not the 900 series, where this happens. Is that true?

If it is, we have a big clue.

A big lead in this would be to research whether the bug also happens on 900 series or earlier GPUs. From word of mouth it does not. If it does not, why not in earlier series? What would P3D do differently on 1000/2000 series GPUs that it doesn't do on 900 series GPUs? Even dynamic lighting can be done by a 900 series GPU, can't it?

If the device hung bug does happen on 900 series cards as well, then you can rule out hardware design as a cause of this crash, because it is too unlikely to span three generations of cards.

If it only happens on 1000/2000 series cards, I think it is unlikely that P3D with its old code base would treat the 900 series differently to the 1000/2000 series, which would suggest that it is very likely a hardware-level problem in the 1000/2000 series, EDIT: or a driver-level problem in the 1000/2000 series. But a driver-level problem is unlikely, because P3D would be using the same function calls, which would exist across all three generations of GPU driver.

 

Edited by glider1


I don't think that there is one single cause.

As an example, I was plagued with this error but "luckily" it developed into P3D v4 simply closing Windows down.

I took a leap of faith and replaced my very good Corsair power supply. Corsair were good enough to send me a new one, replacing like with like. Since then there has not been one repetition, so the cause was a weakness in the power supply, presumably exposed by P3D v4, which is the most demanding software that is installed.

In the light of the above post, the graphics card was a GTX 970, but of course it was not the cause of the problem in this specific case.

 

Edited by nolonger


Well, I tried 2-way SLI with the 1st and 2nd GPUs (the 2nd was the suspect), and whilst it worked fine with less dense scenery (EGCC), it crashed with a DXGI device hung error as soon as I loaded EGLL. This was daytime! In my 3-way arrangement that only happened at night (unless FTX England was enabled).

So, either the middle card is faulty, or the slot it is in is bad for SLI. I could try swapping it into the 3rd slot and re-testing, but I have other things to do today and over the weekend, so I'll leave that test till next week.

Pete

 

8 hours ago, w6kd said:

I've seen differing posts saying to try the 64-bit QWORD given the drivers and OS are 64-bit, but neither version has worked for me sadly.  I had the Power Limit of the card at 80%, a little higher than previous tests, and it crashed after an hour or two.  I set it to 75% before going to bed and the sim was still running 8 hours later, so the power is causing some sort of issue.  Adjusting the Power Limit also adjusts the temp limit, I think, though the card has never reached the temperature limit when set at 100%, which is the setting that crashes P3D.

 

6 hours ago, joepoway said:

He basically explained that the GPU-based error kept happening more frequently, and when he swapped out the card for another everything worked well; he could switch the issue on and off like a light switch. They didn't dispute anything; he had to "buy" the replacement, and when they received the returned GPU they refunded his money.

Good Luck

Joe

I seriously do not want to buy another card, as I'm reading on the forums of the vendor I bought the card from that Inno3D can easily refuse to honour a warranty.  One user found his card made a hissing noise, then a pop, then failure.  He returned it to Overclockers.co.uk, who then contacted Inno3D and sent them photos.  Inno3D refused to allow an RMA and said the card was damaged by the user.  He states it was no such thing and the photos show shroud damage which he believes either happened in transit or when the retailer dropped the card.  Cards also have to be returned to flipping Hong Kong for repair, which I seriously do not want to do.  It's worrying how many cards show these errors when you google it, even many with the latest 2080s.

 

5 hours ago, glider1 said:

I hear people always referring to 1000 and 2000 series cards, not the 900 series, where this happens. Is that true?

If it is, we have a big clue.

A big lead in this would be to research whether the bug also happens on 900 series or earlier GPUs. From word of mouth it does not. If it does not, why not in earlier series? What would P3D do differently on 1000/2000 series GPUs that it doesn't do on 900 series GPUs? Even dynamic lighting can be done by a 900 series GPU, can't it?

If the device hung bug does happen on 900 series cards as well, then you can rule out hardware design as a cause of this crash, because it is too unlikely to span three generations of cards.

If it only happens on 1000/2000 series cards, I think it is unlikely that P3D with its old code base would treat the 900 series differently to the 1000/2000 series, which would suggest that it is very likely a hardware-level problem in the 1000/2000 series, EDIT: or a driver-level problem in the 1000/2000 series. But a driver-level problem is unlikely, because P3D would be using the same function calls, which would exist across all three generations of GPU driver.

I did some googling this morning to try to find the early posts on Lockheed Martin's forums dating back 2 years, but their search facility keeps throwing up an error.  Other Google results did bring up a simmer using P3D with a 700 series card.  I got my 1080ti on 28th Nov 2017 but did not experience DXGI crashes until June or July 2018.  I've wiped my system many times and gone back to old OS builds and old drivers and it still crashes; I immediately thought a driver or Windows update around June/July may have triggered it, but obviously not.

Chris


For those who have RMA'd their 1080ti due to this problem and received a replacement, has this solved the problem or does it reoccur?

Thanks and Happy Holidays!

4 hours ago, cj-ibbotson said:

I've seen differing posts saying to try the 64-bit QWORD given the drivers and OS are 64-bit, but neither version has worked for me sadly.  I had the Power Limit of the card at 80%, a little higher than previous tests, and it crashed after an hour or two.  I set it to 75% before going to bed and the sim was still running 8 hours later, so the power is causing some sort of issue.  Adjusting the Power Limit also adjusts the temp limit, I think, though the card has never reached the temperature limit when set at 100%, which is the setting that crashes P3D.

I'd do a long GPU stress test run with OCCT while monitoring both the GPU temp and Vddc with GPU-Z or nVidia Inspector--and watch the 12v rail voltage on the mobo as well.  I'm wondering if bumping up the GPU voltage might help.

If you're using a modular PSU, reseating the cable connectors and possibly swapping them out wouldn't be a bad idea, either.

The best test would be to beg, borrow, or steal another 1080Ti to see if it's a bad/weak card, which seems more and more likely.

Regards

37 minutes ago, w6kd said:

or steal another 1080Ti

Or steal?? Lol!

1 hour ago, pgde said:

For those who have RMA'd their 1080ti due to this problem and received a replacement, has this solved the problem or does it reoccur?

Thanks and Happy Holidays!

It remedied the problem in the situation I know of. Perhaps there is a slow degradation in the 1080ti, and when run in taxing P3D situations it causes a memory or processor glitch that throws this error?

Joe

2 hours ago, w6kd said:

I'd do a long GPU stress test run with OCCT while monitoring both the GPU temp and Vddc with GPU-Z or nVidia Inspector--and watch the 12v rail voltage on the mobo as well.  I'm wondering if bumping up the GPU voltage might help.

If you're using a modular PSU, reseating the cable connectors and possibly swapping them out wouldn't be a bad idea, either.

The best test would be to beg, borrow, or steal another 1080Ti to see if it's a bad/weak card, which seems more and more likely.

Regards

I've just come home from work.  I have had the sim running since midnight last night with the Power Limit set to 75%; it was crash-free this morning and still running at 2pm today, so I decided to enable logging in GPU-Z.  I arrived home from work at 6pm and noticed the sim had crashed around 2:30pm, so it ran for about 14 hours.  I did notice on the log 2 entries, at the apparent time of the crash, where the GPU load hit 100%, though it has done this before when monitoring and not crashed; there are no other 100% entries in this log.  I will attach the log later as I'm heading back out.

I will run those tests, though I'm unsure what to look out for.  I cannot bump up the power limit or voltage at all, otherwise I get DXGI crashes in P3D.  I can't add any speed to the core or memory either, or it crashes.  Even if I set the power limit to 75% to get it fairly stable, like my current test, increasing the voltage makes the sim crash quickly.  In any other game etc. I can put the voltage to max, the power limit to max, and increase core and memory speeds a lot with no crash.  I've done this to test the card since P3D gives crashes.  It's the only bloody thing which crashes, so I'm not confident Overclockers will change it.  They need to send it to Hong Kong, which takes well over a month.  If they refuse to fix it I've got 2 special delivery courier charges to pay, which may be hefty given the value of the product.  I only briefly had access to my son's 1080, as I do not know anyone else with a gaming PC.  Also, I hear the vendor is quite ruthless in its valuation if they are to refund: they take age and wear and tear off the value and do not refund the price you paid.  With a 1080ti now costing £200 more than I paid a year ago, I'd be well out of pocket.

Chris

Edited by cj-ibbotson

28 minutes ago, cj-ibbotson said:

I will run those tests, though I'm unsure what to look out for.  I cannot bump up the power limit or voltage at all, otherwise I get DXGI crashes in P3D.  I can't add any speed to the core or memory either, or it crashes.  Even if I set the power limit to 75% to get it fairly stable, like my current test, increasing the voltage makes the sim crash quickly.  In any other game etc. I can put the voltage to max, the power limit to max, and increase core and memory speeds a lot with no crash.  I've done this to test the card since P3D gives crashes.  It's the only bloody thing which crashes, so I'm not confident Overclockers will change it.  They need to send it to Hong Kong, which takes well over a month.  If they refuse to fix it I've got 2 special delivery courier charges to pay, which may be hefty given the value of the product.  I only briefly had access to my son's 1080, as I do not know anyone else with a gaming PC.  Also, I hear the vendor is quite ruthless in its valuation if they are to refund: they take age and wear and tear off the value and do not refund the price you paid.  With a 1080ti now costing £200 more than I paid a year ago, I'd be well out of pocket.

Increasing the power limit is not the same as increasing the voltage.  The power limit allows the GPU to exceed its thermal design power by a specified margin.  Using an overclocking utility like MSI Afterburner, eVGA Precision, or nVidia Inspector, you can also specify a positive offset to be applied to the stock GPU voltage, which can stabilize things if it's running at the ragged edge of stability, which it sounds like it may be.  Sometimes just adding a few mV can make all the difference...I just had to do that with some RAM in a new build that would throw the occasional error until I bumped up the voltage just 15 mV over spec.

I feel for your plight with the card...but if the hardware is failing, then it's failing.  I buy eVGA cards exclusively to avoid the kinds of unhappiness you're having with the vendor and/or OEM.

Good luck!

1 hour ago, cj-ibbotson said:

I've just come home from work.  I have had the sim running since midnight last night with the Power Limit set to 75%; it was crash-free this morning and still running at 2pm today, so I decided to enable logging in GPU-Z.  I arrived home from work at 6pm and noticed the sim had crashed around 2:30pm, so it ran for about 14 hours.  I did notice on the log 2 entries, at the apparent time of the crash, where the GPU load hit 100%, though it has done this before when monitoring and not crashed; there are no other 100% entries in this log.  I will attach the log later as I'm heading back out.

Chris

Chris -- if you look closely at my log, you will see the exact same thing that you experienced: immediately before the crash, the GPU hit 100% for two rows and then crashed. And the GPU usage before that was relatively constant. I happened to be starting a flight from PAJN (Juneau) to PANC (Anchorage) and was just beginning to taxi to the active. Nothing should have caused the GPU to peak like that, other than a hardware problem (I made this flight a number of times with my 1070 without any problem). I also have had the card running at 100% (experimenting with Dynamic Lighting at night) without a crash. So, I wonder if something is broken on the card and the 100% reading is telling us there is a problem. I have done another log, which I have posted here (2nd crash log), and the exact same thing happened: two rows of 100%, then a crash. Gigabyte provided me with an RMA without problem (probably since the card has a 3-year warranty). I had to order some anti-static bags to send the card back to Gigabyte, which won't be here until next Monday or so. If you want me to duplicate a flight of yours to see if my card crashes where yours did, just let me know.

Regards,

P.

Edit: Also it can't be a temperature or power problem since the log entries for temp and TDP look fine.
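
For anyone who wants to scan for those spikes automatically, here's a rough sketch; the file name and the "GPU Load [%]" column header are assumptions, so check them against the header line of your own GPU-Z sensor log (which is saved as comma-separated text):

    # Rough sketch: flag rows in a GPU-Z sensor log where GPU load reads 100%,
    # like the two rows seen immediately before the crashes described above.
    import csv

    LOG = "GPU-Z Sensor Log.txt"   # hypothetical path -- point this at your own log file

    with open(LOG, newline="", encoding="utf-8", errors="ignore") as f:
        reader = csv.reader(f)
        header = [h.strip() for h in next(reader)]
        date_col = header.index("Date")            # assumed column names; adjust to
        load_col = header.index("GPU Load [%]")    # match the header line of your log
        for row in reader:
            if len(row) <= load_col:
                continue
            try:
                load = float(row[load_col])
            except ValueError:
                continue
            if load >= 100:
                print(row[date_col].strip(), "-> GPU load", load, "%")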

Edited by pgde

1 hour ago, w6kd said:

Increasing the power limit is not the same as increasing the voltage.  The power limit allows the GPU to exceed its thermal design power by a specified margin.  Using an overclocking utility like MSI Afterburner, eVGA Precision, or nVidia Inspector, you can also specify a positive offset to be applied to the stock GPU voltage, which can stabilize things if it's running at the ragged edge of stability, which it sounds like it may be.  Sometimes just adding a few mV can make all the difference...I just had to do that with some RAM in a new build that would throw the occasional error until I bumped up the voltage just 15 mV over spec.

I feel for your plight with the card...but if the hardware is failing, then it's failing.  I buy eVGA cards exclusively to avoid the kinds of unhappiness you're having with the vendor and/or OEM.

Good luck!

What I meant was that increasing ANYTHING, including voltage to the card, triggers a quick crash, even if I have the power limit set low enough to stabilise Prepar3D.  With everything on default, increasing the voltage triggers a crash, even a small increase.  I forgot to say earlier that I did change the modular cables from the PSU a couple of months back after getting these crashes; it came with 6 cables, so all original.  These increases only affect P3D.  In tests elsewhere I can increase the core by 100 and the memory by even 400 and it runs fine on 3 different Futuremark benchmark tests at max settings.  If I increase either by even a tiny 25 MHz, P3D crashes, so go figure.  How can all the other apps or programs I have cope on tests with massive increases, but the sim cannot cope with even a tiny increase, or even default? 😞 I'm just home so will attach a log and hopefully Rob or yourself can shed some light.

 

37 minutes ago, pgde said:

Chris -- if you look closely at my log, you will see the exact same thing that you experienced: immediately before the crash, the GPU hit 100% for two rows and then crashed. And the GPU usage before that was relatively constant. I happened to be starting a flight from PAJN (Juneau) to PANC (Anchorage) and was just beginning to taxi to the active. Nothing should have caused the GPU to peak like that, other than a hardware problem (I made this flight a number of times with my 1070 without any problem). I also have had the card running at 100% (experimenting with Dynamic Lighting at night) without a crash. So, I wonder if something is broken on the card and the 100% reading is telling us there is a problem. I have done another log, which I have posted here (2nd crash log), and the exact same thing happened: two rows of 100%, then a crash. Gigabyte provided me with an RMA without problem (probably since the card has a 3-year warranty). I had to order some anti-static bags to send the card back to Gigabyte, which won't be here until next Monday or so. If you want me to duplicate a flight of yours to see if my card crashes where yours did, just let me know.

Regards,

P.

Edit: Also it can't be a temperature or power problem since the log entries for temp and TDP look fine.

Thanks, I'll attach the log shortly. Mine occurred during the day at 20,000 ft over the Alps. I wasn't even 'flying': due to the long tests I have simply allowed the aircraft to slew right across Europe and back at around 160 knots. It can still trigger a crash even with no input from myself. Crashes are entirely random, so not replicable. By the way, my card never even came in an anti-static bag, just a foam surround, inner box and outer box packaging.

Chris

Edited by cj-ibbotson


Perhaps my PSU is actually buggered. A couple of times lately it has failed to start when pressing the power button, but with the internal board lights still on, so removing the power lead for 5 minutes helps. A few times lately there has been no power and no lights. I tried a different lead and still nothing. I switched everything off and left it, and eventually got it on, but the BIOS clock had changed to 2025. I changed the motherboard battery and all seemed well for a few days, but again tonight no power showing. Now I don't know if it's the PSU or the card at fault. Looks like I'm going to have to spend even more on an upgrade. I had considered a 9900K build but need a CPU, motherboard, bigger AIO cooler and new case, and now it looks like I need to add another £150 for a PSU. I've been put off by stories of very high temps. I'm also busy till mid-Jan so hadn't planned on a new build yet. I'm wondering if I should do a new build and retain the 1080ti for testing, rather than send it back, in case the problem has always been the PSU? I changed the PCIe cables in July after, on one boot, I got an error that the graphics card had no power. This was after I started experiencing DXGI crashes.

Edited by cj-ibbotson

21 minutes ago, cj-ibbotson said:

What I meant was that increasing ANYTHING, including voltage to the card, triggers a quick crash, even if I have the power limit set low enough to stabilise Prepar3D.  With everything on default, increasing the voltage triggers a crash, even a small increase.  I forgot to say earlier that I did change the modular cables from the PSU a couple of months back after getting these crashes; it came with 6 cables, so all original.  These increases only affect P3D.  In tests elsewhere I can increase the core by 100 and the memory by even 400 and it runs fine on 3 different Futuremark benchmark tests at max settings.  If I increase either by even a tiny 25 MHz, P3D crashes, so go figure.  How can all the other apps or programs I have cope on tests with massive increases, but the sim cannot cope with even a tiny increase, or even default? 😞 I'm just home so will attach a log and hopefully Rob or yourself can shed some light.

Was this the original video card in your system when Windows was last installed?  Have the drivers been upgraded?

In Prepar3D.cfg, make sure there is one and *only* one Display.Device.xxxx section and that it matches your GPU.
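
Something like this quick check will list them (rough Python sketch; the path assumes the default P3D v4 location, so adjust it if yours differs):

    # Rough sketch: print every [DISPLAY.Device...] section found in Prepar3D.cfg
    # so you can confirm there is exactly one and that it names your GPU.
    import os
    import re

    cfg = os.path.expandvars(r"%APPDATA%\Lockheed Martin\Prepar3D v4\Prepar3D.cfg")

    with open(cfg, encoding="utf-8", errors="ignore") as f:
        sections = [line.strip() for line in f
                    if re.match(r"\[DISPLAY\.Device", line.strip(), re.IGNORECASE)]

    print(f"{len(sections)} DISPLAY.Device section(s) found in {cfg}:")
    for s in sections:
        print(" ", s)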

Then uninstall your video driver, followed by a boot into safe mode and a run of DDU to remove all remnants of the old driver installation, then reboot and reinstall the latest driver (with A/V turned off) and with the "clean" installation option selected.

Very conflicting data here...your descriptions of the P3D crashes are indicative of a hardware problem, but that's incongruent with the ability to do successful test runs of benchmarks.

Given the great expense of sending your video card back, a rebuild from the OS on up might be advisable as a second-to-last resort before sending the GPU back--after, of course, backing up the entire system.  Sometimes a corrupt system file can render all the normal fault checking drills useless. 

Regards

9 minutes ago, cj-ibbotson said:

Perhaps my PSU is actually buggered.

As I mentioned, that was my problem, and it damaged my Corsair H80 at the same time. Luckily, it didn't break anything else.

My thinking was, and is, that if the software has been ruled out, the problem is hardware.

I am, however, only operating on logic, not extensive and deep computer knowledge.

