Chapter 5. Common Problems

This section provides solutions to common problems associated with the NVIDIA Linux x86_64 Driver.

My X server fails to start, and my X log file contains the error:

(EE) NVIDIA(0): The NVIDIA kernel module does not appear to
(EE) NVIDIA(0):      be receiving interrupts generated by the NVIDIA graphics
(EE) NVIDIA(0):      device PCI:x:x:x. Please see the COMMON PROBLEMS
(EE) NVIDIA(0):      section in the README for additional information.

This can be caused by a variety of problems, such as PCI IRQ routing errors, I/O APIC problems or conflicts with other devices sharing the IRQ (or their drivers).

If possible, configure your system such that your graphics card does not share its IRQ with other devices (try moving the graphics card to another slot if applicable, unload/disable the driver(s) for the device(s) sharing the card's IRQ, or remove/disable the device(s)).

Depending on the nature of the problem, one of (or a combination of) these kernel parameters might also help:

    Parameter      Behavior
    pci=noacpi     don't use ACPI for PCI IRQ routing
    pci=biosirq    use PCI BIOS calls to retrieve the IRQ routing table
    noapic         don't use I/O APICs present in the system
    acpi=off       disable ACPI
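
To confirm which of these parameters the running kernel was actually booted with, you can inspect /proc/cmdline. The following is a minimal sketch of such a check; the function name has_kernel_param is hypothetical, and how the parameters are added to the command line depends on your boot loader.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeed if PARAM appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline; passing it explicitly
# is mainly useful for testing. The function name is illustrative only.
has_kernel_param() {
    hkp_cmdline=${2-$(cat /proc/cmdline)}
    for hkp_word in $hkp_cmdline; do
        [ "$hkp_word" = "$1" ] && return 0
    done
    return 1
}
```

For example:

    % has_kernel_param noapic && echo "noapic is active"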

My X server fails to start, and my X log file contains the error:

(EE) NVIDIA(0): The interrupt for NVIDIA graphics device PCI:x:x:x
(EE) NVIDIA(0):      appears to be edge-triggered. Please see the COMMON
(EE) NVIDIA(0):      PROBLEMS section in the README for additional information.

An edge-triggered interrupt means that the kernel has programmed the interrupt as edge-triggered rather than level-triggered in the Advanced Programmable Interrupt Controller (APIC). Edge-triggered interrupts are not intended to be used for sharing an interrupt line between multiple devices; level-triggered interrupts are the intended trigger for such usage. When using edge-triggered interrupts, it is common for device drivers using that interrupt line to stop receiving interrupts. This would appear to the end user as those devices no longer working, and potentially as a full system hang. These problems tend to be more common when multiple devices are sharing that interrupt line.

This occurs when ACPI is not used to program interrupt routing in the APIC. This often occurs on 2.4 Linux kernels, which do not fully support ACPI, or 2.6 kernels when ACPI is disabled or fails to initialize. In these cases, the Linux kernel falls back to tables provided by the system BIOS. In some cases the system BIOS assumes ACPI will be used for routing interrupts and configures these tables to incorrectly label all interrupts as edge-triggered. The current interrupt configuration can be found in /proc/interrupts.
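
As a rough sketch, the helpers below pick the NVIDIA lines out of /proc/interrupts and test whether a line is marked edge-triggered. The function names are hypothetical, and the check assumes the trigger type appears as text containing "edge" (for example "IO-APIC-edge"); the exact layout of /proc/interrupts varies between kernel versions.

```shell
# nvidia_irq_lines: read /proc/interrupts-style text on stdin and print
# only the lines for interrupts claimed by the nvidia module.
nvidia_irq_lines() {
    grep -i 'nvidia'
}

# irq_is_edge LINE: succeed if LINE describes an edge-triggered interrupt.
# Assumption: the trigger type is spelled with the substring "edge".
irq_is_edge() {
    case $1 in
        *edge*|*Edge*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example:

    % nvidia_irq_lines < /proc/interrupts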

Available workarounds include: updating to a newer system BIOS, trying a 2.6 kernel with ACPI enabled, or passing the 'noapic' option to the kernel to force interrupt routing through the traditional Programmable Interrupt Controller (PIC). Newer kernels also provide an interrupt polling mechanism to attempt to work around this problem. This mechanism can be enabled by passing the 'irqpoll' option to the kernel.

Currently, the NVIDIA driver will attempt to detect edge-triggered interrupts and X will purposely fail to start (to avoid stability issues). This behavior can be overridden by setting the "NVreg_RMEdgeIntrCheck" NVIDIA Linux kernel module parameter. This parameter defaults to "1", which enables the edge-triggered interrupt detection. Set this parameter to "0" to disable this detection.
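
Module parameters are normally set either on the modprobe command line or through an "options" line in the modprobe configuration. The sketch below only prints such an "options" line; the function name edge_check_option is hypothetical, and the file in which the line belongs (for example /etc/modprobe.conf or a file under /etc/modprobe.d/) is an assumption that depends on your distribution.

```shell
# edge_check_option VALUE
# Print the modprobe "options" line that sets NVreg_RMEdgeIntrCheck.
# VALUE must be 0 (disable the detection) or 1 (enable it, the default).
edge_check_option() {
    case $1 in
        0|1) printf 'options nvidia NVreg_RMEdgeIntrCheck=%s\n' "$1" ;;
        *)   echo "usage: edge_check_option 0|1" >&2; return 1 ;;
    esac
}
```

The same NAME=VALUE pair can also be given when loading the module by hand, as root and with the module not yet loaded:

    # modprobe nvidia NVreg_RMEdgeIntrCheck=0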

X starts for me, but OpenGL applications terminate immediately.

If X starts but you have trouble with OpenGL, you most likely have a problem with other libraries in the way, or there are stale symlinks. See Appendix C, Installed Components for details. Sometimes, all it takes is to rerun ldconfig.

You should also check that the correct extensions are present:

    % xdpyinfo

should show the “GLX” and “NV-GLX” extensions present. If these two extensions are not present, then there is most likely a problem loading the glx module, or it is unable to implicitly load GLcore. Check your X config file and make sure that you are loading glx (see Chapter 3, Configuring X for the NVIDIA Driver). If your X config file is correct, then check the X log file for warnings/errors pertaining to GLX. Also check that all of the necessary symlinks are in place (refer to Appendix C, Installed Components).
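
The extension check can be scripted. The following is a minimal sketch, assuming xdpyinfo lists each extension name on a line of its own; the function name missing_glx_extensions is hypothetical.

```shell
# missing_glx_extensions: read xdpyinfo output on stdin and print the name
# of each required extension that is absent. Prints nothing if both the
# GLX and NV-GLX extensions are listed.
missing_glx_extensions() {
    mge_out=$(cat)
    for mge_ext in GLX NV-GLX; do
        # Match the extension name as a whole line, ignoring indentation.
        printf '%s\n' "$mge_out" |
            grep -q "^[[:space:]]*${mge_ext}[[:space:]]*\$" ||
            echo "$mge_ext"
    done
}
```

For example:

    % xdpyinfo | missing_glx_extensions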

When Xinerama is enabled, my stereo glasses are shuttering only when the stereo application is displayed on one specific X screen. When the application is displayed on the other X screens, the stereo glasses stop shuttering.

This problem occurs with DDC and "blue line" stereo glasses, which get the stereo signal from one video port of the graphics card. When an X screen does not display any stereo drawable, the stereo signal is disabled on the associated video port.

Forcing stereo flipping allows the stereo glasses to shutter continuously. This can be done by enabling the OpenGL control "Force Stereo Flipping" in nvidia-settings, or by setting the X configuration option "ForceStereoFlipping" to "1".

Stereo is not in sync across multiple displays.

There are two cases where this may occur. If the displays are attached to the same GPU, and one of them is out of sync with the stereo glasses, you will need to reconfigure your monitors to drive identical mode timings; please see Appendix J, Programming Modes for details.

If the displays are attached to different GPUs, the only way to synchronize stereo across the displays is with a G-Sync device, which is only supported by certain Quadro cards. Please see Appendix X, Frame Lock and Genlock for details. This applies to separate GPUs on separate cards as well as separate GPUs on the same card, such as Quadro FX 4500 X2. Note that the Quadro FX 4500 X2 only provides a single DIN connector for stereo, tied to the bottommost GPU. In order to synchronize onboard stereo on the other GPU you must use a G-Sync device.

My X server fails to start, and my X log file contains the error:

(EE) NVIDIA(0): Failed to load the NVIDIA kernel module!

The X driver will abort with this error message if the NVIDIA kernel module fails to load. If you receive this error, you should check the output of dmesg for kernel error messages and/or attempt to load the kernel module explicitly with modprobe nvidia. If unresolved symbols are reported, then the kernel module was most likely built against a Linux kernel source tree (or kernel headers) for a kernel revision or configuration that doesn't match the running kernel.
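
On kernels whose module tools record a "vermagic" string in each module (an assumption; this applies to 2.6 and later kernels, not to 2.4), one way to spot such a mismatch is to compare that string with the running kernel release. The helper below is a minimal sketch; the function name vermagic_matches is hypothetical.

```shell
# vermagic_matches VERMAGIC RELEASE
# Succeed if the first word of VERMAGIC (the kernel release the module was
# built for) is identical to RELEASE.
vermagic_matches() {
    [ "${1%% *}" = "$2" ]
}
```

For example:

    % dmesg | grep -i nvidia
    % vermagic_matches "$(modinfo -F vermagic nvidia)" "$(uname -r)" || echo "module was built for a different kernel"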

You can specify the location of the kernel source tree (or headers) when you install the NVIDIA driver using the --kernel-source-path command line option (see sh NVIDIA-Linux-x86_64-96.43.01-pkg1.run --advanced-options for details).

Old versions of the module-init-tools include modprobe binaries that report an error when instructed to load a module that's already loaded into the kernel. Please upgrade your module-init-tools if you receive an error message to this effect.

The X server reads /proc/sys/kernel/modprobe to determine the path to the modprobe utility and falls back to /sbin/modprobe if the file doesn't exist. Please make sure that this path is valid and refers to a modprobe binary compatible with the Linux kernel running on your system.
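
The lookup order described above can be reproduced by hand to see which binary would be used. This is a sketch of that logic only, not the X server's actual code; the function name modprobe_path is hypothetical.

```shell
# modprobe_path [FILE]
# Print the modprobe path using the lookup order described above: the
# contents of FILE (default /proc/sys/kernel/modprobe) if it is readable,
# otherwise /sbin/modprobe.
modprobe_path() {
    mp_file=${1:-/proc/sys/kernel/modprobe}
    if [ -r "$mp_file" ]; then
        cat "$mp_file"
    else
        echo /sbin/modprobe
    fi
}
```

For example:

    % [ -x "$(modprobe_path)" ] || echo "modprobe path is not valid"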

The "LoadKernelModule" X driver option can be used to change the default behavior and disable kernel module auto-loading.

Installing the NVIDIA kernel module gives an error message like:

#error Modules should never use kernel-headers system headers
#error but headers from an appropriate kernel-source

You need to install the source for the Linux kernel. In most situations you can fix this problem by installing the kernel-source or kernel-devel package for your distribution.

OpenGL applications crash and print out the following warning:

WARNING: Your system is running with a buggy dynamic loader.
This may cause crashes in certain applications.  If you
experience crashes you can try setting the environment
variable __GL_SINGLE_THREADED to 1.  For more information
please consult the FREQUENTLY ASKED QUESTIONS section in
the file /usr/share/doc/NVIDIA_GLX-1.0/README.txt.

The dynamic loader on your system has a bug which will cause applications linked with pthreads, and that dlopen() libGL multiple times, to crash. This bug is present in older versions of the dynamic loader. Distributions that shipped with this loader include but are not limited to Red Hat Linux 6.2 and Mandrake Linux 7.1. Version 2.2 and later of the dynamic loader are known to work properly. If the crashing application is single threaded then setting the environment variable __GL_SINGLE_THREADED to 1 will prevent the crash. In the bash shell you would enter:

    % export __GL_SINGLE_THREADED=1

and in csh and derivatives use:

    % setenv __GL_SINGLE_THREADED 1

Previous releases of the NVIDIA Accelerated Linux Driver Set attempted to work around this problem. Unfortunately, the workaround caused problems with other applications and was removed after version 1.0-1541.

Quake3 crashes when changing video modes.

You are probably experiencing the problem described above. Please check the text output for the “WARNING” message described in the previous hint. Setting __GL_SINGLE_THREADED to 1, as described there, should fix the problem.

I cannot build the NVIDIA kernel module, or, I can build the NVIDIA kernel module, but modprobe/insmod fails to load the module into my kernel. What is wrong?

These problems are generally caused by the build using the wrong kernel header files (i.e. header files for a different kernel version than the one you are running). The convention used to be that kernel header files should be stored in /usr/include/linux/, but that is deprecated in favor of /lib/modules/RELEASE/build/include (where RELEASE is the result of uname -r). The nvidia-installer should be able to determine the location on your system; however, if you encounter a problem you can force the build to use certain header files by using the --kernel-include-dir option. For this to work you will of course need the appropriate kernel header files installed on your system. Consult the documentation that came with your distribution; some distributions do not install the kernel header files by default, or they install headers that do not coincide properly with the kernel you are running.

There are problems running Heretic II.

Heretic II installs, by default, a symlink called libGL.so in the application directory. You can remove or rename this symlink, since the system will then find the default libGL.so (which our drivers install in /usr/lib). From within Heretic II you can then set your render mode to OpenGL in the video menu. There is also a patch available to Heretic II from lokigames at: http://www.lokigames.com/products/heretic2/updates.php3/

My system hangs when switching to a virtual terminal if I have rivafb enabled.

Using both rivafb and the NVIDIA kernel module at the same time is currently broken. In general, using two independent software drivers to drive the same piece of hardware is a bad idea.

Compiling the NVIDIA kernel module gives this error:

You appear to be compiling the NVIDIA kernel module with
a compiler different from the one that was used to compile
the running kernel. This may be perfectly fine, but there
are cases where this can lead to unexpected behavior and
system crashes.

If you know what you are doing and want to override this
check, you can do so by setting IGNORE_CC_MISMATCH.

In any other case, set the CC environment variable to the
name of the compiler that was used to compile the kernel.

You should compile the NVIDIA kernel module with the same compiler version that was used to compile your kernel. Some Linux kernel data structures are dependent on the version of gcc used to compile it; for example, in include/linux/spinlock.h:

        ...
        * Most gcc versions have a nasty bug with empty initializers.
        */
        #if (__GNUC__ > 2)
          typedef struct { } rwlock_t;
          #define RW_LOCK_UNLOCKED (rwlock_t) { }
        #else
          typedef struct { int gcc_is_buggy; } rwlock_t;
          #define RW_LOCK_UNLOCKED (rwlock_t) { 0 }
        #endif

If the kernel is compiled with gcc 2.x, but gcc 3.x is used when the kernel interface is compiled (or vice versa), the size of rwlock_t will vary, and things like ioremap will fail. To check what version of gcc was used to compile your kernel, you can examine the output of:

    % cat /proc/version

To check what version of gcc is currently in your $PATH, you can examine the output of:

    % gcc -v
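
To compare the two versions without reading them by eye, the gcc version can be extracted from /proc/version. The following is a minimal sketch, assuming the "(gcc version X.Y.Z ...)" wording used by 2.4 and 2.6 kernels; other kernels may format this text differently, and the function name kernel_gcc_version is hypothetical.

```shell
# kernel_gcc_version: read /proc/version-style text on stdin and print the
# gcc version it reports. Prints nothing if no "gcc version" text is found.
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}
```

For example, the output of these two commands should agree:

    % kernel_gcc_version < /proc/version
    % gcc -dumpversion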

X fails with the error:

Failed to allocate LUT context DMA

This is one of the possible consequences of compiling the NVIDIA kernel interface with a different gcc version than used to compile the Linux kernel (see above).

I recently updated various libraries on my system using my Linux distributor's update utility, and the NVIDIA graphics driver no longer works.

Conflicting libraries may have been installed by your distribution's update utility; please see Appendix C, Installed Components for details on how to diagnose this.

I have rebuilt the NVIDIA kernel module, but when I try to insert it, I get a message telling me I have unresolved symbols.

Unresolved symbols are most often caused by a mismatch between your kernel sources and your running kernel. They must match for the NVIDIA kernel module to build correctly. Please make sure your kernel sources are installed and configured to match your running kernel.

How do I tell if I have my kernel sources installed?

If you are running on a distro that uses RPM (Red Hat, Mandrake, SuSE, etc), then you can use rpm to tell you. At a shell prompt, type:

    % rpm -qa | grep kernel

and look at the output. You should see a package that corresponds to your kernel (often named something like kernel-2.6.15-7) and a kernel source package with the same version (often named something like kernel-devel-2.6.15-7 or kernel-source-2.4.18-3). If none of the lines seem to correspond to a source package, then you will probably need to install it. If the versions listed mismatch (e.g., kernel-2.6.15-7 vs. kernel-devel-2.6.15-10), then you will need to update the kernel-devel package to match the installed kernel. If you have multiple kernels installed, you need to install the kernel-devel package that corresponds to your running kernel (or make sure your installed source package matches the running kernel). You can do this by looking at the output of uname -r and matching versions.
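
The version matching can be sketched as a small filter over the rpm output. This is only an approximation: it uses a loose prefix match, package naming differs between distributions, and the function name has_kernel_sources is hypothetical.

```shell
# has_kernel_sources RELEASE
# Read a package list (as printed by "rpm -qa") on stdin and succeed if it
# contains a kernel-devel or kernel-source package whose version starts
# with RELEASE.
has_kernel_sources() {
    hks_found=1
    while IFS= read -r hks_pkg; do
        case $hks_pkg in
            kernel-devel-"$1"*|kernel-source-"$1"*) hks_found=0 ;;
        esac
    done
    return $hks_found
}
```

For example:

    % rpm -qa | has_kernel_sources "$(uname -r)" || echo "no matching kernel sources"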

I am unable to load the NVIDIA kernel module that I compiled for the Red Hat Linux 7.3 2.4.18-3bigmem kernel.

The kernel header files Red Hat Linux distributes for Red Hat Linux 7.3 2.4.18-3bigmem kernel are misconfigured. NVIDIA's precompiled kernel module for this kernel can be loaded, but if you want to compile the NVIDIA kernel interface files yourself for this kernel, then you will need to perform the following:

    # cd /lib/modules/`uname -r`/build/
    # make mrproper
    # cp configs/kernel-2.4.18-i686-bigmem.config .config
    # make oldconfig dep

Note: Red Hat Linux ships kernel header files that are simultaneously configured for ALL of their kernels for a particular distribution version. A header file generated at boot time sets up a few parameters that select the correct configuration. Rebuilding the kernel headers with the above commands will create header files suitable for the Red Hat Linux 7.3 2.4.18-3bigmem kernel configuration only, thus making the header files for the other configurations unusable.

OpenGL applications leak significant amounts of memory on my system!

If your kernel is making use of the -rmap VM, the system may be leaking memory due to a memory management optimization introduced in -rmap14a. The -rmap VM has been adopted by several popular distributions, and the memory leak is known to be present in some of the distribution kernels; it has been fixed in -rmap15e.

If you suspect that your system is affected, please try upgrading your kernel or contact your distribution's vendor for assistance.

Some OpenGL applications (like Quake3 Arena) crash when I start them on Red Hat Linux 9.0.

Some versions of the glibc package shipped by Red Hat that support TLS do not properly handle using dlopen() to access shared libraries which use some TLS models. This problem is exhibited, for example, when Quake3 Arena dlopen()'s NVIDIA's libGL library. Please obtain at least glibc-2.3.2-11.9 which is available as an update from Red Hat.

I have installed the driver, but my Enable 3D Acceleration checkbox is still grayed out.

Most distribution-provided configuration applets are not aware of the NVIDIA accelerated driver, and consequently will not update themselves when you install the driver. Your driver, if it has been installed properly, should function fine.

X does not restore the VGA console when run on a TV. I get this error message in my X log file:

Unable to initialize the X int10 module; the console may not be
restored correctly on your TV.

The NVIDIA X driver uses the X Int10 module to save and restore console state on TV out, and will not be able to restore the console correctly if it cannot use the Int10 module. If you have built the X server yourself, please be sure you have built the Int10 module. If you are using a build of the X server provided by a Linux distribution, and are missing the Int10 module, please contact your distributor.

When changing settings in games like Quake 3 Arena, or Wolfenstein Enemy Territory, the game crashes and I see this error:

...loading libGL.so.1: QGL_Init: dlopen libGL.so.1 failed: 
/usr/lib/tls/libGL.so.1: shared object cannot be dlopen()ed:
static TLS memory too small

These games close and reopen the NVIDIA OpenGL driver (via dlopen()/dlclose()) when settings are changed. On some versions of glibc (such as the one shipped with Red Hat Linux 9), there is a bug that leaks static TLS entries. This glibc bug causes subsequent re-loadings of the OpenGL driver to fail. This is fixed in more recent versions of glibc; please see Red Hat bug #89692: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=89692

X crashes during startx, and my X log file contains this error message:

(EE) NVIDIA(0): Failed to obtain a shared memory identifier.

The NVIDIA OpenGL driver and the NVIDIA X driver require shared memory to communicate; you must have CONFIG_SYSVIPC enabled in your kernel.
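
If your distribution installs a copy of the kernel configuration, you can check for this option directly. The following is a minimal sketch; the function name sysvipc_enabled is hypothetical, and the configuration file location shown in the example is an assumption that varies between distributions.

```shell
# sysvipc_enabled: read a kernel configuration on stdin and succeed if it
# contains the line CONFIG_SYSVIPC=y.
sysvipc_enabled() {
    grep -q '^CONFIG_SYSVIPC=y$'
}
```

For example:

    % sysvipc_enabled < /boot/config-`uname -r` || echo "CONFIG_SYSVIPC is not enabled"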

When I try to install the driver, the installer claims that X is running, even though I have exited X.

The installer detects the presence of an X server by checking for X's lock files: /tmp/.Xn-lock, where 'n' is the number of the X Display (the installer checks for X Displays 0-7). If you have exited X, but one of these files has been left behind, then you will need to manually delete the lock file. Do not remove this file if X is still running!
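
Before deleting a lock file, it is worth confirming that the X server that created it is really gone. The sketch below lists lock files whose recorded process no longer exists; it only prints names and deletes nothing. It assumes the lock file holds the server's process ID as text, and the function name stale_x_locks is hypothetical.

```shell
# stale_x_locks [DIR]
# Print each X lock file in DIR (default /tmp) for displays 0-7 whose
# recorded process is no longer running.
stale_x_locks() {
    sxl_dir=${1:-/tmp}
    for sxl_n in 0 1 2 3 4 5 6 7; do
        sxl_lock="$sxl_dir/.X${sxl_n}-lock"
        [ -f "$sxl_lock" ] || continue
        # Strip padding so only the digits of the PID remain.
        sxl_pid=$(tr -cd '0-9' < "$sxl_lock")
        if [ -z "$sxl_pid" ]; then
            echo "$sxl_lock"
        elif kill -0 "$sxl_pid" 2>/dev/null || [ -d "/proc/$sxl_pid" ]; then
            : # The process still exists, so the lock is in use.
        else
            echo "$sxl_lock"
        fi
    done
}
```

For example:

    % stale_x_locks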

My system runs, but seems unstable. What is wrong?

Your stability problems may be AGP-related. See Appendix F, Configuring AGP for details.

OpenGL applications are running slowly.

The application is probably using a different library still on your system, rather than the NVIDIA supplied OpenGL library. Please see Appendix C, Installed Components for details.

There are problems running Quake2.

Quake2 requires some minor setup to get it going. First, in the Quake2 directory, the install creates a symlink called libGL.so that points at libMesaGL.so. This symlink should be removed or renamed. Second, in order to run Quake2 in OpenGL mode, you must type

    % quake2 +set vid_ref glx +set gl_driver libGL.so

Quake2 does not seem to support any kind of full-screen mode, but you can run your X server at the same resolution as Quake2 to emulate full-screen mode.

I am using either nForce or nForce2 internal graphics, and I see warnings like this in my X log file:

Not using mode "1600x1200" (exceeds valid memory bandwidth usage)

Integrated graphics have more strict memory bandwidth limitations that limit the resolution and refresh rate of the modes you request. To work around this, you can reduce the maximum refresh rate by lowering the upper value of the VertRefresh range in the Monitor section of your X config file. Though not recommended, you can disable the memory bandwidth test with the NoBandWidthTest X config file option.

X takes a long time to start (possibly several minutes).

Most of the X startup delay problems we have found are caused by incorrect data in video BIOSes about what display devices are possibly connected or what i2c port should be used for detection. You can work around these problems with the X config option IgnoreDisplayDevices (please see the description in Appendix D, X Config Options).

Fonts are incorrectly sized after installing the NVIDIA driver.

Incorrectly sized fonts are generally caused by incorrect DPI (Dots Per Inch) information. You can check what X thinks the physical size of your monitor is, by running:

    % xdpyinfo | grep dimensions

This will report the size in pixels, and in millimeters.

If these numbers are wrong, you can correct them by modifying the X server's DPI setting. See Appendix Y, Dots Per Inch for details.
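
The DPI that X derives from these numbers is the size in pixels divided by the size in inches (1 inch = 25.4 millimeters). The following is a small sketch of that arithmetic; the function name dpi is hypothetical, and the values in the example are made up for illustration.

```shell
# dpi PIXELS MILLIMETERS
# Print the dots per inch implied by a dimension in pixels and its
# physical size in millimeters, rounded to the nearest integer.
dpi() {
    awk -v px="$1" -v mm="$2" 'BEGIN { printf "%.0f\n", px * 25.4 / mm }'
}
```

For example, a display reported as 1280 pixels and 325 millimeters wide works out to roughly 100 DPI:

    % dpi 1280 325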

General problems with ALi chipsets

There are some known timing and signal integrity issues on ALi chipsets. The following tips may help stabilize problematic ALi systems:

  • Disable TURBO AGP MODE in the BIOS.

  • When using a P5A, upgrade to BIOS Revision 1002 BETA 2.

  • When using 1007, 1007A or 1009, adjust the IO Recovery Time to 4 cycles.

  • AGP is disabled by default on some ALi chipsets (ALi1541, ALi1647) to work around severe system stability problems with these chipsets. See the comments for NVreg_EnableALiAGP in os-registry.c to force AGP on anyway.