Linux (Apalis T30) - Using an NVIDIA CUDA Graphics Card with Apalis T30

 

With the public availability of the CUDA Toolkit 5.5 Release Candidate (RC), ARM platforms are now supported out of the box.

We are still testing CUDA 5.5. In the meantime, you can try CUDA 5.0 on the Apalis T30 as explained below.


  • Apalis T30 CUDA Top View

  • Apalis T30 CUDA Perspective View


Prerequisites

  • Apalis T30 2GB Module
  • Apalis Evaluation Board
  • Apalis T30 Mezzanine (type-specific extension board featuring a PCIe x16 slot running at x4)
  • NVIDIA NVS 310 (CUDA graphics card)

Install CUDA 5.0 onto an Apalis T30

First, download and extract our latest Apalis T30 Embedded Linux BSP:

wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Images/Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2
sudo tar xjvf Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2

Next, download and extract the Apalis T30 CUDA package. Note that this adds to and overwrites files from the previously extracted regular BSP package:

wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2
sudo tar xjvf Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2
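
Because the CUDA package overwrites files from the regular BSP, the order of extraction matters. The following is a minimal sketch, not part of the official procedure, that checks all archives are present before extracting them in the order given. The helper name extract_in_order is hypothetical; the archive names and the tar invocation are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: extract the given archives in the order listed,
# refusing to start unless every archive exists and is non-empty.
extract_in_order() {
    for archive in "$@"; do
        if [ ! -s "$archive" ]; then
            echo "missing or empty archive: $archive" >&2
            return 1
        fi
    done
    for archive in "$@"; do
        # Same tar invocation as in the steps above.
        sudo tar xjvf "$archive" || return 1
    done
}

# Usage (regular BSP first, CUDA package second):
#   extract_in_order Apalis_T30_LinuxImageV2.0Beta2_20130816.tar.bz2 \
#                    Apalis_T30_LinuxImageV2.0-CUDA_5.0_v1.1.tar.bz2
```

Checking every archive before the first extraction avoids ending up with a half-prepared image directory when one download is missing.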

Now flash it as usual:

cd Apalis_T30_LinuxImageV2.0
./update.sh

As a final step, on the Apalis T30 target itself, update the X server to a Xinerama-enabled version and reboot:

wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg_1.11.2-r11_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/libdrm2_2.4.39-r3.0_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg-extension-dri2_1.11.2-r11_armv7ahf-vfp.ipk
wget -c http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra/xserver-xorg-module-libwfb_1.11.2-r11_armv7ahf-vfp.ipk
opkg install xserver-xorg_1.11.2-r11_armv7ahf-vfp.ipk libdrm2_2.4.39-r3.0_armv7ahf-vfp.ipk xserver-xorg-extension-dri2_1.11.2-r11_armv7ahf-vfp.ipk xserver-xorg-module-libwfb_1.11.2-r11_armv7ahf-vfp.ipk
reboot
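
The four downloads and the single opkg call above could also be written as a loop over the package names. This is only a sketch under that assumption: the base URL and file names are copied from the commands above, while the function names package_urls and install_xinerama_xserver are hypothetical. The reboot is left as a separate manual step.

```shell
#!/bin/sh
# Base URL and package file names, copied from the commands above.
BASE_URL="http://developer.toradex.com/files/toradex-dev/uploads/media/Colibri/Linux/Extra"
PACKAGES="xserver-xorg_1.11.2-r11_armv7ahf-vfp.ipk
libdrm2_2.4.39-r3.0_armv7ahf-vfp.ipk
xserver-xorg-extension-dri2_1.11.2-r11_armv7ahf-vfp.ipk
xserver-xorg-module-libwfb_1.11.2-r11_armv7ahf-vfp.ipk"

# Hypothetical helper: print one download URL per package.
package_urls() {
    for pkg in $PACKAGES; do
        echo "$BASE_URL/$pkg"
    done
}

# Hypothetical helper: download every package, stopping at the first
# failure, then install them all with one opkg call as in the steps above.
install_xinerama_xserver() {
    for url in $(package_urls); do
        wget -c "$url" || return 1
    done
    # Word splitting of $PACKAGES is intentional here.
    opkg install $PACKAGES
}

# Usage on the target: install_xinerama_xserver && reboot
```

Installing all packages in one opkg call, as the original commands do, lets opkg resolve the dependencies between them.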

Samples

Some samples can be found in the home directory:

root@apalis-t30:~# ls CUDA-5.0_samples/
BlackScholes               data                       result.dat
FDTD3d                     dct8x8                     scalarProd
FlowCPU.flo                deviceQuery                scan
FlowGPU.flo                deviceQueryDrv             segmentationTreeThrust
FunctionPointers           dwtHaar1D                  shfl_scan
HSOpticalFlow              dxtc                       simpleAssert
MC_EstimatePiInlineP       eigenvalues                simpleAtomicIntrinsics
MC_EstimatePiInlineQ       fastWalshTransform         simpleCUBLAS
MC_EstimatePiP             fluidsGL                   simpleCUFFT
MC_EstimatePiQ             histogram                  simpleCallback
MC_SingleAsianOptionP      imageDenoising             simpleCubemapTexture
Mandelbrot                 inlinePTX                  simpleDevLibCUBLAS
MersenneTwisterGP11213     interval                   simpleGL
MonteCarloMultiGPU         level_00.ppm               simpleHyperQ
SobelFilter                level_01.ppm               simpleIPC
SobolQRNG                  level_02.ppm               simpleLayeredTexture
alignedTypes               level_03.ppm               simpleMultiCopy
asyncAPI                   level_04.ppm               simpleMultiGPU
bandwidthTest              level_05.ppm               simpleP2P
barbara_cuda1.bmp          level_06.ppm               simplePitchLinearTexture
barbara_cuda2.bmp          level_07.ppm               simplePrintf
barbara_cuda_short.bmp     level_08.ppm               simpleSeparateCompilation
barbara_gold1.bmp          level_09.ppm               simpleStreams
barbara_gold2.bmp          lineOfSight                simpleSurfaceWrite
batchCUBLAS                marchingCubes              simpleTemplates
bicubicTexture             matrixMul                  simpleTexture
bilateralFilter            matrixMulCUBLAS            simpleTexture3D
bindlessTexture            matrixMulDrv               simpleTextureDrv
binomialOptions            matrixMulDynlinkJIT        simpleVoteIntrinsics
boxFilter                  mergeSort                  simpleZeroCopy
cdpAdvancedQuicksort       nbody                      smokeParticles
cdpLUDecomposition         newdelete                  sortingNetworks
cdpQuadtree                oceanFFT                   stereoDisparity
cdpSimplePrint             output.pgm                 template
cdpSimpleQuicksort         output_CPU.pgm             template_runtime
clock                      output_GPU.pgm             threadFenceReduction
concurrentKernels          particles                  threadMigration
conjugateGradient          postProcessGL              transpose
conjugateGradientPrecond   ptxjit                     vectorAdd
convolutionFFT2D           quasirandomGenerator       vectorAddDrv
convolutionSeparable       radixSortThrust            volumeFiltering
convolutionTexture         randomFog                  volumeRender
cppIntegration             recursiveGaussian
cudaOpenMP                 reduction
root@apalis-t30:~# xrandr --output DP-0 --mode 1920x1080 --right-of DP-1
root@apalis-t30:~# cd CUDA-5.0_samples/
root@apalis-t30:~/CUDA-5.0_samples# ./FunctionPointers
./FunctionPointers Starting...

Reading image: lena.pgm
I: display Image (no filtering)
T: display Sobel Edge Detection (Using Texture)
S: display Sobel Edge Detection (Using SMEM+Texture)
Use the '-' and '=' keys to change the brightness.
b: switch block filter operation (Mean/Sobel)
p: switch point filter operation (Threshold ON/OFF)

root@apalis-t30:~/CUDA-5.0_samples# ./Mandelbrot
[CUDA Mandelbrot/Julia Set] - Starting...
> Device 0: <         NVS 310 >, Compute SM 2.1 detected
GPU Device 0: "NVS 310" with compute capability 2.1

Data initialization done.
Initializing GLUT...
Loading extensions: No error
OpenGL window created.
Starting GLUT main loop...

Press [s] to toggle between GPU and CPU implementations
Press [j] to toggle between Julia and Mandelbrot sets
Press [r] or [R] to decrease or increase red color channel
Press [g] or [G] to decrease or increase green color channel
Press [b] or [B] to decrease or increase blue color channel
Press [e] to reset
Press [a] or [A] to animate colors
Press [c] or [C] to change colors
Press [d] or [D] to increase or decrease the detail
Press [p] to record main parameters to file params.txt
Press [o] to read main parameters from file params.txt
Left mouse button + drag = move (Mandelbrot or Julia) or animate (Julia)
Press [m] to toggle between move and animate (Julia) for left mouse button
Middle mouse button + drag = Zoom
Right mouse button = Menu
Press [?] to print location and scale
Press [q] to exit

Creating GL texture...
Texture created.
Creating PBO...
PBO created.

root@apalis-t30:~/CUDA-5.0_samples# ./SobelFilter
CUDA Sobel Edge-Detection Starting...

Reading image: lena.pgm
I: display Image (no filtering)
T: display Sobel Edge Detection (Using Texture)
S: display Sobel Edge Detection (Using SMEM+Texture)
Use the '-' and '=' keys to change the brightness.

root@apalis-t30:~/CUDA-5.0_samples# ./bicubicTexture
Starting bicubicTexture
[CUDA BicubicTexture] (OpenGL Mode)
CUDA device [NVS 310] has 1 Multi-Processors
Loaded 'lena_bw.pgm', 512 x 512 pixels

        Controls
        =/- : Zoom in/out
        b   : Run Benchmark g_FilterMode
        c   : Draw Bicubic Spline Curve
        [esc] - Quit

        Press number keys to change filtering g_FilterMode:

        1 : nearest filtering
        2 : bilinear filtering
        3 : bicubic filtering
        4 : fast bicubic filtering
        5 : Catmull-Rom filtering


root@apalis-t30:~/CUDA-5.0_samples# ./bilateralFilter
./bilateralFilter Starting...

Loading ./data/nature_monte.bmp...
BMP width: 640
BMP height: 480
BMP file loaded successfully!
Loaded './data/nature_monte.bmp', 640 x 480 pixels


Found 1 CUDA Capable device(s) supporting CUDA

Device 0: "NVS 310"
  CUDA Runtime Version     :    5.0
  CUDA Compute Capability  :    2.1

Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Using device 0: NVS 310
Running Standard Demonstration with GLUT loop...

Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'e' and 'E' to change Euclidean delta
Press 'g' and 'G' to changle Gaussian delta
Press 'a' or  'A' to change Animation mode ON/OFF


root@apalis-t30:~/CUDA-5.0_samples# ./boxFilter
./boxFilter Starting...

Loaded './data/lenaRGB.ppm', 1024 x 1024 pixels

Found 1 CUDA Capable device(s) supporting CUDA

Device 0: "NVS 310"
  CUDA Runtime Version     :    5.0
  CUDA Compute Capability  :    2.1

Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Running Standard Demonstration with GLUT loop...

Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'a' or  'A' to change animation ON/OFF


root@apalis-t30:~/CUDA-5.0_samples# ./imageDenoising
CUDA ImageDenoising Starting...

[CUDA ImageDenoising]
Allocating host and CUDA memory and loading image file...
Loading ./data/portrait_noise.bmp...
BMP width: 320
BMP height: 408
BMP file loaded successfully!
Data init done.
Initializing GLUT...
OpenGL window created.
Loading extensions: No error
Creating GL texture...
Texture created.
Creating PBO...
PBO created.
Starting GLUT main loop...
Press [1] to view noisy image
Press [2] to view image restored with knn filter
Press [3] to view image restored with nlm filter
Press [4] to view image restored with modified nlm filter
Press [ ] to view smooth/edgy areas [RED/BLUE] Ct's
Press [f] to print frame rate
Press [?] to print Noise and Lerp Ct's
Press [q] to exit

root@apalis-t30:~/CUDA-5.0_samples# ./nbody
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure perfomance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 2.1 CUDA device: [NVS 310]

root@apalis-t30:~/CUDA-5.0_samples# ./oceanFFT
[CUDA FFT Ocean Simulation]

Left mouse button          - rotate
Middle mouse button        - pan
Right mouse button         - zoom
'w' key                    - toggle wireframe
[CUDA FFT Ocean Simulation]

root@apalis-t30:~/CUDA-5.0_samples# ./particles
CUDA Particles Simulation Starting...

grid: 64 x 64 x 64 = 262144 cells
particles: 16384

root@apalis-t30:~/CUDA-5.0_samples# ./postProcessGL
./postProcessGL Starting...

(Interactive OpenGL Demo)
   OpenGL device is Available
Creating a Texture render target GL_RGBA16F_ARB
Shader compilation error: Fragment info
-------------
0(4) : warning C7533: global variable gl_Color is deprecated after version 120


        Controls
        (right click mouse button for Menu)
        [ ] : Toggle CUDA Post Processing (on/off)
        [a] : Toggle Animation (on/off)
        [=] : Increase Blur Radius
        [-] : Decrease Blur Radius
        [esc] - Quit


root@apalis-t30:~/CUDA-5.0_samples# ./randomFog
Random Fog
==========

CURAND initialized

Random number visualization

On creation, randomFog generates 200,000 random coordinates in spherical coordinate space (radius, angle rho, angle theta) with curand's XORWOW algorithm. The coordinates are normalized for a uniform distribution through the sphere.

The X axis is drawn with blue in the negative direction and yellow positive.
The Y axis is drawn with green in the negative direction and magenta positive.
The Z axis is drawn with red in the negative direction and cyan positive.

The following keys can be used to control the output:

        s         Generate a new set of random numbers and display as spherical coordinates (Sphere)
        e         Generate a new set of random numbers and display on a spherical surface (shEll)
        b         Generate a new set of random numbers and display as cartesian coordinates (cuBe/Box)
        p         Generate a new set of random numbers and display on a cartesian plane (Plane)

        i,l,j     Rotate the negative Z-axis up, right, down and left respectively
        a         Toggle auto-rotation
        t         Toggle 10x zoom
        z         Toggle axes display

        x         Select XORWOW generator (default)
        c         Select Sobol' generator
        v         Select scrambled Sobol' generator
        r         Reset XORWOW (i.e. reset to initial seed) and regenerate
        ]         Increment the number of Sobol' dimensions and regenerate
        [         Reset the number of Sobol' dimensions to 1 and regenerate

        +         Increment the number of displayed points by 8,000 (up to maximum 200,000)
        -         Decrement the number of displayed points by 8,000 (down to minimum 8,000)

        q/[ESC]   Quit the application.


root@apalis-t30:~/CUDA-5.0_samples# ./recursiveGaussian
CUDA Recursive Gaussian Starting...

Loaded './data/lena.ppm', 512 x 512 pixels
Press '+' and '-' to change filter width
0, 1, 2 - change filter order

root@apalis-t30:~/CUDA-5.0_samples# ./simpleGL
simpleGL (VBO) starting...


root@apalis-t30:~/CUDA-5.0_samples# ./simpleTexture3D
simpleTexture3D Starting...

Read './data/Bucky.raw', 32768 bytes
Press space to toggle animation
Press '+' and '-' to change displayed slice

root@apalis-t30:~/CUDA-5.0_samples# ./smokeParticles
CUDA Smoke Particles Starting...

Loaded './data/floortile.ppm', 256 x 256 pixels

root@apalis-t30:~/CUDA-5.0_samples# ./volumeFiltering
CUDA 3D Volume Filtering Starting...

Found 1 CUDA Capable Device(s).

Device 0: "NVS 310"
  CUDA Runtime Version     :    5.0
  CUDA Compute Capability  :    2.1

Found CUDA Capable Device 0: "NVS 310"
Setting active device to 0
Read './data/Bucky.raw', 32768 bytes
Press
  'SPACE'     to toggle animation
  'p'         to toggle pre-integrated transfer function
  '+' and '-' to change density (0.01 increments)
  ']' and '[' to change brightness
  ';' and ''' to modify transfer function offset
  '.' and ',' to modify transfer function scale


root@apalis-t30:~/CUDA-5.0_samples# ./volumeRender
CUDA 3D Volume Render Starting...

Read './data/Bucky.raw', 32768 bytes
Press '+' and '-' to change density (0.01 increments)
      ']' and '[' to change brightness
      ';' and ''' to modify transfer function offset
      '.' and ',' to modify transfer function scale
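
The xrandr call at the start of the session above places the DP-0 output to the right of DP-1. Output names can differ between setups, so it may help to list the connected outputs first. A minimal sketch, assuming the usual `xrandr --query` format in which each output starts a line such as "DP-0 connected ..."; the helper name connected_outputs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `xrandr --query` output on stdin and print
# the name of every output reported as connected, one per line.
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Usage on the target:
#   xrandr --query | connected_outputs
```

The names printed this way are the ones to pass to --output and --right-of.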

Graphics Card Information

The deviceQuery sample reports the properties of the graphics card used:

root@apalis-t30:~/CUDA-5.0_samples# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVS 310"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 512 MBytes (536543232 bytes)
  ( 1) Multiprocessors x ( 48) CUDA Cores/MP:    48 CUDA Cores
  GPU Clock rate:                                1046 MHz (1.05 GHz)
  Memory Clock rate:                             875 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 65536 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       No
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = NVS 310
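
For scripted checks, individual values can be pulled out of the deviceQuery output. The sketch below assumes the line layout shown above and is not part of the CUDA samples; the helper names cuda_capability and cuda_core_count are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: each reads deviceQuery output on stdin.

# Print the compute capability (major.minor).
cuda_capability() {
    sed -n 's/^ *CUDA Capability Major\/Minor version number: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print the total core count, computed as
# multiprocessors x cores per multiprocessor.
cuda_core_count() {
    sed -n 's/^ *( *\([0-9][0-9]*\)) Multiprocessors x ( *\([0-9][0-9]*\)) CUDA Cores\/MP:.*/\1 \2/p' |
        while read -r mps cores_per_mp; do
            echo $((mps * cores_per_mp))
        done
}

# Usage on the target:
#   ./deviceQuery | cuda_capability
#   ./deviceQuery | cuda_core_count
```

For the NVS 310 output above, these yield 2.1 and 48 (1 multiprocessor x 48 cores).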

Information about Graphics Driver

The modinfo output below describes the NVIDIA kernel module used by the graphics driver:

root@apalis-t30:~# modinfo nvidia
filename:       /lib/modules/3.1.10-carma/kernel/drivers/video/nvidia.ko
alias:          char-major-195-*
version:        313.24
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:
vermagic:       3.1.10-carma SMP preempt mod_unload ARMv7
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
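
To check a single field, such as the driver version, the modinfo output can be filtered. This is a minimal sketch assuming the "field: value" layout shown above; the helper name modinfo_field is hypothetical. Many modinfo implementations also accept -F with a field name for the same purpose, but whether the modinfo on this image does has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: read `modinfo` output on stdin and print the value
# of the named field (every matching line, if the field repeats).
modinfo_field() {
    field="$1"
    sed -n "s/^${field}: *//p"
}

# Usage on the target:
#   modinfo nvidia | modinfo_field version
#   modinfo nvidia | modinfo_field vermagic
```

With the output above, the version field is 313.24 and vermagic shows the 3.1.10-carma kernel the module was built for.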

  • Apalis T30 CUDA Samples Desktop
