Increased reliability on Intel GPUs via software diverse redundancy

Document type

Official Master's Final Project

Access conditions

Open access

License

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

In the past decade, Artificial Intelligence has revolutionized various industries, including the automotive, avionics, and healthcare sectors. The installation of Advanced Driver Assistance Systems (ADAS) is now a reality, with the goal of achieving fully self-driving cars (SDCs) in the near future. ADAS and Autonomous Driving (AD) systems must process vast amounts of data at high frequency with complex Deep Learning (DL) algorithms under tight real-time (RT) constraints. Traditional computing has become a bottleneck, because CPUs cannot process this data efficiently. High-performance GPUs have partially fulfilled these timing constraints, which has driven continuous innovation in device performance and efficiency. For example, NVIDIA introduced the Jetson AGX Xavier SoC in 2017, designed for machine learning applications in the automotive sector. However, AD and ADAS also face safety constraints, such as functional safety. Redundancy is necessary for identifying and correcting erroneous outcomes, and high safety levels call for diverse redundancy to avoid common cause faults (CCF). High-performance hardware for AD must be verified and validated (V&V) to ensure that safety goals are met, but these processes can be costly. The automotive industry seeks to avoid non-recurring costs by using commercial off-the-shelf (COTS) products. However, COTS devices have drawbacks, including limited redundancy and closely guarded implementation details. Researchers are developing software-only diverse redundancy solutions on top of COTS devices to overcome these limitations. The two main challenges are ensuring redundant computation for error detection and guaranteeing diverse redundancy, so that errors are detected even when they affect all replicas. Current solutions are limited and mostly focused on NVIDIA GPUs. This thesis presents a software-only solution for diverse redundancy on Intel GPUs, providing strong diversity guarantees for the first time.
Built on OpenCL, a hardware-agnostic programming framework, the technique relies on intrinsics, which are special functions optimized by integrators. The intrinsics make it possible to identify hardware threads on the GPU and to tailor workload geometry and allocation to specific computing elements. As a result, redundant threads run on physically diverse execution units, meeting diverse redundancy requirements with affordable performance overheads. Several scenarios are developed to measure the impact of each modification to a standard OpenCL kernel execution: first, allocating only half of the available GPU resources; then, overriding the scheduler to use half of the resources; next, duplicating the work to mimic the execution of two kernels; and finally, executing both kernels in independent parts of the GPU.
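
The idea of placing redundant threads on physically distinct execution units can be illustrated with a small host-side simulation. The following is only a minimal sketch in Python, not the thesis implementation and not OpenCL code: the number of execution units (`NUM_EUS`), the function names (`assign_eus`, `run_redundant`) and the fault model (a faulty unit flips the lowest bit of every result it produces) are all assumptions made for illustration. It shows why comparing two replicas detects a fault only when the replicas run on different units.

```python
from typing import Callable, FrozenSet, List, Tuple

NUM_EUS = 8  # hypothetical number of execution units, not taken from the thesis


def assign_eus(num_items: int, num_eus: int = NUM_EUS,
               diverse: bool = True) -> List[Tuple[int, int]]:
    """Return a (primary EU, replica EU) pair for each work item.

    With diverse=True the primary runs in the first half of the EUs and the
    replica in the second half; with diverse=False both share the same EU.
    """
    half = num_eus // 2
    pairs = []
    for i in range(num_items):
        primary = i % half
        replica = half + primary if diverse else primary
        pairs.append((primary, replica))
    return pairs


def run_redundant(kernel: Callable[[int], int], data: List[int],
                  faulty_eus: FrozenSet[int] = frozenset(),
                  num_eus: int = NUM_EUS, diverse: bool = True) -> List[int]:
    """Run every work item twice and return the indices whose results differ."""

    def execute(eu: int, x: int) -> int:
        result = kernel(x)
        # Assumed fault model: a faulty EU flips the lowest bit of its result.
        return result ^ 1 if eu in faulty_eus else result

    mismatches = []
    for i, (primary, replica) in enumerate(assign_eus(len(data), num_eus, diverse)):
        if execute(primary, data[i]) != execute(replica, data[i]):
            mismatches.append(i)
    return mismatches


def square(x: int) -> int:
    return x * x


data = list(range(16))
placement = assign_eus(len(data))
no_fault = run_redundant(square, data)
detected = run_redundant(square, data, faulty_eus=frozenset({2}))
undetected = run_redundant(square, data, faulty_eus=frozenset({2}), diverse=False)

print("mismatches without faults:", no_fault)
print("mismatches with diverse placement:", detected)
print("mismatches with shared placement:", undetected)
```

In this sketch a fault in unit 2 is flagged for every work item whose primary copy ran there, whereas with shared placement both copies are corrupted identically and the comparison reports nothing. That second case is the common cause fault scenario that diverse redundancy is meant to rule out.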

Degree

MASTER'S DEGREE IN INNOVATION AND RESEARCH IN INFORMATICS (2012 Plan)
