Skip to content

Latest commit

 

History

History
288 lines (231 loc) · 8.18 KB

File metadata and controls

288 lines (231 loc) · 8.18 KB

Ingenic ISP Fatal Exception Interrupt Debug Guide

Overview

This guide provides a systematic approach to debugging "fatal exception in interrupt" crashes on Ingenic 3.10 kernel systems, specifically for the TX-ISP VIC (Video Input Controller) driver.

Root Cause Analysis

Primary Causes of Fatal Exception in Interrupt

  1. NULL pointer dereference in hard-IRQ context
  2. Unaligned memory access on MIPS architecture
  3. Illegal operations in hard-IRQ context (sleep/mutex/I²C)
  4. IRQ storm from improper interrupt acknowledgment
  5. Dangerous offset-based pointer arithmetic from decompiled code

The Specific Crash We Fixed

The crash trace showed:

Code: 24130002  1080000c  26310004 <8c8200c4> 10400009  00000000  8c420020  10400006  00003025

The key instruction <8c8200c4> is a MIPS lw (load word) instruction trying to access offset 0xc4 from a base pointer. This was happening because the decompiled code was using dangerous offset arithmetic like:

// DANGEROUS - from decompiled code
void* v0_6 = **($a0_1 + 0xc4);  // Accessing offset 0xc4

Debugging Methodology

0) Make the panic observable

# Kernel cmdline
console=ttyS0,115200 loglevel=8 ignore_loglevel earlyprintk=serial,ttyS0,115200

# While debugging
sysctl -w kernel.panic_on_oops=0  # Don't auto-reboot so you see the trace

# If you have vmlinux
addr2line -e vmlinux 0x<oops-ip>  # Find exact source line

1) Identify the crash location

  • Look for the faulting instruction in the crash trace
  • Instructions like <8c8200c4> indicate load from offset 0xc4
  • This suggests dangerous offset-based pointer arithmetic

2) Find the dangerous code patterns

Search for these patterns in interrupt handlers:

// DANGEROUS PATTERNS TO FIX:
*(ptr + 0xc4)           // Direct offset access
*((void**)((char*)ptr + 0xc4))  // Cast and offset
arg1[0x35]              // Array-style offset access

3) Replace with safe struct member access

// BEFORE (dangerous):
void* v0_6 = *((void**)((char*)a0_1 + 0xc4));

// AFTER (safe):
struct tx_isp_vic_device *vic_dev = container_of(isp_dev->vic_dev, struct tx_isp_vic_device, sd);

Fixes Implemented

1. VIC Interrupt Handler (tx_isp_vic.c)

Problem: Windows-specific __try/__except syntax and unsafe register access

Fix:

  • Removed Windows-specific exception handling
  • Added comprehensive pointer validation
  • Used safe register access with bounds checking
/* CRASH-SAFE VIC interrupt handler */
static irqreturn_t isp_vic_interrupt_service_routine(int irq, void *dev_id)
{
    /* BULLETPROOF: Validate EVERYTHING before touching anything */
    if (!sd) {
        printk(KERN_ERR "VIC_IRQ: NULL subdev\n");
        return IRQ_HANDLED;
    }
    
    if ((unsigned long)sd >= 0xfffff001) {
        printk(KERN_ERR "VIC_IRQ: Invalid subdev pointer 0x%lx\n", (unsigned long)sd);
        return IRQ_HANDLED;
    }
    
    // ... safe register access with validation
}

2. ISP Core Interrupt Handler (tx_isp_core.c)

Problem: Dangerous offset arithmetic accessing 0xc4 causing the crash

Fix: Replaced all offset arithmetic with safe struct member access

/* BEFORE (dangerous - caused the crash): */
irqreturn_t isp_irq_handle(int irq, void *dev_id)
{
    // Binary Ninja: void* $v0_6 = **($a0_1 + 0xc4)
    v0_6 = *((void**)((char*)a0_1 + 0xc4));  // <-- THIS CAUSED THE CRASH
}

/* AFTER (safe): */
irqreturn_t isp_irq_handle(int irq, void *dev_id)
{
    /* SAFE: Check if we have a VIC device using proper struct member access */
    if (isp_dev->vic_dev) {
        /* SAFE: Get VIC device using proper container_of */
        struct tx_isp_vic_device *vic_dev = container_of(isp_dev->vic_dev, struct tx_isp_vic_device, sd);
        
        /* SAFE: Call the VIC interrupt handler directly */
        irqreturn_t vic_result = isp_vic_interrupt_service_routine(irq, isp_dev->vic_dev);
    }
}

3. Comprehensive Pointer Validation

Added bulletproof validation in all interrupt handlers:

/* Validate EVERYTHING before touching anything */
if (!dev_id) {
    printk(KERN_ERR "ISP_IRQ: NULL dev_id\n");
    return IRQ_HANDLED;
}

if ((unsigned long)dev_id >= 0xfffff001) {
    printk(KERN_ERR "ISP_IRQ: Invalid dev_id pointer 0x%lx\n", (unsigned long)dev_id);
    return IRQ_HANDLED;
}

4. Safe Register Access Pattern

/* SAFE: Handle any core ISP interrupts */
if (isp_dev->core_regs) {
    /* Read and clear any core ISP interrupt status */
    u32 core_status = readl(isp_dev->core_regs + 0x800);
    if (core_status & 0x1) {
        /* Clear core interrupt */
        writel(0x1, isp_dev->core_regs + 0x800);
        wmb();
    }
}

Key Principles for Safe Interrupt Handlers

1. Always Validate Pointers

if (!ptr || (unsigned long)ptr >= 0xfffff001) {
    printk(KERN_ERR "Invalid pointer\n");
    return IRQ_HANDLED;
}

2. Use Proper Struct Member Access

// GOOD: Use struct members
vic_dev->vic_regs

// BAD: Use offset arithmetic  
*((void**)((char*)vic_dev + 0xb8))

3. Use container_of for Safe Casting

// GOOD: Safe container_of
struct tx_isp_vic_device *vic_dev = container_of(subdev, struct tx_isp_vic_device, sd);

// BAD: Dangerous casting with offsets
struct tx_isp_vic_device *vic_dev = (struct tx_isp_vic_device *)((char*)subdev - 0x100);

4. Clear Interrupts FIRST

/* CRITICAL: Clear interrupts FIRST using write-1-to-clear pattern */
if (isr_main || isr_mdma) {
    writel(isr_main, vic_base + 0x1e0);   /* Clear main interrupts */
    writel(isr_mdma, vic_base + 0x1e4);   /* Clear MDMA interrupts */
    wmb();
}

5. Minimal Processing in Hard-IRQ Context

/* MINIMAL processing to avoid crashes */
if (isr_main & 1) {
    /* Frame done - just increment counter */
    vic_dev->frame_count++;
    printk(KERN_INFO "VIC_IRQ: Frame done #%d\n", vic_dev->frame_count);
}

Testing and Verification

1. Enable Debug Logging

#define pr_debug printk  // Force debug messages to appear

2. Monitor Interrupt Flow

# Watch interrupt counts
watch -n 1 'cat /proc/interrupts | grep -E "(37|38)"'

# Monitor kernel messages
dmesg -w | grep -E "(VIC_IRQ|ISP_IRQ)"

3. Stress Test

# Generate continuous interrupts
while true; do
    echo 1 > /sys/class/video4linux/video0/streaming
    sleep 1
    echo 0 > /sys/class/video4linux/video0/streaming
    sleep 1
done

Common Pitfalls to Avoid

1. Don't Use Decompiled Code Directly

  • Decompiled code uses dangerous offset arithmetic
  • Always convert to proper struct member access
  • Validate all pointer operations

2. Don't Sleep in Interrupt Context

// BAD: These will cause "scheduling while atomic"
msleep(10);
mutex_lock(&lock);
i2c_transfer();

// GOOD: Use atomic operations
atomic_inc(&counter);
spin_lock_irqsave(&lock, flags);

3. Don't Ignore Interrupt Clearing

// BAD: Process first, clear later
process_interrupt();
clear_interrupt();

// GOOD: Clear first, process later
clear_interrupt();
process_interrupt();

4. Don't Use Complex Logic in Hard-IRQ

// BAD: Complex processing in hard-IRQ
if (interrupt_status) {
    complex_buffer_management();
    update_multiple_data_structures();
    call_multiple_functions();
}

// GOOD: Minimal processing, defer the rest
if (interrupt_status) {
    atomic_inc(&frame_count);
    schedule_work(&deferred_work);
}

Summary

The "fatal exception in interrupt" crash was caused by dangerous offset-based pointer arithmetic from decompiled code trying to access offset 0xc4. The fix involved:

  1. Replacing all dangerous offset arithmetic with safe struct member access
  2. Adding comprehensive pointer validation in interrupt handlers
  3. Using proper Linux kernel patterns (container_of, etc.)
  4. Implementing bulletproof interrupt clearing sequences

The key lesson: Never use decompiled code directly in interrupt handlers. Always convert dangerous offset arithmetic to safe struct member access and add comprehensive validation.

Files Modified

  • driver/tx_isp_vic.c - Fixed VIC interrupt handler
  • driver/tx_isp_core.c - Fixed ISP core interrupt handler with safe struct access

The system should now be crash-safe and handle interrupts properly without the "fatal exception in interrupt" panics.