Radar Intro
Radars operate by emitting and receiving electromagnetic pulses, following principles similar to sound wave reflection.
- The transmitter generates high-power radio-frequency (RF) pulses
- The antenna radiates the pulses through the medium (air)
- When the pulses reach an object, part of the transmitted RF energy is reflected (echoes) off it
- A small portion of the reflected energy returns to the radar through the antenna and is directed to the receiver.
- Finally, the receiver passes the signal to the signal processor, which determines the direction, distance, and even speed of the detected object (range from the echo delay is sketched below)
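A minimal sketch of how range falls out of the round-trip time, assuming an ideal point target and free-space propagation; the 1 µs echo delay is made up for illustration:

```python
# Range from pulse round-trip time: R = c * t / 2
# (the pulse travels to the target and back, hence the factor of 2).
C = 299_792_458.0  # speed of light, m/s

def range_from_echo_delay(round_trip_time_s: float) -> float:
    """Distance to the target given the measured echo delay."""
    return C * round_trip_time_s / 2.0

# Example: an echo received 1 microsecond after transmission
print(range_from_echo_delay(1e-6))  # ~149.9 m
```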
Strengths:
- Long-range: hundreds of meters
- Robustness to weather and lighting conditions
- Can determine the position of obstacles invisible to the naked eye - or even to other sensors like cameras - due to distance, darkness, or weather
- Velocity measurement: the Doppler effect can be used to measure the relative velocity of objects (see the sketch after this list)
- Much cheaper than lidar
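A minimal sketch of the Doppler velocity relation, assuming a monostatic radar and a target moving along the line of sight; the 77 GHz carrier is a typical automotive value used only for illustration:

```python
# Doppler shift for a monostatic radar: f_d = 2 * v * f_c / c,
# so the radial (line-of-sight) velocity is v = f_d * c / (2 * f_c).
C = 299_792_458.0  # speed of light, m/s

def radial_velocity(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Relative line-of-sight velocity from the measured Doppler shift."""
    return doppler_shift_hz * C / (2.0 * carrier_hz)

# Example: 5 kHz Doppler shift on a 77 GHz automotive radar
print(radial_velocity(5e3, 77e9))  # ~9.7 m/s closing speed
```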
Weaknesses:
- Low resolution; hard to distinguish between closely-spaced objects and small objects
- Hard to determine shape of detected objects
- Radar signals can reflect off multiple surfaces before returning to the sensor, leading to multi-path interference
Camera-Radar Fusion
Why fuse?
- Radar cannot delineate object shapes, but it is robust to adverse weather and lighting conditions.
- Radar provides data as amplitudes, ranges, and Doppler spectra
- Cameras provide rich semantic data
How to fuse?
- Point-wise addition (or averaging) of feature maps
- Concatenation of feature maps along the channel dimension
- Ensembles and mixture-of-experts (MoE) combining modality-specific models (the first two operators are sketched after this list)
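A minimal sketch of the two feature-level operators in PyTorch; the tensor shapes and the 1x1 convolution after concatenation are illustrative assumptions, not a specific published architecture:

```python
import torch
import torch.nn as nn

# Assume both branches produce feature maps of shape (N, C, H, W)
cam_feat = torch.randn(1, 64, 32, 32)  # camera branch features
rad_feat = torch.randn(1, 64, 32, 32)  # radar branch features

# Point-wise addition (or average): shapes must match exactly
fused_add = cam_feat + rad_feat
fused_avg = (cam_feat + rad_feat) / 2.0

# Concatenation along the channel dimension doubles the channels,
# so a 1x1 convolution is often used to project back to C channels
fused_cat = torch.cat([cam_feat, rad_feat], dim=1)  # (1, 128, 32, 32)
proj = nn.Conv2d(128, 64, kernel_size=1)
fused = proj(fused_cat)                             # (1, 64, 32, 32)
```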
When to fuse?
- Neural networks represent and process features hierarchically across their successive layers.
- Initial layers process fine-grained, high-resolution representations of the input, thus retaining more detailed spatial information (but only low-level features).
- Move further in architecture → feature maps lose spatial detail to gain semantic information
- In last layers, feature maps completely encapsulate semantics, but are limited in terms of spatial information.
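A minimal sketch of this resolution-vs-depth trade-off, using a made-up stack of strided convolutions just to show how the spatial grid shrinks while the channel count (capacity for semantics) grows:

```python
import torch
import torch.nn as nn

# Toy backbone: each strided conv halves H and W and widens channels
stages = nn.ModuleList([
    nn.Conv2d(3,   32, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(32,  64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
])

x = torch.randn(1, 3, 256, 256)  # input image
for i, stage in enumerate(stages):
    x = stage(x)
    # Spatial detail drops (256 -> 128 -> 64 -> 32) as depth grows
    print(f"stage {i}: {tuple(x.shape)}")
# stage 0: (1, 32, 128, 128)
# stage 1: (1, 64, 64, 64)
# stage 2: (1, 128, 32, 32)
```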
Early Fusion
- Fuse input data or fuse features from the initial layers of a network
- Full exploitation of the raw data
- Low computation cost (a single network jointly processes the fused sensing modalities, so two separate networks are not needed)
- Sensitive to spatio-temporal misalignment between sensors caused by calibration errors (a minimal input-level sketch follows)
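A minimal input-level sketch, assuming the radar returns have already been projected into the image plane as a single-channel range/occupancy map; the shapes and the small conv stem are illustrative:

```python
import torch
import torch.nn as nn

# Camera image (3 channels) and radar projected to the image plane (1 channel)
image = torch.randn(1, 3, 256, 256)
radar_map = torch.randn(1, 1, 256, 256)  # e.g. per-pixel range/occupancy

# Early fusion: stack the modalities at the input, then run ONE network
x = torch.cat([image, radar_map], dim=1)  # (1, 4, 256, 256)

stem = nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1)
features = stem(x)  # the single network sees both modalities from layer 1
print(tuple(features.shape))  # (1, 32, 128, 128)
```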
Middle Fusion
- Feature-level fusion
- Fuse features from intermediate layers of the network
- Variants: one-layer fusion, deep fusion, short-cut fusion
- Good balance between preserving spatial information and taking advantage of learned features
- Drawback: hard to find the optimal fusion scheme for each particular network architecture (a one-layer example is sketched below)
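A minimal sketch of one-layer middle fusion, assuming each modality has its own shallow encoder and the intermediate feature maps are spatially aligned; all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class MiddleFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate shallow encoders extract modality-specific features
        self.cam_enc = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.rad_enc = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)
        # Joint head processes the fused intermediate features
        self.head = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)

    def forward(self, image, radar_map):
        f_cam = self.cam_enc(image)      # (N, 32, H/2, W/2)
        f_rad = self.rad_enc(radar_map)  # (N, 32, H/2, W/2)
        fused = torch.cat([f_cam, f_rad], dim=1)  # fuse mid-network
        return self.head(fused)

net = MiddleFusionNet()
out = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(tuple(out.shape))  # (1, 64, 64, 64)
```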
Late Fusion
- Decision-level fusion
- Occurs in a late step of network processing, close to the output
- Combines the outputs of domain-specific networks (experts) for the different sensing modalities.
- Main advantage lies in model flexibility: when a new sensing modality is introduced, only its expert network must be retrained.
- Main drawbacks are the high computation and memory costs (one full network per modality), as well as the discarding of possibly important features from intermediate layers (see the sketch below).
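A minimal decision-level sketch, assuming each expert outputs class logits over the same label set and that simple averaging is the combination rule; the experts here are placeholder linear models standing in for full per-modality networks:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 4

# Placeholder domain-specific experts; in practice these are full
# detection/classification networks trained per modality
camera_expert = nn.Linear(512, NUM_CLASSES)  # takes camera features
radar_expert = nn.Linear(128, NUM_CLASSES)   # takes radar features

cam_in = torch.randn(1, 512)
rad_in = torch.randn(1, 128)

# Each expert runs independently end to end...
cam_logits = camera_expert(cam_in)
rad_logits = radar_expert(rad_in)

# ...and only the final decisions are combined (here: averaged logits)
fused_logits = (cam_logits + rad_logits) / 2.0
prediction = fused_logits.argmax(dim=1)
print(prediction)
```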