4. World Modeling & State Estimation

The Internal Universe: Models and Beliefs

In the last chapter, we saw how a robot gathers snapshots of the world through its sensors. But how does it connect these snapshots over time to build a persistent, internal understanding of the world? And how does it track its own position within that world? This is the domain of world modeling and state estimation.

The core challenge is that a robot can never know its true state (a collection of variables like position, orientation, and velocity) with perfect certainty. All sensor measurements are noisy, and all actions are imprecise. Therefore, the robot must estimate its state.

This estimate is not a single value, but a belief, which is a probability distribution over all possible states. The goal of state estimation is to keep this belief as sharp and accurate as possible, meaning the robot is highly confident about its state.

The Grand Challenge: SLAM

For a robot operating in a new environment, this leads to a classic chicken-and-egg problem:

To know where you are (Localization), you need an accurate map.
To build an accurate map (Mapping), you need to know where you are.

Simultaneous Localization and Mapping (SLAM) is the set of techniques that solves both problems at the same time. As the robot moves, it builds a map of the environment while simultaneously using that map to figure out its location.

The core SLAM loop is a powerful cycle of prediction and correction.

Move & Predict: The robot moves, and based on its motor commands (proprioception), it predicts its new state. Its uncertainty grows during this step.
Sense & Match: The robot observes the environment with its sensors (e.g., LiDAR or cameras) and finds landmarks that it has seen before.
Update: By matching the newly observed landmarks to the ones in its map, the robot can correct its predicted state, reducing its uncertainty. It then uses this corrected state to refine and extend the map.

A crucial part of this process is loop closure. When a robot recognizes a place it has been before after a long journey, it can drastically reduce the accumulated error in its trajectory, snapping the map into a much more consistent state.

How to Represent the World?

A "map" can take many forms, depending on the robot's task.

Occupancy Grids

This is one of the most common representations for navigation. The world is divided into a 2D or 3D grid. Each cell in the grid stores a probability value representing the belief that the cell is occupied by an obstacle. This is simple, effective, and excellent for finding clear paths.

Point Clouds

A point cloud is a direct, geometric representation of the world, typically generated by a LiDAR or a depth camera. It's a massive collection of 3D points. While very detailed, point clouds are computationally intensive to process and store. They are often used as an intermediate representation to build other map types.

Semantic Maps

A semantic map is a higher-level, more "human-like" representation of the world. Instead of just storing geometry, it stores objects and their properties (their "semantics"). The map might contain not just a cluster of points, but an entity labeled chair with a specific pose and size. This allows the robot to reason about its environment in a much more intelligent way (e.g., "Find me a chair to sit on").

Code Example: Intuition Behind State Estimation

Let's illustrate the core idea of state estimation (prediction and update) with a simple 1D example. Imagine a robot moving along a line. It thinks it's at position 10.0, but its sensors are noisy.

Python

import numpy as np

# Our belief about the robot's state (position and velocity)
# We believe it's at position 10.0, stopped, but we are uncertain.
# The covariance matrix expresses this uncertainty.
estimated_pos = 10.0
estimated_pos_uncertainty = 2.0  # High uncertainty

def predict(pos, uncertainty, movement_dist):
    """
    Predicts the next state based on movement.
    Uncertainty increases because movement itself is not perfect.
    """
    predicted_pos = pos + movement_dist
    predicted_uncertainty = uncertainty + 0.5 # Add motion uncertainty
    print(f"PREDICT: Moved by {movement_dist}. Predicted position: {predicted_pos:.2f}, Uncertainty: {predicted_uncertainty:.2f}")
    return predicted_pos, predicted_uncertainty

def update(predicted_pos, predicted_uncertainty, measurement, measurement_uncertainty):
    """
    Updates the belief by fusing the prediction with a new sensor measurement.
    This is a simplified version of the Kalman Gain calculation.
    """
    # Calculate a weighted average. Trust the value with less uncertainty more.
    kalman_gain = predicted_uncertainty / (predicted_uncertainty + measurement_uncertainty)
    
    new_pos = predicted_pos + kalman_gain * (measurement - predicted_pos)
    new_uncertainty = (1 - kalman_gain) * predicted_uncertainty

    print(f"UPDATE:  Got measurement {measurement:.2f}. Fusing prediction with measurement.")
    print(f"         New Position: {new_pos:.2f}, New Uncertainty: {new_uncertainty:.2f}\n")
    return new_pos, new_uncertainty

# --- Simulation Loop ---
# 1. Initial State
print(f"START:   Initial Position: {estimated_pos:.2f}, Uncertainty: {estimated_pos_uncertainty:.2f}\n")

# 2. First Movement & Update
estimated_pos, estimated_pos_uncertainty = predict(estimated_pos, estimated_pos_uncertainty, 5.0)
# We get a sensor measurement that says we are at 16.0, but sensors are noisy.
estimated_pos, estimated_pos_uncertainty = update(estimated_pos, estimated_pos_uncertainty, 16.0, 1.0)

# 3. Second Movement & Update
estimated_pos, estimated_pos_uncertainty = predict(estimated_pos, estimated_pos_uncertainty, 5.0)
# We get another measurement that says we are at 20.5.
estimated_pos, estimated_pos_uncertainty = update(estimated_pos, estimated_pos_uncertainty, 20.5, 1.0)

# The final estimated position is a statistically robust fusion of all our
# predictions and measurements, and our uncertainty has been reduced.

This simple example demonstrates the foundational logic of all state estimation. The robot continuously refines its belief by blending its own predictions with external measurements, allowing it to navigate the world with a confidence that no single sensor could provide.

The Internal Universe: Models and Beliefs​

The Grand Challenge: SLAM​

How to Represent the World?​

Occupancy Grids​

Point Clouds​

Semantic Maps​

Code Example: Intuition Behind State Estimation​