Description

Our autonomous ground vehicles need to localize themselves on construction sites. To achieve this, we are using AprilTags, as they provide stable features for localization and we can easily print them on stickers and apply them in the region of interest. The localization pipeline works as follows:

  1. Print AprilTags with unique IDs on stickers and place them on flat surfaces on the construction site
  2. Record a video of the construction site
  3. Process the video to create a “map” of the AprilTag locations in some reference frame
  4. Estimating the pose of a camera is then just a matter of detecting AprilTags and solving the Perspective-n-Point (PnP) problem (e.g. with OpenCV’s solvePnP; see the sketch after this list)
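
As a rough illustration of step 4 (not part of the assignment itself), the sketch below detects tags in a single grayscale frame and feeds the correspondences to solvePnP. It assumes the pupil_apriltags package and the tag36h11 family; tag_map, camera_matrix and dist_coeffs are placeholders for whatever your own step-3 output and calibration provide.

# Minimal sketch of step 4, assuming the pupil_apriltags package and the
# tag36h11 family; tag_map, camera_matrix and dist_coeffs are placeholders.
import cv2
import numpy as np
from pupil_apriltags import Detector

def estimate_camera_pose(gray, tag_map, camera_matrix, dist_coeffs):
    # tag_map: tag id -> 4x3 array of corner positions (mm) in the map frame,
    # in the same corner order as the detector returns them.
    detector = Detector(families="tag36h11")
    detections = detector.detect(gray)

    object_points, image_points = [], []
    for det in detections:
        if det.tag_id in tag_map:
            object_points.append(np.asarray(tag_map[det.tag_id], dtype=np.float64))
            image_points.append(np.asarray(det.corners, dtype=np.float64))

    if not object_points:
        return None  # no mapped tags visible in this frame

    ok, rvec, tvec = cv2.solvePnP(
        np.concatenate(object_points),
        np.concatenate(image_points),
        camera_matrix,
        dist_coeffs,
    )
    return (rvec, tvec) if ok else None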

The goal of this assignment is to implement step 3. You are provided with a video (recorded by a calibrated camera) of a “mock” construction site at our office. Your code should produce a JSON file containing the mapped tags in a reference frame of your choice. The format of the JSON file is as follows:

[
  {
    // ID of the tag
    "id": 0,
    
    // XYZ position of the 4 corners of the tag in the order
    // returned by the detector, in millimetres
    "corners": [
      [-21, 21, 0],
      [21, 21, 0],
      [21, -21, 0],
      [-21, -21, 0]
    ] 
  },
  ... // More Tags
]
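
Note that the // comments above are only explanatory; the delivered file must be plain JSON. Below is a minimal sketch of writing the map in this format, assuming (as an example) that your mapping step produces a dictionary from tag id to a 4x3 array of corner positions in millimetres.

# Minimal sketch of serializing the map; `mapped_tags` (tag id -> 4x3 corner
# array in mm, detector corner order) is a placeholder for your own result.
import json
import numpy as np

def write_map(mapped_tags, path="map.json"):
    entries = [
        {"id": int(tag_id), "corners": np.asarray(corners, dtype=float).round(3).tolist()}
        for tag_id, corners in sorted(mapped_tags.items())
    ]
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)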

Data

This .tar.gz file contains a video and the camera intrinsic calibration parameters (as used in OpenCV).

monumental_vision_take_home_mapping.tar.gz
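
The exact layout of the calibration file is not reproduced here. Assuming it is stored as an OpenCV FileStorage (YAML/JSON) file, it could be read roughly as below; the file name and node names are assumptions and must be adapted to the actual contents of the archive.

# Minimal sketch of reading the intrinsics, assuming an OpenCV FileStorage
# file; "intrinsics.yaml", "camera_matrix" and "dist_coeffs" are assumptions.
import cv2

fs = cv2.FileStorage("intrinsics.yaml", cv2.FILE_STORAGE_READ)
camera_matrix = fs.getNode("camera_matrix").mat()  # 3x3 camera matrix
dist_coeffs = fs.getNode("dist_coeffs").mat()      # distortion coefficients
fs.release()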

Outcome

We expect you to spend around 6-8 hours on the assignment. Prioritize building something that works end-to-end over optimizing a single component of your solution.

You are expected to deliver:

Additional information

Useful things to know:

The distance between tag 2 and tag 3 is 1090 mm, and the distance between tag 3 and tag 39 is 1940 mm, as sketched below:

2
^
|
| 1090mm
|
v      1940mm
3  <----------> 39
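
These measured distances can serve as a sanity check on the produced map. Below is a minimal sketch of such a check, assuming the quoted distances are between tag centres and that the map was written to map.json in the format above; both are assumptions you may need to adjust.

# Minimal sketch of checking the map against the measured distances above,
# assuming they are centre-to-centre; "map.json" is a placeholder path.
import json
import numpy as np

def tag_center(entry):
    return np.asarray(entry["corners"], dtype=float).mean(axis=0)

with open("map.json") as f:
    tags = {entry["id"]: entry for entry in json.load(f)}

d_2_3 = np.linalg.norm(tag_center(tags[2]) - tag_center(tags[3]))
d_3_39 = np.linalg.norm(tag_center(tags[3]) - tag_center(tags[39]))
print(f"tag 2 to tag 3:  {d_2_3:.0f} mm (measured: 1090 mm)")
print(f"tag 3 to tag 39: {d_3_39:.0f} mm (measured: 1940 mm)")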