Description

Our autonomous ground vehicles need to localize themselves on construction sites. To achieve this, we are using AprilTags, as they provide stable features for localization and we can easily print them on stickers and apply them in the region of interest. The localization pipeline works as follows:

  1. Print AprilTags with unique IDs on stickers and place them on flat surfaces on the construction site
  2. Record a video of the construction site
  3. Process the video to create a “map” of the AprilTag locations in some reference frame
  4. Estimating the pose of a camera is then just a matter of detecting AprilTags and solving the Perspective-n-Point (PnP) problem (e.g. with OpenCV’s solvePnP; see the sketch after this list)
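
As a rough illustration of step 4 (not part of the assignment itself), the sketch below detects tags in a single grayscale frame and feeds the correspondences to solvePnP. It assumes the pupil_apriltags package and the tag36h11 family; tag_map, camera_matrix and dist_coeffs are placeholders for whatever your own step-3 output and calibration provide.

# Minimal sketch of step 4, assuming the pupil_apriltags package and the
# tag36h11 family; tag_map, camera_matrix and dist_coeffs are placeholders.
import cv2
import numpy as np
from pupil_apriltags import Detector

def estimate_camera_pose(gray, tag_map, camera_matrix, dist_coeffs):
    # tag_map: tag id -> 4x3 array of corner positions (mm) in the map frame,
    # in the same corner order as the detector returns them.
    detector = Detector(families="tag36h11")
    detections = detector.detect(gray)

    object_points, image_points = [], []
    for det in detections:
        if det.tag_id in tag_map:
            object_points.append(np.asarray(tag_map[det.tag_id], dtype=np.float64))
            image_points.append(np.asarray(det.corners, dtype=np.float64))

    if not object_points:
        return None  # no mapped tags visible in this frame

    ok, rvec, tvec = cv2.solvePnP(
        np.concatenate(object_points),
        np.concatenate(image_points),
        camera_matrix,
        dist_coeffs,
    )
    return (rvec, tvec) if ok else None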

The goal of this assignment is to implement step 3. You are provided with a video (recorded by a calibrated camera) of a “mock” construction site at our office. Your code should produce a JSON file containing the mapped tags in a reference frame of your choice. The format of the JSON file is as follows:

[
  {
    // ID of the tag
    "id": 0,
    
    // XYZ position of the 4 corners of the tag in the order
    // returned by the detector, in millimetres
    "corners": [
      [-21, 21, 0],
      [21, 21, 0],
      [21, -21, 0],
      [-21, -21, 0]
    ] 
  },
  ... // More Tags
]
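
Note that the // comments above are only explanatory; the delivered file must be plain JSON. Below is a minimal sketch of writing the map in this format, assuming (as an example) that your mapping step produces a dictionary from tag id to a 4x3 array of corner positions in millimetres.

# Minimal sketch of serializing the map; `mapped_tags` (tag id -> 4x3 corner
# array in mm, detector corner order) is a placeholder for your own result.
import json
import numpy as np

def write_map(mapped_tags, path="map.json"):
    entries = [
        {"id": int(tag_id), "corners": np.asarray(corners, dtype=float).round(3).tolist()}
        for tag_id, corners in sorted(mapped_tags.items())
    ]
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)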

Data

This .tar.gz file contains a video and the camera intrinsic calibration parameters (as used in OpenCV).

monumental_vision_take_home_mapping.tar.gz
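
The exact layout of the calibration file is not reproduced here. Assuming it is stored as an OpenCV FileStorage (YAML/JSON) file, it could be read roughly as below; the file name and node names are assumptions and must be adapted to the actual contents of the archive.

# Minimal sketch of reading the intrinsics, assuming an OpenCV FileStorage
# file; "intrinsics.yaml", "camera_matrix" and "dist_coeffs" are assumptions.
import cv2

fs = cv2.FileStorage("intrinsics.yaml", cv2.FILE_STORAGE_READ)
camera_matrix = fs.getNode("camera_matrix").mat()  # 3x3 camera matrix
dist_coeffs = fs.getNode("dist_coeffs").mat()      # distortion coefficients
fs.release()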

Outcome

We expect you to spend around 6-8 hours on the assignment. Prioritize building something that works end-to-end over optimizing a single component of your solution.

You are expected to deliver:

Additional information

Useful things to know:

The distance between tag 2 and tag 3 is 1090 mm, and the distance between tag 3 and tag 39 is 1940 mm, as sketched below:

2
^
|
| 1090mm
|
v      1940mm
3  <----------> 39
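
These measured distances can serve as a sanity check on the produced map. Below is a minimal sketch of such a check, assuming the quoted distances are between tag centres and that the map was written to map.json in the format above; both are assumptions you may need to adjust.

# Minimal sketch of checking the map against the measured distances above,
# assuming they are centre-to-centre; "map.json" is a placeholder path.
import json
import numpy as np

def tag_center(entry):
    return np.asarray(entry["corners"], dtype=float).mean(axis=0)

with open("map.json") as f:
    tags = {entry["id"]: entry for entry in json.load(f)}

d_2_3 = np.linalg.norm(tag_center(tags[2]) - tag_center(tags[3]))
d_3_39 = np.linalg.norm(tag_center(tags[3]) - tag_center(tags[39]))
print(f"tag 2 to tag 3:  {d_2_3:.0f} mm (measured: 1090 mm)")
print(f"tag 3 to tag 39: {d_3_39:.0f} mm (measured: 1940 mm)")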