Integration with IoT Devices

This document provides developers with a guide to integrating MultiSet's Visual Positioning System (VPS) into non-AR, camera-enabled IoT devices like the ESP32. The primary use case is to achieve instant, centimeter-level positioning by sending image data to the MultiSet REST API and receiving a precise device pose in return.

Advantages of MultiSet VPS for IoT

Integrating MultiSet VPS with your IoT devices offers several key advantages:

High-Precision Positioning: Achieve centimeter-level accuracy, enabling a new class of location-aware applications.
Instant Localization: Get a precise position and orientation from a single image, eliminating the need for drift-prone tracking over time.
Cost-Effective Hardware: The solution works with standard camera modules, allowing you to leverage affordable and widely available hardware like the ESP32-CAM.
Low On-Device Processing: The heavy computational tasks are offloaded to the MultiSet cloud, keeping the device-side requirements minimal.
Robustness: Visual positioning is less susceptible to the interference and signal-loss issues that can affect other positioning technologies like GPS or Wi-Fi, especially indoors.
Rich Data: The system provides a full 6DoF (Degrees of Freedom) pose (3D position and 3D rotation), giving you a complete understanding of the device's orientation in space.

Integration Workflow

The process of getting a device's pose is straightforward and consists of the following steps:

Capture an Image: Use the camera module on your device (e.g., ESP32) to capture an image of the surrounding environment.
Prepare API Request: Collect the necessary camera parameters, including the image resolution and intrinsic values (focal length and principal point).
Send Request to MultiSet API: Make an HTTP POST request to the MultiSet VPS API endpoint. The request will be a multipart/form-data submission containing the image and camera parameters.
Receive Pose Data: The API will process the image and return a JSON object containing the device's calculated position and rotation if the location is recognized.
Utilize Pose Data: Parse the JSON response on your device to extract the position and rotation, and use this data in your application.

Developer Integration Steps

Integration requires interacting with two main API endpoints.

1. Generate M2M Auth Token

This endpoint authenticates your device and provides a JWT token that is required for all other API calls.

Endpoint: POST https://api.multiset.ai/v1/m2m/token

Body (application/json):

{
  "clientId": "YOUR_CLIENT_ID",
  "clientSecret": "YOUR_CLIENT_SECRET"
}

Success Response (200 OK):

{
  "token": "ey...",
  "expiresIn": 3600
}
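
Because the token response carries an expiresIn value, a device can refresh the token shortly before it lapses instead of waiting for a failed request. The following is a minimal sketch, assuming expiresIn is expressed in seconds (the response above does not state the unit). TokenCache and its members are hypothetical names, not part of the MultiSet API.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: caches the JWT and tracks when it should be refreshed.
struct TokenCache {
  std::string token;
  uint32_t fetchedAtMs = 0;  // Timestamp when the token was stored, e.g. millis()
  uint32_t lifetimeMs = 0;   // expiresIn converted to milliseconds

  // Store a fresh token. ASSUMPTION: expiresInSec is given in seconds.
  void store(const std::string& newToken, uint32_t expiresInSec, uint32_t nowMs) {
    token = newToken;
    fetchedAtMs = nowMs;
    lifetimeMs = expiresInSec * 1000u;
  }

  // True when no token is cached or the token is within marginMs of expiring.
  // Unsigned subtraction keeps the age correct across a millis() rollover.
  bool needsRefresh(uint32_t nowMs, uint32_t marginMs = 60000u) const {
    if (token.empty()) return true;
    const uint32_t age = nowMs - fetchedAtMs;
    return static_cast<uint64_t>(age) + marginMs >= lifetimeMs;
  }
};
```

On an ESP32, a sketch could call needsRefresh(millis()) before each VPS query and request a new token only when it returns true.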

2. Query VPS Map

This endpoint takes an image and camera parameters to perform localization.

Endpoint: POST https://api.multiset.ai/v1/vps/map/query-form
Authorization Header: Authorization: Bearer <YOUR_JWT_TOKEN>
Body (multipart/form-data):
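- mapCode (Code of the map to localize against)
- fx, fy (Required; focal length in pixels)
- px, py (Required; principal point in pixels)
- width, height (Required; resolution of the submitted image in pixels)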
- queryImage (Required image file)

Understanding the API Response

The API returns a JSON object with the localization result.

{
  "poseFound": true,
  "location": {
    "position": {
      "x": -5.1772,
      "y": 0.2936,
      "z": -2.6439
    },
    "rotation": {
      "qx": -0.0185,
      "qy": 0.9949,
      "qz": -0.0691,
      "qw": 0.0703
    }
  },
  "confidence": 0.89,
  "mapId": "66ca3f3b773b18f09e131279"
}

poseFound: A boolean that is true if the device's location was successfully identified within the map.
location: An object containing the 6DoF pose.
- position: The (x, y, z) coordinates of the device relative to the map's origin.
- rotation: The orientation of the device represented as a quaternion (qx, qy, qz, qw).
confidence: A numerical value indicating the confidence level of the localization result.
mapId: The unique identifier of the map in which the device was localized.
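
Many applications need the rotation as a matrix rather than a quaternion, for example to rotate a direction vector into map coordinates. The following is a minimal sketch of the standard unit-quaternion to rotation-matrix conversion. It assumes the common Hamilton convention; the axis convention and handedness of the MultiSet map frame are not stated here, so verify the result against a known pose. Quaternion, Mat3, normalized, and toRotationMatrix are illustrative names.

```cpp
#include <array>
#include <cmath>

struct Quaternion { double qx, qy, qz, qw; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rescale to unit length; rounded values from JSON are rarely exactly unit norm.
Quaternion normalized(const Quaternion& q) {
  const double n = std::sqrt(q.qx * q.qx + q.qy * q.qy + q.qz * q.qz + q.qw * q.qw);
  if (n == 0.0) return Quaternion{0.0, 0.0, 0.0, 1.0};  // Fall back to identity
  return Quaternion{q.qx / n, q.qy / n, q.qz / n, q.qw / n};
}

// Convert a quaternion to a 3x3 rotation matrix (Hamilton convention assumed).
Mat3 toRotationMatrix(const Quaternion& raw) {
  const Quaternion q = normalized(raw);
  const double x = q.qx, y = q.qy, z = q.qz, w = q.qw;
  Mat3 m{};
  m[0][0] = 1.0 - 2.0 * (y * y + z * z);
  m[0][1] = 2.0 * (x * y - z * w);
  m[0][2] = 2.0 * (x * z + y * w);
  m[1][0] = 2.0 * (x * y + z * w);
  m[1][1] = 1.0 - 2.0 * (x * x + z * z);
  m[1][2] = 2.0 * (y * z - x * w);
  m[2][0] = 2.0 * (x * z - y * w);
  m[2][1] = 2.0 * (y * z + x * w);
  m[2][2] = 1.0 - 2.0 * (x * x + y * y);
  return m;
}
```

Normalizing first matters because the quaternion in the sample response is rounded to four decimal places and is therefore only approximately unit length.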

Sample C++ Script for ESP32

This sample code demonstrates how to send an image to the MultiSet VPS API from an ESP32 using the HTTPClient and ArduinoJson libraries.

Prerequisites:

ESP32 board with a camera module (e.g., ESP32-CAM).
Arduino IDE with the ESP32 board support package installed.
Install the ArduinoJson library from the Arduino Library Manager.
Configure your Wi-Fi credentials, MultiSet client ID and client secret, map code, and camera parameters in the script.

#include <WiFi.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include "esp_camera.h"

//
// -- PINOUT FOR AI-THINKER ESP32-CAM --
//
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27
#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

// --- WIFI and API Credentials ---
const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* client_id = "YOUR_CLIENT_ID";
const char* client_secret = "YOUR_CLIENT_SECRET";
const char* map_code = "YOUR_MAP_CODE";

// This will be filled by the authentication function
String jwt_token;

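// --- Camera Intrinsics (REPLACE WITH YOUR CAMERA'S VALUES) ---
// The values below are placeholders. Calibrate your camera at the capture resolution.
// width/height must match the frame size set in initCamera() (FRAMESIZE_XGA = 1024x768).
const char* fx = "669.53";
const char* fy = "669.53";
const char* px = "478.87";
const char* py = "364.92";
const char* img_width = "1024";
const char* img_height = "768";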

const char* api_host = "https://api.multiset.ai";

// --- Function Prototypes ---
bool getAuthToken();
void sendImageToVPS();
void parseVPSResponse(String json);
void initCamera();

void setup() {
  Serial.begin(115200);
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("Connected to WiFi");

  initCamera();

  // Get the auth token once at the start
  if (!getAuthToken()) {
    Serial.println("Failed to get auth token. Halting.");
    while(1); // Stop execution
  }
}

void loop() {
  Serial.println("Capturing image and sending to MultiSet VPS...");
  sendImageToVPS();
  delay(30000); // Wait 30 seconds before next request
}

// Authenticates with the M2M endpoint and stores the JWT in jwt_token
bool getAuthToken() {
  HTTPClient http;
  http.begin(String(api_host) + "/v1/m2m/token");
  http.addHeader("Content-Type", "application/json");

  StaticJsonDocument<128> doc;
  doc["clientId"] = client_id;
  doc["clientSecret"] = client_secret;

  String requestBody;
  serializeJson(doc, requestBody);

  int httpCode = http.POST(requestBody);

  // Read the body once, then close the connection
  String payload = http.getString();
  http.end();

  if (httpCode == 200) {
    // JWTs are often several hundred bytes long, so a 256-byte document can be too small.
    // 2048 bytes is an assumed upper bound; adjust it to your token size.
    DynamicJsonDocument responseDoc(2048);
    DeserializationError error = deserializeJson(responseDoc, payload);

    // The "" default yields an empty string if the "token" field is missing
    const char* token = responseDoc["token"] | "";
    if (!error && strlen(token) > 0) {
      jwt_token = token;
      Serial.println("Auth token received.");
      return true;
    }
  }

  Serial.print("Error getting auth token. HTTP Code: ");
  Serial.println(httpCode);
  Serial.println(payload);
  return false;
}

void sendImageToVPS() {
  if (jwt_token.length() == 0) {
    Serial.println("No JWT token available. Cannot send request.");
    return;
  }
  
  camera_fb_t * fb = NULL;
  fb = esp_camera_fb_get();
  if (!fb) {
    Serial.println("Camera capture failed");
    return;
  }

  HTTPClient http;
  http.begin(String(api_host) + "/v1/vps/map/query-form");

  // Use the retrieved JWT token for authorization
  String auth_header = "Bearer " + jwt_token;
  http.addHeader("Authorization", auth_header);

  // -- Construct multipart/form-data request --
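  String boundary = "----WebKitFormBoundary7MA4YWxkTrZu0gW";
  String contentType = "multipart/form-data; boundary=" + boundary;
  // The server needs the boundary from this header to parse the multipart body
  http.addHeader("Content-Type", contentType);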
  
  String request_body_prefix = "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"mapCode\"\r\n\r\n" + map_code + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"fx\"\r\n\r\n" + fx + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"fy\"\r\n\r\n" + fy + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"px\"\r\n\r\n" + px + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"py\"\r\n\r\n" + py + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"width\"\r\n\r\n" + img_width + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"height\"\r\n\r\n" + img_height + "\r\n" +
    "--" + boundary + "\r\n" +
    "Content-Disposition: form-data; name=\"queryImage\"; filename=\"capture.jpg\"\r\n" +
    "Content-Type: image/jpeg\r\n\r\n";

  String request_body_suffix = "\r\n--" + boundary + "--\r\n";

  size_t total_len = request_body_prefix.length() + fb->len + request_body_suffix.length();
  
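  uint8_t * buffer = (uint8_t*) malloc(total_len);
  if (!buffer) {
    // A JPEG frame plus form fields can exceed the free heap
    Serial.println("Failed to allocate request buffer");
    esp_camera_fb_return(fb);
    http.end();
    return;
  }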
  memcpy(buffer, request_body_prefix.c_str(), request_body_prefix.length());
  memcpy(buffer + request_body_prefix.length(), fb->buf, fb->len);
  memcpy(buffer + request_body_prefix.length() + fb->len, request_body_suffix.c_str(), request_body_suffix.length());

  int httpCode = http.POST(buffer, total_len);

  // Read the response body before closing the connection
  String payload;
  if (httpCode > 0) {
    payload = http.getString();
  }
  http.end();

  // The request has been sent, so release the frame buffer and request buffer
  esp_camera_fb_return(fb);
  free(buffer);

  // Check if token expired
  if (httpCode == 401) {
    Serial.println("Token expired or invalid. Re-authenticating...");
    if (getAuthToken()) {
       Serial.println("Re-authenticated. The next query will use the new token.");
       // Here you could add logic to retry sendImageToVPS()
    }
  } else if (httpCode > 0) {
    Serial.printf("[HTTP] POST... code: %d\n", httpCode);
    Serial.println("Response payload:");
    Serial.println(payload);
    parseVPSResponse(payload);
  } else {
    Serial.printf("[HTTP] POST... failed, error: %s\n", http.errorToString(httpCode).c_str());
  }
}

void parseVPSResponse(String json) {
  StaticJsonDocument<512> doc;
  DeserializationError error = deserializeJson(doc, json);

  if (error) {
    Serial.print("deserializeJson() failed: ");
    Serial.println(error.c_str());
    return;
  }

  bool poseFound = doc["poseFound"];
  if (poseFound) {
    float x = doc["location"]["position"]["x"];
    float y = doc["location"]["position"]["y"];
    float z = doc["location"]["position"]["z"];
    float qx = doc["location"]["rotation"]["qx"];
    float qy = doc["location"]["rotation"]["qy"];
    float qz = doc["location"]["rotation"]["qz"];
    float qw = doc["location"]["rotation"]["qw"];
    
    Serial.println("Pose Found!");
    Serial.printf("Position (x,y,z): %.4f, %.4f, %.4f\n", x, y, z);
    Serial.printf("Rotation (qx,qy,qz,qw): %.4f, %.4f, %.4f, %.4f\n", qx, qy, qz, qw);
  } else {
    Serial.println("Pose not found in the provided map.");
  }
}

void initCamera() {
  camera_config_t config = {}; // Zero-initialize so fields not set below have defined values
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sccb_sda = SIOD_GPIO_NUM;
  config.pin_sccb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  
  config.frame_size = FRAMESIZE_XGA; // Set to 1024x768. Ensure this matches your intrinsics.
  config.jpeg_quality = 12;
  config.fb_count = 1;

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x\n", err);
    return;
  }
}
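
The sample re-authenticates on a 401 response but leaves the retry itself to the reader. One way to structure it is a small wrapper that runs the query, refreshes the token on 401, and tries again a bounded number of times. This is a sketch only: queryWithReauth is a hypothetical helper, and it assumes the query function returns the HTTP status code, whereas sendImageToVPS() above returns void and would need a small change.

```cpp
#include <functional>

// Hypothetical helper: runs sendQuery and, on HTTP 401, refreshes the token
// and retries. Returns the last HTTP status code that was observed.
int queryWithReauth(const std::function<int()>& sendQuery,
                    const std::function<bool()>& refreshToken,
                    int maxAttempts = 2) {
  int status = -1;
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    status = sendQuery();
    if (status != 401) break;               // Success or a non-auth error: stop
    if (attempt + 1 == maxAttempts) break;  // No attempts left: skip the refresh
    if (!refreshToken()) break;             // Re-authentication failed: give up
  }
  return status;
}
```

Bounding the attempts avoids an endless loop if the credentials themselves are rejected. In the sketch above, getAuthToken could be passed directly as the refreshToken callback.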
