GazeDetector
Pure-geometry gaze direction from MediaPipe iris landmarks.
GazeDetector (from @framefind/core) is the underlying class that useGazeDetector wraps. It uses the iris landmarks (478-point set) MediaPipe FaceLandmarker already produces, so no ONNX model is fetched.
The algorithm is simple geometry:
- Locate the iris center (landmark 468 for left eye, 473 for right).
- Map it to [-1, 1] inside the eye's bounding box (outer/inner corners + top/bottom lids).
- Average both eyes.
- Subtract a scaled version of head yaw/pitch to remove head-pose contamination.
- Apply a One Euro filter for low-lag smoothing.
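The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the library's source: `irisRatio`, `compensatedGaze`, and `OneEuroFilter` are hypothetical names, the landmark points are passed in explicitly rather than looked up by index, and the filter parameters (`minCutoff`, `beta`, `dCutoff`) are placeholder values.

```typescript
type Point = { x: number; y: number };

// Map the iris center into [-1, 1] inside the eye's bounding box.
// Assumption: the box is spanned by the two eye corners and the two lids.
function irisRatio(iris: Point, outer: Point, inner: Point, top: Point, bottom: Point): Point {
  const minX = Math.min(outer.x, inner.x);
  const maxX = Math.max(outer.x, inner.x);
  const minY = Math.min(top.y, bottom.y);
  const maxY = Math.max(top.y, bottom.y);
  const clamp = (v: number) => Math.max(-1, Math.min(1, v));
  // A degenerate box (closed eye, bad landmarks) maps to the center.
  const nx = maxX > minX ? ((iris.x - minX) / (maxX - minX)) * 2 - 1 : 0;
  const ny = maxY > minY ? ((iris.y - minY) / (maxY - minY)) * 2 - 1 : 0;
  return { x: clamp(nx), y: clamp(ny) };
}

// Average both eyes, then subtract scaled head pose (angles in degrees).
function compensatedGaze(
  left: Point, right: Point, yawDeg: number, pitchDeg: number,
  yawK = 0.012, pitchK = 0.012,
): Point {
  return {
    x: (left.x + right.x) / 2 - yawDeg * yawK,
    y: (left.y + right.y) / 2 - pitchDeg * pitchK,
  };
}

// Single-axis One Euro filter: a low-pass filter whose cutoff rises with speed.
class OneEuroFilter {
  private xPrev: number | null = null;
  private dxPrev = 0;
  private tPrev = 0;
  private minCutoff: number;
  private beta: number;
  private dCutoff: number;

  constructor(minCutoff = 1.0, beta = 0.01, dCutoff = 1.0) {
    this.minCutoff = minCutoff;
    this.beta = beta;
    this.dCutoff = dCutoff;
  }

  // Smoothing factor for a given cutoff frequency (Hz) and time step (s).
  private static alpha(cutoffHz: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoffHz);
    return 1 / (1 + tau / dt);
  }

  filter(x: number, tSec: number): number {
    // Restart on the first sample or on a non-increasing timestamp.
    if (this.xPrev === null || tSec <= this.tPrev) {
      this.xPrev = x;
      this.dxPrev = 0;
      this.tPrev = tSec;
      return x;
    }
    const dt = tSec - this.tPrev;
    const dx = (x - this.xPrev) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;
    this.xPrev = xHat;
    this.dxPrev = dxHat;
    this.tPrev = tSec;
    return xHat;
  }
}
```

In this sketch each axis would get its own `OneEuroFilter` instance, fed with the compensated value and the frame timestamp in seconds.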
Browser usage
import { GazeDetector } from "@framefind/core";
const detector = new GazeDetector({});
await detector.load();
// on each frame:
const result = detector.detectFromVideo(videoEl);
console.log(result.x, result.y, result.region);
// when done:
detector.dispose();
detectFromVideo is synchronous: it returns the latest gaze immediately, and no canvas is needed.
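Because the call is synchronous, it fits naturally into a per-frame loop. The sketch below shows one way to wire that up. `processFrame`, `startGazeLoop`, and the `schedule` parameter are hypothetical helpers, and `GazeLike`/`DetectorLike` are structural stand-ins for the library types that cover only the fields used here.

```typescript
// Structural stand-ins for the library types; only the fields used here.
type GazeLike = { x: number; y: number; region: string; faceDetected: boolean };
type DetectorLike<V> = { detectFromVideo(video: V): GazeLike };

// Hypothetical helper: handle one frame, reporting gaze only when a face was found.
function processFrame<V>(
  detector: DetectorLike<V>, video: V, onGaze: (g: GazeLike) => void,
): boolean {
  const result = detector.detectFromVideo(video);
  if (!result.faceDetected) return false;
  onGaze(result);
  return true;
}

// Hypothetical helper: run processFrame on every tick of `schedule`.
// Returns a function that stops the loop.
function startGazeLoop<V>(
  detector: DetectorLike<V>, video: V, onGaze: (g: GazeLike) => void,
  schedule: (cb: () => void) => void,
): () => void {
  let running = true;
  const tick = () => {
    if (!running) return;
    processFrame(detector, video, onGaze);
    schedule(tick);
  };
  schedule(tick);
  return () => { running = false; };
}
```

In a browser, `schedule` would typically be `(cb) => requestAnimationFrame(cb)`; call the returned stop function before `detector.dispose()`.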
Node.js
GazeDetectorNode does not run MediaPipe itself — it takes landmarks you already produced (e.g. via @tensorflow-models/face-landmarks-detection with refineLandmarks: true, which yields the 478-point iris-augmented set).
import { GazeDetectorNode } from "@framefind/core/node";
const gaze = new GazeDetectorNode({});
const result = gaze.detectFromLandmarks(landmarks, yawDeg, pitchDeg);
console.log(result.region, result.screen);
If you don't have head-pose angles, pass 0, 0; the result will simply not be head-compensated.
Options
new GazeDetector({
// Face landmarker assets.
faceLandmarkerModelUrl: "...",
mediapipeWasmPath: "...",
minFaceDetectionConfidence: 0.5,
minFacePresenceConfidence: 0.5,
minTrackingConfidence: 0.5,
// Multipliers applied to head yaw/pitch (in degrees) before subtracting from gaze.
// Tune these on your camera setup — typical values 0.008 – 0.02.
yawCompensation: 0.012,
pitchCompensation: 0.012,
// Half-width of the central "center" region. Default: 0.18
deadzone: 0.18,
// Flip gaze horizontally — set true for mirrored selfie cameras. Default: true.
mirrorX: true,
// Flip gaze vertically. Default: false.
mirrorY: false,
// Smoothing. Default: One Euro.
smoothing: { type: "oneEuro" },
// Prefer GPU delegate. Default: true, falls back to CPU.
preferGpu: true,
// Minimum ms between inferences in a frame loop. Default: 0.
inferenceIntervalMs: 0,
});
Result type
type GazeRegion =
| "top-left" | "top" | "top-right"
| "left" | "center" | "right"
| "bottom-left" | "bottom" | "bottom-right";
type GazeResult = {
x: number;
y: number;
region: GazeRegion;
rawX: number;
rawY: number;
screen: { x: number; y: number };
faceDetected: boolean;
};
x and y are the head-compensated, smoothed gaze in [-1, 1]. rawX/rawY are the unfiltered iris-in-eye ratio if you want to apply your own compensation. screen maps (x, y) linearly to [0, 1] so it can be used as a normalized cursor position.
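To make the relationship between these fields concrete, here is a minimal sketch of how `screen` and `region` could be derived from `(x, y)`. The exact rules are assumptions, not taken from the library: the linear map is taken to be `(v + 1) / 2` with clamping, the deadzone is applied per axis, and negative `y` is treated as "top" (image coordinates, y pointing down). `toScreen` and `classifyRegion` are hypothetical names.

```typescript
type GazeRegion =
  | "top-left" | "top" | "top-right"
  | "left" | "center" | "right"
  | "bottom-left" | "bottom" | "bottom-right";

// Assumed linear map from [-1, 1] to [0, 1], clamped to the unit square.
function toScreen(x: number, y: number): { x: number; y: number } {
  const clamp01 = (v: number) => Math.max(0, Math.min(1, v));
  return { x: clamp01((x + 1) / 2), y: clamp01((y + 1) / 2) };
}

// Assumed classification: values within +/- deadzone fall in the middle band
// of their axis; negative y counts as "top".
function classifyRegion(x: number, y: number, deadzone = 0.18): GazeRegion {
  const col = x < -deadzone ? "left" : x > deadzone ? "right" : "";
  const row = y < -deadzone ? "top" : y > deadzone ? "bottom" : "";
  if (!row && !col) return "center";
  if (row && col) return `${row}-${col}` as GazeRegion;
  return (row || col) as GazeRegion;
}
```

Under these assumptions a larger `deadzone` widens the "center" band on both axes, so small fixational jitter does not flip the region.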
Calibration
The detector ships with a per-user affine calibrator. From a small set of samples (at least 3; 5–9 is typical) it fits a 2×3 transform that maps the head-compensated raw gaze directly to screen coordinates in [0, 1].
// Show a dot at (0.1, 0.1) and have the user fixate for ~1 second.
detector.addCalibrationSample(0.1, 0.1);
// ...repeat for the rest of the grid...
const cal = detector.calibrate();
if (cal) {
console.log("Affine fit:", cal.xRow, cal.yRow, cal.sampleCount);
}
detector.isCalibrated(); // boolean
detector.getCalibration(); // GazeCalibration | null
detector.setCalibration(cal); // inject a saved calibration
detector.clearCalibration(); // drop samples + active transform
addCalibrationSample uses the most recent detector output, so call it only after detectFromVideo has produced a faceDetected: true result. The minimum sample count is 3 (the number of affine parameters per axis); 9 is recommended for screen-spanning accuracy.
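To illustrate why three samples are the minimum, the sketch below fits the two affine rows by ordinary least squares over the normal equations. It is an assumption about how such a fit could be done, not the library's implementation; `fitAffine` and `solve3` are hypothetical names, and the singularity threshold is arbitrary.

```typescript
type CalibrationSample = { rawX: number; rawY: number; targetX: number; targetY: number };
type Row3 = [number, number, number];

// Solve the 3x3 system M * p = v by Cramer's rule; returns null if M is singular.
function solve3(M: number[][], v: number[]): Row3 | null {
  const det = (m: number[][]) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  const d = det(M);
  if (Math.abs(d) < 1e-12) return null;
  // Replace column c of M with v.
  const withCol = (c: number) => M.map((row, i) => row.map((val, j) => (j === c ? v[i] : val)));
  return [det(withCol(0)) / d, det(withCol(1)) / d, det(withCol(2)) / d];
}

// Least-squares fit of screenX = a*rawX + b*rawY + c (and likewise for screenY).
// Needs at least 3 samples that are not all on one line.
function fitAffine(
  samples: CalibrationSample[],
): { xRow: Row3; yRow: Row3; sampleCount: number } | null {
  if (samples.length < 3) return null;
  const M = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
  const vx = [0, 0, 0];
  const vy = [0, 0, 0];
  for (const s of samples) {
    const r = [s.rawX, s.rawY, 1];
    for (let i = 0; i < 3; i++) {
      for (let j = 0; j < 3; j++) M[i][j] += r[i] * r[j];
      vx[i] += r[i] * s.targetX;
      vy[i] += r[i] * s.targetY;
    }
  }
  const xRow = solve3(M, vx);
  const yRow = solve3(M, vy);
  if (!xRow || !yRow) return null;
  return { xRow, yRow, sampleCount: samples.length };
}
```

With exactly three non-collinear samples the fit is exact; additional samples spread across the screen average out fixation noise, which is consistent with the recommendation of 9.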
To persist across sessions, save the GazeCalibration object — it is plain JSON.
type GazeCalibration = {
xRow: [number, number, number]; // [a, b, c] — screenX = a*rawX + b*rawY + c
yRow: [number, number, number]; // [d, e, f] — screenY = d*rawX + e*rawY + f
sampleCount: number;
};
For lower-level use (e.g. averaging multiple frames per target before pushing a sample), pushCalibrationSample({ rawX, rawY, targetX, targetY }) accepts explicit raw values.
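As a sketch of that averaging step: the helper below collapses several frames captured while the user fixates one target into a single sample. `averageRawSample` is a hypothetical helper, and `RawGaze` is a structural stand-in for the parts of `GazeResult` it reads.

```typescript
// Structural stand-in for the GazeResult fields used here.
type RawGaze = { rawX: number; rawY: number; faceDetected: boolean };
type RawSample = { rawX: number; rawY: number; targetX: number; targetY: number };

// Hypothetical helper: mean raw gaze over the frames where a face was detected.
// Returns null when no usable frame was captured for this target.
function averageRawSample(frames: RawGaze[], targetX: number, targetY: number): RawSample | null {
  const valid = frames.filter((f) => f.faceDetected);
  if (valid.length === 0) return null;
  const sum = valid.reduce(
    (acc, f) => ({ x: acc.x + f.rawX, y: acc.y + f.rawY }),
    { x: 0, y: 0 },
  );
  return {
    rawX: sum.x / valid.length,
    rawY: sum.y / valid.length,
    targetX,
    targetY,
  };
}
```

The returned object has the shape that pushCalibrationSample accepts, so a non-null result can be passed to it directly; a null result suggests showing the target again.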
Cleanup
detector.dispose();
Releases the MediaPipe face landmarker, resets smoothing, and clears any active calibration.