On-device glasses detection for modern web apps.
Real-time face attribute detection using landmarks and a lightweight ONNX model. Runs entirely in the browser or Node.js. Zero backend required.
```tsx
import { useRef } from 'react';
import { useGlassesDetector } from '@framefind/react';

export function App() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const intervalRef = useRef<number | null>(null);
  const { result, detect } = useGlassesDetector({
    modelUrl: '/models/glasses.onnx',
  });

  return (
    <div>
      <video
        ref={videoRef}
        autoPlay
        playsInline
        muted
        onPlay={() => {
          // Guard so repeated play events don't stack intervals
          if (intervalRef.current !== null) return;
          intervalRef.current = window.setInterval(() => {
            if (videoRef.current && canvasRef.current)
              detect(videoRef.current, canvasRef.current);
          }, 66); // ~15 fps
        }}
      />
      <canvas ref={canvasRef} hidden />
      <p>{result?.glasses ? '👓 Glasses' : 'No glasses'}</p>
    </div>
  );
}
```

The problem with existing APIs
Cloud Latency Is Too High
Streaming 30 frames per second to a cloud API adds a network round-trip to every frame, causing visible lag. Edge inference enables sub-100 ms real-time feedback.
Privacy & Compliance
Transmitting user face data to servers creates enormous GDPR/CCPA overhead. On-device inference means data never leaves the browser.
Bloated Dependencies
Traditional ML suites ship massive WASM bundles. FrameFind uses targeted MediaPipe landmark indices and a specialized 6.2 MB model.
How it works
A lean, optimized pipeline built on ONNX Runtime Web.
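As a concrete illustration of the middle of that pipeline, here is a sketch of the preprocessing step under the constants listed in `@framefind/utils` (112 px eye-region crop, ImageNet mean/std); the function name is illustrative, not a library export.

```typescript
// Illustrative sketch of FrameFind-style preprocessing: an RGBA eye-region
// crop (e.g. 112x112 px) is converted to a CHW Float32 tensor normalized
// with the ImageNet mean/std, ready to feed a small ONNX classifier.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

function rgbaToChwFloat32(
  pixels: Uint8ClampedArray,
  width: number,
  height: number
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const v = pixels[i * 4 + c] / 255; // drop alpha, scale to [0, 1]
      out[c * plane + i] = (v - MEAN[c]) / STD[c]; // interleaved RGBA -> CHW
    }
  }
  return out;
}
```

The resulting 3×112×112 tensor is then fed to an ONNX Runtime Web inference session, and the raw logit is squashed through a sigmoid and temporally smoothed.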
Quick start
FrameFind is modular. Install the core SDK for raw JS environments, or the React bindings for hook-based state management.
```bash
npm install @framefind/core onnxruntime-web   # core SDK (browser)
npm install @framefind/react                  # React bindings
npm install onnxruntime-node                  # Node.js
```

```tsx
import { useRef, useEffect } from 'react';
import { useGlassesDetector } from '@framefind/react';

export default function App() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const intervalRef = useRef<number | null>(null);
  const { result, loading, error, detect, reset } = useGlassesDetector({
    modelUrl: '/models/glasses.onnx',
    threshold: 0.35,
    smoothingWindow: 8,
  });

  useEffect(() => {
    navigator.mediaDevices
      .getUserMedia({ video: true })
      .then((stream) => {
        if (videoRef.current) videoRef.current.srcObject = stream;
      })
      .catch((err) => console.error('Camera access failed:', err));

    return () => {
      // Stop the detection loop and release the camera on unmount
      if (intervalRef.current !== null) clearInterval(intervalRef.current);
      const stream = videoRef.current?.srcObject as MediaStream | null;
      stream?.getTracks().forEach((track) => track.stop());
    };
  }, []);

  return (
    <div className="p-4">
      <video
        ref={videoRef}
        autoPlay
        playsInline
        muted
        onPlay={() => {
          // Guard so repeated play events don't stack intervals
          if (intervalRef.current !== null) return;
          intervalRef.current = window.setInterval(() => {
            if (videoRef.current && canvasRef.current)
              detect(videoRef.current, canvasRef.current);
          }, 66); // ~15 fps
        }}
      />
      <canvas ref={canvasRef} style={{ display: 'none' }} />
      {loading && <p>Loading model…</p>}
      {error && <p>Error: {error.message}</p>}
      {result && (
        <div className="mt-4">
          <p>Status: {result.glasses ? 'Glasses Detected' : 'No Glasses'}</p>
          <p>Confidence: {(result.probability * 100).toFixed(1)}%</p>
        </div>
      )}
      <button onClick={reset}>Reset</button>
    </div>
  );
}
```

Engineered for the edge
Optimization metrics from production workloads.
Architecture
Future Roadmap
- Glasses detection
- Mask detection
- Blink / drowsiness detection
- Head pose estimation
- Attention tracking
- Iris gaze estimation
Philosophy
Edge-first AI. The future of computer vision shouldn't be gated by API latency, network conditions, or cloud compute costs. By running targeted, lightweight models directly on the client, we unlock interfaces that were previously impossible.
Privacy by design. When an application requires monitoring user attention, eye contact, or facial context, sending that media feed to a remote server is fundamentally hostile to privacy. On-device inference guarantees that pixels never leave the user's hardware.
Pragmatic ML. Instead of loading a 100 MB general-purpose vision model, FrameFind takes a tiered approach: an ultra-optimized landmark detector crops regions of interest and feeds them into tiny, specialized classification models.
API Reference
Complete reference for all public APIs across the three FrameFind packages.
@framefind/core
Glasses detection core for browser and Node.js.

- `new GlassesDetector(options)` (class). Options: `modelUrl`, `wasmPaths`, `threshold` (default 0.35), `smoothingWindow` (default 8).
- `detector.load()` (method). Initializes the ONNX model. Await this before calling any detect method.
- `detector.detectFromImageData(pixels, w, h, landmarks?)` (method). Runs inference on a raw RGBA pixel buffer.
- `detector.detectFromCanvas(canvas, landmarks?)` (method). Runs inference on an HTMLCanvasElement.
- `detector.detectFromVideoFrame(video, offscreenCanvas, landmarks?)` (method). Runs inference on a live video frame.
- `detector.resetHistory()` (method). Clears the smoothing window history.
- `detector.dispose()` (method). Releases the ONNX session and frees memory.
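Pieced together, the browser lifecycle (load, detect, dispose) might look like the sketch below; the option values are placeholders, and the `await` on the detect call is a defensive assumption rather than confirmed API behavior.

```typescript
import { GlassesDetector } from '@framefind/core';

// Sketch of the browser lifecycle; paths and options are placeholders.
async function run(canvas: HTMLCanvasElement) {
  const detector = new GlassesDetector({
    modelUrl: '/models/glasses.onnx',
    threshold: 0.35,
    smoothingWindow: 8,
  });

  await detector.load(); // initialize the ONNX session before any detect call

  const result = await detector.detectFromCanvas(canvas);
  console.log(result.glasses ? 'glasses' : 'no glasses', result.probability);

  detector.dispose(); // release the session when finished
}
```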
- `new GlassesDetectorNode(options)` (class). Options: `modelPath` (local path), `threshold`, `smoothingWindow`.
- `detector.load()` (method). Loads the model via onnxruntime-node.
- `detector.detectFromRgbaBuffer(pixels, faceDetected?)` (method). Runs inference on a raw RGBA Buffer.
- `detector.detectFromImagePath(path)` (method). Loads an image from disk and runs inference.
- `detector.resetHistory()` (method). Clears the smoothing window history.
- `detector.dispose()` (method). Releases model resources.
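A comparable Node.js sketch, assuming `GlassesDetectorNode` is exported from `@framefind/core` and that `detectFromImagePath` resolves to a `DetectionResult`; the paths are placeholders.

```typescript
import { GlassesDetectorNode } from '@framefind/core';

// Sketch of the Node.js lifecycle; model and image paths are placeholders.
async function main() {
  const detector = new GlassesDetectorNode({
    modelPath: './models/glasses.onnx',
    threshold: 0.35,
  });

  await detector.load(); // loads the model via onnxruntime-node

  const result = await detector.detectFromImagePath('./photo.jpg');
  console.log(result); // { glasses, probability, faceDetected }

  detector.dispose(); // release model resources when finished
}
```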
```ts
type DetectionResult = {
  glasses: boolean;      // true if glasses detected
  probability: number;   // smoothed confidence 0–1
  faceDetected: boolean; // true if landmarks provided
};

type DetectorConfig = {
  modelUrl: string;
  threshold?: number;       // default: 0.35
  smoothingWindow?: number; // default: 8
};
```

@framefind/react
React hooks for FrameFind glasses detection.

- `useGlassesDetector(options)` (hook). Loads the detector on mount and disposes it on unmount. Returns `result`, `loading`, `error`, `detect()`, `reset()`.
- `options.modelUrl` (option). Required. URL of the .onnx model file.
- `options.wasmPaths` (option). Folder path for the onnxruntime .wasm files.
- `options.threshold` (option). Binary decision cutoff. Default 0.35.
- `options.smoothingWindow` (option). Number of frames averaged. Default 8.
- `options.enabled` (option). Pauses detection without unmounting.
```ts
const {
  result,  // DetectionResult | null
  loading, // boolean
  error,   // Error | null
  detect,  // (video, canvas, landmarks?) => void
  reset,   // () => void
} = useGlassesDetector({ modelUrl: '/model.onnx' });
```

@framefind/utils
Shared types, constants, and helpers.

- `DetectionResult` (type). `{ glasses: boolean; probability: number; faceDetected: boolean }`
- `DetectorConfig` (type). `{ modelUrl: string; threshold?: number; smoothingWindow?: number }`
- `GLASSES_MODEL_URL` (const). Official CDN URL for the ONNX model.
- `DEFAULT_THRESHOLD` (const). 0.35, the binary decision cutoff.
- `DEFAULT_SMOOTH_N` (const). 8, the number of frames averaged for smoothing.
- `ROI_SIZE` (const). 112, the crop size the model expects (px).
- `EYE_REGION_IDX` (const). MediaPipe landmark indices covering the eye region.
- `MEAN` / `STD` (const). [0.485, 0.456, 0.406] / [0.229, 0.224, 0.225], the ImageNet normalization constants.
- `sigmoid(x: number): number` (fn). Converts a raw model logit to a probability.
- `smoothAverage(history: number[]): number` (fn). Averages the prediction history for temporal smoothing.
- `imageDataToChwFloat32(pixels, width, height): Float32Array` (fn). Converts an RGBA buffer to a CHW Float32 tensor normalized with MEAN/STD.
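To show how the last two helpers compose, here is a minimal re-implementation of the assumed post-processing behavior (logistic sigmoid plus a rolling mean over the last `DEFAULT_SMOOTH_N` probabilities, compared against `DEFAULT_THRESHOLD`); it mirrors the documented semantics but is not the library source.

```typescript
// Illustrative re-implementations of the post-processing helpers: a logistic
// sigmoid turns the raw logit into a probability, and a rolling mean over
// the most recent frames provides temporal smoothing.
const DEFAULT_THRESHOLD = 0.35;
const DEFAULT_SMOOTH_N = 8;

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function smoothAverage(history: number[]): number {
  return history.reduce((sum, p) => sum + p, 0) / history.length;
}

// Per-frame post-processing: push the new probability, trim the window to
// DEFAULT_SMOOTH_N entries, and compare the smoothed value to the threshold.
function postprocess(logit: number, history: number[]): boolean {
  history.push(sigmoid(logit));
  if (history.length > DEFAULT_SMOOTH_N) history.shift();
  return smoothAverage(history) >= DEFAULT_THRESHOLD;
}
```

The smoothing window is what keeps the boolean `glasses` flag stable when individual frames flicker around the threshold.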