Object Detection


If you haven't set up the SDK yet, follow the iOS setup or Android setup directions first. You'll need to add the Core library to your app before using a feature API or a custom model.

Object Detection In Action

With an Object Detection model, you can identify objects of interest in an image or each frame of live video. Each prediction returns a set of objects, each with a label, bounding box, and confidence score.

If you just need to know the contents of an image -- not the location of the objects -- consider using Image Labeling instead.

Custom Training Models for Object Detection

You can train a custom model that is compatible with the Object Detection API by using Fritz AI Studio.

If you have a custom model that was trained outside of Fritz AI, follow this checklist to make sure it will be compatible with the Object Detection API.

  1. Your model must be a single-shot multibox detector with boxes matching the default configuration found here.
  2. Your model must be in the TensorFlow Lite (.tflite) or Core ML (.mlmodel) format.
  3. iOS only: The input layer must be named Preprocessor/sub:0, and the two outputs must be named concat:0 (boxPredictions) and concat_1:0 (classPredictions).
  4. Android only: The one input layer (Preprocessor/sub) and four output layers ('outputLocations', 'outputClasses', 'outputScores', 'numDetections') should be defined in the TensorFlow Lite conversion tool.
  5. The input should have the following dimensions: 1 x 300 x 300 x 3 (batch_size x height x width x num_channels). Height and width are configurable.
  6. iOS only: The output should have the following dimensions: 4 (box points) x num_anchor_boxes x 1 for boxPredictions and num_classes x 1 for classPredictions.
  7. Android only: The output should have the following dimensions: 1 x num_anchor_boxes x 4 (box points) for outputLocations, num_classes x 1 for outputClasses and outputScores, and 1 for numDetections.
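To make the input dimensions in the checklist concrete, here is a minimal sketch of the arithmetic. The class name InputTensorSize is hypothetical, and the 4-bytes-per-value figure assumes a 32-bit float input, which the checklist does not state.

```java
/** Sketch: size of the 1 x 300 x 300 x 3 input tensor from the checklist. */
final class InputTensorSize {
    /** Multiplies all dimensions of a tensor shape. */
    static long elementCount(int... shape) {
        long count = 1;
        for (int dim : shape) {
            if (dim <= 0) {
                throw new IllegalArgumentException("dimensions must be positive");
            }
            count *= dim;
        }
        return count;
    }

    /** Bytes needed for the tensor, given the size of one element. */
    static long byteSize(int bytesPerElement, int... shape) {
        return elementCount(shape) * bytesPerElement;
    }

    public static void main(String[] args) {
        // 1 x 300 x 300 x 3 = 270,000 values.
        System.out.println(elementCount(1, 300, 300, 3));
        // Assumption: 32-bit floats (4 bytes each), so 1,080,000 bytes.
        System.out.println(byteSize(4, 1, 300, 300, 3));
    }
}
```

If you change the configurable height and width, the same calculation tells you how large a buffer each frame needs.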

Pre-trained Object Detection Model

The object detection model supports 90 labels from the COCO dataset.

Technical Specifications

| Architecture | Format(s) | Model Size | Input | Output | Benchmarks |
|--------------|-----------|------------|-------|--------|------------|
| SSDLite + MobileNet V2 variant | Core ML (iOS), TensorFlow Lite (Android) | ~17 MB | 300x300-pixel image | Offsets for >2,000 candidate bounding boxes, class labels for each box, confidence scores for each box | 18 FPS on iPhone X, 8 FPS on Pixel 2 |
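The benchmark figures translate directly into a per-frame time budget, which can help when deciding how much other work to do per frame. This is a small arithmetic sketch; FrameBudget is a hypothetical helper, not part of the SDK.

```java
/** Sketch: converts a frames-per-second benchmark into milliseconds per frame. */
final class FrameBudget {
    static double millisPerFrame(double fps) {
        if (fps <= 0) {
            throw new IllegalArgumentException("fps must be positive");
        }
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        // 18 FPS (iPhone X benchmark): about 55.6 ms per frame.
        System.out.println(millisPerFrame(18));
        // 8 FPS (Pixel 2 benchmark): 125 ms per frame.
        System.out.println(millisPerFrame(8));
    }
}
```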


You can use the FritzVisionObjectModel to detect the objects inside of images. Fritz AI provides a variety of options to configure predictions.

1. Build the FritzVisionObjectModel

To create the object model, you can either include the model in your bundle or download it over the air once the user installs your app.

Include the model in your application bundle

Add the model to your Podfile

Include Fritz/VisionObjectModel/Fast in your Podfile. This will include the model file in your app bundle.

pod 'Fritz/VisionObjectModel/Fast'

Then install the newly added pod:

pod install

If you built the app with just the core Fritz pod and later added a submodule for the model, you may encounter the error "Cannot invoke initializer for type". To fix it, run pod update and clean your Xcode build.

Define FritzVisionObjectModelFast

Define the instance of the FritzVisionObjectModelFast in your code. There should only be one instance that is reused for each prediction.

import Fritz

let objectModel = FritzVisionObjectModelFast()

Model initialization

It's important to initialize a single instance of the model so that you are not loading the entire model into memory on each prediction. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won't load it until the first prediction. This avoids loading the model prematurely, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
    // Created on first access, not when the view controller is initialized.
    lazy var model = FritzVisionObjectModelFast()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you'll ensure that you're not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
    var model: FritzVisionObjectModelFast!

    override func viewDidLoad() {
        super.viewDidLoad()
        model = FritzVisionObjectModelFast()
    }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, the properties will be initialized before the appDelegate function is called. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.

Download the model over the air

Only available on Growth plans

For more information on plans and pricing, visit our website.

Add FritzVision to your Podfile

Include Fritz/Vision in your Podfile.

pod 'Fritz/Vision'

Make sure to run a pod install with the latest changes.

pod install

Download Model

import Fritz

var objectModel: FritzVisionObjectModelFast?

FritzVisionObjectModelFast.fetchModel { model, error in
    // Keep the model only if the download succeeded.
    guard let downloadedModel = model, error == nil else { return }
    objectModel = downloadedModel
}

2. Create FritzVisionImage

FritzVisionImage supports different image formats.

Using a CMSampleBuffer

If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

// Swift
let image = FritzVisionImage(buffer: sampleBuffer)

// Objective-C
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
// or
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

// Swift
image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .left

// Objective-C: add metadata
visionImage.metadata = [FritzVisionImageMetadata new];
visionImage.metadata.orientation = FritzImageOrientationLeft;

Setting the Orientation from the Camera

Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage the orientation will change depending on which camera and device orientation you are using.

When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)
}

Using a UIImage

If you are using a UIImage, create the FritzVisionImage instance:

let image = FritzVisionImage(image: uiImage)

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .right

Set the image orientation

A UIImage can carry UIImageOrientation data (for example, when capturing a photo from the camera). To make sure the model handles that orientation correctly, initialize the FritzImageOrientation with the image's orientation:

image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)

3. Run object detection

Use the objectModel instance you created earlier to run predictions:

guard let results = try? objectModel.predict(image) else { return }

Configure Object Prediction

Before running object detection, you can configure the prediction with a FritzVisionObjectModelOptions object.

| Setting | Default | Description |
|---------|---------|-------------|
| imageCropAndScaleOption | .scaleFit | Crop and scale option for how to resize and crop the image for the model. |
| threshold | 0.6 | Confidence threshold for prediction results, in the range [0, 1]. |
| numResults | 15 | Maximum number of results to return from a prediction. |
| iouThreshold | 0.25 | Threshold for overlap of items within a single class, in the range [0, 1]. Lower values are more strict. |

For example, to build a FritzVisionObjectModelOptions object with a lower confidence threshold that returns at most two results:

let options = FritzVisionObjectModelOptions()
options.threshold = 0.3
options.numResults = 2
guard let results = try? objectModel.predict(image, options: options) else { return }
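To illustrate how threshold and numResults interact, here is a platform-neutral sketch written in Java. ResultLimiter and Detection are hypothetical stand-ins, not SDK classes, and the exact comparison (greater than or equal) and ordering used by the SDK are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: hypothetical types that mimic threshold and numResults filtering. */
final class ResultLimiter {
    /** A detection reduced to a label and a confidence in [0, 1]. */
    static final class Detection {
        final String label;
        final float confidence;

        Detection(String label, float confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Keeps detections at or above the threshold, best first, capped at numResults. */
    static List<Detection> apply(List<Detection> detections, float threshold, int numResults) {
        List<Detection> kept = new ArrayList<>();
        for (Detection d : detections) {
            if (d.confidence >= threshold) {
                kept.add(d);
            }
        }
        // Highest confidence first, so the cap drops the weakest results.
        kept.sort(Comparator.comparingDouble((Detection d) -> d.confidence).reversed());
        int limit = Math.max(0, Math.min(numResults, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }
}
```

With confidences of 0.9, 0.5, 0.35, and 0.2, a threshold of 0.3 and numResults of 2 keeps only the 0.9 and 0.5 detections, while the defaults (0.6 and 15) keep only the 0.9 detection.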

4. Get detected objects in the image

List Detected Objects

The [FritzVisionObject] array has a list of detected objects from the prediction result. Each object has a label, confidence, and bounding box describing where the object is located.

// Created from model prediction.
let objects: [FritzVisionObject]
for object in objects {
    // Read the object's label, confidence, and bounding box here.
}

Draw Bounding Boxes

Use FritzVisionObject to draw bounding boxes around detected objects. The BoundingBox object on every FritzVisionObject has logic to convert the detected bounding box into a CGRect for your given coordinate space.

// Image used in model.
let image = FritzVisionImage(...)
// Object created from model prediction.
let boundingBox = object.boundingBox
guard let size = image.size else { return }
let boundingBoxCGRect = boundingBox.toCGRect(imgHeight: size.height, imgWidth: size.width)
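The conversion above boils down to scaling a box into the target coordinate space. The sketch below shows that arithmetic in Java, under the assumption that the box is stored in normalized [0, 1] coordinates. BoxScaler is a hypothetical helper, and the SDK's internal representation may differ.

```java
/** Sketch: scales a normalized bounding box to pixel coordinates. */
final class BoxScaler {
    /**
     * Returns {left, top, width, height} in pixels.
     * Assumption: inputs are fractions of the image size in [0, 1].
     */
    static float[] toPixelRect(float xMin, float yMin, float xMax, float yMax,
                               float imgWidth, float imgHeight) {
        float left = xMin * imgWidth;
        float top = yMin * imgHeight;
        float width = (xMax - xMin) * imgWidth;
        float height = (yMax - yMin) * imgHeight;
        return new float[] {left, top, width, height};
    }

    public static void main(String[] args) {
        // A box covering the lower middle of a 640 x 480 image.
        float[] rect = toPixelRect(0.25f, 0.5f, 0.75f, 1.0f, 640f, 480f);
        System.out.println(java.util.Arrays.toString(rect));
    }
}
```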

Custom Object Detection Model

You can use a model that has been trained with the TensorFlow Object Detection API. The model must take an image input of size 300x300.

If you have trained your own object detection model, you can use it with FritzVisionObjectPredictor.

1. Create a custom model for your trained model in the webapp and add it to your Xcode project.

For instructions on how to do this, see Integrating a Custom Core ML Model.

2. Conform your model

Create a new file called CustomObjectDetectionModel+Fritz.swift and conform your class like so:

import Fritz

extension CustomObjectDetectionModel: SwiftIdentifiedModel {
    static let modelIdentifier = "model-id-abcde"
    static let packagedModelVersion = 1
}

3. Define the Custom Object Detection Model

let objectDetectionModel = FritzVisionObjectPredictor(
model: CustomObjectDetectionModel().fritz(),

4. Use the record method on the predictor to collect data

The FritzVisionObjectPredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

guard let results = try? objectModel.predict(image, options: options) else { return }

// Implement your own custom UX for users to label an image and store
// that as a list of [FritzVisionObject].
objectModel.record(image, predicted: results, modified: modifiedObjects)

Other Code Examples

Not sure how to get started? Check out these resources for more examples using Object Detection:

  • Fritz AI Studio: See object detection used to draw bounding boxes around objects.


1. Add the dependencies via Gradle

Add our repository in order to download the Vision API:

repositories {
    maven { url "https://fritz.mycloudrepo.io/public/repositories/android" }
}

Add renderscript support and include the vision dependency in app/build.gradle. Renderscript is used in order to improve image processing performance. You'll also need to specify aaptOptions in order to prevent compressing TensorFlow Lite models.

android {
    defaultConfig {
        renderscriptTargetApi 21
        renderscriptSupportModeEnabled true
    }

    // Don't compress included TensorFlow Lite models on build.
    aaptOptions {
        noCompress "tflite"
    }
}

dependencies {
    implementation 'ai.fritz:vision:+'
}

(Optional) Include the model in your app. To include the Object Detection model with your build, add the dependency shown below. Note: This bundles the model with your app when you publish it to the Play Store and will increase your app size.


Behind the scenes, Object Detection uses a TensorFlow Lite model. In order to include this with your app, you'll need to make sure that the model is not compressed in the APK by setting aaptOptions.
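One likely reason the model must stay uncompressed is that TensorFlow Lite models are commonly memory-mapped from the file rather than copied into memory, which only works on uncompressed bytes. The sketch below shows memory mapping with plain Java NIO. ModelMapper is a hypothetical name, and it is an assumption that the SDK loads its model this way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: memory-maps a model file so its bytes can be read in place. */
final class ModelMapper {
    /** Maps the whole file read-only; the bytes on disk must be the raw model. */
    static MappedByteBuffer map(Path modelFile) {
        try (FileChannel channel = FileChannel.open(modelFile, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the asset were compressed inside the APK, the mapped region would contain compressed data instead of the model itself.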

dependencies {
    implementation 'ai.fritz:vision-object-detection-model-fast:{ANDROID_MODEL_VERSION}'
}

Now you're ready to detect objects with the Object Detection API.

2. Create an ObjectDetectionOnDeviceModel

You will first need to define a model and then use a predictor to run it.

If you are using one of the Fritz SDK pre-trained models and have followed the optional step above to include the Object Detection model, you can get a predictor to use immediately:

ObjectDetectionOnDeviceModel onDeviceModel = FritzVisionModels.getObjectDetectionOnDeviceModel();

If you have trained a custom object detection model with Fritz Model Training, you'll need to download the .tflite model file from your model's detail page in the Fritz webapp and add it to the assets folder of your app. You can then define an on-device model as follows:

List<String> labels = Arrays.asList("Background", "List", "Of", "Labels");
ObjectDetectionOnDeviceModel onDeviceModel = new ObjectDetectionOnDeviceModel(

Note that the first item in the list of labels is a "None" or "Background" label for when no object is predicted. The length of the labels list should be N + 1, where N is the number of objects your model predicts.
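The N + 1 layout can be sketched as a simple lookup. LabelLookup is a hypothetical helper, and it assumes the model's class index maps directly onto the list index, with index 0 reserved for the background label.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: label list with a background entry at index 0. */
final class LabelLookup {
    /** Returns the label for a class index; index 0 is the background label. */
    static String labelFor(List<String> labels, int classIndex) {
        if (classIndex < 0 || classIndex >= labels.size()) {
            throw new IndexOutOfBoundsException("no label for class " + classIndex);
        }
        return labels.get(classIndex);
    }

    /** Number of real object classes (N), excluding the background label. */
    static int objectClassCount(List<String> labels) {
        return labels.size() - 1;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("Background", "cat", "dog");
        System.out.println(objectClassCount(labels)); // two object classes
        System.out.println(labelFor(labels, 2));      // the "dog" label
    }
}
```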

3. Create an Object Detection Predictor

Now that you have a model, you can create a FritzVisionObjectPredictor to make predictions. Predictors handle necessary pre- and post-processing.

FritzVisionObjectPredictor predictor = FritzVision.ObjectDetection.getPredictor(onDeviceModel);

If you did not include the on-device model in your APK bundle, you'll have to load the model before you can get a predictor. To do that, create an ObjectDetectionManagedModel object and call FritzVision.ObjectDetection.loadPredictor to start the model download.

// Declare the predictor as a field so the listener can assign it.
FritzVisionObjectPredictor predictor;

ObjectDetectionManagedModel managedModel = FritzVisionModels.getObjectDetectionManagedModel();
FritzVision.ObjectDetection.loadPredictor(managedModel, new PredictorStatusListener<FritzVisionObjectPredictor>() {
    @Override
    public void onPredictorReady(FritzVisionObjectPredictor objectDetectionPredictor) {
        Log.d(TAG, "Object Detection predictor is ready");
        predictor = objectDetectionPredictor;
    }
});
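Because the download finishes asynchronously, camera frames may arrive before the predictor exists. One way to handle that is a small thread-safe holder that callers check before predicting. PredictorHolder is a hypothetical helper written for this sketch, not an SDK class.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: holds a predictor that becomes available some time after startup. */
final class PredictorHolder<P> {
    private final AtomicReference<P> predictor = new AtomicReference<>();

    /** Call this from the listener's ready callback. */
    void onReady(P loaded) {
        predictor.set(loaded);
    }

    /** True once the ready callback has delivered a predictor. */
    boolean isReady() {
        return predictor.get() != null;
    }

    /** Returns the predictor, or null while the download is still in progress. */
    P getOrNull() {
        return predictor.get();
    }
}
```

A frame callback can then skip any frame for which getOrNull() returns null instead of crashing on an unset predictor.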

4. Create a FritzVisionImage from an image or a video stream

To create a FritzVisionImage from a Bitmap:

FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);

To create a FritzVisionImage from a media.Image object when capturing the result from a camera, first determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
// Gets the first camera id
String cameraId = manager.getCameraIdList()[0];
// Determine the rotation on the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc)
ImageRotation imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);

Finally, create the FritzVisionImage object with the rotation:

FritzVisionImage visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera);
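The rotation passed in above combines the camera sensor's mounting angle with the current device rotation. The sketch below shows the formula commonly used in Android camera samples. RotationMath is a hypothetical helper, and whether FritzVisionOrientation computes the value exactly this way is an assumption.

```java
/** Sketch: degrees an image must be rotated to appear upright. */
final class RotationMath {
    /**
     * @param sensorOrientation     sensor angle relative to the device's natural
     *                              orientation: 0, 90, 180, or 270
     * @param deviceRotationDegrees current display rotation: 0, 90, 180, or 270
     * @param frontFacing           true for the front (selfie) camera
     */
    static int imageRotationDegrees(int sensorOrientation, int deviceRotationDegrees,
                                    boolean frontFacing) {
        if (frontFacing) {
            // Front camera: the device rotation is added to the sensor angle.
            return (sensorOrientation + deviceRotationDegrees) % 360;
        }
        // Back camera: the device rotation counteracts the sensor angle.
        return (sensorOrientation - deviceRotationDegrees + 360) % 360;
    }
}
```

For a typical back camera mounted at 90 degrees, a portrait device (rotation 0) needs a 90-degree rotation, while the same device rotated by 90 degrees needs none.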

5. Run prediction - Detect different objects in the image

Next, convert a bitmap into a FritzVisionImage and pass the image into the predictor in order to evaluate the objects in the image:

FritzVisionObjectResult objectResult = predictor.predict(visionImage);

The predict method returns a FritzVisionObjectResult object with the following methods:

FritzVisionObjectResult methods

| Method | Description |
|--------|-------------|
| List<FritzVisionObject> getObjects() | Gets a list of all objects detected. |
| List<FritzVisionObject> getObjectsAboveThreshold(float confidenceThreshold) | Gets all objects detected above a certain threshold. |
| List<FritzVisionObject> getVisionObjectsByClass(String labelName) | Gets all detected objects by a class name (e.g., person, cat, etc.). |
| List<FritzVisionObject> getVisionObjectsByClass(String labelName, float labelConfidence) | Gets all detected objects by a class name (e.g., person, cat, etc.) and confidence score. |
| List<FritzVisionObject> getVisionObjectsByClasses(List<String> labelNames) | Gets all objects matching any of the provided label names. |
| List<FritzVisionObject> getVisionObjectsByClasses(List<String> labelNames, float labelConfidence) | Gets all objects matching any of the provided label names and confidence scores. |
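To show what class-based filtering amounts to, here is a self-contained sketch. DetectionQueries and Detection are hypothetical stand-ins for the SDK types, and exact, case-sensitive label matching is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: mimics filtering detections by label names and confidence. */
final class DetectionQueries {
    /** A detection reduced to a label and a confidence in [0, 1]. */
    static final class Detection {
        final String label;
        final float confidence;

        Detection(String label, float confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** Keeps detections whose label is in labelNames and whose score is high enough. */
    static List<Detection> byClasses(List<Detection> all, List<String> labelNames,
                                     float minConfidence) {
        return all.stream()
                .filter(d -> labelNames.contains(d.label))
                .filter(d -> d.confidence >= minConfidence)
                .collect(Collectors.toList());
    }
}
```

Asking for "person" and "cat" with a minimum confidence of 0.5 would return a person scored 0.9 but drop a cat scored 0.4.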

No objects found from predict result

If you follow the steps above to implement your custom object detection model and you aren't seeing any results, consider lowering the confidence threshold for displaying predictions. By default, the Fritz SDK filters out predictions that models are not confident about. If your model is not very accurate, results may have low confidence scores. To decrease the score threshold, create a predictor with the following options:

// Create predictor options
FritzVisionObjectPredictorOptions options = new FritzVisionObjectPredictorOptions();
options.confidenceThreshold = 0.1f; // A low threshold
FritzVisionObjectPredictor predictor = FritzVision.ObjectDetection.getPredictor(onDeviceModel, options);

6. Displaying the result

Create a bitmap with the bounding boxes on the original image:

// Draw the original image that was passed into the predictor
FritzVisionObject visionObject = ...;
Bitmap boundingBoxOnImage = visionImage.overlayBoundingBox(visionObject);

Draw the bounding box on a canvas:

Canvas canvas = ...;
FritzVisionObject visionObject = ...;
// Draws the bounding box on the canvas.

7. Use the record method on the predictor to collect data

The FritzVisionObjectPredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

FritzVisionObjectResult predictedResults = predictor.predict(visionImage);

// Implement your own custom UX for users to annotate an image and store
// that as a FritzVisionObjectResult.
predictor.record(visionImage, predictedResults.toAnnotations(), modifiedResults.toAnnotations());

Advanced Options

You can configure the predictor with FritzVisionObjectPredictorOptions to return specific results that match the options given:

FritzVisionObjectPredictorOptions methods

| Option | Default | Description |
|--------|---------|-------------|
| confidenceThreshold | 0.6 | Return objects detected above the confidence threshold. |
| labels | MobileNet labels | A list of labels for the model. |
| iouThreshold | 0.2 | Intersection over union (IOU) threshold used to detect overlapping instances and dedupe detections. A higher value requires a larger overlapping area to filter out a detection. |
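To make the IOU threshold concrete, the sketch below computes intersection over union for two boxes and applies a greedy suppression pass. NmsSketch and Box are hypothetical names, and this is a generic non-maximum suppression outline, not necessarily the SDK's implementation. In practice the pass would be run separately for each class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: generic IOU and greedy non-maximum suppression. */
final class NmsSketch {
    static final class Box {
        final float left, top, right, bottom, score;

        Box(float left, float top, float right, float bottom, float score) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
            this.score = score;
        }

        float area() {
            return Math.max(0f, right - left) * Math.max(0f, bottom - top);
        }
    }

    /** Intersection area divided by union area, in [0, 1]. */
    static float iou(Box a, Box b) {
        float w = Math.max(0f, Math.min(a.right, b.right) - Math.max(a.left, b.left));
        float h = Math.max(0f, Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top));
        float intersection = w * h;
        float union = a.area() + b.area() - intersection;
        return union <= 0f ? 0f : intersection / union;
    }

    /** Keeps the best-scoring boxes, dropping any that overlap a kept box too much. */
    static List<Box> suppress(List<Box> boxes, float iouThreshold) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((Box b) -> b.score).reversed());
        List<Box> kept = new ArrayList<>();
        for (Box candidate : sorted) {
            boolean duplicate = false;
            for (Box k : kept) {
                if (iou(candidate, k) > iouThreshold) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Two 10 x 10 boxes offset by 5 pixels have an IOU of one third, so a threshold of 0.2 removes the lower-scoring box while a threshold of 0.5 keeps both.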

In order to change model performance for different devices, you may also expose the underlying TensorFlow Lite Interpreter options.

FritzVisionPredictorOptions methods

| Option | Default | Description |
|--------|---------|-------------|
| useGPU | false | Uses the GPU for running model inference. Please note, this is an experimental option and should not be used in production apps. |
| useNNAPI | false | Uses the NNAPI for running model inference. Please note, this is an experimental option and should not be used in production apps. |
| numThreads | 2 | For CPU only, runs model inference using the specified number of threads. |

For more details, please visit the Official TensorFlow Lite documentation.


// Create predictor options
FritzVisionObjectPredictorOptions options = new FritzVisionObjectPredictorOptions();
options.confidenceThreshold = 0.7f;
options.numThreads = 2;
// Pass in the options when initializing the predictor
FritzVisionObjectPredictor predictor = FritzVision.ObjectDetection.getPredictor(onDeviceModel, options);