Imagine you have a large infrastructure of many kinds of devices that you need to maintain regularly or make sure they pose no danger to their surroundings.
One way to achieve this is to regularly send people to every location to check that everything is fine. This is doable, but it is expensive in both time and resources. And if the infrastructure is large enough, you might not be able to cover all of it within a year.
Another way is to automate that process and let jobs in the cloud do the verification for you. For that to happen, you will need the following:
👉 A quick process for obtaining pictures of the devices. This can still be done by people, since taking a picture is much faster than performing the full device verification process. It can also be done with photos taken from cars or even drones, which makes picture gathering much faster and more automated.
👉 Then you need to send all the obtained pictures to one dedicated place in the cloud.
👉 In the cloud, you need an automated job to pick up the pictures and process them through machine-learning models trained to recognize device damage or anomalies.
👉 Finally, the results must be visible to the relevant users so that repairs can be scheduled for devices with problems.
Let’s look at how we can achieve anomaly detection from pictures in the AWS cloud. Amazon offers a few pre-built machine-learning models we can use for that purpose.
How to Create a Model for Visual Anomaly Detection
To create a model for visual anomaly detection, you will need to follow several steps:
Step 1: Clearly define the problem you want to solve and the types of anomalies you want to detect. This will help you determine the appropriate training dataset that you will need to train the model.
Step 2: Collect a large dataset of images representing normal and anomalous conditions. Label the images to indicate which are normal and which contain anomalies.
Step 3: Choose a model architecture that is suitable for the task. This may involve selecting a pre-trained model and fine-tuning it for your specific use case or creating a custom model from scratch.
Step 4: Train the model using the prepared dataset and the selected algorithm. This may involve using transfer learning to leverage pre-trained models, or training the model from scratch using techniques such as convolutional neural networks (CNNs).
How to Train a Machine Learning Model

The process of training AWS machine learning models for visual anomaly detection typically involves several important steps.
#1. Collect the Data
In the beginning, you need to collect and label a large dataset of images that represent both normal and anomalous conditions. The larger the dataset, the better and more precise the trained model can be. But a larger dataset also means much more time dedicated to training.
Usually, you want around 1,000 pictures in the training set for a good start.
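As a minimal sketch of this step, here is one way to split a collected, labeled picture set into training and test portions. The 80/20 ratio and the fixed seed are common conventions, not AWS requirements:

```python
import random

def split_dataset(image_paths, train_ratio=0.8, seed=42):
    """Deterministically shuffle the image list, then split it
    into a training portion and a held-out test portion."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]
```

Keeping a portion of the labeled images out of training is what later lets you measure whether the model generalizes to pictures it has never seen.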
#2. Prepare the Data
The picture data must first be pre-processed so that the machine-learning models can pick it up. Pre-processing can mean various things, such as:
- Sorting the input pictures into separate subfolders, correcting metadata, etc.
- Resizing the pictures to meet the resolution requirements of the model.
- Splitting them into smaller chunks of pictures for more effective, parallel processing.
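For illustration, a minimal pre-processing sketch covering the cleaning and chunking steps above. The allowed extensions and chunk size are assumptions; actual resizing would typically be done with an imaging library such as Pillow:

```python
from pathlib import Path

# Assumed set of formats the downstream model accepts.
ALLOWED = {".jpg", ".jpeg", ".png"}

def clean_and_chunk(paths, chunk_size=100):
    """Keep only supported image files and split them into
    fixed-size chunks that can be processed in parallel."""
    images = sorted(p for p in paths if Path(p).suffix.lower() in ALLOWED)
    return [images[i:i + chunk_size] for i in range(0, len(images), chunk_size)]
```

Each chunk can then be handed to a separate worker (for example, one Lambda invocation per chunk), which is what makes the parallel processing effective.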
#3. Select the Model
Now pick the right model for the job. Either choose a pre-trained model, or create a custom model suitable for your visual anomaly detection use case.
#4. Evaluate the Results
Once the model processes your dataset, you should validate its performance and check whether the results are satisfactory for your needs. This can mean, for example, that the results are correct on more than 99% of the input data.
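The "99% correct" check can be made concrete with standard metrics. A minimal sketch for binary anomaly labels, where True means anomalous:

```python
def evaluate(predictions, ground_truth):
    """Compute accuracy, precision, and recall for binary
    anomaly labels (True = anomalous)."""
    tp = sum(p and g for p, g in zip(predictions, ground_truth))
    fp = sum(p and not g for p, g in zip(predictions, ground_truth))
    fn = sum(not p and g for p, g in zip(predictions, ground_truth))
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    n = len(ground_truth)
    return {
        "accuracy": correct / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Precision and recall matter here because anomalies are rare: a model that labels everything "normal" can still score high accuracy while missing every defect.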
#5. Deploy the Model
If you are satisfied with the results and performance, deploy the model with a specific version into the AWS account environment so the processes and services can start using it.
#6. Monitor and Improve
Let it run through various test jobs and picture datasets, and constantly evaluate whether the required parameters for detection correctness are still in place.
If not, retrain the model, this time including the new datasets on which the model delivered wrong results.
AWS Machine Learning Models
Now, let’s look at some concrete models you can leverage in the Amazon cloud.
AWS Rekognition

Rekognition is a general-purpose image and video analysis service usable for various use cases, such as face recognition, object detection, and text recognition. Most of the time, you will use Rekognition for an initial, raw generation of detection results to form a data lake of identified anomalies.
It provides a range of pre-built models you can use without training. Rekognition also delivers real-time analysis of images and videos with high accuracy and low latency.
Here are some typical use cases where Rekognition is a good choice for anomaly detection:
- You have a general-purpose use case, such as detecting anomalies in images or videos.
- You need to perform real-time anomaly detection.
- You want to integrate your anomaly detection with AWS services like Amazon S3, Amazon Kinesis, or AWS Lambda.
And here are some concrete examples of anomalies you can detect using Rekognition:
- Anomalies in faces, such as detecting facial expressions or emotions outside the normal range.
- Missing or misplaced objects in a scene.
- Misspelled words or unusual patterns of text.
- Unusual lighting conditions or unexpected objects in a scene.
- Inappropriate or offensive content in images or videos.
- Sudden changes in movement or unexpected patterns of motion.
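As a sketch, this is roughly how a Rekognition label-detection call looks from Python with boto3, combined with a helper that flags labels you consider anomalous. The bucket, key, and watch list are hypothetical, and the parsing helper is separate so it works on any `detect_labels` response:

```python
def flag_labels(response, watch_for, min_conf=80.0):
    """Return labels from a Rekognition detect_labels response that
    are on the watch list and above the confidence threshold (0-100)."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Name"] in watch_for and label["Confidence"] >= min_conf
    ]

def detect_anomalous_labels(bucket, key, watch_for, min_conf=80.0):
    """Call Rekognition on an image already stored in S3."""
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_conf,
    )
    return flag_labels(response, watch_for, min_conf)
```

Which label names actually appear ("Rust", "Crack", etc.) depends on what Rekognition detects in your pictures, so the watch list has to be tuned against real results.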
AWS Lookout for Vision

Lookout for Vision is a model specifically designed for anomaly detection in industrial processes, such as manufacturing and production lines. It typically requires some custom pre-processing and post-processing code around the picture, or around some concrete cut-out of the picture, usually written in Python. Most of the time, it specializes in very specific problems in the picture.
It requires custom training on a dataset of normal and anomalous images to create a custom anomaly-detection model. It’s not real-time focused; rather, it is designed for batch processing of images, with the focus on accuracy and precision.
Here are some typical use cases where Lookout for Vision is a good choice:
- You need to detect defects in manufactured products or identify equipment failures in a production line.
- You have a large dataset of images or other data to process.
- You need to detect anomalies in an industrial process.
- You want to integrate anomaly detection with other AWS services, such as Amazon S3 or AWS IoT.
And here are some concrete examples of anomalies that you can detect using Lookout for Vision:
- Defects in manufactured products, such as scratches, dents, or other imperfections that may affect the quality of the product.
- Equipment failures in a production line, such as broken or malfunctioning machinery that may cause delays or safety hazards.
- Quality control issues in a production line, such as products that do not meet the required specifications or tolerances.
- Safety hazards in a production line, such as objects or materials that may pose a risk to workers or equipment.
- Anomalies in a production process, such as unexpected changes in the flow of materials or products through the production line.
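A rough sketch of invoking a trained Lookout for Vision model with boto3. The project name and model version are placeholders; the `Confidence` value in `DetectAnomalyResult` is on a 0–1 scale:

```python
def call_detect(project, image_path, model_version="1"):
    """Send one image to a trained Lookout for Vision model and
    return its DetectAnomalyResult."""
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    client = boto3.client("lookoutvision")
    with open(image_path, "rb") as f:
        response = client.detect_anomalies(
            ProjectName=project,
            ModelVersion=model_version,
            Body=f.read(),
            ContentType="image/jpeg",
        )
    return response["DetectAnomalyResult"]

def is_defective(result, min_conf=0.9):
    """Count a result as a defect only when the model marks it
    anomalous AND is confident enough (threshold is an assumption)."""
    return result["IsAnomalous"] and result["Confidence"] >= min_conf
```

Note that a deployed Lookout for Vision model is billed per running hour, so batch jobs typically start the model, process a chunk of images, and stop it again.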
AWS SageMaker

SageMaker is a fully managed platform for building, training, and deploying custom machine-learning models.
It’s a much more robust solution. In fact, it provides a way to connect and execute several multistep processes in one chain of jobs running one after another, much like AWS Step Functions does.
But since SageMaker uses ad-hoc EC2 instances for its processing, there is no 15-minute limit on a single job, as there is for AWS Lambda functions inside AWS Step Functions.
You can also do automatic model tuning with SageMaker, which definitely makes it a stand-out option. Finally, SageMaker can effortlessly deploy the model into a production environment.
Here are some typical use cases where SageMaker is a good choice for anomaly detection:
- You have a specific use case not covered by pre-built models or APIs and need to build a custom model tailor-fit to your specific needs.
- You have a large dataset of images or other data. Pre-built models require some pre-processing in such cases, but SageMaker can work without it.
- You need to perform real-time anomaly detection.
- You need to integrate your model with other AWS services, such as Amazon S3, Amazon Kinesis, or AWS Lambda.
And here are some typical anomaly detection tasks that SageMaker can perform:
- Fraud detection in financial transactions, for example, unusual spending patterns or transactions outside of the normal range.
- Cybersecurity in network traffic, like unusual patterns of data transfer or unexpected connections to external servers.
- Medical diagnosis in medical images, such as detecting tumors.
- Anomalies in equipment performance, such as detecting changes in vibration or temperature.
- Quality control in manufacturing processes, such as detecting defects in products or identifying deviations from the expected quality standards.
- Unusual patterns of energy usage.
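Once a custom model is deployed behind a SageMaker endpoint, calling it from Python is a short boto3 invocation. The endpoint name, content type, and JSON response shape below are assumptions about your deployment, and the score-splitting helper is a generic post-processing sketch:

```python
import json

def score_image(endpoint_name, image_bytes):
    """Send one image to a deployed SageMaker endpoint and parse a
    JSON response (endpoint name and response shape are assumptions)."""
    import boto3  # assumes boto3 is installed and AWS credentials are configured
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Body=image_bytes,
    )
    return json.loads(response["Body"].read())

def split_by_score(scores, threshold=0.5):
    """Split {image_key: anomaly_score} into normal and anomalous groups."""
    anomalous = {k: s for k, s in scores.items() if s >= threshold}
    normal = {k: s for k, s in scores.items() if s < threshold}
    return normal, anomalous
```

The same pattern works for the non-image use cases above (transactions, network traffic, sensor readings); only the content type and payload change.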
How to Incorporate the Models Into Serverless Architecture

A trained machine-learning model is a cloud service that does not need any cluster of servers running in the background; thus, it can easily be included in an existing serverless architecture.
Automation is done via AWS Lambda functions connected into a multistep job inside the AWS Step Functions service.
Typically, you need initial detection right after collecting the pictures and pre-processing them on the S3 bucket. That’s where you generate atomic anomaly detection results for the input pictures and save them into a data lake, represented, for example, by an Athena database.
In some cases, this initial detection is not enough for your concrete use case. You might need another, more detailed detection. For example, the initial (e.g., Rekognition) model can detect some problem on the device, but it may not be possible to reliably identify what kind of problem it is.
For that, you might need another model with different capabilities. In such a case, you can run the other model (e.g., Lookout for Vision) on the subset of pictures where the initial model identified the problem.
This is also a good way to save some costs, as you don’t need to run the second model on a whole set of pictures. Instead, you run it only on the meaningful subset.
AWS Lambda functions will cover all such processing, using Python or JavaScript code inside. How many AWS Lambda functions you need to include in a flow depends only on the nature of the processes. The 15-minute limit on the maximum duration of a single AWS Lambda call will determine how many steps such a process needs to contain.
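A minimal sketch of the routing step described above, written as a Lambda handler. The label names and the event shape are assumptions; in a real flow, a Step Functions Choice state would branch on the returned flag to run the second model only on flagged pictures:

```python
# Hypothetical watch list of first-pass labels that justify a
# second, specialized (e.g., Lookout for Vision) scan.
SUSPICIOUS = frozenset({"Rust", "Crack", "Corrosion"})

def needs_detailed_scan(first_pass_labels, suspicious=SUSPICIOUS):
    """True when the first-pass (e.g., Rekognition) labels suggest
    the picture should also go through the second model."""
    return bool(set(first_pass_labels) & suspicious)

def handler(event, context):
    # A Step Functions state passes the first-pass labels in the event.
    labels = event.get("labels", [])
    return {
        "image": event.get("image"),
        "needsDetailedScan": needs_detailed_scan(labels),
    }
```

Because only the flagged subset reaches the second model, this routing step is exactly where the cost savings mentioned above are realized.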
Final Words
Working with cloud machine learning models is a very interesting job. If you look at it from the perspective of skills and technologies, you will find out you need to have a team with a large variety of skills.
The team needs to understand how to train a model, be it pre-built or created from scratch. This means a lot of mathematics or algebra is involved in balancing the reliability and performance of the results.
You also need some advanced Python or JavaScript coding skills, plus database and SQL skills. And after all the content work is done, you need DevOps skills to plug it into a pipeline that makes it an automated job ready for deployment and execution.
Defining the anomaly and training the model is one thing. But it’s a challenge to integrate it all into one functional solution that can process the models’ results and save the data in an effective, automated way for serving them to the end users.