The Web is full of malicious pages. Unfortunately, these can exist on your client/vendor sites as well.
No business today runs without some integration that feeds off, or provides inputs to, a client's or vendor's website. Of course, your business can't exist without these services, but they can also become a threat. The external sites you interact with can carry malicious content (whether planted deliberately or added by a third party after a compromise), and if that content finds its way into your systems, the consequences can be disastrous.
Thankfully, it's possible to scan URLs quickly and easily through APIs. You can scan not just web pages, but also files that are provided to you for download. Let's look at some of the API tools that help you do this. And since these are APIs, your developers' efforts will be served much better if you ask them to build a website scanner tool on top of them.
Google Web Risk
It’s no surprise that a web page checker would come from the company that practically owns the Internet (all its web pages, I mean). Google Web Risk is pretty straightforward.
Using the API is super-easy as well. To check a single page using the command line, simply send a request as follows:
curl -H "Content-Type: application/json" "https://webrisk.googleapis.com/v1beta1/uris:search?key=YOUR_API_KEY&threatTypes=MALWARE&uri=http%3A%2F%2Ftestsafebrowsing.appspot.com%2Fs%2Fmalware.html"
If the request is successful, the API responds with the type of threat found on the page:
{
  "threat": {
    "threatTypes": [
      "MALWARE"
    ],
    "expireTime": "2019-07-17T15:01:23.045123456Z"
  }
}
As you can see, the API confirms that the page is known to contain malware.
Do note that the Google Web Risk API doesn't perform on-demand diagnostics on a URL or file of your choice. It consults a blacklist maintained by Google based on its search findings and reports, and tells you whether the URL is on that blacklist. In other words, if this API doesn't flag a URL, it's most likely safe, but there are no guarantees.
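For comparison, a lookup of a URL that isn't on the blacklist comes back with an empty JSON object, so a missing threat field is effectively your all-clear signal (example.com below is just a stand-in for any clean URL, and the empty-response behaviour is worth confirming against the current docs):
curl -H "Content-Type: application/json" "https://webrisk.googleapis.com/v1beta1/uris:search?key=YOUR_API_KEY&threatTypes=MALWARE&uri=https%3A%2F%2Fexample.com%2F"

{}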
VirusTotal
VirusTotal is another cool service that you can use to scan not just URLs, but also individual files (in that sense, I put it above Google Web Risk in terms of usefulness). If you’re itching to try out the service, just head over to the website, and right on the home page, there’s an option to get going.
While VirusTotal is available as a free platform built and curated by a vibrant community, it does offer a commercial version of its API. Here’s why you’d want to pay for the premium service:
- Flexible request rate and daily quota (as opposed to the mere four requests per minute for the public API)
- The submitted resource gets scanned by VirusTotal's own antivirus engines, and additional diagnostic information is returned.
- Behaviour-based information about files you submit (the files will be placed in different sandboxed environments to monitor suspicious activities)
- Query the VirusTotal files database for various parameters (complex queries are supported)
- Strict SLA and response times (files submitted to VirusTotal via the public API get queued up and take a considerable time for analysis)
If you go for the private VirusTotal API, it can be one of the best investments you make in a SaaS product for your enterprise.
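To get a feel for the workflow, here's a minimal sketch using the legacy v2 public API: submit a URL for scanning, then fetch the report. YOUR_API_KEY is a placeholder, VirusTotal also offers a newer v3 API, and the premium tier layers extra endpoints and higher quotas on top of the same pattern:
# Submit a URL for scanning
curl --request POST --url "https://www.virustotal.com/vtapi/v2/url/scan" --data "apikey=YOUR_API_KEY" --data "url=https://example.com/"

# Retrieve the scan report for the same URL
curl --request GET --url "https://www.virustotal.com/vtapi/v2/url/report?apikey=YOUR_API_KEY&resource=https://example.com/"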
Scanii
Another recommendation for security-scanner APIs is Scanii. It’s a simple REST API that can scan submitted documents/files for the presence of threats. Think of it as an on-demand virus scanner that can be run and scaled effortlessly!
Here are the goodies Scanii offers:
- Able to detect malware, phishing scripts, spam content, NSFW (Not Safe For Work) content, etc.
- It is built on Amazon S3 for easy scaling and zero-risk file storage.
- Detect offensive, unsafe, or potentially dangerous text in over 23 languages.
- A simple, no-frills, focused approach to API-based file scanning (in other words, no unnecessarily “helpful” features)
The really good thing is that Scanii is a meta engine; that is, it doesn't perform scans on its own but uses a set of underlying engines to do the legwork. It's a great asset, as you aren't tied to a particular security engine, which means no need to worry about breaking API changes and whatnot.
I see Scanii as a massive boon to platforms that depend on user-generated content. Another use case is that of scanning files generated by a vendor service that you cannot trust 100%.
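If you'd like to see how a submission might look, here's a rough sketch of uploading a file for analysis over HTTP basic auth; treat the exact path and version below as assumptions and verify them against Scanii's API documentation:
# Submit a file for analysis (API key and secret sent via basic auth; endpoint assumed, check the docs)
curl --request POST --user "API_KEY:API_SECRET" --form "file=@user-upload.pdf" "https://api.scanii.com/v2.1/files"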
Metadefender
For some organizations, scanning files and web pages at a single endpoint is not enough. They have a complex information flow, and none of the endpoints can be compromised. For those use cases, Metadefender is the ideal solution.
Think of Metadefender as a paranoid gatekeeper that sits between your core data assets and everything else, including the network. I say “paranoid” because that’s the design philosophy behind Metadefender. I can’t describe this better than them, so here goes:
Most cyber security solutions rely upon detection as their core protective function. MetaDefender data sanitization does not rely on detection. It assumes all files could be infected and rebuilds their content using a secure and efficient process. It supports more than 30 file types, and outputs safe and usable files. Data sanitization is extremely effective in preventing targeted attacks, ransomware, and other types of known and unknown malware threats.
There are some neat features that Metadefender offers:
- Data Loss Prevention: In simple terms, this is the ability to detect and mask sensitive information found inside file contents. For example, a PDF receipt with a visible credit card number will be obfuscated by Metadefender.
- Deploy locally or in the cloud (depending on how paranoid you are!).
- Look right through 30+ types of archiving formats (zip, tar, rar, etc.), and 4,500 file type spoofing tricks.
- Multi-channel deployments — secure just files, or go ham with email, network and login control.
- Custom workflows to apply different types of scanning pipelines based on custom rules.
Metadefender includes 30+ engines but abstracts them away nicely, so you never have to think about them. If you're a medium- to large-sized enterprise that just can't afford security nightmares, Metadefender is a great option.
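If you want to try the cloud API before committing to a deployment, a multi-engine file scan might look roughly like this; the /v4/file path and apikey header reflect my understanding of MetaDefender Cloud, so confirm them against OPSWAT's documentation:
# Upload a file for multi-engine scanning (endpoint and header assumed; verify with OPSWAT docs)
curl --request POST --header "apikey: YOUR_API_KEY" --header "Content-Type: application/octet-stream" --data-binary "@suspicious-download.bin" "https://api.metadefender.com/v4/file"

# Poll for the verdict using the data_id returned by the upload call
curl --header "apikey: YOUR_API_KEY" "https://api.metadefender.com/v4/file/DATA_ID_FROM_UPLOAD_RESPONSE"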
Urlscan.io
If you’re mostly dealing with web pages and have always wanted a more in-depth look into what they’re doing behind the scenes, Urlscan.io is an excellent weapon in your arsenal.
The amount of information Urlscan.io dumps out is nothing short of impressive. Among other things, you get to see:
- The total number of IP addresses contacted by the page.
- List of geographies and domains the page sent information to.
- Technologies used on the front end and back end of the site (no accuracy claims are made, but it's alarmingly accurate!).
- Domain and SSL certificate information
- Detailed HTTP interactions along with request payload, server names, response times, and much more.
- Hidden redirects and failed requests
- Outgoing links
- JavaScript analysis (global variables used in the scripts, etc.)
- DOM tree analysis, forms content, and more.
The API is simple and straightforward, allowing you to submit a URL for scanning as well as check the scan history of that URL (that is, scans performed by others). All in all, Urlscan.io provides a wealth of information for any concerned business or individual.
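For reference, a scan submission and result lookup look like this (the UUID you need for the second call comes back in the submission response):
# Submit a URL for scanning
curl --request POST "https://urlscan.io/api/v1/scan/" --header "API-Key: YOUR_API_KEY" --header "Content-Type: application/json" --data '{"url": "https://example.com/", "visibility": "public"}'

# Fetch the finished scan report once processing completes
curl "https://urlscan.io/api/v1/result/SCAN_UUID/"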
SUCURI
SUCURI is a well-known platform when it comes to online scanning of websites for threats and malware. What you may not know is that they have a REST API as well, allowing the same power to be harnessed programmatically.
There isn’t much to talk about here, except that the API is simple and works well. Of course, Sucuri isn’t limited to a scanning API, so while you’re at it, I’d recommend you check out some of its powerful features like server-side scanning (basically, you provide the FTP credentials, and it logs in and scans all the files for threats!).
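As an illustration, the public SiteCheck scanner can be queried with a single request; the endpoint and parameter below are my assumption of how the SiteCheck API is exposed, so double-check them against Sucuri's current documentation (the paid platform's authenticated REST API uses its own key-based endpoints):
# Request a JSON SiteCheck report for a domain (endpoint assumed; verify before relying on it)
curl "https://sitecheck.sucuri.net/api/v3/?scan=example.com"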
Quttera
Our last entry in this list is Quttera, which offers something slightly different. Besides scanning domains and submitted pages on demand, Quttera can perform continuous monitoring, helping you guard against zero-day vulnerabilities.
The REST API is simple and powerful and can return formats beyond JSON (XML and YAML, for example). Scans fully support multithreading and concurrency, allowing you to run multiple exhaustive scans in parallel. Since the service runs in real time, it's invaluable to companies with mission-critical offerings where downtime means demise.
Can we not scan websites for malicious pages manually?
It might seem that a competent developer should be able to scan pages for vulnerabilities. Unfortunately, this is not even close to reality for many reasons:
- Developers do not specialize in detection/security. Their expertise is in building complex software by putting together many smaller sub-systems; in other words, they simply do not have the skillset.
- Even if you were to come across a developer talented enough, the task would simply be too much. A typical, feature-rich web page contains thousands of lines of code; stitching them all together to work out the bigger picture, as well as spot the tiny loopholes, is nothing short of a nightmare. You might as well command someone to eat an entire elephant for lunch!
- To reduce page load times, websites often compress and minify their CSS and JavaScript files. This results in such a soupy mess of code that it's practically impossible to read.
Even when minification preserves the variable names to a large extent, the result is barely readable. Take the minified source of jQuery, which anyone can host on their own website and tamper with: the entire library is crammed into a couple of enormous lines. Not to mention that the full source runs close to 5,000 lines of code. 😎
And that's just a single script we're talking about. A web page typically has 5-15 scripts attached, and you're likely dealing with 10-20 web pages in total. Imagine having to do this every day... or worse, a few times a day!
Conclusion
Security scanning APIs like the ones covered in this article are simply an extra line of defense (or caution, if you will). Just like an antivirus, there's a lot they can do, but they can never provide a fail-proof scan. That's simply because, to a computer, a program written with malicious intent looks the same as one written for positive impact: both ask for system resources and make network requests, and the devil lies in the context, which computers can't reliably work out.
That said, these APIs do provide a robust security cover that is desirable in most cases — both for external websites and your own!