AWS enhances scanners with 'Textract' cloud OCR

Amazon Web Services is angling to give document management specialists a run for their money with a new optical character recognition (OCR) solution.

It’s called Amazon Textract, a service that automatically extracts text and data from scanned documents. As is to be expected, AWS reckons it’s much more than a simple OCR solution.

Conventional OCR software sometimes requires users to customise rules or workflows for each document or form they scan, to account for different document types and formats.

AWS has whipped out its machine learning prowess to overcome those manual processes, using ML to instantly read documents and extract text and data without any manual effort or coding knowledge. The company even reckons Textract can automatically detect the layout and key elements on a page, such as tables, and understand how that data relates to the rest of the document.
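For readers curious what that "relationship" data might look like in practice, here is a minimal Python sketch that walks a Textract-style response and pairs form labels with their values. It assumes the response is a list of "Blocks" linked by "Relationships", as in the service's AnalyzeDocument output; the sample response is hand-written for illustration and is not real service output, and the helper names `key_value_pairs` and `_text_of` are hypothetical.

```python
from typing import Dict, List


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into a single string."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response: dict) -> Dict[str, str]:
    """Map each detected form label (KEY) to the text of its VALUE block."""
    by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by Id.
                for value_id in rel["Ids"]:
                    value_parts.append(_text_of(by_id[value_id], by_id))
        pairs[_text_of(block, by_id)] = " ".join(value_parts)
    return pairs


# Hand-written illustrative sample (assumption), not real Textract output.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
        {"Id": "w2", "BlockType": "WORD", "Text": "number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "10442"},
    ]
}

print(key_value_pairs(SAMPLE_RESPONSE))
```

With AWS credentials in place, a real response could be fetched with boto3's `analyze_document` call on a Textract client, passing `FeatureTypes=["FORMS"]`, and handed to the same function. Regional availability and account setup are outside the scope of this sketch.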

Users can also create smart search indexes, build automated approval workflows and maintain compliance with document archival standards by flagging data that requires redaction.

Of course, AWS also touted Textract's “very low cost” and speed, saying it can process “millions of document pages in hours.” There’s no public pricing for Textract yet, but AWS said customers will pay only for what they use, with no upfront commitment or long-term contract.

Textract went live in a handful of AWS regions in the US and Europe in late May 2019. There's no word on when it will debut in other regions, where local users would see lower latency, but it's not as if that's ever stopped disruptors from putting the cloud to work.

CRN therefore imagines that document-management-centric partners would do well to consider how AWS' latest offering might affect their businesses, and the vendor partners that make them possible.