Simple text extraction for PDF documents
We are going to walk through the steps to extract data from an invoice file we have been sent.
In this case we will use a manual template designed in KeyParse.AI.
Log in
I’ll assume you can sign in to the KeyParse.AI application. If not, you can sign up for a free 30-day trial.
Upload File

We need to get the file into the system, so from the main page of the application we click the Uploads link under the Personal menu.
We drag a file onto the upload zone (where it says Drop files here to upload) and click Upload.
Once the file has been uploaded and processed, you will see the document appear in the file list below the Upload section.
Click the document name to open it.
Mark out the fields to extract
Now we can see the contents of the file that we would like to get data from.

To mark out the fields we can rely on a feature in KeyParse.AI that allows us to select the blocks of text we are interested in and also link the label and value fields together.
Here we have clicked the Invoice Number block and are hovering over the related value INV-3340. We click the value and that gives us two highlighted blocks.

Now click Convert to Field on the toolbar, followed by Link Fields.

These two fields are now linked together. When parsing documents, KeyParse.AI will look for the field with the label Invoice Number and then for the data field to its right. This is fairly robust if the field moves up or down between documents, and it will also adapt if the value sits below the label.
We now continue to mark up and link the remaining key-value pairs we care about.
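To make the idea of a linked label and value more concrete, here is a minimal Python sketch of a label-anchored lookup. This is not KeyParse.AI's actual implementation: the Block type, the coordinates, the tolerance value and the find_value function are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """A block of text and the top-left corner of its bounding box (hypothetical model)."""
    text: str
    x: float
    y: float  # y grows downward, as on a page


def find_value(blocks: list[Block], label: str, tolerance: float = 5.0) -> Optional[str]:
    """Return the text of the block to the right of the label, else the block below it."""
    anchor = next((b for b in blocks if b.text.strip().lower() == label.lower()), None)
    if anchor is None:
        return None

    # Preferred: the nearest block on the same row, to the right of the label.
    right = [b for b in blocks if b.x > anchor.x and abs(b.y - anchor.y) <= tolerance]
    if right:
        return min(right, key=lambda b: b.x - anchor.x).text

    # Fallback: the nearest block in the same column, below the label.
    below = [b for b in blocks if b.y > anchor.y and abs(b.x - anchor.x) <= tolerance]
    if below:
        return min(below, key=lambda b: b.y - anchor.y).text

    return None


# The label sits at a different position in each document (made-up coordinates).
doc_a = [Block("Invoice Number", 50, 100), Block("INV-3340", 200, 100)]
doc_b = [Block("Invoice Number", 50, 240), Block("INV-3340", 200, 240)]
doc_c = [Block("Invoice Number", 50, 100), Block("INV-3340", 50, 120)]  # value below

for doc in (doc_a, doc_b, doc_c):
    print(find_value(doc, "Invoice Number"))
```

Because the lookup is anchored on the label text and not on a fixed page position, the value is still found when the pair moves up or down the page, or when the value appears under the label.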

Now that we have created the fields we want to parse, we click the Save button to save this template to the document.
Then click Export on the toolbar.

This takes us to the Export Data page. If this page detects an inline template on the document, it will select Document Template as the Extract Template.
I could choose formats like CSV, XML or Excel but in this case I’ll select JSON and click Extract.

Now we can see the extracted data, which can be copied to the clipboard via the Copy to ClipBoard link or downloaded via the Download button.
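Once you have the JSON, you can read it from a script. The short Python sketch below assumes the export is a flat JSON object keyed by field label. That shape is a guess for illustration only, so check your own export and adjust the code to match. The sample string and the read_fields helper are hypothetical. In practice you would pass in the contents of the downloaded file or the text copied from the clipboard.

```python
import json

# Hypothetical sample: the real export may nest or name fields differently.
sample = '{"Invoice Number": "INV-3340"}'


def read_fields(json_text: str) -> dict[str, str]:
    """Parse exported JSON and return a flat mapping of field label to value as text."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of field/value pairs")
    return {str(key): str(value) for key, value in data.items()}


fields = read_fields(sample)
print(fields["Invoice Number"])
```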

Manual templates are fast and simple to create.
You can reuse inline document templates by saving them as reusable templates from the Upload file list.

And of course you can drive this process via Power Automate, which we will look at in the next post.