Simple data extraction from PDF documents

We will show how simple it is to extract key data from a PDF file uploaded to KeyParse.AI.

By Colin Byrne / On Jul 18, 2024

Simple text extraction for PDF documents

We are going to walk through the simple steps to extract some data from an invoice file we have been sent.

In this case, we will use a manual template designed in KeyParse.AI.

Log in

I’ll assume you can sign in to the KeyParse.AI application; if not, you can sign up for a free 30-day trial.

Upload File

KeyParse.AI Upload file into Directory

We need to get the file into the system, so from the main page of the application we click the Uploads link under the Personal menu.

We drag a file onto the upload zone (where it says Drop files here to upload) and click Upload.

Once the file is uploaded and processed, the document appears in the file list below the Upload section.

Click the document name to open it.

Mark out the fields to extract

Now we can see the contents of the file that we would like to get data from.

KeyParse.AI Uploaded file contents

To mark out the fields, we can rely on a KeyParse.AI feature that lets us select the blocks of text we are interested in and link the label and value fields together.

Here we have clicked the Invoice Number block and are hovering over the related value, INV-3340. Clicking the value gives us two highlighted blocks.

Two highlighted blocks

Now click Convert to Field on the toolbar, followed by Link Fields.

newly linked fields

Now these two fields are linked. When parsing documents, we look for the field labelled Invoice Number and then for the value field to its right. This is fairly robust when the field moves up or down the page across different documents, and it also adapts if the value sits below the label instead.
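To make the "label first, then value to the right, else below" idea concrete, here is a minimal sketch of that lookup rule. It is not KeyParse.AI's implementation: the `Block` type, the `find_value` function, the coordinate convention (top-left origin, y growing downwards) and the sample page are all hypothetical, chosen only to illustrate the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical text block; (x, y) is its top-left corner, y grows downwards."""
    text: str
    x: float
    y: float
    width: float
    height: float


def _overlaps(a_start: float, a_len: float, b_start: float, b_len: float) -> bool:
    """True if intervals [a_start, a_start + a_len) and [b_start, b_start + b_len) overlap."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def find_value(blocks: list[Block], label: str) -> Optional[str]:
    """Return the value linked to `label`: nearest block to its right, else nearest below."""
    anchor = next((b for b in blocks if b.text.strip() == label), None)
    if anchor is None:
        return None

    # Same line: starts right of the label and overlaps it vertically.
    right = [
        b for b in blocks
        if b is not anchor
        and b.x >= anchor.x + anchor.width
        and _overlaps(b.y, b.height, anchor.y, anchor.height)
    ]
    if right:
        return min(right, key=lambda b: b.x).text

    # Fallback: starts below the label and overlaps it horizontally.
    below = [
        b for b in blocks
        if b is not anchor
        and b.y >= anchor.y + anchor.height
        and _overlaps(b.x, b.width, anchor.x, anchor.width)
    ]
    if below:
        return min(below, key=lambda b: b.y).text
    return None


# Made-up sample page: one value to the right of its label, one below.
page = [
    Block("Invoice Number", 50, 100, 90, 12),
    Block("INV-3340", 160, 101, 60, 12),
    Block("Date", 50, 130, 30, 12),
    Block("18/07/2024", 50, 145, 70, 12),
]
print(find_value(page, "Invoice Number"))  # INV-3340
print(find_value(page, "Date"))            # 18/07/2024
```

Because the search is relative to wherever the label is found, the same rule keeps working when the label and value shift up or down together between documents.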

We now continue to markup and link the remaining key-value pairs we care about.

newly linked fields

Once we have created the fields we want to parse, we click the Save button to save this template to the document.

Then click Export on the toolbar.

Export Button

This takes us to the Export Data page. If the page detects an inline template on the document, it selects Document Template as the Extract Template.

I could choose formats like CSV, XML or Excel, but in this case I’ll select JSON and click Extract.

Select Json and Extract

Now we can see the extracted data, which can be copied to the clipboard via the Copy to ClipBoard link or downloaded via the Download button.

Key information in Json format
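Once the JSON is copied or downloaded, it can be consumed by any script. The post does not spell out the exact structure of the export, so this minimal sketch assumes a flat JSON object mapping each label to its value; the `load_fields` helper, the file name `invoice.json` and the sample values are hypothetical and only for illustration.

```python
import json


def load_fields(source: str) -> dict[str, str]:
    """Parse exported JSON text into a label -> value dict.

    ASSUMPTION: the export is a flat JSON object of label/value pairs.
    Adjust this if your actual export is nested or uses different keys.
    """
    data = json.loads(source)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {str(key): str(value) for key, value in data.items()}


# Text as pasted from the clipboard (made-up sample data, not real output).
copied = '{"Invoice Number": "INV-3340", "Total": "1250.00"}'
fields = load_fields(copied)
print(fields["Invoice Number"])  # INV-3340

# A downloaded file works the same way (hypothetical file name):
# with open("invoice.json", encoding="utf-8") as f:
#     fields = load_fields(f.read())
```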

Manual templates are fast and simple to create.

You can reuse inline document templates by saving them as reusable templates from the Upload file list.

Save inline template as reusable template

And of course, you can drive this process via Power Automate, which we will look at in the next post.
