Like many other employees, my colleagues and I used to do our expenses manually. Luckily, we have K2, so we automated the process: we created an Expense Claim App with a workflow in the background. However, submitting a request still takes a lot of time. That is why one of my colleagues challenged me to add OCR functionality. I accepted the challenge, and this article is a quick guide on how to integrate a K2 Expense Claim App with ABBYY OCR. As a result, you will get a tool that can scan your receipts and auto-fill the rest of your fields with the recognized values. Before I proceed any further, here is the final result, showing how it all works on a mobile device.
The architecture of the whole integration looks pretty much like this:
Now let's look at each component in detail.
1. OCR SDK Web API
I spent some time searching the web and testing different OCR services that provide similar functionality. Some of them were 'K2 integration ready' but returned poor results. Almost all of them could recognize the image, but parsing it and returning a structured set of data was a real challenge for them. Eventually, I decided to proceed with the ABBYY OCR SDK, because it demonstrated the best results in recognizing and processing the images, including German receipts. Frankly speaking, I was amazed at how accurate the results were.
The documentation for the OCR SDK is detailed and contains code examples for all major languages and platforms. You can easily create a trial account and try the functionalities of the ABBYY OCR yourself.
I concentrated on the processReceipt method, which is the one I used in my integration. To get the OCR results, I have to make at least 3 calls:
a) POST /processReceipt
This call uploads the image, creates a processing task for it with the specified parameters, and queues the task for processing. It returns the taskId value, which can be used in the next call to check the status and/or get the result.
b) GET /getTaskStatus
This call returns the current status of the task and the URL of the result of processing for completed tasks. If the task is not completed by the time of the call, you will have to wait for at least 2 seconds and make the call again.
c) GET resultUrl
This call returns a detailed response about your receipt. The resultUrl is obtained from the /getTaskStatus method once the OCR API has finished processing the image.
All the methods that have to be called look really simple. However, there is a major drawback that prevents K2 integration without additional development: all the methods return data in XML format. Therefore, I created a middle layer with the help of Node.js and Express.
2. Middle Layer: REST API
The REST API works as a simple middle layer between the K2 Expense Claim App and ABBYY OCR. It does not store or cache anything. Its core function is to convert data and to make the 3 OCR API calls in one go. The source code of the Web API can be found on my GitHub, in the k2-expenses-receipts-ocr repository. I am not going to describe every piece of code in detail here. Please check my GitHub repo, and if something is not clear, contact me directly.
These are the main points I had to implement in the middle layer so that everything works smoothly:
Convert ‘multipart/form-data’ into the MemoryStorage (i.e. a Buffer object)
When the REST ServiceBroker sends a file, the content type is 'multipart/form-data'. This is not something the ABBYY OCR API can work with. Therefore, I had to convert it, and I did this with the Multer middleware for Express.
Convert XML response to JSON
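As a rough illustration of this step, the sketch below pulls the attributes of a `task` element out of an XML status response and returns them as a plain object. The sample XML shape, the made-up id and URL, and the helper names `decodeEntities` and `taskXmlToJson` are all assumptions for illustration; the actual middle layer may well rely on a full XML parser library instead of regular expressions.

```javascript
// Decodes the five predefined XML entities (ampersand last, so that
// "&amp;lt;" does not get decoded twice).
const decodeEntities = (text) =>
  text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');

// Hypothetical helper: converts the attributes of the first <task ...>
// element into a plain object. A regex is enough for this flat shape only;
// arbitrary XML needs a real parser.
function taskXmlToJson(xml) {
  const taskMatch = /<task\b([^>]*)>/.exec(xml);
  if (!taskMatch) {
    throw new Error('No <task> element found in the response');
  }
  const attributes = {};
  const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;
  let match;
  while ((match = attrPattern.exec(taskMatch[1])) !== null) {
    attributes[match[1]] = decodeEntities(match[2]);
  }
  return attributes;
}

// Assumed sample response, not captured from the real service.
const sampleXml =
  '<?xml version="1.0" encoding="utf-8"?>' +
  '<response><task id="task-123" status="Completed" ' +
  'resultUrl="https://example.com/result?a=1&amp;b=2" /></response>';

console.log(JSON.stringify(taskXmlToJson(sampleXml), null, 2));
```

The resulting object can be sent to K2 as JSON directly, which is the whole point of the conversion.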
Handle Authentication in a proper way
By default, the OCR SDK uses Basic authentication. This is great, because we can use Static authentication in our K2 REST ServiceBroker. However, the first call that the ServiceBroker makes is always anonymous. Only after it gets a 401 Unauthorized response, which lists the supported authentication schemes in its headers (e.g. WWW-Authenticate: Basic realm="401"), does the K2 ServiceBroker send the configured Static credentials in the Authorization header. The REST API handles this by checking the Authorization header: if it is empty or contains anything other than Basic credentials, the appropriate response is sent back to the K2 REST ServiceBroker.
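The header check described above could look roughly like the following sketch. The function name `checkBasicAuth` and the shape of its return value are hypothetical and not taken from the repo; the realm value simply mirrors the example header quoted in the text.

```javascript
// Hypothetical helper: inspects the Authorization header and returns either
// a 401 challenge (so the caller knows to retry with Basic credentials) or
// the decoded username and password.
function checkBasicAuth(authorizationHeader) {
  const challenge = {
    ok: false,
    status: 401,
    headers: { 'WWW-Authenticate': 'Basic realm="401"' },
  };

  // Anonymous first call: no header at all.
  if (typeof authorizationHeader !== 'string') return challenge;

  // Anything other than "Basic <base64>" is rejected as well.
  const match = /^Basic\s+(\S+)$/i.exec(authorizationHeader.trim());
  if (!match) return challenge;

  // Basic credentials are "username:password" encoded as base64.
  const decoded = Buffer.from(match[1], 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) return challenge;

  return {
    ok: true,
    username: decoded.slice(0, separator),
    password: decoded.slice(separator + 1),
  };
}

// First (anonymous) call, then the retry with credentials.
const anonymous = checkBasicAuth(undefined);
console.log(anonymous.status, anonymous.headers['WWW-Authenticate']);

const header = 'Basic ' + Buffer.from('appId:secret').toString('base64');
console.log(checkBasicAuth(header).ok);
```

In an Express handler, the input would be `req.headers.authorization`, and a challenge result would be turned into a response with the returned status and header.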
Combine 3 different calls within 1
This functionality still requires a bit of testing and probably some additional tweaks. Node.js is asynchronous by default. Therefore, I implemented an asynchronous way to handle the 3 consecutive ABBYY OCR API calls:
- send the image to the OCR API;
- wait for 2 sec and check the status (if not ready, wait for another 2 sec, etc.);
- when the task is processed, get the parsed results.
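The three steps above can be sketched with async/await as shown below. This is an illustration under stated assumptions, not the code from the repo: the `api` object with its `submitImage`, `getTaskStatus` and `getResult` methods is a hypothetical wrapper around the three HTTP calls, the status names `Completed` and `ProcessingFailed` are assumed values, and the `maxAttempts` limit is an addition so that the loop cannot run forever.

```javascript
// Resolves after the given number of milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical orchestration of the three calls. `api` is an assumed
// wrapper whose methods each return a Promise.
async function recognizeReceipt(api, image, options = {}) {
  const { intervalMs = 2000, maxAttempts = 30 } = options;

  // Step 1: send the image and keep the task id.
  const taskId = await api.submitImage(image);

  // Step 2: wait, check the status, and repeat until the task is done.
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    await delay(intervalMs);
    const task = await api.getTaskStatus(taskId);

    if (task.status === 'Completed') {
      // Step 3: fetch the parsed result from the URL in the status response.
      return api.getResult(task.resultUrl);
    }
    if (task.status === 'ProcessingFailed') {
      throw new Error(`OCR task ${taskId} failed`);
    }
  }

  throw new Error(`OCR task ${taskId} did not complete after ${maxAttempts} checks`);
}
```

Passing the `api` object in as a parameter keeps the polling logic independent of the HTTP client, so it can be exercised locally with a fake implementation and a short `intervalMs` before it is pointed at the real service.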
After I created the REST API and tested it locally, I deployed it to Heroku, and my REST API automatically went live. Moreover, I was able to configure automatic deployment to the Heroku platform, triggered as soon as I commit changes to a specific GitHub branch.
The last part is the K2 Expense Claim Application.
3. K2 Expense Claim Application with OCR
Inside K2 you basically have to do the following:
- Register REST Service Instance
- Generate Expense Claim App
- Modify the app, adding 1 SMO call.
REST Service Instance
In order to register a Service Instance, you can use the Swagger file, which is available inside the same GitHub repo. After registration, you will have a number of ServiceObjects, which you can use to build the methods you need.
Expense Claim App Generation
This is done with a very simple and user-friendly wizard, which helps you quickly generate standard applications in K2 Cloud or K2 Five.
Add 1 SMO call
The last thing you have to do is add some behavior to your form/view that starts the receipt image processing and handles the result. For the demo, I created a simple button: when it is clicked, the image is sent to the SMO, and the SMO returns the results.
And that is it. All that is left is to test your K2 Expense Claim App with ABBYY OCR.
I hope this post is helpful and demonstrates some of the cool features of K2. Note that this approach is also suitable for K2 Cloud, which is really cool and removes the borders between the K2 Five and K2 Cloud platforms.