Using ML, OCR, and RPA to Automate the Processing of Financial Reports

Finance

Google Cloud

Python

TensorFlow

AI / ML

Description

Brief results of the collaboration:

A provider of investment management services turned to Altoros to automate the manual aggregation of financial reports.
The company cut the time spent analyzing each document from 12 minutes to 10 seconds.
Achieving 99% precision, the delivered solution enabled the customer to refocus its analyst team on higher-value business tasks.

The customer

The company provides investment management services, helping organizations allocate their financial assets to gain value. Headquartered in Boston, the customer has affiliates in London, Singapore, Tokyo, and Sydney. Operating globally, the company serves customers across 25 countries in Europe, Asia, the Middle East, North America, and Australia.

The need

To identify the best investment opportunities, the company manually analyzed publicly available financial reports. The customer turned to Altoros to automate the recognition and extraction of explicit tables of contents (ToCs) from reports in PDF format.

The challenge

Under the project, the team at Altoros had to address the following issues:

ToC entries varied greatly from company to company, so engineers at Altoros needed to normalize them into a unified form to recognize the contents reliably.
In many cases, text could not be extracted from a PDF file directly. Developers at Altoros therefore had to rely on optical character recognition (OCR), treating each PDF page as an image and parsing text from it.
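The two challenges above can be illustrated with a minimal Python sketch. It assumes each page's text has already been pulled from the PDF's text layer and that ToC entries arrive as single text lines; `needs_ocr`, `normalize_entry`, the regular expression, and both thresholds are hypothetical illustrations, not the project's actual code. The first function flags pages whose text layer is empty or garbled so they can be routed to an OCR engine such as Tesseract; the second unifies differently styled entries into a common `(number, title, page)` form.

```python
import re
import string
from typing import Optional, Tuple

# Hypothetical thresholds; real values would need tuning on actual reports.
MIN_CHARS = 40
MIN_READABLE_RATIO = 0.85

# Placeholders such as "(cid:42)" that some extractors emit for unmapped glyphs.
CID_TOKEN = re.compile(r"\(cid:\d+\)")
READABLE = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def needs_ocr(page_text: str) -> bool:
    """Heuristic: route a page to OCR if its text layer is empty or mostly garbage."""
    text = CID_TOKEN.sub("\ufffd", page_text).strip()
    if len(text) < MIN_CHARS:
        return True
    readable = sum(ch in READABLE or ch.isalpha() for ch in text)
    return readable / len(text) < MIN_READABLE_RATIO


# Hypothetical pattern: optional section number, title, dot leaders, page number.
ENTRY = re.compile(
    r"^\s*(?P<num>(?:\d+\.)*\d+\.?|[IVXLC]+\.)?\s*"
    r"(?P<title>.*?)"
    r"[\s.·…_-]*"
    r"(?P<page>\d+)\s*$"
)


def normalize_entry(line: str) -> Optional[Tuple[Optional[str], str, int]]:
    """Map a raw ToC line to (section number or None, lowercased title, page)."""
    m = ENTRY.match(line)
    if not m:
        return None
    title = " ".join(m.group("title").split()).strip(" .:-").lower()
    if not title:
        return None
    num = m.group("num").rstrip(".") if m.group("num") else None
    return num, title, int(m.group("page"))
```

For example, `normalize_entry("1.2 Financial Review ........ 45")` yields `("1.2", "financial review", 45)`, while a line without a trailing page number is rejected. A single regular expression will not cover every house style; it only shows what "unification" means in practice.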

The solution

At the preprocessing stage, our engineers parsed PDF files into individual characters to recover human-readable text and to extract geometric and formatting features of each text line, such as fonts and coordinates. Using a classifier trained with scikit-learn, TensorFlow, and XGBoost, experts at Altoros were then able to identify the pages containing tables of contents.
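To illustrate the kind of geometric and formatting features described above, here is a hedged sketch that groups glyphs into text lines and computes per-line and per-page feature vectors. The `Char` record is a hypothetical stand-in loosely modeled on the per-character font and bounding-box information that PDF parsers such as pdfminer expose; the specific features, the 2-point line tolerance, and the `make_line` demo helper are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Char:
    """One glyph with font and bounding box (hypothetical stand-in for parser output)."""
    text: str
    fontname: str
    size: float
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_lines(chars: List[Char], y_tol: float = 2.0) -> List[List[Char]]:
    """Bucket glyphs by their bottom edge (y0) and return lines top to bottom."""
    buckets: Dict[int, List[Char]] = defaultdict(list)
    for c in chars:
        buckets[round(c.y0 / y_tol)].append(c)
    # PDF coordinates grow upward, so the largest y comes first on the page.
    return [sorted(buckets[k], key=lambda c: c.x0) for k in sorted(buckets, reverse=True)]


def line_features(line: List[Char], page_w: float, page_h: float) -> Dict[str, float]:
    """Style and position features that tend to separate ToC lines from body text."""
    text = "".join(c.text for c in line).strip()
    n = max(len(text), 1)
    return {
        "x0_rel": line[0].x0 / page_w,  # left indentation
        "y_rel": line[0].y0 / page_h,  # vertical position
        "font_size": mean(c.size for c in line),
        "bold_ratio": sum("Bold" in c.fontname for c in line) / len(line),
        "digit_ratio": sum(ch.isdigit() for ch in text) / n,
        "dot_leader_ratio": text.count(".") / n,
        "ends_with_number": float(text[-1:].isdigit()),
    }


def page_features(lines: List[List[Char]], page_w: float, page_h: float) -> Dict[str, float]:
    """Average the line features into one fixed-length vector per page."""
    feats = [line_features(l, page_w, page_h) for l in lines if l]
    return {k: mean(f[k] for f in feats) for k in feats[0]} if feats else {}


def make_line(s: str, y: float, font: str = "Helvetica", size: float = 10.0) -> List[Char]:
    """Demo helper: fake glyphs laid out left to right at 5-point spacing."""
    return [Char(ch, font, size, 72 + 5 * i, y, 77 + 5 * i, y + size) for i, ch in enumerate(s)]
```

Page-level vectors like these could be fed to any of the classifiers mentioned above; averaging is the simplest aggregation and discards layout order, which a sequence model could exploit instead.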

Our team also built a separate classifier to extract ToCs from file metadata, which was present in 10% of the documents.
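The source does not describe how the metadata classifier worked. As a hedged illustration of the extraction side only, the sketch below turns a flat outline (bookmark) list into a nested ToC, assuming entries arrive as `(level, title, page)` triples, similar to the lists that PyMuPDF's `Document.get_toc()` returns. `TocNode` and `build_toc` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class TocNode:
    """One ToC entry with its nested sub-entries (hypothetical structure)."""
    title: str
    page: int
    level: int
    children: List["TocNode"] = field(default_factory=list)


def build_toc(outline: Sequence[Tuple[int, str, int]]) -> List[TocNode]:
    """Turn flat (level, title, page) outline entries into a nested tree."""
    roots: List[TocNode] = []
    stack: List[TocNode] = []  # path from a root to the most recent node
    for level, title, page in outline:
        node = TocNode(title.strip(), page, level)
        # Pop until the top of the stack is a strict ancestor of this entry.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

Keeping a stack of open ancestors lets the tree be built in one pass, and it tolerates level jumps (for example, a level-3 entry directly under a level-1 entry) by attaching the entry to the nearest shallower node.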

To detect a table of contents in a file, developers at Altoros trained a classifier on a subset of documents annotated with bounding boxes that mark the areas containing tables of contents. During parsing, the team used a range of features reflecting typical ToC styles, computing and storing every potentially relevant feature for each text line.
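One way to turn bounding-box annotations into training labels can be sketched under a few assumptions: each text line and each annotated ToC region is an axis-aligned box `(x0, y0, x1, y1)` in the same page coordinates, and a line counts as part of a ToC when at least half of its area lies inside an annotated region. The 0.5 threshold and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_fraction(line: Box, region: Box) -> float:
    """Share of the line's area that falls inside the annotated region."""
    ix0, iy0 = max(line[0], region[0]), max(line[1], region[1])
    ix1, iy1 = min(line[2], region[2]), min(line[3], region[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = (line[2] - line[0]) * (line[3] - line[1])
    return inter / area if area > 0 else 0.0


def label_lines(line_boxes: List[Box], toc_regions: List[Box], threshold: float = 0.5) -> List[int]:
    """Label a text line 1 if it lies mostly inside any annotated ToC region, else 0."""
    return [
        int(any(overlap_fraction(b, r) >= threshold for r in toc_regions))
        for b in line_boxes
    ]
```

Measuring overlap relative to the line's own area (rather than intersection over union) avoids penalizing short lines inside a large ToC region. The resulting 0/1 labels, paired with per-line style features, would form the training set for a line- or page-level classifier.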

Then, engineers at Altoros identified the exact page to which each ToC entry referred. An extracted table of contents contains a sequence of page numbers, and the algorithms developed by our experts detected the offset between the page number printed in the ToC and the corresponding page in the PDF file.
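The offset detection can be sketched as a simple vote. Assuming a constant offset per document, each ToC entry votes for every `pdf_page - printed_page` value at which its title appears in a page's text, and the most common value wins. The function name, the `max_offset` bound, and the substring title-matching rule are hypothetical; reports whose numbering restarts in appendices would need per-section offsets.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def estimate_page_offset(
    entries: Sequence[Tuple[str, int]],
    page_texts: Sequence[str],
    max_offset: int = 50,
) -> Optional[int]:
    """Return the most-voted offset (pdf_page - printed_page), or None.

    entries: (title, printed page number) pairs from the extracted ToC.
    page_texts: text of each PDF page, in order; PDF pages are counted from 1.
    """
    votes = Counter()
    lowered = [t.lower() for t in page_texts]
    for title, printed in entries:
        needle = title.lower()
        for pdf_page, text in enumerate(lowered, start=1):
            offset = pdf_page - printed
            if abs(offset) <= max_offset and needle in text:
                votes[offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Spurious matches (for instance, the ToC page itself, which mentions every title) each produce a different offset, so they scatter their votes, while the true offset collects one vote per entry. Once the offset is known, an entry's PDF page is simply `printed_page + offset`.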

Finally, developers at Altoros implemented a searchable database that makes the information contained in the PDF reports easy to access and search.
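The technology stack for this project includes Google BigQuery; as a local stand-in only, the sketch below stores extracted sections in an in-memory SQLite table through Python's standard `sqlite3` module and runs a case-insensitive substring search. The schema, function names, and sample report names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def create_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table of extracted report sections."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sections (
               report TEXT NOT NULL,
               title  TEXT NOT NULL,
               page   INTEGER NOT NULL,
               body   TEXT NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_report_page ON sections(report, page)")


def add_section(conn: sqlite3.Connection, report: str, title: str, page: int, body: str) -> None:
    conn.execute("INSERT INTO sections VALUES (?, ?, ?, ?)", (report, title, page, body))


def search(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str, int]]:
    """Substring search over titles and bodies (SQLite LIKE ignores ASCII case)."""
    # Escape LIKE wildcards so a term such as "100%" is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        "SELECT report, title, page FROM sections "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY report, page",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A `LIKE` scan is fine for a demo; at the scale of many reports, a full-text index (or the warehouse's own search functions) would be the more appropriate choice.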

The outcome

Partnering with Altoros, the customer automated the manual processing of financial reports, cutting the time spent analyzing each document from 12 minutes to 10 seconds. Achieving 99% precision, the delivered solution enabled the customer to refocus its analyst team on higher-value business tasks.

Technology stack

Programming language

Python

Technologies

TensorFlow, scikit-learn, XGBoost, Google BigQuery, Google Dataproc, Tesseract, pdfminer

Database

Google Cloud Storage

Contact us

Jan-Terje Nordlien

Managing Director

jan-terje@altoros.no +47 21 92 93 00

Altoros Norge AS
Org. no.: 894 684 992
Tordenskiolds gate 2,
0160 Oslo


You May Also Like

Automation of In-field Job Planning and Performance Optimization

Call Recording, Analytics, and Workforce Optimization Solution

Highly Scalable System for DNA Analysis

A Highly Secure Smart Home System Wins a Kickstarter Funding

The Image Recognition System

Integrated logistics solutions to the offshore industry

LikeFolio: Best Practices of Cloud and Ruby Development for Application Optimization

Software for Selecting and Mixing Paint

Software Suite for Mobile Technicians and Field Service Management

The System for Emergency Control Centers

The Cloud-based Document Exchange System

The Marketing Information Messaging System

The NuoDB Migrator for Moving SQL Data to a NoSQL Database

Toyota Automates Its System for Holding Tenders

Warehouse Workload Monitoring Application

Web-Based Personal Styling

Web-Based System for Retailers

A Blockchain-Based Platform for Automating Bond Issuing Worth $10M

