Just Analytics Blog | Performance Management News, Views and Op-ed

Web Scraping and Custom Entity Extraction Using NLP and RPA

Written by JUST ANALYTICS TEAM | Mar 29, 2020 8:21:31 AM

A lot of valuable information lies in unstructured sources such as web pages, articles, PDFs, emails and images. Extracting it effectively can support critical operations in many ways. For the ongoing COVID-19 crisis, we built a mechanism that extracts patient-level details from the updates the Ministry of Health releases on its website every day, and consolidated all the patients into a single report in a Power BI dashboard.

Note that not every update in the image above contains details about new cases. So we used NLP to first select the relevant updates, then parsed them to extract the right entities and link them to one another. Here is the architecture we followed:
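As a rough illustration of the select-then-extract idea, here is a minimal Python sketch. It is not our production pipeline: a simple keyword match stands in for the NLP relevance filter, a regular expression stands in for the entity extractor, and the sentence wording, field names and function names (`is_case_update`, `extract_patients`, `Patient`) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: updates about new cases mention "new case(s)" or "confirmed case(s)".
# A trained text classifier could replace this keyword filter.
CASE_KEYWORDS = re.compile(r"\b(new|confirmed)\s+cases?\b", re.IGNORECASE)

# Assumption: a hypothetical sentence shape such as
# "Case 102 is a 30 year-old female Indian national who is linked to Case 101."
CASE_PATTERN = re.compile(
    r"Case\s+(?P<case_id>\d+)\s+is\s+an?\s+(?P<age>\d+)[- ]year[- ]old\s+"
    r"(?P<gender>male|female)\s+(?P<nationality>[A-Z][\w ]*?)\s+(?:national|citizen)"
    # Optional link to an earlier case, searched only within the same sentence.
    r"(?:[^.]*?linked\s+to\s+Case\s+(?P<linked_to>\d+))?",
    re.IGNORECASE,
)


@dataclass
class Patient:
    case_id: int
    age: int
    gender: str
    nationality: str
    linked_to: Optional[int]  # case_id of a linked case, if one is mentioned


def is_case_update(text: str) -> bool:
    """Step 1: keep only updates that appear to describe new cases."""
    return CASE_KEYWORDS.search(text) is not None


def extract_patients(updates: List[str]) -> List[Patient]:
    """Step 2: parse the relevant updates into one record per patient."""
    patients = []
    for text in updates:
        if not is_case_update(text):
            continue
        for m in CASE_PATTERN.finditer(text):
            linked = m.group("linked_to")
            patients.append(
                Patient(
                    case_id=int(m.group("case_id")),
                    age=int(m.group("age")),
                    gender=m.group("gender").lower(),
                    nationality=m.group("nationality").strip(),
                    linked_to=int(linked) if linked else None,
                )
            )
    return patients
```

Calling `extract_patients` on a list of update texts returns one `Patient` record per case sentence it recognizes; updates that do not mention new cases are skipped entirely. Real bulletins vary in wording, so a pattern like this would need to be adapted or replaced by a trained entity recognizer.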

On top of this, we built a simple UI that lets stakeholders search COVID-19 details case by case.

To visualize the extracted information, we used Power BI to build a simple dashboard that shows patients by age, gender, nationality, etc.
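For readers who want to try something similar, the sketch below shows one way to shape extracted patient records into a flat CSV table that a dashboard tool can import, along with the kind of per-group counts such a dashboard displays. The column names, the ten-year age bands and the helper names (`age_band`, `summarize`, `to_csv`) are assumptions for illustration, not a description of our actual data model.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical column layout for the consolidated patient table.
FIELDS = ["case_id", "age", "gender", "nationality"]


def age_band(age: int, width: int = 10) -> str:
    """Bucket an age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def summarize(patients: Iterable[Dict[str, object]]) -> Dict[str, Counter]:
    """Count patients per age band, gender and nationality."""
    summary = {"age_band": Counter(), "gender": Counter(), "nationality": Counter()}
    for p in patients:
        summary["age_band"][age_band(int(p["age"]))] += 1
        summary["gender"][str(p["gender"])] += 1
        summary["nationality"][str(p["nationality"])] += 1
    return summary


def to_csv(patients: List[Dict[str, object]]) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(patients)
    return buffer.getvalue()
```

The CSV text returned by `to_csv` can be written to a file and loaded as a flat table, while `summarize` gives a quick way to sanity-check the numbers the dashboard should show.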

 

To understand this use case further, here is a quick demo:

 

If this is of interest to you, get in touch with us today for a free one-on-one discovery workshop!