Document Extraction Model Builder
Document Extraction is a method of extracting fields or entities (for example, company names, person names, dates, locations, contact details, and prices) from documents such as PDFs.
Using this application, you can build a custom Document Extraction model trained on your own data and use it to extract information from different types of documents.
How to build your custom Document Extraction application
Training Data Preparation
Upload your documents and create a dataset under Datasets. The application supports the following document types:
- Image: jpeg, png, tif
Note: Contact sales@datoin.com for other document types
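Before uploading, it can help to check locally which files fall within the supported image types. The following is a minimal sketch, not part of the Datoin platform: the function name `split_by_support` is hypothetical, and treating `.jpg` and `.tiff` as alternate spellings of the listed jpeg and tif types is an assumption. Searchable PDFs are handled separately and are not covered by this check.

```python
from pathlib import Path

# Extensions taken from the supported image types listed above.
# Treating ".jpg" and ".tiff" as aliases of jpeg/tif is an assumption.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".tif", ".tiff"}


def split_by_support(filenames):
    """Split file names into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for name in filenames:
        # Compare case-insensitively so "scan.PNG" is accepted too.
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


if __name__ == "__main__":
    ok, other = split_by_support(["invoice_01.PNG", "scan.tif", "notes.docx"])
    print("Ready to upload:", ok)
    print("Check support first:", other)
```

Files in the second list are the ones to ask about before uploading.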
If your documents are already searchable-PDFs, you can skip the Preprocessing step.
1. Preprocessing
The documents need to be converted to searchable-PDFs before using them in the model builder application.
1. Go to the app store and build a PDF Converter app by searching for Searchable PDF Converter
2. You will be redirected to the app settings page, where you can configure and run the app
3. Under Quick Settings:
3.1. Select the dataset containing your documents and click Save
3.2. Provide a name for the converted dataset
4. Click Run Now to start the app. The app may run for 5 to 20 minutes (depending on the size of the input data), after which the converted searchable-PDF dataset will be created.
For more details on this preprocessing step, visit here
2. Document Annotation
The entities present in the searchable-PDF documents need to be labeled manually to create the training data. Datoin's Annotation Tool helps you prepare your training data easily.
- Click on the searchable-PDF dataset
- Click the Annotate button on a document; this opens the document in Datoin's Annotation Tool
- Annotate the entities that you want the NER model to extract, using the labels in the right panel
- Click Save
- Repeat the above steps until all your documents are annotated
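For readers curious what labeled entities look like as training data: annotations are commonly represented as labeled character spans, and many NER training pipelines convert them into per-token BIO tags. The sketch below illustrates that general idea only. The span format, the label names, and the whitespace tokenizer are all assumptions; this page does not describe how Datoin's Annotation Tool stores annotations internally.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans to per-token BIO tags.

    Tokens are whitespace-delimited. A token is tagged only when it lies
    fully inside a span; spans are assumed to start on a token boundary.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # "outside": the token belongs to no entity
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # The first token of a span gets B-, later tokens get I-.
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    # Hypothetical example text and labels.
    text = "Invoice from Acme Corp dated 2021-03-05"
    spans = [(13, 22, "COMPANY"), (29, 39, "DATE")]
    for token, tag in spans_to_bio(text, spans):
        print(f"{token}\t{tag}")
```

Consistent labeling across documents matters here, because every annotated span becomes a training signal of this kind.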
You have now prepared the training data for an NER model. Next, let us build the NER model.
Build and use the Named Entity Recognition app
1. Go to the app store and create a Named Entity Recognition Builder app by searching for Named Entity Recognition (NER) Model Builder
2. You will be redirected to the app settings page, where you can configure and run the app
3. Under Quick Settings:
3.1. Select the training dataset containing the annotated documents and click Save
3.2. Provide a name for the NER model
4. Click Run Now to start the app. The app may run for 10 to 30 minutes (depending on the size of the input data), after which the trained model will be saved.
5. Use this model to extract entities from new documents:
5.1. Click Build App on the model to build the inference app using that model
5.2. Provide an input document and click Get Results. The extracted entities are displayed on the right side of the screen.
Results
The entities extracted from the given input document are produced as the results.
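If the extracted entities are copied out for further processing, grouping them by label is a common first step. The following is a minimal sketch, assuming each result is available as a (label, text) pair. That shape, the label names, and the function name `group_by_label` are hypothetical, since this page does not specify an export format for the inference app.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (label, text) pairs into {label: [texts]}.

    Exact duplicate texts under the same label are kept once,
    in order of first appearance.
    """
    grouped = defaultdict(list)
    for label, text in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical extraction results.
    extracted = [
        ("COMPANY", "Acme Corp"),
        ("DATE", "2021-03-05"),
        ("COMPANY", "Acme Corp"),
        ("PRICE", "$120.00"),
    ]
    for label, values in group_by_label(extracted).items():
        print(f"{label}: {', '.join(values)}")
```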
Conclusion
Go ahead and build your first Named Entity Recognition (NER) model with the NER Model Builder.