# Document Extraction Model Builder

Document Extraction is a method of extracting fields or entities from documents, for example company names, person names, dates, locations, contact details, and prices from PDFs and other files. With this application you can build a custom Document Extraction model trained on your own data and use it to extract information from different types of documents.

## How to build your custom Document Extraction application

![Screen Shot 2019 07 26 At 1 01 18 Pm](/uploads/ocr-datoin/screen-shot-2019-07-26-at-1-01-18-pm.png "Screen Shot 2019 07 26 At 1 01 18 Pm")

### **Training Data Preparation**

Upload your documents and create a dataset under **Datasets**. The application supports the following document types:

- **PDF**
- **Image**: *jpeg*, *png*, *tif*

*Note:* Contact sales@datoin.com for other document types.

> If your documents are already [searchable-PDFs](https://www.filecenterdms.com/kb-what-is-a-searchable-pdf.html), you can skip the **Preprocessing** step.

### 1. Preprocessing

Documents must be converted to [searchable-PDFs](https://www.filecenterdms.com/kb-what-is-a-searchable-pdf.html) before they can be used in the model builder application.

1. Go to the app store and build a PDF converter app by searching for *Searchable PDF Converter*.
2. You will be redirected to the app settings page, where you can configure and run the app.
3. Under **Quick Settings**:
   1. Select the dataset containing your documents and click **Save**.
   2. Provide a name for the *converted dataset*.
4. Click **Run Now** to start the app. The app may run for 5 to 20 minutes (depending on the size of the input data), after which the converted *searchable-PDF* dataset is created.

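
Before uploading, it can help to check locally which files match the supported document types listed under Training Data Preparation. The following is a minimal sketch, not part of the Datoin platform: the function name `group_by_document_type` and the sample file names are hypothetical, and the check looks only at file extensions. It cannot tell whether a PDF is already searchable, so that still has to be verified separately.

```python
from pathlib import PurePath

# Extensions listed in this guide. Whether aliases such as .jpg or .tiff
# are also accepted is not stated here, so they are deliberately left out.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".jpeg", ".png", ".tif"}


def group_by_document_type(filenames):
    """Group file names into 'pdf', 'image', and 'other' by extension."""
    groups = {"pdf": [], "image": [], "other": []}
    for name in filenames:
        # Compare extensions case-insensitively (e.g. ".PNG" == ".png").
        ext = PurePath(name).suffix.lower()
        if ext in PDF_EXTENSIONS:
            groups["pdf"].append(name)
        elif ext in IMAGE_EXTENSIONS:
            groups["image"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["invoice_01.pdf", "receipt.PNG", "scan.tif", "notes.docx"]
    for kind, names in group_by_document_type(files).items():
        print(kind, names)
```

Files in the `image` group need the preprocessing step above, files in the `pdf` group need it only if they are not already searchable, and files in the `other` group fall under the note about contacting sales.
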

For more details on this preprocessing step, [visit here](/diy-platform/app-store-index/searchable-pdf-convertor).

### 2. Document Annotation

The entities present in the **searchable-PDF** documents need to be labeled manually to create the training data. **Datoin's Annotation Tool** helps you prepare your training data easily.

1. Click on the *searchable-PDF* dataset.
2. Click the *annotate* button on a document. This opens the document in Datoin's Annotation Tool.
3. Annotate the entities that you want the NER model to extract, using the labels on the right panel.
4. Click **Save**.
5. Repeat the above steps until all your documents are annotated.

Your training data is now ready for training an NER model. Next, let us build the NER model.

### Build and use Named Entity Recognition app

1. Go to the app store and create a Named Entity Recognition Builder app by searching for *Named Entity Recognition (NER) Model Builder*.
2. You will be redirected to the app settings page, where you can configure and run the app.
3. Under **Quick Settings**:
   1. Select the training dataset containing the [annotated documents](#2-document-annotation) and click **Save**.
   2. Provide a name for the NER model.
4. Click **Run Now** to start the app. The app may run for 10 to 30 minutes (depending on the size of the input data), after which the trained model is saved.
5. Use this model to extract entities from new documents:
   1. Click **Build App** on the model to build the inference app using that model.
   2. Provide an input document and click **Get Results**. The entities extracted from the document are displayed on the right side of the screen.

### Results

The entities extracted from the given input document are produced as the results.

## Conclusion

Go ahead and build your first [Named Entity Recognition (NER) Model Builder](https://app.datoin.com/app-store).