A Beginner’s Introduction to NER (Named Entity Recognition) – News Couple


This article was published as part of the Data Science Blogathon.

Summary

This article gives a brief introduction to named entity recognition (NER), a common method for identifying entities in a text document. It is aimed at beginners in the field of NLP. By the end of the article, we implement pre-trained NER models to demonstrate a practical use case. Happy learning!

Why NER?

Picture 1

Looking at the image above, you may already have some idea of what an NER model does. The model finds the different entities present in the text, such as people, dates, organizations, and locations, and so adds structure and meaning to the text document. In simple terms, NER is a form of information extraction. There are many use cases for named entity recognition, some of which are:

1. Customer Support


Picture 2

Most companies run customer support systems. Every day, they deal with a large number of customer requests, ranging from installation and maintenance to complaints and troubleshooting for a particular product. NER helps identify and understand the type of request raised by the customer. This also lets the company build an automated system that recognizes each incoming request using NER and forwards it to the appropriate support desk.
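To make the routing idea concrete, here is a minimal Python sketch. It assumes some NER model (not shown) has already returned `(text, label)` pairs for a request, and that this model was trained to tag the kind of issue with a label called `ISSUE_TYPE`. That label, the product name in the usage example, and the desk names are all hypothetical.

```python
# Hypothetical mapping from an extracted issue type to the desk that handles it.
DESK_FOR_ISSUE = {
    "installation": "installation-desk",
    "maintenance": "maintenance-desk",
    "complaint": "complaints-desk",
    "troubleshooting": "tech-support-desk",
}


def route_request(entities):
    """Pick a support desk from (entity_text, entity_label) pairs.

    `entities` is assumed to come from a custom NER model that tags the
    kind of issue with the hypothetical label "ISSUE_TYPE".
    """
    for text, label in entities:
        if label == "ISSUE_TYPE":
            desk = DESK_FOR_ISSUE.get(text.lower())
            if desk is not None:
                return desk
    # No recognized issue type: send the request to a human triage queue.
    return "general-triage"
```

For example, `route_request([("WashPro 300", "PRODUCT"), ("Installation", "ISSUE_TYPE")])` returns `"installation-desk"`, while a request with no recognized issue type falls back to `"general-triage"`.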

2. Filter CVs to find suitable candidates for a job role


Picture 3: https://www.rchilli.com/blog/a-complete-guide-to-resume-screening

Do you think every resume submitted for a job is read by the recruitment team? In fact, only about 25 percent of resumes are read by people; the rest are filtered by automated systems. If you have attended a resume-building workshop, the mentor may have emphasized listing your essential skills in a separate section of the resume, and including only the skills relevant to the job. This is because the NER model in the automated system may have been trained specifically to identify particular skill sets as entities. If a resume contains the required number of these entities, it qualifies for the next stage. So, now that you know how this process works, tailor your resume accordingly when applying for a job.
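The shortlisting step described above can be sketched as a simple threshold on matched skills. This is an illustrative assumption about how such a screener might work, not a description of any particular vendor's system; the `SKILL` label and the `min_matches` parameter are hypothetical.

```python
def screen_resume(extracted_entities, required_skills, min_matches):
    """Shortlist a resume if enough required skills appear among its SKILL entities.

    `extracted_entities` is assumed to be (text, label) pairs from a custom NER
    model trained to tag skills with the hypothetical label "SKILL".
    Returns (qualifies, sorted list of matched skills).
    """
    # Normalize case so "Python" and "python" count as the same skill.
    found = {text.lower() for text, label in extracted_entities if label == "SKILL"}
    required = {skill.lower() for skill in required_skills}
    matched = found & required
    return len(matched) >= min_matches, sorted(matched)
```

With entities `[("Python", "SKILL"), ("SQL", "SKILL"), ("Acme Corp", "ORG")]` and required skills `["python", "sql", "spark"]`, a threshold of 2 shortlists the resume, while a threshold of 3 rejects it.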

3. Entity identification from electronic health care data


Picture 4: https://www.persistent.com/blogs/building- calling-entity-recognition-models-for-healthcare/

NER models can be used to build medical systems that identify symptoms in patients’ electronic health care data and help diagnose diseases based on those symptoms. The image above shows how an NER model can identify the symptoms, diseases, and chemicals present in a person’s health care data.

These are some of the real-world scenarios in which NER has been used. Next, we will look at the different types of NER systems.

Various NER systems

1. Dictionary-based systems

This is the simplest NER approach. The system keeps a dictionary, that is, a vocabulary of known entity names, and basic string-matching algorithms check whether any item in that vocabulary occurs in the given text. The main limitation of this method is that the dictionary must be continually updated and maintained; any entity missing from it is never found.
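A minimal dictionary-based tagger needs only a gazetteer (a mapping from known entity strings to labels) and string matching. The sketch below uses Python's `re` module with word boundaries; the gazetteer entries are illustrative assumptions.

```python
import re


def dictionary_ner(text, gazetteer):
    """Return (start, end, surface_text, label) for every gazetteer term found in text."""
    matches = []
    for term, label in gazetteer.items():
        # re.escape treats the term literally; \b stops "ECB" matching inside "ECBs".
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), m.group(), label))
    # Sort by position so the results read in document order.
    return sorted(matches)


gazetteer = {"European Central Bank": "ORG", "ECB": "ORG", "Thursday": "DATE"}
```

Calling `dictionary_ner("Investors will look to the European Central Bank later Thursday.", gazetteer)` finds the bank and the day, but the same sentence written with "european central bank" in lowercase would miss the bank entirely, which is exactly the maintenance problem described above.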

2. Rule-based systems

Here, the model uses a predefined set of rules to extract information. Two main types of rules are used: pattern-based rules, which depend on the morphological pattern of the words, and context-based rules, which depend on the context in which a word appears in the text document. A simple example of a context-based rule is “If a person’s title is followed by a proper noun, then that proper noun is the person’s name”.
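Both kinds of rules can be written as regular expressions. In this sketch, the context-based rule is the title rule above (a title such as `Dr.` followed by capitalised words marks a PERSON), and the pattern-based rule looks only at the shape of the word itself (a four-digit year marks a DATE). The specific titles and patterns are illustrative assumptions; real rule-based systems use far larger rule sets.

```python
import re

# Context-based rule (illustrative): a title followed by one or more
# capitalised words marks those words as a PERSON.
TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")

# Pattern-based rule (illustrative): the word's own shape, here a year 1900-2099.
YEAR_RULE = re.compile(r"\b(?:19|20)\d{2}\b")


def rule_based_ner(text):
    """Apply each rule in turn and return (entity_text, label) pairs."""
    entities = [(m.group(1), "PERSON") for m in TITLE_RULE.finditer(text)]
    entities += [(m.group(0), "DATE") for m in YEAR_RULE.finditer(text)]
    return entities
```

For the sentence "In 2021, Dr. Jane Smith met Mr. Rao.", the title rule yields "Jane Smith" and "Rao" as PERSON entities and the year rule yields "2021" as a DATE. Note that every new title or date format needs another hand-written rule.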

3. Systems based on machine learning

ML-based systems use statistical models to recognize entity names. These models build a feature-based representation of the observed data. This approach overcomes many limitations of dictionary- and rule-based methods; for example, it can recognize an entity name even when its spelling differs slightly from the examples it has seen.
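To illustrate what a feature-based representation looks like, here is a sketch of the kind of hand-engineered token features often fed to statistical taggers such as conditional random fields (CRFs). The feature names and the choice of features are assumptions for this example. Features such as capitalisation, prefix, and suffix let a model score words it has never seen exactly, which is where it gains over exact dictionary matching.

```python
def token_features(tokens, i):
    """Build a feature dictionary for tokens[i], in the style used by CRF-based NER."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # "Deepak" -> True
        "word.isupper": word.isupper(),  # "ECB" -> True
        "word.isdigit": word.isdigit(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
    }
    # Context features: neighbouring words often signal an entity.
    features["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<START>"
    features["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>"
    return features


sentence = ["Deepak", "Jasani", "said"]
sentence_features = [token_features(sentence, i) for i in range(len(sentence))]
```

A statistical model then learns weights for these features from annotated data, instead of relying on fixed rules.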

There are two main phases in an ML-based NER solution. The first phase trains the ML model on annotated documents; how long training takes depends on the complexity of the model being built. In the second phase, the trained model is used to annotate raw (unannotated) documents.
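The two phases can be sketched in a few lines of Python. To keep the focus on the workflow, the "model" here is deliberately trivial: it memorises the most frequent tag for each word in the annotated training data, so it is a baseline rather than a real statistical model. The annotations use the BIO scheme (B- begins an entity, I- continues it, O is outside any entity), which is a common convention but an assumption here, since the article does not name one.

```python
from collections import Counter, defaultdict


def train(annotated_sentences):
    """Phase 1: learn the most frequent tag for each word from annotated data."""
    counts = defaultdict(Counter)
    for sentence in annotated_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def annotate(model, tokens):
    """Phase 2: tag raw tokens; unseen words default to 'O' (outside any entity)."""
    return [(tok, model.get(tok, "O")) for tok in tokens]


def bio_to_spans(tagged):
    """Merge B-/I- tagged tokens into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for tok, tag in tagged:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:  # close the previous entity before starting a new one
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)  # continue the current entity
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


training_data = [
    [("HDFC", "B-ORG"), ("Securities", "I-ORG"), ("said", "O")],
    [("Investors", "O"), ("watch", "O"), ("the", "O"),
     ("European", "B-ORG"), ("Central", "I-ORG"), ("Bank", "I-ORG")],
]
model = train(training_data)
raw_tokens = ["The", "European", "Central", "Bank", "and", "HDFC", "Securities", "met"]
entities = bio_to_spans(annotate(model, raw_tokens))
```

Here `entities` contains the two organizations from the raw sentence. A real system replaces `train` with a statistical or neural learner, but the split between a training phase on annotated text and an annotation phase on raw text stays the same.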

Two phases of NER

Image 5: https://www.researchgate.net/publication/233912242_Biomedical_Named_Entity_Recognition_A_Survey_of_Machine-Learning_Tools

4. Deep Learning Approaches

Deep learning models for NER

Picture 6: https://arxiv.org/pdf/1812.09449.pdf

In recent years, deep learning (DL) models have been used to build modern NER systems. DL techniques have several advantages over the methods discussed above. First, a DL model maps the input data to a non-linear representation, which helps it learn the complex relationships present in the data. Second, DL models learn useful features directly from the data, which saves much of the time and effort that traditional methods spend on engineering features by hand.
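As a rough illustration of mapping input to a non-linear representation, here is the forward pass of a tiny window-based neural tagger in NumPy. Each word is represented by an embedding (learned during training instead of hand-engineered), concatenated with its neighbours' embeddings, and passed through a `tanh` hidden layer. The weights are random and untrained, so the predicted probabilities are meaningless; the vocabulary, tag set, and layer sizes are illustrative assumptions, and the models surveyed in the literature use larger architectures such as BiLSTMs or transformers.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"<unk>": 0, "investors": 1, "european": 2, "central": 3, "bank": 4}
tags = ["O", "B-ORG", "I-ORG"]
embed_dim, hidden_dim = 8, 16

# Random, untrained parameters; training (e.g. backpropagation) would learn them.
E = rng.normal(size=(len(vocab), embed_dim))       # word embeddings
W1 = rng.normal(size=(3 * embed_dim, hidden_dim))  # [left, word, right] -> hidden
W2 = rng.normal(size=(hidden_dim, len(tags)))      # hidden -> tag scores


def tag_probabilities(tokens):
    """Return an (n_tokens, n_tags) array of tag probabilities."""
    ids = [vocab.get(t.lower(), vocab["<unk>"]) for t in tokens]
    x = E[ids]                                     # (n, embed_dim)
    pad = np.zeros((1, embed_dim))
    left = np.vstack([pad, x[:-1]])                # previous word's embedding
    right = np.vstack([x[1:], pad])                # next word's embedding
    window = np.hstack([left, x, right])           # (n, 3 * embed_dim)
    h = np.tanh(window @ W1)                       # the non-linear representation
    logits = h @ W2
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability for softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


probs = tag_probabilities(["European", "Central", "Bank"])
predicted = [tags[i] for i in probs.argmax(axis=1)]
```

Nothing in this network is a hand-written feature such as "is capitalised"; after training, whatever distinguishes entities would be encoded in `E`, `W1`, and `W2`.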

Next, let’s try some tools for extracting named entities.

Stanford NER Tagger

Stanford NER is one of the standard tools for identifying named entities. It comes with three models:

1. A three-class model that recognizes organizations, people, and locations.

2. A four-class model that recognizes people, organizations, locations, and miscellaneous entities.

3. A seven-class model that recognizes people, organizations, locations, money, time, percentages, and dates.

We’ll use the four-class model.

1. Install NLTK and download the Stanford NER zip file using the following commands (Stanford NER is written in Java, so a Java runtime must also be installed)

!pip3 install nltk==3.2.4
!wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
!unzip stanford-ner-2015-04-20.zip

2. Load the model

from nltk.tag.stanford import StanfordNERTagger

# Paths inside the unzipped Stanford NER folder
jar = "stanford-ner-2015-04-20/stanford-ner-3.5.2.jar"
model = "stanford-ner-2015-04-20/classifiers/"

# Four-class model: PERSON, ORGANIZATION, LOCATION, MISC
st_4class = StanfordNERTagger(model + "english.conll.4class.distsim.crf.ser.gz", jar, encoding='utf8')

3. To test the model, I took a news snippet from the Indian Express

example_document = '''Deepak Jasani, Head of retail research, HDFC Securities, said: “Investors will look to the European Central Bank later Thursday for reassurance that surging prices are just transitory, and not about to spiral out of control. In addition to the ECB policy meeting, investors are awaiting a report later Thursday on US economic growth, which is likely to show a cooling recovery, as well as weekly jobs data.”.'''

4. Pass the news snippet to the model as a list of whitespace-separated tokens

st_4class.tag(example_document.split())
Example document for NER
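The tagger returns one `(token, tag)` pair per token, with `O` for tokens outside any entity. A small helper can merge consecutive tokens that share a label into whole entities. The input below is a hand-written illustration in that format, not the tagger's actual output for the snippet. This simple grouping would merge two different entities of the same type if they happened to be adjacent, and because the snippet is split on whitespace, punctuation stays attached to tokens such as `Jasani,`.

```python
from itertools import groupby


def group_entities(tagged_tokens):
    """Merge consecutive tokens that share a non-'O' tag into (entity_text, tag) pairs."""
    entities = []
    # groupby yields runs of consecutive pairs that have the same tag.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != "O":
            entities.append((" ".join(token for token, _ in run), tag))
    return entities


# Hand-written example in the (token, tag) format; not real tagger output.
sample = [("Deepak", "PERSON"), ("Jasani", "PERSON"), ("said", "O"), ("the", "O"),
          ("European", "ORGANIZATION"), ("Central", "ORGANIZATION"),
          ("Bank", "ORGANIZATION"), ("met", "O")]
entities = group_entities(sample)
```

With a real run, the same helper could be applied directly to the list returned by `st_4class.tag(...)`.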

spaCy Pipelines for NER

spaCy provides three English pipelines, optimized for CPU, that can recognize named entities. They are:

a) en_core_web_sm

b) en_core_web_md

c) en_core_web_lg

The above models are listed in ascending order of size, with sm, md, and lg denoting small, medium, and large, respectively. Let’s try NER using the small model.

1. First, let’s download the model

import spacy
import spacy.cli

# Download the small English pipeline (needed only once per environment)
spacy.cli.download("en_core_web_sm")

2. Load the model

sp_sm = spacy.load('en_core_web_sm')

3. Create a function that outputs the entities recognized by the model.

def spacy_small_ner(document):
    # Run the small pipeline and collect (entity text, entity label) pairs
    return [(ent.text.strip(), ent.label_) for ent in sp_sm(document).ents]

spacy_small_ner(example_document)
Recognized Entities

Here GPE stands for Geopolitical Entity.

Conclusion

In short, the article covered the basics of named entity recognition and its use cases. You can also try the pre-trained models above with different examples. As a next step, you can try creating custom NER models for your own domain. Several modern pre-trained NER models are available, and you can fine-tune them to suit your use case. spaCy is a good library for building and using custom NER models; see the spaCy documentation listed in the references. Remember, too, that NER is not limited to English: BERT-based multilingual models can be used for NER tasks in different languages.

References

1. A Survey on Deep Learning for Named Entity Recognition by Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li

2. Biomedical Named Entity Recognition: A Survey of Machine-Learning Tools by David Campos, Sergio Matos, and Jose Luis Oliveira

3. https://colab.research.google.com/github/mohammedterry/NLP_for_ML/blob/master/NER.ipynb#scrollTo=ghnQyFifqqeX

4. spaCy documentation – https://spacy.io/usage/spacy-101

Cover image source: https://twitter.com/huggingface/status/1230870653194121216

About the author

Hi guys, I’m Adwait Dathan. I’m currently pursuing an MTech in Artificial Intelligence and Data Science. Please let me know your suggestions and feedback in the comments section. Happy learning!!

Image Sources

  1. Picture 1: https://nlpcloud.io/nlp-named-entity-recognition-ner-api.html
  2. Picture 2: https://www.thirdrocktechkno.com/blog/customer-support-as-a-backbone-of-company-infrastructure/




