Q1. What do you mean by NLP?
Natural Language Processing (NLP) is an automated way to understand or analyze natural languages and extract required information from such data by applying machine learning algorithms.
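Q2. Tell me the significance of TF-IDF?
TF-IDF stands for term frequency-inverse document frequency. In information retrieval, TF-IDF is a numerical statistic intended to reflect how important a word is to a document in a collection (corpus). The weight grows with the number of times the word appears in the document and shrinks as more documents in the collection contain the word.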
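A minimal sketch of the idea, assuming the plain variant tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t)). Libraries often add smoothing or normalization, so their exact numbers will differ. The three example documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # tf (relative count in this document) times idf (rarity across the corpus)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

weights = tf_idf(["the cat sat", "the dog sat", "the cat ran"])
print(weights[1])
```

"the" occurs in every document, so its idf, and therefore its weight, is 0, while "dog" (only in the second document) gets the highest weight there.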
Q3. What do you mean by tokenization in NLP?
Natural Language Processing aims to program computers to process large amounts of natural language data. Tokenization in NLP means dividing text into smaller units called tokens. You can think of a token as a word: just as words form a sentence, tokens form the text.
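A minimal regular-expression tokenizer, sketched as an assumption of what "dividing text into tokens" can look like. Real tokenizers (for example in NLTK or spaCy) handle contractions, URLs and other cases that this one does not.

```python
import re

def tokenize(text):
    # \w+ matches runs of letters, digits or underscores;
    # [^\w\s] matches a single punctuation mark, so it becomes its own token
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokens form a sentence, just like words."))
# ['Tokens', 'form', 'a', 'sentence', ',', 'just', 'like', 'words', '.']
```

Unlike a plain `str.split()`, this keeps "sentence" and "," as separate tokens.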
Q4. Define Pragmatic Analysis?
Pragmatic analysis is an important task in NLP for interpreting knowledge that lies outside a given document. Its aim is to focus on a different aspect of the document or text in a language, namely what it means in context. Pragmatic analysis lets software applications interpret real-world data critically in order to understand the actual meaning of sentences and words.
Q5. Tell me the steps involved in solving an NLP problem?
The steps involved are:
- Gather the text from an available dataset or by web scraping
- Apply stemming and lemmatization for text cleaning
- Apply feature engineering techniques
- Embed using word2vec
- Train the built model using neural networks
- Assess the model's performance
- Make suitable changes to the model
- Deploy the model
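To show the difference between the two cleaning steps in the list above, here is a toy sketch. The suffix list and the lemma table are hypothetical and far smaller than what real stemmers (such as Porter's) or dictionary-based lemmatizers use. Stemming chops suffixes and can produce non-words, while lemmatization maps a word to its dictionary form.

```python
# Hypothetical suffix list: far cruder than real stemmers such as Porter's.
SUFFIXES = ("ing", "ed", "ly", "s")

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMAS = {"better": "good", "ran": "run", "mice": "mouse", "was": "be"}

def crude_stem(word):
    """Chop the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Map an inflected form to its dictionary form, if the table knows it."""
    return LEMMAS.get(word, word)

def clean_tokens(tokens):
    """Lowercase, lemmatize, then stem each token."""
    return [crude_stem(lemmatize(token.lower())) for token in tokens]

print(clean_tokens(["Mice", "jumped", "quickly", "studies"]))
# ['mouse', 'jump', 'quick', 'studie']
```

"studies" becoming "studie" is the typical weakness of suffix stripping that lemmatization avoids.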
Q6. What Is NLP?
Natural Language Processing or NLP is an automated way to understand or analyze natural languages and extract required information from such data by applying machine learning algorithms.
Q7. List Some Components Of NLP?
Below are a few major components of NLP.
- Entity extraction: It involves segmenting a sentence to identify and extract entities, such as a person (real or fictional), organization, geographies, events, etc.
- Syntactic analysis: It refers to the proper ordering of words, that is, analyzing how the words of a sentence are arranged according to the rules of grammar.
- Pragmatic analysis: Pragmatic analysis is part of the process of extracting information from text.
Q8. List Some Areas Of NLP?
Natural Language Processing can be used for:
- Semantic Analysis
- Automatic summarization
- Text classification
- Question Answering
Q9. What does an NLP pipeline consist of?
- Text gathering (web scraping or available datasets)
- Text cleaning (stemming, lemmatization)
- Feature generation (Bag of words)
- Embedding and sentence representation (word2vec)
- Training the model by leveraging neural nets or regression techniques
- Model evaluation
- Making adjustments to the model
- Deployment of the model
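A compressed, dependency-free sketch of several of the pipeline stages listed above, assuming a tiny hypothetical set of labelled reviews. It covers text cleaning, bag-of-words feature generation, training and evaluation. Embeddings, adjustment and deployment are left out, and a multinomial Naive Bayes classifier is swapped in for the neural net or regression model to keep the code short.

```python
import math
import re
from collections import Counter

def clean(text):
    """Text cleaning: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Feature generation + training: bag-of-words counts per class label."""
    counts = {0: Counter(), 1: Counter()}
    docs_per_label = Counter()
    for text, label in examples:
        counts[label].update(clean(text))
        docs_per_label[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs_per_label, vocab

def predict(model, text):
    counts, docs_per_label, vocab = model
    total_docs = sum(docs_per_label.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        n_tokens = sum(counts[label].values())
        score = math.log(docs_per_label[label] / total_docs)  # log prior
        for word in clean(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood of the word
                score += math.log((counts[label][word] + 1) / (n_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Model evaluation: fraction of examples labelled correctly."""
    return sum(predict(model, text) == label for text, label in examples) / len(examples)

# Hypothetical data: 1 = positive review, 0 = negative review
TRAIN = [("A great movie", 1), ("Good acting, great plot", 1),
         ("A bad movie", 0), ("Terrible plot, bad acting", 0)]
TEST = [("great acting", 1), ("bad plot", 0)]

model = train_naive_bayes(TRAIN)
print(accuracy(model, TEST))
```

With real data the evaluation step would use a much larger held-out set, and its result drives the "making adjustments" step.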
Q10. What is Parsing in the context of NLP?
Parsing a document means working out the grammatical structure of its sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences.
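A minimal sketch of a probabilistic (Viterbi) CKY parser, assuming a tiny hand-written grammar in Chomsky normal form. The grammar, its rule probabilities and the example sentence are hypothetical; a real probabilistic parser estimates such probabilities from hand-parsed sentences (a treebank). For every span of words, the chart keeps only the most probable analysis of each label.

```python
from collections import defaultdict

# Hypothetical toy PCFG: word -> [(label, probability)]
LEXICON = {
    "she": [("NP", 0.4)], "saw": [("V", 1.0)], "the": [("Det", 1.0)],
    "dog": [("N", 0.5)], "telescope": [("N", 0.5)], "with": [("P", 1.0)],
}
# Hypothetical binary rules: (parent, left child, right child, probability)
BINARY_RULES = [
    ("S", "NP", "VP", 1.0),
    ("NP", "Det", "N", 0.6),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("PP", "P", "NP", 1.0),
]

def cky_parse(words):
    """chart[(i, j)][label] = (best probability, backpointer) for words[i:j]."""
    n = len(words)
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for label, p in LEXICON.get(word, []):
            chart[(i, i + 1)][label] = (p, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, b, c, p in BINARY_RULES:
                    if b in left and c in right:
                        prob = p * left[b][0] * right[c][0]
                        # Keep only the most probable analysis for this label and span
                        if prob > chart[(i, j)].get(parent, (0.0, None))[0]:
                            chart[(i, j)][parent] = (prob, (k, b, c))
    return chart

def build_tree(chart, i, j, label):
    """Follow backpointers to print the best tree as a bracketed string."""
    _, back = chart[(i, j)][label]
    if isinstance(back, str):  # leaf: the backpointer is the word itself
        return f"({label} {back})"
    k, b, c = back
    return f"({label} {build_tree(chart, i, k, b)} {build_tree(chart, k, j, c)})"

words = "she saw the dog".split()
chart = cky_parse(words)
print(build_tree(chart, 0, len(words), "S"))
# (S (NP she) (VP (V saw) (NP (Det the) (N dog))))
```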
Q11. What is Named Entity Recognition (NER)?
Named entity recognition is a method to locate the named entities in a sentence and classify them into categories. For example, in "Neil Armstrong of the US had landed on the moon in 1969", Neil Armstrong is categorized as a name, the US as a country, and 1969 as a time (temporal token).
The idea behind NER is to enable the machine to pull out entities like people, places, things, locations, monetary figures, and more.
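A toy, rule-based sketch of the example above, assuming a hypothetical gazetteer (lookup table) for names and countries plus a regular expression for four-digit years. Statistical NER models learn these decisions from annotated data instead of relying on fixed lists.

```python
import re

# Hypothetical gazetteer: known phrases and their entity labels
GAZETTEER = {"Neil Armstrong": "PERSON", "the US": "COUNTRY"}
# Four-digit years from 1000 to 2099, treated as temporal tokens
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def tag_entities(sentence):
    """Return (entity text, label) pairs in the order they occur."""
    found = []
    for phrase, label in GAZETTEER.items():
        for match in re.finditer(re.escape(phrase), sentence):
            found.append((match.start(), match.group(), label))
    for match in YEAR.finditer(sentence):
        found.append((match.start(), match.group(), "TIME"))
    return [(text, label) for _, text, label in sorted(found)]

print(tag_entities("Neil Armstrong of the US had landed on the moon in 1969"))
# [('Neil Armstrong', 'PERSON'), ('the US', 'COUNTRY'), ('1969', 'TIME')]
```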
Q13. Name some popular models other than Bag of words?
Latent semantic indexing and word2vec. (There is more to explore in NLP: advancements such as Google's BERT use a transformer network in preference to a CNN or RNN.)
Q14. Explain briefly about word2vec?
Word2Vec embeds words in a lower-dimensional vector space using a shallow neural network. The result is a set of word vectors where vectors close together in vector space have similar meanings based on context, and word vectors distant from each other have differing meanings.
For example, apple and orange would be close together and apple and gravity would be relatively far. There are two versions of this model based on skip-grams (SG) and continuous-bag-of-words (CBOW).
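A sketch of how training examples are formed for the two variants, assuming a symmetric context window. Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. The shallow neural network that learns the vectors from these examples is omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center word, context word) pair per neighbour."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (list of context words, center word) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

tokens = ["the", "cat", "sat", "on", "mat"]
print(skipgram_pairs(tokens, window=1)[:3])
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(cbow_examples(tokens, window=1)[1])
# (['the', 'sat'], 'cat')
```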
Q15. What are some popular Python libraries used for NLP?
Stanford's CoreNLP (a Java toolkit usable from Python through wrapper libraries), spaCy, NLTK and TextBlob.
Q16. What is Latent Semantic Indexing?
Latent semantic indexing (LSI) is a mathematical technique to extract information from unstructured data. It is based on the principle that words used in the same context tend to carry similar meanings.
To identify relevant (concept) components, in other words to group words into classes that represent concepts or semantic fields, this method applies Singular Value Decomposition (SVD) to the term-document matrix. As the name suggests, this matrix has words as rows and documents as columns. LSI is computationally heavy compared to other models, but it gives an NLP model better contextual awareness, which is relatively closer to natural language understanding (NLU).
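A minimal LSI sketch, assuming NumPy is available and using raw counts (no TF-IDF weighting) in the term-document matrix. The four documents are made up. After truncating the SVD to k = 2 concepts, each document becomes a 2-dimensional vector, and documents about the same topic end up pointing in the same direction.

```python
import numpy as np

def term_document_matrix(docs):
    """Rows are vocabulary terms, columns are documents, entries are raw counts."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            matrix[index[word], j] += 1
    return matrix, vocab

def lsi_doc_vectors(matrix, k):
    """Truncated SVD: keep the top-k concepts; one k-dim row per document."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["cat sat mat", "cat mat", "stock market fell", "market"]
matrix, vocab = term_document_matrix(docs)
doc_vectors = lsi_doc_vectors(matrix, k=2)
print(round(cosine(doc_vectors[0], doc_vectors[1]), 3))  # same topic
print(round(abs(cosine(doc_vectors[0], doc_vectors[2])), 3))  # different topics
```

In a real corpus the concepts overlap, so LSI can also relate documents that share no words but share co-occurring vocabulary.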
Q17. How is feature extraction done in NLP?
The features of a sentence can be used to conduct sentiment analysis or document classification. For example, if a product review on Amazon or a movie review on IMDB contains words like 'good' or 'great' more often, it could be concluded (classified) that the review is positive.
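A minimal lexicon-based sketch of that rule, assuming small hypothetical lists of positive and negative words. It simply compares how many words from each list appear; it ignores negation ("not good") and word order, which trained classifiers handle better.

```python
import re

# Hypothetical sentiment word lists
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "boring"}

def classify_review(review):
    """Label a review by counting positive versus negative words."""
    words = re.findall(r"[a-z]+", review.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(classify_review("Great story and good acting, a bit boring at times."))
# positive
```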
Bag of words is a popular model used for feature generation. A sentence is tokenized, and the individual words are then collected into a vocabulary that can be explored or exploited for certain characteristics (for example, the number of times a certain word appears).
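A minimal bag-of-words sketch, assuming lowercase alphabetic tokens and raw counts. Each sentence becomes a vector with one count per vocabulary word; word order is discarded.

```python
import re
from collections import Counter

def bag_of_words(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["The movie was good", "The movie was very very good"])
print(vocab)    # ['good', 'movie', 'the', 'very', 'was']
print(vectors)  # [[1, 1, 1, 0, 1], [1, 1, 1, 2, 1]]
```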