Posts

Showing posts from May, 2020

NLP - Preprocessing of Text

As data scientists, we may use NLP for sentiment analysis (classifying text as having a positive or negative connotation), to make predictions in classification models, and for other tasks. Typically, whether we’re given the data or have to scrape it, the text will be in its natural human format of sentences, paragraphs, tweets, and so on. Before we can dig into the analysis, we have to do some cleaning to break the text down into a format the computer can easily understand. These preprocessing steps convert text from human language into a machine-readable format for further processing. We will also discuss text preprocessing tools.

After a text is obtained, we start with text normalization, which includes:
- Tokenization
- Remove stop words
- Remove sparse terms and particular words
- Stemming

Tokenization: the process of breaking up the original raw text into component pieces, known as tokens.

Remove stop words: “Stop words” a...
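The normalization steps listed above can be sketched in plain Python without any third-party library. This is a minimal sketch under stated assumptions: the stop-word list is a tiny hand-picked sample (libraries such as NLTK ship much fuller lists), "sparse" is assumed to mean a term that appears in fewer than `min_docs` documents, and `crude_stem` is a hypothetical suffix-stripping rule that is far simpler than a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "but",
              "of", "to", "in", "it", "this", "was"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def remove_sparse_terms(docs, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if doc_freq[t] >= min_docs] for doc in docs]

def crude_stem(token):
    """Hypothetical suffix stripper: removes one common English ending,
    but only if at least three characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize(texts, min_docs=2):
    """Tokenize, remove stop words, drop sparse terms, then stem each document."""
    docs = [remove_stop_words(tokenize(t)) for t in texts]
    docs = remove_sparse_terms(docs, min_docs)
    return [[crude_stem(t) for t in doc] for doc in docs]

tweets = [
    "The movie was amazing and the acting is great",
    "Amazing acting, terrible ending",
    "The ending was predictable but the acting was great",
]
print(normalize(tweets))
# → [['amaz', 'act', 'great'], ['amaz', 'act', 'end'], ['end', 'act', 'great']]
```

Stems such as `amaz` need not be dictionary words; stemming only has to map related word forms to the same key. In practice, a library stemmer such as NLTK's `PorterStemmer` would replace `crude_stem`.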

Natural Language Processing - Basics

Natural Language Processing: Natural Language Processing is an area of Computer Science and Artificial Intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
- Often when performing analysis, much of the data is numerical, such as sales numbers, physical measurements, and quantifiable categories.
- Computers are very good at handling direct numerical information.
- But what do we do about text data? As humans, we can tell that there is a plethora of information inside text documents.
- But a computer needs specialized processing techniques in order to understand raw text data.
- Text data is highly unstructured and can also be in multiple languages!
- Natural Language Processing attempts to use a variety of techniques in order to create some sort of structure out of raw text data.
- Some example use cases of natural l...
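One common way to create numerical structure out of raw text is a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary, which is the kind of direct numerical input computers handle well. The sketch below is a minimal illustration; the function name `bag_of_words` and its simple regex tokenizer are assumptions for this example (scikit-learn's `CountVectorizer` is a fuller implementation of the same idea).

```python
import re
from collections import Counter

def bag_of_words(texts):
    """Turn raw texts into a sorted vocabulary and one count vector per text."""
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; 0 if the word is absent from this text.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["I love this phone", "I hate this phone, I really do"])
print(vocab)    # → ['do', 'hate', 'i', 'love', 'phone', 'really', 'this']
print(vectors)  # → [[0, 0, 1, 1, 1, 0, 1], [1, 1, 2, 0, 1, 1, 1]]
```

The representation discards word order (hence "bag"), which is a deliberate trade-off: it loses context but turns unstructured text into fixed-length numeric rows that standard classifiers can consume.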