Posts

NLP - Preprocessing of Text

As data scientists, we may use NLP for sentiment analysis (classifying words as having a positive or negative connotation) or to make predictions in classification models, among other things. Typically, whether we are given the data or have to scrape it, the text will be in its natural human format of sentences, paragraphs, tweets, etc. From there, before we can dig into analyzing it, we have to do some cleaning to break the text down into a format the computer can easily understand. These steps transfer text from human language into a machine-readable format for further processing. We will also discuss text preprocessing tools. After a text is obtained, we start with text normalization. Text normalization includes:
- Tokenization
- Removing stop words
- Removing sparse terms and particular words
- Stemming
Tokenization: the process of breaking up the original raw text into component pieces, otherwise known as tokens.
Remove stop words: “Stop words” a...
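The excerpt cuts off here, so no code from the post is visible; the following is only a minimal sketch of the normalization steps above, assuming NLTK (with its punkt and stopwords data) as the preprocessing tool:

```python
# Minimal sketch of the normalization steps: tokenize, remove stop words, stem.
# Assumes NLTK is installed and its tokenizer/stopword data can be downloaded.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

raw_text = "The cats are running quickly through the gardens!"

# Tokenization: break the raw text into component pieces (tokens).
tokens = word_tokenize(raw_text.lower())

# Remove stop words and non-alphabetic tokens such as punctuation.
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]

# Stemming: reduce each remaining token to its root form.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]

print(stems)  # ['cat', 'run', 'quickli', 'garden']
```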

Natural Language Processing - Basics

Natural Language Processing: Natural Language Processing is an area of Computer Science and Artificial Intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
- Often when performing analysis, lots of data is numerical, such as sales numbers, physical measurements, or quantifiable categories.
- Computers are very good at handling direct numerical information.
- But what do we do about text data? As humans we can tell that there is a plethora of information inside of text documents.
- But a computer needs specialized processing techniques in order to understand raw text data.
- Text data is highly unstructured and can also be in multiple languages!
- Natural Language Processing attempts to use a variety of techniques in order to create some sort of structure out of raw text data (see the sketch after this excerpt).
- Some example use cases of natural l...
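The excerpt is truncated, so the example below is not from the post; it is just a rough illustration of giving raw text some numeric structure, using scikit-learn's CountVectorizer to build a bag-of-words matrix:

```python
# Illustrative sketch: turn unstructured text into a numeric structure
# with a bag-of-words (document-term) matrix from scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Computers handle numbers easily.",
    "Raw text data needs specialized processing.",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())   # learned vocabulary
print(X.toarray())                          # word counts per document
```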

The idea of Neural Network

The basic idea behind a neural network is to simulate lots of densely interconnected brain cells inside a computer so you can get it to learn things, recognize patterns, and make decisions in a human-like way. The amazing thing about a neural network is that you don't have to explicitly program it to learn: it learns all by itself, just like a brain! It's important to note that neural networks are (generally) software simulations: they're made by programming very ordinary computers, working in a very traditional fashion with their ordinary transistors and serially connected logic gates, to behave as though they're built from billions of highly interconnected brain cells working in parallel. No one has yet attempted to build a computer by wiring up transistors in a densely parallel structure exactly like the human brain. Building a Dense Layer from scratch: Single-Layer Neural Network: a single hidden layer is fed into a single output layer. The stat...
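The excerpt ends before the actual code, so the following is only a minimal from-scratch sketch, assuming NumPy and a sigmoid activation (neither is specified in the visible text), of a dense layer and a single hidden layer feeding a single output layer:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

class Dense:
    """A fully connected (dense) layer: output = activation(x @ W + b)."""

    def __init__(self, n_inputs, n_units, activation=sigmoid):
        # Small random weights and zero biases.
        self.W = np.random.randn(n_inputs, n_units) * 0.01
        self.b = np.zeros(n_units)
        self.activation = activation

    def forward(self, x):
        z = x @ self.W + self.b       # weighted sum of the inputs
        return self.activation(z)     # apply the non-linearity

# Single-layer network: one hidden layer fed into one output layer.
hidden = Dense(n_inputs=2, n_units=3)
output = Dense(n_inputs=3, n_units=1)

x = np.array([[-1.0, 2.0]])           # one example with two input features
y_hat = output.forward(hidden.forward(x))
print(y_hat.shape)                    # (1, 1)
```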

Importance of Activation Functions

The purpose of activation functions is to introduce non-linearities into the network. Non-linear functions allow the network to approximate arbitrarily complex functions, which is what makes neural networks so powerful. For example, take a trained network with weights W and only two inputs, x1 and x2. If we feed in a new input x1 = -1 and x2 = 2 and compute the linear combination of the inputs and weights before applying any non-linearity, we get -6. When we then apply a sigmoid non-linearity, the result collapses into the range between 0 and 1: the sigmoid maps anything greater than 0 to a value above 0.5 and anything less than 0 to a value below 0.5. The reason we use non-linear activation functions is that once we deal with networks with thousands or millions of parameters in high-dimensional spaces, visualizing these kinds of plots becomes extremely difficult.
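The post does not show the weights that produce the -6 above; the values below are hypothetical, chosen only so that the weighted sum of x1 = -1 and x2 = 2 comes out to -6, to make the sigmoid step concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias (not given in the post), picked so that
# the linear combination for x1 = -1, x2 = 2 equals -6.
w = np.array([3.0, -2.0])
b = 1.0
x = np.array([-1.0, 2.0])

z = np.dot(w, x) + b   # 1 + 3*(-1) + (-2)*2 = -6
y = sigmoid(z)         # collapses z into (0, 1)

print(z, y)            # -6.0  ~0.0025 (below 0.5 because z < 0)
```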

Introduction to Deep Learning

Deep learning is an incredibly powerful branch of Machine Learning that is based on learning data representations instead of task-specific algorithms. Deep Learning uses networks in which data is transformed through a number of layers before producing the output. Traditional machine learning algorithms typically try to define a set of rules or features in the data, and these are usually hand-engineered, which is why they tend to be brittle in practice. For example, if you want to perform facial detection, the first thing you have to do is recognize the mouth, eyes, ears, and so on in the image; if you find all of them, then you can say there is a face in the image. To recognize each of these things you have to define a set of features. The key idea of deep learning is that the features are learned directly from raw data: you just take a bunch of images of faces, and the deep learning algorithm is going to develop some hierarchical representation of first dete...
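The excerpt stops mid-sentence; as a rough sketch (not from the post) of data being transformed through a number of layers before producing an output, here is a minimal Keras model for a hypothetical face / no-face classifier on flattened 64x64 images:

```python
# Minimal sketch of a layered network using the Keras API from TensorFlow.
# The input size and layer widths are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64 * 64,)),       # flattened 64x64 grayscale image
    layers.Dense(128, activation="relu"),   # lower-level features
    layers.Dense(64, activation="relu"),    # higher-level features
    layers.Dense(1, activation="sigmoid"),  # face / no face
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```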