Introduction to Data Structures
A data structure can be defined as a structure that holds data. Up to this point, you should know about the different types of data that are available when we program. This data is critical in performing different operations and calculations. But so far, this data had been used without allocating any structure to it. Your programs had data and it was not organized. Has there been any data that you came across that had any structure to it? The basic data, having a bit of structure to it, that you came across was an array.
An array is a continuous sequence of one type of data. That data taken as a whole is known as an array and it is used as such. If you want to declare an array of integers of size 10, you will write it as:
int data[10];
Here, we have declared an array of integers, which is of size 10. This means that in the memory location, there are 10 integers, one after another, in a sequence and that collection is collectively called data. Now if we want to initialize this array, we can write:
for(int i = 0; i<10;i++>
{
data[i] = 0;
}
This code will initialize all the integers in the data array with the value 0. You can clearly see that, due to the presence of an array, the initialization of data can be easily done in a loop by writing a few lines of code. Thus having a proper structure for the data can be beneficial in performing computations. There are many other data structures that can be created and used in different programs. Some data structures have a specific name and are commonly used in programs. Others can be created on the basis of need, thus simplifying the way a person manages the data. These types of data are commonly known as Abstract Data Types or ADTs. But before we learn about them, we should discuss a specific construct of C/C++ and other object oriented languages, which allows the making of such ADTs possible. This is commonly known as a class.
Class & Structures
A class in C++ is an encapsulation of data members and functions that manipulate the data. What that means is, it is a structure that contains data which is hidden from the users and provides appropriate functions for changing the data under predefined conditions. A class can be defined like this:
class DataManager
{
private:
int x;
int y;
public:
DataManager() //Constructor
{
x = 0;
y = 0;
}
~ DataManager() //destructor
{
}
int Add()
{
return x+y;
}
};
struct DataManager
{
private:
int x;
int y;
public:
DataManager() //Constructor
{
x = 0;
y = 0;
}
~ DataManager() //destructor
{
}
int Add()
{
return x+y;
}
};
Now to explain this class and a structure:
In this class there are two variables x, y holding the data. There is a method/operation that performs the addition of both x and y and returns their sum. This operation is class specific and cannot be used without this class. Then there is a constructor method which, as you can see, has the same name as the class and does not return anything. There is also another function which has the same name as the class but a ~ comes before the name; this is known as a destructor. Whenever a class is declared a constructor is used to initialize its variables; a destructor is used to free up memory when the class variable is destroyed (more on that later).
Here, we have actually declared our own data type, known as DataManager. We can create its variables by simply writing:
DataManager manager;
Whenever we declare a variable of our defined class, that variable is normally referred as an object. So we can say that when we declare a class, we are defining an object which we are going to use in our program. Classes are the backbone of Object Oriented Programming, but here we are going to use them for defining Abstract Data Types.
Abstract Data Types
An abstract data type (ADT) is characterized by the following properties:
1. It exports a type.
Meaning that once an ADT has been defined, then variables can be declared with that ADT as the type
2. It exports a set of operations. This set is called interface.
It contains a set of functions that are made public for the users to use.
3. Operations of the interface are the one and only access mechanism to the type's data structure.
The defined functions are the only way by which that ADT can be used and manipulated
4. Axioms and preconditions define the application domain of the type.
There is a set of preconditions which define how the type should be used and what sort of conditions would produce what sort of a data.
In a high level language, say Java, ADTs are already specified, but since we are going to work in C++, we are going to declare the ADTs ourselves. The first step in this direction is to decide what we are going to keep as the internal storage container? We are going to start off with arrays and then later, move on to structures for declaring our internal storage containers. Keep in mind that for an array, the memory that has been allocated in the beginning, remains the same. Thus, we are going to find a work around this limitation in due time.
Bag
§ items can be added, removed, accessed
§ no implied order to the items
§ duplicates allowed
Set
§ same as a bag, except duplicate elements not allowed
§ It supports operations such as union, intersection, difference, subset
Lists
Items have a proper position in a list. A list can be of two types
1. Array List
2. Linked List
There is no specific way by which the data in a list can be accessed.
Stack
A collection of data where only the last element of the inserted data can be accessed.
§ Push data
§ Pop data
§ An indicator that always points to the top most element
§ Commonly known as a LIFO (Last In First Out) ADT
Queue
In a queue, the item that has been inserted in the beginning is accessed first
§ Insert data (Enqueue)
§ Remove data (Dequeue)
§ Commonly known as a FIFO (First In First Out) ADT
§ Keeps track of the first and last added item
There are more ADTs available, but we are going to keep the focus on Stack and Queue for now. We will start off by first implementing Stack and Queue by using an array and later implement the same by using a linked list.
Arrays
An ordered data item is called an array. It is a convenient way of accumulating data of the same type. All the data can be referenced by a single name and the individual data can be accessed by the appropriate index at which that data is present.
Declaring an array
An array is declared just like any other variable, the only difference being that the name is followed by the amount of data that variable is going to store. An integer array called data of length ‘10’ can be declared as:
int data[10];
It should be noted that every array begins with the index 0 and ends with the index n-1 where n is the length of the array. So for example, if I want to access the 5th element of the data array I will call data[4] and similarly, if I want to access the last element, I will call data[9] as the total length is 10. Calling data[10] will cause an error as it does not exist since it is referring to the 11th value of the data array which is actually not present.
Initializing an Array
You can initialize an array with different values like this:
int data[10] = {0,1,2,3,4,5,6,7,8,9};
but this can become troublesome for arrays with thousands of data. In such a case a simple for loop can suffice:
for(int i = 0; i<10;i++>
{
data[i] = 0;
}
This array can be graphically represented as:
| data[0] | 0 |
| data[1] | 0 |
| data[2] | 0 |
| data[3] | 0 |
| data[4] | 0 |
| data[5] | 0 |
| data[6] | 0 |
| data[7] | 0 |
| data[8] | 0 |
| data[9] | 0 |
Multidimensional Arrays
Sometimes you need to store in multi-dimensions, such is the case with matrices. This can easily be done by declaring a multidimensional array:
int matrix[2][2];
this matrix can be initialized as:
int matrix[2][2] = {1,0,0,1};
to make it look clearer, we can write this as:
int matrix[2][2] = { {1,0},{0,1}};
Graphically, it can be shown like this:
| matrix[0][0] = 1 | matrix[0][1] = 0 |
| matrix[1][0] = 0 | matrix[1][1] = 1 |
Here we have declared an identity matrix. We can also use nested for loops for initializing matrices:
for(int i = 0; i<10;i++>
{
for(int j = 0; j<10;j++>
{
data[i][j] = 0;
}
}
Arrays as Pointers
So far we have discussed the usage of arrays. Now we discuss what an array really is and how it can be used as a pointer in different situations.
Using an array name as a pointer
An array name is really a pointer to the first element of the array. For example, the following is legal.
int b[100]; // b is an array of 100 ints.
int* p; // p is a pointer to an int.
p = b; // Assigns the address of first element of b to p.
p = &b[0]; // Exactly the same assignment as above.
Array name is a const pointer
When you declare an array, the name is a pointer, which cannot be altered. In the previous example, you could never make this assignment. This means that the name of the array is actually pointing towards the address of the first element of the array series.
p = b; // Legal -- p is not a constant.
b = p; // ILLEGAL because b is a constant.
Pointer arithmetic
"Meaningful" arithmetic operations are allowed on pointers.
Add or subtract integers to/from a pointer. The result is a pointer.
Subtract two pointers to the same type. The result is an int.
Multiplying, adding two pointers, etc. don't make sense.
Pointer addition and element size
When you add an integer to a pointer, the integer is multiplied by the element size of the type that the pointer points to.
// Assume sizeof(int) is 4.
int b[100]; // b is an array of 100 ints.
int* p; // p is a a pointer to an int.
p = b; // Assigns address of first element of b. i.e., &b[0]
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). i.e., &b[1]
Equivalence of subscription and dereference
Because of the way C/C++ uses pointers and arrays, you can reference an array element either by subscription or * (the unary dereference operator).
int b[100]; // b is an array of 100 ints.
int* p; // p is a a pointer to an int.
p = b; // Assigns address of first element of b i.e., &b[0]
*p = 14; // Same as b[0] = 14
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). Ie, &b[1]
*p = 22; // Same as b[1] = 22;
Note: Incrementing a pointer using the unary ++ operator, either pre- or post-, increments the address it stores by the amount sizeof(type) where "type" is the type of the object pointed to. (i.e. 4 for an int)
Array of pointers
We can also define an array of pointers like this:
int* arrayOfPointers[10];
Each element of this array can point to an integer memory location. That is, every item of this array will store an address to an integer.
Pointers and Dynamic Allocation of Memory
int* a = NULL; // Pointer to int, initialize to nothing.
int n; // Size needed for array
cin >> n; // Read in the size
a = new int[n]; // Allocate n ints and save ptr in a.
delete [] a; // When done, free memory pointed to by a.
a = NULL; // Clear a to prevent using invalid memory reference.
It must be kept in mind that the same pointer can be used to point towards a single integer value like this:
int* a = new int; //Allocating memory to the pointer
*a = 10; //Putting the value 10 at the memory location
delete a; //Deallocating memory, there are no square brackets
a = NULL; //Clearing a
Parallel Array
In computing , a parallel array is a data structure for representing arrays of records. It keeps a separate, homogeneous array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory. For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.
int ages[] = {0, 17, 2, 52, 25};
char *names[] = {"None", "Mike", "Billy", "Tom", "Stan"};
int parent[] = {0 /*None*/, 3 /*Tom*/, 1 /*Mike*/, 0 /*None*/, 3 /*Tom*/};
for(i = 1; i <= 4; i++)
{
printf("Name: %s, Age: %d, Parent: %s \n", names[i], ages[i], names[parent[i]]);
No comments:
Post a Comment