[Algorithm](EN) Find top n elements using insertion sort in C/C++

Update(2019.07.25): Add insertions codes

Use insertion sort to find top n elements in array.


Environment and Prerequisite

  • C++


Insertion Sort?

Concept

  • Insertion Sort: Traverse all elements from front to end and insert each element to front sorted array part. As you can see below picture, consider the front array value is sorted. Then traverse all rest elements and insert each element to front sorted array part.
  • Time Complexity: O(n^2), Space Complexity: O(1)(doesn’t need extra spaces)
  • Below GIF shows insertion sort.


Find top N elements in datas.

Solution

  • There are many solutions.
  • We can use heap, selection after quick sorting… etc there many solutions. However when it is a case that select top n elements after all processings are done it doesn’t need to sort all elements. We only need top n elements so insertion sort could the best solution.
  • If it is not a case that all processings are done, heap could be the solution. That is why we call heap as priority queue.


Implmentation Goal

  • We will find top n elements in nodes which have string key and integer value.
  • Sorting rule will be 1. Integer value 2. Lexicographical order(if integer value are same)


Implementation

Common

  • Node is a datas that need to be sorted. Node have string key and integer value.
  • nodes: array with Node data type.
  • ranks: array which has top n elements of nodes.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];


String Comparison

  • Common string compare and copy function.
void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}


Compare Function

  • i and j each represents index of nodes array.
  • Compare rule: 1. Integer value 2. Lexicographical order(if integer value are same)
  • if i is bigger than j then it returns 1, it they are same then returns 0 if not then return -1
// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}


Init

  • Init nodes array and ranks array values.
void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}


Insert node data to ranks array

  • Basically codes are base on insertion sort.
  • There are insertion sort 1 and insertion sort 2. First one is find place from back to front and second one is find place from front to back. First one is a bit simple.
  • Doing insertion sort in ranks array when node_id in nodes array passes to function.
  • Return if node_id of node is smaller than last element of ranks array.
  • In first case, if it is included in top N elements then find place from back by comparing.
  • In second case, if it is included in top N elements then find place from front by comparing and move rest elements in turn.
void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

    // insertion sort 1
	ranks[NUM_OF_RANKS - 1] = node_id;
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int tmp = ranks[rank_index];
		int j = rank_index - 1;

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
    /*
	// insertion sort 2
	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}*/
}


Main Function

  • First, sort NUM_OF_RANKS elements. At first, there are trash values in ranks array so it needs to be sorted NUM_OF_RANKS elements.
  • Call find_ranks(node_id) function and print results.
int main() {

	init();

	// Initial array is blank
	// Add NUM_OF_RANKS's amount elements.

	// Add NUM_OF_RANKS elements.
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {
		ranks[node_id] = node_id;
	}

	// insertion sort 1
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int j = rank_index - 1;
		int tmp = ranks[rank_index];

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	for (int node_id = 1; node_id < NUM_OF_RANKS; ++node_id) {
		for (int rank_index = node_id; rank_index < NUM_OF_RANKS; ++rank_index) {

			// if nodes value of index node_id is bigger than current ranks[rank_index]
			if (my_compare(ranks[rank_index], node_id) < 0) {
				// move each values from that point
				for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
					ranks[k] = ranks[k - 1];
				}
				// insert node_id to ranks array
				ranks[rank_index] = node_id;
				break;
			}
		}
	}
	*/
	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d: \n", i + 1);
		printf("ID: %d\n", ranks[i]);
		printf("Value: %d\n", nodes[ranks[i]].value);
		printf("Key: %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];

void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}

// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}

void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}

void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

    // insertion sort 1
	ranks[NUM_OF_RANKS - 1] = node_id;
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int tmp = ranks[rank_index];
		int j = rank_index - 1;

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}*/
}

int main() {

	init();

	// Initial array is blank
	// Add NUM_OF_RANKS's amount elements.

	// Add NUM_OF_RANKS elements.
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {
		ranks[node_id] = node_id;
	}

	// insertion sort 1
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int j = rank_index - 1;
		int tmp = ranks[rank_index];

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	for (int node_id = 1; node_id < NUM_OF_RANKS; ++node_id) {
		for (int rank_index = node_id; rank_index < NUM_OF_RANKS; ++rank_index) {

			// if nodes value of index node_id is bigger than current ranks[rank_index]
			if (my_compare(ranks[rank_index], node_id) < 0) {
				// move each values from that point
				for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
					ranks[k] = ranks[k - 1];
				}
				// insert node_id to ranks array
				ranks[rank_index] = node_id;
				break;
			}
		}
	}
	*/
	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d: \n", i + 1);
		printf("ID: %d\n", ranks[i]);
		printf("Value: %d\n", nodes[ranks[i]].value);
		printf("Key: %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Reference