[Algorithm](EN) Find top n elements using insertion sort in C/C++

Use insertion sort to find top n elements in array.


Environment and Prerequisite

  • C++


Insertion Sort?

Concept

  • Insertion Sort : Traverse all elements from front to end and insert each element to front sorted array part. As you can see below picture, consider the front array value is sorted. Then traverse all rest elements and insert each element to front sorted array part.
  • Time Complexity : O(n^2), Space Complexity : O(1)(doesn’t need extra spaces)
  • Below GIF shows insertion sort.


Find top N elements in datas.

Solution

  • There are many solutions.
  • We can use heap, selection after quick sorting… etc there many solutions. However when it is a case that select top n elements after all processings are done it doesn’t need to sort all elements. We only need top n elements so insertion sort could the best solution.
  • If it is not a case that all processings are done, heap could be the solution. That is why we call heap as priority queue.


Implmentation Goal

  • We will find top n elements in nodes which have string key and integer value.
  • Sorting rule will be 1. Integer value 2. Lexicographical order(if integer value are same)


Implementation

Common

  • Node is a datas that need to be sorted. Node have string key and integer value.
  • nodes : array with Node data type.
  • ranks : array which has top n elements of nodes.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];


String Comparison

  • Common string compare and copy function.
void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}


Compare Function

  • i and j each represents index of nodes array.
  • Compare rule : 1. Integer value 2. Lexicographical order(if integer value are same)
  • if i is bigger than j then it returns 1, it they are same then returns 0 if not then return -1
// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}


Init

  • Init nodes array and ranks array values.
void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}


Insert node data to ranks array

  • Doing insertion sort in ranks array when node_id in nodes array passes to function.
  • Return if node_id of node is smaller than last element of ranks array.
  • If it is included in top n elements(bigger than last element of ranks array), then find its position in ranks array and insert it. Push other elements in ranks array if needed.
void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}
}


Main Function

  • First, sort NUM_OF_RANKS elements. At first, there are trash values in ranks array so it needs to be sorted NUM_OF_RANKS elements.
  • Call find_ranks(node_id) function and print results.
int main() {

	init();

	// insert NUM_OF_RANKS size of data to ranks array
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {

		for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {
			// if ranks array is empty then insert node_id
			if (ranks[rank_index] == -1) {
				ranks[rank_index] = node_id;
				break;
			}
			else {
				// if nodes value of index node_id is bigger than current ranks[rank_index]
				if (my_compare(ranks[rank_index], node_id) < 0) {
					// move each values from that point
					for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
						ranks[k] = ranks[k - 1];
					}
					// insert node_id to ranks array
					ranks[rank_index] = node_id;
					break;
				}
			}
		}
	}

	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d : \n", i + 1);
		printf("ID : %d\n", ranks[i]);
		printf("Value : %d\n", nodes[ranks[i]].value);
		printf("Key : %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];

void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}

// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}

void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}

void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}
}

int main() {

	init();

	// insert NUM_OF_RANKS size of data to ranks array
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {

		for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {
			// if ranks array is empty then insert node_id
			if (ranks[rank_index] == -1) {
				ranks[rank_index] = node_id;
				break;
			}
			else {
				// if nodes value of index node_id is bigger than current ranks[rank_index]
				if (my_compare(ranks[rank_index], node_id) < 0) {
					// move each values from that point
					for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
						ranks[k] = ranks[k - 1];
					}
					// insert node_id to ranks array
					ranks[rank_index] = node_id;
					break;
				}
			}
		}
	}

	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d : \n", i + 1);
		printf("ID : %d\n", ranks[i]);
		printf("Value : %d\n", nodes[ranks[i]].value);
		printf("Key : %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Reference

[Linux] 쉘 스크립트에서 멀티프로세스(혹은 스레드) 기능 사용하기

> 백그라운드로 명령어를 실행해서 병렬적으로 실행되는 멀티 프로세스 환경을 만들어보자.## 환경- Linux 기반 시스템- Bash shell(/bin/bash)## 멀티프로세스? 병렬처리? 멀티스레드? 백그라운드 프로세스?- 여기서 진행할 방식...… Continue reading