Update(2019.07.25): Add insertions codes

Use insertion sort to find top n elements in array.


Environment and Prerequisite

  • C++


Insertion Sort?

Concept

  • Insertion Sort: Traverse all elements from front to end and insert each element to front sorted array part. As you can see below picture, consider the front array value is sorted. Then traverse all rest elements and insert each element to front sorted array part.
  • Time Complexity: O(n^2), Space Complexity: O(1)(doesn’t need extra spaces)
  • Below GIF shows insertion sort.


Find top N elements in datas.

Solution

  • There are many solutions.
  • We can use heap, selection after quick sorting… etc there many solutions. However when it is a case that select top n elements after all processings are done it doesn’t need to sort all elements. We only need top n elements so insertion sort could the best solution.
  • If it is not a case that all processings are done, heap could be the solution. That is why we call heap as priority queue.


Implmentation Goal

  • We will find top n elements in nodes which have string key and integer value.
  • Sorting rule will be 1. Integer value 2. Lexicographical order(if integer value are same)


Implementation

Common

  • Node is a datas that need to be sorted. Node have string key and integer value.
  • nodes: array with Node data type.
  • ranks: array which has top n elements of nodes.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];


String Comparison

  • Common string compare and copy function.
void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}


Compare Function

  • i and j each represents index of nodes array.
  • Compare rule: 1. Integer value 2. Lexicographical order(if integer value are same)
  • if i is bigger than j then it returns 1, it they are same then returns 0 if not then return -1
// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}


Init

  • Init nodes array and ranks array values.
void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}


Insert node data to ranks array

  • Basically codes are base on insertion sort.
  • There are insertion sort 1 and insertion sort 2. First one is find place from back to front and second one is find place from front to back. First one is a bit simple.
  • Doing insertion sort in ranks array when node_id in nodes array passes to function.
  • Return if node_id of node is smaller than last element of ranks array.
  • In first case, if it is included in top N elements then find place from back by comparing.
  • In second case, if it is included in top N elements then find place from front by comparing and move rest elements in turn.
void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

    // insertion sort 1
	ranks[NUM_OF_RANKS - 1] = node_id;
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int tmp = ranks[rank_index];
		int j = rank_index - 1;

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
    /*
	// insertion sort 2
	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}*/
}


Main Function

  • First, sort NUM_OF_RANKS elements. At first, there are trash values in ranks array so it needs to be sorted NUM_OF_RANKS elements.
  • Call find_ranks(node_id) function and print results.
int main() {

	init();

	// Initial array is blank
	// Add NUM_OF_RANKS's amount elements.

	// Add NUM_OF_RANKS elements.
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {
		ranks[node_id] = node_id;
	}

	// insertion sort 1
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int j = rank_index - 1;
		int tmp = ranks[rank_index];

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	for (int node_id = 1; node_id < NUM_OF_RANKS; ++node_id) {
		for (int rank_index = node_id; rank_index < NUM_OF_RANKS; ++rank_index) {

			// if nodes value of index node_id is bigger than current ranks[rank_index]
			if (my_compare(ranks[rank_index], node_id) < 0) {
				// move each values from that point
				for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
					ranks[k] = ranks[k - 1];
				}
				// insert node_id to ranks array
				ranks[rank_index] = node_id;
				break;
			}
		}
	}
	*/
	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d: \n", i + 1);
		printf("ID: %d\n", ranks[i]);
		printf("Value: %d\n", nodes[ranks[i]].value);
		printf("Key: %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_KEY_LEN 13
#define MIN_KEY_LEN 5
#define MAX_NUM_OF_DATA 5000
#define MAX_VALUE_SIZE 5000
#define NUM_OF_RANKS 5

using namespace std;

struct Node {
	char key[MAX_KEY_LEN];
	int value;
};

// data
Node nodes[MAX_NUM_OF_DATA];

// array for saving top NUM_OF_RANKS of nodes index
int ranks[NUM_OF_RANKS];

void my_str_cpy(char * dest, const char * src) {
	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';
}

int my_str_cmp(const char * str1, const char * str2) {
	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++; str2++;
	}
	return *str1 - *str2;
}

// compare nodes[i] and nodes[j] value ans key.
// nodes[i] > nodes[j] => return 1;
// nodes[i] == nodes[k] => return 0;
// nodes[i] < nodes[j] => return -1;
int my_compare(int i, int j) {
	if (nodes[i].value > nodes[j].value ||
		(nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) < 0)) {
		return 1;
	}
	else if (nodes[i].value == nodes[j].value && my_str_cmp(nodes[i].key, nodes[j].key) == 0) {
		return 0;
	}
	else {
		return -1;
	}
}

void init() {

	// ranks array init
	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		ranks[i] = -1;
	}

	// for random value
	srand(time(NULL));

	// init integer values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		nodes[i].value = rand() % MAX_VALUE_SIZE + 1;
	}

	// init string key values
	for (int i = 0; i < MAX_NUM_OF_DATA; ++i) {
		for (int j = 0; j < MAX_KEY_LEN - 1; ++j) {
			nodes[i].key[j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		nodes[i].key[rand() % MIN_KEY_LEN + MIN_KEY_LEN] = '\0';
	}
}

void find_ranks(int node_id) {

	// return if nodes[node_id]'s value is smaller than last element of ranks array
	if (my_compare(ranks[NUM_OF_RANKS - 1], node_id) > 0) {
		return;
	}

    // insertion sort 1
	ranks[NUM_OF_RANKS - 1] = node_id;
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int tmp = ranks[rank_index];
		int j = rank_index - 1;

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	// traverse ranks array
	for (int rank_index = 0; rank_index < NUM_OF_RANKS; ++rank_index) {

		// if nodes value of index node_id is bigger than current ranks[rank_index]
		if (my_compare(ranks[rank_index], node_id) < 0) {
			// move each values from that point
			for (int j = NUM_OF_RANKS - 1; j > rank_index; --j) {
				ranks[j] = ranks[j - 1];
			}
			// insert node_id to ranks array
			ranks[rank_index] = node_id;
			break;
		}
	}*/
}

int main() {

	init();

	// Initial array is blank
	// Add NUM_OF_RANKS's amount elements.

	// Add NUM_OF_RANKS elements.
	for (int node_id = 0; node_id < NUM_OF_RANKS; ++node_id) {
		ranks[node_id] = node_id;
	}

	// insertion sort 1
	for (int rank_index = 1; rank_index < NUM_OF_RANKS; ++rank_index) {
		int j = rank_index - 1;
		int tmp = ranks[rank_index];

		while(j >= 0 && my_compare(ranks[j], tmp) < 0){
			ranks[j+1] = ranks[j];
			j--;
		}
		ranks[j+1] = tmp;
	}
	/*
	// insertion sort 2
	for (int node_id = 1; node_id < NUM_OF_RANKS; ++node_id) {
		for (int rank_index = node_id; rank_index < NUM_OF_RANKS; ++rank_index) {

			// if nodes value of index node_id is bigger than current ranks[rank_index]
			if (my_compare(ranks[rank_index], node_id) < 0) {
				// move each values from that point
				for (int k = NUM_OF_RANKS - 1; k > rank_index; --k) {
					ranks[k] = ranks[k - 1];
				}
				// insert node_id to ranks array
				ranks[rank_index] = node_id;
				break;
			}
		}
	}
	*/
	// Find top NUM_OF_RANKS elements in nodes
	for (int node_id = NUM_OF_RANKS; node_id < MAX_NUM_OF_DATA; ++node_id) {
		find_ranks(node_id);
	}

	for (int i = 0; i < NUM_OF_RANKS; ++i) {
		printf("%d: \n", i + 1);
		printf("ID: %d\n", ranks[i]);
		printf("Value: %d\n", nodes[ranks[i]].value);
		printf("Key: %s\n", nodes[ranks[i]].key);
	}
	return 0;
}


Reference

Implement hash table in C/C++


Environment and Prerequisite

  • C++
  • Understanding of hash function and linked list


Hash Function

What is Hash Function?

  • Hash Function: Hash function is any function that can be used to map data of arbitrary size onto data of a fixed size. Sometimes hash function result could be same. In this case we call this as Collision.(H(s1) = H(s2))
  • In below picture, blue things on left are keys and each key goes into hash function and result into right side hashe values. Later on in hash table values on right side will be used as hash table array index.
  • Consider H() as hash function, then H(Jonh Smith) = 02


Hash Collision

  • Case which different key results in same hash value.
  • Consider H() as hash function and s1 and s2 as different string, then H(s1) = H(s2)
  • Solution for collision: Chaining or Open Addressing


Example

  • Below sample is simple hash function which get string and return integer value;
  • On internet there are many hash function implementations. Shift operation and prime number is widely used in hash function.
int hash(const char * str) {
	int hash = 401;
	int c;

	while (*str != '\0') {
		hash = ((hash << 4) + (int)(*str)) % MAX_TABLE;
		str++;
	}

	return hash % MAX_TABLE;
}


Hash Table

  • Data structure which save pair data key and value
  • Like below usage, it is a data structure use Lisa Smith as key and save 521-8976 as value.
  • There are already hash table implementations in each language like map in C++ and dictionary in python.
  • Theoretically, accessing time complexity is O(c). Hash table use more memory but take advantage of accessing time.
  • Use Chaining or Open Addressing for collision


Implementation

  • In this post, I use Chaining for collision.
  • Use Singly Linked List for Chaining

Common

  • Hash table implementation using linked list
  • Node is for data with key and value
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_TABLE 5 // size of table
#define MAX_KEY 8 // include null
#define MAX_DATA 12 // num of datas for hash table
#define DELETE_COUNT 6 // num of datas for deletion
#define FIND_COUNT 8 // num of datas for finding

struct Node {
	char key[MAX_KEY];
	int value;
	Node * next;
};

Node * tb[MAX_TABLE]; // hash table
char keys[MAX_DATA][MAX_KEY]; // keys
int values[MAX_DATA]; // values


Init Function

  • Make key-value pairs using random function.
void init() {

	// hash table initiation
	for (int i = 0; i < MAX_TABLE; ++i) {
		Node * cur = tb[i];
		Node * tmp;
		while (cur != NULL) {
			tmp = cur;
			cur = cur->next;
			free(tmp);
		}
		tb[i] = NULL;
	}

	// srand and seed for random function
	srand(time(NULL));

	// init values
	for (int i = 0; i < MAX_DATA; ++i) {
		values[i] = rand() % 100 + 1;
	}

	// init keys
	for (int i = 0; i < MAX_DATA; ++i) {
		for (int j = 0; j < MAX_KEY - 1; ++j) {
			keys[i][j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		keys[i][MAX_KEY - 1] = '\0';
	}

}


String Function

  • Compare and copy function.
void my_str_cpy(char * dest, const char * src) {

	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';

}

int my_str_cmp(const char * str1, const char * str2) {

	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++;
		str2++;
	}
	return *str1 - *str2;

}


Hash Function

  • Just use same function as above
  • Use shift operatoin and prime number
  • We need to divide it by MAX_TABLE for hash table array.
int hash(const char * str) {
	int hash = 401;
	int c;

	while (*str != '\0') {
		hash = ((hash << 4) + (int)(*str)) % MAX_TABLE;
		str++;
	}

	return hash % MAX_TABLE;
}


Add

  • Simpel linked list hash add implementation
  • If there is same key in hash table, then replace its value.
void add(const char * key, int value) {

	Node * new_node = (Node *)malloc(sizeof(Node));
	my_str_cpy(new_node->key, key);
	new_node->value = value;
	new_node->next = NULL;

	int index = hash(key);

	// insert if first is NULL
	if (tb[index] == NULL) {
		tb[index] = new_node;
	}
	// traverse list one by one
	// change duplicated value if not then add it to front

	else {

		Node * cur = tb[index];

		while (cur != NULL) {

			// if key is duplicated, then replace its value
			if (my_str_cmp(cur->key, key) == 0) {
				cur->value = value;
				return;
			}

			cur = cur->next;
		}

		// add to front if it is not duplicated
		new_node->next = tb[index];
		tb[index] = new_node;
	}
}


Find Value

  • Use linked list for finding.
  • If there is matching key, then save it to val and return true.
  • If there is no matching key, then return false
bool find(const char * key, int * val) {

	int index = hash(key);

	Node * cur = tb[index];

	// Find key by traversing list one by one
	while (cur != NULL) {
		if (my_str_cmp(cur->key, key) == 0) {
			*val = cur->value;
			return true;
		}
		cur = cur->next;
	}

	return false;

}


Deletion

  • Delete node if key is matched.
  • Just check the first of list. It makes easy for implementation.
bool destroy(const char * key) {

	int index = hash(key);

	// check first of list
	if (tb[index] == NULL) {
		return false;
	}

	// check first element
	if (my_str_cmp(tb[index]->key, key) == 0) {
		Node * first = tb[index];
		tb[index] = tb[index]->next;
		free(first);
		return true;
	}

	// others
	else {

		Node * cur = tb[index]->next;
		Node * prev = tb[index];

		while (cur != NULL && my_str_cmp(cur->key, key) != 0) {
			prev = cur;
			cur = cur->next;
		}

		if (cur == NULL) return false;

		prev->next = cur->next;
		free(cur);
		return true;
	}
}


Print

  • Print all key-value pairs while traversing all table and lists.
void print_hash() {

	for (int i = 0; i < MAX_TABLE; ++i) {
		if (tb[i] != NULL) {

			printf("index: %d\n", i);

			Node * cur = tb[i];

			while (cur != NULL) {
				printf("{ %s, %d }, ", cur->key, cur->value);
				cur = cur->next;
			}
			printf("\n");
		}
	}

}


Main

  • Test add, find and destroy functions.
int main() {

	char tmp_key[MAX_KEY];
	init();

	// Add

	printf("Add to hash table\n");
	for (int i = 0; i < MAX_DATA; ++i) {
		add(keys[i], values[i]);
	}

	print_hash();


	printf("\n");

	// Delete

	printf("Deleted keys: ");
	for (int i = 0; i < DELETE_COUNT; ++i) {
		my_str_cpy(tmp_key, keys[rand() % MAX_DATA]);
		printf("%s ", tmp_key);
		destroy(tmp_key);
	}
	printf("\n");

	print_hash();


	printf("\n");

	// Find

	int val;
	printf("Found: ");
	for (int i = 0; i < FIND_COUNT; ++i) {
		my_str_cpy(tmp_key, keys[rand() % MAX_DATA]);
		if (find(tmp_key, &val)) {
			printf("{ %s, %d } ", tmp_key, val);
		}
	}
	printf("\n");

	return 0;
}


Code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_TABLE 5 // size of table
#define MAX_KEY 8 // include null
#define MAX_DATA 12 // num of datas for hash table
#define DELETE_COUNT 6 // num of datas for deletion
#define FIND_COUNT 8 // num of datas for finding

struct Node {
	char key[MAX_KEY];
	int value;
	Node * next;
};

Node * tb[MAX_TABLE]; // hash table
char keys[MAX_DATA][MAX_KEY]; // keys
int values[MAX_DATA]; // values

void init() {

	// hash table initiation
	for (int i = 0; i < MAX_TABLE; ++i) {
		Node * cur = tb[i];
		Node * tmp;
		while (cur != NULL) {
			tmp = cur;
			cur = cur->next;
			free(tmp);
		}
		tb[i] = NULL;
	}

	// srand and seed for random function
	srand(time(NULL));

	// init values
	for (int i = 0; i < MAX_DATA; ++i) {
		values[i] = rand() % 100 + 1;
	}

	// init keys
	for (int i = 0; i < MAX_DATA; ++i) {
		for (int j = 0; j < MAX_KEY - 1; ++j) {
			keys[i][j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		keys[i][MAX_KEY - 1] = '\0';
	}

}

void my_str_cpy(char * dest, const char * src) {

	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';

}

int my_str_cmp(const char * str1, const char * str2) {

	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++;
		str2++;
	}
	return *str1 - *str2;

}

int hash(const char * str) {
	int hash = 401;
	int c;

	while (*str != '\0') {
		hash = ((hash << 4) + (int)(*str)) % MAX_TABLE;
		str++;
	}

	return hash % MAX_TABLE;
}

void add(const char * key, int value) {

	Node * new_node = (Node *)malloc(sizeof(Node));
	my_str_cpy(new_node->key, key);
	new_node->value = value;
	new_node->next = NULL;

	int index = hash(key);

	// insert if first is NULL
	if (tb[index] == NULL) {
		tb[index] = new_node;
	}
	// traverse list one by one
	// change duplicated value if not then add it to front

	else {

		Node * cur = tb[index];

		while (cur != NULL) {

			// if key is duplicated, then replace its value
			if (my_str_cmp(cur->key, key) == 0) {
				cur->value = value;
				return;
			}

			cur = cur->next;
		}

		// add to front if it is not duplicated
		new_node->next = tb[index];
		tb[index] = new_node;
	}
}

bool find(const char * key, int * val) {

	int index = hash(key);

	Node * cur = tb[index];

	// Find key by traversing list one by one
	while (cur != NULL) {
		if (my_str_cmp(cur->key, key) == 0) {
			*val = cur->value;
			return true;
		}
		cur = cur->next;
	}

	return false;

}

bool destroy(const char * key) {

	int index = hash(key);

	// check first of list
	if (tb[index] == NULL) {
		return false;
	}

	// check first element
	if (my_str_cmp(tb[index]->key, key) == 0) {
		Node * first = tb[index];
		tb[index] = tb[index]->next;
		free(first);
		return true;
	}

	// others
	else {

		Node * cur = tb[index]->next;
		Node * prev = tb[index];

		while (cur != NULL && my_str_cmp(cur->key, key) != 0) {
			prev = cur;
			cur = cur->next;
		}

		if (cur == NULL) return false;

		prev->next = cur->next;
		free(cur);
		return true;
	}
}

void print_hash() {

	for (int i = 0; i < MAX_TABLE; ++i) {
		if (tb[i] != NULL) {

			printf("index: %d\n", i);

			Node * cur = tb[i];

			while (cur != NULL) {
				printf("{ %s, %d }, ", cur->key, cur->value);
				cur = cur->next;
			}
			printf("\n");
		}
	}

}

int main() {

	char tmp_key[MAX_KEY];
	init();

	// Add

	printf("Add to hash table\n");
	for (int i = 0; i < MAX_DATA; ++i) {
		add(keys[i], values[i]);
	}

	print_hash();


	printf("\n");

	// Delete

	printf("Deleted keys: ");
	for (int i = 0; i < DELETE_COUNT; ++i) {
		my_str_cpy(tmp_key, keys[rand() % MAX_DATA]);
		printf("%s ", tmp_key);
		destroy(tmp_key);
	}
	printf("\n");

	print_hash();


	printf("\n");

	// Find

	int val;
	printf("Found: ");
	for (int i = 0; i < FIND_COUNT; ++i) {
		my_str_cpy(tmp_key, keys[rand() % MAX_DATA]);
		if (find(tmp_key, &val)) {
			printf("{ %s, %d } ", tmp_key, val);
		}
	}
	printf("\n");

	return 0;
}


Verification

  • Modify hash function name(hash->my_hash)
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <iostream>
#include <map>
#include <string>

#define TOTAL_TEST_CASE 100

#define MAX_TABLE 300 // size of table
#define MAX_KEY 8 // include null
#define MAX_DATA 2000 // num of datas for hash table
#define DELETE_COUNT 1000 // num of datas for deletion

using namespace std;

struct Node {
	char key[MAX_KEY];
	int value;
	Node * next;
};

Node * tb[MAX_TABLE]; // hash table
char keys[MAX_DATA][MAX_KEY]; // keys
int values[MAX_DATA]; // values

void init() {

	// hash table initiation
	for (int i = 0; i < MAX_TABLE; ++i) {
		Node * cur = tb[i];
		Node * tmp;
		while (cur != NULL) {
			tmp = cur;
			cur = cur->next;
			free(tmp);
		}
		tb[i] = NULL;
	}

	// srand and seed for random function
	srand(time(NULL));

	// init values
	for (int i = 0; i < MAX_DATA; ++i) {
		values[i] = rand() % 100 + 1;
	}

	// init keys
	for (int i = 0; i < MAX_DATA; ++i) {
		for (int j = 0; j < MAX_KEY - 1; ++j) {
			keys[i][j] = rand() % 26 + 97; // ASCII 97 ~ 122
		}
		keys[i][MAX_KEY - 1] = '\0';
	}

}

void my_str_cpy(char * dest, const char * src) {

	while (*src != '\0') {
		*dest = *src;
		dest++; src++;
	}
	*dest = '\0';

}

int my_str_cmp(const char * str1, const char * str2) {

	while (*str1 != '\0' && (*str1 == *str2)) {
		str1++;
		str2++;
	}
	return *str1 - *str2;

}

int my_hash(const char * str) {
	int hash = 401;

	while (*str != '\0') {
		hash = ((hash << 4) + (int)(*str)) % MAX_TABLE;
		str++;
	}

	return hash % MAX_TABLE;
}

void add(const char * key, int value) {

	Node * new_node = (Node *)malloc(sizeof(Node));
	my_str_cpy(new_node->key, key);
	new_node->value = value;
	new_node->next = NULL;

	int index = my_hash(key);

	// insert if first is NULL
	if (tb[index] == NULL) {
		tb[index] = new_node;
	}
	// traverse list one by one
	// change duplicated value if not then add it to front
	else {

		Node * cur = tb[index];

		while (cur != NULL) {

			// if key is duplicated, then replace its value
			if (my_str_cmp(cur->key, key) == 0) {
				cur->value = value;
				return;
			}

			cur = cur->next;
		}
		// add to front if it is not duplicated
		new_node->next = tb[index];
		tb[index] = new_node;
	}
}

bool find(const char * key, int * val) {

	int index = my_hash(key);

	Node * cur = tb[index];

	// Find key by traversing list one by one
	while (cur != NULL) {
		if (my_str_cmp(cur->key, key) == 0) {
			*val = cur->value;
			return true;
		}
		cur = cur->next;
	}

	return false;

}

bool destroy(const char * key) {

	int index = my_hash(key);

	// check first of list
	if (tb[index] == NULL) {
		return false;
	}

	// check first element
	if (my_str_cmp(tb[index]->key, key) == 0) {
		Node * first = tb[index];
		tb[index] = tb[index]->next;
		free(first);
		return true;
	}

	// others
	else {

		Node * cur = tb[index]->next;
		Node * prev = tb[index];

		while (cur != NULL && my_str_cmp(cur->key, key) != 0) {
			prev = cur;
			cur = cur->next;
		}

		if (cur == NULL) return false;

		prev->next = cur->next;
		free(cur);
		return true;
	}
}

void print_hash() {

	for (int i = 0; i < MAX_TABLE; ++i) {
		if (tb[i] != NULL) {

			printf("index: %d\n", i);

			Node * cur = tb[i];

			while (cur != NULL) {
				printf("{ %s, %d }, ", cur->key, cur->value);
				cur = cur->next;
			}
			printf("\n");
		}
	}

}

int main() {

	int test_case = 1;
	int correct = 0;


	for (test_case = 1; test_case <= TOTAL_TEST_CASE; ++test_case) {

		init();

		bool is_equal = true;

		map<string, int> m;
		map<string, int>::iterator it;


		for (int i = 0; i < MAX_DATA; ++i) {
			add(keys[i], values[i]);
		}
		for (int i = 0; i < MAX_DATA; ++i) {
			if (m.count(keys[i]) == 0) {
				m.insert(make_pair(keys[i], values[i]));
			}
			else {
				m[keys[i]] = values[i];
			}

		}

		for (int i = 0; i < MAX_DATA; ++i) {

			int tmp;
			find(keys[i], &tmp);

			if (m[keys[i]] != tmp) {
				is_equal = false;
			}
		}

		char tmp_key[MAX_KEY];
		for (int i = 0; i < DELETE_COUNT; ++i) {
			my_str_cpy(tmp_key, keys[rand() % MAX_DATA]);
			destroy(tmp_key);
			m.erase(tmp_key);
		}

		for (int i = 0; i < MAX_DATA; ++i) {

			int tmp = -1;

			if (find(keys[i], &tmp) == false && m.count(keys[i]) == 0) {
				continue;
			}

			if (find(keys[i], &tmp) == true && m.count(keys[i]) == 1 && m[keys[i]] == tmp) {
				continue;
			}
			else {
				is_equal = false;
			}

		}

		if (is_equal) correct++;

	}

	printf("Total: %d / %d\n", correct, TOTAL_TEST_CASE);

	return 0;
}


Reference

clock() 함수를 이용해 코드의 실행 시간을 측정해보자.


환경

  • C, C++


방법과 예제

사용법

  • clock(): time.h에 들어있는 함수로 프로그램에 의해 프로세서가 소비된 시간을 반환하는 함수입니다. 프로세서가 측정한 프로그램 실행시간이라 볼 수 있습니다.
  • clock_t: clock ticks의 자료를 담고 있는 자료형으로 clock()의 반환형입니다.
  • CLOCKS_PER_SEC: 초당 clock ticks의 수를 나타낸 매크로로 시스템에 따라 기본 값이 다르며 시간을 표시하기 위해 아래 예제처럼 사용합니다.
#include <stdio.h>
#include <time.h> // time.h 헤더 파일을 include 해줘야합니다.

int main(){

    clock_t start = clock(); // 시작 시간 저장

    // 필요한 코드들 작성

    clock_t end = clock(); // 코드가 끝난 시간 저장

    // 걸린 시간 출력
    // 단위: 초(second)
    // CLOCKS_PER_SEC로 나눠줘야 초단위로 나옵니다.
    printf("Time: %lf\n", (double)(end - start)/CLOCKS_PER_SEC);

    return 0;
}


예제

  • 이중 for문을 이용한 시간 측정
#include <stdio.h>
#include <time.h> // time.h 헤더 파일을 include 해줘야합니다.

#define ITERATION_TIME 100000

int main (){

    clock_t start = clock(); // 시작 시간 저장

    for(int i = 0; i < ITERATION_TIME; ++i){
        for(int j = 0; j < ITERATION_TIME; ++j)
            int do_something;
    }

    clock_t end = clock(); // 코드가 끝난 시간 저장

    // 걸린 시간 출력
    // 단위: 초(second)
    // CLOCKS_PER_SEC로 나눠줘야 초단위로 나옵니다.
    printf("Time: %lf\n",(double)(end - start)/CLOCKS_PER_SEC);

}
  • 결과
$ g++ practice_board.cpp -o practice_board
$ ./practice_board
Time: 18.825036


참고자료

Measure code running time by using clock() function in c or cpp.


Environment and Prerequisite

  • C, C++


Usage and Example

Usage

  • clock(): It is a function in time.h which returns the processor time consumed by the program. We can consider it as processor running time consumed by program
  • clock_t: Return type of clock() function. It involves data type of clock ticks.
  • CLOCKS_PER_SEC: It is a macro which represents clock ticks per second. Its value depends on system. Use this for human-readable time.
#include <stdio.h>
#include <time.h> // include time.h

int main(){

    clock_t start = clock(); // save start time

    // Code Here!!!

    clock_t end = clock(); // save end time

    // Print measured time
    // Unit: second
    // Dividing a count of clock ticks by this expression yields the number of seconds.
    printf("Time: %lf\n", (double)(end - start)/CLOCKS_PER_SEC);

    return 0;
}


Example

  • Measuring nested loop iteration time.
#include <stdio.h>
#include <time.h> // include time.h

#define ITERATION_TIME 100000

int main (){

    clock_t start = clock(); // save start time

    for(int i = 0; i < ITERATION_TIME; ++i){
        for(int j = 0; j < ITERATION_TIME; ++j)
            int do_something;
    }

    clock_t end = clock(); // save end time

    // Print measured time
    // Unit: second
    // Dividing a count of clock ticks by this expression yields the number of seconds.
    printf("Time: %lf\n",(double)(end - start)/CLOCKS_PER_SEC);

}
  • Result
$ g++ practice_board.cpp -o practice_board
$ ./practice_board
Time: 18.825036


Reference

누적 합(Prefix sum)을 이용해서 구간 합을 구해보자


환경 및 선수조건

  • C, C++


문제

문제 링크

  • 백준 11659번 문제
  • 문제 링크: https://www.acmicpc.net/problem/11659
  • 문제: 수 N개가 주어졌을 때, i번째 수부터 j번째 수까지 합을 구하는 프로그램을 작성하시오.


접근법 및 풀이법

(방법1)

  • N개의 수를 배열에 입력받고 i부터 j까지의 합을 입력이 올 때마다 계산해서 반환
  • 시간복잡도: O(n^2)

(방법2)

  • Prefix sum을 구해두고 sum(i)(처음부터 i까지의 합) - sum(j-1)(처음부터 j-1까지의 합)을 통해서 반환
  • 시간복잡도: O(n)


누적합(Prefix Sum, Cumulative Sum)이란?

정의

  • 누적합(Cumulative Sum) 또는 부분합(Prefix Sum): x0, x1 … xN까지 수가 있을 때 y0=x0, y1=x0+x1 … yN=x0+x1+…+xN처럼 해당 N까지의 합을 의미합니다.
y0 = x0
y1 = x0 + x1
y2 = x0 + x1 + x2
...
yN = x0 + x1 + ... + xN

누적합의 성질

  • 특정 위치 i부터 j까지의 합은 아래의 공식으로 나타낼 수 있습니다.
S[j] - S[i-1] = a[i] + a[i+1] + ... + a[j-1] + a[j]


풀이

방법1 - O(n^2)

  • 매번 요청이 올때마다 합을 구합니다.
#include <cstdio>

#define MAX_N 100001
#define MAX_M 100001

using namespace std;

int value[MAX_N];

int main(){

    int N, M;

    scanf("%d %d", &N, &M);

    for(int i = 1; i <= N; ++i){
        scanf("%d", &value[i]);
    }

    while(M--){
        int from, to;
        scanf("%d %d", &from, &to);

        int sum = 0;

        for(int i = from; i <= to; ++i){
            sum += value[i];
        }
        printf("%d\n", sum);
    }

    return 0;
}

방법2 - O(n)

  • 매번 요청이 올때마다 미리 구해놓은 sum배열을 이용해 값을 바로 반환합니다.
#include <cstdio>

#define MAX_N 100001
#define MAX_M 100001

using namespace std;

int value[MAX_N];
int sum[MAX_N];

int main(){

    int N, M;

    scanf("%d %d", &N, &M);

    for(int i = 1; i <= N; ++i){
        scanf("%d", &value[i]);
        sum[i] = sum[i-1] + value[i];
    }

    while(M--){
        int from, to;
        scanf("%d %d", &from, &to);
        printf("%d\n", sum[to] - sum[from - 1] );
    }


    return 0;
}


참고자료