Pages

Tuesday, September 20, 2011

Understand Hashmap Better


Many of us use Map in our code frequently. but very little of us know how the map is different than Array or linkedlist and works internally.
Well, we will explore how HashMap works. all of us know that map stores key value pairs, and we also know that it is better to use map when we need frequent search an item in collection. lets take an example that if we have a 1000 words put in collection and we want to find if a given particular word is present in collection or not, it would be inefficient if we tried to find sequentially looping through all 1000 words and find the matching word.map uses hashing technique which more efficiently narrow down the search and match the word.
.
what is hashing then?
.
Hashing uses a technique called as hashcode. It is nothing but using some algorithm to map object data to some representative integer value, that help to narrow down the search for finding object in collection. lets find how this mechanism works.
So to begin with we will take a simple example of small collection of strings like country name and its currency. assume that we have 5 countries and their currency need to be stored in collection/list. following is the data that we need to keep in map.

.
CountryCurrency
IndiaRupee
FranceEuro
CubaPeso
SwitzerlandSwiss Franc
DenmarkKrone

As I explained earlier that hascode uses a algorithm to find representative integer value of string, in our example we will take simple algorithm as count the length of string. so to put item in hashmap we will calculate it hashcode and put them in corresponding index.
Now assume that map uses two arrays to store key and value pairs respectively in each array as shown below. every item will be placed based on its calculated hashcode as an index item.


HashMap
(hashcode=keylength) IndexKeysArrayValueArrays
1
2
3
4CubaPeso
5
6FranceEuro
7DenmarkKrone
8
9
10BangladeshTaka
11SwitzerlandSwiss Franc

So if we want to find Denmark item in list we will simply count its length which is (7) and get item out from hash table from index 7. so simple isn't it. it also works much faster than comparing with other string. you will wonder how this will work if we have more than one key with same hashcode? i.e. if we have collision? If you noticed that we have wasted few space in the list. you will also have a question now what if we have a word more than has 100 characters long and rest are small? we will end up using lot of waste space.

.
Java uses the same structure as we have seen above, but instead of keeping items in a array it has array referenced to linklist, that means every item in a array points to a linklist called bucket. Now, as you know that two unequal object in Java very much can have equal hashcode, since hashcode is same, bucket location would be same and collision occurs in HashMap, and HashMap uses a linked list to store data objects, so they will be stored in next node of linked list, to explain same example above assume that if we have to find Denmark in map, it will first find its hashcode which is 7 and then it will traverse the linklist at index 7 for matching correct pair of key with its value with method key.equals().


.
Now to avoid empty space in the array, map tries to evenly distribute objects in the list. so it takes the hashcode of the key object and applies its own logic to identify bucket location for storing.please note that it stores whole key value pair object in the linklist, so that it can find it easily afterwords while retrieving. Its makes more clear now how HashMap stores Key value pairs objects, but one doubts arise how HashMap handles if array size grows? objects are stored in bucket based on the length of array to evenly distribute the load so that retrieval should be faster, so how it works if hashmap exceeds its given threshold load factor?
If the size of map exceeds given threshold defined by load factor i.e. if the load factor is .75%, it will try to re-size the map once it is filled by 75% java does this by creating new array of buckets by twice the size of previous map and then start putting all items to new map by calculating new bucket location, this method is called rehashing.
There are still more things to explore in map. Now it is much better that we know how HashMap stores objects internally and has logic of retrieval.

Lastly, If possible before closing, click on ad to support me.

6 comments:

  1. Thanks for sharing !!!
    This is very useful information...

    ReplyDelete
  2. Very nice... I knew that some hashing mechanism is used to store the values, but what is the use of hashing, I always wondered that. Thanks for putting light into the black box... :-)

    ReplyDelete
  3. Very well explained ! Thanks a lot!!!

    ReplyDelete
  4. Thanks for great explanation !!

    ReplyDelete