Lab: 35 - Hash Map

Assigned
Monday, 20 November 2023

Overview

The goal of this lab is to implement linear probing, which is used as a technique for resolving collisions when accessing or inserting elements in a hash map.

Do not use java.util.Map in your implementations.

Preparation

Before you start the lab use the Closed Hashing written by David Galles at the University of San Francisco.

There is no starter code for this lab.

Counting Letters

(1 point) Write a program in a file called CountLetters.java that reports the occurrence of each of the 26 letters used in English and found in the /usr/share/dict/words file. The program should display one letter and its count on a separate line, e.g., a: 205807. Note that letters are not case-sensitive. In addition, the program should ignore any character that is not an English letter. Remember that UNICODE (which Java uses) recognizes thousands of characters as a letter in some language. You will not be able to rely upon the Java library function isAlphabetic.

Use an array to maintain these counts. How can you map English letters to the indices of this array?

Can the above program be applied to files with arbitrary character sets used in other languages, given the fixed array size that you used in the above exercise?

Counting Characters

The Pair Class

The pair class represents a key-value pair.

public class Pair {

  private char key;
  private int value;


  public Pair(char key, int value) {

	this.key = key;
	this.value = value;

  }

  //TODO: Implement getters and setters
}

Because of the limited array size used in the Counting Letters exercise, the provided solution does not work for counting the occurrence of arbitrary characters. A technique to address this problem is to determine the array index (hash code) through computing the mod of the character value by the length of the array. That way, every element has a proper index in the array. However, this technique needs to account for collisions that arise from inserting characters that map to the same index.

  • (2 points) Write a class called CountChars.java that maps arbitrary characters to their counts. Your class should support the following methods:
    • CountChars( ): creates a new empty array. Each array entry stores a key-value pair represented by the Pair class. (Use the Pair class below or write your own. Note that your Pair class does not need to be generic.) Implement proper getter and setter methods inside your Pair class. The initial size of the hash array is 26.
    • int find(char ch): returns the index of an entry in the array. For the key ch, the entry contains either a Pair or null. This method scans the array until it finds an open spot for placing a Pair or it finds an entry that already contains the Pair.
    • Integer put(char ch, int v): associates key ch with its value v in the map. It overwrites the value for the prior entry with ch, if it exists, and returns the old v. It returns null if there is not a mapping for ch.
    • Integer get (char ch): returns a value for key ch if it exists in the map. This method returns null if there is not a mapping for ch.
    • rehash( ): doubles the hash array length if there is not an unused entry in the array for placing a new key-value Pair.
    • Pair[] getArray( ): returns the array that contains the key-value Pairs.

To resolve collisions, use the linear probing method. Do not use java.util.Map in your implementations.

  • (1 point) Create a Java project and a client program. Your client program should read a file and count the number of occurrences of each character. This program should:
    • Read a line from a file.
    • Process the line one character at a time.
    • Check if the hash array already has an entry for that charater (using the get method).
    • Put the character and the (updated) count into the hash array.
    • It might need to call rehash as part of putting in a character for the first time.

Test your program by displaying characters and their counts for /usr/share/dict/words. You may want to try it with a smaller file of your own creation first so that you know how many instances should be found for each character.

Lab Submission

Submit the source files CountLetters.java and CountChars.java, your tester class and output after running your program.

Remember: Do not use magic numbers, write your code anticipating errors and print user-friendly error messages, all your public methods should be well documented (use Javadoc comments). Please include comments or organize your code and report in a way that will help the grader to find the answer to the exercises or posted questions.

Acknowledgements

This lab is based on a material made by Prof. Peter-Michael Osera.