Data Structures- Hashmaps, Sets, Hash Tables, Hashing and Collisions
Observing hashmaps with python dictionaries
- What is a Hashtable/Hashmap?
- What is Hashing and Collision?
- What is a Set?
- Dictionary Example
- Hacks
- Taylor Swift Extra Credit
What is a Hashtable/Hashmap?
A hashtable is a data structure that with a collection of key-value pairs, where each key maps to a value, and the keys must be unique and hashable.
- In Python there is a built in hashtable known as a dictionary.
The primary purpose of a hashtable is to provide efficient lookup, insertion, and deletion operations. When an element is to be inserted into the hashtable, a hash function is used to map the key to a specific index in the underlying array that is used to store the key-value pairs. The value is then stored at that index. When searching for a value, the hash function is used again to find the index where the value is stored.
The key advantage of a hashtable over other data structures like arrays and linked lists is its average-case time complexity for lookup, insertion, and deletion operations.
- The typical time complexity of a hashtable is O(1), also considered constant time.
What is Hashing and Collision?
Hashing is the process of mapping a given key to a value in a hash table or hashmap, using a hash function. The hash function takes the key as input and produces a hash value or hash code, which is then used to determine the index in the underlying array where the value is stored. The purpose of hashing is to provide a quick and efficient way to access data, by eliminating the need to search through an entire data structure to find a value.
However, it is possible for two different keys to map to the same hash value, resulting in a collision. When a collision occurs, there are different ways to resolve it, depending on the collision resolution strategy used.
Python's dictionary implementation is optimized to handle collisions efficiently, and the performance of the dictionary is generally very good, even in the presence of collisions. However, if the number of collisions is very high, the performance of the dictionary can degrade, so it is important to choose a good hash function that minimizes collisions when designing a Python dictionary.
What is a Set?
my_set = set([1, 2, 3, 2, 1])
print(my_set)
# What do you notice in the output?
# It doesn't print out the number again if it is repeated, and already mentioned once
# Why do you think Sets are in the same tech talk as Hashmaps/Hashtables?
# There will never be a duplicate in either.
from IPython.display import HTML, display
from pathlib import Path
from PIL import Image as pilImage
from io import BytesIO
import base64
# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
for image in images:
# File to open
image['filename'] = path / image['file'] # file with path
return images
def scale_image(img):
baseWidth = 125
scalePercent = (baseWidth/float(img.size[0]))
scaleHeight = int((float(img.size[1])*float(scalePercent)))
scale = (baseWidth, scaleHeight)
return img.resize(scale)
def image_to_base64(img, format):
with BytesIO() as buffer:
img.save(buffer, format)
return base64.b64encode(buffer.getvalue()).decode()
def image_management(image): # path of static images is defaulted
# Image open return PIL image object
img = pilImage.open(image['filename'])
# Python Image Library operations
image['format'] = img.format
image['mode'] = img.mode
image['size'] = img.size
image['width'], image['height'] = img.size
image['pixels'] = image['width'] * image['height']
# Scale the Image
img = scale_image(img)
image['pil'] = img
image['scaled_size'] = img.size
image['scaled_width'], image['scaled_height'] = img.size
image['scaled_pixels'] = image['scaled_width'] * image['scaled_height']
# Scaled HTML
image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])
if __name__ == "__main__":
# Use numpy to concatenate two arrays
images = image_data(images = [{'source': "Playboi Carti", 'label': "Whole Lotta Red", 'file': "WLR.png"}])
# Display meta data, scaled view, and grey scale for each image
for image in images:
image_management(image)
print("-----Album Information-----")
print(image['label'])
print(image['source'])
print("December 25, 2020")
print("")
print("-----Album Cover-----")
display(HTML(image['html']))
print("")
print("-----Tracklist-----")
WLR = {
"title": "Whole Lotta Red",
"artist": "Playboi Carti",
"year": 2020,
"genre": ["Rap", "HipHop"],
"tracks": {
1: "Rockstar Made",
2: "Go2DaMoon",
3: "Stop Breathing",
4: "Beno !",
5: "JumpOutTheHouse",
6: "M3tamorphosis",
7: "Slay3r",
8: "No Sl33p",
9: "New Tank",
10: "Teen X",
11: "Meh",
12: "Vamp Anthem",
13: "New N3on",
14: "Control",
15: "Punk Monk",
16: "On That Time",
17: "King Vamp",
18: "Place",
19: "Sky",
20: "Over",
21: "ILoveUIHateU",
22: "Die4Guy",
23: "Not PLaying",
24: "F33l Lik3 Dyin"
}
}
# What data structures do you see?
# I recognize a dictionary because it contains the curlyย brackets, integers, and strings.
# Printing the dictionary
print(WLR)
print(WLR.get('tracks'))
# or
print(WLR['tracks'])
print(WLR.get('tracks')[3])
# or
print(WLR['tracks'][9])
WLR["producer"] = set(['Playboi Carti', 'Pierre Bourne', 'Maaly Raw', 'Wheezy', 'F1lthy'])
# What can you change to make sure there are no duplicate producers?
# add 'set' in front of the list of producers
# Printing the dictionary
print(WLR)
WLR["tracks"].update({21: "ILoveUIHateU"})
WLR["genre"].append("Vampโผ๏ธ๐ฉธ")
# How would add an additional genre to the dictionary, like electropop?
# add 'append' and then the genre you would like to add
# Printing the dictionary
print(WLR)
# for k,v in WLR.items(): # iterate using a for loop for key and value
# print(str(k) + ": " + str(v))
# Write your own code to print tracks in readable format
for k,v in WLR["tracks"].items():
print(k,":",v)
# def search():
# search = input("What would you like to know about the album?")
# if WLR.get(search.lower()) == None:
# print("Invalid Search")
# else:
# print(WLR.get(search.lower()))
# search()
# This is a very basic code segment, how can you improve upon this code?
# We can add a while loop so that if the user misinputs something, then it will ask them to keep entering until it has been valid data within the dictionary
def search():
while True:
search = input("What would you like to know about the album?")
if not search:
print("Please enter a valid search term.")
elif search.lower() not in WLR:
print("Sorry, we could not find any information about that. Please try again.")
else:
print(WLR.get(search.lower()))
break
search()
Hacks
- Answer ALL questions in the code segments
- Create a diagram or comparison illustration (Canva).
- What are the pro and cons of using this data structure?
- Dictionary vs List
- Expand upon the code given to you, possible improvements in comments
-
Build your own album showing features of a python dictionary
-
For Mr. Yeung's class: Justify your favorite Taylor Swift song, answer may effect seed
Taylor Swift Extra Credit
My favorite Taylor Swift song is "Shake It Off," which was released in 2014, from the album '1989'. My taste of music is more upbeat and catchy music with strong lyrical music, and Taylor Swift killed it with this song, as she shows traits of resilience and self confidence.
Swift talks about the negative comments she received in her life and expresses it through this song, encouraging her audience to "shake it off" and not be let down of yourself through other people's words. This is very reasonable for any audience, but as a highschooler, I feel that this song truly important as you can receive lots of hate from other students, but you have to not let it affect you, which is something I have learned from this song. "Shake It Off" encourages listeners to keep going and not give up, which can be a powerful message for young people who are still figuring out their place in the world, such as highschoolers like me.
Overall, "Shake It Off" is my favorite Taylor Swift song because of its catchy beat, uplifting message, and relatable themes of self-confidence and resilience. This song was one of the first song's I've listened to from a young little kid and is the primary reason of why I enjoy listening to music nowadays.