%load_ext pretty_jupyter
# Import Notebook Libraries
from plotly.offline import plot
import pandas as pd
from IPython.display import display, HTML
import plotly.express as px
import plotly.graph_objects as go
import plotly.io
from pretty_jupyter.helpers import matplotlib_fig_to_html
plotly.io.renderers.default = 'notebook_connected'

Introduction

"The Hogwarts Bestiary" is a unique project where data analysis meets the Wizarding World.

Here, we dive into the fascinating traits and abilities of magical creatures, sorting them into the four illustrious Hogwarts Houses (Gryffindor, Hufflepuff, Ravenclaw, and Slytherin) and investigating any associations between the creatures in each house. This endeavor combines my very millennial passion for the Harry Potter universe with data analysis, offering a fresh perspective on these enchanting beings.

Behind the Magic: Crafting the Dataset

Our journey begins with a dataset as unique as the creatures it contains. Compiled from the Ministry of Magic's Classification List, this dataset is a labor of love, cataloguing each creature along with its features and traits. I researched every entry meticulously, building a rich tapestry of data that forms the foundation of our analysis.

You can view and download the dataset from the GitHub repository here.

# Import libraries
import ast
import random
from ast import literal_eval
from collections import Counter
from typing import Dict, List, Tuple

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import seaborn as sns
# Load the dataset
dataset_path = 'updated_new_creatures_dataset.xlsx'
creatures_df = pd.read_excel(dataset_path)

Exploratory Data Analysis: Deciphering the Traits of Pre-Sorted Creatures

Here, we begin by examining creatures whose houses are already known. Our goal is to dissect their characteristics, understand the nuances of their magical abilities, and derive a set of keywords that could serve as criteria for sorting the remaining creatures in our dataset.

Creature Profiles: The Pre-Sorted Ensemble

To enhance our understanding, let's meet some of these pre-sorted creatures up close. Each profile provides a window into the creature's world, offering insights into why they might embody the spirit of their respective houses.

Unraveling Creature Traits and Features

With a focus on the pre-sorted creatures, we will identify and visualize the top 5 classes in the Magical Ability Categories, the top 5 specific abilities and powers in the Magical Abilities and Powers, and the top 5 symbolic meanings in each house.
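The counting step just described can be written as one loop over all houses instead of four copies of the same code. This is a minimal sketch on three made-up rows. Only the column names come from the dataset, and `top_n_per_house` is a hypothetical helper name. It parses the string-encoded lists into a separate Series instead of overwriting the column, so calling it twice on the same DataFrame is safe.

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical toy rows; only the column names come from the dataset
toy = pd.DataFrame({
    "House Assignment": ["Gryffindor", "Gryffindor", "Slytherin"],
    "Magical Ability Categories": [
        "['Strength and Agility', 'Offensive Magic']",
        "['Strength and Agility']",
        "['Toxic and Poisonous', 'Dark Magic']",
    ],
})


def top_n_per_house(df, column, n=5):
    """Return {house: {item: count}} for the n most common items per house."""
    # Parse "['a', 'b']" strings into real lists without modifying df
    parsed = df[column].apply(ast.literal_eval)
    result = {}
    for house, lists in parsed.groupby(df["House Assignment"]):
        # Flatten this house's lists and count each item
        counts = Counter(item for items in lists for item in items)
        result[house] = dict(counts.most_common(n))
    return result


print(top_n_per_house(toy, "Magical Ability Categories")["Gryffindor"])
# {'Strength and Agility': 2, 'Offensive Magic': 1}
```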

Magical Classes

We'll start with the analysis of the Magical Ability Categories for each house, which tells us the general class of magic a creature exhibits.

# Load the dataset into `data`, the name used by the analysis below
file_path = 'updated_new_creatures_dataset.xlsx'
data = pd.read_excel(file_path)
# Function to count occurrences in list-like columns
def preprocess_and_count(data, column_name):
    # List-like columns are stored as strings such as "['Flight', 'Strength']".
    # Parse them without overwriting the caller's column, and pass through values
    # that are already lists, so the function is safe to call more than once.
    parsed = data[column_name].dropna().apply(
        lambda value: ast.literal_eval(value) if isinstance(value, str) else value
    )
    # Flatten the list of lists and count occurrences
    flattened_list = [item for sublist in parsed for item in sublist]
    return Counter(flattened_list)

# Filter data for each house and count occurrences in 'Magical Ability Categories'
gryffindor_data = data[data['House Assignment'] == 'Gryffindor'].copy()
slytherin_data = data[data['House Assignment'] == 'Slytherin'].copy()
ravenclaw_data = data[data['House Assignment'] == 'Ravenclaw'].copy()
hufflepuff_data = data[data['House Assignment'] == 'Hufflepuff'].copy()

gryffindor_counts = preprocess_and_count(gryffindor_data, 'Magical Ability Categories')
slytherin_counts = preprocess_and_count(slytherin_data, 'Magical Ability Categories')
ravenclaw_counts = preprocess_and_count(ravenclaw_data, 'Magical Ability Categories')
hufflepuff_counts = preprocess_and_count(hufflepuff_data, 'Magical Ability Categories')

gryffindor_top5 = dict(gryffindor_counts.most_common(5))
slytherin_top5 = dict(slytherin_counts.most_common(5))
ravenclaw_top5 = dict(ravenclaw_counts.most_common(5))
hufflepuff_top5 = dict(hufflepuff_counts.most_common(5))

# Combine all top 5s into a single list for plotting
categories = list(gryffindor_top5.keys()) + list(slytherin_top5.keys()) + list(ravenclaw_top5.keys()) + list(hufflepuff_top5.keys())
counts = list(gryffindor_top5.values()) + list(slytherin_top5.values()) + list(ravenclaw_top5.values()) + list(hufflepuff_top5.values())
colors = ['#b71c1c', '#880e4f', '#d32f2f', '#e57373', # Gryffindor colors
          '#1b5e20', '#2e7d32', '#388e3c', '#81c784', # Slytherin colors
          '#0d47a1', '#1565c0', '#1976d2', '#64b5f6', # Ravenclaw colors
          '#f57f17', '#f9a825', '#fbc02d', '#fff176'] # Hufflepuff colors
# Set up the figure for multiple radial charts
fig, axs = plt.subplots(2, 2, figsize=(16, 16), subplot_kw=dict(polar=True))
axs = axs.flatten()  # Flatten to 1D array for easier iteration

# Data for plotting
houses_data = [
    (gryffindor_top5, '#611010', 'Gryffindor'),
    (slytherin_top5, '#04393b', 'Slytherin'),
    (ravenclaw_top5, '#1d0e7a', 'Ravenclaw'),
    (hufflepuff_top5, '#816300', 'Hufflepuff')
]

# Function to add radial charts to subplots
def add_radial_chart(data, ax, color, title):
    categories = list(data.keys())
    values = list(data.values())
    num_vars = len(categories)
    
    angles = np.linspace(0, 2 * np.pi, num_vars, endpoint=False).tolist()
    # Complete the loop
    values += values[:1]
    angles += angles[:1]
    
    ax.fill(angles, values, color=color, alpha=0.25)
    ax.plot(angles, values, color=color, linewidth=2)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories)
    ax.set_yticklabels([])
    ax.set_title(title, size=18, color=color, y=1.1)

# Plot each house's data on a separate subplot
# (the loop variable is named top5 so it does not overwrite the `data` DataFrame used later)
for i, (top5, color, title) in enumerate(houses_data):
    add_radial_chart(top5, axs[i], color, title)

plt.tight_layout(pad=5)
plt.show()
  • Gryffindor creatures tend to excel in "Strength and Agility", highlighting their bravery and physical prowess.
  • Slytherin emphasizes "Toxic and Poisonous" abilities, aligning with their cunning and sometimes dangerous nature.
  • Ravenclaw’s creatures show a propensity for "Strength and Agility" as well as "Stealth and Detection Magic", reflecting the house’s affinity for wit.
  • Hufflepuff showcases "Healing Magic" and "Unique Magic", indicating a nurturing aspect and an appreciation for all creatures' unique qualities.

Specific Abilities and Powers

Next, we'll analyze and visualize the top 5 abilities and powers per house from the Magical Abilities and Powers column. This gives insight into the specific, unique abilities of the creatures assigned to each house.

# Count occurrences in 'Magical Abilities and Powers' for each house
gryffindor_abilities_counts = preprocess_and_count(gryffindor_data, 'Magical Abilities and Powers')
slytherin_abilities_counts = preprocess_and_count(slytherin_data, 'Magical Abilities and Powers')
ravenclaw_abilities_counts = preprocess_and_count(ravenclaw_data, 'Magical Abilities and Powers')
hufflepuff_abilities_counts = preprocess_and_count(hufflepuff_data, 'Magical Abilities and Powers')

# Get top 5 abilities and powers for each house
gryffindor_abilities_top5 = dict(gryffindor_abilities_counts.most_common(5))
slytherin_abilities_top5 = dict(slytherin_abilities_counts.most_common(5))
ravenclaw_abilities_top5 = dict(ravenclaw_abilities_counts.most_common(5))
hufflepuff_abilities_top5 = dict(hufflepuff_abilities_counts.most_common(5))

gryffindor_abilities_top5, slytherin_abilities_top5, ravenclaw_abilities_top5, hufflepuff_abilities_top5


# Create a graph
G = nx.Graph()

# Add nodes with the ability as the node ID and the count as an attribute
for house, top5 in zip(['Gryffindor', 'Slytherin', 'Ravenclaw', 'Hufflepuff'],
                       [gryffindor_abilities_top5, slytherin_abilities_top5, ravenclaw_abilities_top5, hufflepuff_abilities_top5]):
    for ability, count in top5.items():
        G.add_node(ability, count=count)
        G.add_edge(house, ability)

# Position nodes using the spring layout
pos = nx.spring_layout(G, seed=42)

# Draw the graph
plt.figure(figsize=(12, 8))
nx.draw_networkx_edges(G, pos, alpha=0.5)
colors = ['#611010', '#04393b', '#1d0e7a', '#816300']
labels = {}

# Draw nodes for each house's abilities and powers
for i, house in enumerate(['Gryffindor', 'Slytherin', 'Ravenclaw', 'Hufflepuff']):
    house_nodes = [node for node in G.neighbors(house)]
    nx.draw_networkx_nodes(G, pos, nodelist=house_nodes, node_color=colors[i], label=house, alpha=0.8)
    nx.draw_networkx_labels(G, pos, labels={node: node for node in house_nodes}, font_size=8)
    labels[house] = house

# Draw house nodes
nx.draw_networkx_nodes(G, pos, nodelist=['Gryffindor', 'Slytherin', 'Ravenclaw', 'Hufflepuff'], node_size=0)
nx.draw_networkx_labels(G, pos, labels=labels, font_size=10, font_weight='bold')

plt.title("Top 5 Abilities and Powers per House", fontsize=14)
plt.axis('off')
plt.legend(loc="upper left")
plt.show()
  • Common Powers: "Strength" is a shared ability across all houses except Slytherin, suggesting a common appreciation for resilience across the houses.
  • Unique Powers: Each house has unique abilities not shared with others, like "Fire Breathing" for Gryffindor and "Human Speech" for Slytherin, reflecting the distinct characteristics of the creatures associated with each house.
  • Power Distribution: The distribution of powers across the houses reflects their respective focuses—Gryffindor and Ravenclaw value power and intellect, Slytherin values cunning and control, and Hufflepuff values healing and nature.
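The "Common Powers" and "Unique Powers" observations can also be checked without reading the graph, using plain counting over the per-house top-five dictionaries. The dictionaries below are invented placeholders shaped like `gryffindor_abilities_top5` and its siblings. They are not the notebook's actual results.

```python
from collections import Counter

# Hypothetical top-5 ability counts, shaped like the *_abilities_top5 dicts
top5_by_house = {
    "Gryffindor": {"Strength": 4, "Fire Breathing": 3, "Flight": 3},
    "Slytherin": {"Venomous": 5, "Human Speech": 2, "Flight": 2},
    "Ravenclaw": {"Strength": 2, "Invisibility": 2, "Teleportation": 1},
    "Hufflepuff": {"Strength": 3, "Healing": 3},
}

# Number of houses whose top five includes each ability
house_frequency = Counter(ability for top5 in top5_by_house.values() for ability in top5)

# Abilities appearing in more than one house's top five
shared = sorted(ability for ability, n in house_frequency.items() if n > 1)

# Abilities that appear in exactly one house's top five
unique = {
    house: sorted(ability for ability in top5 if house_frequency[ability] == 1)
    for house, top5 in top5_by_house.items()
}

print(shared)                # ['Flight', 'Strength']
print(unique["Gryffindor"])  # ['Fire Breathing']
```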

Cultural Meanings

Next, we'll analyze and visualize the top 5 symbols per house in the "Cultural Symbolism" column. This gives insight into the cultural meanings of the creatures within each house.
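Because "Cultural Symbolism" holds a single value per creature, each house's top five can be read straight off `value_counts()`, which already sorts by frequency. Here is a minimal sketch on made-up rows, assuming the same column names as the dataset:

```python
import pandas as pd

# Hypothetical rows; 'Cultural Symbolism' holds one value per creature
toy = pd.DataFrame({
    "House Assignment": ["Gryffindor", "Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
    "Cultural Symbolism": ["Power", "Power", "Bravery", "Fear", "Power"],
})

# value_counts() is sorted by count (descending), so head(5) is the top five
symbols_top5 = {
    house: symbols.value_counts().head(5).to_dict()
    for house, symbols in toy.groupby("House Assignment")["Cultural Symbolism"]
}

print(symbols_top5["Gryffindor"])  # {'Power': 2, 'Bravery': 1}
```

This produces the same dictionaries as sorting each house's full counts and slicing the first five items. It also avoids writing one line per house.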

# Function to count occurrences of single-value entries in a column
def count_occurrences(data, column_name):
    return data[column_name].value_counts().to_dict()

# Count occurrences in 'Cultural Symbolism' for each house
gryffindor_symbols_counts = count_occurrences(gryffindor_data, 'Cultural Symbolism')
slytherin_symbols_counts = count_occurrences(slytherin_data, 'Cultural Symbolism')
ravenclaw_symbols_counts = count_occurrences(ravenclaw_data, 'Cultural Symbolism')
hufflepuff_symbols_counts = count_occurrences(hufflepuff_data, 'Cultural Symbolism')

# Get top 5 symbols for each house
gryffindor_symbols_top5 = dict(sorted(gryffindor_symbols_counts.items(), key=lambda item: item[1], reverse=True)[:5])
slytherin_symbols_top5 = dict(sorted(slytherin_symbols_counts.items(), key=lambda item: item[1], reverse=True)[:5])
ravenclaw_symbols_top5 = dict(sorted(ravenclaw_symbols_counts.items(), key=lambda item: item[1], reverse=True)[:5])
hufflepuff_symbols_top5 = dict(sorted(hufflepuff_symbols_counts.items(), key=lambda item: item[1], reverse=True)[:5])
# Prepare data for bubble chart (sorted so the y-axis order is the same on every run)
symbols = sorted(set(list(gryffindor_symbols_top5.keys()) + list(slytherin_symbols_top5.keys()) + list(ravenclaw_symbols_top5.keys()) + list(hufflepuff_symbols_top5.keys())))
house_names = ['Gryffindor', 'Slytherin', 'Ravenclaw', 'Hufflepuff']
colors = ['#611010', '#04393b', '#1d0e7a', '#816300']

symbol_counts_per_house = [gryffindor_symbols_top5, slytherin_symbols_top5, ravenclaw_symbols_top5, hufflepuff_symbols_top5]

# Initialize plot
fig, ax = plt.subplots(figsize=(12, 8))

# Plot each house's symbols
for i, (house, counts, color) in enumerate(zip(house_names, symbol_counts_per_house, colors)):
    x = [i]*len(counts)  # Same x coordinate for each house's symbols
    y = [symbols.index(sym) for sym in counts.keys()]  # Y coordinate based on the symbol
    sizes = [count*300 for count in counts.values()]  # Bubble size based on count
    ax.scatter(x, y, s=sizes, color=color, alpha=0.6, label=house)

# Customizing the plot
ax.set_xticks(range(len(house_names)))
ax.set_xticklabels(house_names)
ax.set_yticks(range(len(symbols)))
ax.set_yticklabels(symbols)
ax.legend()

plt.title('Top 5 Cultural Symbols per House', fontsize=14)
plt.xlabel('House')
plt.ylabel('Cultural Symbol')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.show()

The visualization presents a bubble chart showing the top five cultural symbols associated with each of the four Hogwarts houses. The size of each bubble is proportional to how many of the house's creatures carry that cultural symbol.

  • Shared Symbols: "Power" and "Fear" are shared by Gryffindor and Slytherin, suggesting a shared impact and presence that command respect or intimidation, albeit manifested differently.
  • Unique Symbols: Each house has unique cultural symbols not seen in others, highlighting the distinct identities and values each house represents.
  • Uncommon Pairings: Some symbols, like "Servitude" for Ravenclaw and "Danger" for Hufflepuff, are less traditionally associated with these houses, which may indicate a more nuanced interpretation or an aspect not commonly acknowledged.

Universal Themes

Here we analyze the Broader Symbolism column to reveal how the creatures in each house embody more general, universal themes.

# Filter the dataset for the four houses
houses = ['Gryffindor', 'Slytherin', 'Ravenclaw', 'Hufflepuff']
filtered_data = data[data['House Assignment'].isin(houses)]

# Explore 'Broader Symbolism' for each house
symbolism_data = filtered_data.groupby('House Assignment')['Broader Symbolism'].value_counts().unstack(fill_value=0)

# Identify the top 5 features of 'Broader Symbolism' per house
top_symbolism_per_house = symbolism_data.apply(lambda x: x.nlargest(5), axis=1)

# Prepare colors for each house
house_colors = {
    'Gryffindor': ['#611010', '#430b2b', '#b10025', '#FFC6C4', '#E63939'],
    'Slytherin': ['#04393b', '#004826', '#1d9c34', '#2E8B57', '#74C365'],
    'Ravenclaw': ['#1d0e7a', '#09557b', '#0f7277', '#3C7EBB', '#7BAFD4'],
    'Hufflepuff': ['#816300', '#d9c336', '#c8c193', '#FFE303', '#EDDA74']
}

# Extract the top 'Broader Symbolism' data for each house
houses_data = {house: top_symbolism_per_house.loc[house].dropna().sort_values(ascending=False) for house in house_colors}

# Plotting pie charts for each house with the correct color mapping
fig, axs = plt.subplots(2, 2, figsize=(14, 14))
axs = axs.flatten()

for i, (house, color_scheme) in enumerate(house_colors.items()):
    house_data = houses_data[house]
    axs[i].pie(house_data, labels=house_data.index, colors=color_scheme[:len(house_data)], autopct='%1.1f%%', startangle=90)
    axs[i].set_title(house)

plt.tight_layout()
plt.show()

These pie charts give us insight into the general themes prevalent within each house's creatures. Each percentage is a share of that house's top five themes, not of all its creatures:

  • Gryffindor: Aggression and Dominance (30%) suggests a proclivity towards assertiveness and leadership, reflecting Gryffindor's characteristic boldness and bravery.
  • Slytherin: Danger and Threat (45.5%) indicates a significant emphasis on the capability to intimidate or be perceived as dangerous, perhaps as a defense mechanism or a means of asserting oneself.
  • Ravenclaw: Wisdom and Knowledge (37.5%) directly corresponds with Ravenclaw's dedication to learning and understanding, which is central to the house's identity.
  • Hufflepuff: Mischief and Trickery (30%) reflects a surprising emphasis on playful cleverness, suggesting that Hufflepuff's friendly nature may include a propensity for lighthearted pranks or intelligent strategy.
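The percentages above depend on what the chart receives. `plt.pie` sizes each wedge as its value divided by the sum of all values passed in, and the code passes only each house's top five themes. The sketch below uses invented counts for a single house. It shows how the top-five share can differ from the share of all the house's counted creatures.

```python
import pandas as pd

# Hypothetical 'Broader Symbolism' counts for a single house (not the real data)
counts = pd.Series({
    "Aggression and Dominance": 4,
    "Loyalty and Nobility": 3,
    "Otherworldly Qualities": 2,
    "Danger and Threat": 1,
    "Wisdom and Knowledge": 1,
    "Nature and Change": 1,
})

# Same selection as the notebook's nlargest(5)
top5 = counts.nlargest(5)

# What the pie chart shows: each theme relative to the top-five total
share_of_top5 = 100 * top5 / top5.sum()
# The same theme relative to all of the house's counted creatures
share_of_all = 100 * top5 / counts.sum()

print(round(share_of_top5["Aggression and Dominance"], 1))  # 36.4
print(round(share_of_all["Aggression and Dominance"], 1))   # 33.3
```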

Overall Insights

Gryffindor

  • Creatures associated with Gryffindor carry symbols of bravery, such as "Aggression and Dominance," and virtues like "Loyalty and Nobility," resonating with the house's heraldry of the lion. The traits are reflected in both their magical abilities and broader symbolism, emphasizing Gryffindor's bold and courageous spirit.

Slytherin

  • Slytherin's creatures are marked by themes of "Danger and Threat," aligning with the house's complex reputation for ambition and cunning, as evidenced by their magical abilities like "Petrification" and "Venomous" attacks. "Aggression and Dominance" are also key themes, reflecting Slytherin's ambitious and sometimes intimidating nature.

Ravenclaw

  • Wisdom is central to Ravenclaw. It shows in magical abilities like "Teleportation" and "Invisibility," which suggest intellectual escapism and introspection, and in the broader symbolism, where "Wisdom and Knowledge" leads. The creatures' attributes also allude to "Mystery and Secrecy," consistent with the house's affinity for the enigmatic.

Hufflepuff

  • Hufflepuff's creatures reflect an unexpected range, with a significant portion characterized by "Mischief and Trickery," suggesting an often-overlooked cleverness. Additionally, "Guardianship and Protection" reflects their nurturing nature, while "Loyalty and Nobility" emphasize this house's affinity towards friendship and companionship.

Shared Characteristics

  • Common abilities like "Strength" span across the houses, indicating a universal appreciation for resilience and fortitude.
  • Each house exhibits its own form of "Aggression and Dominance," whether through bravery, ambition, intellect, or protectiveness.
  • Cultural symbols and broader themes often overlap, such as "Power" and "Fear," suggesting that while houses have unique identities, they share certain fundamental qualities.

In summary, the visualizations present a comprehensive understanding of how the creatures embody the core characteristics of their respective houses, while also sharing universal qualities that unite the members of all four houses.


The Sorting Hat's Logic: Matching Creatures to Houses

Just as Hogwarts students are sorted based on their qualities, our magical creatures find their places in houses that reflect their most prominent traits. Gryffindor celebrates strength and nobility, Hufflepuff values companionship and loyalty, Ravenclaw prizes intelligence and wisdom, and Slytherin admires power and cunning. This section details our methodology, drawing parallels between a creature's characteristics and a house's essence.

Scoring Algorithm

Sorting uses keyword matching. A scoring function counts how many of each house's keywords appear in a creature's attributes, and the creature is assigned to the house with the highest score. If two or more houses tie for the top score, the creature is marked "Review Needed" instead.
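The scoring rule can be traced on a tiny example. This sketch restates the logic of `score_creature_for_review` as two hypothetical helpers, `score_houses` and `pick_house`. It runs them on trimmed-down, made-up criteria and creatures, so it is an illustration rather than the notebook's own code.

```python
def score_houses(creature_attributes, house_criteria):
    """Count case-insensitive substring matches of each house's keywords."""
    scores = {house: 0 for house in house_criteria}
    for house, criteria in house_criteria.items():
        for attribute, keywords in criteria.items():
            features = creature_attributes.get(attribute, [])
            if not isinstance(features, list):
                features = [features]
            for keyword in keywords:
                scores[house] += sum(keyword.lower() in f.lower() for f in features)
    return scores


def pick_house(scores):
    """Return the top-scoring house, or "Review Needed" if the top score is tied."""
    best = max(scores.values())
    winners = [house for house, score in scores.items() if score == best]
    return "Review Needed" if len(winners) > 1 else winners[0]


# Hypothetical, trimmed-down criteria (not the full house_criteria dictionary)
criteria = {
    "Gryffindor": {"Magical Abilities and Powers": ["fire breathing", "strength"]},
    "Slytherin": {"Magical Abilities and Powers": ["venomous", "strength"]},
}
dragon_like = {"Magical Abilities and Powers": ["Fire Breathing", "Flight", "Strength"]}
strong_only = {"Magical Abilities and Powers": ["Strength"]}

print(score_houses(dragon_like, criteria))              # {'Gryffindor': 2, 'Slytherin': 1}
print(pick_house(score_houses(dragon_like, criteria)))  # Gryffindor
print(pick_house(score_houses(strong_only, criteria)))  # Review Needed
```

Matching is a case-insensitive substring test. As a result, a keyword such as "strength" also matches "Superstrength", and a misspelled keyword never scores, with no error to flag it.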

def score_creature_for_review(creature_attributes, house_criteria):
    # Initialize scores dictionary with all houses set to zero
    scores = {house: 0 for house in house_criteria}

    # Iterate over each house and their specific sorting criteria
    for house, criteria in house_criteria.items():
        # Check each attribute (like Cultural Symbolism) against the creature's attributes
        for attribute, keywords in criteria.items():
            # Ensure the creature has this attribute
            if attribute in creature_attributes:
                # Retrieve the attribute's value; ensure it's in list format
                creature_feature = creature_attributes[attribute]
                if not isinstance(creature_feature, list):
                    creature_feature = [creature_feature]
                # Increment the house's score based on matching keywords
                for keyword in keywords:
                    scores[house] += sum(keyword.lower() in feature.lower() for feature in creature_feature)

    # Determine the highest score across all houses
    max_score = max(scores.values())
    # Identify if there is a tie by finding houses with the maximum score
    tied_houses = [house for house, score in scores.items() if score == max_score]

    # Return "Review Needed" if there's a tie, otherwise return the winning house
    return "Review Needed" if len(tied_houses) > 1 else tied_houses[0]
creatures_df = pd.read_excel('updated_new_creatures_dataset.xlsx')
# Define sorting criteria for each house
house_criteria = {
    "Hufflepuff": {
        "Cultural Symbolism": ["guardianship", "affection", "loyalty", "kindness"],
        "Magical Abilities and Powers": ["camouflage", "healing", "herbs"],
        "Broader Symbolism": ["Guardianship and Protection", "Loyalty and Nobility"],
        "Magical Ability Categories": ["Healing Magic", "Herbs and Potions", "Unique Magic"],
    },
    "Ravenclaw": {
        "Cultural Symbolism": ["wisdom", "elusiveness", "intelligence", "intuition"],
        "Magical Abilities and Powers": ["divination", "flight", "strength"],
        "Broader Symbolism": ["Wisdom and Knowledge", "Mystery and Secrecy"],
        "Magical Ability Categories": ["Divination", "Sensory and Perception", "Knowledge and Intelligence"],
    },
    "Slytherin": {
        "Cultural Symbolism": ["fear", "power", "cunning", "ambition"],
        "Magical Abilities and Powers": ["strength", "venomous", "flight"],
        "Broader Symbolism": ["Danger and Threat", "Aggression and Dominance"],
        "Magical Ability Categories": ["Toxic and Poisonous", "Strength and Agility", "Dark Magic"],
    },
    "Gryffindor": {
        "Cultural Symbolism": ["majesty", "honor", "bravery", "nobility"],
        "Magical Abilities and Powers": ["flight", "strength", "fire breathing", "magic resistance"],
        "Broader Symbolism": ["Otherworldly Qualities", "Aggression and Dominance"],
        "Magical Ability Categories": ["Strength and Agility", "Offensive Magic", "Defensive Magic"],
    }
}

# Filter the dataset for creatures with "To be assigned"
creatures_to_sort = creatures_df[creatures_df['House Assignment'] == "To be assigned"]

# Apply the sorting function only to creatures needing assignment
for index, row in creatures_to_sort.iterrows():
    creature_attributes = {
        "Cultural Symbolism": [row['Cultural Symbolism'].lower()],
        "Magical Abilities and Powers": ast.literal_eval(row['Magical Abilities and Powers']),
        "Broader Symbolism": [row['Broader Symbolism']],
        "Magical Ability Categories": ast.literal_eval(row['Magical Ability Categories'])
    }
    assigned_house = score_creature_for_review(creature_attributes, house_criteria)
    # Update the original DataFrame with the new assignments
    creatures_df.at[index, 'House Assignment'] = assigned_house

Results:

The scoring function proved very successful! Only 27 creatures remain marked "Review Needed", indicating an overlap of traits and features. To resolve these ties, we'll run a second pass that uses the same scoring method with a different set of criteria, applied only to creatures whose house assignment is "Review Needed". Unlike the first pass, the secondary function also returns "Review Needed" when no keyword matches at all (a top score of zero).
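The two passes work as a single fallback pipeline. The primary criteria decide every creature they can, and only the ties are re-scored against the secondary criteria, where a top score of zero also counts as unresolved. Here is a minimal sketch with hypothetical helper names and made-up criteria:

```python
def score_houses(attributes, criteria):
    """Count case-insensitive substring matches of each house's keywords."""
    scores = {house: 0 for house in criteria}
    for house, by_column in criteria.items():
        for column, keywords in by_column.items():
            features = attributes.get(column, [])
            if not isinstance(features, list):
                features = [features]
            for keyword in keywords:
                if keyword:  # skip empty placeholders such as [""]
                    scores[house] += sum(keyword.lower() in f.lower() for f in features)
    return scores


def pick_house(scores, require_match=False):
    """Return the single top-scoring house, or "Review Needed" if unresolved."""
    best = max(scores.values())
    winners = [house for house, s in scores.items() if s == best]
    if len(winners) > 1 or (require_match and best == 0):
        return "Review Needed"
    return winners[0]


def two_pass_sort(attributes, primary, secondary):
    """Use the primary criteria first; only ties fall through to the secondary ones."""
    first = pick_house(score_houses(attributes, primary))
    if first != "Review Needed":
        return first
    return pick_house(score_houses(attributes, secondary), require_match=True)


# Hypothetical criteria, far smaller than the notebook's dictionaries
primary = {
    "Hufflepuff": {"Magical Abilities and Powers": ["healing"]},
    "Ravenclaw": {"Magical Abilities and Powers": ["flight"]},
}
secondary = {
    "Hufflepuff": {"Magical Abilities and Powers": ["mimicry"]},
    "Ravenclaw": {"Magical Abilities and Powers": ["riddles"]},
}

# Decided in the first pass
print(two_pass_sort({"Magical Abilities and Powers": ["Flight"]}, primary, secondary))
# Tied in the first pass, resolved by "mimicry" in the second
print(two_pass_sort({"Magical Abilities and Powers": ["Healing", "Flight", "Mimicry"]}, primary, secondary))
```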

#Creating a secondary scoring function
def secondary_scoring_function(creature_attributes, secondary_criteria):
    scores = {house: 0 for house in secondary_criteria}
    
    for house, criteria in secondary_criteria.items():
        for attribute, keywords in criteria.items():
            if attribute in creature_attributes:
                creature_feature = creature_attributes[attribute]
                # Convert single items into lists for uniform processing
                if not isinstance(creature_feature, list):
                    creature_feature = [creature_feature]
                # Score based on the presence of any keywords in the creature's attributes
                for keyword in keywords:
                    # Skip empty criteria
                    if keyword == "":
                        continue
                    scores[house] += sum(keyword.lower() in feature.lower() for feature in creature_feature)
    
    max_score = max(scores.values())
    tied_houses = [house for house, score in scores.items() if score == max_score]
    
    # Handle a persistent tie or if no criteria matched (max_score == 0)
    if len(tied_houses) > 1 or max_score == 0:
        return "Review Needed"
    else:
        return tied_houses[0]

# Secondary criteria for "Review Needed" creatures
secondary_criteria = {
    "Hufflepuff": {
        "Cultural Symbolism": ["lightheartedness", "charm", "insolence"],
        "Magical Abilities and Powers": ["mimicry", "none(magical', 'but no specific abilities)", "healing"],
        "Broader Symbolism": ["Emotional Bonds", "Mischief and Trickery", "Fertility and Growth"],
        "Magical Ability Categories": [""],
    },
    "Ravenclaw": {
        "Cultural Symbolism": ["transformation", "loyalty"],
        "Magical Abilities and Powers": ["flight", "intelligence", "riddles"],
        "Broader Symbolism": ["Nature and Change"],
        "Magical Ability Categories": ["Weather Magic"],
    },
    "Slytherin": {
        "Cultural Symbolism": ["ferocity", "temptation", "outcast", "deception"],
        "Magical Abilities and Powers": ["stealth", "shape shifting", "venomous", "blood sucking", "produces gas"],
        "Broader Symbolism": ["Adaptability and Survival", "Negativity and Harm"],
        "Magical Ability Categories": ["Sensory and Perception", "Enchantment"],
    },
    "Gryffindor": {
        "Cultural Symbolism": ["beauty", "grace", "violence"],
        "Magical Abilities and Powers": ["speed", "fire dwelling"],
        "Broader Symbolism": [],
        "Magical Ability Categories": ["Defensive Magic"],
    }
}

# Applying the secondary scoring function to creatures marked as "Review Needed"
for index, row in creatures_df[creatures_df['House Assignment'] == 'Review Needed'].iterrows():
    creature_attributes = {
        "Cultural Symbolism": [row['Cultural Symbolism'].lower()],
        "Magical Abilities and Powers": ast.literal_eval(row['Magical Abilities and Powers']),
        "Broader Symbolism": [row['Broader Symbolism']],
        "Magical Ability Categories": ast.literal_eval(row['Magical Ability Categories'])
    }
    assigned_house = secondary_scoring_function(creature_attributes, secondary_criteria)
    creatures_df.at[index, 'House Assignment'] = assigned_house

Model Validation

The goal of this step is to confirm that the sorting logic and scoring functions worked as intended and that each creature's assignment is justifiable based on its characteristics.

To do this, we'll perform consistency checks. We select a random sample of creatures from each house and verify that each creature's characteristics align with the traits specified in the sorting logic.
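Part of such a check can be automated. The sketch below assumes the sorted file keeps the same list-as-string columns. It draws a reproducible sample per house, counts how many of the assigned house's own keywords each sampled creature matches, and flags any creature with zero supporting matches for a closer look. The rows, the trimmed criteria and the `supporting_hits` helper are all hypothetical.

```python
import ast

import pandas as pd

# Hypothetical trimmed criteria and sorted creatures (not the real dataset)
criteria = {
    "Gryffindor": {"Magical Abilities and Powers": ["fire breathing", "strength"]},
    "Slytherin": {"Magical Abilities and Powers": ["venomous"]},
}
sorted_df = pd.DataFrame({
    "Creature": ["A", "B", "C", "D"],
    "House Assignment": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
    "Magical Abilities and Powers": [
        "['Fire Breathing', 'Flight']", "['Strength']", "['Venomous']", "['Flight']",
    ],
})


def supporting_hits(row):
    """Count keywords of the row's own house that its attributes match."""
    total = 0
    for column, keywords in criteria[row["House Assignment"]].items():
        items = ast.literal_eval(row[column])  # "['a', 'b']" -> ['a', 'b']
        total += sum(kw.lower() in item.lower() for kw in keywords for item in items)
    return total


# Reproducible sample of two creatures per house
sample = sorted_df.groupby("House Assignment").sample(n=2, random_state=0)
sample = sample.assign(hits=sample.apply(supporting_hits, axis=1))

# Creatures whose assigned house has no matching keyword deserve a manual look
flagged = sample.loc[sample["hits"] == 0, "Creature"].tolist()
print(flagged)  # ['D']
```

A creature that was placed by the secondary pass will naturally score zero against the primary criteria. In a real check, it makes sense to re-score against whichever criteria actually decided that creature.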

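df = pd.read_excel('all_sorted_creatures.xlsx')
sample_per_house = 3
# Draw a reproducible random sample from each house
# (GroupBy.sample raises an error if a house has fewer than sample_per_house creatures)
samples = df.groupby('House Assignment').sample(n=sample_per_house, random_state=42).reset_index(drop=True)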

Final Analysis

Now that we've methodically categorized magical creatures into their respective Hogwarts houses based on distinct traits and characteristics, we move on to our final analysis which focuses on understanding the relationship between these creatures and the defining qualities of Gryffindor, Slytherin, Ravenclaw, and Hufflepuff. We've explored their geographical distributions, sizes, species, and behavioral patterns to uncover how they represent their houses across different cultures and environments. This report will share the key insights from our comprehensive study, highlighting the connections between the magical creatures and their Hogwarts affiliations.

 

Global Distribution - Sorted Creatures and Where to Find Them

 

While most Hogwarts houses are primarily centered in the UK, Slytherin stands out with a strong presence in the United States, highlighting its strategic positioning in one of the world's most influential countries. Slytherin's robust presence in ocean habitats, Africa, and Southeast Asia indicates a keen interest in regions known for their mystical legacies and potent magical creatures, aligning with their ambitious and resourceful nature.

Gryffindor dominates in Greece and China, regions tied to ancient myths and legends that mirror the house’s adventurous and bold nature. Their presence is also notable in Africa, particularly in the Democratic Republic of the Congo, emphasizing their explorative spirit.

Ravenclaw is prominently present in India, the Iberian Peninsula, the Middle East, and Northern Europe, with a focus on Norway. These areas are key centers of historical and cultural knowledge, aligning with Ravenclaw’s intellectual values. Their activities in Greece and Egypt also highlight their pursuit of ancient wisdom.

Hufflepuff shows a unique affinity for creatures found in the United Kingdom and Ireland, lands rich in folklore and magical history that resonate with the house’s appreciation for tradition and the natural world. Their significant presence in Austria and Belgium points to a connection with the heart of Europe, where many magical botanicals and creatures are found, reflecting Hufflepuff’s dedication to nurturing and community.

Taken together, these patterns capture the distinct geographical influences of each Hogwarts house and show how they align with each house's values and pursuits within the magical world.
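
The regional comparisons above come down to counting creatures per region within each house. A minimal sketch of that tally with `pd.crosstab`, assuming a hypothetical `Region` column that holds a list of regions per creature, so a creature native to several regions is counted once in each. The helper names are hypothetical, and the toy rows are illustrative only; they do not reproduce the project's figures.

```python
import pandas as pd

# Toy rows for illustration only; "Region" is a hypothetical list-valued column.
toy = pd.DataFrame({
    "House Assignment": ["Slytherin", "Slytherin", "Gryffindor", "Ravenclaw"],
    "Region": [["United States", "Africa"], ["United States"], ["Greece", "China"], ["India"]],
})


def region_counts(df: pd.DataFrame, region_col: str = "Region") -> pd.DataFrame:
    """Creature counts with regions as rows and houses as columns."""
    # explode() gives each (creature, region) pair its own row;
    # reset_index avoids duplicate index labels after exploding
    exploded = df.explode(region_col).reset_index(drop=True)
    return pd.crosstab(exploded[region_col], exploded["House Assignment"])


def top_regions(df: pd.DataFrame, house: str, n: int = 3) -> pd.Series:
    """The n regions with the most creatures for one house."""
    return region_counts(df)[house].nlargest(n)


print(top_regions(toy, "Slytherin", n=2))
```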

Size and Species - Sorted Creatures and What They Look Like

The Scale of Sorted Creatures

An exploration of the varying creature sizes within the Houses, hinting at both the literal measurement of size and the metaphorical scale of their magical presence.

  • Slytherin shows a balanced distribution across all size categories, with a slight preference towards larger sizes.

  • Gryffindor tends to favor larger sizes, particularly in the 'Large' and 'Gigantic' categories, indicating a potential focus on grander or more physically robust creatures.

  • Ravenclaw exhibits a strong presence in the 'Small' and 'Large' categories but a weaker one in 'Gigantic', suggesting a preference for small to large creatures and only rarely the very largest.

  • Hufflepuff skews strongly towards smaller creatures, with the 'Small' category far outnumbering the others. This might indicate a focus on compact, manageable creatures capable of fitting into almost any environment.
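
Because the houses contain different numbers of creatures, raw counts per size category can mislead; normalizing each house's row to proportions makes statements like "Hufflepuff skews small" comparable across houses. A sketch with `pd.crosstab(..., normalize="index")`, assuming a hypothetical `Size` column and an assumed category order (the 'Medium' tier is a placeholder; the real category set may differ). Toy rows only.

```python
import pandas as pd

# Assumed ordering of size categories; 'Medium' is a placeholder tier.
SIZE_ORDER = ["Small", "Medium", "Large", "Gigantic"]

# Toy rows for illustration only, not the project's data
toy = pd.DataFrame({
    "House Assignment": ["Hufflepuff"] * 4 + ["Gryffindor"] * 2,
    "Size": ["Small", "Small", "Small", "Large", "Large", "Gigantic"],
})


def size_shares(df: pd.DataFrame, size_col: str = "Size") -> pd.DataFrame:
    """Share of each house's creatures in each size category (rows sum to 1)."""
    shares = pd.crosstab(df["House Assignment"], df[size_col], normalize="index")
    # Put columns in size order and show unused categories as 0.0;
    # labels missing from SIZE_ORDER would be dropped here.
    return shares.reindex(columns=SIZE_ORDER, fill_value=0.0)


print(size_shares(toy).round(2))
```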

Signature House Species - Is There a Connection?

An investigation into the distinct species residing within each Hogwarts House, revealing not only the diversity of magical creatures but also the unique connection they have with each House's symbolism.

Slytherin

  • Most Prominent Species: Reptile
  • Count in House: 7
  • Insight: Reptiles are the most notable species in Slytherin, echoing the house's emblem, the serpent.

Gryffindor

  • Most Prominent Species: Dragon
  • Count in House: 9
  • Insight: Dragons are the most prominent species in Gryffindor, aligning with the house's element of fire.

Ravenclaw

  • Most Prominent Species: Bird
  • Count in House: 6
  • Insight: Birds are the most significant species in Ravenclaw, which is fitting given the house's association with air, and its emblem, the eagle.

Hufflepuff

  • Most Prominent Species: Being
  • Count in House: 9
  • Insight: Beings hold the highest count in Hufflepuff, reflecting the house's inclusive nature and its emphasis on community. This suggests that Beings are likely highly integrated into the house's activities and values.
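
The "most prominent species" figures above amount to counting creatures per (house, species) pair and keeping the largest count in each house. A sketch, assuming a hypothetical `Species` column and a hypothetical helper `top_species`; ties resolve to the alphabetically first species because `groupby` sorts its keys, which may differ from how the project broke ties. Toy rows only.

```python
import pandas as pd

# Toy rows for illustration only; "Species" is a hypothetical column name.
toy = pd.DataFrame({
    "House Assignment": ["Slytherin", "Slytherin", "Slytherin", "Ravenclaw", "Ravenclaw", "Ravenclaw"],
    "Species": ["Reptile", "Reptile", "Beast", "Bird", "Bird", "Being"],
})


def top_species(df: pd.DataFrame, species_col: str = "Species") -> pd.DataFrame:
    """Most common species in each house, with its count."""
    counts = df.groupby(["House Assignment", species_col]).size()
    rows = []
    for house, group in counts.groupby(level=0):
        per_species = group.droplevel(0)  # index is now just the species
        rows.append({
            "House Assignment": house,
            "Species": per_species.idxmax(),  # first label with the max count
            "Count": int(per_species.max()),
        })
    return pd.DataFrame(rows).set_index("House Assignment")


print(top_species(toy))
```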

Thematic Connections - Are Behavioral Characteristics Shared?

Slytherin

  • Prominent Themes: Danger & Threat, Aggression & Dominance
  • Insight: The creatures associated with Slytherin display traits that emphasize risk and the ability to confront or impose threats, alongside a tendency towards assertiveness and establishing dominance.

Gryffindor

  • Prominent Themes: Aggression & Dominance, Loyalty & Nobility
  • Insight: Gryffindor's creatures are characterized by a bold and valorous nature, showing both a capacity for leadership and a deep-seated loyalty, suggesting they possess both bravery and a strong sense of honor.

Ravenclaw

  • Prominent Themes: Wisdom & Knowledge, Negativity & Harm
  • Insight: Creatures in Ravenclaw are distinguished by their association with intelligence. The presence of negativity and harm suggests these creatures might also exhibit traits of being aloof or expressing negative emotions, potentially as a byproduct of their intelligence or sensitivity.

Hufflepuff

  • Prominent Themes: Guardianship & Protection, Mischief & Trickery
  • Insight: The creatures sorted into Hufflepuff show a protective nature, indicative of a nurturing and caring disposition. The element of mischief and trickery suggests these creatures also have a clever and adaptable side, capable of playful or cunning behavior.
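
Each house's prominent themes can be read off by flattening a list-valued theme column and keeping the most frequent entries per house. A sketch assuming a hypothetical `Themes` column of Python lists (in the Excel file such lists would first need `ast.literal_eval`, as in the scoring loop above), a hypothetical helper `top_themes`, and an arbitrary cut-off of two themes per house. Toy rows only.

```python
from typing import Dict, List

import pandas as pd

# Toy rows for illustration only; "Themes" is a hypothetical list-valued column.
toy = pd.DataFrame({
    "House Assignment": ["Hufflepuff", "Hufflepuff", "Hufflepuff", "Gryffindor"],
    "Themes": [
        ["Guardianship & Protection"],
        ["Guardianship & Protection", "Mischief & Trickery"],
        [],  # an empty list explodes to NaN and is dropped below
        ["Aggression & Dominance", "Loyalty & Nobility"],
    ],
})


def top_themes(df: pd.DataFrame, theme_col: str = "Themes", n: int = 2) -> Dict[str, List[str]]:
    """Map each house to its n most frequent themes."""
    exploded = df[["House Assignment", theme_col]].explode(theme_col).dropna(subset=[theme_col])
    counts = exploded.groupby(["House Assignment", theme_col]).size()
    return {
        house: group.droplevel(0).nlargest(n).index.tolist()
        for house, group in counts.groupby(level=0)
    }


print(top_themes(toy))
```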

Final Findings and Conclusions

Our analysis of the magical creatures sorted into Hogwarts houses provides a nuanced view of the alignment of magical life forms with the values and cultural landscapes of each house.

Slytherin:

  • The creatures associated with Slytherin, primarily reptiles, are found in regions like the United States, the oceans, Africa, and Southeast Asia. These locations are not just strategic but deeply connected to indigenous myths and legends about mystical creatures. This aligns with Slytherin's traits of ambition and a knack for leveraging historical and mythical narratives to bolster their influence and mystique. The size distribution shows a balanced preference, slightly skewed towards larger creatures, highlighting Slytherin's appreciation for power and the formidable presence of these beings. The themes of danger, aggression, and dominance resonate with Slytherin's reputation for assertiveness and a strategic approach to challenges.

Gryffindor:

  • The creatures aligned with Gryffindor, notably dragons, are predominantly located in regions with a rich heritage of myths and legends surrounding heroes and epic battles, such as Africa, Greece, and China. The presence of Gryffindor creatures in such areas underlines the house’s appreciation for storied histories and their affinity for places where tales of courage and valor are celebrated. These creatures are mostly found in the 'Large' and 'Gigantic' categories, underscoring Gryffindor's attraction to formidable and awe-inspiring beasts that symbolize strength and heroism. The thematic focus on aggression and dominance, combined with loyalty and nobility, captures the essence of the boldness and the noble heart of their house.

Ravenclaw:

  • Ravenclaw's creatures, primarily birds, are prominently located in areas rich in intellectual heritage and historical learning, such as Egypt, the Iberian Peninsula, and India. These regions share profound contributions to the early foundations of knowledge, philosophy, and science. Egypt’s ancient libraries and scholarly traditions, the Iberian Peninsula's medieval role in transmitting classical and Islamic scholarship to the rest of Europe, and India's ancient advancements in mathematics and philosophy exemplify the type of intellectual environments that resonate with Ravenclaw's values of wisdom and learning. The size distribution among Ravenclaw creatures, favoring 'Small' to 'Large', suggests a preference for creatures that symbolize thought and agility over brute strength, aligning with Ravenclaw’s cerebral nature. The themes of wisdom and knowledge are prevalent, with an occasional touch of negativity possibly reflecting the complex, often challenging pursuit of intellectual growth.

Hufflepuff:

  • Hufflepuff's smaller, community-oriented beings are most evident in places like Ireland and the United Kingdom, rich in folklore and tradition. This geographic preference highlights Hufflepuff’s values of nurturing and community. Their creatures’ smaller size emphasizes approachability and the close-knit nature of Hufflepuff. The themes of guardianship and protection, along with mischief and trickery, showcase their protective yet playful character.

Overall, the geographical and thematic alignments reveal deep connections between the creatures and their respective houses, highlighting how each house's characteristics are reflected not only in the creatures’ traits but also in their chosen environments. This analysis enhances our understanding of the ecological and cultural ties within the wizarding world, illustrating the rich narrative that magical creatures bring to the lore of Hogwarts.


Up Next: View the Completed Bestiary

Continue exploring magical creatures in our detailed Bestiary, which dives deeper into each specific creature, discussing their unique traits and the role they play in the wizarding world. Discover the intricate details that make each creature unique and their alignment with the different Hogwarts houses.

Explore The Hogwarts Bestiary


Acknowledgments & References

This project was significantly enhanced by a range of sources and tools that supported the data collection and analysis phases. However, I compiled the dataset myself, a process that involved not only web scraping but also rigorous assessment and validation to ensure the accuracy, reliability, and relevance of the information.

All headers, artwork, and dashboards for 'The Hogwarts Bestiary' were created by me. If you wish to use any of these materials, please contact me directly or ensure appropriate credit is given.

I extend my gratitude to the Harry Potter fan community whose insights were invaluable in gathering detailed information on the creatures for the dataset.


Contact Information

Interested in discussing 'The Hogwarts Bestiary' or have questions about the project? Feel free to reach out, and let's chat!

I'm usually available for a chat on weekdays from 9 AM to 5 PM. Looking forward to hearing from you!
