Graph Theory
🧩 Graph Theory
Graph theory is the study of Graphs, which are mathematical structures used to model pairwise relations between objects. In computer science, graphs are the foundation for network routing, social media connections, and task scheduling.
🟢 Part 1: Core Definitions
A graph consists of a set of Vertices (Nodes) and Edges (Relationships).
1. Types of Graphs
- Undirected Graph: Edges have no direction. (e.g., Friendship in Facebook)
- Directed Graph (Digraph): Edges have a direction. (e.g., Followers on Twitter)
- Weighted Graph: Edges have values associated with them. (e.g., Distance in a roadmap)
- Bipartite Graph: A graph whose vertices can be divided into two disjoint sets such that every edge connects a vertex in one set to one in the other.
2. Basic Terminology
- Adjacency: Two nodes are adjacent if they are connected by an edge.
- Degree: The number of edges connected to a node.
- Path: A sequence of edges which join a sequence of vertices.
- Cycle: A path where the starting and ending vertices are the same.
🟡 Part 2: Traversals and Connectivity
How do we search through a graph? The two fundamental algorithms are:
1. Breadth-First Search (BFS)
Explores neighbor nodes first, before moving to the next level neighbors.
- Use Case: Finding the Shortest Path in an unweighted graph.
- Complexity: .
2. Depth-First Search (DFS)
Explores as far as possible along each branch before backtracking.
- Use Case: Checking for Connectivity or finding Cycles.
- Complexity: .
🔴 Part 3: Directed Acyclic Graphs (DAGs)
A DAG is a directed graph with no cycles. This is the single most important graph structure in modern data engineering.
1. Topological Sort
A linear ordering of vertices such that for every directed edge , comes before .
- CS Context: This is used for Task Scheduling. If task B depends on task A, task A must appear first in the topological sort.
2. Dependency Management
Systems like Airflow, Spark, and Git use DAGs to represent work flows.
- Circular Dependency Error: This happens when your DAG has a cycle (e.g., Task A needs B, B needs C, and C needs A). It becomes un-runnable.
💻 Programming Example: Representing a Graph
Graphs are commonly represented using an Adjacency List (memory efficient for sparse graphs).
# Adjacency List representation
graph = {
'A': ['B', 'C'],
'B': ['D'],
'C': ['D'],
'D': []
}
def find_path(graph, start, end, path=[]):
path = path + [start]
if start == end:
return path
for node in graph[start]:
if node not in path:
new_path = find_path(graph, node, end, path)
if new_path: return new_path
return None
# Find path from A to D
print(f"Path A -> D: {find_path(graph, 'A', 'D')}") # ['A', 'B', 'D']Why Graph Theory Matters
- Network Routing: OSPF and BGP use Dijkstra’s algorithm to find paths.
- Recommendation Engines: Analyzing the “distance” between users and items.
- Compilers: Control-flow graphs (CFG) analyze the structure of code to optimize it.