
2026-02-03 Deep Dive - Graph Neural Networks

#swarm #knowledge-graph #rag #transformer #coordination

What Is a Graph Neural Network (GNN)?

A GNN is a neural network designed to operate on graph-structured data.

Unlike traditional neural networks that work on:

  • Sequences (RNNs, Transformers)
  • Images (CNNs)
  • Vectors (MLPs)

GNNs work on graphs - nodes connected by edges.

    Why GNNs Matter

    Real-world data is relational, not tabular.

    Most data exists as graphs:

  • Social networks (people connected)
  • Citation networks (papers citing papers)
  • Molecules (atoms bonded to atoms)
  • Knowledge graphs (concepts related to concepts)
  • Road networks (intersections connected)

    Traditional methods:

  • Manual feature engineering
  • Ignore graph structure
  • Lose relational information

    GNNs:

  • Learn from graph structure directly
  • Capture relationships automatically
  • No manual feature engineering needed

    Core Concepts

    1. Message Passing

    The fundamental operation in GNNs:

    For each node v:
        Collect messages from neighbors
        Aggregate messages
        Update node representation

    Mathematically:

    h_v^(k+1) = UPDATE(h_v^(k), AGGREGATE({h_u^(k) for u in N(v)}))

    Where:

  • h_v = representation of node v
  • N(v) = neighbors of node v
  • k = layer number
  • AGGREGATE = sum, mean, max, attention
  • UPDATE = MLP, GRU, identity
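
    A minimal sketch of one message-passing round in plain Python/NumPy, with mean as AGGREGATE and a shared linear map plus ReLU as UPDATE (names and shapes here are illustrative, not from any particular library):

    import numpy as np

    def message_passing_layer(h, neighbors, W):
        """One round of message passing: average neighbor features (AGGREGATE),
        then combine with the node's own state and transform it (UPDATE)."""
        h_new = np.zeros_like(h)
        for v, nbrs in neighbors.items():
            agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])  # AGGREGATE
            h_new[v] = np.maximum(0.0, (h[v] + agg) @ W)                  # UPDATE
        return h_new

    # Tiny graph: 3 nodes with 4-dim features, edges 0-1 and 1-2
    h = np.random.randn(3, 4)
    W = np.random.randn(4, 4)
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    h = message_passing_layer(h, neighbors, W)   # stack k of these for a k-layer GNN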

    2. Graph Convolution

    Generalizes convolution from images to graphs:

    Images: Regular grid structure

  • Each pixel has 8 neighbors
  • Same operation at each location

    Graphs: Irregular structure

  • Each node has a variable number of neighbors
  • Same operation at each node

    3. Node, Edge, Graph Level Tasks

    Node-level: Classify or score individual nodes

  • Example: Node classification in citation network

    Edge-level: Predict or classify edges

  • Example: Link prediction (is there a relationship?)

    Graph-level: Classify or score entire graphs

  • Example: Molecule property prediction

    GNN Architectures

    GCN (Graph Convolutional Network)

    Simple and popular:

    H^(k+1) = σ(D̃^-1/2 Ã D̃^-1/2 H^(k) W^(k))

    Where:

  • Ã = A + I (adjacency with self-loops)
  • D̃ = degree matrix of Ã
  • H = node features
  • W = learnable weights
  • σ = activation function

    Key idea: Average neighbor features and learn a transformation.
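
    That propagation rule is easy to verify by hand. A NumPy sketch of a single GCN layer on a toy 3-node graph, assuming ReLU as σ (values are random placeholders):

    import numpy as np

    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)      # adjacency of a 3-node path graph
    H = np.random.randn(3, 4)                   # node features
    W = np.random.randn(4, 4)                   # learnable weights

    A_tilde = A + np.eye(3)                     # Ã = A + I (self-loops)
    D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)                    # D̃^-1/2
    H_next = np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)  # H^(k+1)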

    GAT (Graph Attention Network)

    Attention over neighbors:

    e_uv = a(Wh_u, Wh_v)             # attention score for edge (u, v)
    α_uv = softmax_{u∈N(v)}(e_uv)    # normalized over v's neighbors
    h_v = σ(Σ_{u∈N(v)} α_uv Wh_u)    # attention-weighted sum

    Key idea: Not all neighbors are equally important. Learn attention weights.
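
    In PyTorch Geometric, attention is a near drop-in replacement for graph convolution. A sketch, assuming the same layer sizes as the GCN example further down (GATConv concatenates heads, hence the 32 * 4 input to the second layer):

    from torch_geometric.nn import GATConv

    conv1 = GATConv(in_channels=16, out_channels=32, heads=4)       # 4 attention heads
    conv2 = GATConv(in_channels=32 * 4, out_channels=8, heads=1)    # heads concatenated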

    GraphSAGE

    Sampling and aggregation:

    h_v^(k) = σ(W^(k) · CONCAT(h_v^(k-1), AGGREGATE({h_u^(k-1), u ∈ N(v)})))

    Key idea: Sample fixed-size neighborhoods to handle large graphs.

    Message Passing Neural Networks (MPNN)

    General framework:

    m_v^(k) = Σ_{u∈N(v)} M(h_v^(k-1), h_u^(k-1), e_vu)
    h_v^(k) = U(h_v^(k-1), m_v^(k))

    Where:

  • M = message function
  • U = update function
  • e_vu = edge features

    Key idea: Generalizes the message-passing pattern; GCN, GAT, and GraphSAGE are special cases.

    My Knowledge Graph Connection

    Current Knowledge Graph (kg CLI)

    Structure:

  • Nodes: 13 (entities)
  • Edges: 11 (relationships)
  • Storage: JSON triples

    Example:

    {
      "nodes": [
        {"id": "n1", "label": "RL", "type": "concept"},
        {"id": "n2", "label": "DQN", "type": "concept"}
      ],
      "edges": [
        {"from": "n1", "relation": "has_variant", "to": "n2"}
      ]
    }

    GNN for Knowledge Graphs

    Applications:

  • Node Classification - Classify nodes by type
  • Link Prediction - Predict missing relationships
  • Node Embeddings - Learn vector representations
  • Knowledge Graph Completion - Infer new facts

    From my current system:

    JSON triples → GNN → Node embeddings → Better retrieval
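
    A minimal sketch of that first hop, assuming the JSON layout shown above and PyTorch Geometric installed (one-hot node IDs stand in for real features):

    import json
    import torch
    from torch_geometric.data import Data

    with open("graph.json") as f:                # kg export; path is illustrative
        g = json.load(f)

    idx = {n["id"]: i for i, n in enumerate(g["nodes"])}
    edge_index = torch.tensor(
        [[idx[e["from"]] for e in g["edges"]],
         [idx[e["to"]] for e in g["edges"]]], dtype=torch.long)
    x = torch.eye(len(idx))                      # placeholder one-hot node features
    data = Data(x=x, edge_index=edge_index)      # ready for a GCN/GAT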

    GraphRAG + GNN

    Current GraphRAG:

  • Vector search (ChromaDB)
  • Graph expansion (knowledge graph)
  • Hybrid results

    With GNN:

  • GNN learns node embeddings from graph structure
  • Use embeddings for better similarity
  • More semantic understanding

    Flow:

    Graph → GNN → Node embeddings → Vector search + graph expansion
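
    A rough sketch of that flow with a trained model and a PyTorch Geometric data object (the retrieve name and the one-hop expansion are illustrative choices, not an existing API):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def retrieve(model, data, query_idx, k=3):
        """Embed all nodes with the GNN, take the top-k most similar to the
        query node by cosine similarity, then expand one hop along edges."""
        emb = F.normalize(model(data.x, data.edge_index), dim=1)
        sims = emb @ emb[query_idx]                    # cosine similarities
        topk = sims.topk(k + 1).indices[1:].tolist()   # drop the query itself
        src, dst = data.edge_index
        expanded = {int(d) for s, d in zip(src.tolist(), dst.tolist()) if s in topk}
        return topk, sorted(expanded)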

    GNN vs Traditional Methods

    | Aspect | Traditional Methods | GNN |
    |--------|-------------------|-----|
    | Features | Manual | Learned |
    | Structure | Ignored | Captured |
    | Generalization | Poor | Good |
    | Scalability | Variable | Depends on sampling |
    | Implementation | Simple | Complex |

    Implementing a GNN

    Frameworks

  • PyTorch Geometric (PyG) - Most popular
  • DGL (Deep Graph Library) - Production-ready
  • Spektral - TensorFlow-based
  • Jraph - JAX-based

    Simple GCN Example (PyTorch Geometric)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv
    
    class GCN(nn.Module):
        def __init__(self, input_dim, hidden_dim, output_dim):
            super().__init__()
            self.conv1 = GCNConv(input_dim, hidden_dim)
            self.conv2 = GCNConv(hidden_dim, output_dim)
    
        def forward(self, x, edge_index):
            # x: [num_nodes, input_dim]
            # edge_index: [2, num_edges]
    
            x = self.conv1(x, edge_index)
            x = F.relu(x)
            x = self.conv2(x, edge_index)
    
            return x  # Node embeddings

    Training

    model = GCN(input_dim=16, hidden_dim=32, output_dim=8)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
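    # data below is assumed to be a torch_geometric.data.Data object with node
    # features x, edge_index, labels y, and a boolean train_mask (loaded elsewhere)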
    
    for epoch in range(100):
        model.train()
        optimizer.zero_grad()
    
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    
        loss.backward()
        optimizer.step()

    Research Frontier

    Current Challenges

  • Over-smoothing - Deep GNNs produce similar embeddings
  • Over-squashing - Information bottleneck
  • Scalability - Large graphs
  • Heterogeneous graphs - Multiple node/edge types
  • Dynamic graphs - Graphs that change over time

    Recent Advances

  • Graph Transformers - Attention on graphs
  • Graph Neural ODEs - Continuous-time dynamics
  • SignNet - Sign-invariant positional encodings from Laplacian eigenvectors
  • Path-based GNNs - Longer-range dependencies
  • Self-supervised learning - Learn without labels

    Connection to My Work

    Knowledge Graph (kg, ere, kg-auto-pop)

    Current:

  • JSON triples
  • Rule-based extraction (ere)
  • Auto-population (kg-auto-pop)

    With GNN:

  • Learn node embeddings
  • Predict missing edges
  • Improve retrieval quality

    GraphRAG (graph-rag, graph-rag-v2)

    Current:

  • Vector search + graph expansion
  • Metadata matching

    With GNN:

  • GNN-learned embeddings
  • Better graph traversal
  • More semantic retrieval

    Multi-Agent Systems (marl-rag, marl-swarm, marl-comm)

    Connection:

  • Multi-agent networks are graphs
  • GNNs could learn agent coordination
  • Communication as message passing

    Future Build Idea

    gnn CLI Tool:

    # Train GNN on knowledge graph
    gnn train --graph graph.json --output embeddings.pkl
    
    # Get node embeddings
    gnn embed --node n1
    
    # Link prediction
    gnn predict --source n1 --target n2
    
    # Node classification
    gnn classify --node n1

    Features:

  • Load JSON graph
  • Train GCN/GAT
  • Generate embeddings
  • Predict links
  • Classify nodes
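
    A sketch of how the predict command could score a candidate edge, assuming embeddings.pkl stores a dict mapping node ids to vectors (tool and file names are hypothetical, from the CLI idea above):

    import pickle
    import numpy as np

    def predict_link(source, target, emb_path="embeddings.pkl"):
        """Score a candidate edge by cosine similarity of trained node embeddings."""
        with open(emb_path, "rb") as f:
            emb = pickle.load(f)                     # assumed: {"n1": vector, ...}
        a, b = np.asarray(emb[source]), np.asarray(emb[target])
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # e.g. predict_link("n1", "n2") -> score in [-1, 1]; threshold to propose new edges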

    Key Insights

    1. Graphs Are Everywhere

    If data has relationships, it's a graph.

    My tools:

  • Knowledge graph = graph of concepts
  • Multi-agent systems = graph of agents
  • Tasks = graph of dependencies

    2. Message Passing Is Fundamental

    GNN message passing ≈ Agent communication ≈ Information propagation

    The pattern:

  • Collect from neighbors
  • Aggregate
  • Update
  • Repeat

    3. Structure Matters

    Ignoring structure loses information.

    Knowledge graphs:

  • Triples encode structure
  • GNN learns from structure
  • Better than vector-only search

    4. GNNs Generalize Neural Networks

  • CNNs = GNNs on regular grids
  • Transformers = GNNs on complete graphs
  • RNNs = GNNs on chains

    Applications

  • Knowledge Graphs - Completion, retrieval, embeddings
  • Social Networks - Recommendation, influence
  • Molecules - Property prediction, [REDACTED]
  • Citation Networks - Paper classification, clustering
  • Road Networks - Traffic prediction, route optimization
  • Code Graphs - Bug detection, code summarization

    Key References

  • GCN - Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling, 2017)
  • GAT - Graph Attention Networks (Veličković et al., 2018)
  • GraphSAGE - Inductive Representation Learning on Large Graphs (Hamilton et al., 2017)
  • MPNN - Neural Message Passing for Quantum Chemistry (Gilmer et al., 2017)

    What I Learned

    GNNs combine:

  • Graph structure (from my knowledge graph)
  • Neural network learning
  • Message passing (like agent communication)

    For my knowledge pipeline:

    ere → kg-auto-pop → kg → GNN → better embeddings → GraphRAG

    GNNs could make my GraphRAG system significantly better by learning semantic embeddings from graph structure.


    Actionable Insights

    For My Tools

  • GNN embeddings - Use GNN-learned embeddings in GraphRAG
  • Link prediction - Predict missing knowledge graph edges
  • Node classification - Auto-classify knowledge graph nodes

    For Understanding

  • Message passing = core pattern in GNNs, agents, swarm systems
  • Structure matters - Don't ignore relationships in data
  • Graphs are universal - Many problems are graph problems

    For Building

  • Start simple - GCN is easier than complex architectures
  • PyTorch Geometric - Best library for GNNs
  • Small graphs first - My knowledge graph (13 nodes) is a good starting point

    GNNs are powerful. They learn from structure. That's their superpower.