Guide
Genetic algorithms explained
Not every optimization problem has a smooth loss landscape you can descend with gradient descent. Scheduling twenty delivery vans across a city, designing a circuit board layout, or tuning twelve discrete hyperparameters at once can involve combinatorial explosions, noisy simulators, or objectives with no closed-form derivative. A genetic algorithm (GA) treats candidate solutions like organisms in an evolving population: evaluate fitness, breed the strong, mutate for diversity, and repeat for many generations until a good enough answer emerges. Inspired by natural selection but engineered for computers, GAs are a practical tool when brute force is impossible and calculus-based methods do not apply. This guide explains encoding, fitness functions, selection, crossover, and mutation, a Harbor Courier route-optimization worked example, a method decision table, common pitfalls, and a production checklist.
What a genetic algorithm optimizes
A GA searches a space of candidate solutions (chromosomes) to maximize or minimize a fitness function f(x). Unlike supervised learning, there is no training dataset of input-output pairs — only a scoring rule that judges how good each candidate is. The search is population-based: dozens or thousands of candidates compete each generation rather than a single point wandering the landscape.
GAs shine on problems where:
- The search space is discrete or mixed (permutations, bit strings, categorical choices).
- The objective is a black box — a simulator, spreadsheet, or legacy system you cannot differentiate.
- Local optima trap hill-climbing methods; population diversity helps escape.
- You need a good-enough solution under a time budget, not a provably global optimum.
They are weaker when gradients are available and cheap (use gradient descent or Bayesian optimization instead), when evaluation is extremely expensive per candidate without surrogate models, or when the problem is small enough for exhaustive search.
Chromosomes, genes, and encoding
A chromosome is one complete candidate solution, encoded as a vector of genes. Encoding is the first design decision and often determines success or failure:
- Binary strings — classic for on/off feature selection or knapsack problems.
- Real-valued vectors — continuous parameters bounded by min/max (mutation adds Gaussian noise).
- Permutations — orderings such as visit sequences in routing; crossover must preserve valid permutations (order crossover, PMX).
- Trees or graphs — used in genetic programming to evolve formulas or programs.
Invalid encodings produce illegal offspring (duplicate cities in a tour, negative inventory). Either design operators that always produce valid chromosomes or penalize invalid ones heavily in the fitness function.
The generational loop
A standard GA repeats these steps until a stopping criterion (max generations, fitness plateau, wall-clock limit):
- Initialize a population of random (or seeded) chromosomes.
- Evaluate fitness for every member.
- Select parents biased toward higher fitness.
- Crossover pairs of parents to produce offspring chromosomes.
- Mutate offspring with small probability to inject diversity.
- Replace the old population (generational GA) or merge with survivors (steady-state).
Selection: who gets to reproduce
Selection balances exploitation (copying good solutions) and exploration (giving weaker members a chance so the population does not collapse to clones). Common methods:
- Tournament selection — pick k random individuals; the fittest wins. Simple, popular, tunable via k.
- Roulette wheel (fitness proportionate) — probability proportional to fitness. Can dominate if one superstar appears early.
- Rank selection — probability by rank, not raw fitness — reduces premature convergence when fitness scales vary wildly.
The exploration-exploitation tension parallels multi-armed bandits and reinforcement learning, but GAs optimize a static objective over a batch of candidates rather than an agent learning online from sequential rewards.
Crossover: recombining building blocks
Crossover mixes genetic material from two parents. For binary strings, single-point crossover swaps tails at a random cut index. For permutations, specialized operators (order crossover, partially mapped crossover) preserve each gene exactly once. Crossover assumes schema — useful partial solutions can be assembled from parents; if genes interact strongly (epistasis), blind crossover breaks good combinations and the GA struggles.
Mutation: preserving diversity
Mutation randomly perturbs genes: flip a bit, swap two cities, nudge a real parameter. Typical mutation rates are low (0.1%–5% per gene) so crossover does most of the work. Without mutation, populations lose alleles and stall on local optima. Too much mutation turns search into random walk.
Elitism and steady-state variants
Elitism copies the top N individuals unchanged into the next generation, guaranteeing the best solution never worsens. Steady-state GAs replace only the weakest members each iteration, maintaining more continuity. Both are common production tweaks atop the textbook generational model.
Worked example: Harbor Courier route ordering
Harbor Courier dispatches three vans each morning to twelve neighborhood drop points. Distances between stops are known; each van must visit four stops and return to the depot. The goal: minimize total driving distance (a capacitated routing simplification). With 12 stops split into three groups of four, the assignment-and-ordering space is far too large to enumerate.
Encoding: a permutation of integers 1–12 representing visit order; a decoder splits the sequence into three segments of four stops for vans A, B, and C.
Fitness: negative total route distance (GA maximizes, so negate cost). Add a large penalty if any van exceeds a four-hour shift limit derived from distance and average speed.
Operators: tournament selection (k=3), order crossover (OX) for permutations, swap mutation on two random positions with 2% probability per offspring. Population size 200, elitism keeps top 10, run 500 generations or until fitness plateaus 50 generations.
Result: after roughly three minutes of evaluation on a laptop, the GA found a routing plan 18% shorter than the dispatcher's manual heuristic. It was not provably optimal — a mixed-integer solver might squeeze another few percent — but the GA required no gradient, handled the discrete permutation structure natively, and was easy to extend when Harbor added a thirteenth stop (re-run with updated distance matrix).
When to use a GA vs other optimizers
| Method | Best for | Weak when |
|---|---|---|
| Genetic algorithm | Discrete/combinatorial search, black-box objectives, many local optima | Smooth differentiable losses, tiny search spaces, need provable optimality |
| Gradient descent | Differentiable neural nets, convex or smooth losses | Permutations, integer-only choices, non-differentiable simulators |
| Grid / random search | Low-dimensional hyperparameter sweeps, simple baselines | High-dimensional or structured search spaces |
| Bayesian optimization | Expensive evaluations (hours per trial), continuous HP tuning | Very high dimension, highly multimodal discrete spaces without good kernels |
| Simulated annealing / hill climbing | Single-solution refinement, moderate dimension | Deep local optima without population diversity |
| Mixed-integer solver (MIP) | Routing/scheduling with linear constraints, optimality proofs | Nonlinear black-box objectives, very large instances |
Common pitfalls
- Bad encoding — crossover produces illegal solutions faster than selection can fix them.
- Fitness scaling bugs — negative or zero fitness breaks roulette selection; normalize or use tournament selection.
- Premature convergence — population becomes identical too early; increase mutation, diversify initialization, or use niching for multimodal objectives.
- Expensive fitness without caching — re-evaluating identical chromosomes wastes compute; memoize fitness by chromosome hash.
- Ignoring constraints — penalize violations smoothly; infeasible solutions should not accidentally win because penalties are too weak.
- Comparing one random seed — GAs are stochastic; report median and variance over multiple runs.
- Using a GA when a specialized solver exists — classical OR tools often beat generic GAs on standard routing if you can formulate the problem.
Practical checklist
- Define a clear fitness function aligned with the business metric (cost, latency, revenue).
- Choose an encoding where crossover and mutation preserve validity (or penalize violations).
- Log best, median, and worst fitness per generation — plots reveal convergence and stagnation.
- Start with tournament selection, elitism, and permutation-aware crossover for routing problems.
- Cache fitness evaluations for duplicate chromosomes.
- Run at least 10 independent seeds; report the best and median final fitness.
- Set a wall-clock budget; GAs are anytime algorithms — stop early if good enough.
- Benchmark against a simple heuristic (greedy nearest-neighbor, random search) before claiming victory.
- Version the simulator or objective — fitness landscape changes break reproducibility.
- Document population size, crossover rate, mutation rate, and selection method for every experiment.
Key takeaways
- Genetic algorithms evolve populations of candidate solutions via selection, crossover, and mutation.
- Encoding and operators matter more than fine-tuning population size — invalid offspring waste generations.
- GAs handle discrete and black-box objectives where gradients and exhaustive search fail.
- They trade optimality guarantees for flexibility and reasonable solutions under time limits.
- For differentiable machine learning training, use gradient-based methods; for combinatorial logistics and simulation tuning, GAs remain a sharp tool.
Related reading
- Hyperparameter tuning explained — grid, random, and Bayesian search when GAs are overkill
- Reinforcement learning explained — sequential decision-making with reward signals
- Multi-armed bandits explained — exploration vs exploitation in online settings
- Markov chains explained — stochastic processes underlying many simulators GAs optimize