Collectors are the strategies that tell a Stream how to turn a flow of elements into a concrete result such as a List, Map, number, or custom DTO. Conceptually, a collector answers the question: “Given a stream of T, how do I build a result R in a single reduction step?”


1. What is a Collector?

A Collector is a mutable reduction that accumulates stream elements into a container and optionally transforms that container into a final result. This is the formal definition of the Collector interface:

public interface Collector<T, A, R> {
    Supplier<A> supplier();
    BiConsumer<A, T> accumulator();
    BinaryOperator<A> combiner();
    Function<A, R> finisher();
    Set<Characteristics> characteristics();
}

Where:

  • T – input element type coming from the stream.
  • A – mutable accumulator type used during collection (e.g. ArrayList<T>, Map<K,V>, statistics object).
  • R – final result type (may be the same as A).

The functions have clear responsibilities:

  • supplier – creates a new accumulator instance A.
  • accumulator – folds each element T into the accumulator A.
  • combiner – merges two accumulators (essential for parallel streams).
  • finisher – converts A to R (often identity, sometimes a transformation like making the result unmodifiable).
  • characteristics – hints like CONCURRENT, UNORDERED, IDENTITY_FINISH that allow stream implementations to optimize.

The Collectors utility class provides dozens of ready‑made collectors so you rarely need to implement Collector yourself. You use them via the Stream.collect(...) terminal operation:

<R, A> R collect(Collector<? super T, A, R> collector)

You can think of this as: collector = recipe, and collect(recipe) = “execute this aggregation recipe on the stream.”
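To make the five pieces concrete, here is a minimal hand-rolled collector, built with the Collector.of factory (the element type and data are invented for illustration), that behaves like Collectors.toList():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

void main() {
    // A hand-rolled "to ArrayList" collector: A and R are the same type,
    // so IDENTITY_FINISH tells the stream it may skip the finisher step.
    Collector<String, ArrayList<String>, ArrayList<String>> toArrayList =
            Collector.of(
                    ArrayList::new,              // supplier: fresh accumulator
                    ArrayList::add,              // accumulator: fold one element in
                    (left, right) -> {           // combiner: merge partials (parallel)
                        left.addAll(right);
                        return left;
                    },
                    Collector.Characteristics.IDENTITY_FINISH
            );

    List<String> result = Stream.of("a", "b", "c").collect(toArrayList);
    System.out.println(result); // [a, b, c]
}
```

In practice you would just call Collectors.toList(); the point is that every built-in collector is assembled from exactly these ingredients.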


2. Collectors vs Collector

Two related but distinct concepts:

  • Collector (interface)
    • Describes what a mutable reduction looks like in terms of supplier, accumulator, combiner, finisher, characteristics.
  • Collectors (utility class)
    • Provides static factory methods that create Collector instances: toList(), toMap(...), groupingBy(...), mapping(...), teeing(...), etc.

As an engineer, you almost always use the factory methods on Collectors, and only occasionally need to implement a custom Collector directly.
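The relationship fits in one line (trivial data, for illustration only): the plural Collectors is the factory, the singular Collector is the type it hands back.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

void main() {
    // Collectors (factory class) produces a Collector (interface) instance.
    Collector<String, ?, List<String>> recipe = Collectors.toList();

    List<String> result = Stream.of("x", "y").collect(recipe);
    System.out.println(result); // [x, y]
}
```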


3. Collectors.toMap – building maps with unique keys

Collectors.toMap builds a Map by turning each stream element into exactly one key–value pair. It is appropriate when you conceptually want one aggregate value per key.

3.1 Overloads and semantics

Key overloads:

  • toMap(keyMapper, valueMapper)
    • Requires keys to be unique; on duplicates, throws IllegalStateException.
  • toMap(keyMapper, valueMapper, mergeFunction)
    • Uses mergeFunction to decide what to do with duplicate keys (e.g. pick first, pick max, sum).
  • toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)
    • Also allows specifying the Map implementation (e.g. LinkedHashMap, TreeMap).

The explicit mergeFunction parameter is a deliberate design: the JDK authors wanted to prevent silent data loss, forcing you to define your collision semantics.
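A small sketch of the collision behavior, with invented words keyed by their first letter — the two-argument overload throws on duplicates, the three-argument overload resolves them:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

void main() {
    List<String> words = List.of("apple", "avocado", "banana");

    // Two-argument overload: "apple" and "avocado" map to the same key 'a',
    // so this throws IllegalStateException.
    try {
        words.stream().collect(Collectors.toMap(w -> w.charAt(0), Function.identity()));
    } catch (IllegalStateException e) {
        System.out.println("Duplicate key: " + e.getMessage());
    }

    // Three-argument overload: the merge function resolves the collision
    // (here: keep the longer word).
    Map<Character, String> longestByInitial = words.stream()
            .collect(Collectors.toMap(
                    w -> w.charAt(0),
                    Function.identity(),
                    (w1, w2) -> w1.length() >= w2.length() ? w1 : w2));

    System.out.println(longestByInitial);
}
```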

3.2 Example

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public record City(String name, String country, int population) {}

void main() {
    List<City> cities = List.of(
            new City("Paris", "France", 2_140_000),
            new City("Nice", "France", 340_000),
            new City("Berlin", "Germany", 3_600_000),
            new City("Hamburg", "Germany", 1_800_000)
    );

    // Country -> largest city by population, preserve insertion order
    Map<String, City> largestCityByCountry = cities.stream()
            .collect(Collectors.toMap(
                    City::country,
                    city -> city,
                    (c1, c2) -> c1.population() >= c2.population() ? c1 : c2,
                    LinkedHashMap::new
            ));

    System.out.println(largestCityByCountry);
}

Rationale:

  • We express domain logic (“keep the most populous city per country”) with a merge function instead of an extra grouping pass.
  • LinkedHashMap documents that iteration order matters (e.g. for responses or serialization) and keeps output deterministic.

4. Collectors.groupingBy – grouping and aggregating

Collectors.groupingBy is the collector analogue of SQL GROUP BY: it classifies elements into buckets and aggregates each bucket with a downstream collector. You use it when keys are not unique and you want collections or metrics per key.

4.1 Overloads and default shapes

Representative overloads:

  • groupingBy(classifier)
    • Map<K, List<T>>, using toList downstream.
  • groupingBy(classifier, downstream)
    • Map<K, D> where D is the downstream result (sum, count, set, custom type).
  • groupingBy(classifier, mapFactory, downstream)
    • Adds control over the map implementation.

This design splits the problem into classification (classifier) and aggregation (downstream), which makes collectors highly composable.
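The three-argument form can be sketched like this — assuming, for illustration, that you want keys iterated in sorted order via a TreeMap (invented data):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

void main() {
    List<String> words = List.of("pear", "plum", "apple", "kiwi", "apricot");

    // Initial letter -> word count, with keys in sorted order.
    Map<Character, Long> countByInitial = words.stream()
            .collect(Collectors.groupingBy(
                    w -> w.charAt(0),
                    TreeMap::new,
                    Collectors.counting()));

    System.out.println(countByInitial); // {a=2, k=1, p=2}
}
```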

4.2 Example

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public record Order(String city, String status, double amount) {}

void main() {
    List<Order> orders = List.of(
            new Order("Auckland", "NEW", 100),
            new Order("Auckland", "NEW", 200),
            new Order("Auckland", "SHIPPED", 150),
            new Order("Wellington", "NEW", 300)
    );

    // City -> list of orders
    Map<String, List<Order>> ordersByCity = orders.stream()
            .collect(Collectors.groupingBy(Order::city));

    // City -> total amount
    Map<String, Double> totalByCity = orders.stream()
            .collect(Collectors.groupingBy(
                    Order::city,
                    Collectors.summingDouble(Order::amount)
            ));

    // Status -> number of orders
    Map<String, Long> countByStatus = orders.stream()
            .collect(Collectors.groupingBy(
                    Order::status,
                    Collectors.counting()
            ));

    System.out.println("Orders by city: " + ordersByCity);
    System.out.println("Total by city: " + totalByCity);
    System.out.println("Count by status: " + countByStatus);
}

Rationale:

  • We avoid explicit Map mutation and nested conditionals; aggregation logic is declarative and parallel‑safe by construction.
  • Downstream collectors like summingDouble and counting can be reused for other groupings.

5. Composing collectors – mapping, filtering, flatMapping, collectingAndThen

Collectors are designed to be nested, especially as downstreams of groupingBy or partitioningBy. This composability is what turns them into a mini DSL for aggregation.

5.1 mapping – transform before collecting

mapping(mapper, downstream) applies a mapping to each element, then forwards the result to a downstream collector. Use it when you don’t want to store the full original element in the group.

Example: department → distinct employee names.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public record Employee(String department, String name) {}

void main() {
    List<Employee> employees = List.of(
            new Employee("Engineering", "Alice"),
            new Employee("Engineering", "Alice"),
            new Employee("Engineering", "Bob"),
            new Employee("Sales", "Carol")
    );

    Map<String, Set<String>> namesByDept = employees.stream()
            .collect(Collectors.groupingBy(
                    Employee::department,
                    Collectors.mapping(Employee::name, Collectors.toSet())
            ));

    System.out.println(namesByDept);
}

Rationale:

  • We avoid storing full Employee objects when we only need names, reducing memory and making the intent explicit.

5.2 filtering – per-group filtering

filtering(predicate, downstream) (Java 9+) filters elements at the collector level. Unlike stream.filter, it keeps the outer grouping key even if the filtered collection becomes empty.

Example: city → list of large orders (≥ 150), but preserve all cities as keys.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public record Order(String city, double amount) {}

void main() {
    List<Order> orders = List.of(
            new Order("Auckland", 100),
            new Order("Auckland", 200),
            new Order("Wellington", 50),
            new Order("Wellington", 300)
    );

    Map<String, List<Order>> largeOrdersByCity = orders.stream()
            .collect(Collectors.groupingBy(
                    Order::city,
                    Collectors.filtering(
                            o -> o.amount() >= 150,
                            Collectors.toList()
                    )
            ));

    System.out.println(largeOrdersByCity);
}

Rationale:

  • This approach preserves the full key space (e.g. all cities), which can be important for UI or reporting, while still applying a per-group filter.
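The contrast with stream-level filter is easiest to see with a hypothetical city whose orders are all small: with Collectors.filtering the key survives with an empty list, with Stream.filter the key disappears entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public record Order(String city, double amount) {}

void main() {
    List<Order> orders = List.of(
            new Order("Auckland", 200),
            new Order("Dunedin", 50)    // Dunedin has no large orders at all
    );

    // Collector-level filtering: Dunedin stays as a key, mapped to [].
    Map<String, List<Order>> withFiltering = orders.stream()
            .collect(Collectors.groupingBy(
                    Order::city,
                    Collectors.filtering(o -> o.amount() >= 150, Collectors.toList())));

    // Stream-level filter: Dunedin never reaches the collector, so no key.
    Map<String, List<Order>> withFilter = orders.stream()
            .filter(o -> o.amount() >= 150)
            .collect(Collectors.groupingBy(Order::city));

    System.out.println(withFiltering); // Dunedin present with empty list
    System.out.println(withFilter);    // Dunedin absent
}
```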

5.3 flatMapping – flatten nested collections

flatMapping(mapperToStream, downstream) (Java 9+) flattens nested collections or streams before collecting.

Example: department → set of all courses taught there.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public record Staff(String department, List<String> courses) {}

void main() {
    List<Staff> staff = List.of(
            new Staff("CS", List.of("Algorithms", "DS")),
            new Staff("CS", List.of("Computer Architecture")),
            new Staff("Math", List.of("Discrete Maths", "Probability"))
    );

    Map<String, Set<String>> coursesByDept = staff.stream()
            .collect(Collectors.groupingBy(
                    Staff::department,
                    Collectors.flatMapping(
                            s -> s.courses().stream(),
                            Collectors.toSet()
                    )
            ));

    System.out.println(coursesByDept);
}

Rationale:

  • Without flatMapping, you’d get Set<Set<String>> or need an extra pass to flatten; this keeps it one-pass and semantically clear.

5.4 collectingAndThen – post-process a collected result

collectingAndThen(downstream, finisher) applies a finisher function to the result of the downstream collector.

Example: collect to an unmodifiable list.

import java.util.List;
import java.util.stream.Collectors;

void main() {
    List<String> names = List.of("Alice", "Bob", "Carol");

    List<String> unmodifiableNames = names.stream()
            .collect(Collectors.collectingAndThen(
                    Collectors.toList(),
                    List::copyOf
            ));

    System.out.println(unmodifiableNames);
}

Rationale:

  • It encapsulates the “collect then wrap” pattern into a single collector, improving readability and signaling immutability explicitly.
  • For this specific case, Collectors.toUnmodifiableList() (Java 10+) does the same thing directly; collectingAndThen earns its keep with arbitrary finishers (wrapping in a DTO, computing a derived value, etc.).

5.5 Nested composition example

Now combine several of these ideas:

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public record Employee(String department, String city, String name, int age) {}

void main() {
    List<Employee> employees = List.of(
            new Employee("Engineering", "Auckland", "Alice", 30),
            new Employee("Engineering", "Auckland", "Bob", 26),
            new Employee("Engineering", "Wellington", "Carol", 35),
            new Employee("Sales", "Auckland", "Dave", 40)
    );

    // Department -> City -> unmodifiable set of names for employees age >= 30
    Map<String, Map<String, Set<String>>> result = employees.stream()
            .collect(Collectors.groupingBy(
                    Employee::department,
                    Collectors.groupingBy(
                            Employee::city,
                            Collectors.collectingAndThen(
                                    Collectors.filtering(
                                            e -> e.age() >= 30,
                                            Collectors.mapping(Employee::name, Collectors.toSet())
                                    ),
                                    Set::copyOf
                            )
                    )
            ));

    System.out.println(result);
}

Rationale:

  • We express a fairly involved requirement in a single declarative pipeline and single pass, instead of multiple nested maps and loops.
  • Each collector in the composition captures a small, local concern (grouping, filtering, mapping, immutability).

6. Collectors.teeing – two collectors, one pass

Collectors.teeing (Java 12+) runs two collectors over the same stream in one pass and merges their results with a BiFunction.

Signature:

public static <T, R1, R2, R> Collector<T, ?, R>
teeing(Collector<? super T, ?, R1> downstream1,
       Collector<? super T, ?, R2> downstream2,
       java.util.function.BiFunction<? super R1, ? super R2, R> merger)

Use teeing when you want multiple aggregates (min and max, count and average, etc.) from the same data in one traversal.
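As a minimal warm-up (invented data): count and sum in one traversal, merged into a string.

```java
import java.util.List;
import java.util.stream.Collectors;

void main() {
    List<Integer> numbers = List.of(2, 4, 6);

    // Both downstream collectors see every element; the merger combines
    // their two results (a Long count and an Integer sum) at the end.
    String summary = numbers.stream()
            .collect(Collectors.teeing(
                    Collectors.counting(),
                    Collectors.summingInt(Integer::intValue),
                    (count, sum) -> count + " numbers, total " + sum));

    System.out.println(summary); // 3 numbers, total 12
}
```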

6.1 Example: Stats in one pass

import java.util.List;
import java.util.stream.Collectors;

public record Stats(long count, int min, int max, double average) {}

void main() {
    List<Integer> numbers = List.of(5, 12, 19, 21);

    Stats stats = numbers.stream()
            .collect(Collectors.teeing(
                    Collectors.summarizingInt(Integer::intValue),
                    Collectors.teeing(
                            Collectors.minBy(Integer::compareTo),
                            Collectors.maxBy(Integer::compareTo),
                            (minOpt, maxOpt) -> new int[] {
                                    minOpt.orElseThrow(),
                                    maxOpt.orElseThrow()
                            }
                    ),
                    (summary, minMax) -> new Stats(
                            summary.getCount(),
                            minMax[0],
                            minMax[1],
                            summary.getAverage()
                    )
            ));

    System.out.println(stats);
}

Rationale:

  • We avoid traversing numbers multiple times or managing manual mutable state (counters, min/max variables).
  • We can reuse existing collectors (summarizingInt, minBy, maxBy) and compose them via teeing for a single-pass, parallelizable aggregation.
  • Note that IntSummaryStatistics already exposes getMin() and getMax(); the nested teeing is deliberate here, to demonstrate that teeing composes like any other collector.

7. When to choose which collector

For design decisions, the following mental model works well:

  • One value per key, with explicit handling of collisions
    • toMap (with mergeFunction and mapSupplier as needed)
  • Many values per key (lists, sets, or metrics)
    • groupingBy + downstream (toList, counting, etc.)
  • Per-group transformation, filtering, or flattening
    • groupingBy with mapping, filtering, flatMapping
  • Post-processing of a collected result
    • collectingAndThen(downstream, finisher)
  • Two independent aggregates in one traversal
    • teeing(collector1, collector2, merger)

Viewed as a whole, collectors form a high-level, composable DSL for aggregation, while the Stream interface stays relatively small and general. Treating collectors as “aggregation policies” lets you reason about what result you want, while delegating how to accumulate, combine, and finish to the carefully designed mechanisms of the Collectors API.