Collectors are the strategies that tell a Stream how to turn a flow of elements into a concrete result such as a List, Map, number, or custom DTO. Conceptually, a collector answers the question: “Given a stream of T, how do I build a result R in a single reduction step?”
1. What is a Collector?
A Collector describes a mutable reduction: it accumulates stream elements into a container and optionally transforms that container into a final result. This is the formal definition of the Collector interface:
public interface Collector<T, A, R> {
Supplier<A> supplier();
BiConsumer<A, T> accumulator();
BinaryOperator<A> combiner();
Function<A, R> finisher();
Set<Characteristics> characteristics();
}
Where:
T – input element type coming from the stream.
A – mutable accumulator type used during collection (e.g. ArrayList<T>, Map<K,V>, statistics object).
R – final result type (may be the same as A).
The functions have clear responsibilities:
supplier – creates a new accumulator instance A.
accumulator – folds each element T into the accumulator A.
combiner – merges two accumulators (essential for parallel streams).
finisher – converts A to R (often identity, sometimes a transformation like making the result unmodifiable).
characteristics – hints like CONCURRENT, UNORDERED, IDENTITY_FINISH that allow stream implementations to optimize.
The Collectors utility class provides dozens of ready‑made collectors so you rarely need to implement Collector yourself. You use them via the Stream.collect(...) terminal operation:
<R> R collect(Collector<? super T, ?, R> collector)
You can think of this as: collector = recipe, and collect(recipe) = “execute this aggregation recipe on the stream.”
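To see the five functions in action, here is a minimal hand-rolled equivalent of toList(), built with the Collector.of factory (a sketch for illustration; in real code you would simply use Collectors.toList()):
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;
void main() {
    // supplier: new empty list; accumulator: add one element;
    // combiner: merge two partial lists (used by parallel streams).
    // Omitting a finisher implies IDENTITY_FINISH, so A and R coincide.
    Collector<String, List<String>, List<String>> toListCollector =
        Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> { left.addAll(right); return left; }
        );
    List<String> letters = Stream.of("a", "b", "c").collect(toListCollector);
    System.out.println(letters); // [a, b, c]
}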
2. Collectors vs Collector
Two related but distinct concepts:
Collector (interface)
- Describes what a mutable reduction looks like in terms of supplier, accumulator, combiner, finisher, and characteristics.
Collectors (utility class)
- Provides static factory methods that create Collector instances: toList(), toMap(...), groupingBy(...), mapping(...), teeing(...), etc.
As an engineer, you almost always use the factory methods on Collectors, and only occasionally need to implement a custom Collector directly.
3. Collectors.toMap – building maps with unique keys
Collectors.toMap builds a Map by turning each stream element into exactly one key–value pair. It is appropriate when you conceptually want one aggregate value per key.
3.1 Overloads and semantics
Key overloads:
toMap(keyMapper, valueMapper)
- Requires keys to be unique; on duplicates, throws IllegalStateException.
toMap(keyMapper, valueMapper, mergeFunction)
- Uses mergeFunction to decide what to do with duplicate keys (e.g. pick first, pick max, sum).
toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)
- Also allows specifying the Map implementation (e.g. LinkedHashMap, TreeMap).
The explicit mergeFunction parameter is a deliberate design: the JDK authors wanted to prevent silent data loss, forcing you to define your collision semantics.
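A minimal sketch of that fail-fast behavior, using hypothetical data (grouping words by first letter):
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
void main() {
    List<String> words = List.of("apple", "avocado", "banana");
    try {
        // "apple" and "avocado" both produce the key 'a'
        Map<Character, String> byFirstLetter = words.stream()
            .collect(Collectors.toMap(w -> w.charAt(0), Function.identity()));
    } catch (IllegalStateException e) {
        System.out.println(e.getMessage()); // reports the duplicate key
    }
}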
3.2 Example
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public record City(String name, String country, int population) {}
void main() {
List<City> cities = List.of(
new City("Paris", "France", 2_140_000),
new City("Nice", "France", 340_000),
new City("Berlin", "Germany", 3_600_000),
new City("Hamburg", "Germany", 1_800_000)
);
// Country -> largest city by population, preserve insertion order
Map<String, City> largestCityByCountry = cities.stream()
.collect(Collectors.toMap(
City::country,
city -> city,
(c1, c2) -> c1.population() >= c2.population() ? c1 : c2,
LinkedHashMap::new
));
System.out.println(largestCityByCountry);
}
Rationale:
- We express domain logic (“keep the most populous city per country”) with a merge function instead of an extra grouping pass.
- LinkedHashMap documents that iteration order matters (e.g. for responses or serialization) and keeps output deterministic.
4. Collectors.groupingBy – grouping and aggregating
Collectors.groupingBy is the collector analogue of SQL GROUP BY: it classifies elements into buckets and aggregates each bucket with a downstream collector. You use it when keys are not unique and you want collections or metrics per key.
4.1 Overloads and default shapes
Representative overloads:
groupingBy(classifier)
- Produces Map<K, List<T>>; toList() is the implicit downstream.
groupingBy(classifier, downstream)
- Produces Map<K, D> where D is the downstream result (sum, count, set, custom type).
groupingBy(classifier, mapFactory, downstream)
- Adds control over the Map implementation.
This design splits the problem into classification (classifier) and aggregation (downstream), which makes collectors highly composable.
4.2 Example
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public record Order(String city, String status, double amount) {}
void main() {
List<Order> orders = List.of(
new Order("Auckland", "NEW", 100),
new Order("Auckland", "NEW", 200),
new Order("Auckland", "SHIPPED", 150),
new Order("Wellington", "NEW", 300)
);
// City -> list of orders
Map<String, List<Order>> ordersByCity = orders.stream()
.collect(Collectors.groupingBy(Order::city));
// City -> total amount
Map<String, Double> totalByCity = orders.stream()
.collect(Collectors.groupingBy(
Order::city,
Collectors.summingDouble(Order::amount)
));
// Status -> number of orders
Map<String, Long> countByStatus = orders.stream()
.collect(Collectors.groupingBy(
Order::status,
Collectors.counting()
));
System.out.println("Orders by city: " + ordersByCity);
System.out.println("Total by city: " + totalByCity);
System.out.println("Count by status: " + countByStatus);
}
Rationale:
- We avoid explicit Map mutation and nested conditionals; aggregation logic is declarative and parallel‑safe by construction.
- Downstream collectors like summingDouble and counting can be reused for other groupings.
5. Composing collectors – mapping, filtering, flatMapping, collectingAndThen
Collectors are designed to be nested, especially as downstreams of groupingBy or partitioningBy. This composability is what turns them into a mini DSL for aggregation.
5.1 mapping – transform before collecting
mapping(mapper, downstream) applies a mapping to each element, then forwards the result to a downstream collector. Use it when you don’t want to store the full original element in the group.
Example: department → distinct employee names.
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
public record Employee(String department, String name) {}
void main() {
List<Employee> employees = List.of(
new Employee("Engineering", "Alice"),
new Employee("Engineering", "Alice"),
new Employee("Engineering", "Bob"),
new Employee("Sales", "Carol")
);
Map<String, Set<String>> namesByDept = employees.stream()
.collect(Collectors.groupingBy(
Employee::department,
Collectors.mapping(Employee::name, Collectors.toSet())
));
System.out.println(namesByDept);
}
Rationale:
- We avoid storing full Employee objects when we only need names, reducing memory and making the intent explicit.
5.2 filtering – per-group filtering
filtering(predicate, downstream) (Java 9+) filters elements at the collector level. Unlike stream.filter, it keeps the outer grouping key even if the filtered collection becomes empty.
Example: city → list of large orders (≥ 150), but preserve all cities as keys.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public record Order(String city, double amount) {}
void main() {
List<Order> orders = List.of(
new Order("Auckland", 100),
new Order("Auckland", 200),
new Order("Wellington", 50),
new Order("Wellington", 300)
);
Map<String, List<Order>> largeOrdersByCity = orders.stream()
.collect(Collectors.groupingBy(
Order::city,
Collectors.filtering(
o -> o.amount() >= 150,
Collectors.toList()
)
));
System.out.println(largeOrdersByCity);
}
Rationale:
- This approach preserves the full key space: Wellington still appears as a key, mapped to an empty list, which can be important for UI or reporting, while still applying a per-group filter. With stream.filter before grouping, Wellington would vanish entirely, as the sketch below shows.
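For contrast, the same data filtered on the stream before grouping:
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public record Order(String city, double amount) {}
void main() {
    List<Order> orders = List.of(
        new Order("Auckland", 100),
        new Order("Auckland", 200),
        new Order("Wellington", 50),
        new Order("Wellington", 120)
    );
    // Wellington's orders are all removed before the classifier runs,
    // so the key disappears from the resulting map.
    Map<String, List<Order>> onlyLarge = orders.stream()
        .filter(o -> o.amount() >= 150)
        .collect(Collectors.groupingBy(Order::city));
    System.out.println(onlyLarge); // only Auckland remains as a key
}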
5.3 flatMapping – flatten nested collections
flatMapping(mapperToStream, downstream) (Java 9+) flattens nested collections or streams before collecting.
Example: department → set of all courses taught there.
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
public record Staff(String department, List<String> courses) {}
void main() {
List<Staff> staff = List.of(
new Staff("CS", List.of("Algorithms", "DS")),
new Staff("CS", List.of("Computer Architecture")),
new Staff("Math", List.of("Discrete Maths", "Probability"))
);
Map<String, Set<String>> coursesByDept = staff.stream()
.collect(Collectors.groupingBy(
Staff::department,
Collectors.flatMapping(
s -> s.courses().stream(),
Collectors.toSet()
)
));
System.out.println(coursesByDept);
}
Rationale:
- Without flatMapping, you’d collect nested collections (e.g. Set<List<String>> via mapping) or need an extra pass to flatten; this keeps it one pass and semantically clear.
5.4 collectingAndThen – post-process a collected result
collectingAndThen(downstream, finisher) applies a finisher function to the result of the downstream collector.
Example: collect to an unmodifiable list.
import java.util.List;
import java.util.stream.Collectors;
void main() {
List<String> names = List.of("Alice", "Bob", "Carol");
List<String> unmodifiableNames = names.stream()
.collect(Collectors.collectingAndThen(
Collectors.toList(),
List::copyOf
));
System.out.println(unmodifiableNames);
}
Rationale:
- It encapsulates the “collect then wrap” pattern into a single collector, improving readability and signaling immutability explicitly.
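Another common use, sketched below, is unwrapping the Optional produced by maxBy when it runs as a groupingBy downstream; each group is non-empty by construction, so orElseThrow is safe as a finisher:
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
public record Order(String city, double amount) {}
void main() {
    List<Order> orders = List.of(
        new Order("Auckland", 100),
        new Order("Auckland", 200),
        new Order("Wellington", 300)
    );
    // maxBy yields Optional<Order>; the finisher unwraps it per group.
    Map<String, Order> biggestOrderByCity = orders.stream()
        .collect(Collectors.groupingBy(
            Order::city,
            Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingDouble(Order::amount)),
                Optional::orElseThrow
            )
        ));
    System.out.println(biggestOrderByCity);
}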
5.5 Nested composition example
Now combine several of these ideas:
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
public record Employee(String department, String city, String name, int age) {}
void main() {
List<Employee> employees = List.of(
new Employee("Engineering", "Auckland", "Alice", 30),
new Employee("Engineering", "Auckland", "Bob", 26),
new Employee("Engineering", "Wellington", "Carol", 35),
new Employee("Sales", "Auckland", "Dave", 40)
);
// Department -> City -> unmodifiable set of names for employees age >= 30
Map<String, Map<String, Set<String>>> result = employees.stream()
.collect(Collectors.groupingBy(
Employee::department,
Collectors.groupingBy(
Employee::city,
Collectors.collectingAndThen(
Collectors.filtering(
e -> e.age() >= 30,
Collectors.mapping(Employee::name, Collectors.toSet())
),
Set::copyOf
)
)
));
System.out.println(result);
}
Rationale:
- We express a fairly involved requirement in a single declarative pipeline and single pass, instead of multiple nested maps and loops.
- Each collector in the composition captures a small, local concern (grouping, filtering, mapping, immutability).
6. Collectors.teeing – two collectors, one pass
Collectors.teeing (Java 12+) runs two collectors over the same stream in one pass and merges their results with a BiFunction.
Signature:
public static <T, R1, R2, R> Collector<T, ?, R>
teeing(Collector<? super T, ?, R1> downstream1,
Collector<? super T, ?, R2> downstream2,
java.util.function.BiFunction<? super R1, ? super R2, R> merger)
Use teeing when you want multiple aggregates (min and max, count and average, etc.) from the same data in one traversal.
6.1 Example: Stats in one pass
import java.util.List;
import java.util.stream.Collectors;
public record Stats(long count, int min, int max, double average) {}
void main() {
List<Integer> numbers = List.of(5, 12, 19, 21);
Stats stats = numbers.stream()
.collect(Collectors.teeing(
Collectors.summarizingInt(Integer::intValue),
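// Nested teeing here is for illustration: the IntSummaryStatistics from
// summarizingInt above already exposes getMin() and getMax() on its own.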
Collectors.teeing(
Collectors.minBy(Integer::compareTo),
Collectors.maxBy(Integer::compareTo),
(minOpt, maxOpt) -> new int[] {
minOpt.orElseThrow(),
maxOpt.orElseThrow()
}
),
(summary, minMax) -> new Stats(
summary.getCount(),
minMax[0],
minMax[1],
summary.getAverage()
)
));
System.out.println(stats);
}
Rationale:
- We avoid traversing numbers multiple times or managing manual mutable state (counters, min/max variables).
- We can reuse existing collectors (summarizingInt, minBy, maxBy) and compose them via teeing for a single-pass, parallelizable aggregation.
7. When to choose which collector
For design decisions, the following mental model works well:
| Scenario | Collector pattern |
| --- | --- |
| One value per key, need explicit handling of collisions | toMap (with mergeFunction & mapSupplier as needed) |
| Many values per key (lists, sets, or metrics) | groupingBy + downstream (toList, counting, etc.) |
| Need per-group transformation/filtering/flattening | groupingBy with mapping, filtering, flatMapping |
| Need post-processing of collected result | collectingAndThen(...) |
| Two independent aggregates, one traversal | teeing(collector1, collector2, merger) |
Viewed as a whole, collectors form a high-level, composable DSL for aggregation, while the Stream interface stays relatively small and general. Treating collectors as “aggregation policies” lets you reason about what result you want, while delegating how to accumulate, combine, and finish to the carefully designed mechanisms of the Collectors API.