{"id":2114,"date":"2026-02-07T02:25:58","date_gmt":"2026-02-06T13:25:58","guid":{"rendered":"https:\/\/www.ronella.xyz\/?p=2114"},"modified":"2026-02-07T02:25:58","modified_gmt":"2026-02-06T13:25:58","slug":"java-stream-collectors","status":"publish","type":"post","link":"https:\/\/www.ronella.xyz\/?p=2114","title":{"rendered":"Java Stream Collectors"},"content":{"rendered":"<p>Collectors are the <strong>strategies<\/strong> that tell a Stream how to turn a flow of elements into a concrete result such as a <code>List<\/code>, <code>Map<\/code>, number, or custom DTO. Conceptually, a collector answers the question: \u201cGiven a stream of <code>T<\/code>, how do I build a result <code>R<\/code> in a single reduction step?\u201d<\/p>\n<hr \/>\n<h2>1. What is a Collector?<\/h2>\n<p>A <code>Collector<\/code> is a <em>mutable reduction<\/em> that accumulates stream elements into a container and optionally transforms that container into a final result. Simplified (omitting its static <code>of(...)<\/code> factory methods and the nested <code>Characteristics<\/code> enum), the <code>Collector<\/code> interface looks like this:<\/p>\n<pre><code class=\"language-java\">public interface Collector&lt;T, A, R&gt; {\n    Supplier&lt;A&gt; supplier();\n    BiConsumer&lt;A, T&gt; accumulator();\n    BinaryOperator&lt;A&gt; combiner();\n    Function&lt;A, R&gt; finisher();\n    Set&lt;Characteristics&gt; characteristics();\n}<\/code><\/pre>\n<p>Where:<\/p>\n<ul>\n<li><code>T<\/code> \u2013 input element type coming from the stream.  <\/li>\n<li><code>A<\/code> \u2013 mutable accumulator type used during collection (e.g. <code>ArrayList&lt;T&gt;<\/code>, <code>Map&lt;K,V&gt;<\/code>, statistics object).  <\/li>\n<li><code>R<\/code> \u2013 final result type (may be the same as <code>A<\/code>).<\/li>\n<\/ul>\n<p>The functions have clear responsibilities:<\/p>\n<ul>\n<li><code>supplier<\/code> \u2013 creates a new accumulator instance <code>A<\/code>.  <\/li>\n<li><code>accumulator<\/code> \u2013 folds each element <code>T<\/code> into the accumulator <code>A<\/code>.  
<\/li>\n<li><code>combiner<\/code> \u2013 merges two accumulators (essential for parallel streams).  <\/li>\n<li><code>finisher<\/code> \u2013 converts <code>A<\/code> to <code>R<\/code> (often identity, sometimes a transformation like making the result unmodifiable).  <\/li>\n<li><code>characteristics<\/code> \u2013 hints like <code>CONCURRENT<\/code>, <code>UNORDERED<\/code>, <code>IDENTITY_FINISH<\/code> that allow stream implementations to optimize.<\/li>\n<\/ul>\n<p>The <code>Collectors<\/code> utility class provides dozens of ready\u2011made collectors so you rarely need to implement <code>Collector<\/code> yourself. You use them via the <code>Collector<\/code>-accepting overload of the <code>Stream.collect(...)<\/code> terminal operation:<\/p>\n<pre><code class=\"language-java\">&lt;R, A&gt; R collect(Collector&lt;? super T, A, R&gt; collector)<\/code><\/pre>\n<p>You can think of this as: <strong>collector = recipe<\/strong>, and <code>collect(recipe)<\/code> = \u201cexecute this aggregation recipe on the stream.\u201d<\/p>\n<hr \/>\n<h2>2. Collectors vs Collector<\/h2>\n<p>Two related but distinct concepts:<\/p>\n<ul>\n<li><code>Collector<\/code> (interface)\n<ul>\n<li>Describes <em>what<\/em> a mutable reduction looks like in terms of <code>supplier<\/code>, <code>accumulator<\/code>, <code>combiner<\/code>, <code>finisher<\/code>, <code>characteristics<\/code>.<\/li>\n<\/ul>\n<\/li>\n<li><code>Collectors<\/code> (utility class)\n<ul>\n<li>Provides static factory methods that create <code>Collector<\/code> instances: <code>toList()<\/code>, <code>toMap(...)<\/code>, <code>groupingBy(...)<\/code>, <code>mapping(...)<\/code>, <code>teeing(...)<\/code>, etc.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>As an engineer, you almost always use the <em>factory methods<\/em> on <code>Collectors<\/code>, and only occasionally need to implement a custom <code>Collector<\/code> directly.<\/p>\n<hr \/>\n<h2>3. 
Collectors.toMap \u2013 building maps with unique keys<\/h2>\n<p><code>Collectors.toMap<\/code> builds a <code>Map<\/code> by turning each stream element into exactly one key\u2013value pair. It is appropriate when you conceptually want <strong>one aggregate value per key<\/strong>.<\/p>\n<h3>3.1 Overloads and semantics<\/h3>\n<p>Key overloads:<\/p>\n<ul>\n<li><code>toMap(keyMapper, valueMapper)<\/code>\n<ul>\n<li>Requires keys to be unique; on duplicates, throws <code>IllegalStateException<\/code>.  <\/li>\n<\/ul>\n<\/li>\n<li><code>toMap(keyMapper, valueMapper, mergeFunction)<\/code>\n<ul>\n<li>Uses <code>mergeFunction<\/code> to decide what to do with duplicate keys (e.g. pick first, pick max, sum).  <\/li>\n<\/ul>\n<\/li>\n<li><code>toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)<\/code>\n<ul>\n<li>Also allows specifying the <code>Map<\/code> implementation (e.g. <code>LinkedHashMap<\/code>, <code>TreeMap<\/code>).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Because the two-argument form throws on a duplicate key instead of silently overwriting the earlier value, data is never lost by accident: whenever keys can collide, you must state the collision policy explicitly through <code>mergeFunction<\/code>.<\/p>\n<h3>3.2 Example<\/h3>\n<pre><code class=\"language-java\">import java.util.LinkedHashMap;\nimport java.util.List;\nimport java.util.Map;\nimport java.util.stream.Collectors;\n\npublic record City(String name, String country, int population) {}\n\nvoid main() {\n    List&lt;City&gt; cities = List.of(\n            new City(&quot;Paris&quot;, &quot;France&quot;, 2_140_000),\n            new City(&quot;Nice&quot;, &quot;France&quot;, 340_000),\n            new City(&quot;Berlin&quot;, &quot;Germany&quot;, 3_600_000),\n            new City(&quot;Hamburg&quot;, &quot;Germany&quot;, 1_800_000)\n    );\n\n    \/\/ Country -&gt; largest city by population, preserve insertion order\n    Map&lt;String, City&gt; largestCityByCountry = cities.stream()\n            .collect(Collectors.toMap(\n                    City::country,\n                
    city -&gt; city,\n                    (c1, c2) -&gt; c1.population() &gt;= c2.population() ? c1 : c2,\n                    LinkedHashMap::new\n            ));\n\n    System.out.println(largestCityByCountry);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>We express domain logic (\u201ckeep the most populous city per country\u201d) with a merge function instead of an extra grouping pass.<\/li>\n<li><code>LinkedHashMap<\/code> documents that iteration order matters (e.g. for responses or serialization) and keeps output deterministic.<\/li>\n<\/ul>\n<hr \/>\n<h2>4. Collectors.groupingBy \u2013 grouping and aggregating<\/h2>\n<p><code>Collectors.groupingBy<\/code> is the collector analogue of SQL <code>GROUP BY<\/code>: it classifies elements into buckets and aggregates each bucket with a downstream collector. You use it when keys are <strong>not unique<\/strong> and you want collections or metrics per key.<\/p>\n<h3>4.1 Overloads and default shapes<\/h3>\n<p>Representative overloads:<\/p>\n<ul>\n<li><code>groupingBy(classifier)<\/code>\n<ul>\n<li><code>Map&lt;K, List&lt;T&gt;&gt;<\/code>, using <code>toList<\/code> downstream.  <\/li>\n<\/ul>\n<\/li>\n<li><code>groupingBy(classifier, downstream)<\/code>\n<ul>\n<li><code>Map&lt;K, D&gt;<\/code> where <code>D<\/code> is the downstream result (sum, count, set, custom type).  
<\/li>\n<\/ul>\n<\/li>\n<li><code>groupingBy(classifier, mapFactory, downstream)<\/code>\n<ul>\n<li>Adds control over the map implementation.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>This design splits the problem into <strong>classification<\/strong> (<code>classifier<\/code>) and <strong>aggregation<\/strong> (downstream), which makes collectors highly composable.<\/p>\n<h3>4.2 Example<\/h3>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.Map;\nimport java.util.stream.Collectors;\n\npublic record Order(String city, String status, double amount) {}\n\nvoid main() {\n    List&lt;Order&gt; orders = List.of(\n            new Order(&quot;Auckland&quot;, &quot;NEW&quot;, 100),\n            new Order(&quot;Auckland&quot;, &quot;NEW&quot;, 200),\n            new Order(&quot;Auckland&quot;, &quot;SHIPPED&quot;, 150),\n            new Order(&quot;Wellington&quot;, &quot;NEW&quot;, 300)\n    );\n\n    \/\/ City -&gt; list of orders\n    Map&lt;String, List&lt;Order&gt;&gt; ordersByCity = orders.stream()\n            .collect(Collectors.groupingBy(Order::city));\n\n    \/\/ City -&gt; total amount\n    Map&lt;String, Double&gt; totalByCity = orders.stream()\n            .collect(Collectors.groupingBy(\n                    Order::city,\n                    Collectors.summingDouble(Order::amount)\n            ));\n\n    \/\/ Status -&gt; number of orders\n    Map&lt;String, Long&gt; countByStatus = orders.stream()\n            .collect(Collectors.groupingBy(\n                    Order::status,\n                    Collectors.counting()\n            ));\n\n    System.out.println(&quot;Orders by city: &quot; + ordersByCity);\n    System.out.println(&quot;Total by city: &quot; + totalByCity);\n    System.out.println(&quot;Count by status: &quot; + countByStatus);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>We avoid explicit <code>Map<\/code> mutation and nested conditionals; aggregation logic is declarative and parallel\u2011safe by 
construction.  <\/li>\n<li>Downstream collectors like <code>summingDouble<\/code> and <code>counting<\/code> can be reused for other groupings.<\/li>\n<\/ul>\n<hr \/>\n<h2>5. Composing collectors \u2013 mapping, filtering, flatMapping, collectingAndThen<\/h2>\n<p>Collectors are designed to be nested, especially as downstreams of <code>groupingBy<\/code> or <code>partitioningBy<\/code>. This composability is what turns them into a mini DSL for aggregation.<\/p>\n<h3>5.1 mapping \u2013 transform before collecting<\/h3>\n<p><code>mapping(mapper, downstream)<\/code> applies a mapping to each element, then forwards the result to a downstream collector. Use it when you don\u2019t want to store the full original element in the group. <\/p>\n<p>Example: department \u2192 distinct employee names.<\/p>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.Map;\nimport java.util.Set;\nimport java.util.stream.Collectors;\n\npublic record Employee(String department, String name) {}\n\nvoid main() {\n    List&lt;Employee&gt; employees = List.of(\n            new Employee(&quot;Engineering&quot;, &quot;Alice&quot;),\n            new Employee(&quot;Engineering&quot;, &quot;Alice&quot;),\n            new Employee(&quot;Engineering&quot;, &quot;Bob&quot;),\n            new Employee(&quot;Sales&quot;, &quot;Carol&quot;)\n    );\n\n    Map&lt;String, Set&lt;String&gt;&gt; namesByDept = employees.stream()\n            .collect(Collectors.groupingBy(\n                    Employee::department,\n                    Collectors.mapping(Employee::name, Collectors.toSet())\n            ));\n\n    System.out.println(namesByDept);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>We avoid storing full <code>Employee<\/code> objects when we only need names, reducing memory and making the intent explicit.<\/li>\n<\/ul>\n<h3>5.2 filtering \u2013 per-group filtering<\/h3>\n<p><code>filtering(predicate, downstream)<\/code> (Java 9+) filters elements at the 
collector level. Unlike a <code>stream.filter(...)<\/code> applied before grouping, it keeps every grouping key, even when none of the elements in a group pass the predicate; such a key simply maps to an empty collection.<\/p>\n<p>Example: city \u2192 list of large orders (\u2265 150), but preserve all cities as keys.<\/p>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.Map;\nimport java.util.stream.Collectors;\n\npublic record Order(String city, double amount) {}\n\nvoid main() {\n    List&lt;Order&gt; orders = List.of(\n            new Order(&quot;Auckland&quot;, 100),\n            new Order(&quot;Auckland&quot;, 200),\n            new Order(&quot;Wellington&quot;, 50),\n            new Order(&quot;Wellington&quot;, 300),\n            new Order(&quot;Christchurch&quot;, 80)\n    );\n\n    \/\/ Christchurch has no order &gt;= 150, yet it still appears (with an empty list)\n    Map&lt;String, List&lt;Order&gt;&gt; largeOrdersByCity = orders.stream()\n            .collect(Collectors.groupingBy(\n                    Order::city,\n                    Collectors.filtering(\n                            o -&gt; o.amount() &gt;= 150,\n                            Collectors.toList()\n                    )\n            ));\n\n    System.out.println(largeOrdersByCity);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>This approach preserves the full key space (e.g. 
all cities), which can be important for UI or reporting, while still applying a per-group filter.<\/li>\n<\/ul>\n<h3>5.3 flatMapping \u2013 flatten nested collections<\/h3>\n<p><code>flatMapping(mapperToStream, downstream)<\/code> (Java 9+) flattens nested collections or streams before collecting.<\/p>\n<p>Example: department \u2192 set of all courses taught there.<\/p>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.Map;\nimport java.util.Set;\nimport java.util.stream.Collectors;\n\npublic record Staff(String department, List&lt;String&gt; courses) {}\n\nvoid main() {\n    List&lt;Staff&gt; staff = List.of(\n            new Staff(&quot;CS&quot;, List.of(&quot;Algorithms&quot;, &quot;DS&quot;)),\n            new Staff(&quot;CS&quot;, List.of(&quot;Computer Architecture&quot;)),\n            new Staff(&quot;Math&quot;, List.of(&quot;Discrete Maths&quot;, &quot;Probability&quot;))\n    );\n\n    Map&lt;String, Set&lt;String&gt;&gt; coursesByDept = staff.stream()\n            .collect(Collectors.groupingBy(\n                    Staff::department,\n                    Collectors.flatMapping(\n                            s -&gt; s.courses().stream(),\n                            Collectors.toSet()\n                    )\n            ));\n\n    System.out.println(coursesByDept);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>With a plain <code>mapping(Staff::courses, toSet())<\/code>, you\u2019d get a <code>Set&lt;List&lt;String&gt;&gt;<\/code> per department, or need an extra pass to flatten it; <code>flatMapping<\/code> keeps it to one pass and states the intent directly.<\/li>\n<\/ul>\n<h3>5.4 collectingAndThen \u2013 post-process a collected result<\/h3>\n<p><code>collectingAndThen(downstream, finisher)<\/code> applies a finisher function to the result of the downstream collector.<\/p>\n<p>Example: collect to an unmodifiable list.<\/p>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.stream.Collectors;\n\nvoid main() {\n    List&lt;String&gt; names = 
List.of(&quot;Alice&quot;, &quot;Bob&quot;, &quot;Carol&quot;);\n\n    List&lt;String&gt; unmodifiableNames = names.stream()\n            .collect(Collectors.collectingAndThen(\n                    Collectors.toList(),\n                    List::copyOf\n            ));\n\n    System.out.println(unmodifiableNames);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>It encapsulates the \u201ccollect then wrap\u201d pattern into a single collector, improving readability and signaling immutability explicitly.<\/li>\n<li>For this exact case, <code>Collectors.toUnmodifiableList()<\/code> (Java 10+) is a shorter equivalent; <code>collectingAndThen<\/code> pays off when the finishing step is something no built-in collector provides.<\/li>\n<\/ul>\n<h3>5.5 Nested composition example<\/h3>\n<p>Now combine several of these ideas:<\/p>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.Map;\nimport java.util.Set;\nimport java.util.stream.Collectors;\n\npublic record Employee(String department, String city, String name, int age) {}\n\nvoid main() {\n    List&lt;Employee&gt; employees = List.of(\n            new Employee(&quot;Engineering&quot;, &quot;Auckland&quot;, &quot;Alice&quot;, 30),\n            new Employee(&quot;Engineering&quot;, &quot;Auckland&quot;, &quot;Bob&quot;, 26),\n            new Employee(&quot;Engineering&quot;, &quot;Wellington&quot;, &quot;Carol&quot;, 35),\n            new Employee(&quot;Sales&quot;, &quot;Auckland&quot;, &quot;Dave&quot;, 40)\n    );\n\n    \/\/ Department -&gt; City -&gt; unmodifiable set of names for employees age &gt;= 30\n    Map&lt;String, Map&lt;String, Set&lt;String&gt;&gt;&gt; result = employees.stream()\n            .collect(Collectors.groupingBy(\n                    Employee::department,\n                    Collectors.groupingBy(\n                            Employee::city,\n                            Collectors.collectingAndThen(\n                                    Collectors.filtering(\n                                            e -&gt; e.age() &gt;= 30,\n                                            Collectors.mapping(Employee::name, Collectors.toSet())\n                                    ),\n    
                                Set::copyOf\n                            )\n                    )\n            ));\n\n    System.out.println(result);\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>We express a fairly involved requirement in a single declarative pipeline and single pass, instead of multiple nested maps and loops.<\/li>\n<li>Each collector in the composition captures a small, local concern (grouping, filtering, mapping, immutability).<\/li>\n<\/ul>\n<hr \/>\n<h2>6. Collectors.teeing \u2013 two collectors, one pass<\/h2>\n<p><code>Collectors.teeing<\/code> (Java 12+) runs two collectors over the same stream in one pass and merges their results with a <code>BiFunction<\/code>.<\/p>\n<p>Signature:<\/p>\n<pre><code class=\"language-java\">public static &lt;T, R1, R2, R&gt; Collector&lt;T, ?, R&gt;\nteeing(Collector&lt;? super T, ?, R1&gt; downstream1,\n       Collector&lt;? super T, ?, R2&gt; downstream2,\n       java.util.function.BiFunction&lt;? super R1, ? super R2, R&gt; merger)<\/code><\/pre>\n<p>Use <code>teeing<\/code> when you want multiple aggregates (min and max, count and average, etc.) 
from the same data in one traversal.<\/p>\n<h3>6.1 Example: min and max in one pass<\/h3>\n<pre><code class=\"language-java\">import java.util.List;\nimport java.util.stream.Collectors;\n\npublic record Range(int min, int max) {}\n\nvoid main() {\n    List&lt;Integer&gt; numbers = List.of(5, 12, 19, 21);\n\n    \/\/ minBy and maxBy both see every element during a single traversal;\n    \/\/ the merger turns their two Optional results into one Range\n    Range range = numbers.stream()\n            .collect(Collectors.teeing(\n                    Collectors.minBy(Integer::compareTo),\n                    Collectors.maxBy(Integer::compareTo),\n                    (min, max) -&gt; new Range(min.orElseThrow(), max.orElseThrow())\n            ));\n\n    System.out.println(range); \/\/ Range[min=5, max=21]\n}<\/code><\/pre>\n<p><strong>Rationale:<\/strong>  <\/p>\n<ul>\n<li>We avoid traversing <code>numbers<\/code> twice or managing manual mutable state (running min\/max variables).<\/li>\n<li>We reuse existing collectors (<code>minBy<\/code>, <code>maxBy<\/code>) and compose them via <code>teeing<\/code> into a single-pass aggregation that also works on parallel streams.<\/li>\n<li>For plain numeric statistics, <code>summarizingInt<\/code> already returns count, min, max, and average in one pass; <code>teeing<\/code> pays off when you need to combine collectors that no single built-in covers.<\/li>\n<\/ul>\n<hr \/>\n<h2>7. 
When to choose which collector<\/h2>\n<p>For design decisions, the following mental model works well:<\/p>\n<table>\n<thead>\n<tr>\n<th>Scenario<\/th>\n<th>Collector pattern<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>One value per key, need explicit handling of collisions<\/td>\n<td><code>toMap<\/code> (with merge &amp; mapSupplier as needed)<\/td>\n<\/tr>\n<tr>\n<td>Many values per key (lists, sets, or metrics)<\/td>\n<td><code>groupingBy<\/code> + downstream (<code>toList<\/code>, <code>counting<\/code>, etc.)<\/td>\n<\/tr>\n<tr>\n<td>Need per-group transformation\/filtering\/flattening<\/td>\n<td><code>groupingBy<\/code> with <code>mapping<\/code>, <code>filtering<\/code>, <code>flatMapping<\/code><\/td>\n<\/tr>\n<tr>\n<td>Need post-processing of collected result<\/td>\n<td><code>collectingAndThen(...)<\/code><\/td>\n<\/tr>\n<tr>\n<td>Two independent aggregates, one traversal<\/td>\n<td><code>teeing(collector1, collector2, merger)<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Viewed as a whole, <strong>collectors<\/strong> form a high-level, composable DSL for aggregation, while the <code>Stream<\/code> interface stays relatively small and general. Treating collectors as \u201caggregation policies\u201d lets you reason about <em>what<\/em> result you want, while delegating <em>how<\/em> to accumulate, combine, and finish to the carefully designed mechanisms of the Collectors API.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Collectors are the strategies that tell a Stream how to turn a flow of elements into a concrete result such as a List, Map, number, or custom DTO. Conceptually, a collector answers the question: \u201cGiven a stream of T, how do I build a result R in a single reduction step?\u201d 1. 
What is a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[17],"tags":[],"_links":{"self":[{"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/posts\/2114"}],"collection":[{"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2114"}],"version-history":[{"count":1,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/posts\/2114\/revisions"}],"predecessor-version":[{"id":2115,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=\/wp\/v2\/posts\/2114\/revisions\/2115"}],"wp:attachment":[{"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ronella.xyz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}