To parallelize a stream in Java, call parallelStream() on a collection or convert an existing sequential stream with the Stream.parallel() method. A parallel stream splits its data and processes the parts on multiple threads, which can improve performance on multicore processors.
Here are a detailed explanation and some examples:
1. Using parallelStream()
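You can call the parallelStream() method on any Collection (such as a List or Set). It returns a stream that runs in parallel where possible; the API documentation describes it as a "possibly parallel" stream, so an implementation is allowed to return a sequential one.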
Example:
package org.kodejava.util.stream;

import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Process the stream in parallel
        numbers.parallelStream()
                .map(number -> number * 2)      // Multiply each number by 2
                .forEach(System.out::println);  // Print each element
    }
}
2. Using the parallel() Method
If you already have a sequential stream, you can convert it into a parallel stream using the Stream.parallel() method.
Example:
package org.kodejava.util.stream;

import java.util.stream.IntStream;

public class Main {
    public static void main(String[] args) {
        // Sequential stream
        IntStream.range(1, 11)
                .parallel()                     // Convert to parallel stream
                .map(i -> i * i)                // Square each number
                .forEach(System.out::println);  // Print squared numbers
    }
}
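To check what parallel() actually does to a pipeline, the following minimal sketch inspects the stream with isParallel(). The class and method names (ParallelFlagSketch, modeAfterParallel, modeAfterParallelThenSequential) are made up for this illustration. It also shows that parallel() and sequential() set a single mode for the whole pipeline, so the last call before the terminal operation decides how it runs.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class ParallelFlagSketch {

    // A stream created with stream() is sequential until parallel() is called.
    static boolean modeAfterParallel() {
        Stream<Integer> stream = List.of(1, 2, 3).stream();
        return stream.parallel().isParallel();
    }

    // The mode applies to the whole pipeline, not to single stages:
    // the last of parallel()/sequential() wins.
    static boolean modeAfterParallelThenSequential() {
        return List.of(1, 2, 3).stream()
                .parallel()
                .map(n -> n * 2)
                .sequential()
                .isParallel();
    }

    public static void main(String[] args) {
        System.out.println(List.of(1, 2, 3).stream().isParallel()); // prints false
        System.out.println(modeAfterParallel());                    // prints true
        System.out.println(modeAfterParallelThenSequential());      // prints false
    }
}
```

In other words, it is not possible to run one stage of a pipeline in parallel and the next one sequentially by mixing the two calls.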
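3. Using a Custom ForkJoinPool
By default, parallel streams run their tasks on the common ForkJoinPool, whose default parallelism is typically one less than the number of available processors. If you want to control how many threads a pipeline uses (e.g., to prevent overloading the system), you can start the terminal operation from a task submitted to your own ForkJoinPool. Note that this relies on how the current stream implementation behaves rather than on a documented guarantee, so verify it on the Java version you target.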
Example:
package org.kodejava.util.stream;

import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class Main {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        ForkJoinPool customThreadPool = new ForkJoinPool(4); // Limit to 4 threads

        try {
            // The terminal operation runs inside a task of the custom pool
            customThreadPool.submit(() ->
                    numbers.parallelStream()
                            .map(number -> number * 2)
                            .forEach(System.out::println)
            ).join(); // Wait for the pipeline to finish
        } finally {
            customThreadPool.shutdown(); // Release the pool's threads even if the task fails
        }
    }
}
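Before sizing a custom pool, it can help to see what the common pool offers on the current machine. The sketch below is a small illustration (the class name CommonPoolInfo and its helper method are invented here) that prints the processor count next to the common pool's parallelism using ForkJoinPool.getCommonPoolParallelism().

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical class name, used only for this illustration.
class CommonPoolInfo {

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors:    " + processors);
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

The common pool's parallelism can also be set for the whole JVM with the system property java.util.concurrent.ForkJoinPool.common.parallelism, for example by passing -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 at startup. That setting affects every user of the common pool, not just one pipeline.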
Key Points About Parallel Streams
- Performance Consideration:
- Parallel streams divide their workload into smaller chunks and process them concurrently. Thus, they’re best suited for CPU-intensive operations or for working with large datasets.
- For smaller datasets, the overhead of parallelism might actually degrade performance compared to a sequential stream.
- Thread-Safety:
- Ensure your pipeline operations are thread-safe. For instance, avoid shared mutable state in stream operations as it can lead to race conditions.
- Order and Results:
- Parallel streams might not maintain the processing order unless explicitly required. If you want to maintain order, consider using operations like forEachOrdered() instead of forEach().
Example with forEachOrdered():
    numbers.parallelStream()
            .map(number -> number * 2)
            .forEachOrdered(System.out::println); // Maintain order
- Parallelization is Not Always Optimal:
- Parallel streams pay off when each element is expensive to process or the dataset is large, and when the source splits cheaply. Arrays, ArrayList, and ranges such as IntStream.range() split well; LinkedList and Stream.iterate() split poorly.
- For small datasets or lightweight operations, the cost of splitting the work and coordinating threads can outweigh the performance benefits.
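The thread-safety point above is easiest to see with result accumulation. A common mistake is to call add() on a shared ArrayList from forEach() in a parallel stream, which can lose elements or throw an exception. The minimal sketch below (class and method names are made up for this example) avoids shared mutable state by letting collect() build the result instead.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class SafeAccumulation {

    // Avoid this pattern in a parallel stream:
    //     List<Integer> result = new ArrayList<>();
    //     stream.parallel().forEach(result::add); // data race on the shared list
    //
    // Instead, let the stream assemble the result. collect() gives each
    // worker its own partial container and merges them at the end.
    static List<Integer> doubledEvens(int upperExclusive) {
        return IntStream.range(0, upperExclusive)
                .parallel()
                .filter(i -> i % 2 == 0)   // Keep even numbers
                .map(i -> i * 2)           // Double them
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubledEvens(10)); // prints [0, 4, 8, 12, 16]
    }
}
```

Because the source is ordered, the collected list keeps the encounter order even though the elements are processed in parallel.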
Summary
- Use parallelStream() or Stream.parallel() to parallelize your stream.
- Optimize the operations in the stream pipeline to take full advantage of parallel processing.
- Be cautious with thread-safety and order requirements.
- Profile and test your application to confirm that parallel streams provide a tangible performance boost in your specific use case.
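As a starting point for the last item, here is a rough timing sketch that runs the same computation sequentially and in parallel. The class name, method names, and workload are assumptions chosen for illustration. Timings taken with System.nanoTime() are only indicative because of JIT warm-up and other noise, so a benchmarking harness such as JMH is a better fit for decisions that matter.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical class name, used only for this illustration.
class StreamTimingSketch {

    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n).map(i -> i * i).sum();
    }

    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(i -> i * i).sum();
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(LongSupplier task) {
        long start = System.nanoTime();
        task.getAsLong();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // Small enough that the sum fits in a long

        // Warm up both variants so the JIT compiler has a chance to optimize them
        for (int i = 0; i < 5; i++) {
            sumOfSquaresSequential(n);
            sumOfSquaresParallel(n);
        }

        System.out.println("Sequential: " + timeMillis(() -> sumOfSquaresSequential(n)) + " ms");
        System.out.println("Parallel:   " + timeMillis(() -> sumOfSquaresParallel(n)) + " ms");
    }
}
```

On a workload this light the parallel version may well be no faster, which is exactly the kind of result worth measuring before committing to a parallel stream.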
