Generators in Python: Fundamentals for Data Scientists
Understand the basics with a concrete example!
Generators are special functions that return a lazy iterator, which we can iterate over to handle one unit of data at a time. Because lazy iterators do not hold the whole dataset in memory, they are commonly used to work with data streams and large datasets.
Generators in Python are very similar to normal functions, with a few characteristic differences:
- Generator functions use a yield expression instead of the return statement used in normal functions.
- Both yield and return hand a value back to the caller. While return ends the function completely, yield suspends the function and keeps all of its state in memory for later use.
- When a generator function yields, it is not terminated. Instead, it pauses and hands control back to the caller.
- Once fully iterated, a generator terminates and raises a StopIteration exception.
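The behaviour described in the list above can be sketched with a small example. The countdown generator and the log list are hypothetical names used only for illustration; the log simply records when the function body actually runs.

```python
log = []  # hypothetical helper: records what the generator body does


def countdown(start):
    """Hypothetical generator that yields start, start - 1, ..., 1."""
    log.append("started")
    while start > 0:
        yield start       # pause here and hand the value to the caller
        start -= 1        # execution resumes here on the next next() call
    log.append("finished")


gen = countdown(2)
log_after_creation = list(log)   # still empty: the body has not run yet

first = next(gen)                # runs the body up to the first yield
second = next(gen)               # resumes after the yield, with start intact

try:
    next(gen)                    # the body runs to its end
    exhausted = False
except StopIteration:
    exhausted = True             # a finished generator raises StopIteration

print(first, second, exhausted, log)
```

Note that calling countdown(2) only creates the generator object. The body starts running on the first next() call, and each later call picks up right after the previous yield.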
This post will introduce you to the basics of generators in Python.
Let’s write some Python 3 code that contains simple examples of generators:
# Simple Generator
def my_simple_generator():
    k = 0
    yield k
    k = 1
    yield k
    k = 2
    yield k

my_numbers = my_simple_generator()
print(next(my_numbers))
print(next(my_numbers))
print(next(my_numbers))
# Below call raises StopIteration as generator is fully iterated
# print(next(my_numbers))

# Defining Generators with Loop
def my_generator_with_loop(my_str):
    length = len(my_str)
    for k in range(length):
        yield my_str[k]

my_text = my_generator_with_loop("Coding")
for char in my_text:
    print(char)

# Defining Generators with Expressions
my_generator_expression = (number**2 for number in range(4))
print(sum(my_generator_expression))

# Defining Generator Pipeline
my_generator_01 = (number**2 for number in range(40))
my_generator_02 = (number-5 for number in my_generator_01)
print(sum(my_generator_02))
Defining Generator Functions
# Simple Generator
def my_simple_generator():
    k = 0
    yield k
    k = 1
    yield k
    k = 2
    yield k

my_numbers = my_simple_generator()
print(next(my_numbers))
print(next(my_numbers))
print(next(my_numbers))
# Below call raises StopIteration as generator is fully iterated
# print(next(my_numbers))

# Output
0
1
2
Generator functions are defined like normal functions. The only difference in the code structure is that a yield expression is used instead of return.
We have three yield expressions in our example above. This means that we can advance the generator at most 3 times. If we call next() a fourth time, the generator raises a StopIteration exception.
Once we define the generator function, we can call it and assign the result to a variable. After the assignment, the my_numbers variable points to the generator object. Note that calling the function does not run its body or iterate the generator. To advance the generator manually, we can use next(my_numbers).
At every iteration step, Python remembers the value of k and the rest of the inner state of my_simple_generator.
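One way to observe this remembered state is with the standard library's inspect module. The sketch below reuses my_simple_generator from above; reading the local variable through gi_frame is meant purely as an illustration, not as something to rely on in production code.

```python
import inspect


def my_simple_generator():
    k = 0
    yield k
    k = 1
    yield k
    k = 2
    yield k


my_numbers = my_simple_generator()
# Right after the assignment, the body has not started running.
state_after_assignment = inspect.getgeneratorstate(my_numbers)

next(my_numbers)
# The generator is now paused at the first yield, with k still alive.
state_after_first_next = inspect.getgeneratorstate(my_numbers)
k_while_paused = my_numbers.gi_frame.f_locals["k"]

remaining = list(my_numbers)  # consume the values that are left
state_after_exhaustion = inspect.getgeneratorstate(my_numbers)

print(state_after_assignment, state_after_first_next, state_after_exhaustion)
```

The three states printed correspond to a generator that was created but not started, one that is suspended at a yield, and one that has finished.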
Defining Generators with a Loop
To simplify the generator definition, we can both define and iterate generators with the help of for loops.
# Defining Generators with Loop
def my_generator_with_loop(my_str):
    length = len(my_str)
    for k in range(length):
        yield my_str[k]

my_text = my_generator_with_loop("Coding")
for char in my_text:
    print(char)

# Output
C
o
d
i
n
g
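As a rough sketch of what the for loop does for us, the version below drives the same generator by hand with next() and stops when StopIteration is raised. It also iterates over the string directly rather than by index, which is a small stylistic variation on the example above and not the original code; collected is a hypothetical name used to gather the results.

```python
def my_generator_with_loop(my_str):
    for char in my_str:      # iterate over the string directly, no index needed
        yield char


my_text = my_generator_with_loop("Coding")

collected = []
while True:
    try:
        char = next(my_text)     # ask the generator for its next value
    except StopIteration:        # raised once the generator is exhausted
        break
    collected.append(char)

print(collected)
```

A for loop hides the try/except bookkeeping, which is why it is usually the more convenient way to consume a generator.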
Defining Generators With Generator Expressions
We can further simplify the definition of generators with generator expressions. The syntax for creating a generator expression is very similar to the one used for list comprehensions. Generator expressions are written with round brackets instead of the square brackets used in list comprehensions.
Whereas a list comprehension builds and stores the whole list in memory, a generator expression produces each item only when it is requested.
# Defining Generators with Expressions
my_generator_expression = (number**2 for number in range(4))
print(sum(my_generator_expression))

# Output
14
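To get a feel for the memory difference, here is a minimal sketch that compares a list comprehension with the equivalent generator expression using sys.getsizeof. The variable names and the size n are assumptions chosen for illustration. Keep in mind that sys.getsizeof reports only the size of the container object itself, and the exact numbers vary between Python versions and platforms.

```python
import sys

n = 100_000  # illustrative size, not from the original post

squares_list = [number**2 for number in range(n)]   # all items built up front
squares_gen = (number**2 for number in range(n))    # items built on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# Both produce the same values...
list_total = sum(squares_list)
gen_total = sum(squares_gen)

# ...but the generator is now exhausted, so a second pass yields nothing.
second_pass_total = sum(squares_gen)
```

The last line shows the trade-off: a generator can be consumed only once, while a list can be iterated as many times as needed.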
Defining Generator Pipelines
To create complex pipelines, multiple generators can be stacked easily, as shown in the example below.
# Defining Generator Pipeline
my_generator_01 = (number**2 for number in range(40))
my_generator_02 = (number-5 for number in my_generator_01)
print(sum(my_generator_02))

# Output
20340
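The same stacking idea works with generator functions, which can be handy when each stage needs more than a one-line expression. The sketch below is a hypothetical pipeline that cleans text lines and converts them to numbers; the function names and the sample data are assumptions made for illustration. In practice, raw_lines could be an open file object, which is also read lazily line by line.

```python
def strip_lines(lines):
    """Stage 1: remove surrounding whitespace from each line."""
    for line in lines:
        yield line.strip()


def non_empty(records):
    """Stage 2: drop records that are empty after stripping."""
    for record in records:
        if record:
            yield record


def to_floats(records):
    """Stage 3: convert each remaining record to a float."""
    for record in records:
        yield float(record)


# Hypothetical input standing in for a large file or data stream.
raw_lines = ["1.5\n", "\n", "2.5\n", "  \n", "4.0\n"]

pipeline = to_floats(non_empty(strip_lines(raw_lines)))
total = sum(pipeline)   # each line flows through all three stages one at a time
print(total)
```

No intermediate list is ever built: sum() pulls one value at a time from the last stage, which in turn pulls from the stage before it.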
Key Takeaways
- Generators are memory-friendly because they produce each piece of data only when it is requested.
- Generators simplify the process of defining iterators.
- We can define generators with generator expressions or generator functions.
- We can develop memory-efficient data pipelines by chaining multiple generators.
Conclusion
In this post, I explained the basics of generators in Python.
The code in this post is available in my GitHub repository.
I hope you found this post useful.
Thank you for reading!