Filtering lets you analyze only the rows of a DataFrame that meet specific conditions. Pandas can filter rows on a single condition or on several conditions combined.
🔹 Filtering with Single Condition
import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"]
}
df = pd.DataFrame(data)
# Filter rows where Age > 28
filtered_df = df[df["Age"] > 28]
print(filtered_df)
✅ Returns rows where the condition is True.
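To see what happens under the hood, it can help to look at the condition on its own. The sketch below rebuilds the same sample data and stores the comparison in a variable (the names mask and older are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

# The comparison produces a boolean Series aligned with df's index
mask = df["Age"] > 28
print(mask.tolist())  # [False, True, True, False]

# Indexing with the mask keeps only the rows marked True
older = df[mask]
print(older["Name"].tolist())  # ['Bob', 'Charlie']
```

Keeping the mask in its own variable is optional, but it makes longer filters easier to read and reuse.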
🔹 Filtering with Multiple Conditions
# Using & (and), | (or) operators
filtered_df = df[(df["Age"] > 28) & (df["City"] == "London")]
print(filtered_df)
filtered_df_or = df[(df["Age"] < 30) | (df["City"] == "Paris")]
print(filtered_df_or)
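✅ Remember to wrap each condition in parentheses: & and | bind more tightly than comparison operators such as > and ==, so without parentheses the expression is not evaluated the way you intend.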
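As a quick check, here is a self-contained sketch of the two filters above run against the sample data, plus the ~ (not) operator, which inverts a mask. The variable names both, either and not_london are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

# AND: both conditions must hold for a row to be kept
both = df[(df["Age"] > 28) & (df["City"] == "London")]
print(both["Name"].tolist())  # ['Bob']

# OR: at least one condition must hold
either = df[(df["Age"] < 30) | (df["City"] == "Paris")]
print(either["Name"].tolist())  # ['Alice', 'Charlie', 'David']

# NOT: ~ flips True and False in a boolean mask
not_london = df[~(df["City"] == "London")]
print(not_london["Name"].tolist())  # ['Alice', 'Charlie']
```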
🔹 Filtering with isin()
# Filter rows where City is either London or Paris
filtered_df = df[df["City"].isin(["London", "Paris"])]
print(filtered_df)
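isin() also combines with ~ when you want the rows that are not in the list. A minimal sketch on the same sample data (inside and outside are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

cities = ["London", "Paris"]

# Rows whose City appears in the list
inside = df[df["City"].isin(cities)]
print(inside["Name"].tolist())  # ['Bob', 'Charlie', 'David']

# Rows whose City does NOT appear in the list
outside = df[~df["City"].isin(cities)]
print(outside["Name"].tolist())  # ['Alice']
```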
🔹 Filtering String Columns
# Filter rows where Name starts with 'A'
filtered_df = df[df["Name"].str.startswith("A")]
print(filtered_df)
# Filter rows where City contains 'on' (the pattern is treated as a regular expression by default)
filtered_df = df[df["City"].str.contains("on")]
print(filtered_df)
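Real-world string columns often contain missing values or mixed casing. The sketch below uses a small made-up City column (not the sample data above) to show three str.contains() options that help in those cases: case, na and regex.

```python
import pandas as pd

# Hypothetical data with a missing value and inconsistent casing
cities = pd.DataFrame({"City": ["New York", "London", None, "lONDON", "St. Louis"]})

# case=False ignores letter case; na=False treats missing values as "no match"
has_on = cities[cities["City"].str.contains("on", case=False, na=False)]
print(has_on["City"].tolist())  # ['London', 'lONDON']

# regex=False matches the text literally, so "." means a real dot,
# not "any character" as it would in a regular expression
has_dot = cities[cities["City"].str.contains(".", regex=False, na=False)]
print(has_dot["City"].tolist())  # ['St. Louis']
```

Passing na=False matters here because a missing value would otherwise leave a non-boolean entry in the mask.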
⚠️ Common Mistakes
- Forgetting parentheses around each condition when using & or |.
- Using Python's and / or keywords instead of & / | in Pandas conditions.
- Not using str. methods correctly on string columns (e.g., df["Name"].startswith("A") will fail; use df["Name"].str.startswith("A")).
- Forgetting that isin() expects a list or iterable of values.
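To make these mistakes concrete, the sketch below triggers each one on the sample data and records the exception type it raises. The attempt helper and the errors dict are illustrative, not part of Pandas, and the exact exception for the missing-parentheses case may differ between Pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

errors = {}  # illustrative: mistake label -> name of the exception raised


def attempt(label, fn):
    """Run fn and record the exception type if it fails."""
    try:
        fn()
    except Exception as exc:
        errors[label] = type(exc).__name__


# 1. No parentheses: & is evaluated before > and ==
attempt("missing parentheses",
        lambda: df[df["Age"] > 28 & df["City"] == "London"])

# 2. Python's `and` asks for a single True/False, which a Series cannot give
attempt("and instead of &",
        lambda: df[(df["Age"] > 28) and (df["City"] == "London")])

# 3. String methods live under the .str accessor, not on the Series itself
attempt("missing .str",
        lambda: df[df["Name"].startswith("A")])

# 4. isin() needs a list-like, not a bare string
attempt("isin with a bare string",
        lambda: df[df["City"].isin("London")])

for label, name in errors.items():
    print(f"{label}: {name}")
```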