Wednesday, August 27, 2025

🐍Filtering Data with Pandas

Filtering lets you analyze only the rows of a DataFrame that meet specific conditions. Pandas does this with boolean masks, which you can build from a single condition or combine from several.

🔹 Filtering with a Single Condition


import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"]
}
df = pd.DataFrame(data)

# Filter rows where Age > 28
filtered_df = df[df["Age"] > 28]
print(filtered_df)
    

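✅ The comparison df["Age"] > 28 produces a boolean Series, and indexing with it returns only the rows where the condition is True (here, Bob and Charlie).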
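
To make the mechanism visible, here is a minimal sketch that stores the condition in a variable before using it. The names mask, older, and names_and_cities are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

# The comparison alone yields a boolean Series (the mask), one value per row
mask = df["Age"] > 28
print(mask.tolist())  # [False, True, True, False]

# Indexing with the mask keeps only the rows marked True
older = df[mask]
print(older["Name"].tolist())  # ['Bob', 'Charlie']

# df.loc accepts the same mask and can select columns at the same time
names_and_cities = df.loc[mask, ["Name", "City"]]
print(names_and_cities)
```

Naming the mask is optional, but it helps when the same condition is reused or when the expression gets long.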

🔹 Filtering with Multiple Conditions


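# Combine conditions with & (and) and | (or)
# AND: older than 28 and living in London (Bob)
filtered_df = df[(df["Age"] > 28) & (df["City"] == "London")]
print(filtered_df)

# OR: younger than 30 or living in Paris (Alice, Charlie, David)
filtered_df_or = df[(df["Age"] < 30) | (df["City"] == "Paris")]
print(filtered_df_or)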
    

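✅ Remember to wrap each condition in parentheses: & and | bind more tightly than comparison operators such as > and ==, so conditions without parentheses are grouped incorrectly.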
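
One way to sidestep the parentheses issue is to give each condition its own name first. The sketch below assumes the same sample data. The variable names are illustrative, and the ~ operator (logical NOT) is included as a natural companion to & and |.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

# Each condition becomes a named boolean mask
over_28 = df["Age"] > 28
in_london = df["City"] == "London"

# Named masks combine without extra parentheses
both = df[over_28 & in_london]
either = df[over_28 | in_london]

# ~ negates a mask (logical NOT)
not_london = df[~in_london]

print(both["Name"].tolist())        # ['Bob']
print(either["Name"].tolist())      # ['Bob', 'Charlie', 'David']
print(not_london["Name"].tolist())  # ['Alice', 'Charlie']
```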

🔹 Filtering with isin()


# Filter rows where City is either London or Paris
filtered_df = df[df["City"].isin(["London", "Paris"])]
print(filtered_df)
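
As a rough illustration of what isin() saves you, the sketch below compares it with the equivalent chain of == comparisons and shows how ~ turns it into a "not in" filter. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

cities = ["London", "Paris"]

# isin() is shorthand for chaining == comparisons with |
with_isin = df[df["City"].isin(cities)]
with_or = df[(df["City"] == "London") | (df["City"] == "Paris")]
print(with_isin.equals(with_or))  # True

# ~ inverts the mask, giving a "not in" filter
elsewhere = df[~df["City"].isin(cities)]
print(elsewhere["Name"].tolist())  # ['Alice']
```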
    

🔹 Filtering String Columns


# Filter rows where Name starts with 'A'
filtered_df = df[df["Name"].str.startswith("A")]
print(filtered_df)

# Filter rows where City contains 'on'
filtered_df = df[df["City"].str.contains("on")]
print(filtered_df)
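
Real data often has missing values or inconsistent capitalization. The sketch below uses a hypothetical variation of the sample data (one missing city, one lowercase city) to show three keyword arguments of str.contains(): case, na, and regex. Passing na=False explicitly is a cautious choice, since a missing value can otherwise leave a missing entry in the mask that cannot be used for filtering.

```python
import pandas as pd

# Hypothetical variation of the sample data: one missing city, one lowercase city
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "City": ["New York", "London", None, "london"],
})

# case=False ignores letter case; na=False treats missing values as non-matches
mask = people["City"].str.contains("lon", case=False, na=False)
matches = people[mask]
print(matches["Name"].tolist())  # ['Bob', 'David']

# regex=False matches the pattern literally, so "." means a real dot
# rather than "any character"
literal = people["City"].str.contains(".", regex=False, na=False)
print(literal.tolist())  # [False, False, False, False]
```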
    

⚠️ Common Mistakes

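  • Forgetting parentheses around each condition when using & or |.
  • Using the Python keywords and / or instead of & / | in Pandas conditions; this raises a ValueError because the truth value of a Series is ambiguous.
  • Calling string methods directly on a column instead of through the str accessor (e.g., df["Name"].startswith("A") raises an AttributeError; use df["Name"].str.startswith("A")).
  • Forgetting that isin() expects a list or other iterable of values; passing a single string such as isin("London") raises a TypeError.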
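
To see what three of these mistakes look like in practice, the sketch below triggers each one on purpose and catches the resulting exception. The errors list is only there to record what was raised, and the exact message text may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "London", "Paris", "London"],
})

errors = []  # names of the exception types raised below

# Mistake: "and" asks for one truth value from a whole Series
try:
    df[(df["Age"] > 28) and (df["City"] == "London")]
except ValueError as err:
    errors.append(type(err).__name__)
    print("ValueError:", err)

# Mistake: calling a string method on the Series instead of on .str
try:
    df["Name"].startswith("A")
except AttributeError as err:
    errors.append(type(err).__name__)
    print("AttributeError:", err)

# Mistake: passing a single string to isin() instead of a list
try:
    df["City"].isin("London")
except TypeError as err:
    errors.append(type(err).__name__)
    print("TypeError:", err)

print(errors)  # ['ValueError', 'AttributeError', 'TypeError']
```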
