Once you have a DataFrame, you often need to access specific columns, rows, or subsets of data. Pandas provides multiple methods for this, including label-based and position-based indexing.
🔹 Accessing Columns
Columns can be accessed like dictionary keys or using dot notation.
import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"]
}
df = pd.DataFrame(data)
# Using dictionary-style
print(df["Name"])
# Using dot notation
print(df.Age)
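✅ Dot notation works only if the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method (for example, a column named count).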
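The limits of dot notation are easier to see with a small example. The sketch below assumes a hypothetical frame, df2, with a column named "Age (Years)" and a column named "count"; neither is part of the example above, they are chosen only to illustrate the restriction.

```python
import pandas as pd

# Hypothetical frame: "Age (Years)" is not a valid Python identifier,
# and "count" has the same name as the DataFrame.count method.
df2 = pd.DataFrame({
    "Age (Years)": [25, 30, 35],
    "count": [1, 2, 3],
})

# Bracket access works for any column name
ages = df2["Age (Years)"]
counts = df2["count"]
print(ages.tolist())
print(counts.tolist())

# Dot notation finds the method first, not the column
print(callable(df2.count))  # True
```

Bracket access is the safer default when column names come from external data, because it never depends on how a name happens to be spelled.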
🔹 Accessing Multiple Columns
# Select multiple columns by passing a list
print(df[["Name", "City"]])
🔹 Accessing Rows
Rows can be accessed by index labels (loc) or integer positions (iloc).
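# Using loc (label-based)
print(df.loc[0])     # row labeled 0 (the first row here)
print(df.loc[1:2])   # rows labeled 1 through 2 (end label included)
# Using iloc (position-based)
print(df.iloc[0])    # row at position 0 (the first row)
print(df.iloc[1:3])  # rows at positions 1 and 2 (end position excluded)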
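With the default index, labels and positions are the same numbers, so loc and iloc look interchangeable. The sketch below assumes a hypothetical frame, df_custom, whose index labels are 10, 20, and 30 (values picked only for illustration) to show where the two diverge.

```python
import pandas as pd

df_custom = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical labels, chosen for illustration
)

first_by_label = df_custom.loc[10]     # row whose label is 10
first_by_position = df_custom.iloc[0]  # row at position 0 (the same row)

label_slice = df_custom.loc[10:20]     # labels 10 and 20 (end included)
position_slice = df_custom.iloc[0:1]   # position 0 only (end excluded)

print(len(label_slice), len(position_slice))

try:
    df_custom.loc[0]                   # no row is labeled 0
except KeyError:
    print("loc[0] raised KeyError: 0 is not an index label here")
```

The same distinction applies after filtering or sorting a DataFrame, since those operations keep the original labels while changing the positions.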
🔹 Accessing Specific Cells
Combine row and column selection to access a single cell.
# Using loc
print(df.loc[0, "Name"]) # Alice
# Using iloc
print(df.iloc[1, 2]) # London
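The same row-and-column pattern can return a block of cells rather than a single value. A minimal sketch, rebuilding the example DataFrame so it runs on its own (the variable names subset_by_label and subset_by_position are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# Rows labeled 0 through 1, columns "Name" and "City"
subset_by_label = df.loc[0:1, ["Name", "City"]]

# The same block by position: rows 0 and 1, columns 0 and 2
subset_by_position = df.iloc[0:2, [0, 2]]

print(subset_by_label)
print(subset_by_position)
```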
🔹 Conditional Row Selection
You can filter rows using conditions.
# Rows where Age > 28
print(df[df["Age"] > 28])
# Rows where City is London
print(df[df["City"] == "London"])
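Conditions can also be combined. The sketch below rebuilds the example DataFrame and assumes you want to mix two conditions; the specific filters are illustrative. Each condition needs its own parentheses, and pandas uses & (and), | (or), and ~ (not) instead of the Python keywords.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# Both conditions must hold
older_not_paris = df[(df["Age"] > 28) & (df["City"] != "Paris")]
print(older_not_paris["Name"].tolist())  # ['Bob']

# Either condition may hold
london_or_young = df[(df["City"] == "London") | (df["Age"] < 28)]
print(london_or_young["Name"].tolist())  # ['Alice', 'Bob']

# loc accepts the same boolean mask and can select a column in one step
names_over_28 = df.loc[df["Age"] > 28, "Name"]
print(names_over_28.tolist())            # ['Bob', 'Charlie']
```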
⚠️ Common Mistakes
- Using dot notation for column names with spaces or special characters (df.Age works, but df.Age(Years) does not).
- Confusing loc (label-based) with iloc (integer-based) indexing.
- Trying to access rows by number using df[0]; always use loc or iloc.
- For conditional selection, forgetting to wrap the condition in df[...]: df[df["Age"] > 28] is correct, while df["Age"] > 28 alone returns only a boolean Series.
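Two of these mistakes are easy to reproduce. The sketch below rebuilds the example DataFrame and shows what df[0] and a bare condition actually do; the variable names mask and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# df[0] looks for a column named 0, not the first row
try:
    df[0]
except KeyError:
    print("df[0] raised KeyError: there is no column named 0")

# The bare condition is only a boolean Series, one value per row
mask = df["Age"] > 28
print(mask.tolist())              # [False, True, True]

# Passing the mask back to df[...] returns the matching rows
filtered = df[mask]
print(filtered["Name"].tolist())  # ['Bob', 'Charlie']
```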