Once you have a DataFrame, it’s important to explore and understand your data. Pandas provides several functions to quickly view the contents, structure, and summary statistics of a DataFrame.
🔹 head() - View Top Rows
head(n) shows the first n rows of a DataFrame. By default, n=5.
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head()) # first 5 rows
print(df.head(10)) # first 10 rows
✅ Useful to quickly check the beginning of your dataset.
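If you do not have a data.csv file handy, the same calls can be tried on a small in-memory DataFrame. The sketch below uses made-up sample data (the names and ages are purely illustrative) and shows that head() returns a new DataFrame, and that asking for more rows than exist simply returns all of them.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay", "Gus"],
    "Age": [28, 34, 22, 45, 31, 29, 38],
})

top = df.head(3)            # new DataFrame holding the first 3 rows
default_top = df.head()     # default n=5
oversized = df.head(100)    # n larger than the frame: all 7 rows come back

print(top)
print(len(default_top), len(oversized))
```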
🔹 tail() - View Bottom Rows
tail(n) shows the last n rows of a DataFrame. By default, n=5.
print(df.tail()) # last 5 rows
print(df.tail(8)) # last 8 rows
✅ Helpful for checking the most recent entries or the end of the dataset.
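Both head() and tail() also accept a negative n, which means "everything except" that many rows. Here is a minimal sketch, assuming a made-up ten-row sales table (the column names are illustrative only):

```python
import pandas as pd

# Hypothetical data: ten days of sales figures
df = pd.DataFrame({
    "day": range(1, 11),
    "sales": [5, 7, 6, 9, 8, 12, 11, 10, 14, 13],
})

last_three = df.tail(3)          # the final 3 rows
all_but_first_two = df.tail(-2)  # negative n: skip the first 2 rows
all_but_last_two = df.head(-2)   # negative n: skip the last 2 rows

print(last_three)
print(len(all_but_first_two), len(all_but_last_two))
```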
🔹 info() - Summary of DataFrame
Prints a concise summary: the number of rows and columns, the non-null count for each column, and each column's data type.
df.info()
Example output:
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
#   Column  Non-Null Count  Dtype
--- ------  --------------  -----
0   Name    100 non-null    object
1   Age     100 non-null    int64
2   City    98 non-null     object
3   Salary  100 non-null    float64
✅ Helps identify missing data and understand column types.
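info() prints its summary and returns None, so assigning its result to a variable gives you nothing useful. If you need the text itself, one option is to pass a buffer through the buf parameter. The sketch below uses a small made-up DataFrame and pairs info() with isna().sum() to get an exact missing-value count per column.

```python
import io
import pandas as pd

# Hypothetical sample data with two missing City values
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dev"],
    "Age": [28, 34, 22, 45],
    "City": ["Lima", None, "Oslo", None],
})

# Capture the info() text instead of printing it directly
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Count missing values in each column
missing = df.isna().sum()
print(missing)
```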
🔹 describe() - Statistical Summary
Returns summary statistics for numeric columns: count, mean, std (standard deviation), min, the 25%, 50% (median), and 75% percentiles, and max.
print(df.describe()) # describe() returns a DataFrame, so print it when running as a script
Example output:
              Age        Salary
count  100.000000    100.000000
mean    30.500000  55000.250000
std      5.920432   8500.123456
min     22.000000  40000.000000
25%     26.750000  48000.000000
50%     30.500000  55000.000000
75%     34.250000  62000.000000
max     45.000000  70000.000000
✅ Quickly understand the spread and range of your numeric data.
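By default describe() skips text columns. Passing include="all" adds them, with count, unique, top, and freq rows for the non-numeric data. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample data: one numeric column, one text column
df = pd.DataFrame({
    "Age": [22, 30, 38],
    "City": ["Lima", "Oslo", "Lima"],
})

numeric = df.describe()                  # only the Age column
everything = df.describe(include="all")  # Age plus City

print(numeric)
print(everything)
```

In the include="all" result, statistics that do not apply to a column (for example, mean for City) show up as NaN.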
⚠️ Common Mistakes
- Calling head() or tail() without parentheses: df.head is the method object itself, not the data. Write df.head().
- Assuming describe() includes non-numeric columns by default. It does not; use include='all' to include object columns.
- Misreading info() output: a column whose non-null count is lower than the total number of entries contains missing values.
- Printing an entire large DataFrame when a few rows would do: head() or tail() gives a quicker, more readable check.