Wednesday, August 27, 2025

🐍Viewing data (head(), tail(), info(), describe())

Once you have a DataFrame, it’s important to explore and understand your data. Pandas provides several functions to quickly view the contents, structure, and summary statistics of a DataFrame.

🔹 head() - View Top Rows

head(n) shows the first n rows of a DataFrame. By default, n=5.


import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())       # first 5 rows
print(df.head(10))     # first 10 rows
    

✅ Useful to quickly check the beginning of your dataset.
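
If you don't have a data.csv at hand, here is a minimal sketch of head() on a small in-memory DataFrame. The column names and values are made up for illustration. It also shows that a negative n returns every row except the last n.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho", "Dev", "Eli", "Fay"],
    "Age": [22, 30, 45, 26, 34, 29],
})

first_three = df.head(3)         # rows at positions 0, 1 and 2
all_but_last_two = df.head(-2)   # negative n: everything except the last 2 rows

print(first_three)
print(all_but_last_two)
```

head() returns a new DataFrame, so you can store the result in a variable or chain further calls on it.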

🔹 tail() - View Bottom Rows

tail(n) shows the last n rows of a DataFrame. By default, n=5.


print(df.tail())       # last 5 rows
print(df.tail(8))      # last 8 rows
    

✅ Helpful for checking the end of the dataset, such as the most recent entries when rows are stored in chronological order.
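
A small sketch of tail() on made-up data (the names and ages are hypothetical). It shows two details worth knowing: the returned rows keep their original index labels, and asking for more rows than exist simply returns the whole DataFrame.

```python
import pandas as pd

# Hypothetical sample data; any DataFrame behaves the same way
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho", "Dev", "Eli", "Fay"],
    "Age": [22, 30, 45, 26, 34, 29],
})

last_two = df.tail(2)
print(last_two)
print(list(last_two.index))   # original row labels are preserved

# n larger than the number of rows is not an error
more_than_available = df.tail(10)
print(len(more_than_available))
```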

🔹 info() - Summary of DataFrame

Prints a concise summary of the DataFrame: the index range, the number of columns, the non-null count for each column, and each column's data type.


df.info()
    

Example output:



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    100 non-null    object 
 1   Age     100 non-null    int64  
 2   City    98 non-null     object 
 3   Salary  100 non-null    float64
    

✅ Helps identify missing data and understand column types.
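
info() prints its summary and returns None, so you cannot assign the result to a variable. As a rough sketch, you can capture the text through the buf parameter, and use isna().sum() when you want the missing-value counts as numbers. The sample data below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with one missing City value
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho"],
    "City": ["Lima", None, "Oslo"],
})

# Write the info() summary into a text buffer instead of the console
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Count missing values per column as a Series
missing_per_column = df.isna().sum()
print(missing_per_column)
```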

🔹 describe() - Statistical Summary

Provides summary statistics for numeric columns: count, mean, std (standard deviation), min, the 25%, 50% (median), and 75% percentiles, and max.


print(df.describe())
    

Example output:


              Age        Salary
count  100.000000    100.000000
mean    30.500000  55000.250000
std      5.920432   8500.123456
min     22.000000  40000.000000
25%     26.750000  48000.000000
50%     30.500000  55000.000000
75%     34.250000  62000.000000
max     45.000000  70000.000000
    

✅ Quickly understand the spread and range of your numeric data.
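
The result of describe() is itself a DataFrame, so individual statistics can be looked up with .loc. Here is a minimal sketch on made-up numbers. The percentiles argument shown at the end is a real describe() parameter, but the values chosen are only an example.

```python
import pandas as pd

# Hypothetical numeric data, small enough to check by hand
df = pd.DataFrame({
    "Age": [22, 26, 30, 45],
    "Salary": [40000.0, 48000.0, 55000.0, 70000.0],
})

stats = df.describe()
print(stats)

# Rows of the result are labelled by statistic name
mean_age = stats.loc["mean", "Age"]
median_age = stats.loc["50%", "Age"]   # the 50% row is the median
print(mean_age, median_age)

# Request other percentiles, here the 10th and 90th
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
```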

⚠️ Common Mistakes

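  • Calling head() or tail() without parentheses: df.head is the method object itself, not the data. Write df.head() to get rows.
  • Assuming describe() includes non-numeric columns by default. On a DataFrame with mixed types it summarizes only the numeric columns; use include='all' to include object columns as well.
  • Misreading info() output: a column whose Non-Null Count is lower than the total number of entries contains missing values.
  • Printing an entire large DataFrame when a few rows would do. head() or tail() is quicker to read and is usually enough for a first look.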
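
The first two mistakes are easy to see in a short sketch. The data is hypothetical; include="all" is the describe() option mentioned above.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho"],
    "Age": [22, 30, 45],
})

not_data = df.head    # missing parentheses: this is a method, not rows
data = df.head()      # with parentheses: a DataFrame
print(type(not_data))
print(type(data))

numeric_only = df.describe()             # only the Age column
everything = df.describe(include="all")  # Name and Age
print(numeric_only.columns.tolist())
print(everything.columns.tolist())
```

With include="all", text columns get rows such as unique, top, and freq, and the statistics that do not apply to a column show up as NaN.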
