Wednesday, August 27, 2025

Creating DataFrames from dicts, lists, CSV, and Excel

Pandas allows you to create DataFrames from various data sources such as dictionaries, lists, CSV files, and Excel files. This flexibility makes it easy to work with structured data from different origins.

🔹 From Dictionaries

Each key in the dictionary becomes a column name, and the list stored under that key supplies that column's values, one per row. All lists must have the same length.


import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"]
}

df = pd.DataFrame(data)
print(df)
    

👉 Output:


      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
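
Because each list fills one column, the lists need to line up row by row. The sketch below is only an illustration (the shortened Age list is made up): it builds one valid DataFrame and then shows that pandas raises a ValueError when the lengths differ.

```python
import pandas as pd

# Lists of equal length build a DataFrame as expected.
ok = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
print(ok.shape)  # (rows, columns)

# A shorter list under one key is rejected rather than padded.
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```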
    

🔹 From Lists

You can create a DataFrame from a list of lists or a list of tuples. Each inner list or tuple becomes one row, and the columns argument supplies the column names.


data = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "London"],
    ["Charlie", 35, "Paris"]
]

df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
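
The same pattern should work for a list of tuples. Here is a small sketch using the same sample rows; the variable names df_tuples and df_unnamed are just illustrative. It also shows what happens if you leave out the column names.

```python
import pandas as pd

rows = [
    ("Alice", 25, "New York"),
    ("Bob", 30, "London"),
    ("Charlie", 35, "Paris"),
]

# Each tuple becomes one row; columns= names the fields.
df_tuples = pd.DataFrame(rows, columns=["Name", "Age", "City"])
print(df_tuples)

# Without column names, pandas falls back to integer labels 0, 1, 2.
df_unnamed = pd.DataFrame(rows)
print(list(df_unnamed.columns))
```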
    

🔹 From CSV Files

CSV (Comma Separated Values) files are plain text files used to store tabular data. Pandas provides pd.read_csv() to read CSV files into a DataFrame.

Example 1: CSV in the same folder


df = pd.read_csv("data.csv")
print(df.head())  # Show first 5 rows
    

Example 2: CSV with full path


df = pd.read_csv("C:/Users/username/Documents/data.csv")
print(df.head())
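
On Windows, full paths are a common source of trouble because backslashes inside an ordinary Python string can be read as escape sequences. The sketch below uses a made-up path (C:\temp\new_data.csv) to show the difference between a normal string, a raw string, and forward slashes. It only inspects the strings and does not open any file.

```python
from pathlib import PureWindowsPath

# In a normal string, "\t" is a tab and "\n" is a newline, so this path is corrupted.
broken = "C:\temp\new_data.csv"

# A raw string keeps every backslash literally.
raw = r"C:\temp\new_data.csv"

# Forward slashes need no escaping.
forward = "C:/temp/new_data.csv"

print(repr(broken))
print(repr(raw))

# PureWindowsPath treats the raw-string and forward-slash spellings as the same location.
same_location = PureWindowsPath(raw) == PureWindowsPath(forward)
print(same_location)
```

Either the raw string or the forward-slash version can then be passed to pd.read_csv().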
    

Example 3: CSV from an online URL


url = "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"
df = pd.read_csv(url)
print(df.head())
    

✅ Pandas automatically handles downloading and reading the file from the web.
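
If you want to try read_csv() without a file on disk or a network connection, it also accepts file-like objects. A minimal sketch, assuming some made-up CSV text wrapped in io.StringIO:

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a file on disk or a URL.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,35,Paris
"""

# read_csv accepts file-like objects as well as paths and URLs.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.head())
print(df_csv.dtypes)  # Age is parsed as an integer column
```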

🔹 From Excel Files

Excel files can store multiple sheets and heterogeneous data. Use pd.read_excel() to read them. You may need to install openpyxl (for .xlsx) or xlrd (for .xls).

Example 1: Excel in the same folder


df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
print(df.head())
    

Example 2: Excel with full path


df = pd.read_excel("C:/Users/username/Documents/data.xlsx", sheet_name="Sheet1")
print(df.head())
    

Example 3: Excel from an online repository


url = "https://github.com/selva86/datasets/raw/master/Smarket.xlsx"
df = pd.read_excel(url, sheet_name="Smarket")
print(df.head())
    

✅ Pandas can fetch and parse Excel files from online repositories directly, just like CSV.
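
When a workbook has several sheets, passing sheet_name=None to read_excel() returns a dictionary of DataFrames keyed by sheet name. The sketch below is one way to see this without a real file: it writes a small two-sheet workbook into memory and reads it back. The sheet names People and Cities are made up, and the example assumes openpyxl is installed (it prints a hint otherwise).

```python
import importlib.util
import io
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
cities = pd.DataFrame({"City": ["New York", "London"]})

sheets = {}
if importlib.util.find_spec("openpyxl") is not None:
    # Write a two-sheet workbook into memory instead of onto disk.
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        people.to_excel(writer, sheet_name="People", index=False)
        cities.to_excel(writer, sheet_name="Cities", index=False)
    buffer.seek(0)

    # sheet_name=None returns a dict that maps sheet names to DataFrames.
    sheets = pd.read_excel(buffer, sheet_name=None)
    print(list(sheets))
    print(sheets["Cities"])
else:
    print("openpyxl is not installed; try: pip install openpyxl")
```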

⚠️ Common Mistakes

  • Forgetting to install required packages for Excel files (like openpyxl or xlrd).
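  • Using incorrect file paths: on Windows, use forward slashes (/) or a raw string such as r"C:\path\to\file.csv" so that backslashes are not read as escape sequences.
  • Not specifying sheet_name when reading Excel files with multiple sheets: by default, read_excel() returns only the first sheet.
  • Assuming CSV/Excel files online are always publicly accessible: some URLs require authentication.
  • Not handling missing or malformed data: read_csv() and read_excel() both accept na_values, and read_csv() also has on_bad_lines (which replaced error_bad_lines in pandas 1.3; the old option was removed in pandas 2.0).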
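
To make the last point concrete, here is a minimal sketch of handling missing and malformed rows in read_csv(). The CSV text and the "unknown" marker are invented for illustration, and on_bad_lines needs pandas 1.3 or newer.

```python
import io
import pandas as pd

# Hypothetical messy CSV: "unknown" marks a missing age,
# and the last row has one field too many.
messy_csv = """Name,Age,City
Alice,25,New York
Bob,unknown,London
Charlie,35,Paris,extra
"""

df_clean = pd.read_csv(
    io.StringIO(messy_csv),
    na_values=["unknown"],  # treat this marker as a missing value
    on_bad_lines="skip",    # drop rows with too many fields (pandas 1.3+)
)
print(df_clean)
print(df_clean["Age"].isna().sum())  # number of missing ages
```

Skipping bad rows silently can hide data problems, so on_bad_lines="warn" may be a safer choice while you are still exploring a file.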
