Pandas allows you to create DataFrames from various data sources such as dictionaries, lists, CSV files, and Excel files. This flexibility makes it easy to work with structured data from different origins.
🔹 From Dictionaries
Each key in the dictionary becomes a column label, and each list of values fills that column, one element per row.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 35],
"City": ["New York", "London", "Paris"]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
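To see how the keys and values map onto the DataFrame, you can inspect `df.columns` and `df.shape`. The sketch below also uses a made-up two-key dictionary to show that every value list must have the same length.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
}
df = pd.DataFrame(data)

# Each dictionary key became a column label, in insertion order.
print(list(df.columns))  # ['Name', 'Age', 'City']
# Each position in the value lists became one row: 3 rows x 3 columns.
print(df.shape)  # (3, 3)

# All value lists must have the same length; a mismatch raises ValueError.
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25]})
except ValueError as err:
    print("ValueError:", err)
```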
🔹 From Lists
You can create a DataFrame from a list of lists or a list of tuples, where each inner list or tuple is one row. Pass the column names with the columns parameter.
data = [
["Alice", 25, "New York"],
["Bob", 30, "London"],
["Charlie", 35, "Paris"]
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
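The paragraph above also mentions tuples. Here is a minimal sketch with the same three records stored as tuples, plus what happens when `columns=` is left out.

```python
import pandas as pd

# Same records as tuples instead of inner lists; each tuple becomes one row.
data = [
    ("Alice", 25, "New York"),
    ("Bob", 30, "London"),
    ("Charlie", 35, "Paris"),
]
df_tuples = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df_tuples)

# Without columns=, pandas labels the columns with integers starting at 0.
df_default = pd.DataFrame(data)
print(list(df_default.columns))  # [0, 1, 2]
```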
🔹 From CSV Files
CSV (Comma-Separated Values) files are plain text files used to store tabular data.
Pandas provides pd.read_csv() to read CSV files into a DataFrame.
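If you want to try read_csv() without creating a file first, it also accepts file-like objects. This minimal sketch assumes some hypothetical CSV content held in memory with `io.StringIO`, standing in for a real data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a data.csv file.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,35,Paris
"""

# The first line becomes the column labels; the Age column is parsed as integers.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```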
Example 1: CSV in the current working directory
df = pd.read_csv("data.csv")
print(df.head()) # Show first 5 rows
Example 2: CSV with full path
df = pd.read_csv("C:/Users/username/Documents/data.csv")
print(df.head())
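If you are unsure which slashes to use in a full path, Python's `pathlib` can build the path for you, and read_csv() accepts `Path` objects directly. A minimal sketch, assuming a hypothetical data.csv written into a temporary folder so the example runs on any machine:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # The / operator joins path parts with the correct separator for the OS.
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("Name,Age\nAlice,25\nBob,30\n")

    # resolve() turns the path into a full (absolute) path before reading.
    df = pd.read_csv(csv_path.resolve())

print(df.head())
```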
Example 3: CSV from an online URL
url = "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"
df = pd.read_csv(url)
print(df.head())
✅ Pandas automatically handles downloading and reading the file from the web.
🔹 From Excel Files
Excel files can store multiple sheets and heterogeneous data. Use pd.read_excel() to read them.
You may need to install openpyxl (for .xlsx) or xlrd (for .xls).
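A quick way to check before calling read_excel() is to ask Python whether those packages are importable. This sketch uses the standard-library `importlib.util.find_spec`; the helper name `missing_excel_engines` is hypothetical, chosen for illustration.

```python
import importlib.util

# Optional packages pandas uses to read each Excel file type.
EXCEL_ENGINES = {".xlsx": "openpyxl", ".xls": "xlrd"}


def missing_excel_engines():
    """Return {file type: package} for every reader package that is not installed."""
    return {
        ext: pkg
        for ext, pkg in EXCEL_ENGINES.items()
        if importlib.util.find_spec(pkg) is None
    }


missing = missing_excel_engines()
for ext, pkg in missing.items():
    print(f"To read {ext} files, install {pkg}: pip install {pkg}")
if not missing:
    print("openpyxl and xlrd are both installed.")
```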
Example 1: Excel in the current working directory
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
print(df.head())
Example 2: Excel with full path
df = pd.read_excel("C:/Users/username/Documents/data.xlsx", sheet_name="Sheet1")
print(df.head())
Example 3: Excel from an online repository
url = "https://github.com/selva86/datasets/raw/master/Smarket.xlsx"
df = pd.read_excel(url, sheet_name="Smarket")
print(df.head())
✅ Pandas can fetch and parse Excel files from online repositories directly, just like CSV.
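Since a workbook can hold several sheets, passing `sheet_name=None` returns all of them at once, as a dictionary keyed by sheet name. This minimal sketch writes a hypothetical two-sheet workbook into memory (so no file or URL is needed) and reads it back; it assumes openpyxl is installed and only prints a hint otherwise.

```python
import importlib.util
import io

import pandas as pd

if importlib.util.find_spec("openpyxl") is None:
    print("This example needs openpyxl: pip install openpyxl")
    sheets = {}
else:
    # Build a hypothetical two-sheet workbook in memory instead of on disk.
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        pd.DataFrame({"Name": ["Alice", "Bob"]}).to_excel(
            writer, sheet_name="People", index=False
        )
        pd.DataFrame({"City": ["London", "Paris"]}).to_excel(
            writer, sheet_name="Cities", index=False
        )
    buffer.seek(0)  # rewind so read_excel starts from the beginning

    # sheet_name=None reads every sheet into {sheet name: DataFrame}.
    sheets = pd.read_excel(buffer, sheet_name=None)
    for name, frame in sheets.items():
        print(name, frame.shape)
```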
⚠️ Common Mistakes
- Forgetting to install the packages required for Excel files (openpyxl or xlrd).
- Using incorrect file paths: on Windows, use forward slashes (/) or raw strings such as r"C:\path\to\file.csv" so that backslashes are not treated as escape characters.
- Not specifying sheet_name when reading Excel files with multiple sheets; by default read_excel() reads only the first sheet.
- Assuming CSV/Excel files online are always publicly accessible; some URLs require authentication.
- Not handling missing or malformed data: read_csv() and read_excel() both accept na_values, and read_csv() also accepts on_bad_lines (which replaced the deprecated error_bad_lines option in pandas 1.3).
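The last point can be sketched with a small, hypothetical messy CSV held in memory: `na_values` adds extra markers for missing data (on top of defaults such as "N/A"), and `on_bad_lines="skip"` drops rows with too many fields instead of raising an error. This assumes pandas 1.3 or newer.

```python
import io

import pandas as pd

# Hypothetical messy CSV: "N/A" and "-" mark missing ages,
# and the Dave row has one field too many.
csv_text = """Name,Age,City
Alice,25,New York
Bob,N/A,London
Charlie,-,Paris
Dave,40,Rome,extra
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["-"],      # treat "-" as missing, in addition to defaults like "N/A"
    on_bad_lines="skip",  # drop rows with too many fields (read_csv only)
)
print(df)
```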