Hello there,
This week, we will warm up for data analysis! I will be using the same data frame that I used in the previous blog post.
Let us first look at descriptive statistics such as the mean, standard deviation, minimum, and maximum values. To do that:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    print(df.describe())
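If you only need one or two of these numbers, it may help to know that describe() hands back a data frame of its own, so individual statistics can be picked out by label. Here is a small sketch of that idea. The data frame below is a made-up stand-in for mydata.csv (the values are invented for illustration), and the names summary and mean_image1 are just my choices.

```python
import pandas as pd

# Hypothetical stand-in for mydata.csv; the values are made up for illustration.
df = pd.DataFrame({
    "Age": [19, 21, 19, 23],
    "Gender": ["F", "M", "F", "M"],
    "Image1": [4, 6, 5, 7],
    "Image2": [3, 5, 6, 2],
})

# describe() returns a DataFrame; by default it covers the numeric columns only.
summary = df.describe()

# Pick out selected statistics (rows) for selected variables (columns).
print(summary.loc[["mean", "std", "min", "max"], ["Image1", "Image2"]])

# A single statistic is also available directly from the column.
mean_image1 = df["Image1"].mean()
print(mean_image1)
```

Note that "Gender" does not show up in the summary here, because it is a text column and describe() skips those unless asked otherwise.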
Sometimes, when we prepare our data for analysis, we need to add a new variable. For instance, let us add the variable “Total”. Total will be the sum of the ratings for Image 1 and Image 2. (Remember, participants rated the attractiveness of Image 1 and Image 2.) To create the variable “Total”:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    df["Total"] = df["Image1"] + df["Image2"]
    print(df)
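One thing worth knowing about adding columns with "+": if a participant skipped one of the ratings, the total for that row becomes missing (NaN) too. The sketch below compares "+" with sum(axis=1), which skips missing values by default. The data and the column names "Total_plus" and "Total_sum" are made up for this illustration; which behaviour you want depends on how you plan to treat missing ratings.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; the third participant has no rating for Image1.
df = pd.DataFrame({
    "Image1": [4, 6, np.nan],
    "Image2": [3, 5, 6],
})

# "+" propagates missing values: a row with a NaN gets a NaN total.
df["Total_plus"] = df["Image1"] + df["Image2"]

# sum(axis=1) adds across columns row by row and skips NaN by default.
df["Total_sum"] = df[["Image1", "Image2"]].sum(axis=1)

print(df)
```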
We used “print” to see our new variable in the console. If we want to save the changes (i.e., the new variable column “Total”) to a .csv file:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    df["Total"] = df["Image1"] + df["Image2"]
    df.to_csv("modified.csv", index=False)
    print(df)
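To convince ourselves that the file really contains the new column, we can read it back in. The sketch below does this with made-up data and a temporary folder (so nothing is left behind on disk); the name check is my own. It also shows why index=False is handy: the saved file has exactly the columns of the data frame, without an extra column for the row numbers.

```python
import os
import tempfile

import pandas as pd

# Hypothetical ratings standing in for mydata.csv.
df = pd.DataFrame({"Image1": [4, 6], "Image2": [3, 5]})
df["Total"] = df["Image1"] + df["Image2"]

# Write to a temporary folder and read the file straight back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modified.csv")
    df.to_csv(path, index=False)  # index=False: do not write the row numbers
    check = pd.read_csv(path)

print(check.columns.tolist())
print(check)
```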
So, now we have created a new .csv file that includes the variable “Total”, and named it “modified.csv”. If we want to drop this new variable, then:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    df["Total"] = df["Image1"] + df["Image2"]
    df = df.drop(columns=["Total"])
    df.to_csv("modified.csv", index=False)
    print(df)
Let us assume that we want to filter our data with multiple conditions (e.g., age and gender). For instance, we only want to see 19-year-old females, and we want to create a new data frame (new_df) including only these data:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    new_df = df.loc[(df["Age"] == 19) & (df["Gender"] == "F")]
    print(new_df)
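A small note on the filter above: each condition sits in its own parentheses because "&" binds more tightly than "==" in Python, so leaving them out would not do what we want. If the brackets feel heavy, query() is an alternative way to write the same filter. The sketch below uses made-up data, and the names mask and same_df are my own.

```python
import pandas as pd

# Hypothetical participants; the values are made up for illustration.
df = pd.DataFrame({
    "Age": [19, 21, 19, 19],
    "Gender": ["F", "F", "M", "F"],
    "Image1": [4, 6, 5, 7],
})

# Boolean mask: True only where both conditions hold.
mask = (df["Age"] == 19) & (df["Gender"] == "F")
new_df = df.loc[mask]

# The same filter written as a query string.
same_df = df.query('Age == 19 and Gender == "F"')

print(new_df)
```

Both versions keep the original row labels, so the filtered data frame here has rows 0 and 3.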
Lastly, we may want to modify the values in the data. For instance, instead of having “F”, we may want to see “Female” in the data frame. To achieve that:
    import pandas as pd
    df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
    df.loc[df["Gender"] == "F", "Gender"] = "Female"
    print(df)
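If there are several labels to rename, doing them one by one gets repetitive. One option is replace() with a dictionary, sketched below. I am assuming here that the other label in the column is "M"; that is my guess for the illustration, so adjust the dictionary to whatever labels your data actually uses.

```python
import pandas as pd

# Hypothetical Gender column; the "M" label is an assumption for illustration.
df = pd.DataFrame({"Gender": ["F", "M", "F"]})

# Map each old label to its new label; labels not in the dictionary stay as they are.
df["Gender"] = df["Gender"].replace({"F": "Female", "M": "Male"})

print(df)
```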
Cheers!