We'll now use pandas DataFrame manipulation to create a composite feature called "p-class sex" (stored in the column p_class_sex) that combines passenger class (first, second, or third) with sex. This feature lets us examine how socioeconomic status and gender jointly relate to survival, a pattern that neither variable shows on its own.
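Begin by defining our categorical framework with six distinct values: first class female, first class male, second class female, second class male, third class female, and third class male. These represent all possible combinations of our two variables and will serve as our controlled categories for analysis. Because the column is built by joining the class number and the sex with an underscore, the labels stored in categories_list must match those strings exactly: 1_female, 1_male, 2_female, 2_male, 3_female, and 3_male.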
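A minimal sketch of how the six labels could be generated, assuming they follow the class-underscore-sex pattern used when the column is built. The name categories_list comes from the later pd.Categorical call; the helper names classes and sexes are only illustrative.

```python
from itertools import product

# Assumed raw values: pclass is 1, 2 or 3; sex is "female" or "male".
classes = [1, 2, 3]
sexes = ["female", "male"]

# Build every class/sex combination in the "<pclass>_<sex>" form.
categories_list = [f"{pclass}_{sex}" for pclass, sex in product(classes, sexes)]

print(categories_list)
```

Generating the list rather than typing it by hand keeps the labels in step with the raw values if either variable ever gains a level.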
Next, we'll construct this composite column by concatenating the p-class and sex values. Execute this with: titanic_data['p_class_sex'] = titanic_data['pclass'].astype(str) + '_' + titanic_data['sex']. The critical step here is converting the numeric p-class values (1, 2, 3) to strings using astype(str), so that they can be joined to the sex variable, which is already a string. Without the conversion, pandas would treat + as numeric addition between an integer column and a string and raise a TypeError; with it, + performs an element-wise string join.
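The concatenation step can be tried on a small frame. The sketch below uses a hypothetical six-row stand-in for titanic_data (the real dataset is not loaded here) and shows both the failing and the working form.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data; only the columns used here.
titanic_data = pd.DataFrame({
    "pclass": [3, 1, 3, 1, 2, 2],
    "sex": ["male", "female", "female", "male", "female", "male"],
})

# Adding a string to the integer column directly is expected to fail.
try:
    titanic_data["pclass"] + "_" + titanic_data["sex"]
except TypeError as exc:
    print("int + str is rejected:", exc)

# Converting pclass to strings first makes "+" an element-wise string join.
titanic_data["p_class_sex"] = (
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"]
)

print(titanic_data["p_class_sex"].tolist())
```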
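The final transformation converts our new column into a pandas categorical data type, which reduces memory use for repeated labels and restricts the column to a known set of values. Execute: titanic_data['p_class_sex'] = pd.Categorical(titanic_data['p_class_sex'], categories=categories_list). Any value that does not appear in categories_list becomes NaN, so the labels must match the concatenated strings exactly. Categorical columns also keep a fixed category order for grouping and plotting; comparison operations such as min, max, and < are available only when the categorical is created with ordered=True.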
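The behaviour of pd.Categorical described here can be checked on a few labels. This is a sketch on made-up values, not the full dataset; the variable names labels, mismatched, and ordered are illustrative.

```python
import pandas as pd

categories_list = ["1_female", "1_male", "2_female", "2_male", "3_female", "3_male"]
labels = pd.Series(["3_male", "1_female", "3_female", "1_male"])

# All six categories are registered, even those not present in the data.
p_class_sex = pd.Categorical(labels, categories=categories_list)
print(list(p_class_sex.categories))
print(p_class_sex.isna().sum())  # no label fell outside the category list

# Labels missing from the category list are silently converted to NaN.
mismatched = pd.Categorical(
    labels, categories=["first class female", "first class male"]
)
print(mismatched.isna().all())

# Ordering operations need ordered=True; the order is the list order.
ordered = pd.Categorical(labels, categories=categories_list, ordered=True)
print(ordered.min(), ordered.max())
```

Checking isna() right after the conversion is a cheap way to catch a label mismatch before it distorts later counts.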
Examining our newly created series shows the expected combinations (third class male, first class female, third class female, and so on) across our 891 total observations, with all six categories listed in the dtype.
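One way to inspect the new column is to print the first rows and tabulate the categories. The sketch assumes a small hypothetical frame in place of the 891-row dataset, so the counts printed here are not the real ones.

```python
import pandas as pd

categories_list = ["1_female", "1_male", "2_female", "2_male", "3_female", "3_male"]

# Hypothetical sample rows standing in for the real data.
titanic_data = pd.DataFrame({
    "pclass": [3, 1, 3, 1, 2, 2, 3],
    "sex": ["male", "female", "female", "male", "female", "male", "male"],
})
titanic_data["p_class_sex"] = pd.Categorical(
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"],
    categories=categories_list,
)

# First rows plus the dtype line listing the categories.
print(titanic_data["p_class_sex"].head())

# One count per category; the counts should add up to the number of rows.
counts = titanic_data["p_class_sex"].value_counts(sort=False)
print(counts)
print(counts.sum() == len(titanic_data))
```

On the full dataset the same check should total 891.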
With our composite feature ready, we can now visualize survival by class and sex using a Seaborn count plot (sns.countplot). Placing survival status on the x-axis and passing our p-class sex feature as the hue parameter draws one bar per category within each survival outcome, revealing disparities that neither variable shows on its own.
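A sketch of the count plot described above. It assumes the outcome column is named survived (0 = died, 1 = survived), which the text does not state, and it uses a hypothetical sample frame rather than the real data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

categories_list = ["1_female", "1_male", "2_female", "2_male", "3_female", "3_male"]

# Hypothetical sample; "survived" is an assumed column name.
titanic_data = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "sex": ["female", "male", "female", "male", "female", "female", "male", "male"],
    "survived": [1, 0, 1, 0, 1, 0, 0, 0],
})
titanic_data["p_class_sex"] = pd.Categorical(
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"],
    categories=categories_list,
)

# Survival status on the x-axis, one coloured bar per class/sex category.
ax = sns.countplot(data=titanic_data, x="survived", hue="p_class_sex")
ax.set_title("Survival counts by passenger class and sex")
plt.tight_layout()
# plt.show()  # uncomment when running as a script
```

Because the hue column is categorical, the bars follow the order of categories_list rather than the order in which labels first appear.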
The visualization exposes large survival gaps across our six categories. Third-class males had very high mortality, with only a small fraction surviving the disaster. Second-class males fared similarly poorly, showing how gender and class compounded for men in the lower passenger classes. The contrast with female passengers is striking: first-class women had only three fatalities against 91 survivors, and second-class women had just six deaths versus 70 survivors.
Most revealing is the third-class female group, where the gender advantage diminishes: 72 deaths and 72 survivors, an even split. This suggests that in third class, socioeconomic disadvantage began to outweigh gender-based survival advantages. The "women and children first" protocol apparently held most strongly in the higher passenger classes, where social status reinforced traditional evacuation priorities.
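The counts read off the plot can also be computed directly as a table. The sketch below assumes a survived column coded 0/1 and runs on a hypothetical ten-row sample, so its numbers are illustrative and do not reproduce the figures quoted above.

```python
import pandas as pd

# Hypothetical sample; "survived" (0 = died, 1 = survived) is an assumed column name.
titanic_data = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "sex": ["female", "female", "male", "male", "female", "male",
            "female", "female", "male", "male"],
    "survived": [1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
})
titanic_data["p_class_sex"] = (
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"]
)

# Rows: class/sex group; columns: 0 (died) and 1 (survived).
counts = pd.crosstab(titanic_data["p_class_sex"], titanic_data["survived"])
print(counts)

# The mean of a 0/1 column is the survival rate within each group.
rates = titanic_data.groupby("p_class_sex")["survived"].mean()
print(rates)
```

A table like this makes it easy to confirm exact figures, such as the even split among third-class women, without estimating bar heights by eye.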
This intersectional analysis shows how a composite feature can reveal relationships in historical data that the individual variables hide. As we transition to machine learning applications, this engineered feature will likely prove predictive, since it captures survival patterns that depend on class and sex together and that simpler models might miss. Our next phase involves preparing this enriched dataset for modeling, beginning with a random forest classifier.