Your first data source

Introduction

In this example, we use the Apartments dataset to create several models in R and Python in parallel. Finally, we load both data sources into the Arena, but feel free to follow the snippets in only one language.

The dataset contains the following columns representing apartments in Warsaw:
  • m2.price - price per square meter (the target variable)
  • surface - apartment area in square meters
  • no.rooms - number of rooms
  • district - district in which the apartment is located
  • floor - floor number
  • construction.year - construction year
In the Python version of the dataset, dots in column names are replaced by underscores (e.g. m2_price).

Load libraries and data

We load the apartments and apartmentsTest datasets from the dalex package (load_apartments and load_apartments_test in Python) and split each into predictors and the target variable.
import dalex as dx

train = dx.datasets.load_apartments()
test = dx.datasets.load_apartments_test()

X_train = train.drop(columns='m2_price')
y_train = train["m2_price"]

X_test = test.drop(columns='m2_price')
y_test = test["m2_price"]

Modeling

In this section, we use a few popular libraries, but keep in mind that Arena supports every model supported by DALEX.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor

# Standardize numerical features
numerical_features = X_train.select_dtypes(exclude=[object]).columns
numerical_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler())
    ]
)

# One-hot encode categorical features, ignoring categories unseen in training
categorical_features = X_train.select_dtypes(include=[object]).columns
categorical_transformer = Pipeline(
    steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)

# Apply each transformer to its subset of columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

# Fit two models on the same preprocessing pipeline
model_elastic_net = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('model', ElasticNet())
    ]
)
model_elastic_net.fit(X=X_train, y=y_train)

model_decision_tree = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('model', DecisionTreeRegressor())
    ]
)
model_decision_tree.fit(X=X_train, y=y_train)

Creating Explainers

Now we pack our models into the universal structure called an Explainer, provided by the DALEX package. In most cases, you can create an Explainer with just three arguments:
  1. model
  2. data - data frame of predictors
  3. y - target vector
Sometimes you need to override the default predict_function or model_type. For popular frameworks such as mlr, mlr3, or tidymodels, there are ready-made wrappers in the DALEXtra package.
For more information about explaining models, see the DALEX documentation.
exp_elastic_net = dx.Explainer(model_elastic_net, data=X_test, y=y_test)
exp_decision_tree = dx.Explainer(model_decision_tree, data=X_test, y=y_test)
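When the default prediction interface does not fit your model, a custom predict_function can be passed to the Explainer. The sketch below is a minimal illustration, assuming dalex's convention that predict_function(model, data) returns a flat 1-D sequence of predictions; DummyModel and custom_predict are hypothetical names used only for this example.

```python
# Hypothetical model whose predict() returns a 2-D, column-shaped output
class DummyModel:
    def predict(self, X):
        return [[float(i)] for i in range(len(X))]

# A custom predict_function receives (model, data) and should return a
# flat 1-D sequence of predictions -- here we unwrap the inner lists.
def custom_predict(model, data):
    return [row[0] for row in model.predict(data)]

print(custom_predict(DummyModel(), [None, None, None]))  # [0.0, 1.0, 2.0]

# It could then be passed alongside the usual arguments, e.g.:
# exp = dx.Explainer(model, data=X_test, y=y_test,
#                    predict_function=custom_predict, model_type='regression')
```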

Create Arena

In the last part, we create an Arena object and either run a live server or export the data as JSON. This takes three steps:
  1. Initialize object
  2. Use the methods below to fill the Arena. You can call them any number of times, but labels must be unique.
    • push_model - takes an Explainer as an argument; each call adds one model.
    • push_observations - adds data frames of observations, which will be explained using observation-level XAI charts. Each call can pass multiple observations (one row = one observation, labeled by its row name).
      Datasets have to contain the variables required to make a prediction. Additional columns will be displayed in the Observation Details panel.
    • push_dataset - data frames pushed using this method are used to create EDA (exploratory data analysis) charts. These data frames have to contain the target variable.
  3. Run the server or export the data
    • run_server - starts the server in live mode
    • save_arena - generates all plots and saves them to a file
    • upload_arena - generates all plots and uploads them to a GitHub Gist
# Start creating data source
arena = dx.Arena()
# Add explainers
arena.push_model(exp_elastic_net)
arena.push_model(exp_decision_tree)
# Add the first 10 rows of the test set as observations
arena.push_observations(X_test.iloc[:10])
# Run live data source on port 9294
arena.run_server(port=9294)
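Beyond the live server, the same Arena object can be exported for static use with the batch-mode methods listed above. The sketch below is illustrative only: the file name datasource.json and the label are example values, and the dalex calls are shown commented out inside an import guard so the snippet degrades gracefully when dalex is not installed.

```python
# Batch-mode sketch: instead of run_server, precompute every chart.
try:
    import dalex as dx
    arena = dx.Arena()
    # arena.push_model(exp_elastic_net)
    # arena.push_model(exp_decision_tree)
    # push_dataset enables the EDA charts; note that the data frame
    # must include the target column:
    # arena.push_dataset(test, target='m2_price', label='test dataset')
    # save_arena generates all plots and writes them to one JSON file:
    # arena.save_arena('datasource.json')
    dalex_available = True
except ImportError:
    dalex_available = False

print('dalex available:', dalex_available)
```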
Params Sharing Session