Multinomial Logit API
ChoiceModels has built-in functionality for Multinomial Logit estimation and simulation. This can use either the PyLogit MNL estimation engine or a custom engine optimized for fast performance with large numbers of alternatives. The custom engine is originally from
Fitting a model yields a results object that can generate choice probabilities for out-of-sample scenarios.
MultinomialLogit(data, model_expression, observation_id_col=None, choice_col=None, model_labels=None, alternative_id_col=None, initial_coefs=None, weights=None)
A class with methods for estimating multinomial logit discrete choice models. Each observation is a choice scenario in which a chooser selects one alternative from a choice set of two or more. The fitted parameters represent a joint optimization of utility expressions that explains observed choices based on attributes of the alternatives and of the choosers.
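For reference, the textbook multinomial logit formulation is sketched below. It is the standard form rather than something taken from the ChoiceModels source, and the symbols (V for systematic utility, x for the attribute vector, beta for the coefficients, y for the choice indicator, C_n for the choice set of scenario n) are notation introduced here.

```latex
% Probability that the chooser in scenario n selects alternative i,
% where V_{nj} = x_{nj}^\top \beta is the systematic utility of alternative j
P_n(i) = \frac{\exp(V_{ni})}{\sum_{j \in C_n} \exp(V_{nj})}

% Log-likelihood maximized during estimation;
% y_{nj} = 1 if alternative j was chosen in scenario n, and 0 otherwise
\mathcal{L}(\beta) = \sum_{n} \sum_{j \in C_n} y_{nj} \, \ln P_n(j)
```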
The input data needs to be in “long” format, with one row for each combination of chooser and alternative. Columns contain relevant attributes and identifiers. (If the choice sets are large, sampling of alternatives should be carried out before data is passed to this class.)
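The "long" layout described above can be sketched with plain Python. All values and the column names (obs_id, alt_id, income, price, chosen) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical inputs: three choosers, each facing the same four alternatives.
choosers = {1: {"income": 40}, 2: {"income": 65}, 3: {"income": 90}}
alternatives = {
    "A": {"price": 10},
    "B": {"price": 14},
    "C": {"price": 9},
    "D": {"price": 20},
}
observed_choice = {1: "C", 2: "A", 3: "D"}

# One row per (chooser, alternative) pair, with a binary "chosen" flag.
long_rows = [
    {
        "obs_id": obs_id,
        "alt_id": alt_id,
        "income": choosers[obs_id]["income"],
        "price": alternatives[alt_id]["price"],
        "chosen": int(observed_choice[obs_id] == alt_id),
    }
    for obs_id, alt_id in product(choosers, alternatives)
]

# Sanity check: count the chosen alternatives in each observation.
chosen_per_obs = {}
for row in long_rows:
    chosen_per_obs[row["obs_id"]] = (
        chosen_per_obs.get(row["obs_id"], 0) + row["chosen"]
    )
```

A list of dicts like this converts directly with pandas.DataFrame(long_rows), which gives the DataFrame form the class accepts.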
The class constructor supports two use cases:
The first use case is simpler and requires fewer inputs. Each choice scenario must have the same number of alternatives, and each alternative must have the same model expression (utility equation). This is typical when the alternatives are relatively numerous and homogenous, for example with travel destination choice or household location choice.
The following parameters are required: ‘data’, ‘observation_id_col’, ‘choice_col’, ‘model_expression’ in Patsy format. If data is provided as a MergedChoiceTable, the observation id and choice column names can be read directly from its metadata.
To fit this type of model, ChoiceModels will use its own estimation engine adapted from the UrbanSim MNL codebase.
Migration from ‘urbansim.urbanchoice’: Note that these requirements differ from the old UrbanSim codebase in a couple of ways. (1) The chosen alternatives need to be indicated in a column of the estimation data table instead of in a separate matrix, and (2) in lieu of indicating the number of alternatives in each choice set, the estimation data table should include an observation id column. These changes make the API more consistent with other use cases. See the MergedChoiceTable() class for tools and code examples to help with migration.
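The two migration changes can be sketched in plain Python. This assumes that the old-style table is grouped by observation and that alternatives follow the column order of the chosen matrix; the variable names are hypothetical, and the MergedChoiceTable() tools remain the supported route.

```python
# Old-style inputs (hypothetical values): a chosen matrix with one row per
# observation and one column per alternative, plus the alternatives count.
chosen_matrix = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
numalts = 3

# Assumption: the estimation table has len(chosen_matrix) * numalts rows,
# grouped by observation, with alternatives in the matrix's column order.
n_rows = len(chosen_matrix) * numalts

# New-style columns: an observation id and a binary choice flag for each row.
obs_id_col = [row_idx // numalts for row_idx in range(n_rows)]
chosen_col = [flag for obs_flags in chosen_matrix for flag in obs_flags]
```

The two lists can then be attached to the estimation table as the columns named by 'observation_id_col' and 'choice_col'.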
The second use case is more flexible. Choice scenarios can have varying numbers of alternatives, and the model expression (utility equation) can be different for distinct alternatives. This is typical when there is a small number of alternatives whose salient characteristics vary, for example with travel mode choice.
The following parameters are required: ‘data’, ‘observation_id_col’, ‘alternative_id_col’, ‘choice_col’, and ‘model_expression’ in PyLogit format. ‘model_labels’ in PyLogit format is optional.
To fit this type of model, ChoiceModels will use the PyLogit estimation engine.
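A PyLogit-style specification and its matching labels might look like the sketch below. The alternative ids (1 = drive, 2 = transit, 3 = walk), variable names, and label strings are hypothetical; the PyLogit documentation is the authority on the format.

```python
from collections import OrderedDict

# Each key names a variable, normally a column of the long-format data.
# Each value lists the alternatives whose utility includes that variable;
# a nested list means the grouped alternatives share one coefficient.
specification = OrderedDict()
specification["intercept"] = [2, 3]         # separate constants; drive is the base
specification["travel_time"] = [[1, 2], 3]  # one coefficient for drive and transit, one for walk
specification["cost"] = [[1, 2]]            # shared coefficient, absent from walk

# One label per estimated coefficient, in the same order as the specification.
labels = OrderedDict()
labels["intercept"] = ["ASC transit", "ASC walk"]
labels["travel_time"] = ["Time (motorized)", "Time (walk)"]
labels["cost"] = ["Cost (motorized)"]
```

These two objects correspond to the 'model_expression' and 'model_labels' arguments in this use case.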
With either use case, the model expression can include attributes of both the choosers and the alternatives. Attributes of a particular alternative may vary for different choosers (distance, for example), but this must be set up manually in the input data.
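Setting up a chooser-specific attribute such as distance by hand could look like this sketch. The one-dimensional coordinates and the column names are hypothetical and keep the arithmetic simple.

```python
# Hypothetical locations on a line.
chooser_location = {1: 0.0, 2: 5.0}
alternative_location = {"A": 1.0, "B": 4.0, "C": 9.0}

# Long-format rows: one per (chooser, alternative) pair.
rows = [
    {"obs_id": obs_id, "alt_id": alt_id}
    for obs_id in chooser_location
    for alt_id in alternative_location
]

# The interaction term depends on both sides of the pair, so it is computed
# row by row and stored as an ordinary column before estimation.
for row in rows:
    row["distance"] = abs(
        chooser_location[row["obs_id"]] - alternative_location[row["alt_id"]]
    )
```

Once stored this way, 'distance' can be referenced in the model expression like any other column.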
Note that prediction methods are in a separate class: see MultinomialLogitResults().
- data (pd.DataFrame or choicemodels.tools.MergedChoiceTable) – A table of estimation data in “long” format, with one row for each combination of chooser and alternative. Column labeling must be consistent with the ‘model_expression’. May include extra columns.
- model_expression (Patsy 'formula-like' or PyLogit 'specification') –
For the simpler use case where each choice scenario has the same number of alternatives and each alternative has the same model expression, this should be a Patsy formula representing the right-hand side of the single model expression. This can be a string or a number of other data types. See here: https://patsy.readthedocs.io/en/v0.1.0/API-reference.html#patsy.dmatrix
For the more flexible use case where choice scenarios have varying numbers of alternatives or the model expressions vary, this should be a PyLogit OrderedDict model specification. See here: https://github.com/timothyb0912/pylogit/blob/master/pylogit/pylogit.py#L116-L130
- observation_id_col (str, optional) – Name of column or index containing the observation id. This should uniquely identify each distinct choice scenario. Not required if data is passed as a MergedChoiceTable.
- choice_col (str, optional) – Name of column containing an indication of which alternative has been chosen in each scenario. Values should evaluate as binary: 1/0, True/False, etc. Not required if data is passed as a MergedChoiceTable.
- model_labels (PyLogit 'names', optional) – If the model expression is a PyLogit OrderedDict, you can provide a corresponding OrderedDict of labels. See here: https://github.com/timothyb0912/pylogit/blob/master/pylogit/pylogit.py#L151-L165
- alternative_id_col (str, optional) – Name of column or index containing the alternative id. This is only required if the model expression varies for different alternatives. Not required if data is passed as a MergedChoiceTable.
- initial_coefs (numeric or list-like of numerics, optional) – Initial coefficients (beta values) to begin the optimization process with. Provide a single value for all coefficients, or an array containing a value for each one being estimated. If None, initial coefficients will be 0.
- weights (1D array, optional) – NOT YET IMPLEMENTED - Estimation weights.
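The three accepted forms of 'initial_coefs' (None, a single value, or one value per coefficient) can be illustrated with a small helper. expand_initial_coefs is a hypothetical function written for this sketch, not part of ChoiceModels, and raising an error on a length mismatch is an assumption about sensible behavior.

```python
def expand_initial_coefs(initial_coefs, n_coefs):
    """Return a list of n_coefs starting values (illustrative helper only)."""
    if initial_coefs is None:
        # Documented default: all coefficients start at zero.
        return [0.0] * n_coefs
    if isinstance(initial_coefs, (int, float)):
        # A single value is repeated for every coefficient.
        return [float(initial_coefs)] * n_coefs
    values = [float(v) for v in initial_coefs]
    if len(values) != n_coefs:
        # Assumption: a mismatched length is treated as an error.
        raise ValueError(
            f"expected {n_coefs} initial coefficients, got {len(values)}"
        )
    return values
```

For example, expand_initial_coefs(0.5, 2) repeats the single value for both coefficients.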
‘ChoiceModels’ or ‘PyLogit’.
MultinomialLogitResults(model_expression, results=None, fitted_parameters=None, estimation_engine='ChoiceModels')
The results class represents a fitted model. It can report the model fit, generate choice probabilities, etc.
A full-featured results object is returned by MultinomialLogit.fit(). A results object with more limited functionality can also be built directly from fitted parameters and a model expression.
- model_expression (str or OrderedDict) – Patsy ‘formula-like’ (str) or PyLogit ‘specification’ (OrderedDict).
- results (dict or object, optional) – Raw results as currently provided by the estimation engine. This should be replaced with a more consistent and comprehensive set of inputs.
- fitted_parameters (list of floats, optional) – If not provided, these will be extracted from the raw results.
- estimation_engine (str, optional) – ‘ChoiceModels’ (default) or ‘PyLogit’.
Return the raw results as provided by the estimation engine. Dict or object.
Generate predicted probabilities for a table of choice scenarios, using the fitted parameters stored in the results object.
- data (choicemodels.tools.MergedChoiceTable) – Long-format table of choice scenarios. TO DO - accept other data formats.
Expected class parameters:
- self.model_expression (patsy string)
- self.fitted_parameters (list of floats)
Returns: pandas.Series with indexes matching the input
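The calculation behind predicted probabilities can be sketched in plain Python: compute a utility for each row from the fitted parameters, then normalize within each observation. mnl_probabilities and its arguments are hypothetical names for this sketch, which assumes a purely linear utility and does not reproduce the library's implementation.

```python
import math


def mnl_probabilities(rows, fitted_parameters, columns, obs_col="obs_id"):
    """Return one probability per row; rows are dicts in long format."""
    # Systematic utility of each row: dot product of attributes and coefficients.
    utilities = [
        sum(beta * row[col] for beta, col in zip(fitted_parameters, columns))
        for row in rows
    ]
    # Largest utility per observation, subtracted for numerical stability.
    max_by_obs = {}
    for row, u in zip(rows, utilities):
        obs = row[obs_col]
        max_by_obs[obs] = max(u, max_by_obs.get(obs, u))
    exp_u = [
        math.exp(u - max_by_obs[row[obs_col]]) for row, u in zip(rows, utilities)
    ]
    # Normalize within each observation so the probabilities sum to one.
    denom = {}
    for row, e in zip(rows, exp_u):
        denom[row[obs_col]] = denom.get(row[obs_col], 0.0) + e
    return [e / denom[row[obs_col]] for row, e in zip(rows, exp_u)]


# Hypothetical scenarios: two observations with two alternatives each.
rows = [
    {"obs_id": 1, "price": 1.0},
    {"obs_id": 1, "price": 2.0},
    {"obs_id": 2, "price": 3.0},
    {"obs_id": 2, "price": 3.0},
]
probs = mnl_probabilities(rows, fitted_parameters=[-1.0], columns=["price"])
```

The output has one value per input row, in the same order, which mirrors the documented return value whose index matches the input.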
Print a report of the model estimation results.