
Steps 5 and 6 are related to the ML algorithms for the decision trees specifically. Likewise, splitting data is a mandatory task in any backtesting process (ML or not), the idea is to have one set of data to train the model and another set of data, which have not been used in training, to test the model. These indicators or predictors are used to predict the target variable that is the financial instrument will go up or down for the classification model, or the future price level for the regression model. However, they are nothing more than additional columns in the data frame that contain some type of indicator. If you are a newcomer to decision trees the predictor and target variables may sound exotic to you. If we look at the first four steps, they are common operations for data processing.


The main steps to build a decision tree are: (among many others things)īuilding a classification decision tree or a regression decision tree is very similar in the way we organize the input data and predictor variables, then, by calling the corresponding functions, the classification decision tree or regression decision tree will be automatically created for us according to some criteria we must specify.
USE A DECISION TREE FOR MAC SOFTWARE
Preparing The Environmentīe sure you have available the following software pieces in order to follow the examples: We will also make a decision tree to forecasts about the concrete return of the index the next day.
In this introduction post to decision trees, we will create a classification decision tree in Python to make forecasts about whether the financial instrument we are going to analyze will go up or down the next day. Thanks to Python’s Sklearn library, the tree is automatically created for us taking as a starting point the predictor variables that we hypothetically think are responsible for the output we are looking for.

Although the classification and regression problems have different objectives, the trees have the same structure: It must not be confused with linear regression which is used to study the relationship between variables. yes/no, up/down, red/blue/yellow, etc.)Ī Regression problem tries to forecast a number such as the return for the next day. Remember that a Classification problem tries to classify unknown elements into a class or category the output always are categorical variables (i.e. Decision Trees, are a Machine Supervised Learning method used in Classification and Regression problems, also known as CART.
