If this is the first time you have heard about R squared, you probably have no idea what exactly I mean or where this is going. That is normal: a lot is written about supports, resistances, chart patterns… but not so much about more objective indicators.
The subject is a bit technical, based on mathematics and statistics, but I am going to (try to) explain it to you in a practical way and to the point. In the end, you will see how everything is easier than it seems.
- What is R Square?
- Characteristics of an evaluation criterion of trading systems
- Linear regression applied to our trading system
- 4. Pearson’s correlation coefficient.
- 5. Calculation of the coefficient of determination R squared.
- Limitations on use:
- Application in trading systems
What is R Square?
First, let’s start by defining and understanding the concept of R squared. R squared is the statistical coefficient of determination, also written R², which is used with models built to predict an outcome or test a hypothesis. In other words, when we analyze a statistical model, the R squared coefficient measures the quality of the model (how well it fits the data) and expresses the proportion of the variation in the results that the model can explain.
With this definition clear, in order to use this R-squared coefficient in practice, it is necessary to understand two important concepts:
Linear regression:
In statistics, a linear regression, also known as linear dependence, is a mathematical model used to approximate the dependence relationship between a dependent variable (for example Y), the independent variables (X1, X2, X3, …, Xn) and a random term ɛ (associated with any process whose result cannot be foreseen because chance intervenes).
Pearson’s correlation coefficient:
In statistics, the Pearson correlation coefficient is a measure of the degree of linear relationship between two quantitative random variables, that is, two variables that can be measured or observed and represented by numerical quantities.
Now, having defined these concepts, you may be wondering: how do I use this to evaluate my trading system? Let's go step by step.
Every strategy or trading system needs an objective evaluation of its effectiveness. To achieve this, we can use a wide series of ratios, some more complex than others, both in their calculation process and in their interpretation. Despite all this variety, there are very few quality metrics to assess something very important: the regularity of the balance line of the system or trading strategy.
To do this, we will use the coefficient of determination, R squared, to obtain a quantitative estimate of that ascending straight line that all traders want to see in their results.
Characteristics of an evaluation criterion of trading systems
Each criterion or ratio used to assess the effectiveness or robustness of a trading system has its application limitations. There are no ideal or pre-established criteria that allow us to determine with absolute certainty the robustness of a trading system. However, some properties or characteristics that they must have can be formulated:
Independence in relation to the duration of the trial period
Many parameters of the strategy or trading system depend on the duration of the trial period, for example: the longer the trial period for a profitable strategy, the higher its net profit. Independence with respect to the time period is necessary and essential to compare the effectiveness of different strategies in different test periods.
Test Endpoint Independence
For example, if the strategy “survives” simply by sitting out its losses, the end point of the test can change the final result considerably. The criterion or indicator must be immune to this kind of manipulation and offer a clear picture of how the trading system performs.
Simplicity in interpretation
All the indicators of a trading system must be quantitative, that is, they must be represented by a numerical quantity. It is important that this numerical quantity is intuitively understandable. The simpler the interpretation of the obtained value, the easier the parameter will be to understand. It is also desirable that the value of the indicator falls within set limits or a defined interval, since it is more difficult to understand the meaning of extremely large numbers.
Representative results with few transactions
This is probably the hardest requirement on the list for a good metric to meet, because all statistical methods depend on the number of measurements. The more measurements there are, the more stable the resulting statistics. It is practically impossible to solve this problem completely in a small sample, but the effects that arise from the lack of data can be smoothed out.
Linear regression applied to our trading system
To calculate the coefficient of determination R squared, we must first determine the linear regression. As we explained before, there can be several independent variables. However, for a better understanding we will use the simplest case: a single independent variable.
In the case of an independent variable, the regression or linear dependence of a dependent variable (Y) with respect to an independent variable (X) can be expressed using the formula Y=aX+b. This formula graphs a line in the XY plane, hence the name linear regression.
Now we are going to choose in our trading platform a chart of a currency pair, of our preference, with a clear upward trend in a given period of time.
We download and save this data, then we build a chart in Excel with the closing prices. On the Y axis we will have the closing prices and on the X axis the dates that we will replace with order numbers (for convenience: 1, 2, 3, …..). By doing this, we are going to get a chart with a clearly bullish trend, but we are interested in a quantitative interpretation of that trend.
The easiest way to achieve this is to draw the straight line that fits the trend in the graph as precisely as possible. This straight line is the linear regression line. Even if the graph is quite uniform, several lines can be drawn that appear to fit or describe our bullish graph. Then a question arises: which of all these lines is correct?
The correct line is the one for which the sum of the squared vertical distances from the data points to the line is the smallest possible (the least squares criterion).
It is also important to note that the regression line must always pass through the center of gravity of all the data that make up the point cloud. The coordinates of this center of gravity are the mean of the X variable on the X axis and the mean of the Y variable on the Y axis. Knowing a point on the line, we can use the point-slope equation to determine the equation of the line. Once we have the correct line, we can calculate the coefficients of the linear regression.
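As a rough sketch of the calculation described above, the following Python snippet fits the line Y = aX + b by ordinary least squares, using the order numbers 1, 2, 3, … as X. The function name `fit_line` and the sample closing prices are made up for illustration and do not come from any trading platform.

```python
def fit_line(ys):
    """Fit y = a*x + b by ordinary least squares, with x = 1, 2, 3, ..."""
    n = len(ys)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: joint variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept chosen so that the line passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


# Hypothetical closing prices of a pair in an uptrend (illustrative only).
closes = [1.10, 1.12, 1.11, 1.15, 1.16, 1.18]
a, b = fit_line(closes)
print(f"slope a = {a:.5f}, intercept b = {b:.5f}")
```

Because the intercept is derived from the two means, the fitted line passes through the center of gravity of the point cloud by construction.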
4. Pearson’s correlation coefficient.
Once the linear regression has been calculated, we have to calculate the correlation between the line obtained above and the data from which that line was calculated. Remember that correlation is the statistical relationship between two random variables. The correlation can take values ranging from -1 to +1. A value close to zero means that there is no linear relationship between the measured values, a value of +1 (or very close to it) means a direct relationship between the variables, and a value of -1 (or very close to it) means an inverse relationship between the variables.
Pearson’s correlation coefficient can be calculated using the following formula:
$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where:
cov(X, Y): is the covariance of (X, Y)
σX: is the standard deviation of the variable X
σY: is the standard deviation of the variable Y
Covariance is a value that indicates the degree of joint variation of two random variables with respect to their means. In other words, it is the variance that the two variables have in common. The standard deviation, in turn, is the square root of the variance.
Pearson’s correlation coefficient shows how well the line describes the data. If the data points are far from the line, the dispersion is high and the correlation is low. Conversely, if the data points are close to the line, the dispersion is low and the correlation is high. A value of zero says that there is no relationship between the linear regression and the data.
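The relationship between covariance, standard deviation and correlation described in this section can be sketched in a few lines of Python. This is only an illustration: the function name `pearson` and the sample data are assumptions, and the population form (dividing by n) is used for both the covariance and the standard deviations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average joint deviation of the two variables from their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard deviation: square root of the variance.
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical closing prices against their order numbers (illustrative only).
closes = [1.10, 1.12, 1.11, 1.15, 1.16, 1.18]
order = list(range(1, len(closes) + 1))
r = pearson(order, closes)
print(f"r = {r:.4f}")
```

Since X and the fitted line aX + b differ only by a linear transformation, the correlation of the data with the order numbers has the same magnitude as its correlation with the regression line.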
Important: in MetaTrader there is a metric called LR Correlation, which shows the correlation between the balance line and the linear regression found for that line. In statistics, however, it is not usual to directly compare the data with the regression that describes it.
5. Calculation of the coefficient of determination R squared.
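In the case of linear regression, to calculate the coefficient of determination R squared it is enough to square the Pearson correlation coefficient that we calculated in the previous step:
$$R^2 = \left(\frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}\right)^2 = \frac{\operatorname{cov}(X, Y)^2}{\sigma_X^2 \, \sigma_Y^2}$$
where:
cov(X, Y): is the covariance of (X, Y)
σX²: is the variance of the variable X
σY²: is the variance of the variable Y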
This coefficient can take values ranging from 0 to +1. A result equal or very close to zero means pure, unpredictable chance, and a result equal or very close to one means a market in which all prices lie on the line.
R squared tells us what percentage of the price movement follows a defined trend, while the remaining percentage is due to random movements.
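To make the "percentage explained" reading concrete, here is a minimal Python sketch that computes R squared in two ways: as the squared correlation, and as 1 minus the share of variation left unexplained by the fitted line. For a straight-line fit with an intercept the two results should agree. The function names and the sample data are assumptions chosen for illustration.

```python
def r_squared(xs, ys):
    """R squared as the squared Pearson correlation of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def r_squared_from_residuals(xs, ys):
    """R squared as 1 - (unexplained variation / total variation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                # least-squares slope
    b = mean_y - a * mean_x      # least-squares intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical closing prices against their order numbers (illustrative only).
closes = [1.10, 1.12, 1.11, 1.15, 1.16, 1.18]
order = list(range(1, len(closes) + 1))
r2_a = r_squared(order, closes)
r2_b = r_squared_from_residuals(order, closes)
print(f"{r2_a:.4f} {r2_b:.4f}")
```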
Limitations on use:
Every statistical metric has its advantages and disadvantages, and the coefficient of determination is no exception. Some disadvantages are:
- It depends on the number of trades and gives exaggerated values when there are few of them.
- Its calculation requires some fairly complex mathematical computations.
- It is applicable exclusively to the estimation of linear processes, or systems that trade with a fixed lot size.
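The first limitation can be illustrated with an extreme case. In the sketch below, which uses a made-up balance series and a hypothetical helper `r_squared`, the same erratic results are scored once with only the first two points and once with all of them.

```python
def r_squared(balance):
    """R squared of a balance curve against trade numbers 1, 2, 3, ..."""
    n = len(balance)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(balance) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, balance))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in balance)
    return sxy ** 2 / (sxx * syy)


# Made-up, erratic balance values (illustrative only).
erratic = [100, 90, 120, 95, 130, 85, 125, 100]
r2_two_points = r_squared(erratic[:2])
r2_all_points = r_squared(erratic)
print(r2_two_points, r2_all_points)
```

With only two points the line passes through both of them, so the value is 1 even though the second point is a loss. Over all eight points the same series scores close to zero.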
Application in trading systems
In trading systems you can see this ratio represented as a percentage: the closer it is to 100%, the better (in theory) the quality of our system. For example, in my experience, a system with a score above 65% usually has a fairly stable performance over time. It is one of my favorite filters.
Having analyzed and studied the calculation process of the coefficient of determination R squared, I can tell you that this coefficient is one of the few measures that evaluate the regularity of the curve of both the balance line and the floating (unrealized) profit of the strategy, among others.
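R² is easy to use because its range of values is fixed. The plain coefficient lies between 0 and +1, as described above. To capture the direction of the trend as well, it can be given the sign of the correlation coefficient, so that it lies within the limits of -1 to +1. Values close to -1 alert us to a negative trend in the balance of the strategy. A value close to zero warns us of the lack of a trend in the balance of the strategy. Values close to +1 indicate a positive trend.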
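One possible way to obtain a value in the -1 to +1 range is to give R squared the sign of the slope of the balance line. The sketch below assumes that convention. The function name `signed_r_squared`, the handling of a flat balance and the sample balances are illustrative choices, not the method of any particular platform.

```python
def signed_r_squared(balance):
    """R squared of the balance curve, negative when the trend is downward."""
    n = len(balance)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(balance) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, balance))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in balance)
    if syy == 0:
        # Assumption: a perfectly flat balance is treated as "no trend".
        return 0.0
    r2 = sxy ** 2 / (sxx * syy)
    # sxy has the same sign as the slope of the regression line.
    return r2 if sxy >= 0 else -r2


# Made-up balance curves (illustrative only).
rising = signed_r_squared([100, 110, 120, 130])
falling = signed_r_squared([130, 120, 110, 100])
flat = signed_r_squared([100, 100, 100, 100])
print(rising, falling, flat)
```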
As I have told you, R squared, just like any other ratio, has limitations that you must take into account. In my case, I use it as one of my top 3 ratios to decide whether I have a valid trading strategy or whether, on the contrary, it goes in the trash.
If you have any questions or want to share something that complements all this, write to me in the comments.
Thank you so much for reading!