# 4. Statistical Learning Theory – More on the Law of Large Numbers

So let me get back to the previous formulation and resume at the step where we stopped: the probability that, for some i from 1 to M, the distance between the empirical risk of classifier f_i and its expected risk, |R_emp(f_i) − R(f_i)|, is greater than or equal to some epsilon. There are some different formulations of this in the literature.

Some books do not use the equal sign, writing a strict inequality instead of greater than or equal. In topological terms it is like a ball: you can take the open ball without its boundary, or the ball with the boundary, and for our purposes that makes no real difference.

On the other side, the union bound turns this into a sum of M terms, each of which is bounded using the law of large numbers result from before; for a bounded loss each term is at most 2 exp(−2nε²), so the whole probability is bounded by 2M exp(−2nε²).
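This combined bound can be sketched numerically. The snippet below is an illustration, not from the lecture; it assumes the standard two-sided Hoeffding form for a loss bounded in [0, 1]:

```python
import math

def uniform_bound(M, n, eps):
    """Union bound over M classifiers, each term bounded by the
    two-sided Hoeffding inequality for a [0, 1] loss: 2*exp(-2*n*eps**2)."""
    return 2 * M * math.exp(-2 * n * eps ** 2)

# The bound shrinks exponentially in the sample size n:
for n in (100, 1000, 10000):
    print(n, uniform_bound(M=10, n=n, eps=0.05))
```

Note that the bound is linear in M but exponential in n, which is why a larger sample compensates for a richer (finite) set of classifiers.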

What I did not provide yet is a very useful and very common short notation for all of this, and I believe it is interesting to introduce it now. With that notation we can say the following.

Vapnik was interested in formulating statistical learning theory like this: the probability that the supremum, over every function f inside some bias (the hypothesis space F), of the distance between the empirical risk of that function and its expected risk is greater than or equal to epsilon,

P( sup_{f ∈ F} |R_emp(f) − R(f)| ≥ ε ).

We can actually compute this quantity exactly if we have a finite set of functions; I am going to say something about that afterwards. But before that, maybe we should recall those terms, just to bring back the notion of the risks. The empirical risk R_emp(f) is the risk, or error, measured in a given sample; it is always in a sample. The expected risk R(f) is the error over the whole population, which means I would need access to the joint probability distribution to compute it. So it does not matter if you have lots of examples, say 1 million or 1 billion of them: as long as I do not have access to the joint probability distribution, I simply cannot compute this quantity. It is uncomputable, in fact.

There is a way of dealing with that quantity, and we are going to talk about it later. But first, I think you remember from last time that this could be computed, somehow, using this probability: it is actually the same as the probability that the empirical risk of the first classifier inside the bias minus its expected risk is greater than or equal to epsilon, or the same event for the second, the third, and so on until the last classifier we have inside the bias. So somehow we are supposing there is a countable set of classifiers inside the space, inside the bias, of some learning algorithm. This is actually the same thing: instead of using just one of those functions, I am taking the worst-case scenario.
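For a finite set of classifiers the supremum is just a maximum, so the worst-case event and the union of per-classifier events are literally the same event. A toy check, with made-up deviation values standing in for |R_emp(f_i) − R(f_i)|:

```python
# Hypothetical deviations |R_emp(f_i) - R(f_i)| for M = 4 classifiers.
devs = [0.02, 0.07, 0.01, 0.04]
eps = 0.05

# "The worst classifier deviates by at least eps" is the same event as
# "at least one classifier deviates by at least eps".
worst_case = max(devs) >= eps
union_of_events = any(d >= eps for d in devs)
assert worst_case == union_of_events
print(worst_case)  # True: the second classifier deviates by 0.07
```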

I am just deciding to select the worst possible function inside the bias. What does this mean? It is better to draw it, because that is going to be easier to understand, I suppose. So if I have a space, say a square like this, representing all possible functions in the universe, every function, then it is not difficult to imagine that