### Intuition for higher moments in circular statistics

In circular statistics, the expectation value of a random variable $Z$ with values on the circle $S$ is defined as $$m_1(Z)=\int_S z P^Z(\theta)\textrm{d}\theta$$ (see Wikipedia). This is a very natural definition, as is the definition of the variance $$\mathrm{Var}(Z)=1-|m_1(Z)|.$$ So we didn't need a second moment in order to define the variance! Nonetheless, we define the higher moments $$m_n(Z)=\int_S z^n P^Z(\theta)\textrm{d}\theta.$$ I admit that this looks rather natural as well at first sight, and very similar to the definition in linear statis...
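The definitions quoted above can be tried out numerically. The following is a minimal sketch, assuming $z = e^{i\theta}$ and replacing the integral against $P^Z$ by an average over a finite sample of angles; the function names `circular_moment` and `circular_variance` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def circular_moment(angles, n=1):
    """Sample estimate of m_n: the average of exp(i*n*theta) over angles in radians."""
    return sum(cmath.exp(1j * n * theta) for theta in angles) / len(angles)


def circular_variance(angles):
    """Circular variance 1 - |m_1|, following the definition quoted above."""
    return 1.0 - abs(circular_moment(angles, 1))


# All mass at a single angle: the first moment has modulus 1, so the variance is 0.
concentrated = [0.5] * 8

# Eight equally spaced angles: the unit vectors cancel, so the variance is 1.
spread = [2 * math.pi * k / 8 for k in range(8)]

print(circular_variance(concentrated))
print(circular_variance(spread))
print(abs(circular_moment(spread, 8)))
```

For the equally spaced sample, $m_1$ vanishes while $|m_8| = 1$, which hints at one possible reading of the higher moments: $m_n$ responds to mass that repeats with period $2\pi/n$ around the circle.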

### mathematical statistics - intuition for moments about the mean of a distribution?

Can someone provide an intuition on why the higher moments of a probability distribution $p(x)$, like the third and fourth moments, correspond to skewness and kurtosis, respectively? Specifically, why does the deviation about the mean raised to the 3rd or 4th power end up translating into a measure of skewness and kurtosis? Is there a way to relate this to the third or fourth derivatives of the function? Consider this definition of kurtosis: $\mathrm{Kurtosis}(X) = E[(X - \mu_{X})^4] / \sigma^4$. Again, it is not clear why raising $(x-\mu)^4$ gives "peakedness" or wh...
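One way to get a feel for the question above is to compute the standardized moments on small data sets. This is a rough sketch, assuming the population convention (dividing by $n$, not $n-1$); `standardized_moment` is a hypothetical helper, not a library function.

```python
import math


def standardized_moment(data, order):
    """Return E[(X - mu)^order] / sigma^order using population (1/n) averages."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    sigma = math.sqrt(variance)
    central = sum((x - mean) ** order for x in data) / n
    return central / sigma ** order


symmetric = [-1.0, 1.0]                   # two equally likely points
skewed = [-1.0, -1.0, -1.0, -1.0, 4.0]    # one far observation on the right

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    skew = standardized_moment(data, 3)
    kurt = standardized_moment(data, 4)
    print(f"{name}: skewness={skew:.3f}, kurtosis={kurt:.3f}")
```

An odd power keeps the sign of each deviation, so the single point at 4 pulls the third moment positive (skewness 1.5), while the symmetric sample cancels to 0. An even power discards the sign but weights far observations heavily, which is why the same point raises the fourth moment from 1 to 3.25.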

### mathematical statistics - How to rigorously define the likelihood?

The likelihood could be defined in several ways, for instance: the function $L$ from $\Theta\times{\cal X}$ which maps $(\theta,x)$ to $L(\theta \mid x)$, i.e. $L:\Theta\times{\cal X} \rightarrow \mathbb{R}$; the random function $L(\cdot \mid X)$; we could also consider that the likelihood is only the "observed" likelihood $L(\cdot \mid x^{\text{obs}})$; in practice the likelihood brings information on $\theta$ only up to a multiplicative constant, hence we could consider the likelihood as an equivalence class of functions rather than a function. Anot...

### mathematical statistics - Expected value of a natural logarithm

I know $E(aX+b) = aE(X)+b$ with $a,b$ constants, so given $E(X)$, it's easy to solve. I also know that you can't apply that when it's a nonlinear function, like in this case $E(1/X) \neq 1/E(X)$, and in order to solve that, I've got to do an approximation with Taylor's. So my question is: how do I solve $E(\ln(1+X))$? Do I also approximate with Taylor?...
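The Taylor approach mentioned in the question can be sketched as follows. Expanding $\ln(1+x)$ to second order around $\mu = E(X)$ gives $E(\ln(1+X)) \approx \ln(1+\mu) - \mathrm{Var}(X) / (2(1+\mu)^2)$. The code below compares this with the exact value for a small discrete distribution; the function names and the example distribution are assumptions made for illustration, and the approximation is only reasonable when $X > -1$ and the spread of $X$ is small relative to $1+\mu$.

```python
import math


def expected_log1p_exact(values, probs):
    """Exact E[ln(1+X)] for a discrete X taking `values` with probabilities `probs`."""
    return sum(p * math.log1p(x) for x, p in zip(values, probs))


def expected_log1p_taylor(values, probs):
    """Second-order Taylor approximation of E[ln(1+X)] around the mean of X."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return math.log1p(mu) - var / (2.0 * (1.0 + mu) ** 2)


# Hypothetical example: X is uniform on three values close to 1.
values = [0.9, 1.0, 1.1]
probs = [1 / 3, 1 / 3, 1 / 3]

exact = expected_log1p_exact(values, probs)
approx = expected_log1p_taylor(values, probs)
naive = math.log1p(sum(p * x for x, p in zip(values, probs)))  # ln(1 + E[X])

print(f"exact={exact:.6f} taylor={approx:.6f} naive={naive:.6f}")
```

In this example the second-order term accounts for almost all of the gap between $\ln(1+E(X))$ and the exact expectation.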

### Does differential geometry have anything to do with statistics?

I am doing a master's in statistics and I have been advised to learn differential geometry. I would be happy to hear about statistical applications of differential geometry, since this would motivate me. Does anyone happen to know of applications of differential geometry in statistics?...

### mathematical statistics - Null distribution of subspaces similarity, or what is the distribution of $\mathrm{tr}(AA'BB')$?

What is the distribution of $\mathrm{tr}(AA'BB')$ where $A$ and $B$ are two random matrices of $d \times k$ size with orthonormal columns? Maybe the expected value is easier to compute? A fallback solution would be to use a simulation. What would be the most effective scheme? Typical values for $d$ would be around 2000, while $k$ ranges from ~10 to a few hundred. Below is a more detailed account of my problem and its context, how I ended up asking this question and what I tried. Context: I want to check if the principal components computed from a s...
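The simulation fallback mentioned above might look like the sketch below. It assumes that $A$ and $B$ are independent and that their column spaces are uniformly distributed (the question excerpt does not state the distribution, so this is an assumption). Under that assumption $E[BB'] = (k/d) I$, which gives $E[\mathrm{tr}(AA'BB')] = k^2/d$. The code uses only the standard library and small dimensions to stay fast; all function names are hypothetical.

```python
import math
import random


def random_orthonormal_columns(d, k, rng):
    """Return k orthonormal vectors in R^d via Gram-Schmidt on Gaussian vectors."""
    cols = []
    while len(cols) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in cols:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]  # remove the component along u
        norm = math.sqrt(sum(x * x for x in v))
        cols.append([x / norm for x in v])
    return cols


def subspace_similarity(A, B):
    """tr(AA'BB') computed as the squared Frobenius norm of A'B."""
    return sum(sum(a * b for a, b in zip(u, w)) ** 2 for u in A for w in B)


def simulate_mean(d, k, trials, seed=0):
    """Monte Carlo estimate of E[tr(AA'BB')] for independent random subspaces."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        A = random_orthonormal_columns(d, k, rng)
        B = random_orthonormal_columns(d, k, rng)
        total += subspace_similarity(A, B)
    return total / trials


d, k = 20, 3
print(simulate_mean(d, k, trials=2000), k * k / d)
```

For the sizes in the question ($d \approx 2000$), a pure-Python Gram-Schmidt would be slow; a QR decomposition from a numerical library would be the more practical choice, and collecting the simulated values rather than only their mean would give an empirical null distribution.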

### references - Path to mathematical statistics without analysis background: ideal textbook for self study

I'm fairly mathematically inclined (I had 6 semesters of math in my undergrad), though I'm a bit out of practice and slow with, say, partial differential equations and path integrals; my concepts come back with a bit of practice. I have not had a course on mathematical proofs (mathematical thinking) or one on analysis. I also understand graduate-level probability: I have studied it formally and refreshed my knowledge lately. I have also had a couple of graduate-level courses on statistics and statistical learning. I want to, out of personal interest, s...

### mathematical statistics - Can CCA model any linear transformation?

I have recently been looking into canonical correlation analysis (CCA) as a way to map between different spaces. As I understand it, CCA maps data from both distinct spaces to a common (possibly lower dimensional) space where they can be compared. It works in a similar way to PCA, choosing the direction from each input space which maximises the correlation between datasets, subject to the chosen directions being uncorrelated. Now, the descriptions I've seen suggest that CCA can learn any linear transformation. However, I can't see how it's poss...

### mathematical statistics - Random Variable

Three components are randomly sampled, one at a time, from a large lot. As each component is selected, it is tested. If it passes the test, a success (S) occurs; if it fails the test, a failure (F) occurs. Assume that 80% of the components in the lot will succeed in passing the test. Let $X$ represent the number of successes among the three sampled components. What are the possible values for $X$, and what are their probabilities?...
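The setup above can be worked through numerically. This is a small sketch, assuming the three tests are independent with the same success probability 0.8 (reasonable for a large lot), so that $X$ follows a binomial distribution with $n = 3$ and $p = 0.8$; `binomial_pmf` is a hypothetical helper.

```python
from math import comb


def binomial_pmf(n, p):
    """Map each possible count x in 0..n to P(X = x) for X ~ Binomial(n, p)."""
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}


pmf = binomial_pmf(3, 0.8)
for x, prob in pmf.items():
    print(f"P(X = {x}) = {prob:.3f}")
```

Under these assumptions the possible values are 0, 1, 2 and 3, with probabilities 0.008, 0.096, 0.384 and 0.512, which sum to 1.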