Welcome to the mathematics reference desk.

April 15

What does it mean when we say that data is normally distributed?

Suppose we have a set $S$ of data which is possibly infinite. When we say that the data is normally distributed, what do we mean mathematically? My understanding is that we consider the probability space $(\mathbb{R},\mathcal{B},P)$, where $\mathcal{B}$ is the sigma-algebra of all real Borel sets and $P(B)=\int_{B}f(x)\,dx$, with $f(x)$ the normal PDF whose mean and variance are given by the descriptive data. On this probability space we further have the identity function as a random variable $X$. The statement that the data is normally distributed then means (according to my understanding) that the frequency polygon of the data coincides (approximately) with the graph of the PDF. I have not been able to find this written anywhere, however, so I am asking whether my understanding is correct. If it is, can it be made more rigorous? If not, what is the correct meaning of saying that the data is normally distributed? -- Abdul Muhsy talk 07:26, 15 April 2022 (UTC)
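The comparison described in the question (frequency polygon against the PDF with the data's own mean and variance) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic draws from `random.gauss`, the helper `density_histogram` and the choice of 20 bins are hypothetical, and "coincides approximately" is measured here simply as the largest vertical gap at the bin midpoints.

```python
import random
from statistics import NormalDist, fmean, stdev

def density_histogram(data, bins):
    """Return (midpoints, heights) of an equal-width histogram whose heights
    are relative frequency divided by bin width, so the total area is 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum lies on the right edge; place it in the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    mids = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (len(data) * width) for c in counts]
    return mids, heights

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(10.0, 2.0) for _ in range(20_000)]  # synthetic data (assumption)

# Normal distribution with mean and standard deviation taken from the data.
fitted = NormalDist(fmean(sample), stdev(sample))
mids, heights = density_histogram(sample, bins=20)

# Largest vertical gap between the polygon's vertices and the fitted PDF.
max_gap = max(abs(h - fitted.pdf(m)) for m, h in zip(mids, heights))
print(f"largest gap between histogram and fitted PDF: {max_gap:.4f}")
```

Note that no standardizing is done in this sketch: the PDF is fitted to the raw data, which amounts to the same comparison on a shifted and rescaled axis.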

The term is often used somewhat loosely, not to say sloppily, when the authors actually should have said that the sample distribution is not significantly different from a (best-fit) normal distribution, as might be revealed by a normality test. Probably, the test they applied was the squinting test: squinting their eyes and noticing some similarity. In such cases one should not be surprised they are not more precise. Are you aware of cases where the claim is applied to an infinite set? --Lambiam 09:37, 15 April 2022 (UTC)
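As a rough idea of what a normality test measures, here is a minimal sketch that computes the largest gap between the empirical CDF of a sample and the CDF of its best-fit normal distribution (the statistic used by Kolmogorov-Smirnov-type tests with estimated parameters). The function name `ks_distance_to_fitted_normal` and the synthetic samples are assumptions for illustration; turning the statistic into a significance statement needs critical values or simulation, which this sketch does not attempt.

```python
import random
from statistics import NormalDist, fmean, stdev

def ks_distance_to_fitted_normal(data):
    """Largest gap between the empirical CDF of `data` and the CDF of the
    normal distribution with the sample's own mean and standard deviation."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap

random.seed(1)  # fixed seed so the run is repeatable
normal_like = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic
skewed = [random.expovariate(1.0) for _ in range(2000)]      # synthetic

d_normal = ks_distance_to_fitted_normal(normal_like)
d_skewed = ks_distance_to_fitted_normal(skewed)
print(f"normal-like sample: {d_normal:.4f}")
print(f"skewed sample:      {d_skewed:.4f}")
```

A small distance only says the sample is not visibly at odds with the fitted normal curve; it does not establish that the generating process is normal.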

Thanks. The first option you give (given a set of data, apply a normality test, and if there isn't sufficient evidence to reject normality, then it is normal) is rigorous enough, but a person not well versed in statistical theory will probably not understand the justification easily. Your second criterion, squinting, is somewhat similar to my understanding and seems more satisfying visually. To clarify, are we taking the frequency polygon (after standardizing the data), squinting and then 'seeing' whether it resembles the PDF of N(0,1)? If so, is standardizing necessary? If we are just looking at the data (and not at any visual representation of it), what exactly are we looking for? Thirdly, to answer your question, I think it is possible to construct a number whose digits follow any given distribution, but I am not competent enough to fully understand the proof [1] -- Abdul Muhsy talk 11:43, 15 April 2022 (UTC)

To start with the last point, if we have an infinite sequence $(s_{1},s_{2},\ldots)$, we can take the initial segments $I_{1}=(s_{1})$, $I_{2}=(s_{1},s_{2}),\ldots,I_{n}=(s_{1},s_{2},\ldots,s_{n}),\ldots,$ and hope that the distributions of $I_{1},I_{2},\ldots$ converge to a limit distribution. This may or may not be the case, depending on the sequence. If we are only given the set $S=\bigcup_{i}\{s_{i}\}$, there is not enough information to assign a limit distribution; a new sequence $(s'_{1},s'_{2},\ldots)$ obtained by reshuffling the elements of the old sequence can have a definite but different limit distribution, while containing the same set of elements $S=\bigcup_{i}\{s'_{i}\}$.

I don't know how squinters reach their conclusions; you'd have to ask them personally. I expect that if you ask one hundred statisticians to graph a normal distribution by hand, at least 99 will be seriously off, mostly because the tails will be far too fat. If true, this does not bode well for outlier tests based on sample estimates of the dispersion applied to a population distribution assumed normal. --Lambiam 17:42, 15 April 2022 (UTC)
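The reshuffling point can be illustrated with a small sketch. As an assumed example (not taken from the discussion), the set is the positive integers and the quantity tracked is the fraction of even numbers in each initial segment: in the natural order it tends to 1/2, while an order that lists two odd numbers before each even number contains exactly the same elements but tends to 1/3. The function names are hypothetical.

```python
from itertools import count, islice

def natural_order():
    """1, 2, 3, 4, ...: every positive integer exactly once."""
    return count(1)

def reshuffled():
    """1, 3, 2, 5, 7, 4, ...: two odds, then one even.
    Every positive integer still appears exactly once."""
    odds = count(1, 2)
    evens = count(2, 2)
    while True:
        yield next(odds)
        yield next(odds)
        yield next(evens)

def even_fraction(seq, n):
    """Fraction of even numbers among the first n terms of seq."""
    first = islice(seq, n)
    return sum(1 for s in first if s % 2 == 0) / n

for n in (30, 300, 3000):
    print(n, even_fraction(natural_order(), n), even_fraction(reshuffled(), n))
```

Both generators enumerate the same set, so the set alone cannot determine which of the two limiting fractions is "the" distribution.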

It doesn't have any mathematically precise meaning to say that observed data are normally distributed. It's actually the process that generates the data that gives rise to a probability distribution. It's still useful in practice to talk about observed data being normally distributed, and I trust without having read it that your discussion with Lambiam has covered the appropriate points.

But if you're bringing up Borel sets, you're talking about a different level of the discussion. This gets into philosophy and interpretation of probability pretty fast. --Trovatore (talk) 18:21, 15 April 2022 (UTC)

April 19

Proof by induction for real numbers?

Is there a parallel for proof by induction which works with real numbers, replacing the induction step "if the statement holds for any given case n = k, then it must also hold for the next case n = k + 1" with "if the statement holds for any given case n = k, then it must also hold for the next case n = k + 1/r" and letting r become arbitrarily large? 2A01:E34:EF5E:4640:44FE:12D:9E52:1C3D (talk) 11:00, 19 April 2022 (UTC)

There is no proof by induction for real numbers, as there is no such thing as the "next case". I don't know if you'd consider it a "parallel", but there are ways to reduce proofs about all real numbers to smaller steps. For example, you can use the fact that the real numbers are connected to prove that a statement is true for all real numbers by showing that the set where it holds is nonempty, open and closed. So you can show that (i) your statement holds for 0, (ii) if it holds for a real number r, then there is s>0 such that it holds for all numbers in the interval (r-s,r+s), and (iii) if it holds for every term of a convergent sequence of real numbers, it holds for their limit. Then (i)+(ii)+(iii) imply your statement holds for all real numbers. --Kusma (talk) 11:13, 19 April 2022 (UTC)
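Written out, the connectedness argument above might look like the following sketch. The symbol $P$ for the statement and $A$ for the set where it holds are names introduced here for illustration.

```latex
Let $A = \{\, x \in \mathbb{R} : P(x) \text{ holds} \,\}$.
\begin{enumerate}
  \item[(i)] $0 \in A$, so $A \neq \emptyset$.
  \item[(ii)] For every $r \in A$ there is $s > 0$ with
        $(r-s,\, r+s) \subseteq A$, so $A$ is open.
  \item[(iii)] If $r_n \in A$ for all $n$ and $r_n \to r$,
        then $r \in A$, so $A$ is closed.
\end{enumerate}
Since $\mathbb{R}$ is connected, its only subsets that are both open and
closed are $\emptyset$ and $\mathbb{R}$. As $A \neq \emptyset$, it follows
that $A = \mathbb{R}$, i.e.\ $P(x)$ holds for every real $x$.
```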

This sounds a bit like Proof by exhaustion. --Jayron 32 11:59, 19 April 2022 (UTC)

ZFC implies the existence of a well-order on the set of real numbers, which would seem to imply the possibility of using transfinite induction. Unfortunately, the existence proof is completely non-constructive; we have no way of actually defining this order, so there is no way to prove any of the three cases (zero, successor and limit) other than by first proving the conclusion being sought. --Lambiam 13:40, 19 April 2022 (UTC)

Anyway, extending a proof to n = k + 1/r would only cover rational numbers, not all the reals (assuming r is an integer). -- Verbarson ^talk_edits 08:48, 21 April 2022 (UTC)

And proofs by induction for the rationals are feasible. The well-known proof by infinite descent of the irrationality of √2 is a proof by mathematical induction in disguise. --Lambiam 11:15, 21 April 2022 (UTC)
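For reference, one common form of the descent argument mentioned above, sketched with the symbols $p, q, p_1, q_1$ chosen here for illustration:

```latex
Suppose $\sqrt{2} = p/q$ with positive integers $p, q$. Then
\[
  p^2 = 2q^2 ,
\]
so $p^2$ is even, hence $p$ is even; write $p = 2p_1$. Substituting,
\[
  4p_1^2 = 2q^2 \quad\Longrightarrow\quad q^2 = 2p_1^2 ,
\]
so $q$ is even as well; write $q = 2q_1$. Then
\[
  \sqrt{2} = \frac{p}{q} = \frac{p_1}{q_1}, \qquad 0 < q_1 < q .
\]
Repeating the step yields an infinite strictly decreasing sequence
$q > q_1 > q_2 > \cdots$ of positive integers, which is impossible.
Equivalently, by strong induction on $q$: if no fraction with denominator
smaller than $q$ equals $\sqrt{2}$, then none with denominator $q$ does.
```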

Wouldn't Cantor's diagonal argument also qualify as inductive? --Jayron 32 12:26, 21 April 2022 (UTC)

It is a famous example of a proof by contradiction. [2] --Lambiam 13:23, 21 April 2022 (UTC)
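The construction at the heart of the diagonal argument can be shown on a finite truncation. This is only an illustrative sketch: the function name `diagonal_flip` and the four sample rows are made up, and the actual argument applies the same step to a supposedly complete infinite list of infinite binary sequences to reach a contradiction.

```python
def diagonal_flip(rows):
    """Given n binary sequences of length at least n, return a length-n
    sequence whose k-th digit differs from the k-th digit of row k."""
    return [1 - rows[k][k] for k in range(len(rows))]

# A made-up finite "list" standing in for the first digits of the first rows.
listed = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

new_row = diagonal_flip(listed)
print(new_row)  # [1, 0, 1, 1]

# The new sequence disagrees with row k at position k, so it equals no row.
print(all(new_row != row for row in listed))  # True
```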

April 21

Wikipedia:Reference desk/Mathematics
