Statistical Hypothesis Testing Process Explained: Key Concepts

By April Foster, updated May 22, 2025

There are many situations in which one needs to decide whether a claim is justified. For example, is it true that two datasets come from the same source? Is A a better employee than B? Is it faster to walk from home to work than to take a bus? If we believe that the initial data for such judgments are, to some extent, random, then the answers can only be given with a certain degree of confidence, and there is some possibility of error. Therefore, when answering such questions, it is best not only to make the most well-grounded decision but also to assess the likelihood of an erroneous conclusion. This is where the statistical hypothesis testing process becomes essential.

Considering such problems in a rigorous mathematical setting leads to the concept of a statistical hypothesis. This article discusses what statistical hypotheses are and what methods are available to test them. For a guide on how to write null and alternative hypotheses, with examples, visit wr1ter.com.

Null and Alternative Hypothesis Concept

Statistics as a research method deals with data in which various random factors obscure the regularities that interest the researcher. For this reason, most statistical calculations are accompanied by the verification of assumptions, or hypotheses, about the sources of these data.

A statistical hypothesis is an assumption about the properties of random variables or events that we want to test against the available data. Here are some examples of statistical hypotheses in educational research:

Hypothesis 1. Class performance stochastically (probabilistically) depends on the level of student learning capability.

Hypothesis 2. Mastery of the elementary mathematics course does not differ significantly between students who started their studies at age 6 and those who started at age 7.

Hypothesis 3. Problem-based learning in the first grade is more effective than traditional teaching methods with respect to students’ general development.

A null hypothesis is the primary testable assumption. It is usually formulated as the absence of a difference, the absence of a factor’s influence, the absence of an effect, the equality of some characteristic to zero, and so on. An example of a null hypothesis in pedagogy is that the difference in the results of two groups of students completing the same test is caused only by random factors.
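The null hypothesis from the pedagogical example can be checked in several ways. One possibility, chosen here for illustration and not prescribed by the article, is a permutation test. The Python sketch below assumes two small lists of test scores; the function name and the data are hypothetical. It estimates how often a purely random split of the pooled scores produces a difference in means at least as large as the observed one.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_resamples=2_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the group labels are arbitrary, so the
    pooled scores are reshuffled and split again many times.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction: the observed split counts as one permutation.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical test scores for two groups of students.
group_a = [78, 85, 90, 72, 88]
group_b = [65, 70, 74, 68, 71]
p_value = permutation_p_value(group_a, group_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small estimated p-value suggests that chance alone rarely produces a difference this large, which counts as evidence against the null hypothesis.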

Alternative Hypothesis and Types of Errors in Testing

Another testable hypothesis, called the competing or alternative hypothesis, is set against the null hypothesis, although it does not always directly oppose it. For example, in pedagogy, an alternative hypothesis might state that the performance levels in two groups of students differ due to non-random factors, such as different teaching methods. Researchers make such comparisons within the statistical hypothesis testing process, which helps determine whether observed differences are statistically significant or simply due to chance.

The hypothesis put forward may be correct or incorrect, so it needs to be tested. Because the test is carried out by statistical methods, it is called a statistical test.

When testing statistical hypotheses, errors (erroneous judgments) of two types are possible:

– you can reject the null hypothesis when it is correct (the so-called Type I error);

– you can accept the null hypothesis when it is not true (the so-called Type II error).

Errors in Hypothesis Testing: Medical Example

Accepting a null hypothesis when it is false is qualitatively different from rejecting it when it is true. This difference matters because the consequences of the two errors are not the same. Let us illustrate this with the following example:

The manufacturing process for a particular medical product is quite complicated. Deviations from the technology that seem insignificant at first glance can produce a highly toxic impurity. The toxicity of this impurity can be so high that even an amount too small to be detected by conventional chemical analysis can be dangerous to the person taking the medicine. As a result, before a newly produced batch is released for sale, it is subjected to toxicity testing by biological methods. Small doses of the drug are administered to several test animals, e.g., mice, and the result is recorded. If the drug is toxic, then all or almost all of the animals die. Otherwise, the survival rate is high.

Testing a batch leads to one of two possible courses of action: release the batch for sale (a1), or return the batch to the supplier for revision or perhaps destruction (a2).

Differences Between Type I and II Errors

Errors of the two types associated with actions a1 and a2 are entirely different, and so is the importance of avoiding them. Let us first consider the case where action a1 is taken while a2 would have been correct: the medicine is dangerous for the patient, yet it is found to be safe. This type of error can cause the death of patients using the drug. If the null hypothesis is taken to be “the batch is toxic”, releasing a toxic batch means rejecting a true null hypothesis, so this is a Type I error, the one that is most vital for us to avoid.

Now let’s consider the case where action a2 is taken while a1 would have been correct. This means that a batch of non-toxic medicine was classified as hazardous because of inaccuracies in the experiment. The consequences of this error are a financial loss and an increase in the cost of the medicine. However, the accidental rejection of a perfectly safe batch is far less harmful than the occasional death of patients. Rejecting a non-toxic batch of medicine is a Type II error.

The admissible probability of a Type I error can be 5% or 1% (0.05 or 0.01).
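The trade-off between the two errors in the medical example can be made concrete with a small calculation. The Python sketch below rests on assumptions that are not in the article: the null hypothesis is “the batch is toxic”, 10 mice are tested, each mouse survives with probability 0.1 if the batch is toxic and 0.9 if it is safe, and the batch is released when at least k mice survive. All names and numbers are hypothetical.

```python
from math import comb


def binom_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def release_threshold(n, p_toxic, alpha):
    """Smallest survivor count k with P(release | toxic batch) <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, p_toxic) <= alpha:
            return k
    return None  # no threshold keeps the Type I error within alpha


# Hypothetical experiment parameters (assumptions, not from the article).
N_MICE = 10
P_SURVIVE_IF_TOXIC = 0.1
P_SURVIVE_IF_SAFE = 0.9
ALPHA = 0.01  # admissible probability of a Type I error

k = release_threshold(N_MICE, P_SURVIVE_IF_TOXIC, ALPHA)
# Type I error: a toxic batch is released (at least k mice survive).
type_1 = binom_tail(N_MICE, k, P_SURVIVE_IF_TOXIC)
# Type II error: a safe batch is rejected (fewer than k mice survive).
type_2 = 1 - binom_tail(N_MICE, k, P_SURVIVE_IF_SAFE)

print(f"Release the batch if at least {k} of {N_MICE} mice survive")
print(f"P(Type I error)  = {type_1:.6f}")
print(f"P(Type II error) = {type_2:.6f}")
```

Raising the threshold k lowers the chance of releasing a toxic batch but raises the chance of rejecting a safe one, so the admissible Type I error probability fixes the rule and the Type II error follows from it.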

Significance Levels and Decision Making in Hypothesis Testing

The significance level is the probability of a Type I error in making a decision (the probability of erroneously rejecting the null hypothesis).

The alternative hypothesis is accepted if and only if the null hypothesis is rejected. This happens when the differences, say, in the arithmetic means of the experimental and control groups, are so statistically significant that the risk of erroneously rejecting the null hypothesis and accepting the alternative does not exceed one of the three conventional significance levels of statistical inference:
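The meaning of this probability can be checked by simulation. The Python sketch below is an illustration under stated assumptions: the data are normally distributed with a known standard deviation of 1, the null hypothesis says the population mean is 0, and a two-sided z-test is used. The function names, sample size, and alternative mean of 0.5 are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, alpha, n=25, trials=5_000, seed=1):
    """Fraction of simulated experiments in which H0 (mean == 0) is rejected."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


ALPHA = 0.05
# H0 is true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0, alpha=ALPHA)
# H0 is false (assumed true mean 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5, alpha=ALPHA)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

When the null hypothesis is true, the estimated Type I error rate should land close to the chosen significance level. The Type II error rate depends on how far the true mean lies from the hypothesized one and on the sample size.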

– the first level is 5%, where the risk of error is allowed in five cases out of a hundred theoretically possible similar experiments with a strictly random selection of subjects for each experiment;

– the second level is 1%, where the risk of error is allowed in only one case out of a hundred;

– the third level is 0.1%, where the risk of error is allowed in only one case out of a thousand.

The last level of significance places very high demands on the reliability of experimental results and is therefore rarely used. In pedagogical research, which does not require a very high level of reliability, it seems reasonable to use the 5% significance level.
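The three levels can be applied mechanically once a p-value is available. The short Python sketch below assumes the p-value has already been computed by some test; the function name is hypothetical. It reports the strictest of the three levels at which the null hypothesis would be rejected.

```python
# The three conventional levels named in the text, strictest first.
SIGNIFICANCE_LEVELS = (0.001, 0.01, 0.05)


def strictest_level_passed(p_value):
    """Return the smallest level at which H0 is rejected, or None."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    for alpha in SIGNIFICANCE_LEVELS:
        # "Does not exceed" the level, as in the text above.
        if p_value <= alpha:
            return alpha
    return None  # H0 is not rejected at any of the three levels


for p in (0.2, 0.03, 0.004, 0.0005):
    level = strictest_level_passed(p)
    verdict = "not rejected" if level is None else f"rejected at {level:.1%}"
    print(f"p = {p}: null hypothesis {verdict}")
```

In practice the level should be chosen before the data are examined, so a helper like this is better suited to reporting than to deciding.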

Criterion Statistics and Decision Rules in Hypothesis Testing

A criterion statistic (also called a test statistic) is a function of the initial data used to test the null hypothesis. Usually, a criterion statistic is a number, but it can also take other forms, such as a multidimensional process. These statistics play a central role in the statistical hypothesis testing process by helping researchers determine whether the observed data provide enough evidence to reject the null hypothesis.

A criterion for testing a hypothesis is any rule that guides the rejection or acceptance of the null hypothesis. Because the criterion statistic is computed from random data, it is itself a random variable, and the decision rule is expressed in terms of its values.

The critical region consists of the values of the criterion statistic at which researchers reject the null hypothesis. Conversely, the acceptance region includes the values for which researchers accept the null hypothesis.
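As an illustration of this decision rule, the Python sketch below assumes a two-sided z-test, in which the criterion statistic follows the standard normal distribution when the null hypothesis is true. The choice of test and the function names are assumptions made for the example.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Boundary of the two-sided critical region for a standard normal statistic."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def in_critical_region(z, alpha=0.05):
    """True if the statistic z falls in the critical region, so H0 is rejected."""
    return abs(z) > critical_value(alpha)


for alpha in (0.05, 0.01, 0.001):
    c = critical_value(alpha)
    print(f"alpha = {alpha}: reject H0 when |z| > {c:.3f}")

# Hypothetical observed values of the criterion statistic.
for z in (1.0, 2.5):
    decision = "reject H0" if in_critical_region(z) else "do not reject H0"
    print(f"z = {z}: {decision} at the 5% level")
```

A stricter significance level pushes the boundary outward, which shrinks the critical region and widens the acceptance region.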

The Concept of a Hypothesis in Pedagogy

A research hypothesis is a methodological characteristic of a study: a scientific assumption that explains a phenomenon and requires experimental verification to become reliable scientific knowledge. A hypothesis differs from a simple assumption in several features. These include:

– consistency with the facts on which it is based and which it was created to explain;

– verifiability;

– applicability to the broadest possible range of phenomena;

– relative simplicity.

A hypothesis organically combines two elements: the advancement of a particular proposition and its subsequent logical and practical proof.

Hypotheses in educational research may suggest that one tool (or a group of tools) will be more effective than others. Here, an assumption is made about the comparative effectiveness of educational means, methods, and forms. To validate such assumptions, researchers often rely on the statistical hypothesis testing process to determine whether observed differences in effectiveness are significant or due to random variation.

A higher level of hypothetical prediction arises when the author of the study hypothesizes not merely that some set of measures will be better than another but that, among a number of possible options, it is optimal in terms of specific criteria. Such a hypothesis needs an even more rigorous and, therefore, more detailed proof. For examples of research questions with null and alternative hypotheses, see wr1ter.com.