Data collection and results generalization stage | Methodology and Registers | GUS - Portal Informacyjny

Data collection and results generalization stage

After the sample has been drawn, data collection begins, followed by generalization of the results, i.e. extending information from the sample to the whole population. This is done by weighting (assigning weights to sampled units before the survey) and reweighting (correcting the weights after the survey has been conducted). Weighting methods are also used to deal with missing values and to achieve consistency with data from other sources; by using additional information they can reduce bias and increase precision.

When generalizing results from the sample to the population, the probability that a given unit is included in the sample is of crucial importance. This probability follows from the adopted sampling scheme and is called the inclusion probability (π_h). Its inverse (1/π_h) is called the sampling weight, and assigning these weights to the observed units is called weighting. In stratified sampling, where units in each stratum are drawn by simple random sampling without replacement, the sampling weight for stratum h is given by a simple formula: a_h = N_h/n_h, where N_h is the population size in stratum h and n_h is the sample size in stratum h.

If non-response occurs, the sampling weights must be modified. The modified weight is the inverse of the product of the inclusion probability (π_h) and the response probability. The response probability is estimated from the available data. The simplest approach is to use so-called completeness indicators, i.e. the ratio of the number of units actually examined to the number of units that should have been examined. In other words, the response probability is the number of responses obtained divided by the planned number of responses less the number of sampled units that turned out to be outside the scope of the survey.
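The weight calculation described above can be illustrated with a minimal Python sketch. It assumes stratified simple random sampling without replacement and a single response probability per stratum, estimated by the completeness indicator. The `Stratum` class, its field names, the function names, and the example counts are hypothetical and chosen only for illustration; they do not come from any GUS tool or survey.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one stratum; all field names are illustrative."""
    population_size: int   # N_h: units in the stratum population
    sample_size: int       # n_h: units drawn into the sample
    respondents: int       # sampled units that were actually examined
    out_of_scope: int = 0  # sampled units found to be outside the survey scope


def sampling_weight(s: Stratum) -> float:
    """Sampling weight a_h = N_h / n_h, the inverse of pi_h = n_h / N_h."""
    if s.sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return s.population_size / s.sample_size


def completeness_indicator(s: Stratum) -> float:
    """Estimated response probability: examined units / units that should be examined."""
    eligible = s.sample_size - s.out_of_scope
    if eligible <= 0:
        raise ValueError("no in-scope units in the sample")
    return s.respondents / eligible


def adjusted_weight(s: Stratum) -> float:
    """Non-response-adjusted weight: 1 / (pi_h * response probability)."""
    rate = completeness_indicator(s)
    if rate <= 0:
        raise ValueError("no respondents in the stratum")
    return sampling_weight(s) / rate


# Made-up counts, for illustration only.
example = Stratum(population_size=1000, sample_size=100, respondents=72, out_of_scope=10)
print(sampling_weight(example), completeness_indicator(example), adjusted_weight(example))
```

In this made-up stratum, 100 of 1000 units are sampled, so the sampling weight is 10. Ten sampled units are out of scope and 72 of the remaining 90 respond, so the completeness indicator is 0.8 and each respondent's weight rises to 12.5 (up to floating-point rounding).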