Additional resources for users of the Veneto Workers History dataset (VWH)

Here, I provide some additional information on the VWH dataset, which I hope will be useful to first-time users of the data. Some of the information on this page is already in the readme file of the replication package for Differences in On-the-Job Learning across Firms, but other points go beyond what is strictly necessary to replicate the paper. Click here for the actual replication files for Differences in On-the-Job Learning across Firms.

Background and access

The VWH was originally developed by the Economics Department at Università Ca’ Foscari Venezia under the supervision of Giuseppe Tattara. A copy of the dataset can easily be obtained through the Fondazione Rodolfo Debenedetti (fRDB), following the instructions here. The dataset description provided by fRDB is here.

Here is a non-comprehensive list of published papers using these data: Card et al., 2014; Battisti, 2017; Bartolucci et al., 2018; Serafinelli, 2019; Kline et al., 2020; Hong and Lattanzio, 2025. Their replication files might be useful resources too.

Files

The VWH is composed of three data files, which can be linked to each other:

  • Person-level dataset (anagr.dta)

  • Firm-level dataset (azien.dta)

  • Contract-level dataset (contr.dta)

Workers are uniquely identified by the variable cod_pgr. Firms are uniquely identified by the variable matr_az. These variables can be used to link the person-level dataset anagr.dta and the firm-level dataset azien.dta to the contract-level dataset contr.dta.
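As a rough illustration of this linkage, the sketch below attaches person-level and firm-level attributes to contract records using cod_pgr and matr_az. It runs on toy rows: the fields year and birth_year and all values are invented for the example and are not taken from the actual files. With the real data, one would first load the three .dta files (for instance with pandas.read_stata) and merge on the same two keys.

```python
# Toy stand-ins for the three VWH files. cod_pgr, matr_az and prov are
# variable names mentioned on this page; birth_year, year and all values
# are invented for illustration only.
anagr = [
    {"cod_pgr": 1, "birth_year": 1960},
    {"cod_pgr": 2, "birth_year": 1972},
]
azien = [
    {"matr_az": 10, "prov": "VI"},
    {"matr_az": 20, "prov": "TV"},
]
contr = [
    {"cod_pgr": 1, "matr_az": 10, "year": 1990},
    {"cod_pgr": 1, "matr_az": 20, "year": 1991},
    {"cod_pgr": 2, "matr_az": 20, "year": 1991},
]


def index_by(rows, key):
    """Build a lookup table, checking that `key` uniquely identifies rows."""
    lookup = {}
    for row in rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate {key}: {row[key]}")
        lookup[row[key]] = row
    return lookup


def link_contracts(contr, anagr, azien):
    """Left-join person and firm attributes onto each contract record."""
    persons = index_by(anagr, "cod_pgr")
    firms = index_by(azien, "matr_az")
    linked = []
    for contract in contr:
        row = dict(contract)
        # Contracts without a match keep only their own fields.
        row.update(persons.get(contract["cod_pgr"], {}))
        row.update(firms.get(contract["matr_az"], {}))
        linked.append(row)
    return linked


linked = link_contracts(contr, anagr, azien)
```

The uniqueness check in index_by is a cheap safeguard: if an identifier were duplicated in the person-level or firm-level file, the join would silently pick one of the duplicates.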

Sampling

VWH covers the employment histories of individuals who ever worked in the Italian region of Veneto between 1975 and 2001.

  • If a worker leaves Veneto and is employed elsewhere in Italy, information on that non-Veneto contract and non-Veneto employer is also recorded in VWH. As such, there is no sample attrition due to inter-regional migration. Note, however, that the worker sample is representative only of individuals who ever work in Veneto, and the firm-level data are representative only of Veneto firms, even though they include information on some non-Veneto firms.

  • Note that the fRDB VWH documentation file mentions that the data only covers the provinces of Vicenza and Treviso. However, the data clearly encompasses the entire Veneto region. Kline et al. (2020) reach a similar conclusion.

  • There are some missing data issues before the mid-1980s. In Differences in On-the-Job Learning across Firms, we restrict our analysis to 1984-2001, whereas Kline et al. (2020) use 1985-2001.
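A minimal sketch of this kind of sample restriction is shown below. It assumes that each contract record carries a calendar-year field, here called year; that name is a placeholder, and the actual variable in contr.dta may be named or coded differently.

```python
def restrict_years(rows, first_year=1984, last_year=2001, year_key="year"):
    """Keep records whose year lies in [first_year, last_year], inclusive."""
    return [r for r in rows if first_year <= r[year_key] <= last_year]


# Toy contract records; "year" is a hypothetical field name.
contracts = [
    {"cod_pgr": 1, "matr_az": 10, "year": 1978},
    {"cod_pgr": 1, "matr_az": 10, "year": 1984},
    {"cod_pgr": 2, "matr_az": 20, "year": 1990},
    {"cod_pgr": 2, "matr_az": 20, "year": 2001},
]

sample_1984 = restrict_years(contracts)                   # 1984-2001 window
sample_1985 = restrict_years(contracts, first_year=1985)  # 1985-2001 window
```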

Sectors

The dataset azien.dta includes the variable ateco81 reflecting a firm’s 3-digit sector, following the ATECO classification from 1981. This PDF file lists the descriptions of each of the 3-digit 1981 ATECO codes.

Geography

Firms: The dataset azien.dta includes firms’ full addresses:

  • Address (indirizzo)

  • Municipality name (comune)

  • Postal code (cap)

  • Province (prov)

  • Municipality statistical code (cod_com)

The statistical code cod_com can be linked to other municipality-level datasets. Here is a spreadsheet with Italian municipalities and their statistical code.

Workers: The dataset anagr.dta includes geographical information on workers’ place of birth:

  • Municipality of birth name (com_n)

  • Province of birth (prov_n)

Note, however, that anagr.dta does not include a municipality code, so any linkage to external data would have to be done using municipality names.
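Matching on names usually requires some normalization first, since spelling, accents, and capitalization may differ across sources. The sketch below shows one possible approach: normalize names on both sides and match on the (name, province) pair. The external table, its column names, the two-letter province codes, and the placeholder codes code_A and code_B are all assumptions made for the example, not features of the actual data.

```python
import re
import unicodedata


def normalize_name(name):
    """Uppercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", no_accents)
    return cleaned.strip().upper()


# Hypothetical external municipality table with placeholder codes.
external = [
    {"comune": "San Donà di Piave", "prov": "VE", "cod_com": "code_A"},
    {"comune": "Bassano del Grappa", "prov": "VI", "cod_com": "code_B"},
]
lookup = {(normalize_name(m["comune"]), m["prov"]): m["cod_com"] for m in external}

# Toy worker records using the birthplace variables com_n and prov_n.
workers = [
    {"cod_pgr": 1, "com_n": "SAN DONA' DI PIAVE", "prov_n": "VE"},
    {"cod_pgr": 2, "com_n": "bassano del  grappa", "prov_n": "VI"},
    {"cod_pgr": 3, "com_n": "Unknown place", "prov_n": "VI"},
]
for worker in workers:
    key = (normalize_name(worker["com_n"]), worker["prov_n"])
    # None marks names that found no match and need manual review.
    worker["cod_com_n"] = lookup.get(key)
```

Including the province in the key helps when two municipalities share a name. Abbreviations (for example "S." for "San") and municipalities that merged or were renamed over time are not handled here and would need separate treatment.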

Also note that anagr.dta includes workers’ municipality and province of residence (com_r and prov_r). However, it is unclear at what point in time an individual’s residence is recorded.

Occupations

The dataset contr.dta includes information on the type of position each employment contract is assigned to. This information is recorded in the variable qualif. Rather than capturing occupation in the sense of ISCO or SOC, this variable essentially classifies workers into managers (dirigente), white-collar (impiegato), blue-collar (operaio), and apprentices (apprendista):

  • The variable qualif takes on many different values, which are listed and named in this spreadsheet.

  • In Differences in On-the-Job Learning across Firms, we use the names attached to each qualif value to map qualif values into managers, white-collar, blue-collar, and apprentices. This is the crosswalk.
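To give a sense of how a name-based mapping of this kind can be built, here is a minimal sketch that assigns each qualif label to one of the four groups by keyword. It is not the crosswalk used in the paper: the qualif codes and labels below are invented, and the keyword rules are only one plausible way to automate a first pass before checking the results by hand.

```python
# Keyword stems are checked in order. Apprentices come first so that a label
# such as "Apprendista operaio" is classified as an apprentice rather than
# as blue-collar.
RULES = [
    ("apprendist", "apprentice"),
    ("dirigent", "manager"),
    ("impiegat", "white-collar"),
    ("operai", "blue-collar"),
]


def classify(label):
    """Return the broad group for a qualif label, or None if no stem matches."""
    text = label.lower()
    for stem, group in RULES:
        if stem in text:
            return group
    return None


# Hypothetical qualif codes and labels, for illustration only.
qualif_names = {
    "A": "Operaio",
    "B": "Impiegato",
    "C": "Dirigente",
    "D": "Apprendista operaio",
    "E": "Altro",
}
crosswalk = {code: classify(name) for code, name in qualif_names.items()}
unmatched = [code for code, group in crosswalk.items() if group is None]
```

Labels that match no keyword end up in unmatched, so they can be reviewed and assigned manually instead of being dropped silently.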