Data Entry and Data Processing in Clinical Trials: Training Aspects
By Malgorzata Krzeminska-Flowers, Ph.D
Lodz, 92-503 POLAND
Quality of data collected during the clinical trial is the single most important key to its success. There are numerous factors influencing the accuracy and timeliness of data collection, entry and processing. This article discusses several issues concerning data quality, data entry and processing. A variety of methods that assure complete and legible data collection and accurate data entry should be used. Adequate training of all personnel involved in data collection, entry and processing can lower the error rate and therefore increase the timeliness of data retrieval. Methods for data acquisition, recording and entry are presented together with techniques to decrease the error rate and ensure high final data quality. The final part focuses on the importance of personnel’s training in this area. The article also includes a personal training plan design, which needs to cover important topics regarding data collection, entry and processing.
“No study is better than the quality of its data” 
Quality of data collected during the clinical trial is the single most important key to its success. The need for high quality standards for data used in clinical trials has been acknowledged for more than two decades now. Efforts in quality assurance in clinical data management have been increasing, as reflected in the growing number of related publications, books, and industry guidelines.
Today, the need continues and has become even more important as government agencies and other regulatory bodies rely more and more on the evaluation of electronic data, i.e. data that are electronically collected, stored, transmitted, and archived, for critical data-based decision making. Despite the success of harmonization of Good Clinical Practice (GCP) in Europe, Japan, and the United States, no harmonized guideline on good clinical data management is available yet.
There are several sources of errors in study data at the level of data collection, entry and processing:
Deviations from or misinterpretations of the protocol when making measurements and observations, or preparing primary documents, e.g. CRFs
Inaccurate, illegible, or incomplete data recording
Inaccurate or incomplete data transcription to electronic files
Errors or omissions in data and materials transferred between field sites and the coordinating center, between field sites and resource centers, and between the resource centers and the coordinating center
Excess data collection to the extent that it jeopardizes the quality of essential data
Inadequate training of study personnel, especially new or replacement personnel recruited after the start of the study
Intentional data fraud
In the following paragraphs I would like to discuss the issues of data quality in modern clinical trials. The first part of this article briefly describes the procedures of data collection, entry and processing. Based on this information, in the latter part I present training requirements and solutions that improve data quality by ensuring a smaller error rate through adequate training and auxiliary materials.
WHAT IS DATA QUALITY?
In its 1999 report, the Institute of Medicine (IOM) defined data quality as “data that can be used without further revisions or data that will produce conclusions and interpretations that are equivalent to those that would be derived from error-free data, that is, data that are accurate, reliable, and fit for use”. In the same report, it was recognized that the only way to produce such data is to engineer data quality into the entire clinical trial process. This means that “during all phases of a study, sufficient effort should be spent to ensure that all key data critical to the interpretation of the trial are of high quality”.
Data quality can be viewed in two aspects: inherent and pragmatic. Inherent quality refers to the “correctness or accuracy of data” and pragmatic quality is “the value that accurate data has in supporting the work of the enterprise”. In the case of clinical trials, the pragmatic quality of data is ensured by the scientific validity of the protocol and a proper statement of the research question or problem. Inherent data quality comprises a wide range of other elements: complete and accurate recording of results, proper performance of tests and evaluations, and appropriate record verification and retention, to name just a few. After a suitable study is designed and thoroughly reviewed, assurance of quality is dependent on the behavior of the clinical trial personnel, which is affected by staff training and integrity.
Several changes have occurred over the years which make attention to data quality an issue of the highest importance. From their beginnings, clinical trials have evolved dramatically. Firstly, clinical trials are no longer primarily conducted at a single center. Nowadays, they are predominantly not only multi-center but also multinational; trial conduct has therefore become much more complex, both administratively (multiple entities involved in aspects of trial conduct) and scientifically (the number of procedures performed, the number of observations, and design complexity). The increasing scientific complexity of trials is creating constant challenges for the clinical research community, the pharmaceutical industry, and the regulatory agencies.
In addition, the use of electronic record-keeping in studies has increased dramatically and can be expected to continue to grow. Remote data entry allowed the industry to improve the integrity of clinical trial data collection in the 1980s. Additional timesaving processes evolved during the 1990s with the advent of the Internet. Nowadays, thanks to electronic methods, a high integrity of data collection is obtainable in a relatively easy way. The introduction of regulations (e.g. 21 CFR Part 11 in the U.S.) enabled electronic records and signatures to be considered generally equivalent to paper records and handwritten signatures. However, these regulations set specific requirements regarding procedures for the creation, modification, maintenance, and transmission of records to ensure their authenticity and integrity. In addition, the adopted systems must ensure that electronic records are accurately and reliably retained.
STEPS IN DATA COLLECTION AND PROCESSING
Data acquisition, i.e. the actual “measurement process”, whether it is a blood pressure reading, an electrocardiogram, or an interview with the participant, together with the recording of data, is the first and probably most crucial stage in the overall data quality management process. Any errors made at this first stage are more difficult to detect and correct than those made later in the process.
Data should be recorded directly on forms that have been specifically designed for data collection in the particular trial. The primary issue in designing data collection forms for a research study is deciding what data should be collected at what time points or intervals. The caveat should be “Collecting too much data can be problematic” and “As volume increases, quality can be compromised”. In practice, large chunks of data concerning e.g. clinical care or administrative issues are not needed as part of the trial database. The data management process in a clinical trial starts with the development of a trial-specific case report form, which should be designed to (1) capture all data required per the study protocol, (2) collect data elements in a standardized format, (3) capture data elements in a fashion that ensures that data are suitable for summarization and analysis, (4) facilitate transcription and subsequent comparison to source documents, and (5) avoid redundant and unnecessary collection. It is essential that all, and only, the necessary data are collected.
It is important that all data are recorded directly on the forms at the time of measurement to minimize the possibility of lost notes or transcription errors. All subsequent changes to both paper and electronic data should be made in a manner that allows identification of the original value, the corrected value, the date of the change, and the person who made the change [10, 11].
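The traceability requirement described above (original value, corrected value, date of change, and person making the change) can be illustrated with a minimal Python sketch of an append-only audit trail. The class names, field names, and the "reason" attribute are assumptions made for illustration; they are not taken from any particular clinical data management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Change:
    """One correction; the previous value is kept, never overwritten."""
    old_value: str
    new_value: str
    changed_by: str
    changed_at: datetime
    reason: str


@dataclass
class AuditedField:
    """A single CRF data item with an append-only history (hypothetical)."""
    name: str
    value: str
    history: List[Change] = field(default_factory=list)

    def correct(self, new_value: str, changed_by: str, reason: str) -> None:
        # Record who changed what, when, and why before updating the value.
        self.history.append(
            Change(self.value, new_value, changed_by,
                   datetime.now(timezone.utc), reason)
        )
        self.value = new_value

    def original_value(self) -> str:
        # The first recorded value stays identifiable after any number of changes.
        return self.history[0].old_value if self.history else self.value


# Illustrative data only: a blood pressure value corrected after a query.
sbp = AuditedField("systolic_bp_mmHg", "210")
sbp.correct("120", changed_by="site coordinator", reason="transcription error")
```

Because corrections are only ever appended, the original entry is never obscured, which is the property the paragraph above asks for.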
Data recording refers to transcribing information onto case report forms. Such forms used to be created on paper, but a new trend is to shift towards direct computer entry and computer screens that resemble forms. Success in both approaches depends equally on proper design of the forms or data entry screens. Besides the traditional paper CRF and the newly introduced e-CRFs, other methods of data recording and collection are common: scannable forms (NCR), fax-based forms (Teleform), or data collection directly from the patient with the use of touch-tone telephone systems (interactive voice response system, IVRS). The choice of case report form type may play a crucial role in the later speed of data entry/collection and, perhaps surprisingly, e-forms are not always the best choice with respect to the speed of data collection.
Data entry refers to various modes of entering information into a computer for further processing. Data entry may take the form of direct computer entry by a person transferring data from paper-based CRFs into a computer database, or of optical mark reading (scanning) or optical character recognition (DATAFAX) for scannable/faxable forms. With electronically captured information, data entry per se is practically non-existent, as it is reduced to electronic transfer of information from a portable medium into the main database. Aside from the selection of CRF format, each clinical trial faces another choice: the type of data entry system. Information can be entered locally at each site, centrally, upon collection of CRFs from all study sites, or in a web-based format. The choice depends on available resources and clinical trial size, and each of the options has its advantages. In a local data entry system, data are keyed in on site by clinic personnel. This allows for quick resolution of data omissions, errors, and inconsistencies; however, it also raises the need for additional training of local staff in data entry. When the sponsor/CRO decides to apply a central data entry system, all forms are sent to the sponsor or data coordinating center and stored centrally, and only there are the data entered by experienced clerks. This speeds up the actual entry process, assuming quick delivery of forms from each clinical site. On the other hand, resolving any issues regarding data omissions, errors, and inconsistencies will take longer. A third mode, web-based data entry, has been gaining popularity over the past couple of years. It combines features of the two previously mentioned systems: data entry by local staff, directly into the central database.
There are no specific software or hardware requirements and, with the wide spread of the Internet, the only specific necessity is a secure link for data transmission using a proper protocol, e.g. 128-bit SSL (Secure Sockets Layer) or Hypertext Transfer Protocol Secure (HTTPS), which are nowadays standard in most web-based transactions.
Errors in data entry process
“To err is human, but to really mess things up takes a computer”
During the conduct of any observational studies, including clinical trials, two types of data errors can occur. One type is due to deliberate falsification; the second type results from human or measurement error (such as inaccurate data entries, inaccurate transfers of data, misinterpretation, and inherent limitations of the measurement instruments). Errors resulting from falsified data are always serious and must be dealt with accordingly; other errors may be serious or trivial.
Errors inevitably occur during data entry. The most common errors include (1) typographical errors, (2) copying errors, (3) coding errors, and (4) range errors. Typographical errors usually take place when someone is typing at a very high key rate. Copying errors are usually the result of poorly filled-in CRFs with barely legible handwriting, when data entry clerks cannot differentiate between certain letters or numbers. Coding errors occur when the personnel filling in the CRFs with given codes, or the data entry clerks, assign the wrong codes to items on the CRFs. Range errors occur when the lower and/or upper limits of known values are exceeded when typing. This type of error can be easily detected with proper database design. In the case of scannable and faxable forms, errors may occur when the optical recognition program is not able to recognize illegible handwriting.
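As an illustration of how range errors can be caught at entry time, the following Python sketch checks keyed values against lower and upper limits. The field names and the limits are hypothetical; in a real trial they would be derived from the study protocol and built into the database or entry screens.

```python
from typing import Dict, List, Tuple

# Hypothetical plausibility limits, for illustration only.
RANGE_LIMITS: Dict[str, Tuple[float, float]] = {
    "systolic_bp_mmHg": (60.0, 260.0),
    "heart_rate_bpm": (30.0, 220.0),
}


def range_errors(record: Dict[str, str]) -> List[str]:
    """Return one message per field that is non-numeric or outside its limits."""
    problems: List[str] = []
    for name, (low, high) in RANGE_LIMITS.items():
        raw = record.get(name)
        if raw is None or raw.strip() == "":
            continue  # missing values belong to a separate completeness check
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{name}: '{raw}' is not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {value:g} outside [{low:g}, {high:g}]")
    return problems


# A typing slip ("1200" instead of "120") is flagged; the plausible value passes.
messages = range_errors({"systolic_bp_mmHg": "1200", "heart_rate_bpm": "72"})
```

A check of this kind detects only implausible values; a typographical error that stays within the limits still needs double entry or source verification.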
Decreasing error rate in data-entry processes
It is obvious that every clinical study must include procedures to avoid or minimize data errors. A good practice for avoiding data entry errors is double data entry (i.e., entry by two different entry clerks), which can be considered when using paper-based CRFs. Upon receipt of the CRFs, a working copy of the original CRFs is made and reviewed by the data manager prior to entry. Data from the working copy are entered into the database, and the CRFs are initialed, dated, and stamped “Entered” after the initial entry. A second entry follows the initial entry. The two files are then checked for differences by running a verification program that compares them. If the records do not match, discrepancies are printed out and corrections are made. The process is repeated until each file is an exact replica of the other. Any missing or inconsistent information is entered on the Query Form and forwarded to the clinical monitor for processing. After the information has been verified, i.e., there are no inconsistencies between the initial and the second entries, the CRFs are initialed, dated, and stamped “Verified”.
Verification ensures that the data entered are actually those on the CRFs. The data quality of the resulting database is inversely proportional to the chance of the same error being made in the same field by two persons, which is hopefully negligible. The overall error rate from double data entry can be as low as 0.001%.
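The verification step of double data entry, in which two independently keyed files are compared and discrepancies listed, can be sketched in Python as follows. This is a simplified illustration, not the verification program of any specific system; the data layout (record identifier mapped to field values) and the "<missing>" placeholder are assumptions.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, Dict[str, str]]        # record id -> {field name -> keyed value}
Discrepancy = Tuple[str, str, str, str]  # (record id, field, first value, second value)


def compare_entries(first: Entry, second: Entry) -> List[Discrepancy]:
    """List every field whose keyed value differs between the two entry passes."""
    discrepancies: List[Discrepancy] = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for name in sorted(set(a) | set(b)):
            va = a.get(name, "<missing>")
            vb = b.get(name, "<missing>")
            if va != vb:
                discrepancies.append((record_id, name, va, vb))
    return discrepancies


# Illustrative data: the second clerk transposed two digits in one field.
first_pass = {"001": {"sex": "F", "weight_kg": "72"},
              "002": {"sex": "M", "weight_kg": "81"}}
second_pass = {"001": {"sex": "F", "weight_kg": "27"},
               "002": {"sex": "M", "weight_kg": "81"}}
discrepancies = compare_entries(first_pass, second_pass)
```

Each listed discrepancy is then resolved against the CRF and the comparison repeated until the list is empty, mirroring the cycle described above.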
Logical data verification should be routinely performed in all trials. In addition to logic checks, other data quality inspections of the database are also highly desirable to determine the data quality level regardless of the type of CRFs used. There are statistical methods that use sampling techniques to estimate the error rate and quantify data quality.
The process of data entry and verification will most likely result in detection of omissions, errors, or items requiring clarification or changes to the CRF. These items need to be written down on the Query Form and forwarded to the clinical monitor for processing. Data queries can be generated at any step during the CRF review, data entry and verification, and data analyses by the data management personnel and/or the trial/project statistician. All data clarifications, regardless of whether they require CRF alterations or not, should be entered on the Query Form and forwarded to be resolved by the designated monitor.
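One possible way to quantify data quality from a sample, offered here only as a sketch, is to compare a random sample of database fields against the CRFs and report the observed error rate with a confidence interval. The Python example below uses the Wilson score interval; the choice of this interval and the sample figures are assumptions for illustration and are not prescribed by the sources cited in this article.

```python
import math
from typing import Tuple


def error_rate_estimate(errors: int, fields_checked: int,
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Point estimate and Wilson score interval for the field-level error rate.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if fields_checked <= 0 or not 0 <= errors <= fields_checked:
        raise ValueError("need 0 <= errors <= fields_checked and fields_checked > 0")
    n = fields_checked
    p = errors / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical audit: 5 discrepancies found among 10,000 sampled fields.
rate, lower, upper = error_rate_estimate(errors=5, fields_checked=10_000)
```

The upper limit of the interval, rather than the point estimate alone, is the more cautious figure to compare against a pre-specified acceptance threshold.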
Both ICH and FDA guidelines require that all subsequent changes and corrections must be made in a manner which does not obscure the initial value. Each correction on CRF must be initialed and dated by the investigator or other authorized person and a reason for the change made should be documented. All corrections upon data entry into the database should be made based on information provided on the Query Forms. Upon completion of data entry, original Query Forms should also be dated, initialed, and stamped Corrected to indicate that the entry of the data update is completed.
TRAINING AND CERTIFICATION
In order to standardize clinical trial procedures and minimize data collection and entry errors, it is necessary to provide a centrally operated training and certification program for field site personnel. The training should include local data coordinators or managers, key entry personnel, technicians, examiners, counselors, physicians, or dietitians, depending on the study design. Before the trial begins, the coordinating center staff should take proactive measures to assure that these individuals understand the study protocol and design, including participant eligibility criteria and plans for analyzing primary and secondary outcomes of interest. All personnel involved in the data entry and collection process, especially site data coordinators and key entry personnel, should be certified in study procedures (as described above) for data entry, data transmission, and error correction. They also should be prepared to use the trial’s system for electronic mail, if applicable, and other communication and tracking mechanisms. At least two people at each field site should be trained for each task requiring study certification so that at all times one individual is available to the trial. Because of the time and resources required for training, especially in the case of fully or partially electronic data capture, entry and processing, an on-line training program can be considered. Learning via the Internet can consist of live on-line meetings and conferences, lectures provided by experienced trainers, real-time tasks, etc. This way staff located in many centers, even across the globe, may access diverse training material at any time, at their own workplace. Studying in a familiar environment, with the possibility of applying the gained knowledge immediately in real-life situations, was found to be very effective.
Not every organization remembers that it is prudent to include some review of medical and other research issues as part of the training program, also for data entry and data processing personnel. Coordinating center staff who participate in training programs for field site personnel often are dismayed by how little training in the medical aspects of the study local investigators have provided. Introducing some medical background information, trial objectives and statistical analysis increases the understanding of the whole process and of the meaning of the data which the staff process, reducing the number of logical errors. The training for local data entry and processing personnel should include:
Principles of good research practice regarding documentation and data handling
The quality assurance and monitoring program for the study
Expectations regarding communications with resource centers, in performance monitoring reports, and during site visits.
During the training of personnel, it is advisable that field site staff are given a practical exercise, which they are to solve using the field site microcomputers and their operating systems, data collection procedures, and various study-related procedures. After completing the training, the staff members should take a more detailed certification test. Including practice in the training program allows the staff members to be better informed and more comfortable in their responsibilities [2, 13].
It is very important to remember that physicians and clinical investigators participating in the study also must be trained in all aspects of the study protocol, including data collection, entry and reporting. Other data quality assurance efforts are irrelevant if the physicians do not understand the basic principles of data collection and study design.
Typically, the training program for study personnel should be followed by competency tests for the procedures or examinations required by the study protocol, including topics of data handling.
Training for data entry and processing personnel
Clinical trial management must ensure that appropriate training is provided for all procedures so that all personnel are able to fulfill their functions. Detailed training records for each individual should be kept in their personnel file. These training records detail three levels of ability:
1. Under supervision
2. Competent
3. Competent to train
Prior to the use of any procedure, management should assess and record each staff member’s level of competency for the procedure in his or her training record.
When a trainee requires specific training, the category “Under Supervision” should be entered in the training record. Each record and record change needs to be confirmed by the signatures of both the trainer or assessor and the trainee. When sufficient training has been given so that the trainee can carry out a procedure without supervision, they will be certified as “Competent”. Competent individuals who have sufficient knowledge of a procedure, and the necessary skills to transfer their knowledge and expertise, may be certified as “Competent to Train”.
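The three competency levels and the two-signature rule described above could be represented in an electronic training record along the following lines. This Python sketch is illustrative only; the class names, attributes, and the example procedure name are assumptions and do not describe an existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum
from typing import List, Optional


class Competency(IntEnum):
    UNDER_SUPERVISION = 1
    COMPETENT = 2
    COMPETENT_TO_TRAIN = 3


@dataclass(frozen=True)
class Assessment:
    procedure: str
    level: Competency
    assessed_on: date
    assessor_signature: str
    trainee_signature: str


@dataclass
class TrainingRecord:
    staff_name: str
    assessments: List[Assessment] = field(default_factory=list)

    def record(self, assessment: Assessment) -> None:
        # Every record or record change must be signed by both parties.
        if not assessment.assessor_signature or not assessment.trainee_signature:
            raise ValueError("assessor and trainee must both sign")
        self.assessments.append(assessment)

    def current_level(self, procedure: str) -> Optional[Competency]:
        # The most recently recorded assessment for the procedure applies.
        matching = [a for a in self.assessments if a.procedure == procedure]
        return matching[-1].level if matching else None

    def may_work_unsupervised(self, procedure: str) -> bool:
        level = self.current_level(procedure)
        return level is not None and level >= Competency.COMPETENT


# Illustrative data: a clerk progresses from supervised to competent.
clerk = TrainingRecord("A. Clerk")
clerk.record(Assessment("double data entry", Competency.UNDER_SUPERVISION,
                        date(2024, 1, 8), "trainer T.", "A. Clerk"))
clerk.record(Assessment("double data entry", Competency.COMPETENT,
                        date(2024, 2, 5), "trainer T.", "A. Clerk"))
```

Keeping earlier assessments in the list, rather than overwriting them, preserves the history of how each staff member reached the current level.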
Attendance at internal or external training courses on specific topics relating to their work should be recorded in the individual’s training record, together with the name of the training institution or trainer and the dates of attendance. The maintenance of the training record is the responsibility of the individual concerned. However, it is the responsibility of clinical trial management to ensure that an updated curriculum vitae (CV) is maintained on file for all staff; therefore all personnel should advise management of any significant changes (e.g. additional qualifications, completed courses and training).
The daily work of personnel dealing with data is affected by various topics. The following is an all-inclusive list of training topics, which may serve as a reference guide for the development of a master training plan and individual development plans:
Standard Operating Procedures and Departmental Policies
All employees are required to understand and follow these SOPs. Simple confirmation by an employee that an SOP has been read and understood often constitutes the required SOP training; however, it is rarely enough. To increase understanding and effectiveness, the trainer should discuss the required SOPs with employees and explain how they affect their daily work routine.
Computer Software and Technical Skills
The modern clinical trial environment uses various computer software applications to enter, clean and analyze data. These applications, including clinical databases, CRF imaging software, edit specification development, discrepancy management, and others, require practical training for all employees who use them.
Regulations and Industry Standards
Data management personnel are required to work within the constraints of codes and regulations. Industry standards should not be treated as “restraints”; they give employees guidance in their common work practices. To improve the training schedule for data entry and processing employees, a trainer can make references to information regarding standards such as GCP, ICH Guidelines, FDA Regulations, 21 CFR Part 11, as well as GCDMP, which can be found in various publications, educational seminars or Internet web sites.
For many employees, individual development plans provided by a company are a very important part of a training schedule, especially when they allow for the employee’s growth outside of the technical skills that are required. Often, the concept of “training” is being supplanted by “learning”; this is a shift toward developing learning skills. The main objective of learning is to help an individual become self-directed, with a clear set of objectives and maintenance of records of their progress; such activity is commensurate with the existence of documented procedures in the regulated environment. A good learning program will allow individuals to realize their full potential to the benefit of both parties.
For the data collection and processing to be fully effective, employees must also understand the processes that occur before and after the data is handled by them.
Beyond training: aids and reminders
Another way to improve data collection and processing timeliness and lower the error-rate is creating a number of aids for use by field site personnel. These aids may include data collection schedules for study participants, reminders of upcoming examinations, lists of participants for whom data reporting is delinquent, data records or other items associated with an examination that have not been received at the coordinating center, and lists of edit queries that have not been resolved.
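One of the aids mentioned above, a list of participants for whom data reporting is delinquent, can be generated automatically from a schedule of expected forms. The Python sketch below is a hypothetical illustration; the record layout, the identifiers, and the 14-day grace period are assumptions rather than values taken from any study.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class ExpectedForm(NamedTuple):
    participant_id: str
    form_name: str
    due: date
    received: Optional[date]  # None until the coordinating center has the form


def delinquent_forms(forms: List[ExpectedForm], today: date,
                     grace_days: int = 14) -> List[ExpectedForm]:
    """Forms not yet received whose due date is more than grace_days in the past."""
    cutoff = today - timedelta(days=grace_days)
    overdue = [f for f in forms if f.received is None and f.due < cutoff]
    return sorted(overdue, key=lambda f: f.due)  # oldest first


# Illustrative schedule for three participants.
schedule = [
    ExpectedForm("P-001", "Visit 2", date(2024, 1, 10), None),
    ExpectedForm("P-002", "Visit 2", date(2024, 1, 12), date(2024, 1, 15)),
    ExpectedForm("P-003", "Visit 1", date(2024, 2, 1), None),
]
reminder_list = delinquent_forms(schedule, today=date(2024, 2, 5))
```

The same pattern, a filter over outstanding items sorted by age, applies to the other aids listed, such as unresolved edit queries.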
How to Assess Training Effectiveness?
Assessment of the trainee’s theoretical and practical skills is a necessary part of the training experience. Trainers should cooperate with their colleagues to monitor the trainee’s development throughout the training period. Continuous feedback from colleagues will help the trainer and trainee to identify areas that require further development or to identify areas in which the trainee has shown significant skills.
The first and most obvious method of assessing whether the trainee has gained the intended skills or knowledge is observation by the trainer. If this method is not possible or is too labor intensive (e.g. in the case of very wide-spread training programs in global organizations), other assessment options can be applied, such as written tests for personnel with questions based on material covered during the training. Especially in larger organizations, “e-learning” is much more useful, as a lot of personnel can be trained in their own time or spare time. The subsequent test usually takes the form of multiple choice questions based on the instructional material, with a certain pass mark set. Of course, paper-based systems also can be used, but these require delegation of human resources to produce the tests and to mark the responses, whereas the electronic assessment process saves a considerable amount of time and human resources.
It is obvious that, no matter what the form of the training, there must always be some robust method of examination or assessment; otherwise we cannot be sure that we have covered all the bases.
Whatever form of assessment is used, it is essential that appropriate records of these operations are maintained in the training records. Therefore, it makes good sense to have a system that can automatically produce a written assessment report. This latter component is another bonus that makes an electronic training and evaluation system so much more attractive.
Completeness and accuracy of clinical trial data greatly affect the statistical conclusions from the data analyses. Data accuracy can be enhanced when researchers do some advance planning and follow careful procedures. A variety of methods should be used that assure complete and legible data collection and accurate data entry. Adequate training of all personnel involved in data collection, entry and processing can lower the error rate and therefore increase the timeliness of data retrieval. In a fiercely competitive business such as the modern pharmaceutical industry, time means money, and every delay in obtaining accurate and complete trial data means huge losses for the company. This is why the importance of prior training of all staff involved in data handling should not be underestimated. Time and money spent on staff training are never lost, and actually bring a quick return on investment in the form of better personnel performance during the clinical trial.
[1] Friedman L, Furberg C, DeMets DL: Fundamentals of Clinical Trials. John Wright-PSG Inc., Littleton, MA, 1981
[2] Gassman JJ, Owen WW, Kuntz TE, Martin JP, Amoroso WP: Data Quality Assurance, Monitoring, and Reporting. Controlled Clinical Trials 16:104S-136S, 1995
[3] Davis JR, Nolan VP, Woodcock J, Estabrook RW (Eds): Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making: Workshop Report. Division of Health Sciences Policy, Institute of Medicine, National Academy Press, Washington, DC, 1999; available on-line at: http://www.nap.edu/catalog.php?record_id=9623
[4] English LP: Improving Data Warehouse and Business Information Quality. Wiley, New York, NY, 1999
[5] Concept Paper: Quality in FDA-Regulated Clinical Research. Background to HSP/BIMO Workshop; available on-line at: http://www.fda.gov/oc/initiatives/criticalpath/clinicalresearch.html
[6] Mitchel JT, You J, Kim YJ, Nardi R, Cheng L, Fein S, Lau A: Clinical Trial Data Integrity. Applied Clinical Trials, Mar 2, 2003; available on-line at: http://actmagazine.findpharma.com/appliedclinicaltrials
[7] Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records; Electronic Signatures
[8] Hosking JD, Newhouse MM, Bagniewska A, Hawkins BS: Data Collection and Transcription. Controlled Clinical Trials 16:66S-103S, 1995
[9] Chow SC, Liu JP: Design and Analysis of Clinical Trials: Concepts and Methodologies, Second Edition. John Wiley & Sons, Inc., 2004
[10] Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB: Designing Clinical Research: An Epidemiologic Approach. Lippincott, Williams & Wilkins, Philadelphia, PA, 2001
[11] International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use: Guideline for Good Clinical Practice. 1996
[12] No More Pencils, No More Books. Future Pharmaceuticals Podcast; available on-line at: http://www.futurepharmaus.com/Default.aspx?mc=no-pencils&page=tl-viewwebevent
[13] Good Clinical Data Management Practices. Society for Clinical Data Management, Version 2, January 2002
[14] Ogg GD: A Practical Guide to Quality Management in Clinical Trial Research. Taylor & Francis Group, LLC, Boca Raton, FL, 2006