Consumer wearable devices have become common tools for health tracking, but accuracy varies widely depending on what is being measured and how the device is used. Research and regulatory review show that some measurements are reliable in controlled settings, while others remain noisy or biased in real-world use. Clinicians and public-health researchers emphasize that wearables are best treated as adjuncts to care, not replacements for clinical tests.
Which measurements are relatively reliable?
Heart rate measured by wrist photoplethysmography is generally acceptable at rest. Michael Snyder at Stanford University and colleagues have demonstrated that continuous heart-rate trends from consumer devices can detect meaningful changes in physiology, such as illness-related deviations. The U.S. Food and Drug Administration has cleared specific smartwatch features, for example ECG and irregular rhythm notification tools, after clinical validation; FDA clearance applies to particular algorithms and modes, not the whole device. Step counting and basic activity duration are reasonably accurate for tracking trends in most people, a finding consistent with physical-activity measurement work led by Kelly Evenson at the University of North Carolina. Sleep staging, calorie expenditure, and advanced hemodynamics remain less reliable because they depend heavily on proprietary algorithms and modeling assumptions.
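The idea of detecting illness-related deviations from a personal heart-rate baseline can be illustrated with a simple rolling z-score rule. This is a minimal sketch, not any vendor's or research group's actual algorithm; the function name, window length, and threshold are illustrative assumptions.

```python
import statistics

def flag_deviations(resting_hr, window=28, z_threshold=3.0):
    """Flag days whose resting heart rate deviates sharply from a
    rolling personal baseline (illustrative z-score rule, not a
    published or clinically validated method)."""
    flagged = []
    for i in range(window, len(resting_hr)):
        baseline = resting_hr[i - window:i]
        mean = statistics.fmean(baseline)
        sd = statistics.stdev(baseline)
        # Skip flat baselines (sd == 0) to avoid dividing by zero.
        if sd > 0 and abs(resting_hr[i] - mean) / sd > z_threshold:
            flagged.append(i)
    return flagged

# A stable baseline around 60 bpm, then an illness-like jump to 74 bpm:
series = [60, 61, 59, 60, 62, 60, 61] * 4 + [74]
print(flag_deviations(series, window=28))  # → [28], the final day
```

Real systems are far more sophisticated (multi-signal models, motion filtering, personalized priors), but the core logic is the same: compare today's value to the wearer's own recent history, not to a population norm.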
Causes and consequences of measurement errors
Most consumer wearables rely on photoplethysmography (PPG) sensors or inertial measurement units (chiefly accelerometers). PPG measures blood-volume changes with light, and accuracy depends on skin tone, ambient light, wrist placement, device fit, and motion. Studies have documented systematic underperformance of some PPG-based readings in people with darker skin, producing health-equity concerns and potential misdiagnoses for marginalized groups. Motion artifacts during vigorous exercise can reduce heart-rate accuracy and lead to erroneous calorie or activity estimates. Algorithmic differences across manufacturers mean one device's step count or sleep score may not compare directly to another's.
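A common way to extract heart rate from a PPG waveform is to find its dominant spectral peak within a physiologically plausible band. The sketch below, a simplified illustration rather than any manufacturer's pipeline, uses NumPy's FFT on a synthetic signal; the frequency band and sample rate are assumed values.

```python
import numpy as np

def estimate_bpm(ppg, fs, lo=0.7, hi=3.5):
    """Estimate heart rate as the dominant spectral peak of a PPG
    window, restricted to 0.7-3.5 Hz (roughly 42-210 bpm).
    Illustrative only: real devices add motion cancellation and
    tracking across windows."""
    ppg = ppg - np.mean(ppg)                  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

fs = 50.0                              # 50 Hz sample rate (assumed)
t = np.arange(0, 30, 1 / fs)           # 30-second analysis window
clean = np.sin(2 * np.pi * 1.2 * t)    # synthetic 1.2 Hz pulse ≈ 72 bpm
print(estimate_bpm(clean, fs))         # → 72.0
```

This also shows why motion hurts: arm swing during running adds periodic energy in the same 0.7-3.5 Hz band, so a naive peak-picker can lock onto cadence instead of pulse, which is one reason vendors pair PPG with accelerometer-based artifact rejection.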
Clinical and social consequences follow from these limitations. False negatives—missed arrhythmias or infections—can delay needed care, while false positives generate anxiety and unnecessary medical visits, straining clinical services and adding cost. Cultural factors shape how wearables are used: in some communities devices are embraced for preventive health and fitness, while in others cost, privacy concerns, or distrust of technology limit uptake. Environmental context matters too; occupations involving heavy manual labor can produce poor data quality if devices shift or are covered during work.
Practical use requires judgment: treat wearable outputs as trend indicators, not definitive diagnoses. When a device flags a serious issue, seek clinical confirmation with validated medical testing. Industry and academic collaborations, along with regulatory oversight by the U.S. Food and Drug Administration and ongoing research from groups such as Eric Topol's at Scripps Research, are improving validation standards and transparency. Continued attention to algorithmic fairness, community-specific validation, and clear communication will determine whether wearables fulfill their promise to support equitable, practical health monitoring.