Shadow Data: What It Is, Where It Hides and Why It Is a Compliance Liability

Apr 9, 2026

Shadow data operates outside the focus of compliance frameworks: unlike structured databases and governed cloud stores, it sits undetected and unmanaged. The category encompasses personal information copied, moved, or stored outside official data pipelines. These copies often lack encryption, access controls, or retention policies. For Data Fiduciaries operating under the Digital Personal Data Protection Act, 2023 (DPDP Act), shadow data represents a liability that conventional security standards cannot address. 

Understanding the Shadow Data Problem 

Shadow data differs conceptually from shadow IT. The latter refers to unauthorised systems or devices; the former concerns the data that systems generate, replicate, and leave behind. A 2026 report by Netskope found that nearly half (47%) of employees using generative AI tools in the workplace do so through personal accounts. This results in an average of 223 data policy violations per organisation per month, involving data the employer cannot trace. Each violation potentially creates a shadow copy of personal data. That data may be sent to an unvetted large language model, stored in a personal account's chat history, or retained on a vendor's server outside the data fiduciary's control. 

The compliance risk stems from three characteristics of shadow data, each undermining a pillar of data protection: inventory, consent, and remediability. First, shadow data escapes inventory management: a data fiduciary cannot list in its records of processing activities data it does not know exists. Second, it escapes the consent framework: the original data principal's consent, obtained for a specific purpose, does not extend to secondary shadow copies processed by unauthorised tools. Third, it is unremediable: once personal data enters a shadow environment, the fiduciary loses the technical ability to modify, erase, or transfer it in response to a data principal's request. 

For instance, a marketing analyst at a financial services firm copies a customer database into a personal Google Sheet to perform offline analysis. The sheet contains names, PAN numbers, and transaction histories. The analyst shares the sheet via a link with three colleagues. Six months later, the analyst leaves the organisation. The Google Sheet remains accessible to the colleagues and to anyone who received the forwarded link. The firm has no record of this copy, no access logs, and no ability to delete it. This copy is shadow data: it is ungovernable, and personal data accessible outside the fiduciary's systems poses an ongoing cybersecurity risk. 

Legal Exposure Under the DPDP Act 2023 

The DPDP Act constructs liability around control of data rather than mere custody. Section 6 requires that personal data be processed only for a lawful purpose for which the data principal has given consent. Shadow data processing typically occurs without a compatible lawful purpose and, critically, without the Data Fiduciary's ability to demonstrate compliance with the security safeguards mandated by Section 8(5). Section 8(5) obliges Data Fiduciaries to implement reasonable security safeguards to prevent personal data breaches. The presence of unmanaged shadow copies directly undermines this obligation: a fiduciary cannot secure data whose location and access patterns it cannot map. 

The penalty structure under the Schedule to the DPDP Act authorises the Data Protection Board to impose financial penalties of up to ₹250 crore per breach for failure to take reasonable security safeguards under Section 8(5). Notably, the existence of unmanaged shadow data can itself evidence that failure and trigger this penalty, separate from any penalty attached to a subsequent breach notification lapse. 

Critically, liability attaches regardless of whether the fiduciary knew about the shadow data. The test is objective: did the fiduciary implement measures that a reasonable person would consider adequate to prevent unauthorised processing? Allowing employees to paste personal data into personal LLM accounts without technical controls almost certainly fails this standard. 

Given the prevalence of AI use in workplaces, consider a healthcare company employee who pastes patient diagnostic reports into a personal ChatGPT account to summarise findings. OpenAI's privacy statement permits using submitted content to improve its models unless users opt out. The patient data becomes part of the model's training corpus. Another user of the same AI tool later receives a response that includes fragments of the diagnostic report. The healthcare company faces a breach notification obligation under Section 8(6) of the DPDP Act, and the penalty could reach ₹250 crore. The company cannot argue that it did not authorise the transfer; the objective test looks only at whether reasonable safeguards were in place.  

Recent enforcement actions in comparable jurisdictions confirm the materiality of this risk. In October 2025, the Australian Federal Court imposed a A$5.8 million penalty on Australian Clinical Labs following a ransomware attack that exposed health records of 223,000 individuals on the dark web. The court found that the company had failed to implement data loss prevention tools, lacked multi-factor authentication, and relied on an inadequate third-party incident response. The case illustrates a broader principle where courts examine the adequacy of preventative measures and not merely the breach's magnitude. 

Where Shadow Data Hides 

Practical experience suggests five common places where shadow data accumulates within regulated entities: 

  • Personal LLM accounts: Employees paste customer information, internal memos, or code into freely available generative AI tools without realising that the input becomes training data or persists on vendor servers. The volume of data being sent to SaaS-based generative AI applications has grown sixfold, from 3,000 to 18,000 prompts per month per organisation. 


  • Legacy backups and decommissioned systems: Organisations retain tape backups, old server images, or spreadsheet archives containing personal data long after the original processing purpose expires. These assets lack active access controls but remain within the fiduciary's legal responsibility. 


  • Data analytics sandboxes: Data scientists routinely copy production datasets into exploratory environments for analysis. These copies often strip out governance metadata but retain personal identifiers, creating unmanaged processing instances. 


  • Business unit spreadsheets: Marketing, human resources, and sales teams maintain local files containing customer or employee personal data. These files are shared via email or USB drives, entirely outside information technology monitoring. 


  • API response caching: Developers cache application programming interface responses containing personal data to improve performance. Without explicit cache invalidation policies, these copies persist indefinitely in memory stores or log files. 
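The caching risk in the last point is concrete enough to sketch. Below is a minimal in-memory cache with mandatory TTL-based eviction and an explicit purge hook, the two controls whose absence lets cached personal data persist indefinitely. This is a hypothetical illustration, not any specific library's API:

```python
import time

class TTLCache:
    """In-memory API response cache with mandatory expiry.

    Without a TTL and an explicit purge path, cached responses
    containing personal data persist indefinitely -- the
    shadow-data failure mode described above.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # evict stale personal data on read
            return None
        return value

    def purge_expired(self):
        """Housekeeping hook: drop every expired entry at once."""
        now = time.monotonic()
        self._store = {k: v for k, v in self._store.items() if v[0] > now}
```

The design point is that expiry is not optional: every `set` carries a deadline, so a forgotten cache cannot quietly become a permanent copy of personal data.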

How GoTrust Operationalises Shadow Data Governance 

Addressing shadow data requires a systematic approach of discovery, classification, monitoring, and remediation. Under the DPDP Act 2023, unidentified data creates significant compliance risk, particularly for erasure requests and breach notification.

  • Discovery and Classification: Automated discovery tools scan structured and unstructured data across cloud, SaaS, on-premises, and hybrid environments to identify personal data and create a complete inventory. Classification systems use machine learning and pattern detection to identify PII, financial records, and health information, mapping each asset to applicable frameworks including DPDP Act requirements. 


  • Continuous Monitoring (DSPM): Data Security Posture Management focuses on the data itself, tracking where sensitive data resides, who has access, and how it is being used. DSPM evaluates access governance by detecting over-privileged users, public exposure, and misconfigurations, enabling organisations to prioritise the highest-risk shadow data first. 


  • Automated Rights Request Workflows: Under Section 12 of the DPDP Act, data principals have the right to correction and erasure, with Rule 14 of the DPDP Rules 2025 requiring organizations to establish mechanisms for receiving and processing such requests. Automated platforms search across databases, cloud storage, and legacy systems including shadow data to retrieve all personal data associated with a requester. The platform verifies identity, executes secure deletion across all identified locations, and maintains audit logs. 


  • Breach Management: Section 8(6) of the DPDP Act requires Data Fiduciaries to notify the Data Protection Board of personal data breaches. Rule 6 mandates continuous monitoring and logging for forensic readiness. An integrated platform maintains immutable logs of discovery results, classification decisions, and access patterns, providing breach reporting with risk assessment, root cause analysis, and notification templates for official reporting. 
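Pattern detection of the kind described under Discovery and Classification can be sketched in a few lines. The regular expressions below, for PAN numbers, email addresses, and Indian mobile numbers, are illustrative only; a production classifier, as the article notes, also uses machine learning and context-aware scoring:

```python
import re

# Illustrative detection patterns -- a real classifier combines these
# with ML-based context scoring and checksum/format validation.
PII_PATTERNS = {
    "pan": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),      # e.g. ABCDE1234F
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone_in": re.compile(r"\b[6-9]\d{9}\b"),            # 10-digit mobile
}

def classify(text):
    """Return the set of PII categories detected in a text blob."""
    return {label for label, pattern in PII_PATTERNS.items()
            if pattern.search(text)}
```

Run against a discovered file's contents, the detected categories feed the inventory and the risk scoring; a blob that matches nothing can be deprioritised, while a PAN hit maps directly to DPDP Act obligations.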

No platform can eliminate all shadow data, and effectiveness depends on API coverage (e.g., Dropbox, Google Workspace, Snowflake, Salesforce). Organisations must therefore combine DSPM tools with employee training and Data Loss Prevention (DLP) rules for unmanaged endpoints to catch what automated discovery cannot, such as screenshots of PII or offline copies. 

From Discovery to Governance: A Unified Framework 

Shadow data is structurally incompatible with the DPDP Act's emphasis on proactive compliance. A fiduciary cannot honour Section 12's right to correction and erasure if it cannot locate all copies of a data principal's information. Nor can it give effect to withdrawal of consent under Section 6 if shadow copies continue to be processed after consent is revoked. 

GoTrust's approach addresses this through a unified framework combining discovery, classification, continuous monitoring, and automated remediation. To illustrate: an employee creates a shadow copy of customer data in a personal Dropbox account. GoTrust's continuous discovery scans the Dropbox environment through API integration. The platform detects the file, classifies its contents as personally identifiable information, and assigns a risk score of 90/100 owing to the lack of encryption and the personal account's weak access controls. The DSPM module evaluates who has access to the Dropbox file and finds that the employee shared it publicly via a shareable link. The platform then triggers an automated remediation workflow: the compliance team receives an alert, the system creates a Jira ticket assigned to the employee's manager, and the employee receives an automated instruction to remove the file and revoke the link. Within two hours, the manager verifies removal. The entire sequence is logged in the audit trail. When the company's annual audit under Rule 13 occurs, the compliance team exports a report showing the incident, the response, and the remediation confirmation, which the auditor accepts as evidence of reasonable security safeguards. 
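The escalation logic in this walkthrough can be sketched as a simple scoring-and-dispatch routine. The schema, weights, and threshold below are hypothetical illustrations chosen to reproduce the 90/100 scenario above, not GoTrust's actual scoring model or integrations:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """A discovered shadow-data asset (hypothetical schema)."""
    location: str
    contains_pii: bool
    encrypted: bool
    publicly_shared: bool
    audit_log: list = field(default_factory=list)

def risk_score(f: Finding) -> int:
    """Additive scoring sketch: PII content plus exposure factors."""
    score = 0
    if f.contains_pii:
        score += 40   # personal data present
    if not f.encrypted:
        score += 25   # no encryption at rest
    if f.publicly_shared:
        score += 25   # open shareable link
    return score

def remediate(f: Finding, threshold: int = 70) -> int:
    """Alert, ticket, and log when the score crosses the threshold."""
    score = risk_score(f)
    if score >= threshold:
        f.audit_log.append(f"ALERT compliance team: {f.location} score={score}")
        f.audit_log.append(f"TICKET assigned to manager for {f.location}")
    return score
```

The audit-trail aspect matters as much as the alerting: each remediation appends to an append-only log, which is what later backs the Rule 13 audit report.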

Conclusion 

Shadow data is the predictable outcome of prioritising access and flexibility over the control and governance that the data protection framework demands. The DPDP Act's liability framework does not distinguish between intentional and accidental processing. For Data Fiduciaries, the compliance obligation is binary: either they maintain visibility and control over all personal data in their possession, or they accept the risk of penalties, litigation, and regulatory enforcement. Compliance therefore requires treating shadow data discovery as a prerequisite, not a remediation step, backed by technical controls that make unauthorised copying difficult. Platforms such as GoTrust's DSPM and DPO Copilot provide the infrastructure for this transition by turning shadow data governance into an automated compliance step that does not wait for a trigger. In a threat environment where personal LLM accounts proliferate and data policy violations double year over year, governance must focus on whether data fiduciaries have adequate security infrastructure in place to detect and manage shadow data.