Blog | March 18, 2025
Taming Modern Data Challenges: The Three Vs of Big Data Today
In today’s digital era, it’s no secret that organizations generate and manage vast volumes of data. The term we’ve applied to the data explosion is “Big Data”, which was first coined in the 1990s by John Mashey, a computer scientist who worked at Silicon Graphics. The term gained widespread popularity in the early 2000s.
In previous blog posts over the years, we’ve looked at the growth of Big Data, and – guess what? – it keeps on growing, faster than ever. Per Statista, the total amount of data created, captured, copied, and consumed globally is forecast to reach 182 zettabytes (i.e., 182 billion terabytes) in 2025.
The forecast now extends to 2028: between 2025 and 2028, the world’s data is expected to more than double again, to 394 zettabytes. This explosion of information has necessitated a paradigm shift in how businesses approach information governance, compliance, and eDiscovery.
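For readers who want to check the arithmetic behind those figures, here is a minimal sketch. It assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a terabyte is 10^12 bytes; the function and variable names are illustrative only.

```python
# Sanity check of the forecast figures quoted above.
# Assumption: decimal (SI) units, not binary (zebibyte/tebibyte) units.
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes


def zettabytes_to_terabytes(zb):
    """Convert a whole number of zettabytes to terabytes (SI units)."""
    return zb * ZETTABYTE // TERABYTE


forecast_2025_zb = 182
forecast_2028_zb = 394

tb_2025 = zettabytes_to_terabytes(forecast_2025_zb)
growth_factor = forecast_2028_zb / forecast_2025_zb

print(f"2025 forecast: {tb_2025:,} TB")              # 182,000,000,000 TB
print(f"2025 to 2028 growth: {growth_factor:.2f}x")  # 2.16x
```

So 182 zettabytes is indeed 182 billion terabytes, and 394 zettabytes is a little more than twice the 2025 figure.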
Understanding the Three Vs of Big Data
But Big Data isn’t just about the amount of data. It’s also about the speed at which new data is created and the range of formats in which it arrives. The volume, velocity, and variety of data today are known as the three Vs of Big Data:
- Volume – As noted above, the amount of data generated and stored by organizations has reached staggering levels. With enterprises dealing with petabytes of information, traditional data storage and retrieval methods are no longer sufficient. Managing large volumes of data requires sophisticated strategies to identify, collect, and preserve relevant information without unnecessary burden or cost.
- Velocity – Data is being created at an unprecedented pace. Real-time communications, social media, collaboration tools, and IoT devices contribute to an ever-accelerating flow of information. Organizations must be able to process, analyze, and respond to data in real time, particularly for eDiscovery use cases like regulatory investigations, compliance audits, or (of course) litigation.
- Variety – Modern data comes in multiple formats, including structured (databases, spreadsheets), semi-structured (emails, chat messages), and unstructured (videos, images, audio recordings). The diversity of data sources presents unique challenges in classification, extraction, and review, making it critical to leverage advanced technologies to ensure that relevant data is identified accurately.
These three Vs have fundamentally changed how organizations manage their information, and their influence is particularly pronounced in the field of eDiscovery. As legal and compliance teams grapple with data complexity, understanding the impact of these three Vs is essential for efficient, cost-effective, and defensible eDiscovery management.
How the Big Data Era Is Changing Information Governance
The proliferation of Big Data has required organizations to transform how they govern their information. Traditional data management approaches based on structured repositories and manual review processes are no longer viable. Instead, organizations are increasingly investing in technology-driven solutions that allow for automated classification, predictive analytics, and AI-driven insights. As a result, businesses are shifting toward proactive information governance frameworks to ensure data remains both an asset and a manageable resource, reducing inefficiencies and ensuring compliance with regulatory requirements.
This means that instead of addressing data challenges only when litigation or regulatory inquiries arise, organizations are implementing policies that emphasize data minimization and secure retention strategies. Eliminating redundant, obsolete, or trivial (ROT) data leaves behind the sensitive, useful, and necessary (SUN) data that businesses need and makes data environments more efficient and manageable. A well-structured information governance strategy not only mitigates risk and reduces costs but also prepares organizations to meet legal and compliance requirements efficiently and effectively. It’s no longer a “nice to have” – it’s a “must have” in today’s Big Data world.
The Impact of the Three Vs on eDiscovery
The three Vs of Big Data have created modern data challenges, which can cause eDiscovery to become more complex and costly. Here’s how each of the Vs is influencing eDiscovery practices and how smart organizations are taming these modern data challenges:
Volume: Reducing the Burden Through Advanced Culling and AI
The sheer volume of data subject to legal review has transformed eDiscovery review, requiring review teams to leverage technology to keep up. Advanced techniques for data culling, including early case assessment (ECA) and technology-assisted review (TAR), are critical for reducing the size of datasets before review. AI-powered tools can identify patterns, recognize duplicates, and prioritize the most relevant documents, thereby decreasing costs and improving efficiency.
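To make the idea of duplicate recognition concrete, here is a minimal sketch of exact-duplicate culling by content hash. It is an illustration under simplifying assumptions, not how any particular eDiscovery platform works: the document IDs and the `cull_exact_duplicates` helper are hypothetical, and it only catches byte-for-byte identical content. Near-duplicate detection, email threading, and TAR-style relevance ranking are separate and more involved techniques.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest used as the document's fingerprint."""
    return hashlib.sha256(data).hexdigest()


def cull_exact_duplicates(documents):
    """Split documents into unique items and exact duplicates.

    documents: dict mapping a (hypothetical) document ID to its raw bytes.
    Returns (unique_ids, duplicates), where duplicates maps each duplicate
    ID to the ID of the first document seen with the same content.
    """
    first_seen = {}   # digest -> ID of the first document with that content
    unique_ids = []
    duplicates = {}
    for doc_id, data in documents.items():
        digest = content_hash(data)
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique_ids.append(doc_id)
    return unique_ids, duplicates


sample = {
    "DOC-001": b"Quarterly forecast attached.",
    "DOC-002": b"Lunch on Friday?",
    "DOC-003": b"Quarterly forecast attached.",
}
unique_ids, duplicates = cull_exact_duplicates(sample)
print(unique_ids)   # ['DOC-001', 'DOC-002']
print(duplicates)   # {'DOC-003': 'DOC-001'}
```

The duplicate is recorded against its first occurrence rather than discarded, so the review set shrinks while it remains possible to account for every collected item.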
Velocity: Keeping Pace with Rapid Data Growth and Real-Time Communications
Velocity presents unique challenges for eDiscovery, particularly in dealing with ephemeral and real-time communications, along with related issues such as hyperlinked files. Tools such as Slack, Microsoft Teams, and other chat-based platforms generate data at an extraordinary rate, often with minimal built-in retention mechanisms.
Keeping up involves the use of automated data capture solutions that preserve relevant communications before they are lost, cloud-based eDiscovery solutions that can process and analyze high-velocity data efficiently, and AI-driven analytics to rapidly assess risk and prioritize the most important documents.
Variety: Handling Diverse Data Formats and Sources
The variety of data sources adds layers of complexity to eDiscovery, as legal teams must manage emails, text messages, social media content, and audio and video files, with even more diverse formats likely to come.
Key approaches to managing data variety include:
- Unified eDiscovery Platforms – Solutions that consolidate structured and unstructured data sources, enabling comprehensive search and review.
- Multimodal Analytics – AI-powered analytics capable of processing text, images, audio, and video files to extract meaningful insights.
- Metadata Preservation – Ensuring that important metadata (timestamps, geolocation, authorship) remains intact for defensibility in legal proceedings.
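As a rough illustration of the metadata preservation point, the sketch below records a few filesystem-level facts about a file, plus a content hash, before anything else touches it. It is a minimal example using only the Python standard library, and the `capture_metadata` helper and its field names are assumptions made for illustration. Embedded metadata such as authorship or geolocation lives inside specific file formats and would need format-aware tooling that is not shown here.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def capture_metadata(path):
    """Record size, last-modified time (UTC), and a SHA-256 hash for a file.

    The file is opened read-only. Reading can update the access time on
    some filesystems, so the modification time is recorded instead.
    """
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "path": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": digest,
    }


# Demonstration on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "memo.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"Sample memo")
    record = capture_metadata(sample_path)

print(record["size_bytes"])  # 11
print(record["sha256"])
```

Storing a record like this alongside the collected copy makes it possible to show later that the content and its recorded timestamp have not changed since collection.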
Taming the modern data challenges illustrated by the three Vs of Big Data requires tools that support the variety of modern data sources, enabling legal teams to improve accuracy and efficiency in eDiscovery while reducing the risk of missing critical evidence.
Conclusion
The three Vs of the Big Data era – Volume, Velocity, and Variety – have fundamentally altered how organizations manage information, and nowhere is this more apparent than in eDiscovery. Over the next several weeks, we will discuss taming various modern data formats and challenges that eDiscovery professionals face today, including:
- Mobile Devices and Possession, Custody & Control: Mobile device challenges today, including BYOD and possession, custody & control considerations.
- Enterprise Solutions and Collaboration Apps: The explosion of enterprise solutions (like Google Workspace and M365) and collaboration apps (like Slack and Teams) and the challenges they create for legal teams in discovery.
- Defining a “Conversation”: Considerations and current lack of standards (including case law variations) regarding defining a conversation in modern message formats, like text or chat messages.
- Hyperlinked Files: Considerations and challenges associated with hyperlinked files and the discussion about whether to treat them as “modern attachments”.
- Case Law Trends Regarding Hyperlinked Files: Recent case law rulings regarding whether courts are treating hyperlinked files as “modern attachments”.
- Structured Data: Trends and considerations associated with discovery of structured data from databases and other structured data sources.
- Emojis: Current considerations and case law trends associated with discovery of emojis.
- Generative AI Created Content: Considerations for addressing content created by generative AI solutions, which is an emerging data source.
- The Importance of Information Governance (IG): The importance of an IG program in preparing your legal team for modern data discovery challenges.
- Legal Data Intelligence: The Legal Data Intelligence (LDI) initiative and how it can help legal teams streamline their modern data workflows in discovery.
In our next post in the series, we will discuss considerations for taming data from mobile devices!