Search Legislation

Big Data for Law

The National Archives has received 'big data' funding from the Arts and Humanities Research Council (AHRC) to deliver the 'Big Data for Law' project. Just over £550,000 will enable the project to transform how we understand and use current legislation, delivering a new service – legislation.gov.uk Research – by March 2015.

There are an estimated 50 million words in the statute book, with 100,000 words added or changed every month. Search engines and services like legislation.gov.uk have transformed access to legislation. Law is accessed by a much wider group of people, the majority of whom are typically not legally trained or qualified.

All users of legislation are confronted by the volume of legislation, its piecemeal structure, frequent amendments, and the interaction of the statute book with common law and European law. Not surprisingly, many find the law difficult to understand and comply with.

There has never been a more relevant time for research into the architecture and content of law, the language used in legislation and how, through interpretation by the courts, it is given effect. Research that will underpin the drive to deliver good, clear and effective law.

Researchers typically lack the raw data, the tools, and the methods to undertake research across the whole statute book. Meanwhile, the combination of low cost cloud computing, open source software and new methods of data analysis - the enablers of the big data revolution - are transforming research in other fields.

Big data research is perfectly possible with legislation if only the basic ingredients - the data, the tools and some tried and trusted methods - were as readily available as the computing power and the storage. The vision for this project is to address that gap by providing a new Legislation Data Research Infrastructure at research.legislation.gov.uk. Specifically tailored to researchers’ needs, it will consist of downloadable data, online tools for end-users; and open source tools for researchers to download, adapt and use.

There are three main areas for research:

  • Understanding researchers’ needs: to ensure the service is based on evidenced need, capabilities and limitations, putting big data technologies in the hands of non-technical researchers for the first time.
  • Deriving new open data from closed data: no one has all the data that researchers might find useful. For example, the potentially personally identifiable data about users and usage of legislation.gov.uk cannot be made available as open data but is perfect for processing using existing big data tools; eg to identify clusters in legislation or "recommendations" datasets of "people who read Act A or B also looked at Act Y or Z". The project will look whether it is possible to create new open data sets from this type of closed data. An N-Grams dataset and appropriate user interface for legislation or related case law, for example, would contain sequences of words/phrases/statistics about their frequency of occurrence per document. N-Grams are useful for research in linguistics or history, and could be used to provide a predictive text feature in a drafting tool for legislation.
  • Pattern language for legislation: We need new ways of codifying and modelling the architecture of the statute book to make it easier to research its entirety using big data technologies. The project will seek to learn from other disciplines, applying the concept of a ‘pattern language’ to legislation. Pattern languages have revolutionised software engineering over the last twenty years and have the potential to do the same for our understanding of the statute book. A pattern language is simply a structured method of describing good design practices, providing a common vocabulary between users and specialists, structured around problems or issues, with a solution. Patterns are not created or invented – they are identified as ‘good design’ based on evidence about how useful and effective they are. Applied to legislation, this might lead to a common vocabulary between the users of legislation and legislative drafters, to identifying useful and effective drafting practices and solutions that deliver good law. This could enable a radically different approach to structuring teaching materials or guidance for legislators.