Carnival description
Carnival is a data-masking application that obfuscates the data stored in a database, so enterprises can hide client data and comply with applicable laws.
Many enterprises have confidential data to protect. This may be business-critical data or customer data whose public disclosure could be catastrophic for the enterprise, whether through losing market leadership or facing lawsuits from angry customers (or, even worse, ex-customers).
The application is meant to create realistic (though not real) data from a production database, so QA teams and developers can be confident that their application will work in production while the data stays confidential.
Goal
This project aims to build a data-masking tool for enterprises. It is intended to follow Microsoft's best practices and patterns (one way to achieve this is to use Enterprise Library 3.0) and to work with recent technologies such as WPF and WCF.
Specification
- Built-in steps
- Shuffle data.
- Replace with mock data.
- Insert data from other columns or tables.
- Data generator
- Random data (supporting different types and restriction rules).
- Algorithmically correct data (e.g. DNI (Spanish ID) numbers).
- From stored data sources (databases, XML, flat files ...).
- Shuffler (shuffles existing data).
- Encrypt data
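The shuffle, replace, and "algorithmically correct data" items above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself targets .NET), and every function name here is hypothetical rather than part of Carnival. The DNI check letter is computed with the public rule: the 8-digit number modulo 23 indexes into a fixed letter table.

```python
import random

# Official DNI check-letter table: index = number % 23.
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_letter(number: int) -> str:
    """Return the check letter for an 8-digit DNI number."""
    return DNI_LETTERS[number % 23]


def random_dni(rng: random.Random) -> str:
    """Generate a random DNI that is formally valid but not real data."""
    number = rng.randrange(0, 100_000_000)
    return f"{number:08d}{dni_letter(number)}"


def shuffle_column(rows, column, rng):
    """Shuffle one column's values among the rows (hypothetical step).

    The original rows are left untouched; new dicts are returned.
    """
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def replace_column(rows, column, generator):
    """Replace one column with generated mock values (hypothetical step)."""
    return [{**row, column: generator()} for row in rows]


if __name__ == "__main__":
    rng = random.Random()
    customers = [
        {"id": 1, "name": "Ana", "dni": "00000000T"},
        {"id": 2, "name": "Luis", "dni": "00000001R"},
    ]
    masked = shuffle_column(customers, "name", rng)
    masked = replace_column(masked, "dni", lambda: random_dni(rng))
    print(masked)
```

Shuffling keeps the real value distribution of a column while breaking the link to each row; generated values such as the DNI still pass format validation in the application under test.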
Technical objectives
These are the main objectives of the application:
- Multiple database support (mainly SQL Server and Oracle).
- No database server manipulation required.
- UI and "business logic" (masking engine) decoupled.
- Multiple UIs (rich client and ASP.NET).
- Scriptable.
- Multilanguage support.
- Open format (XML) for storing configuration and mock data.
- Multithreaded.
- Generate engine API documentation with Sandcastle.
- Keep performance in mind.
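To make the "open format (XML)" and "UI and engine decoupled" objectives more concrete, here is a rough sketch of how a masking configuration might be read into a plain data structure. The XML element and attribute names (masking, table, column, step, source) and the file name are invented for this example; they are assumptions, not Carnival's actual schema. Python is used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the schema is an assumption for this sketch.
SAMPLE_CONFIG = """
<masking>
  <table name="Customers">
    <column name="LastName" step="shuffle"/>
    <column name="Email" step="replace" source="mock-emails.xml"/>
  </table>
</masking>
"""


def load_plan(xml_text):
    """Parse the configuration into (table, column, step, options) tuples."""
    root = ET.fromstring(xml_text.strip())
    plan = []
    for table in root.findall("table"):
        for column in table.findall("column"):
            # Any attribute other than name/step is passed through as an option.
            options = {
                key: value
                for key, value in column.attrib.items()
                if key not in ("name", "step")
            }
            plan.append(
                (table.get("name"), column.get("name"), column.get("step"), options)
            )
    return plan


if __name__ == "__main__":
    for entry in load_plan(SAMPLE_CONFIG):
        print(entry)
```

Because the plan is plain data, a rich client, a web UI, or a script could each produce or consume the same file without touching the masking engine.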
Future milestones
- Make masking steps and data generation extensible, so anyone can add a custom step by referencing their own step assembly (e.g. a generator for US Social Security numbers).
- Complex data-masking techniques (such as preserving relationships).
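Both milestones can be sketched together, under stated assumptions. A small registry stands in for loading custom steps (in .NET this would be done by referencing an assembly; the decorator below is only a Python analogue), and a keyed hash is one possible way to keep relationships intact: the same input always maps to the same masked value, so keys still match across tables. All names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry of masking steps, keyed by step name.
STEP_REGISTRY = {}


def masking_step(name):
    """Decorator that registers a function as a named masking step."""
    def register(func):
        STEP_REGISTRY[name] = func
        return func
    return register


@masking_step("pseudonymize")
def pseudonymize(value, secret: bytes) -> str:
    """Deterministically replace a value using HMAC-SHA256.

    Equal inputs give equal outputs, so joins between tables survive.
    """
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def mask_tables(tables, column, step_name, secret):
    """Apply one registered step to the same column in every table."""
    step = STEP_REGISTRY[step_name]
    return {
        name: [
            {**row, column: step(row[column], secret)} if column in row else dict(row)
            for row in rows
        ]
        for name, rows in tables.items()
    }


if __name__ == "__main__":
    tables = {
        "customers": [{"customer_id": 17, "name": "Ana"}],
        "orders": [{"order_id": 1, "customer_id": 17}],
    }
    print(mask_tables(tables, "customer_id", "pseudonymize", b"demo-secret"))
```

A deterministic mapping trades some privacy for referential integrity: anyone holding the secret can recompute the mapping, so the secret would need the same protection as the original data.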
Feedback
- Any kind of feedback is more than appreciated.