Raw data sets need transformation, enrichment, and cleanup before they can be used effectively.
Data sources often provide data as-is rather than in the form you need. The shape, format, and encoding of the information is whatever happens to be practical for the system that generated it. As a result, data extracts are often cryptic, like this:
C|Acme Inc
N|982
I|MLC|27|90.34
I|BBL|4|9.99
N|989
I|QTA|2|10.00
I|EAD|3|4.35
...
The information content is far from self-evident. The data uses internal mnemonics, identifiers that need to be looked up, and a non-uniform structure, which implies that several kinds of records, and relationships between them, are present.
Tweakstreet lets you automate data preparation: taking the data as it comes from the source and transforming it into a usable asset. The process typically consists of several conceptual phases: transformation, enrichment, and cleanup.
The structure of the data set needs to be transformed to fit its intended use. Staying with the example above, the data encodes the following logical structure:
Company: Acme Inc
  Invoice: 982
    Line Item: [sku: MLC, count: 27, price: 90.34]
    Line Item: [sku: BBL, count: 4, price: 9.99]
  Invoice: 989
    Line Item: [sku: QTA, count: 2, price: 10.00]
    Line Item: [sku: EAD, count: 3, price: 4.35]
  ...
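To make the restructuring step concrete, here is a minimal plain-Python sketch of the same idea. It is not how Tweakstreet implements it. It assumes the record type codes inferred from the example: C is a company, N is an invoice number, and I is a line item with sku, count, and price. The function name parse_extract is hypothetical.

```python
from decimal import Decimal

# The example extract from above, one record per line.
RAW = """\
C|Acme Inc
N|982
I|MLC|27|90.34
I|BBL|4|9.99
N|989
I|QTA|2|10.00
I|EAD|3|4.35"""


def parse_extract(text):
    """Turn the flat record stream into nested company -> invoices -> line items.

    Assumed record types (inferred from the example, not a documented format):
    C = company, N = invoice number, I = line item (sku, count, price).
    """
    companies = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        kind, *fields = line.split("|")
        if kind == "C":
            companies.append({"company": fields[0], "invoices": []})
        elif kind == "N":
            # An invoice belongs to the most recently seen company.
            companies[-1]["invoices"].append({"invoice": fields[0], "line_items": []})
        elif kind == "I":
            # A line item belongs to the most recently seen invoice.
            sku, count, price = fields
            companies[-1]["invoices"][-1]["line_items"].append(
                {"sku": sku, "count": int(count), "price": Decimal(price)}
            )
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return companies
```

The parser relies on record order: each line attaches to the most recent parent record, which is what the flat extract implies. Once nested like this, the result can be serialized as JSON or flattened back into table rows for a SQL database or spreadsheet.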
Tweakstreet lets you form and shape such data structures from raw source records. You can then store them in whatever form suits your use case, such as a SQL database, JSON, XML, CSV files, Excel files, or online spreadsheets.
Data sets often contain internal mnemonics or IDs that need to be resolved or looked up in a reference system. Who, after all, would know that SKU BBL refers to a bucket of blue paint from the Paints product category? That information has to be added to the data set for it to be useful.
Company: Acme Inc
  Invoice: 982
    Line Item: [item: Men's Leather Shoes, category: Shoes, count: 27, price: 90.34]
    Line Item: [item: Bucket of Blue Paint, category: Paints, count: 4, price: 9.99]
  Invoice: 989
    Line Item: [item: Terracotta Vase, category: Earthenware, count: 2, price: 10.00]
    Line Item: [item: Chocolate Drink, category: Food, count: 3, price: 4.35]
  ...
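The lookup step can be sketched in plain Python as well. In this sketch an in-memory dictionary stands in for the reference system. SKU_CATALOG and enrich are hypothetical names. In practice the reference data would come from a database, a reference file, or an API.

```python
# Hypothetical reference data, taken from the example above. A real pipeline
# would load this from a database, a reference file, or an online API.
SKU_CATALOG = {
    "MLC": ("Men's Leather Shoes", "Shoes"),
    "BBL": ("Bucket of Blue Paint", "Paints"),
    "QTA": ("Terracotta Vase", "Earthenware"),
    "EAD": ("Chocolate Drink", "Food"),
}


def enrich(line_item, catalog=SKU_CATALOG):
    """Replace the sku mnemonic with the item name and category it refers to."""
    try:
        item, category = catalog[line_item["sku"]]
    except KeyError:
        # An unresolvable sku is a data problem worth surfacing, not hiding.
        raise KeyError(f"sku not found in reference data: {line_item['sku']!r}") from None
    rest = {key: value for key, value in line_item.items() if key != "sku"}
    return {"item": item, "category": category, **rest}
```

Here an unknown SKU raises an error. A production pipeline might instead route such records to a rejects list, which leads directly into the cleanup phase.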
Tweakstreet lets you look up reference data from any data source, such as databases, reference files, or online APIs.
Most data sets need cleanup before being processed further. Invalid or incomplete records need to be identified, then corrected or filtered out.
Tweakstreet makes it easy to:
- Identify data exchange format problems
- Guard against unexpected format changes
- Validate data against plausibility rules
- Fix or redirect problematic records
- Collect bad records for further inspection and discussion with data suppliers
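As a rough illustration of the checks listed above, the following plain-Python sketch validates line-item records from our example. It keeps valid records and collects rejected ones together with the reasons for rejection. The rules shown are assumptions made for the example: three-letter uppercase SKUs, positive integer counts, and non-negative prices. They are not rules from any real supplier. The function names are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation


def check_line_item(fields):
    """Return the problems found in an 'I' record's fields (empty list if OK).

    All rules here are illustrative assumptions, not a real supplier's spec.
    """
    if len(fields) != 3:
        # A changed field count hints that the exchange format changed upstream.
        return [f"expected 3 fields, got {len(fields)}"]
    sku, count, price = fields
    problems = []
    if not re.fullmatch(r"[A-Z]{3}", sku):
        problems.append(f"malformed sku {sku!r}")
    if not re.fullmatch(r"[0-9]+", count) or int(count) == 0:
        problems.append(f"count must be a positive integer, got {count!r}")
    try:
        if Decimal(price) < 0:
            problems.append(f"negative price {price!r}")
    except InvalidOperation:
        # Raised for unparseable text such as "ten" (and for NaN comparisons).
        problems.append(f"price is not a number: {price!r}")
    return problems


def split_records(lines):
    """Separate valid line-item records from rejected ones, keeping the reasons."""
    good, rejected = [], []
    for line in lines:
        kind, *fields = line.split("|")
        if kind != "I":
            continue  # this sketch only validates line items
        problems = check_line_item(fields)
        if problems:
            rejected.append((line, problems))
        else:
            good.append(line)
    return good, rejected
```

Keeping each rejected line together with its list of problems makes it straightforward to write the rejects to a separate file for inspection, or to discuss them with the data supplier.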
Data is only useful when prepared
Whether you're training an ML model, preparing a custom report, or loading data warehouse tables, you'll always need to take raw data as you find it and make it usable.
With Tweakstreet you can design and automate that process interactively and visually, turning cryptic data sources into queryable information, and therefore into insights.
Watch it in action!
Share your pain points, ask questions, and challenge us with data problems, and we'll address them in a demo tailored for you.