Elody's core idea is that any developer can contribute software and integrate it into Elody, much as anyone can contribute articles to Wikipedia. This is described in more detail here.
We have chosen Data Science as our initial focus area.
Our goal is to provide a free service that automates the most time-consuming and annoying aspects of data science: data preparation, data cleansing, and data exploration. Some more advanced modelling capabilities will also be made available, but our focus is on saving time on these simple tasks.
We want to be the first step of any data scientist whenever they get new data: Before you even look at your data, upload it to Elody (or don't upload it, if you have privacy concerns). We will run hundreds of algorithms against your data, identify any irregularities, fix common problems, and detect and highlight important information.
Because any developer can contribute to this system in a modular way, we can run much more exhaustive analyses and check for a much greater variety of problems than an ordinary program.
If you have any programs of your own that would fit in well, you can add them.
If you find that some features are missing, but you don't want to add them yourself, you can request them. We plan to put bounties on the most requested features, so that Elody grows to solve exactly those problems that people need the most.
An example of data science preparation, cleansing, and exploration
This is an example of a basic data science exploration pipeline that is already implemented. Contributors can improve this example further, add special cases or extensions to it, or create their own independent data science systems. Elody's rating system will ensure that useful software automatically gets used when it's needed, while useless software is ignored.
As a first step, we are asked to upload a file containing the data to analyze. This can be extended to support other inputs as well: Developers can contribute code that reads unusual file formats, or that imports data from other sources, such as databases or APIs.
Elody now calls various programs in the background to ensure that the file is formatted properly and to fix common formatting mistakes. This too can be extended and improved by contributors.
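As an illustration of the kind of formatting fix a contributor might supply, here is a minimal sketch in Python. It is not Elody's actual code: the function names `guess_delimiter` and `normalize_table` are hypothetical, and the set of repairs (byte-order mark, mixed line endings, blank lines, stray whitespace, short rows) is only an assumed selection of common problems.

```python
import csv
import io

# Hypothetical list of delimiters this sketch knows about.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def guess_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def normalize_table(raw_text):
    """Return the rows of a delimited text file with common formatting problems fixed.

    Limitation of this sketch: blank lines are dropped before parsing,
    so quoted cells that contain line breaks are not supported.
    """
    # Remove a leading byte-order mark and unify line endings.
    text = raw_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    lines = [line for line in text.split("\n") if line.strip()]
    if not lines:
        return []
    delimiter = guess_delimiter(lines[0])
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    # Strip stray whitespace around every cell.
    rows = [[cell.strip() for cell in row] for row in reader]
    # Pad rows that are shorter than the header so every row has the same width.
    width = len(rows[0])
    return [row + [""] * (width - len(row)) for row in rows]
```

Calling `normalize_table` on the raw text of an uploaded file returns a list of rows, with the header as the first row, that later steps can rely on being rectangular.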
Once the data is in the correct format, Elody begins applying context-sensitive analysis algorithms to the data. The user is informed of any irregularities and asked whether the suggested corrections should be applied.
There are a number of different kinds of anomalies that can be automatically detected and dealt with. This saves a great deal of time, and can sometimes even discover anomalies that you might not think of on your own. As more developers contribute heuristics to Elody, more and more special cases can be caught automatically.
In this example, Elody detects a likely spelling mistake in the "name" column and corrects it, notices that two columns have missing values that need to be dealt with, and even detects that some of the data is geographical.
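The three checks in this example could be approximated with simple heuristics like the ones below. This is a sketch under stated assumptions, not a description of how Elody implements them: the function names, the list of missing-value markers, the similarity cutoff, and the column names used to recognize coordinates are all hypothetical.

```python
import difflib
from collections import Counter

# Assumed spellings of "no value"; a real heuristic would likely know more.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "nan"}


def count_missing(values):
    """Count cells that are empty or contain a common missing-value marker."""
    return sum(
        1 for v in values
        if v is None or str(v).strip().lower() in MISSING_MARKERS
    )


def suggest_spelling_fixes(values, cutoff=0.8):
    """Map values that occur once to a very similar value that occurs repeatedly."""
    counts = Counter(v for v in values if v)
    frequent = [value for value, n in counts.items() if n > 1]
    fixes = {}
    for value, n in counts.items():
        if n > 1:
            continue
        match = difflib.get_close_matches(value, frequent, n=1, cutoff=cutoff)
        if match:
            fixes[value] = match[0]
    return fixes


def looks_geographic(column_names):
    """Guess from the column names whether the table holds latitude/longitude pairs."""
    names = {name.strip().lower() for name in column_names}
    has_lat = bool(names & {"lat", "latitude"})
    has_lon = bool(names & {"lon", "lng", "long", "longitude"})
    return has_lat and has_lon
```

For instance, `suggest_spelling_fixes` treats a name that appears only once and closely resembles a repeated name as a probable typo. Because such a guess can be wrong, the result is best shown to the user as a suggestion rather than applied silently.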
Once the analysis is concluded, Elody summarizes and visualizes the content of the file. This uses different kinds of visualization depending on the data types.
Some visualizations are context dependent: Because our example data contains latitude/longitude pairs as well as street names, Elody automatically creates an interactive visualization for it by embedding Google Maps.
After this initial analysis, Elody checks if any developers have defined programs that are suitable as a next step.
Because our example data contains both a date column and at least one numerical column, Elody offers a program that creates a timeseries from this data.
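The condition "a date column and at least one numerical column" can be expressed as a small applicability check. The sketch below is illustrative only: it assumes the table is a dictionary mapping column names to lists of strings and that dates use the ISO format `YYYY-MM-DD`, and the names `column_kind`, `timeseries_applies`, and `build_timeseries` are hypothetical.

```python
from datetime import date


def column_kind(values):
    """Classify a column as 'number', 'date', or 'text' from its non-empty cells."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "text"

    def all_parse(parser):
        # True only if every non-empty cell is accepted by the parser.
        for v in present:
            try:
                parser(v)
            except (ValueError, TypeError):
                return False
        return True

    if all_parse(float):
        return "number"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


def timeseries_applies(table):
    """Offer the time series program only if a date and a numeric column exist."""
    kinds = {column_kind(values) for values in table.values()}
    return "date" in kinds and "number" in kinds


def build_timeseries(table, date_col, value_col):
    """Return (date, value) pairs sorted by date, skipping rows with empty cells."""
    pairs = [
        (date.fromisoformat(d), float(v))
        for d, v in zip(table[date_col], table[value_col])
        if d and v
    ]
    return sorted(pairs)
```

Keeping the check (`timeseries_applies`) separate from the work (`build_timeseries`) means the program can be registered once and only surfaces for data it can actually handle.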
It is possible to build on this further: Developers could contribute code that reacts to timeseries data, for example. In this way, every time a program discovers something, Elody automatically suggests new programs that can build on those discoveries.
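One way to picture this chaining is to let each program declare what it needs and what it discovers. The following sketch is an assumption about how such a mechanism could work, not Elody's real interface: the `Program` class, the tag names, and `run_chain` are all hypothetical, and the sketch runs every applicable program automatically, whereas Elody presents them to the user as suggestions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    requires: frozenset  # discoveries that must exist before the program applies
    produces: frozenset  # discoveries the program adds when it runs


def suggest(programs, known_tags):
    """Return the programs whose requirements are all met by the known discoveries."""
    return [p for p in programs if p.requires <= known_tags]


def run_chain(programs, initial_tags):
    """Keep running newly applicable programs until nothing new is discovered."""
    known = set(initial_tags)
    order = []
    progress = True
    while progress:
        progress = False
        for p in suggest(programs, known):
            if p.name not in order:
                order.append(p.name)
                known |= p.produces
                progress = True
    return order, known


programs = [
    Program("forecast", frozenset({"timeseries"}), frozenset({"forecast"})),
    Program(
        "make_timeseries",
        frozenset({"date_column", "numeric_column"}),
        frozenset({"timeseries"}),
    ),
]
order, known = run_chain(programs, {"date_column", "numeric_column"})
```

Here the hypothetical `forecast` program is not applicable at first, but becomes applicable as soon as `make_timeseries` has produced a `timeseries` discovery.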
Elody is already useful for saving time with data analysis tasks, even at this early stage.
As more features get added, Elody will eventually grow to cover most of the data science pipeline.
The best part is that this is all highly contextual and guided by ratings: If a developer has written a program that is only useful in very specific circumstances, then it would normally be very hard to get exposure for that program. But with Elody, the program is only offered when it is appropriate to use, so there is no drawback to adding it to the system.
Notes about future developments
- We are currently in an open beta: We welcome every developer who joins and contributes code, and we will actively support you with any questions you may have. Just write to us at firstname.lastname@example.org or post on the forum.
- If you have any questions, or if there are any features you would like us to add, check out this page, where you can make feature requests and give us feedback.
- We are hiring.
- We have a referral program, both for hiring and for attracting new users and contributors. Write to us at email@example.com to find out more.