Programs
Name Version Rating Creator Description
AdvancedDataCleansingAndAnalysisForPandas 20 (release) - Florian_Dietz This Program works on [[symbol:task_data_cleansing_and_analysis_for_pandas]] to supplement [[program:BasicDataCleansingAndAnalysisForPandas]]. Features: -Convert string columns containing numerical data that ends in a percent sign to floats. -Check numerical columns for outliers, using the interquartile range (IQR). The user is presented with an Option to remove or replace outliers. -Recognize possible misspellings in string columns that could be categorical. This is for identifying mistakes in categorical columns, not for natural language processing. The user is presented with an Option for each possibly misspelt word and its most likely correction. The identification of possible misspellings follows a simple algorithm that uses Levenshtein distance and compares frequently occurring words to rarely occurring ones. -Recognize a pair of columns as a pair of geographic coordinates usable with [[symbol:info_geographic_coordinate_column_pair]] if all numbers have a valid range and type and the column names...
Enrich_Convert_CSV_to_Pandas_Dataframe 3 (release) - initial_tools This program takes a file under the parameter name "input_file". It treats that file as a CSV file and converts it into a pickled Pandas DataFrame file. This program was added after the following programs because the CSV->Pandas conversion turned out to need to be fast, so it could not take two steps: -[[program:Enrich_Convert_Excel_to_Pandas_Dataframe]] -[[program:Enrich_Convert_CSV_to_Excel]] -[[program:Enrich_Convert_Pandas_Dataframe_to_CSV]] This conversion uses standard Pandas functions for parsing a CSV file and performs no modifications whatsoever, loading the file in the most naive way. In particular, headers are not recognized as headers and are left as part of the DataFrame. (A note for new developers joining LOD: This Program is not very good. It was written because something is better than nothing. You are invited to write better alternatives if you encounter any errors.)
Enrich_Convert_Pandas_Dataframe_to_CSV 7 (release) - initial_tools This program takes a file under the parameter name "input_file". It treats that file as a pickled Pandas dataframe file and converts it into a CSV file. The three programs [[program:Enrich_Convert_Excel_to_Pandas_Dataframe]], [[program:Enrich_Convert_CSV_to_Excel]] and [[program:Enrich_Convert_Pandas_Dataframe_to_CSV]] are circular and are called by corresponding rules. This conversion is very simple: It does not attempt to clean anything and makes naive assumptions about file formats. It also can't handle quoted strings or dates. The header and index are dropped. This is simply to ensure that this Program is reversible by using [[program:Enrich_Convert_CSV_to_Excel]] and [[program:Enrich_Convert_Excel_to_Pandas_Dataframe]]. (A note for new developers joining LOD: This Program is not very good. It was written because something is better than nothing. You are invited to write better alternatives if you encounter any errors.)
Enrich_Convert_CSV_to_Excel 7 (release) - initial_tools This program takes a file under the parameter name "input_file". It treats that file as a CSV file and converts it into an Excel file. The three programs [[program:Enrich_Convert_Excel_to_Pandas_Dataframe]], [[program:Enrich_Convert_CSV_to_Excel]] and [[program:Enrich_Convert_Pandas_Dataframe_to_CSV]] are circular and are called by corresponding rules. This conversion is very simple: It does not attempt to clean anything and makes naive assumptions about file formats. It also can't handle quoted strings or dates. (A note for new developers joining LOD: This Program is not very good. It was written because something is better than nothing. You are invited to write better alternatives if you encounter any errors.)
Interact_ask_user_if_they_want_file_conversion 7 (release) - initial_tools This Program will ask the user if they want to start a new [[symbol:task_convert_file_type]]. If the user selects this option, it creates an appropriate task, and once the task is solved it takes the converted file and presents it to the user for download. The particulars of the [[symbol:task_convert_file_type]] depend on the parameters with which this program is called: -the_file : this argument should be the file to convert. The other arguments should each be a Tag with a comment (the symbol and weight don't matter). The comment is used by this Program: -symbol_name_of_require_tag : The name/symbol of the tag that should be used as the primary requirement of the [[symbol:task_convert_file_type]]. -readable_name : The file type as a human-readable word, to be used in a message. -matching_file_endings : A list of possible file endings, separated by pipes (|). This is used for one simple check: if the file already has one of the matching file endings, no conversion is...
Beautify_options_to_select_input_timeseries 10 (release) - tutorial_developer_improved This Program is part of the tutorial. It is called by the Rule [[rule:Combine-options-to-select-input-timeseries]] and takes a list of Options under the parameter 'options_to_summarize'. It deletes all the Options it receives that use a simple 'text' message_component, instead of an 'html' message_component, to display the timeseries excerpt. It then recreates them with some small modifications to make them look more uniform.
Visualize_predicted_timeseries_alternate 10 (release) - tutorial_developer_improved This Program is part of the tutorial. This program takes a file under the parameter name 'original' and one under 'prediction', each of which should be a CSV file containing a single line of data, optionally with a header. It creates a Message that shows both timeseries in a graph. This Message is a bit more complex than the one used by [[program:Visualize_predicted_timeseries]] as it uses an HTML component.
Convert_python_pandas_timeseries_to_simplistic_csv 10 (release) - tutorial_developer_improved This Program is part of the tutorial. This program takes a file under the parameter name "pandas_timeseries_file" that is of format [[symbol:format_is_python_pandas_pickle_dataframe]]. It attempts to transform this into a file of format [[symbol:format_is_csv]] with [[symbol:format_csv_has_header]] with weight=0. I.e., the output file is simply a text file of a single list of numbers that can be treated as a timeseries of equally spaced elements. This program can fail if the conversion is not possible, for example because the timeseries has unevenly spaced elements.
Extract_timeseries_from_excel_file 10 (release) - tutorial_developer_improved This Program is part of the tutorial. This program takes a file under the parameter name "raw_timeseries_file" and attempts to extract a timeseries from it that is properly formatted and cleaned to be suitable for further use by machine learning algorithms. It also takes a parameter "require_timeseries_to_predict_tag", which is a Tag of type [[symbol:require_timeseries_to_predict]]. This Program requires files in Excel format (.xlsx). It can find tables anywhere in the file, but it only works properly if there are no missing values in the table. It outputs files in the Python pandas format. The detection of timeseries works if: -The content is either vertical or horizontal. -The table's x-axis may optionally have a header. -The table's y-axis may optionally have a header. The detection does not work if: -There are missing values in a table. If this happens, this program will treat the gap as the end of the column and will treat the next number as the beginning of another,...
Visualize_predicted_timeseries 10 (release) - tutorial_developer_basic This Program is part of the tutorial. This program takes a file under the parameter name 'original' and one under 'prediction', each of which should be a CSV file containing a single line of data, optionally with a header. It creates a Message that shows both timeseries in a graph.
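
The first three features listed for AdvancedDataCleansingAndAnalysisForPandas (percent strings, IQR outliers, misspelling detection) can be illustrated with a minimal sketch. This is not the Program's actual code: all function names, the 1.5 IQR multiplier, the frequency thresholds, and the maximum edit distance are assumptions chosen for illustration, and whether the real Program keeps "12.5%" as 12.5 or rescales it to 0.125 is not stated in the description.

```python
import pandas as pd


def percent_strings_to_float(column: pd.Series) -> pd.Series:
    """Turn strings like '12.5%' into floats (12.5). Assumption: no rescaling to 0.125."""
    stripped = column.astype(str).str.strip()
    if not stripped.str.endswith("%").all():
        return column  # not a percent column, leave untouched
    numbers = pd.to_numeric(stripped.str[:-1], errors="coerce")
    # If any value fails to parse, keep the original column unchanged.
    return column if numbers.isna().any() else numbers


def iqr_outlier_mask(column: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    q1, q3 = column.quantile(0.25), column.quantile(0.75)
    iqr = q3 - q1
    return (column < q1 - k * iqr) | (column > q3 + k * iqr)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def suggest_corrections(column: pd.Series, rare_max: int = 1,
                        frequent_min: int = 3, max_distance: int = 2) -> dict:
    """Map each rare value to the closest frequent value (thresholds are assumptions)."""
    counts = column.value_counts()
    frequent = [word for word, n in counts.items() if n >= frequent_min]
    suggestions = {}
    for word, n in counts.items():
        if n > rare_max or not frequent:
            continue
        best = min(frequent, key=lambda candidate: levenshtein(word, candidate))
        if levenshtein(word, best) <= max_distance:
            suggestions[word] = best
    return suggestions
```

In this sketch the functions only detect and propose; the decision to remove or replace an outlier, or to accept a correction, would be left to the user through an Option, as the description states.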
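
The naive conversions described for Enrich_Convert_CSV_to_Pandas_Dataframe and Enrich_Convert_Pandas_Dataframe_to_CSV can be approximated with standard pandas calls. The sketch below is an assumption about what "the most naive way" looks like, not the Programs' actual source; the function names and the `output_file` parameter are hypothetical, and the sketch does not reproduce the stated limitations around quoted strings or dates.

```python
import pandas as pd


def csv_to_pickled_dataframe(input_file: str, output_file: str) -> pd.DataFrame:
    """Load a CSV without interpreting the first row as a header, then pickle it."""
    # header=None keeps the header row as ordinary data, so it stays part of the DataFrame.
    frame = pd.read_csv(input_file, header=None)
    frame.to_pickle(output_file)
    return frame


def pickled_dataframe_to_csv(input_file: str, output_file: str) -> None:
    """Write a pickled DataFrame back to CSV, dropping the header and the index."""
    frame = pd.read_pickle(input_file)
    frame.to_csv(output_file, header=False, index=False)
```

Because the first function leaves the header row inside the data and the second writes neither column names nor the index, converting CSV to pickle and back yields a file with the same rows, which matches the reversibility goal mentioned in the descriptions.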