Program: AdvancedDataCleansingAndAnalysisForPandas

Name: AdvancedDataCleansingAndAnalysisForPandas

Version: 20 (release)

Newest release version: This is the newest release version of this Program.

Newest release or development version: This is the newest version of this Program.

Creator: Florian_Dietz

This Program works on task_data_cleansing_and_analysis_for_pandas to supplement BasicDataCleansingAndAnalysisForPandas.


-Convert string columns whose numerical values end in a percent sign to floats.

-Check numerical columns for outliers, using the interquartile range (IQR). The user is presented with an Option to remove or replace outliers.

-Recognize possible misspellings in string columns that could be categorical. This is for identifying mistakes in categorical columns, not for natural language processing. The user is presented with an Option for each possibly misspelt word and its most likely correction. The identification of possible misspellings follows a simple algorithm that uses Levenshtein distance and compares frequently occurring words to rarely occurring ones.

-Recognize a pair of columns as a pair of geographic coordinates usable with info_geographic_coordinate_column_pair if all values have a valid type and lie within a valid range, and the column names contain 'lat' and 'long'.

-Recognize columns with many place names and mark them with info_geographic_place_name. This is not very effective, but good enough for many columns in practice. It just checks whether enough entries in the column contain the English name of one of the world's largest cities. This evaluation is conservative: if it's unclear, it probably won't suggest an analysis.
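
The checks listed above can be sketched roughly as follows. This is an illustration only and is not taken from the Program's source code: all function names, the 1.5 IQR multiplier, the maximum edit distance of 2, the frequency ratio of 5, and the coordinate ranges of -90 to 90 and -180 to 180 are assumptions chosen for the example.

```python
import pandas as pd


def percent_strings_to_float(col: pd.Series) -> pd.Series:
    """Turn values like '12.5%' into 12.5; return the column unchanged if any value does not fit."""
    stripped = col.astype(str).str.strip()
    if not stripped.str.endswith("%").all():
        return col
    numbers = pd.to_numeric(stripped.str[:-1], errors="coerce")
    return col if numbers.isna().any() else numbers


def iqr_outlier_mask(col: pd.Series, k: float = 1.5) -> pd.Series:
    """Mark values further than k * IQR below the first or above the third quartile (k is assumed)."""
    q1, q3 = col.quantile(0.25), col.quantile(0.75)
    iqr = q3 - q1
    return (col < q1 - k * iqr) | (col > q3 + k * iqr)


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def misspelling_candidates(col: pd.Series, max_distance: int = 2, min_ratio: int = 5) -> dict:
    """Map each rare value to a much more frequent value within a small edit distance."""
    counts = col.value_counts()  # sorted from most to least frequent
    suggestions = {}
    for rare, rare_n in counts.items():
        for common, common_n in counts.items():
            if common_n >= min_ratio * rare_n and levenshtein(rare, common) <= max_distance:
                suggestions[rare] = common  # first hit is the most frequent match
                break
    return suggestions


def looks_like_coordinate_pair(df: pd.DataFrame, lat_col: str, long_col: str) -> bool:
    """Check column names, numeric type, and value ranges for a latitude/longitude pair."""
    if "lat" not in lat_col.lower() or "long" not in long_col.lower():
        return False
    lat, lon = df[lat_col], df[long_col]
    if not (pd.api.types.is_numeric_dtype(lat) and pd.api.types.is_numeric_dtype(lon)):
        return False
    return bool(lat.between(-90, 90).all() and lon.between(-180, 180).all())
```

In this sketch every check only reports a finding; whether to remove outliers or accept a suggested correction is left to the caller, in line with the Options that the Program presents to the user.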

ID: 278

Created: April 9, 2019, 5:12 p.m.

Docker Image:

Source code: Run the following command in a terminal to download the source code: 'lod-tools download-program -f <destination_folder> --name "AdvancedDataCleansingAndAnalysisForPandas" --version 20'

All versions of this Program:

Version 20 (release)

Version 19 (release)

Version 18 (development)

Version 17 (development)

Version 16 (development)

Version 15 (release)

Version 14 (development)

Version 13 (release)

Version 12 (development)

Version 11 (development)

Version 10 (development)

Version 9 (development)

Version 8 (development)

Version 7 (development)

Version 6 (development)

Version 5 (development)

Version 4 (development)

Version 3 (release)

Version 2 (release)

Version 1 (release)