Version: 20 (release)
Newest release version: This is the newest release version of this Program.
Newest release or development version: This is the newest version of this Program.
This Program works on task_data_cleansing_and_analysis_for_pandas to supplement BasicDataCleansingAndAnalysisForPandas.
-Convert string columns whose numerical values end in a percent sign to floats.
-Check numerical columns for outliers using the interquartile range (IQR). The user is presented with an Option to remove or replace outliers.
-Recognize possible misspellings in strings that could be categorical. This is for identifying mistakes in categorical columns, not for natural language processing. The user is presented with an Option for each possibly misspelt word and its most likely correction. The identification of possible misspellings follows a simple algorithm that uses Levenshtein distance and compares frequently occurring words to rarely occurring ones.
-Recognize a pair of columns as a pair of geographic coordinates usable with info_geographic_coordinate_column_pair if all values have a valid type and range and the column names contain 'lat' and 'long'.
-Recognize columns with many place names and mark them with info_geographic_place_name. This is not very effective, but good enough for many columns in practice. It just checks whether enough entries in the column contain the English name of one of the world's largest cities. This evaluation is conservative: if the result is unclear, it probably won't suggest an analysis.
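
The first three checks above can be sketched in a few lines. This is only an illustration of the general techniques (percent parsing, the IQR rule, and Levenshtein distance between rare and frequent values), not the Program's actual source code. It uses plain Python lists instead of pandas columns to stay dependency-free, and all function names, the factor 1.5, and the frequency and distance thresholds are assumptions chosen for the example.

```python
from collections import Counter
from statistics import quantiles


def percent_to_float(values):
    """Convert strings such as '12.5%' to floats; return None if any entry does not fit."""
    converted = []
    for value in values:
        text = value.strip()
        if not text.endswith("%"):
            return None
        try:
            converted.append(float(text[:-1]))
        except ValueError:
            return None
    return converted


def iqr_outliers(numbers, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    q1, _, q3 = quantiles(numbers, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def suggest_corrections(values, max_distance=2, rare_max=1, frequent_min=3):
    """Map each rare value to the closest frequent value within max_distance.

    The thresholds are hypothetical; the Program's real cut-offs are not documented here.
    """
    counts = Counter(values)
    frequent = [word for word, count in counts.items() if count >= frequent_min]
    suggestions = {}
    for word, count in counts.items():
        if count > rare_max:
            continue
        candidates = [(levenshtein(word, f), f) for f in frequent]
        candidates = [(d, f) for d, f in candidates if 0 < d <= max_distance]
        if candidates:
            suggestions[word] = min(candidates)[1]
    return suggestions
```

For example, suggest_corrections(["Berlin"] * 4 + ["Paris"] * 3 + ["Berlim", "Tokyo"]) proposes "Berlin" for "Berlim" and leaves "Tokyo" alone, because no frequent value is close enough to it. Comparing only rare values against frequent ones keeps the number of distance computations small and avoids flagging two legitimately common categories as misspellings of each other.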
Created: April 9, 2019, 5:12 p.m.
Docker Image: elody.com:444/advanceddatacleansingandanalysisforpandas@sha256:e848d8fdd06d5d29fbdf959b97572d63e0c1b904984cace17f628f65d1575870
Source code: Run the following command in a terminal to download the source code: 'lod-tools download-program -f <destination_folder> --name "AdvancedDataCleansingAndAnalysisForPandas" --version 20'
All versions of this Program:
Version 20 (release)