PHP Classes

Sweeper: Clean HTML to remove unwanted tags and attributes

Version: sweeper 2.6
License: Freeware
PHP version: 5
Categories: HTML, PHP 5, Parsers
Description 

Author

This package can clean HTML to remove unwanted tags and attributes.

It is based on Mihai Sucan's ReTidy package and it uses regular expressions, DOM and XPath to find and remove the unwanted HTML code.

The package can also reformat HTML tables to improve accessibility, automatically generate a table of contents, and restructure content.
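To make the table accessibility idea concrete, here is a minimal sketch of one common fix: adding a scope attribute to header cells that lack one. It is not taken from the Sweeper source. The function name add_th_scope and the rule used to tell row headers from column headers are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of Sweeper: adds scope="col" or scope="row"
// to <th> cells that do not already declare a scope.
function add_th_scope(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup
    // Wrap the fragment in a <div> so it can be read back afterwards.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root = $doc->getElementsByTagName('div')->item(0);

    foreach ($xpath->query('.//th[not(@scope)]', $root) as $th) {
        // Assumed rule: a header cell that opens its row and is followed by
        // data cells is a row header; every other header cell is a column header.
        $isRowHeader = $xpath->evaluate(
            'count(preceding-sibling::*) = 0 and count(following-sibling::td) > 0',
            $th
        );
        $th->setAttribute('scope', $isRowHeader ? 'row' : 'col');
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo add_th_scope(
    '<table><tr><th>Name</th><th>Age</th></tr>'
    . '<tr><th>Ann</th><td>30</td></tr></table>'
);
```

Cells that already carry a scope attribute are left untouched, so the sketch can be run repeatedly over the same markup.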

Name: Jill Lingoff <contact>
Classes: 3 packages
Country: France
Innovation award nominee: 1x

 


Example

<meta charset="utf-8" />
Run sweeper.php Script
<form method="POST" action="sweeper.php" style="margin-top:0;">
<br>
Profile: <br>
<select style="WIDTH: 350px;" name="profile">

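<?php

// Collect the profile names from the files in the "profiles" directory.
$directory = "profiles";
$handle = opendir($directory);

$profiles_array = array();
if ($handle !== false) {
    // readdir() returns false once every entry has been read.
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != ".." && !is_dir($directory . '/' . $file)) {
            // The profile name is the part of the file name before the first dot.
            $dot = strpos($file, ".");
            $profiles_array[] = ($dot === false) ? $file : substr($file, 0, $dot);
        }
    }
    closedir($handle);
}
// readdir() does not guarantee any order (notably on Linux), so sort explicitly.
sort($profiles_array, SORT_NATURAL | SORT_FLAG_CASE);
foreach ($profiles_array as $profile) {
    $profile = htmlspecialchars($profile);
    print("<option value=\"" . $profile . "\">" . $profile . "</option>\r\n");
}

?>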

</select><br><br>
<div id="EngDepDiv">
Path: <input type="text" name="acronym_path" size="70"> (in the abbr folder)<br>
</div>
<br>
<div style="float: left;">
English Template: <br>
<select style="WIDTH: 350px;" name="EngTemplate">
<option value=""></option>
<option value="none">none</option>
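<?php

$directory = "Templates";

print_template_options($directory);

// Recursively print an <option> for every template file found under $source.
function print_template_options($source) {
    if (is_dir($source)) {
        $d = dir($source);
        while (FALSE !== ($entry = $d->read())) {
            if ($entry == '.' || $entry == '..') {
                continue;
            }
            $Entry = $source . '/' . $entry;
            if (is_dir($Entry)) {
                // Do not descend into nested folders named "Templates".
                if ($entry != 'Templates') {
                    print_template_options($Entry);
                }
                continue;
            }
            // Compare against false explicitly: strpos() can return position 0.
            if (strpos($Entry, ".html") !== false || strpos($Entry, ".htm") !== false || strpos($Entry, ".asp") !== false || strpos($Entry, ".xml") !== false) {
                print("<option value=\"" . htmlspecialchars($Entry) . "\">" . htmlspecialchars($Entry) . "</option>\r\n");
            }
        }
        $d->close();
    }
    else {
        // $source is a single file rather than a directory.
        print("<option value=\"" . htmlspecialchars($source) . "\">" . htmlspecialchars($source) . "</option>\r\n");
    }
}

?>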
</select>
</div>

<div style="float: left;margin-left: 10px;">
French Template: <br>
<select style="WIDTH: 350px;" name="FraTemplate">
<option value=""></option>
<option value="none">none</option>
<?php

$directory = "Templates";

print_template_options($directory);

?>
</select>
</div><br><br><br>

Source: <br><input type="text" name="source" value="not-swept" size="70"><br>
Target: <br><input type="text" name="target" value="swept" size="70"><br>

<br>
<input type="submit">
</form>


Details

sweeper

Sweeper is an HTML code cleaner based on Mihai Șucan's ReTidy. It is written in PHP and mostly uses regular expressions, DOM and XPath.
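As an illustration of the DOM and XPath approach, the following sketch removes a list of unwanted tags (keeping their contents) and a list of unwanted attributes from an HTML fragment. It is not Sweeper's own code. The function name strip_unwanted and its parameters are hypothetical, and it relies only on PHP's bundled DOM extension.

```php
<?php
// Hypothetical helper, not part of Sweeper: unwraps the given tags and
// removes the given attributes from an HTML fragment.
function strip_unwanted(string $html, array $tags, array $attributes): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup
    // Wrap the fragment in a <div> so it can be read back afterwards.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root = $doc->getElementsByTagName('div')->item(0);

    // Unwrap unwanted elements: move their children up, then drop the element.
    foreach ($tags as $tag) {
        // Copy the matches into an array before changing the tree.
        foreach (iterator_to_array($xpath->query('.//' . $tag, $root)) as $node) {
            while ($node->firstChild) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
            $node->parentNode->removeChild($node);
        }
    }

    // Remove unwanted attributes from every element inside the wrapper.
    foreach ($attributes as $name) {
        foreach ($xpath->query('.//*[@' . $name . ']', $root) as $element) {
            $element->removeAttribute($name);
        }
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_unwanted(
    '<p style="color:red"><font face="Arial">Hello</font> <b>world</b></p>',
    ['font'],
    ['style']
);
```

The tag and attribute names are interpolated directly into the XPath expressions, so this sketch assumes they come from a trusted configuration rather than from user input.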

It also handles useful tasks such as table accessibility, abbreviations, automatic table of contents generation, and content structuring.
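A table of contents can be generated along the following lines. This is a minimal sketch under stated assumptions, not the routine Sweeper uses: the function name build_toc, the choice of h2 headings, and the toc-N id scheme are all invented for the example.

```php
<?php
// Hypothetical helper, not part of Sweeper: prepends a linked list of the
// fragment's <h2> headings, giving each heading an id if it has none.
function build_toc(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup
    // Wrap the fragment in a <div> so it can be read back afterwards.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root = $doc->getElementsByTagName('div')->item(0);

    $toc = $doc->createElement('ul');
    $count = 0;
    foreach ($xpath->query('.//h2', $root) as $heading) {
        $count++;
        // Assumed id scheme: toc-1, toc-2, ... for headings without an id.
        if (!$heading->hasAttribute('id')) {
            $heading->setAttribute('id', 'toc-' . $count);
        }
        $link = $doc->createElement('a');
        $link->setAttribute('href', '#' . $heading->getAttribute('id'));
        $link->appendChild($doc->createTextNode($heading->textContent));

        $item = $doc->createElement('li');
        $item->appendChild($link);
        $toc->appendChild($item);
    }

    // Only add the list when at least one heading was found.
    if ($count > 0) {
        $root->insertBefore($toc, $root->firstChild);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo build_toc('<h2>Intro</h2><p>Text</p><h2 id="use">Usage</h2>');
```

Existing ids are preserved, so links that already point at a heading keep working after the list is added.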

See documentation.html for fuller information.


Files (120)

Files marked * are accessible without login.

File                                 Role     Description
abbr/ (2 files, 2 directories)
basic/ (5 files)
DTD/ (10 files)
feed_generator/ (5 files)
mappings/ (4 files)
profiles/ (36 files)
src/ (1 file)
Templates/ (4 files, 1 directory)
character_generator.php *            Aux.     Auxiliary script
charsets.php *                       Aux.     Auxiliary script
clean_dreamweaver_files.php *        Example  Example script
documentation.html *                 Doc.     Documentation
DTD.php                              Class    Class source
even_qs.php                          Class    Class source
filter_url_list.php *                Aux.     Auxiliary script
find_empty_ths.php *                 Example  Example script
find_paragraphs_to_list.php *        Example  Example script
flip_acronyms.php *                  Aux.     Auxiliary script
getLanguage.php *                    Aux.     Auxiliary script
get_all_folder_names.php *           Example  Example script
get_recently_modified.php *          Example  Example script
index.html *                         Doc.     Documentation
OM.php                               Class    Class source
page_id_counter.txt *                Doc.     Documentation
paste_sweep.php *                    Example  Example script
purge_old_abbr_and_acronyms.php *    Example  Example script
readme.md *                          Doc.     Documentation
recursive_list.php *                 Example  Example script
redistribute_acronyms_files.php *    Aux.     Auxiliary script
retidy.php                           Class    Class source
run_sweeper.php *                    Example  Example script
sweeper.php *                        Example  Example script
upperclass_spans.php *               Aux.     Auxiliary script
WAMP_to_LAMP.php *                   Aux.     Auxiliary script
wordtonumber.class.php               Class    Class source

The PHP Classes site has supported package installation using the Composer tool since 2013, as you may verify by reading this instructions page.