Generating Advanced URLs
FirstSpirit provides an API that allows users to create their own “URL generators” (in the form of modules). These generators are used to generate URLs on demand. This makes search engine optimization (SEO) possible, for example by using “talking” URLs that are easier for users to understand and easier for search engines to analyze. URLs that fully support multiple languages can also be generated. In every case, it is important to plan the structure of the URLs early: subsequent changes to the URLs of pages that have already been indexed can have a negative impact on the ranking (at least over the short term).
Entry points (FirstSpirit Developer API):
- UrlFactory: URLs are generated when a project node is generated. The URL paths are usually formed from the reference names of the objects involved; for a page reference, for instance, from the reference name of the page reference and the reference names of the higher menu levels. The UrlFactory interface makes it possible to use display names to generate language-dependent URLs. It is also possible to add directory structures for the web server that are completely different from the project structure (see the PathLookup interface). See also Configuration of user-specific Advanced URLs.
- FilenameFactory: Usually, URL paths are generated in the file system on the basis of the structure of the generated project node. Use the “FilenameFactory” interface to allocate paths within the file system independently of URLs.
- PathLookup: In FirstSpirit SiteArchitect, user-defined paths can be defined for higher-level elements such as Media Store or Site Store folders, in addition to user-defined URLs or paths for project nodes such as media or page references. The PathLookup interface can be used to include these paths when generating the URL of a child object.
A complex URL generation strategy makes it impossible to determine in advance whether conflicts (identical file names) might occur as a result of user actions (e.g. moving nodes in the tree structure). Such conflicts may be discovered quite late (e.g. when generating the file or on first deployment), so a poor implementation and/or poorly chosen display names can cause problems that could not occur in the standard URL generation mode. These conflicts are detected and resolved automatically by appending a consecutive number to the names in order to make them unique.
Adapting the URL creator affects the downstream deployment processes and the live system configuration. These may need to be adapted to the modified URL creator.
All SEO or short URLs defined in a project can be read via the UrlAgent interface (package: de.espirit.firstspirit.agency). They can be read in the GenerateTask at generation time, for example, or used within RewriteRules for a web server (e.g., if URLs are modified on the live site).
Advanced URL Creator (reference implementation)
FirstSpirit includes the URL generator called “Advanced URL Creator”. This is a reference implementation for a new URL generation strategy. The reference implementation does NOT claim to be able to completely represent all SEO strategies (by design, this is not even possible), but rather it forms a basis with which significantly more flexible URLs can be generated. Java programming is required in order to tailor it to particular needs in some cases.
Use of the “Advanced URL Creator” reference implementation can be activated directly in the generation schedules using the “Path generation” combo box (see image). No additional implementation is required.
While standard URL generation forms the URLs based on the file and reference names, the Advanced URL Creator generates the URLs based on the display names of FirstSpirit objects.
Here, all URLs are generated using UTF-8 and include spaces and special characters. This can cause problems on Windows, since the file system does not distinguish between upper and lower case. However, leading and trailing whitespace is removed during URL generation, and the following characters and any remaining spaces are replaced by the minus sign (-):
\ / , : ; * ? " < > | # @ = & + % $
For character substitution options, see also below.
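The replacement rule described above can be illustrated with a minimal sketch. It is not the Advanced URL Creator's actual code: the class and method names are hypothetical, and it assumes that each listed character and each whitespace character is replaced individually, without collapsing consecutive minus signs.

```java
// Hypothetical helper, not part of the FirstSpirit API.
final class UrlNameSanitizer {
    // The characters listed above that are replaced by "-".
    private static final String REPLACED = "\\/,:;*?\"<>|#@=&+%$";

    static String sanitize(String displayName) {
        // Leading and trailing whitespace is removed first.
        String trimmed = displayName.trim();
        StringBuilder result = new StringBuilder(trimmed.length());
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            // Assumption: every match is replaced on its own, no collapsing.
            boolean replace = Character.isWhitespace(c) || REPLACED.indexOf(c) >= 0;
            result.append(replace ? '-' : c);
        }
        return result.toString();
    }
}
```

Under these assumptions, a display name such as " Press Release " would become "Press-Release", while characters outside the list (umlauts, for instance) pass through unchanged.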
All URLs of a project can be created in multiple languages. Language and template set directories are no longer separate. URLs/paths that are not unique are made unique during generation and can thus be distinguished from each other (by appending numbers).
This URL generator is therefore best suited for projects in which display names are maintained in the Site Store and Media Store for all languages to be generated. If no display name is maintained in a language, the display name of the master language is used; if the master language has no display name either, the reference name is used.
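The fallback order for names can be sketched as follows. This is an illustration only: the class, the method, and the map of display names keyed by language abbreviation are assumptions made for the example, not FirstSpirit API.

```java
import java.util.Map;

// Hypothetical helper, not part of the FirstSpirit API.
final class DisplayNameFallback {
    static String resolveName(Map<String, String> displayNames, String language,
                              String masterLanguage, String referenceName) {
        // 1. Display name of the requested language.
        String name = displayNames.get(language);
        // 2. Fallback: display name of the master language.
        if (name == null || name.trim().isEmpty()) {
            name = displayNames.get(masterLanguage);
        }
        // 3. Fallback: reference name.
        if (name == null || name.trim().isEmpty()) {
            name = referenceName;
        }
        return name;
    }
}
```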
Substitution of characters
Project-specific rules for character substitutions can be defined via the “FirstSpirit AdvancedUrlFactory Configuration” project component. (For information on project components see Project components (→Documentation for Administrators).)
Here either
- one of the server-wide conversion rules can be selected or
- a project-specific conversion rule can be defined.
1. Server-wide conversion rules are defined in the FirstSpirit ServerManager under Server properties / Conversion rules (see Conversion rules (→Documentation for Administrators)).
The desired rule can be selected from the “Conversion rule” drop-down box. The definition is then displayed in the “Definition” field. It cannot be edited here.
2. A project-specific conversion rule can be defined if -CUSTOM- is selected in the “Conversion rule” drop-down box.
Each rule must be in one line and consists of two values separated by an equal sign:
- on the left the special character to be transformed, either entered directly via keyboard or as hexadecimal code (e.g. 0xe4 for ä).
- on the right the valid character(s) into which the special character is to be converted when using the Advanced URL Creator, in double quotation marks.
Note: Equals signs (0x3D), spaces (0xa0) and some control characters must be specified in hexadecimal code.
Example:
[convert]
Ä="Ae"
Ö="Oe"
Ü="Ue"
ä="ae"
ö="oe"
ü="ue"
ß="ss"
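To make the rule format concrete, the following sketch parses a definition like the one above and applies it to a string. It is a simplified illustration under stated assumptions, not the parser FirstSpirit uses: the class and method names are hypothetical, section headers such as [convert] are skipped, and malformed lines are silently ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the FirstSpirit API.
final class ConversionRules {
    // Parses lines of the form  X="replacement"  or  0xNN="replacement".
    static Map<Integer, String> parse(String definition) {
        Map<Integer, String> rules = new LinkedHashMap<>();
        for (String rawLine : definition.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("[")) continue; // blank line or section header
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // no left-hand side
            String left = line.substring(0, eq).trim();
            String right = line.substring(eq + 1).trim();
            boolean quoted = right.length() >= 2 && right.startsWith("\"") && right.endsWith("\"");
            if (left.isEmpty() || !quoted) continue; // assumption: skip malformed rules
            int codePoint = left.toLowerCase().startsWith("0x")
                    ? Integer.parseInt(left.substring(2), 16) // hexadecimal code
                    : left.codePointAt(0);                    // character typed directly
            rules.put(codePoint, right.substring(1, right.length() - 1));
        }
        return rules;
    }

    // Replaces every character that has a rule; all others are kept.
    static String apply(Map<Integer, String> rules, String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String replacement = rules.get(cp);
            if (replacement != null) out.append(replacement);
            else out.appendCodePoint(cp);
        });
        return out.toString();
    }
}
```

With the example rules, a display name containing "ß" would be written with "ss" in the generated URL. Because the first equals sign on a line is treated as the separator, an equals sign on the left side has to be given as a hexadecimal code, which matches the note above.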
Menu levels and page references
Every page reference that must be included is generated along with its folder path (menu levels). Menu levels without a page reference are not included during generation unless they contain one or more additional menu levels with released page references. For page references and menu levels alike, the display names are used so that all URLs can be generated with full support for multiple languages.
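The inclusion rule for menu levels can be pictured as a recursive check. The sketch below uses a hypothetical MenuLevel class (not the FirstSpirit data model) and assumes that the rule applies at any nesting depth.

```java
import java.util.List;

// Hypothetical model of a menu level, not the FirstSpirit data model.
final class MenuLevel {
    final int releasedPageRefs;     // number of released page references directly in this level
    final List<MenuLevel> children; // nested menu levels

    MenuLevel(int releasedPageRefs, List<MenuLevel> children) {
        this.releasedPageRefs = releasedPageRefs;
        this.children = children;
    }

    // A level is generated if it holds a released page reference itself
    // or if any nested level does (assumption: checked recursively).
    boolean isGenerated() {
        if (releasedPageRefs > 0) return true;
        for (MenuLevel child : children) {
            if (child.isGenerated()) return true;
        }
        return false;
    }
}
```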
In advanced mode, the standard URLs
../de/startpage/firstspirit.html
and
../en/startpage/firstspirit.html
become:
../Startseite/index.html
and
../Startpage/index.html
However, these pages can also be called using
../Startseite
or
../Startpage
If an editor did not give a menu level a display name in a language, the directory of the master language (“fallback master language”) or a directory with the reference name (“fallback reference name”) is used. If this results in multiple index.html files in one directory, a number is appended to them for unique identification (“fallback disambiguation”); for example:
../Startpage/index.html
../Startpage/index.1.html
This is also the case if folders in different languages have the same display name.
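The disambiguation shown above can be sketched as a small registry that hands out each path only once. The class is hypothetical and only mirrors the index.1.html pattern from the example; the actual generator may number conflicts differently in other cases.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the FirstSpirit API.
final class UniquePaths {
    private final Set<String> used = new HashSet<>();

    // Returns the path unchanged if it is free; otherwise inserts a counter
    // before the file extension until an unused path is found.
    String register(String directory, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String ext = dot < 0 ? "" : fileName.substring(dot);
        String candidate = directory + "/" + fileName;
        int counter = 0;
        while (!used.add(candidate)) { // add() returns false if the path is already taken
            counter++;
            candidate = directory + "/" + base + "." + counter + ext;
        }
        return candidate;
    }
}
```

Registering index.html twice for ../Startpage would yield ../Startpage/index.html first and ../Startpage/index.1.html second. On a case-insensitive file system, the set would additionally have to compare paths without regard to case.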
Using a script that runs before generation (schedule management in the project properties), it is possible, for instance, to generate files named after the page reference display name instead of generating “index.*” files; for example:
context.setProperty("#urlCreatorSettings", Collections.singletonMap("usewelcomefilenames", "false"));
Page groups
If page groups are used, the start page is given the index.html file name by default. The file names of the other pages in the page group are generated from the page display name (where the spaces are replaced by a “-” in this case). If no display name is maintained, the reference name is used. File names within a folder are made unique in the same way as for page references.
For instance, in standard URL mode,
../de/seitengruppe/pressemitteilung_1.html
../de/seitengruppe/pressemitteilung_2.html
../de/seitengruppe/pressemitteilung_3.html
or
../en/seitengruppe/pressemitteilung_1.html
../en/seitengruppe/pressemitteilung_2.html
../en/seitengruppe/pressemitteilung_3.html
would become the following in Advanced URL mode:
../Seitengruppe/index.html
../Seitengruppe/Pressemitteilung-2.html
../Seitengruppe/Pressemitteilung-3.html
or
../Page-group/index.html
../Page-group/Press-Release-2.html
../Page-group/Press-Release-3.html
Here too, running the following script before generation (schedule management in the project properties)
context.setProperty("#urlCreatorSettings", Collections.singletonMap("usewelcomefilenames", "false"));
makes it possible to generate files based on the display name of the start page reference instead of generating the “index.*” files.
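The effect of the “usewelcomefilenames” setting on page groups can be summarized in a short sketch. The class, the boolean parameters, and the fixed “.html” extension are assumptions made for illustration; they are not taken from the reference implementation.

```java
// Hypothetical helper, not part of the FirstSpirit API.
final class PageGroupFileNames {
    static String fileName(String displayName, boolean isStartPage, boolean useWelcomeFileNames) {
        // By default the start page of a page group becomes the welcome file.
        if (isStartPage && useWelcomeFileNames) {
            return "index.html";
        }
        // All other pages (and the start page when welcome file names are
        // switched off) are named after the display name, spaces become "-".
        return displayName.trim().replace(' ', '-') + ".html";
    }
}
```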
Datasets
If datasets are output across multiple pages using content projection (“Data” tab at the menu levels), a number is appended to the display name of the page reference used for the content projection. If one dataset is output per page, this number is the ID of the dataset; for example:
.../Presse/Pressemitteilungen/Pressemitteilungen-Detailseite_128.html
or
.../Press/Press-Releases/Press-Releases-Details_128.html
Here too, the entire folder path is included in the generation. As with URL generation for page references, the display names are used for both the page reference and the folder path.
If the field “Variable for sitemap text” in a content projection (Site Store, page reference with “Data” tab) contains a column selected from the data source, the text in this column is included when generating the name and is language-dependent. This also makes it possible to generate multi-page “talking URLs”; for example:
../Presse/Pressemitteilungen/20-Jahre-e-Spirit-vom-Start-up-zum-global-Player.html
or
../Press/Press-Releases/e-Spirit-celebrates-20th-anniversary.html
If the field selected via “Variable for sitemap text” is not populated in one or more languages, the URL is again formed by appending the dataset ID by default. If multiple datasets are output to a page, consecutive numbers are appended to the pages; for example:
../Presse/Pressemitteilungen/Pressemitteilungen-Übersicht.html
../Presse/Pressemitteilungen/Pressemitteilungen-Übersicht_1.html
../Presse/Pressemitteilungen/Pressemitteilungen-Übersicht_2.html
or
../en/Press/Press-Releases/Press-Releases-Overview.html
../en/Press/Press-Releases/Press-Releases-Overview_1.html
../en/Press/Press-Releases/Press-Releases-Overview_2.html
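The naming rules for content projection pages can be condensed into the following sketch. It only reproduces the patterns from the examples above; the class and method names are hypothetical, the “.html” extension is fixed for simplicity, and the sitemap text is assumed to have its spaces replaced by “-”.

```java
// Hypothetical helper, not part of the FirstSpirit API.
final class DatasetFileNames {
    // One dataset per page: use the sitemap text if it is populated for the
    // language, otherwise append the dataset ID to the page reference name.
    static String detailPage(String pageRefDisplayName, String sitemapText, long datasetId) {
        if (sitemapText != null && !sitemapText.trim().isEmpty()) {
            return sitemapText.trim().replace(' ', '-') + ".html";
        }
        return pageRefDisplayName + "_" + datasetId + ".html";
    }

    // Several datasets per page: the first page keeps the plain name,
    // following pages get a consecutive number (pageIndex starts at 0).
    static String overviewPage(String pageRefDisplayName, int pageIndex) {
        return pageIndex == 0
                ? pageRefDisplayName + ".html"
                : pageRefDisplayName + "_" + pageIndex + ".html";
    }
}
```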
Media
Unlike standard URL generation, Advanced URL generation does not create a higher level media directory. The Media Store media to be included are stored hierarchically in folders, and folders or media in the root directory of the Media Store are stored at the top level of the generation directory. The display names are included for media and media folders alike so that all URLs can be generated in multiple languages. This makes it possible to store the files of the Site Store (HTML pages, PDF documents, etc.) and media in the same directory when the display name of the folders in the Site Store and Media Store is identical.
In standard URL generation mode,
1) ../media/products/powerinverter/control-panel.jpg
2) ../media/products/powerinverter/control-panel_Produktteaser.jpg
3) ../media/de/products/downloaddokumente/produktuebersicht.doc
4) ../media/en/products/downloaddokumente/produktuebersicht.doc
become the following in advanced mode:
1) ../Produkte/Wechselrichter/Instrumententafel.jpg
2) ../Produkte/Wechselrichter/Instrumententafel_Produktteaser.jpg
3) ../Produkte/Word-Download-Dokumente/Produkt-Übersicht.doc
4) ../Products/Word-downloads/Product-overview.doc
In the case of language-dependent media (for the same reference, a different medium is used for each language in the project: examples 3) and 4)), the display name of the corresponding language or the reference name is used. In the case of language-independent media (the same medium is used for all languages: examples 1) and 2)), the display name of the master language is used. If the master language does not have a display name, the reference name is used.
If an editor did not give a medium a display name in a language, the display name of the master language (“fallback master language”) or the reference name (“fallback reference name”) is used. If this results in multiple files with the same name in one directory, a number is appended to them for unique identification (“fallback disambiguation”); for example:
../logo.png
../logo-1.png
The API can also be used to manually assign URLs to media folders: for example “media” for the media root node.