Jobs Configuration

Jobs in rpeat are controlled through configuration files specified at startup. This is a quick overview of some of the most important options you have to help schedule, organize, and manage your tasks, with examples and explanations. Everything is optional, so you can add complexity depending on your needs. We like to start simple, as we are simple people.

This is what a job might look like (each parameter is described in more detail in the sections below):


[
  {
    // job metadata
    Name: the job's name, not yours
    Description: a short description about the job (optional)
    Comment: an internal comment within config (optional)

    Tags: an array of strings
    Group: a name grouping jobs in gui

    Type: set as "template" to declare jobspec as template
    Inherits: specify a template to inherit from

    // job specifics
    Cmd: the command to run - can be multiple lines
    CronStart: a cron-syntax like 0 0 * * 1-5 or @daily
    CronEnd
    CronRestart
    Jitter: add a random number of (maximum) seconds to CronStart

    ShutdownCmd: graceful shutdown command or signal
    ShutdownSig

    // complex dependency triggers
    Dependency: trigger based on one or more other jobs

    // job parameters
    Env: array of "key=value" variables to use
    DateEnv: like Env, but for dates
    Timezone: timezone to run each job
    Calendar: calendar to determine valid dates
    CalendarDirs: location of optional calendar files
    RequireCal: behavior if calendar doesn't cover current period
    Rollback: rollback to date before holiday?

    // enabling and visibility
    Hold: hold a job on rpeat startup
    Disabled: remove job from scheduler, but keep config
    Hidden: add job to scheduler, but remove from dashboard

    // exception handling
    Retry: how many attempts to restart a failing job
    RetryWait: duration between retry attempts
    MaxDuration: maximum runtime before stopping
    AlertActions: how and what should trigger an alert

    // logging
    TmpDir: where are temp files written
    Logging: how should logs be named, created, and rotated

    // permissions
    User: who is owner of the job
    Admin: who else has access
    Permissions: what level of access do admins get.
  },

  ...
]

Name, Description, Comment


  "Name":"The most important job ever!"

This feels obvious and frankly almost necessary, but it isn't. Choose something descriptive and useful, but don't worry too much about it, as Name is for humans only and you can change it whenever you want. (Internally, everything is tracked by an automatically generated id called JobUUID, assigned when the job is first started. Do not change this in your config or you will lose history.)

  "Description":"Maybe not the most important job _ever_, but it is important to me"

Description is also optional. It helps convey a bit more detail in the dashboard and in your config, so it is good practice to add one.

  "Comment":"FIXME"

Comments are only visible within the config file itself, much like a comment in your code. Since JSON doesn't support comments, we have included a field that does. You do use comments, right?

Type, Inherits


  "Type":"template",
  "Name":"TradingConfigEurope"

Now things get interesting. Type lets you tell rpeat that this job is actually a template, to be used by other jobs (or other templates) that inherit from it via Inherits. The rest of the job, umm, template, is specified as if it were a job, except that when it is time to run, it doesn't get a spot in the scheduler. Multiple templates can be combined by chaining them through the Inherits field.

  "Inherits":"TradingConfigEurope"

To use a template, you specify the template you want to inherit from. All fields are copied to the new job, and then anything you specify here overrides them. You can only inherit from one template, but since templates can inherit from other templates (recall they are just jobs internally), you can build very complex multi-level inheritance chains to help keep business logic distinct from configuration logic. That, or you use global environment variables... Just kidding, don't use global variables. Ever. Seriously.
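
As a rough sketch of how chaining might look, the fragment below defines two templates and one job, using only fields described on this page. The names TradingConfigBase and EuropeOpenReport, the script path, and all values are hypothetical, and the override behavior noted in the Comment fields follows the description above rather than anything verified against rpeat itself.

```json
[
  {
    "Type": "template",
    "Name": "TradingConfigBase",
    "Comment": "hypothetical base template: shared defaults only",
    "Retry": 2,
    "RetryWait": "1m"
  },
  {
    "Type": "template",
    "Name": "TradingConfigEurope",
    "Comment": "inherits the base template, then adds regional settings",
    "Inherits": "TradingConfigBase",
    "Timezone": "Europe/London"
  },
  {
    "Name": "EuropeOpenReport",
    "Comment": "hypothetical job: RetryWait here should override the inherited 1m",
    "Inherits": "TradingConfigEurope",
    "Cmd": "/bin/bash -c /opt/reports/open.sh",
    "CronStart": ["0 8 * * 1-5"],
    "RetryWait": "5m"
  }
]
```

If the copy-then-override rule holds at every level, the job would end up with Retry 2 from the base template, Timezone Europe/London from the regional one, and its own RetryWait of 5m.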

Tags, Group


  <Tags>#nofilter</Tags>
  <Tags>#noDAGs</Tags>
  <Tags>backup</Tags>

Tags and Group are all about organizing. Or sometimes about calling out your friends. Tags are whatever you want them to be. They are used to help find jobs in your dashboard, and are specified as an array of strings.

  "Group":["Servers"]

An rpeat Group is magical. Well, maybe not magical. Group lets you take jobs from many places in your config file (or files) and render them in one spot in the dashboard. The dashboard also lets you filter your view by group, so it is easy to see what is happening for a particular set of related jobs.
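
In a JSON config, the same tags would presumably be written as an array alongside Group. This is a minimal sketch that assumes the JSON field mirrors the XML element name shown above; the job name and script path are made up.

```json
{
  "Name": "NightlyBackup",
  "Comment": "hypothetical job; Tags written as a JSON array of strings",
  "Tags": ["#nofilter", "#noDAGs", "backup"],
  "Group": ["Servers"],
  "Cmd": "/bin/bash -c /opt/backup/run.sh",
  "CronStart": ["@daily"]
}
```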

Hold, Disabled, Hidden


  "Hold": true
  "Disabled": true
  "Hidden": true

Controlling visibility and how a job behaves overall is what these are for.

Hold is the ability to hold a job indefinitely. This is also controllable from dashboard and API, but sometimes it is helpful to start the server with it held to make sure it doesn't run until you are really ready. Defaults to false. No quotes.

Disabled is sort of like Hold, but instead removes the job from the scheduler and dashboard, while it remains in your configuration file. It's a great way to stub out a job or remove an old one without losing track of it. Like Hold, the default is false, so there is no need to add this unless you want to disable a job.

Hidden hides the job from the dashboard, but keeps it in the scheduler. This may be useful for a task that has no reason to be monitored, or because you are very insecure about the name you gave it. Generally, it is better to control visibility of jobs through the Permissions options offered by rpeat.

Cmd


  "Cmd":"/bin/bash -c echo 'this is seriously so much better than pressing buttons.'"

The reason you are here! This is where the action happens. Specified exactly as you would in crontab, it is a system call that keeps track of itself. The biggest caveat is that it runs as the rpeat-server user, and it defaults to not using a shell. Most people want a shell, so generally you specify that first.

One key thing to be aware of, though: if you want logging to happen within rpeat (including rotation, visibility, and permissioning), don't redirect stdout or stderr, as rpeat will then no longer be able to capture them.
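
For instance, rather than appending a redirect such as `> run.log 2>&1` to the command, the output can be left alone and described with the Logging option (covered further down). A small sketch with a hypothetical job name, script path, and log file names:

```json
{
  "Name": "NightlyExport",
  "Comment": "hypothetical job: no shell redirection in Cmd, so rpeat can capture both streams",
  "Cmd": "/bin/bash -c /opt/export/run.sh",
  "CronStart": ["@daily"],
  "Logging": {
    "StdoutFile": "export.stdout.log",
    "StderrFile": "export.stderr.log",
    "Append": true,
    "Purge": "7d"
  }
}
```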

ShutdownCmd, ShutdownSig


  "ShutdownCmd": "/bin/bash -c /usr/bin/redis-cli -p ${REDIS_PORT} shutdown"

  "ShutdownSig": "Interrupt"
  "ShutdownSig": "Kill"
  "ShutdownSig": "Term"
  "ShutdownSig": "SIGINT"
  "ShutdownSig": "SIGKILL"
  "ShutdownSig": "SIGTERM"

Sometimes you have a long running job. Something like a redis server that has a proper shutdown process. This is where you specify graceful shutdown calls. Think of it as the opposite of ripping the cord out of the wall to turn off your server. ShutdownSig lets you send a signal to your process. Sometimes this is what a process needs to gracefully exit.

Env


  "Env": [
          "REDIS_BIN=/usr/bin/redis-server",
          "REDIS_PORT=6789"
          ...
         ],
  "Cmd": "/bin/bash -c ${REDIS_BIN} --port ${REDIS_PORT}"

Another favorite of the rpeat team. Environment variables are nice, sure. But too many, recursively set/clobbered/set, are a disaster waiting to happen at 2:30 in the morning. Stop the madness by defining all your variables within the job and you will get a good night's sleep. We promise.

Because none is too few and many is never enough, Env takes an array of "key=value" strings.

(pro hint: put the variables in a reusable template to share amongst all your jobs)
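
The hint above might look something like this: a template holds the shared variables, and a job that inherits from it uses them in Cmd and ShutdownCmd. The template name RedisEnv is hypothetical; the Redis values and commands are the ones used in the examples on this page.

```json
[
  {
    "Type": "template",
    "Name": "RedisEnv",
    "Comment": "hypothetical template: variables defined once, reused by any job that inherits",
    "Env": [
      "REDIS_BIN=/usr/bin/redis-server",
      "REDIS_PORT=6789"
    ]
  },
  {
    "Name": "Redis Server",
    "Inherits": "RedisEnv",
    "Cmd": "/bin/bash -c ${REDIS_BIN} --port ${REDIS_PORT}",
    "ShutdownCmd": "/bin/bash -c /usr/bin/redis-cli -p ${REDIS_PORT} shutdown"
  }
]
```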

DateEnv


  "DateEnv": [
      "TODAY=CCYY-MM-DD",
      "DATEFILED=CCYY-MM-DD,-4D,MF",
      "YM=YY/MM,+2D"
      ...
  ],

DateEnv takes Env one step further. These are seriously magical date variables. You can specify a date that is calculated at runtime using your own format, timezone, calendars (oh yes!), and basic date math, and have it available to your command.

The format is "YOURVARIABLENAME=DATEFORMAT,ADJUSTMENT,CALENDAR"
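
Reading the examples above against that format: TODAY has only a date format, YM adds an adjustment of +2D, and DATEFILED supplies all three parts (format CCYY-MM-DD, adjustment -4D, calendar MF). The trailing parts therefore appear to be optional. Below is a sketch of a job that passes two of these to its command, assuming DateEnv variables are substituted with the same ${NAME} syntax as Env; the job name, script path, and flags are hypothetical.

```json
{
  "Name": "FilingReport",
  "Comment": "hypothetical job: both dates are computed when the job runs",
  "DateEnv": [
    "TODAY=CCYY-MM-DD",
    "DATEFILED=CCYY-MM-DD,-4D,MF"
  ],
  "Cmd": "/bin/bash -c /opt/reports/filing.sh --as-of ${TODAY} --filed ${DATEFILED}",
  "CronStart": ["@daily"]
}
```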

AlertActions


  "AlertActions": {
      "OnSuccess": {
          "Subject": "Success!"
      },
      "OnFailure": {
          "To": [
              "jeff@example.com"
          ],
          "Subject": "Something went wrong!"
          ...
      }
      ...
   }

Alert actions are how rpeat sends alerts for failing or successful jobs. In fact, any state that can exist in rpeat can trigger an alert. Alert actions comprise alert specifications for zero or more state changes, as in the example above.

By default, if you have an account with rpeat.io, your alerts can be sent to rpeat.Alert. All accounts include a free tier of 20 email and 1000 alerts per month.

It is also easy to change the endpoint of your alert message to point to your own internal message platform.
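
Written out without the elisions, a job that alerts on both outcomes might look like the sketch below. It uses only the To and Subject fields shown above; the job, addresses, and subjects are placeholders, and any further fields an alert accepts are not covered here.

```json
{
  "Name": "NightlyBackup",
  "Comment": "hypothetical job; only To and Subject are taken from the example above",
  "Cmd": "/bin/bash -c /opt/backup/run.sh",
  "CronStart": ["@daily"],
  "AlertActions": {
    "OnSuccess": {
      "To": ["jeff@example.com"],
      "Subject": "Backup finished"
    },
    "OnFailure": {
      "To": ["jeff@example.com", "sara@example.com"],
      "Subject": "Backup failed"
    }
  }
}
```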

Timezone


  "Timezone":"Asia/Hong_Kong"

Timezones are job-specific, and default to UTC when not specified. There is no excuse to let the system control when your job runs. The best practice is to use the IANA zoneinfo style, such as America/Chicago or Europe/London. Abbreviations and offsets are less clear and can be ambiguous.

Calendar, CalendarDirs, Rollback, RequireCal


  "Calendar":"us/NYSE",
  "CalendarDirs":".rpeat/cals",

Calendars are an often overlooked aspect of scheduling. They are a mainstay of nearly every aspect of life though. Most jobs are likely using resources that may need to coordinate with special calendars (e.g. institutions, government holidays). Standard cron syntax can't accommodate this, so we added in a very simplistic (and simple!) calendar feature. rpeat ships with some basics, and hosts a community effort to build new ones.

Calendar describes the calendar to be used, which is located by searching the CalendarDirs. Depending on the scheduled time, it may be that you want the next available day, or to set "Rollback":true to go to the day before. And in some cases your calendar may not be complete, so you can default to ignoring it when outside of its range by setting "RequireCal":false.
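
Put together, a calendar-aware job might be configured along these lines. The values for Calendar and CalendarDirs are the ones shown above; the job name, script path, timezone, and schedule are hypothetical, and the behavior described in the Comment is an interpretation of the paragraph above.

```json
{
  "Name": "MarketCloseReport",
  "Comment": "hypothetical job: roll back to the day before a holiday, and tolerate dates the calendar does not cover",
  "Timezone": "America/New_York",
  "Calendar": "us/NYSE",
  "CalendarDirs": ".rpeat/cals",
  "Rollback": true,
  "RequireCal": false,
  "Cmd": "/bin/bash -c /opt/reports/close.sh",
  "CronStart": ["0 17 * * 1-5"]
}
```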

CronStart, CronEnd, CronRestart


  // [Sec] Min Hour MonDay Mon WeekDay

  "CronStart": ["@weekly"]
  "CronStart": ["30 59 * * * MF"]    // 30s before the hour
  "CronStart": ["0 */15 * * * M,F"]  // run every 15m, Monday and Friday
  "CronStart": ["R* 0 * * * MF"]     // randomized second to start (fixed for all future runs)
  "CronStart": ["@manual"]           // on demand job
  "CronStart": ["@every 1h"]
  "CronStart": ["@at 20201231235959"] // specific moment to fire (once)
  "CronEnd":   ["55 23 * * SAT"]
  "CronRestart": ["@midnight"]
  "CronRestart": ["@eow"]
  ...
  "Jitter": 120                      // add up to 120s of jitter to start time


This is when rpeat should wake to run your job. It is a hard departure from regular schedulers, as each job maintains its own timer to know when to awaken. Generally you'll want CronStart for most jobs, and on occasion a way to either end or restart long running jobs on some schedule. I tend to use this to restart servers or polling jobs that are only valid at certain times of the day.

All follow an extended cron-style syntax of space-separated elements representing (1) second, (2) min, (3) hour, (4) month day, (5) month, (6) day of week. If only five fields are passed in (standard cron), it is interpreted as seconds=0. If any field is prefixed with a capital R, e.g. R* or R2-10, the value for that field is randomized and then stays fixed for all future runs.

Setting Jitter to a positive integer will add a random number of seconds (aka "jitter"), up to that maximum, to the start. This is extremely helpful if multiple jobs start around a particular time, but need to minimize startup load on the network, database, or just compute.
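
As a sketch of how these fit together in valid JSON, the job below uses the six-field form for CronStart and the five-field form (seconds implied as 0) for CronEnd. The job name, script path, and times are hypothetical, and it assumes the 1-5 weekday range is accepted in both forms.

```json
{
  "Name": "PollingService",
  "Comment": "hypothetical job: start at 08:00:00 on weekdays plus up to 60s of jitter, end at 18:00",
  "Cmd": "/bin/bash -c /opt/poller/run.sh",
  "CronStart": ["0 0 8 * * 1-5"],
  "CronEnd": ["0 18 * * 1-5"],
  "Jitter": 60
}
```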

Dependency


  <CronStart>@depends</CronStart>
  ...
  <Dependency>
    <Dependencies>
       <JobTrigger>
          <NameOrUUID>MySQL Server</NameOrUUID>
          <Trigger>running</Trigger>
       </JobTrigger>
       <JobTrigger>
          <NameOrUUID>Redis Server</NameOrUUID>
          <Trigger>running</Trigger>
       </JobTrigger>
       <Action>start</Action>
       <Condition>all</Condition>
       <Delay>10s</Delay>
    </Dependencies>
    <Dependencies>
       <JobTrigger>
          <NameOrUUID>NetworkStorageDevice</NameOrUUID>
          <Trigger>failed</Trigger>
       </JobTrigger>
       <Action>stop</Action>
       <Condition>any</Condition>
       <Delay>0s</Delay>
    </Dependencies>
  </Dependency>

Dependency is an elegant way to specify an arbitrary set of dependencies to drive an event for your job. It is possible to make a job start, stop or even restart if one or more other jobs are Running, Successful or Failed. This is ideal for starting services that need other services to be running - or for shutting down services if another becomes unavailable. All can then trigger additional alerts.

Retry, RetryWait


  "Retry":2
  "RetryWait":"1m"
  "RetryWait":"1m30s"
  "RetryWait":"2h"

Retry tells the scheduler to attempt to restart a job that has failed. This is the number of attempts to try, with RetryWait being the duration between attempts. This is useful when you may have a network issue, or when another server is currently offline.

MaxDuration


  "MaxDuration":"45m"

MaxDuration is sort of the opposite of a Retry and more like CronEnd. This sets an upper limit on how long a job should run before being stopped by rpeat. This corresponds to an OnEnd event in the Alert system.
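
The exception-handling options can be combined in one job. The sketch below is illustrative only: the job name, script path, and values are hypothetical, and it assumes an OnEnd entry takes the same shape as the OnSuccess and OnFailure entries shown in the AlertActions section.

```json
{
  "Name": "FetchPrices",
  "Comment": "hypothetical job: up to 3 retries, 1m30s apart; stopped if it runs longer than 45m",
  "Cmd": "/bin/bash -c /opt/prices/fetch.sh",
  "CronStart": ["@every 1h"],
  "Retry": 3,
  "RetryWait": "1m30s",
  "MaxDuration": "45m",
  "AlertActions": {
    "OnEnd": {
      "To": ["jeff@example.com"],
      "Subject": "FetchPrices was stopped"
    }
  }
}
```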

TmpDir, Logging


  "DateEnv": [
      "TODAY=CCYYMMDD"
  ],
  "TmpDir": "/home/temp",
  "Logging": {
      "StdoutFile": "${TODAY}.stdout.log",
      "StderrFile": "${TODAY}.stderr.log",
      "Append": false,
      "Purge": "7d"
  }

Most jobs will produce some sort of output, either as the product you are expecting, or as a by-product of the process itself in the form of standard out and standard error streams. rpeat provides access to these logging events by capturing the process's stdout and stderr while also writing to the locations you specify (or a default from the system). Logs are owned by the rpeat process user, but are permissioned for viewing using the dashboard. Logs can be specified to append only, and have an optional rotation schedule to make sure old logs don't continue to take up space.

User, Admin, Permissions


  "User": "jeff",
  "Admin": [
      "sara",
      "jeff"
  ],
  "Permissions": {
      "start": [
          "sara"
      ],
      "stop": [
          "sara"
      ],
      "all": [
          "jeff"
      ]
  }

Since jobs have the option of being controlled from a browser interface and API, permissions are paramount. In fact, this was the very first consideration for rpeat on day one. Not only is everything running over TLS by default, but locally we maintain granular authorization on a job-by-job basis. This even provides visibility controls, so sensitive jobs or IP are easy to protect.

Admin is who has access to a job, with User always having complete control. Permissions grants specific actions to individual users, which makes it easy to give certain users access only to view and restart a stuck job, but not to stop or hold the job. You can even grant access to job details pages as well as log files by user.